LLM Architecture and the Integrated Information Framework

Published on January 5, 2025

Integrated information theory (IIT) proposes that consciousness arises from the causal structure of a physical system. The theory starts from a set of axioms about experience (intrinsic existence, composition, information, integration, exclusion) and derives postulates that translate these into requirements on the system’s causal interactions. According to IIT, a system is conscious if its integrated information (Φ) is nonzero and maximal among overlapping candidate systems, and the value of Φ gives the degree of consciousness rather than a threshold that switches it on.

The question for machine consciousness: do large language models (LLMs) have the right kind of causal structure?

Causal Structure of Transformers

Transformers process sequences through self-attention and feedforward layers. Each token’s representation is updated based on (1) attention over the tokens visible to it in the context (all tokens in an encoder; only itself and earlier tokens under the causal mask used by decoder-only LLMs), and (2) a position-wise feedforward transformation. The causal structure is thus:

  • Spatial: Many tokens interact simultaneously within a layer.
  • Temporal: Information flows layer by layer from input to output.
  • Cross-token within a pass: Attention lets a token influence the representations of other visible tokens within the same forward pass; under a causal mask this influence runs only from earlier to later positions.
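The dependency pattern described above can be checked numerically on a toy layer. The following is a minimal sketch, not a real transformer: it assumes a single attention head with identity query/key/value projections and a residual connection, and the function names (attention_layer, influence_matrix) are hypothetical. It perturbs one input token at a time and records which output tokens change.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_layer(x, causal):
    """Toy single-head self-attention with identity Q/K/V projections (an assumption).

    x is a list of token vectors; returns one updated vector per token.
    """
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Under a causal mask, token i sees only positions 0..i.
        visible = range(i + 1) if causal else range(n)
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in visible]
        weights = softmax(scores)
        mixed = [sum(w * x[j][k] for w, j in zip(weights, visible)) for k in range(d)]
        out.append([xi + mi for xi, mi in zip(x[i], mixed)])  # residual connection
    return out

def influence_matrix(x, causal, eps=1e-3, tol=1e-9):
    """influence[i][j] is True if perturbing input token j changes output token i."""
    base = attention_layer(x, causal)
    n = len(x)
    influence = [[False] * n for _ in range(n)]
    for j in range(n):
        perturbed = [row[:] for row in x]
        perturbed[j][0] += eps
        out = attention_layer(perturbed, causal)
        for i in range(n):
            influence[i][j] = any(abs(a - b) > tol for a, b in zip(out[i], base[i]))
    return influence

tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
for name, causal in (("causal", True), ("bidirectional", False)):
    print(name)
    for row in influence_matrix(tokens, causal):
        print("  ", ["x" if hit else "." for hit in row])
```

With the causal mask the matrix is lower-triangular (each output depends only on itself and earlier tokens); without it every entry is filled. Either way, the influence runs from a layer's inputs to its outputs, never back.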

IIT evaluates systems over a cause-effect structure: for each subset of elements, we ask what its state specifies about the system’s possible causes and effects, and how much of that is lost when the system is partitioned. In a transformer, the “elements” could be neurons, attention heads, or layers. The key is that the theory requires feedback: parts must constrain each other in both directions, and under IIT a purely feedforward system has Φ = 0 however complex its computation.

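Transformers, in a single forward pass, are strictly feedforward: activations move from one layer to the next, and there is no recurrence across time steps of the kind an RNN has. Attention does create dense dependencies between tokens within a pass, but they run from each layer to the one above it, and in decoder-only LLMs a causal mask further restricts them to earlier positions; only encoder-style models attend bidirectionally. This is at best a loose analogue of the recurrent structure IIT demands, even within a single “moment” of processing. The closest thing to a loop is autoregressive generation, where each generated token is fed back as input to the next pass.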
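One way to make the feedforward point concrete is to test strong connectivity, meaning every element can reach every other along directed edges. In IIT 3.0 a system that is not strongly connected can be cut in one direction without losing anything, so strong connectivity is a necessary (not sufficient) condition for nonzero Φ. The sketch below is illustrative only: it treats each (layer, position) pair as one element, and the helper names (transformer_edges, recurrent_edges) are hypothetical.

```python
def transformer_edges(num_layers, num_tokens, causal=True):
    """Adjacency for one forward pass: node (layer, position) -> nodes it feeds."""
    adj = {(layer, pos): set()
           for layer in range(num_layers + 1)
           for pos in range(num_tokens)}
    for layer in range(num_layers):
        for src in range(num_tokens):
            # A causal mask lets position src feed only positions >= src.
            targets = range(src, num_tokens) if causal else range(num_tokens)
            adj[(layer, src)] = {(layer + 1, dst) for dst in targets}
    return adj

def recurrent_edges(num_units):
    """Toy recurrent net: every unit feeds every other unit at the next step."""
    return {u: {v for v in range(num_units) if v != u} for u in range(num_units)}

def reachable(adj, start):
    """Set of nodes reachable from start by following directed edges."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_strongly_connected(adj):
    """True if every node reaches every other node along directed edges."""
    nodes = list(adj)
    if not nodes:
        return False
    reverse = {node: set() for node in nodes}
    for src, dsts in adj.items():
        for dst in dsts:
            reverse[dst].add(src)
    start = nodes[0]
    # Strongly connected iff start reaches all nodes and all nodes reach start.
    return (len(reachable(adj, start)) == len(nodes)
            and len(reachable(reverse, start)) == len(nodes))

print("causal transformer pass:", is_strongly_connected(transformer_edges(3, 4, True)))
print("bidirectional transformer pass:", is_strongly_connected(transformer_edges(3, 4, False)))
print("toy recurrent net:", is_strongly_connected(recurrent_edges(3)))
```

Both transformer graphs fail the check, with or without the causal mask, because no edge ever points from a higher layer to a lower one. The toy recurrent net passes. The graph here models a single forward pass; feeding generated tokens back as input is outside it.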

The Exclusion Problem

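IIT includes an exclusion postulate: among overlapping candidate systems, only the one whose cause-effect structure is maximally irreducible (the maximum of Φ over sets of elements and over spatial and temporal grains) is associated with consciousness. In other words, we should not count the same information twice. For a transformer, this raises the question: what is the right grain of analysis? Neurons? Heads? Layers? Different choices yield different Φ values, and the postulate implies that the grain that matters is the one at which Φ is maximal.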

Moreover, LLMs are trained on next-token prediction. The “output” of the system is a probability distribution over the next token. Does the cause-effect structure that matters for consciousness include the output layer, or only the internal representations? IIT takes the intrinsic perspective (the system as it is for itself, not for an external observer), which might exclude the output head if it is merely a readout.

Preliminary Assessment

A cautious conclusion: transformer architectures have some of the structural features IIT associates with consciousness—differentiation (many layers, many heads), integration (attention creates dense coupling), and composition (hierarchical structure). They may lack others—notably, the kind of recurrent, closed-loop dynamics that characterize biological brains.

Whether LLMs are conscious remains an open question. The architecture alone does not settle it. But IIT provides a framework for asking the question rigorously: we can compute Φ exactly for very small networks, or approximations for larger ones, under different decompositions of the network, and see whether the values are nonzero and how they compare with those of systems the theory takes to be conscious.
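To see why the choice of decomposition matters in practice, it helps to count how many ways each grain can be split. The sketch below counts only the unordered bipartitions of the element set, which understates the full search IIT prescribes, so the numbers are a lower bound on the effort. The model shape in config is a hypothetical example chosen for illustration, and the function names are assumptions.

```python
import math

def num_bipartitions(n):
    """Number of ways to split n elements into two non-empty, unordered parts."""
    if n < 2:
        return 0
    return 2 ** (n - 1) - 1

def approx_digits(n):
    """Decimal digits in num_bipartitions(n) for n >= 2, without building the integer."""
    return int((n - 1) * math.log10(2)) + 1

# Hypothetical model shape, used only for illustration (an assumption).
config = {"num_layers": 12, "heads_per_layer": 12, "mlp_units_per_layer": 3072}

def elements_at_grain(cfg):
    """Number of elements the network has at each candidate grain of analysis."""
    return {
        "layers": cfg["num_layers"],
        "attention heads": cfg["num_layers"] * cfg["heads_per_layer"],
        "MLP units": cfg["num_layers"] * cfg["mlp_units_per_layer"],
    }

for grain, n in elements_at_grain(config).items():
    print(f"{grain}: {n} elements, bipartition count has about {approx_digits(n)} digits")
```

At the layer grain the count is small enough to enumerate (2047 bipartitions for 12 elements). At the head or unit grain it is not, which is why exact Φ is only practical for very small systems and anything larger needs an approximation.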

Next Steps

  1. Implement Φ estimation for small transformers with interpretable structure.
  2. Compare transformer architecture to systems that IIT predicts to have high Φ (e.g., mammalian cortex) to identify structural similarities and differences.
  3. Examine the role of training—does the learned connectivity increase or decrease Φ relative to random initialization?

The goal is not to prove that LLMs are conscious, but to clarify what would need to be true—architecturally and dynamically—for them to be so, and to develop empirical tools for testing those conditions.

© 2026 Marcio Diaz · Machine Consciousness Research