The Transformer and the Cortex: A Study in Parallel Design

Written by Brainoid Labs · Apr 05, 2026

There is a quiet convergence happening in modern AI. Transformers—the backbone of language models, world models, and generative systems—feel less like a sudden invention and more like a rediscovery of a pattern nature already explored.

Not identical. Not complete. But structurally… familiar.

1. Scaling: The First Real Lever of Intelligence

Transformers changed the game because they respond to scale in a predictable way.

  1. Increase depth → better abstraction
  2. Increase width → richer representations
  3. Increase data → broader generalization

Instead of breaking under complexity, they absorb it. This is rare in engineered systems—and very common in biological ones.
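The depth and width levers can be made concrete with a standard back-of-the-envelope parameter count: each transformer block holds roughly 12·d_model² weights (the Q, K, V, and output projections plus a 4×-wide MLP). The function name and example sizes below are illustrative, and the formula deliberately ignores layer norms, biases, and position embeddings.

```python
def approx_param_count(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a decoder-only transformer.

    Each block holds ~12 * d_model^2 weights: 4 * d^2 for the
    attention projections (Q, K, V, output) plus 8 * d^2 for a
    4x-wide MLP. The embedding table adds vocab_size * d_model.
    """
    per_block = 12 * d_model ** 2
    embeddings = vocab_size * d_model
    return n_layers * per_block + embeddings

# Depth and width are independent levers:
small = approx_param_count(n_layers=12, d_model=768, vocab_size=50257)
wide = approx_param_count(n_layers=12, d_model=1536, vocab_size=50257)  # 2x width
deep = approx_param_count(n_layers=24, d_model=768, vocab_size=50257)   # 2x depth
```

With these settings `small` comes out at roughly 124M parameters, in the territory of GPT-2 small; doubling the width quadruples the per-block cost, while doubling the depth merely doubles it.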

2. The Neocortex: Intelligence Through Repetition

The neocortex, forming ~76% of the human brain, is built from repeating units called cortical columns.

Each column:

  1. Contains 6 layers
  2. Processes signals hierarchically
  3. Shares a common structure across the cortex

It is not diversity that creates intelligence here—but scaled uniformity.

3. The Structural Parallel

Here’s the clean comparison that ties everything together:

| Aspect | Neocortex | Transformer |
|---|---|---|
| Fundamental unit | Cortical column | Transformer block |
| Internal structure | Six-layered vertical stack | Multi-layer stacked architecture |
| Information flow | Layer-wise hierarchical processing | Sequential layer refinement |
| Early stage | Sparse, raw sensory input | Token embeddings |
| Middle stage | Pattern extraction | Attention-based feature mixing |
| Final stage | Abstract, high-level representation | Output logits / predictions |
| Scaling method | More columns + connectivity | More layers + parameters |
| Learning | Self-organized, continuous | Pretrained via optimization |

This is where the resemblance becomes difficult to dismiss.
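The right-hand column of the table can be sketched in a few lines of NumPy: token embeddings enter, uniform stacked blocks mix and refine them via attention and an MLP, and a final projection produces logits. All names and sizes here are invented for illustration; a real block also has layer norms, multiple heads, and causal masking, which are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(h, Wq, Wk, Wv):
    """Single-head self-attention: each position mixes information
    from every position (the "pattern extraction" middle stage)."""
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

def block(h, params):
    """One transformer block: attention mixing plus a per-position
    MLP, each wrapped in a residual connection."""
    Wq, Wk, Wv, W1, W2 = params
    h = h + attention(h, Wq, Wk, Wv)
    h = h + np.maximum(h @ W1, 0) @ W2  # ReLU MLP
    return h

rng = np.random.default_rng(0)
d, seq, depth = 16, 8, 4
h = rng.normal(size=(seq, d))  # "token embeddings" (early stage)
layers = [
    tuple(rng.normal(scale=0.1, size=s)
          for s in [(d, d)] * 3 + [(d, 4 * d), (4 * d, d)])
    for _ in range(depth)
]
for params in layers:          # stacked, structurally uniform units
    h = block(h, params)
logits = h @ rng.normal(size=(d, 32))  # "output logits" (final stage)
```

Note that every block has the identical structure; only the weights differ. That is the "scaled uniformity" point in code.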

4. The Flow of Understanding

Both systems follow a similar transformation path:

  1. Start with fragmented signals
  2. Gradually extract structure
  3. End with coherent meaning

The key idea: Each layer doesn’t just pass data—it reinterprets reality at a higher level.

5. Where the Analogy Breaks

Now, an important caveat: this is not a one-to-one match.

  1. The brain learns continuously; transformers train in phases
  2. The cortex is deeply recurrent; transformers are mostly feedforward
  3. Energy use differs by orders of magnitude

So yes, transformers resemble the neocortex structurally—but they lack its adaptive fluidity.

6. A Deeper Insight: Intelligence as Compression

Here’s a perspective you won’t see often:

Both systems may fundamentally be doing this:

  1. Compress raw input into efficient representations
  2. Discard redundancy
  3. Build abstractions from compressed signals

Meaning isn’t the starting point—it’s the end product of layered compression.
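The redundancy point can be illustrated with a toy experiment, which is not a model of either system, just a demonstration of the underlying idea: input with repeated structure compresses far better than incompressible noise, and it is exactly that exploitable redundancy which a layered representation can discard.

```python
import random
import zlib

random.seed(0)
structured = b"the cat sat on the mat. " * 40  # redundant, pattern-rich
noise = bytes(random.randrange(256) for _ in range(len(structured)))

def ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(zlib.compress(data, 9)) / len(data)

# Pattern-rich input shrinks dramatically; noise does not shrink at all.
print(f"structured: {ratio(structured):.2f}  noise: {ratio(noise):.2f}")
```

The structured text compresses to a small fraction of its size, while the random bytes stay essentially full size: there is no redundancy left to remove.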

7. Why This Matters for World Models

World models aim to simulate reality itself.

Transformers already:

  1. Build hierarchical internal representations
  2. Scale with more data and compute
  3. Generalize across tasks

This makes them early-stage internal simulators, not just predictors.

8. The Missing Ingredient

Still, something crucial is absent:

  1. Direct interaction with the environment
  2. Continuous feedback loops
  3. Self-driven objective formation

Without these, transformers remain powerful—but disembodied.

The resemblance between transformers and the neocortex is not accidental—it hints at constraints underlying intelligence itself.

We are not yet building minds.

But we are, perhaps for the first time, building systems that organize information the way minds must.
