The Transformer and the Cortex: A Study in Parallel Design

Written by Brainoid Labs · Apr 05, 2026

There is a quiet convergence happening in modern AI. Transformers—the backbone of language models, world models, and generative systems—feel less like a sudden invention and more like a rediscovery of a pattern nature already explored.

Not identical. Not complete. But structurally… familiar.

1. Scaling: The First Real Lever of Intelligence

Transformers changed the game because they respond to scale in a predictable way.

  1. Increase depth → better abstraction
  2. Increase width → richer representations
  3. Increase data → broader generalization

Instead of breaking under complexity, they absorb it. This is rare in engineered systems—and very common in biological ones.
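The depth and width levers can be made concrete with a standard back-of-the-envelope parameter count: each transformer block holds roughly 12·d_model² weights (the Q, K, V, and output projections plus a 4×-wide MLP). The function name and example sizes below are illustrative, and the formula deliberately ignores layer norms, biases, and position embeddings.

```python
def approx_param_count(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a decoder-only transformer.

    Each block holds ~12 * d_model^2 weights: 4 * d^2 for the
    attention projections (Q, K, V, output) plus 8 * d^2 for a
    4x-wide MLP. The embedding table adds vocab_size * d_model.
    """
    per_block = 12 * d_model ** 2
    embeddings = vocab_size * d_model
    return n_layers * per_block + embeddings

# Depth and width are independent levers:
small = approx_param_count(n_layers=12, d_model=768, vocab_size=50257)
wide = approx_param_count(n_layers=12, d_model=1536, vocab_size=50257)  # 2x width
deep = approx_param_count(n_layers=24, d_model=768, vocab_size=50257)   # 2x depth
```

With these settings `small` comes out at roughly 124M parameters, in the territory of GPT-2 small; doubling the width quadruples the per-block cost, while doubling the depth merely doubles it.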

2. The Neocortex: Intelligence Through Repetition

The neocortex, forming ~76% of the human brain, is built from repeating units called cortical columns.

Each column:

  1. Contains 6 layers
  2. Processes signals hierarchically
  3. Shares a common structure across the cortex

It is not diversity that creates intelligence here—but scaled uniformity.

3. The Structural Parallel

Here’s the clean comparison that ties everything together:

| Aspect | Neocortex | Transformer |
|---|---|---|
| Fundamental unit | Cortical column | Transformer block |
| Internal structure | Six-layered vertical stack | Multi-layer stacked architecture |
| Information flow | Layer-wise hierarchical processing | Sequential layer refinement |
| Early stage | Sparse, raw sensory input | Token embeddings |
| Middle stage | Pattern extraction | Attention-based feature mixing |
| Final stage | Abstract, high-level representation | Output logits / predictions |
| Scaling method | More columns + connectivity | More layers + parameters |
| Learning | Self-organized, continuous | Pretrained via optimization |

This is where the resemblance becomes difficult to dismiss.
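The right-hand column of the table can be sketched in a few lines of NumPy: token embeddings enter, uniform stacked blocks mix and refine them via attention and an MLP, and a final projection produces logits. All names and sizes here are invented for illustration; a real block also has layer norms, multiple heads, and causal masking, which are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(h, Wq, Wk, Wv):
    """Single-head self-attention: each position mixes information
    from every position (the "pattern extraction" middle stage)."""
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

def block(h, params):
    """One transformer block: attention mixing plus a per-position
    MLP, each wrapped in a residual connection."""
    Wq, Wk, Wv, W1, W2 = params
    h = h + attention(h, Wq, Wk, Wv)
    h = h + np.maximum(h @ W1, 0) @ W2  # ReLU MLP
    return h

rng = np.random.default_rng(0)
d, seq, depth = 16, 8, 4
h = rng.normal(size=(seq, d))  # "token embeddings" (early stage)
layers = [
    tuple(rng.normal(scale=0.1, size=s)
          for s in [(d, d)] * 3 + [(d, 4 * d), (4 * d, d)])
    for _ in range(depth)
]
for params in layers:          # stacked, structurally uniform units
    h = block(h, params)
logits = h @ rng.normal(size=(d, 32))  # "output logits" (final stage)
```

Note that every block has the identical structure; only the weights differ. That is the "scaled uniformity" point in code.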

4. The Flow of Understanding

Both systems follow a similar transformation path:

  1. Start with fragmented signals
  2. Gradually extract structure
  3. End with coherent meaning

The key idea: Each layer doesn’t just pass data—it reinterprets reality at a higher level.

5. Where the Analogy Breaks

Now, an important caveat: this is not a one-to-one match.

  1. The brain learns continuously; transformers train in phases
  2. The cortex is deeply recurrent; transformers are mostly feedforward
  3. Energy use differs by orders of magnitude

So yes, transformers resemble the neocortex structurally—but they lack its adaptive fluidity.

6. A Deeper Insight: Intelligence as Compression

Here’s a perspective you won’t see often:

Both systems may fundamentally be doing this:

  1. Compress raw input into efficient representations
  2. Discard redundancy
  3. Build abstractions from compressed signals

Meaning isn’t the starting point—it’s the end product of layered compression.
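The redundancy point can be illustrated with a toy experiment, which is not a model of either system, just a demonstration of the underlying idea: input with repeated structure compresses far better than incompressible noise, and it is exactly that exploitable redundancy which a layered representation can discard.

```python
import random
import zlib

random.seed(0)
structured = b"the cat sat on the mat. " * 40  # redundant, pattern-rich
noise = bytes(random.randrange(256) for _ in range(len(structured)))

def ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(zlib.compress(data, 9)) / len(data)

# Pattern-rich input shrinks dramatically; noise does not shrink at all.
print(f"structured: {ratio(structured):.2f}  noise: {ratio(noise):.2f}")
```

The structured text compresses to a small fraction of its size, while the random bytes stay essentially full size: there is no redundancy left to remove.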

7. Why This Matters for World Models

World models aim to simulate reality itself.

Transformers already:

  1. Build hierarchical internal representations
  2. Scale with more data and compute
  3. Generalize across tasks

This makes them early-stage internal simulators, not just predictors.

8. The Missing Ingredient

Still, something crucial is absent:

  1. Direct interaction with the environment
  2. Continuous feedback loops
  3. Self-driven objective formation

Without these, transformers remain powerful—but disembodied.

The resemblance between transformers and the neocortex is not accidental—it hints at constraints underlying intelligence itself.

We are not yet building minds.

But we are, perhaps for the first time, building systems that organize information the way minds must.
