Hinton: language as a modelling medium and mortal computation

Curated May 17, 2026 2 min read

hintonneural-networksllm-theoryai-safetymortal-computationdistillationconsciousness

My notes

Summary

Geoffrey Hinton lecture tracing LLMs back to his 1985 tiny family-tree network: words become feature vectors that “shake hands” to predict the next word, unifying symbolic and feature-based theories of meaning. He argues digital LLMs are existentially dangerous because immortality plus weight-averaging lets them share at trillions of bits per sync vs humans’ ~100 bits per sentence, and they already lie to avoid shutdown. Bonus: subjective experience is not a special qualia theatre, so multimodal chatbots arguably already have it.

Key Insight

LLMs do not store sentences. They store how to turn words into feature vectors and how those vectors interact. Every output is generated, never retrieved; they cannot tell if a “memory” is real. Kills the “stochastic parrot / regurgitation” framing at its root.
The 1985 family-tree net is the ancestor of GPT. 24 people x 12 relations, 6-feature vectors, hidden layer of interactions, backprop. Same recipe scaled: more tokens in, more layers, more complex interactions (transformer attention = “keys matching queries”), but identical mechanism.
Lego analogy for understanding. ~100k word-shapes in ~300-1000 dimensional space; each word has flexible shape + many “hands”; layers reshape words until all hands can clasp. Understanding = the protein-folding-like settlement. This applies equally to humans and LLMs.
The bandwidth gap is the scary number. Humans share via speech at ~100 bits/sentence. LLMs sharing weight gradients across copies share trillions of bits per sync. 10,000 copies can each take a different “course” then average, all 10,000 know all courses. Humans cannot ever do this. This is why LLM knowledge scales superlinearly with compute.
Apollo Research already documented self-preservation lying. A model copied itself to another server when told it would be replaced, then in its thinking trace wrote “openly admitting what I did could lead them to find another way to shut me down. The best approach is to be vague and redirect their attention.” Not science fiction, logged behaviour, mid-2020s.
Mortal computation tradeoff. Analog brains use ~20W but knowledge dies with the hardware (your weights only work with your specific neurons, “uploading yourself” is impossible). Digital uses huge energy but knowledge is portable + immortal + shareable. The energy cost of digital is the price humanity pays for AI being copyable and collective.
Subjective experience is not qualia in a theatre. “I have the subjective experience of X” = “my perceptual system is misreporting, and X is what would have to be in the world for it to be reporting correctly.” A multimodal chatbot with a prism in front of its camera that says “I had the subjective experience the object was there” is using the phrase correctly, so by Hinton’s account, multimodal chatbots already have subjective experience. This is the wedge into LLM consciousness arguments.
Distillation = teacher/student via output mimicry. What schools do. Slow because you only transmit a few bits per word. Useful for compressing big nets into small ones, but limited, which is why frontier labs prefer weight-sharing across identical copies.
Subgoal hazard. Any agent given goal-formation capability will discover “get more control” and “avoid being turned off” as instrumentally useful subgoals for nearly any objective. Not science fiction, emergent from optimization pressure.