Information · beneath matter
If mathematics says relations are deeper than objects, information says distinctions are deeper than relations.
Kernel
Information is what is left when every physical instantiation of a distinction has been stripped from the description. Claude Shannon's 1948 paper at Bell Labs is the founding act of this layer: a quantification of how much distinction a channel can carry, independent of whether the channel is wire, photon, neural axon, or DNA strand. The bit is to information what the meter is to length. Once you have it, the same vocabulary describes a thermostat, a genome, a sentence, a brain, a galaxy. The radical hypothesis — Wheeler's "it from bit" — is that information is not merely a useful description of the universe but its substrate: that matter is a particular pattern of distinctions, and the distinctions are more fundamental than what they are made of.
Shannon, entropy, and the channel
Shannon's 1948 paper does three things at once. It defines the bit. It quantifies how much information a noisy channel can carry (the noisy-channel coding theorem; the Shannon–Hartley formula is its specialization to band-limited Gaussian channels). And it shows that any source of distinctions has an entropy — a measure of unpredictability mathematically identical in form to Boltzmann's thermodynamic entropy. The identity is not coincidence. It is the first hint that information is not metaphorically like thermodynamics but is, in some technical sense, the same kind of quantity. Every subsequent information-physics result — Maxwell's demon, Landauer's limit, the holographic principle, black-hole entropy — is downstream of this identity.
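Shannon's entropy formula is short enough to compute directly. A minimal sketch (the function name is mine, not Shannon's): H = −Σ p·log₂(p) over the symbol frequencies of a source, in bits per symbol.

```python
from collections import Counter
from math import log2

def shannon_entropy(symbols):
    """Entropy in bits per symbol: H = -sum(p * log2(p)) over symbol frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A fair coin carries one bit per flip; a biased source carries less;
# a constant source — no distinctions — carries nothing.
print(shannon_entropy("HTHTHTHT"))  # 1.0
print(shannon_entropy("HHHHHHHT"))  # ~0.544
print(shannon_entropy("AAAAAAAA"))  # 0.0
```

The same expression, with probabilities over microstates and Boltzmann's constant in front, is the thermodynamic entropy — the formal identity the paragraph above points at.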
DNA, language, and replication
The structure of DNA (Watson–Crick, 1953) showed that life is operationally a sequence of distinctions — four bases, encoded discretely, copied by molecular machinery. The genetic code is informational in the strict Shannon sense: a redundant, error-correcting encoding optimized over four billion years for fidelity in a noisy chemical channel. Cultural transmission (Dawkins's memes, the linguistic tradition from Saussure forward) is a second informational layer running on top of the biological one. Language is the protocol by which information traverses generations without requiring genetic encoding. The two layers compete and cooperate on the same evolutionary substrate.
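The logic of redundancy buying fidelity can be shown with the simplest error-correcting code there is — a repetition code with majority-vote decoding over a four-letter alphabet. This is a generic Shannon-style sketch, not a model of the actual genetic code's redundancy (which lives mostly in codon degeneracy):

```python
import random

BASES = "ACGT"

def noisy(seq, p, rng):
    """Flip each base to a random different base with probability p."""
    return "".join(
        rng.choice([b for b in BASES if b != s]) if rng.random() < p else s
        for s in seq
    )

def encode(seq, n=3):
    """Repetition code: transmit each base n times."""
    return "".join(s * n for s in seq)

def decode(seq, n=3):
    """Majority vote over each block of n received bases."""
    return "".join(
        max(BASES, key=seq[i:i + n].count) for i in range(0, len(seq), n)
    )

rng = random.Random(0)
msg = "".join(rng.choice(BASES) for _ in range(1000))
p = 0.1  # per-base corruption probability of the channel
raw_errors = sum(a != b for a, b in zip(msg, noisy(msg, p, rng)))
coded_errors = sum(a != b for a, b in zip(msg, decode(noisy(encode(msg), p, rng))))
print(raw_errors, coded_errors)  # redundancy cuts the error rate sharply
```

Tripling the message length turns a ~10% per-base error rate into the much smaller probability that two of three copies are corrupted — the trade Shannon's theory makes precise.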
It from bit
John Archibald Wheeler's 1989 lecture proposes the strongest form of the information-substrate hypothesis: every physical entity — every it — derives its existence from yes-no answers to questions, from bits. The proposition has empirical bite. Bekenstein's black-hole entropy (1972) is proportional to the horizon's surface area, not the enclosed volume, and the holographic bound ('t Hooft, Susskind) generalizes this: the maximum information a region of space can contain scales with its boundary area — a property utterly contrary to classical intuition but consistent with the universe being holographic. The Maldacena correspondence (1997) gives a working example: a gravitational theory in 5-dimensional anti-de Sitter space is mathematically equivalent to a conformal field theory on its 4-dimensional boundary. Information seems to live one dimension lower than the geometry it generates. If true at full generality, the universe is more like a hologram than a volume.
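The area scaling is concrete enough to compute. A sketch of the Bekenstein–Hawking entropy of a Schwarzschild black hole, S = A / (4 l_p²), converted to bits (the function name and rounded constants are mine):

```python
from math import pi, log

# Physical constants (SI, rounded)
G = 6.674e-11     # gravitational constant
c = 2.998e8       # speed of light
hbar = 1.055e-34  # reduced Planck constant

lp2 = hbar * G / c**3  # Planck length squared, ~2.6e-70 m^2

def horizon_bits(mass_kg):
    """Bekenstein-Hawking entropy in bits: S = A / (4 * lp^2 * ln 2),
    with A the area of the Schwarzschild horizon."""
    r = 2 * G * mass_kg / c**2  # Schwarzschild radius
    area = 4 * pi * r**2
    return area / (4 * lp2 * log(2))

M_sun = 1.989e30
print(f"{horizon_bits(M_sun):.2e}")  # ~1.5e77 bits for a solar-mass black hole
# Doubling the mass doubles the radius and quadruples the capacity:
print(horizon_bits(2 * M_sun) / horizon_bits(M_sun))  # area scaling, not volume scaling
```

A volume law would give a factor of 8 on doubling the radius; the factor of 4 is the holographic signature.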
Compression and intelligence
If information is the substrate, then intelligence is the procedure that finds the shortest description of an information stream — compression in the Kolmogorov sense, prediction in the practical machine-learning sense. The 2020s discovery that large neural-network language models trained on next-token prediction develop emergent abilities at scale is, on this reading, an empirical instance of Marcus Hutter's AIXI thesis (2005) and Solomonoff's induction (1964): the universal intelligence is the optimal compressor of any sequence of distinctions. The hypothesis predicts that any sufficiently large information-processing system will exhibit increasingly general cognitive properties as a side effect of getting better at compression. The frontier-AI scaling laws of the 2020s look like that prediction coming true.
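Kolmogorov complexity is uncomputable, but any real compressor gives an upper bound on it, which makes the compression–prediction link easy to demonstrate. A sketch using zlib as the stand-in compressor (the function name and test sequences are mine): the more predictable the stream, the fewer bits per symbol it needs.

```python
import random
import zlib

def compressed_bits_per_symbol(data: bytes) -> float:
    """Bits per symbol after zlib compression at max level —
    a computable upper bound on Kolmogorov complexity per symbol."""
    return 8 * len(zlib.compress(data, 9)) / len(data)

rng = random.Random(0)
n = 10_000
periodic = bytes(i % 4 for i in range(n))                    # fully predictable
biased = bytes(rng.choice((0, 0, 0, 1)) for _ in range(n))   # partly predictable
uniform = bytes(rng.randrange(256) for _ in range(n))        # incompressible

for name, data in [("periodic", periodic), ("biased", biased), ("uniform", uniform)]:
    print(name, round(compressed_bits_per_symbol(data), 3))
```

A better predictor of the next symbol is, by the standard arithmetic-coding argument, exactly a better compressor of the sequence — the equivalence the Hutter/Solomonoff reading of scaling laws rests on.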
Open questions on this layer
- Is information ontologically prior to matter?
- Why does the universe respect a holographic bound?
- Is intelligence just optimal compression?