Hard Substrates, Soft Evidence
Philosophy / methodology · Revision in progress
Read the full paper on PhilArchive →
The computational substrate fully determines behavioral output, but behavior radically underdetermines the substrate. You can't read emergence backward from outputs.
The argument
The LLM cognition debate is methodologically stuck. This paper diagnoses four technical errors sustaining the impasse:
Skeptics misdescribe computation. The "stochastic parrot" metaphor gets inference wrong: what happens at inference is geometric transformation in high-dimensional representation space, not text stitching. Interpretability findings (induction heads, arithmetic circuits) demonstrate discoverable internal structure that does not reduce to surface statistics.
Skeptics misdescribe training. "Statistical learning from text" describes GPT-2-era systems, not current ones. Post-training (RLHF, DPO, tool use, environment feedback) shifts optimization from descriptive to normative.
Optimists over-infer from behavioral evidence. Broad behavioral competence is consistent with many different internal organizations. Behavioral evidence cannot establish what optimists claim because it is the wrong kind of evidence.
Aside — This is a dimensionality problem: behavior is a low-dimensional projection of a high-dimensional internal state. Many internal organizations produce the same behavioral output.
Careful work is architecture-bound. Most generalizations about "LLMs" are actually generalizations about transformers. The reversal curse (solved by diffusion architectures) shows that some limitations are architecture-specific, not facts about learning or cognition in general.
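The dimensionality point in the aside above can be made concrete with a toy linear readout. The sketch below (illustrative sizes and names, not any actual model) builds a low-rank "behavioral" projection of a high-dimensional state: any internal change lying in the readout's null space is behaviorally invisible, so very different substrates yield identical outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 2 observable "behavioral" logits read out from an
# 8-dimensional internal state (all sizes are illustrative).
W = rng.normal(size=(2, 8))          # low-rank readout map
h = rng.normal(size=8)               # one internal organization

# W has a 6-dimensional null space: any internal-state component
# lying in it cannot affect the readout.
_, _, Vt = np.linalg.svd(W)
null_vec = Vt[-1]                    # a direction W cannot see
h_alt = h + 5.0 * null_vec           # a very different internal state

behavior = W @ h
behavior_alt = W @ h_alt

# Distinct substrates, identical behavior (up to float error).
print(np.allclose(behavior, behavior_alt))   # True
print(np.linalg.norm(h - h_alt))             # 5.0: large internal difference
```

The underdetermination is exact here, not approximate: every state in the affine set h + null(W) projects to the same behavior, so no amount of behavioral testing through W can distinguish them.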
The proposal
A four-source methodology for studying LLM cognition: behavioral evidence (necessary but insufficient), internal probing (direct access to the substrate), causal intervention (shows that structure is causally implicated in behavior), and cross-architectural replication (distinguishes architecture-specific findings from general ones). Convergent findings across all four constrain the space of tenable positions.
Aside — The four-source methodology is essentially Lakatos's research-programme methodology adapted for empirical AI: behavioral evidence supplies the "novel predictions," internal probing is access to the theoretical core, causal intervention is the experimental test, and cross-architectural replication distinguishes the hard core from the protective belt.
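Two of the four sources, internal probing and causal intervention, can be sketched together on synthetic data. The toy below (all names, sizes, and the linear encoding are assumptions for illustration, not the paper's method) plants a binary feature along one direction of fake "hidden states," recovers it with a least-squares linear probe, then ablates the probed direction and checks that the feature is no longer decodable; the intervention is what upgrades a correlational probe into causal evidence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "hidden states": a binary feature linearly encoded along
# one fixed direction, plus noise (sizes are illustrative).
d = 16
feature_dir = rng.normal(size=d)
feature_dir /= np.linalg.norm(feature_dir)

y = rng.integers(0, 2, size=500)                 # the feature's labels
H = np.outer(y - 0.5, feature_dir) + 0.1 * rng.normal(size=(500, d))

# 1) Internal probing: a least-squares linear probe recovers the direction.
w, *_ = np.linalg.lstsq(H, y - 0.5, rcond=None)
probe_dir = w / np.linalg.norm(w)
print(abs(probe_dir @ feature_dir))              # ~1: probe found the encoding

# 2) Causal intervention: ablate the probed direction, then check that
# the feature is no longer decodable from the edited states.
H_ablated = H - np.outer(H @ probe_dir, probe_dir)
acc_before = np.mean(((H @ probe_dir) > 0) == y)
acc_after = np.mean(((H_ablated @ probe_dir) > 0) == y)
print(acc_before, acc_after)                     # near-perfect vs. chance-level
```

On real models the probe runs over extracted activations and the ablation is applied mid-forward-pass, but the logic is the same: probing finds candidate structure, intervention tests whether behavior actually depends on it.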