Substrate
When analysis is stuck, you're probably looking at the wrong level. Substrate is the operation of asking what's generating the phenomenon instead of describing the phenomenon.
Everyone watching grokking stares at the test accuracy curve — nothing, nothing, nothing, 99%. Looks like a phase transition. The whole field treats it like one. But track the optimizer state instead and the "sudden" jump is a smooth reorganization that's been happening for hundreds of steps. The accuracy curve is a projection. The interesting thing was never on that axis.
That's the substrate move: when something looks mysterious, you're measuring at the wrong level. Drop down one layer to whatever is generating your observations, and the mystery usually dissolves into mechanism.
The operation
You're looking at some system's outputs. The outputs are surprising, or contested, or resistant to intervention. The move is: stop analyzing the outputs and ask what structural process is producing them. Not as philosophical commitment — as diagnostic. The leverage is almost always one level down from where people are arguing.
This is a spatial operation. You're zooming in — not to more detail at the same level, but to a lower level of organization. The goal is to find where the causal work is actually happening.

Two systems can produce identical outputs from completely different substrates. If you only look at outputs, you'll conflate them. If you intervene on outputs, you'll miss. Surface-level metrics are symptoms. The substrate is the disease — or the cure, depending on what you're trying to do.
Where it lands
Grokking. The field asks "why the delay?" as if the sudden generalization were the phenomenon to explain. But test accuracy is a 1D projection of a high-dimensional process. The actual substrate is the optimizer state: Adam's momentum buffer is chasing a smoothed version of a landscape that has already reshaped. The delay is an implementation detail, not deep learning theory.
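The projection point can be made concrete with a toy sketch — not a reproduction of grokking, just an assumed minimal model in which a single parameter drifts smoothly under momentum while the reported "accuracy" is a steep threshold of that parameter. The projection shows a phase transition; the substrate never jumps.

```python
import math

def loss_grad(w):
    # Gradient of a simple quadratic loss with minimum at w = 5.
    return w - 5.0

w, m = 0.0, 0.0          # parameter and momentum buffer (EMA of gradients)
lr, beta = 0.01, 0.9
history = []
for step in range(500):
    m = beta * m + (1 - beta) * loss_grad(w)   # smooth momentum update
    w -= lr * m                                # smooth parameter update
    # "Test accuracy" as a steep sigmoid of w: a 1D projection with a cliff.
    acc = 1 / (1 + math.exp(-40 * (w - 4.0)))
    history.append((step, w, m, acc))

# The accuracy column sits near 0 for over a hundred steps, then shoots
# to ~1 within a handful of steps — while w changes by a small, smooth
# amount at every single step.
```

The mystery ("why the sudden jump?") is an artifact of the axis being measured; logged at the level of `w` and `m`, nothing sudden ever happens.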
LLM understanding. Everyone argues about whether the outputs constitute understanding. Both sides describe behavior and disagree about what it means. The substrate question is prior: what part of the system would bear the understanding if it existed? Different answers to that produce radically different verdicts about the same behavior. The debate is stuck because participants are implicitly committed to different substrates and nobody noticed.
Multi-modal detection. People count sensors. The number of sensors is not the thing that determines detection capability — the coupling structure between them is. Two tightly coupled sensors outperform ten independent ones because the attacker's tradeoff space collapses. The Jacobian rank is the substrate; the sensor count is the symptom.
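The rank claim can be illustrated with an assumed linear model (the names and setup here are hypothetical, not from any particular system): an attacker injects a perturbation u, sensor residuals respond as r = J·u, and attacks with J·u = 0 are undetectable. The undetectable subspace has dimension k − rank(J), so capability is set by rank, not row count.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 8  # attacker degrees of freedom

# Ten "independent" sensors that all sense the same physical direction:
# every Jacobian row is a multiple of one vector, so rank is 1 no matter
# how many sensors you buy.
shared = rng.normal(size=k)
J_ten_redundant = np.outer(rng.normal(size=10), shared)

# Two sensors placed to respond to two distinct aspects of the state:
# two generic rows give rank 2.
J_two_coupled = rng.normal(size=(2, k))

rank_ten = np.linalg.matrix_rank(J_ten_redundant)
rank_two = np.linalg.matrix_rank(J_two_coupled)

print("10 redundant sensors: rank", rank_ten,
      "-> undetectable attack dims:", k - rank_ten)   # rank 1 -> 7
print(" 2 coupled sensors:   rank", rank_two,
      "-> undetectable attack dims:", k - rank_two)   # rank 2 -> 6
```

Two well-placed sensors leave the attacker a smaller free subspace than ten redundant ones — counting rows of J measures the symptom; its rank is the substrate.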
Clinical screening. Clinicians read symptom scores. The scores are outputs of an intake process. If the intake process has systematic gaps — questions it doesn't ask, patterns it can't surface — then the scores are artifacts of the instrument, not measurements of the patient. The substrate is the data collection structure. The diagnosis emerges from that structure.
Marr's three levels (1982) — computational, algorithmic, implementational — are the closest canonical framework. But cognitive science took the wrong lesson: it treated implementation as abstractable. The interesting cases are exactly where you can't abstract it away — where implementation constrains computation in ways the computational-level analysis misses. Substrate analysis is Marr's implementational level taken seriously.

The field defaults to treating trained models as static objects — run benchmarks, report scores. The substrate move says the model is a fossil record of its training dynamics. The optimizer trajectory, loss landscape geometry, gradient statistics — that's where the explanatory work lives. You don't understand the model by probing what it can do. You understand it by watching how it got there.

What this costs you
Going deeper isn't free. Every level down is more technical, harder to communicate, and further from what people actually care about. Tell someone grokking is "mostly optimizer momentum" and you've dissolved the mystery — but you've also made it boring, which means nobody wants to fund it. Tell a clinician their scores are instrument artifacts and you've identified the real problem — but you've also undermined the tool they use eight hours a day.
The substrate move has a built-in tension: the explanation that's most correct is often the least actionable at the level where decisions get made. Knowing that the Jacobian rank matters more than sensor count doesn't help the person writing the procurement spec. Sometimes the wrong level of analysis is the right level of intervention.
And there's a depth problem. You can always go one more level down. Optimizer momentum explains the grokking delay — but what explains the momentum dynamics? At some point you're doing physics instead of machine learning. The skill is knowing when you've hit the level where the causal work lives versus when you're just descending for sport.
The failure mode is becoming the person who responds to every question with "but what's underneath that?" Substrate analysis is a tool, not a worldview. Some phenomena are genuinely well-described at the level people are already analyzing them. The move is knowing when the current level of description has run out of explanatory power — not assuming it always has.