On the Epistemic Status of Generative Outputs

I keep coming back to the same question whenever I work with large language models: what, exactly, is the epistemic status of what they produce? Not whether it is useful or convincing, but whether it should be understood as knowledge in any meaningful philosophical sense. This question matters because we increasingly treat generative outputs as if they participate in the same epistemic economy as human claims. They are cited, trusted, debated, and sometimes acted upon. If we don’t clarify what kind of thing an LLM output actually is, we risk confusing fluency with truth.

At a baseline, I think we need a conditional notion of objectivity for generative systems. An output from an LLM can be considered objectively true if and only if it satisfies the linguistic, factual, and societally accepted standards of truth within a particular domain. The condition is strict, and intentionally so. The truth does not originate in the model; it is inherited from an external epistemic framework that predates and constrains the model’s training.
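To make the shape of that condition explicit, here is a minimal sketch in Python. The DomainVerdict type and its three flags are my own illustrative stand-ins for whatever checks a given domain actually uses; the only point is that the condition is a conjunction of standards evaluated entirely outside the model.

```python
from dataclasses import dataclass

@dataclass
class DomainVerdict:
    # Each flag records the outcome of a check performed *outside* the model:
    # linguistic coherence, agreement with the domain's validated facts, and
    # acceptance by the relevant community.
    linguistically_well_formed: bool
    matches_established_facts: bool
    accepted_by_community: bool

def objectively_true(verdict: DomainVerdict) -> bool:
    # The "if and only if" condition: an output counts as objectively true
    # exactly when all three external standards hold. The truth-maker lives
    # in the verdict, not in the model that produced the output.
    return (
        verdict.linguistically_well_formed
        and verdict.matches_established_facts
        and verdict.accepted_by_community
    )

# The water example in the next paragraph would come back all-True from the
# external checks, so it qualifies; a fluent but unverified novelty would not.
print(objectively_true(DomainVerdict(True, True, True)))   # True
print(objectively_true(DomainVerdict(True, True, False)))  # False
```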

Consider a simple example. If an LLM states that water is composed of two hydrogen atoms and one oxygen atom, we are comfortable labeling that statement objectively true. But the truth of the statement has nothing to do with the internal mechanics of the model. It is true because it coheres with a well-established scientific ontology, one that has been experimentally validated, pedagogically transmitted, and socially integrated. The model is not discovering this fact; it is reproducing a consensus that already exists. In this sense, the model functions as a high-dimensional retrieval and rearticulation mechanism, not as an epistemic agent.

The picture changes immediately when we move from reproduction to novelty. When an LLM generates a new hypothesis, a novel explanation, or an unfamiliar theoretical structure, the epistemic status of that output is fundamentally different. Such an output is not objectively true by default, regardless of how plausible or well-formed it appears. It becomes truth-apt only if it is subjected to external validation—evaluation by domain experts, empirical testing where applicable, and eventual legitimation by the relevant intellectual community. Until that process occurs, the output remains a linguistic artifact that resembles knowledge without yet being verified as such.

This distinction leads to an important but often blurred conclusion: large language models are capable of producing objective truths, but they are not capable of originating them on their own. The truths they generate are parasitic on prior human epistemic labor. This is their fundamental training and inferential constraint. Treating LLMs as independent discoverers of truth confuses the surface form of knowledge with the conditions under which knowledge is produced.

At this point, the discussion naturally turns to probability. Every token an LLM outputs is drawn from a probability distribution over possible continuations, shaped by the model’s training and conditioned on its context window; common decoding strategies then either take the single most probable continuation or sample from among the likeliest ones. This raises a deeper philosophical question: if an output is probabilistically optimal in linguistic space, does that confer any claim to objective truth? In other words, is the model’s notion of “likelihood” aligned with the human notion of “truth”?
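To make the mechanics concrete, here is a minimal, self-contained sketch in Python with NumPy. The four-word vocabulary and the logit values are invented for illustration; real models score tens of thousands of tokens conditioned on the full context window, but the outline is the same: raw scores become a probability distribution, and decoding either takes the most probable continuation or samples from the distribution.

```python
import numpy as np

# Toy next-token selection. The vocabulary and logits are invented; a real
# model scores a vocabulary of tens of thousands of tokens, conditioned on
# the entire context window.
vocab = ["oxygen", "hydrogen", "carbon", "helium"]
logits = np.array([2.1, 0.3, -0.5, -1.2])  # hypothetical scores for the next token

# Softmax turns raw scores into a probability distribution over continuations.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Greedy decoding takes the single most probable continuation ...
greedy_choice = vocab[int(np.argmax(probs))]

# ... while sampling draws from the distribution, so less probable
# continuations still appear some of the time.
rng = np.random.default_rng(seed=0)
sampled_choice = vocab[rng.choice(len(vocab), p=probs)]

# Note what is absent: nothing here consults the world. The probabilities
# encode patterns of text, which is exactly the gap the next paragraphs discuss.
print(dict(zip(vocab, probs.round(3))))
print("greedy:", greedy_choice, "| sampled:", sampled_choice)
```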

I think the answer is no, not in any robust sense. LLMs are grounded in linguistic reality, not ontological reality. Language is not a transparent window onto the world. It is a representational system shaped by convention, necessary information compression, and historical relevance. It allows humans to coordinate, reason, and infer, but it does not exhaust the structure of the things it describes. Linguistic coherence is therefore a necessary but insufficient condition for truth.

Because of this, LLM outputs do not directly track ontological truth. They approximate it by reproducing the statistical contours of how humans talk about the world. When an LLM appears to “understand” something, what it is actually doing is navigating a space of linguistic regularities that correlate—sometimes very strongly—with human knowledge. The correlation can be extremely useful, but it is still a correlation. The model does not have access to the substrate that gives those regularities their truth conditions.
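One way to see the correlation claim directly is to ask a small public language model to score two equally fluent sentences, one true and one false. The sketch below assumes the Hugging Face transformers and PyTorch packages and the publicly available gpt2 checkpoint; the exact numbers will vary with the model, and the example sentences are mine.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_prob(text: str) -> float:
    # Score a sentence by its average per-token log-likelihood under the model.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood of the predicted tokens.
    return -out.loss.item()

claims = [
    "Water is composed of two hydrogen atoms and one oxygen atom.",
    "Water is composed of two nitrogen atoms and one oxygen atom.",
]
for claim in claims:
    print(f"{avg_log_prob(claim):8.3f}  {claim}")

# Both sentences are grammatical and on-topic; whatever scores come back, the
# model is measuring how text-like each one is, not consulting chemistry.
```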

This limitation becomes especially clear when we talk about discovery. LLM-based systems can absolutely contribute to applied research. They can surface connections, generate candidate ideas, and assist humans in exploring large conceptual spaces more efficiently. But discovering a new ontological truth—introducing a genuinely new mathematical object, proving a theorem that reorganizes an existing field, or formulating a theory that reshapes our understanding of some domain—requires grounding beyond language alone. When such discoveries occur with LLMs in the loop, they are better understood as cases of human-guided exploration aided by stochastic recombination, not as autonomous epistemic achievements by the model. I like the example of Terence Tao using GitHub Copilot to help formalize a proof. He laid the groundwork and the LLM sped up the follow-through.

In that sense, the generation of new ontology by an LLM is largely a matter of chance filtered through human judgment. The model can propose. Humans must dispose. Recognizing this boundary is not an attempt to diminish the power of generative systems. On the contrary, it makes their role clearer. LLMs are extraordinarily capable linguistic instruments, but they do not bear the burden of truth on their own. That burden remains with the human communities that define, test, and sustain what counts as knowledge in the first place.