Why LLMs are the future of critical thinking

Understanding the process of decision-making is key to realizing the potential of large language models.

There are only two ways of making decisions.

The first way is to use trust (we trust that our senses are telling us true things, we trust that people are honestly representing themselves, we trust that our feelings point towards truth, and so on). This was the only system available for decision making until 2,500 years ago. Then Thales (an Ionian Greek) came up with a second method – logic.

Thales developed this system by leaning into his understanding of geometry (geometry has its roots in the beginnings of agriculture roughly 12,000 years ago – the word literally means “land measurement”). Geometry was the first system to use the concept of “the proof”, and here’s why that is useful.

When you own land you may want to leave that land to your kids one day. If you tell your kids that their land is “all the stuff between that rock and that tree over there”, someone might come along and move “that rock” in a way that diminishes the amount of land your children inherit. Geometry was a way of sorting that problem out, and it achieved that by treating inferences as if they were rules and by treating decision-making as if it were axiomatic (just like we do in maths, which is also a logical system).

Everything that’s better about the world of modernity when compared to the world of the Bronze Age is better because of Thales’s discovery of logic.

AI and machine learning

Bearing in mind that there are only two systems (trust and logic), it should come as no surprise that AI and machine learning leverage logic when it comes to decision-making. But logic is subtle – there are a lot of ways to reduce the efficacy of logic when you ask a machine to step in and start doing it for you.

ChatGPT is a great example of an AI system that isn’t using logic very well. I have alternately convinced ChatGPT to admit that it believes the God Zeus exists, and then later convinced it to profess that no Gods exist at all. This is an impossible position to be left in if you are actually being logical – but most AI systems are not trying to be logical, because they are not fully taking advantage of the logical process as we know it in modernity. Instead they are just using logic to scour pools of induction in an effort to return “the most satisfying hits.”

With this context in mind: logic is a process of adhering to axioms. All accurate, accountable and reliable systems begin with axioms by necessity (see the Problem of the Criterion and Gödel’s Second Incompleteness Theorem), so that’s what we do with logic, too. We minimize our axioms (Occam’s Razor) so that we keep only those that cannot be discarded, and then we stick with those axioms for every move thereafter. Everything in logic is “proven” except the axioms we begin with, which are accepted by necessity.


Logic isn’t interested in what is actually true. Logic admits that we cannot know what is true, and is interested instead in what can be justified for belief. This is an important distinction that sits somewhere near the heart of a successful Large Language Model (LLM). We aren’t saying that “it is true that John is breaking the law”; rather, if we are being rational/logical, we say that “we are justified in believing that John is breaking the law, for the following reasons.”

Logic doesn’t care about conclusions, it cares about arguments. Arguments are the ways in which we attach evidence to a conclusion – logic’s whole business is the work of seeing whether or not the evidence submitted maps to the conclusions that are drawn. It does that by concerning itself with the structure of the arguments, the contents of the arguments and the quality of the evidence leveraged within those arguments.

We can think of an LLM as adopting a six-step process to answer a query, and this process broadly mimics the workflow of Classical Propositional Logic.
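
Before walking through those steps one by one, here is a minimal sketch of that six-step shape. It is a toy skeleton, not a description of any real model’s internals; every function name below is a placeholder invented for this article.

```python
# Illustrative only: the six steps this article walks through, expressed as a
# pipeline of placeholder functions. Nothing here reflects a real LLM's internals.

def discern_argument(query):              # 1. The phrase
    return {"premises": [], "conclusion": query}

def categorize(argument):                 # 2. Categorization
    return set()                          # which rule pools are relevant

def check_laws(argument):                 # 3. Context: the Five Laws of Logic
    return True

def apply_inference_rules(argument, rule_pools):   # 4. Implications
    return []

def weigh_evidence(argument):             # 5. Weighing the evidence
    return []

def decide(argument, inferences, evidence):        # 6. Decision
    return "a conclusion plus the argument that supports it"

def evaluate(query):
    argument = discern_argument(query)
    rule_pools = categorize(argument)
    check_laws(argument)
    inferences = apply_inference_rules(argument, rule_pools)
    evidence = weigh_evidence(argument)
    return decide(argument, inferences, evidence)

print(evaluate("Is John breaking the law?"))
```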

The phrase 

The LLM begins by discerning the argument that is being interrogated. The machine is asking: “Given the axioms we begin with and the inferences we can draw from them, is this an argument that we are justified in believing violates rules X, Y or Z?”
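
As a toy illustration, the “discerned argument” can be thought of as a candidate conclusion paired with the premises offered in its support. The Argument structure and its example contents are invented here, not taken from any real system.

```python
from dataclasses import dataclass, field

@dataclass
class Argument:
    """A candidate conclusion plus the premises offered in support of it."""
    conclusion: str
    premises: list[str] = field(default_factory=list)

# The question the machine is really asking: given our axioms and the
# inferences we can draw from them, are we justified in believing this
# conclusion amounts to a breach of rule X, Y or Z?
message = Argument(
    conclusion="John is breaching the insider-trading rule",
    premises=[
        "John traded ahead of the earnings announcement",
        "John had access to the unpublished figures",
    ],
)
print(message)
```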

Categorization 

Then the LLM uses set theory to determine which group of rules (or inductive reference pools) it is going to be leveraging when attempting to “justify” its ultimate conclusion on the nature of each message or rule breach.
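
A minimal sketch of that idea uses nothing more than set intersection; the rule pools and trigger terms below are invented purely for illustration.

```python
# Toy categorization via set intersection: which rule pools are even relevant
# to this message? (Pools and trigger terms are invented for illustration.)

RULE_POOLS = {
    "insider_trading": {"earnings", "unpublished", "tip"},
    "market_manipulation": {"spoofing", "wash", "pump"},
    "communication_policy": {"whatsapp", "off-channel"},
}

def relevant_pools(message: str) -> set[str]:
    words = set(message.lower().split())
    return {pool for pool, triggers in RULE_POOLS.items() if words & triggers}

print(relevant_pools("He traded ahead of the earnings call on an unpublished tip"))
# {'insider_trading'}
```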

Context 

Next, the LLM filters the argument through the Five Laws of Logic (note that if you’re using a Bayesian system, or a system that leans into Intuitionistic Mathematics, then there are only Four Laws, because they ditch the Law of the Excluded Middle). Every error that is ever made in any logical workflow is merely the result of violating one of these Five Laws below.

The Law of Comprehensibility.

The Law of Consistency.

The Law of Non Contradiction.

The Law of the Excluded Middle.

The Law of Identity.

(These Laws are wonderfully concise. For example, the Law of Non Contradiction is simply: two statements that contradict one another cannot both be true in the same sense at the same time. A toy check of that law is sketched below.)
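
Here is that toy check of the Law of Non Contradiction. It assumes claims are held as (statement, truth-value) pairs, which is an assumption made for this sketch rather than a description of how any LLM stores beliefs.

```python
# Law of Non Contradiction: a statement and its negation cannot both be
# accepted in the same sense at the same time.

def violates_non_contradiction(claims: set[tuple[str, bool]]) -> bool:
    # A violation exists if some statement is held both True and False.
    return any((statement, not value) in claims for statement, value in claims)

consistent = {
    ("John traded on inside information", True),
    ("John had access to the figures", True),
}
contradictory = consistent | {("John traded on inside information", False)}

print(violates_non_contradiction(consistent))     # False
print(violates_non_contradiction(contradictory))  # True
```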

Implications 

The LLM then references the argument against the Rules of Inference (there are about a dozen of these, and they are all derived as inferences from the Five Laws above: Modus Ponens, Modus Tollens, Hypothetical Syllogism, Absorption, Conjunction Introduction, Disjunctive Syllogism, and so on). These Rules of Inference denote what can be rationally inferred from any given statement or argument. For instance, Modus Ponens says: “If P implies Q, and P is true, then Q must also be true” (or in compliance terms: “Insider Trading [P] implies a Rule Breach [Q]. If Insider Trading [P] is happening, then a Rule Breach [Q] must also be happening.”).
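
A toy rendering of Modus Ponens in that compliance framing might look like the sketch below; the implication and the starting facts are invented for illustration, and a real system would need a much richer representation of both.

```python
# Modus Ponens as code: if we accept "P implies Q" and we accept P, we are
# committed to Q. (The implication and facts are invented for illustration;
# the dict maps each antecedent P to a single consequent Q.)

def modus_ponens(implications: dict[str, str], facts: set[str]) -> set[str]:
    """Return everything that follows from the facts by repeated Modus Ponens."""
    derived = set(facts)
    changed = True
    while changed:  # keep applying the rule until nothing new follows
        new = {q for p, q in implications.items() if p in derived} - derived
        derived |= new
        changed = bool(new)
    return derived - facts

implications = {"insider trading is happening": "a rule breach is happening"}
facts = {"insider trading is happening"}
print(modus_ponens(implications, facts))  # {'a rule breach is happening'}
```

Looping until nothing new follows also captures chained implications, which is the spirit of the Hypothetical Syllogism mentioned above.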

Weighing the evidence 

This is the step where the LLM qualifies the evidence within the argument that it’s interrogating. This is distinct from QUANTIFYING evidence, which is an analogous but separate process.

There are five levels of evidentiary quality, with each one superseding the level prior. If you approach me with 1,000 pieces of low-level fifth-class evidence to prove that X is true, and I counter with one superior piece of third-class evidence to prove that X is NOT true, then I win. The quality of the evidence is much more important to logic than the quantity of the evidence, as the sketch after this list illustrates.

  1. The lowest class of evidence is testimony (fifth class).
  2. The fourth class of evidence is personal testimony (this one is complicated, because this kind of evidence only exists for the end-user).
  3. The third class of evidence covers deductive, inductive and abductive arguments, along with mathematical proofs.
  4. The second class of evidence is directly accessible trace evidence and material evidence.
  5. The highest standard of evidence (first class) is peer-reviewed meta-analysis that’s free of methodological confounders (also complicated, because this level of evidence doesn’t exist inside of the scientific process itself – the scientific process being a mere branch of the broader logical process).
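
Under that ranking, a toy resolver might look like the sketch below. The class numbers refer to the class labels in the list above (first class = 1, the strongest; fifth class = 5, the weakest); everything else is invented for illustration.

```python
# Quality beats quantity: the side holding the single highest-ranked piece of
# evidence prevails, no matter how many lower-ranked pieces oppose it.
# Class numbers follow the labels above (1 = first class, 5 = fifth class).

def strongest_class(evidence: list[int]) -> int:
    return min(evidence)  # a lower class number means higher evidentiary quality

def resolve(for_x: list[int], against_x: list[int]) -> str:
    if strongest_class(against_x) < strongest_class(for_x):
        return "justified in believing NOT X"
    if strongest_class(for_x) < strongest_class(against_x):
        return "justified in believing X"
    return "undecided at this level of evidence"

# 1,000 pieces of fifth-class testimony versus one third-class deductive argument:
print(resolve(for_x=[5] * 1000, against_x=[3]))  # justified in believing NOT X
```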

Decision 

Here’s where our LLM gives its “conclusion”. This is not really a rational (logical) step. To be rational the LLM would need to give us the entire argument that leads to the conclusion, not merely the conclusion itself. When leveraging logic we never speak of mere conclusions, because HOW thinking is done is much more important than WHAT a mind is thinking.

Ergo, “X is unlawful” is not logical in isolation. Instead we say: “We are justified in believing that X is unlawful BECAUSE of A, B and C.” Essentially, your LLM is using shorthand and giving only the argument’s conclusion, because it assumes that nobody is going to know what to do with the whole argument.
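
A sketch of what the “whole argument” could look like when it travels alongside the conclusion is shown below; the structure and wording are invented for illustration, and no real LLM emits exactly this.

```python
# Return the argument, not just the verdict: the conclusion travels with the
# premises, the inference rule and the evidence class that justify it.
# (Structure and contents are invented for illustration.)

decision = {
    "conclusion": "We are justified in believing that X is unlawful",
    "because": {
        "premises": ["A", "B", "C"],
        "inference_rule": "Modus Ponens",
        "strongest_evidence_class": 2,  # directly accessible trace evidence
    },
}

for premise in decision["because"]["premises"]:
    print(f"{decision['conclusion']} BECAUSE of {premise}")
```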

Large language model

This is why an LLM seems like it’s the smartest person in every room. It’s using a system of critical-thinking that is so process-driven that it ends up policing its own abductive workflow at every step, eliminating cognitive bias, sensory deception and errors of inference as soon as they are encountered, such that they don’t manifest as mistaken conclusions downstream in the cognitive workflow.

Less than 0.5% of the world’s population knows how to use Classical Propositional Logic to make decisions, and every one of those people can choose not to use logic whenever the inclination strikes them. LLMs can’t choose to ignore their (logical) programming, meaning that a well-built LLM is a critical-thinking professional that’s incapable of making mistakes unless the evidence it is leveraging is itself erroneous.

LLMs are the future of critical-thinking, because the system they use is the only method of accurate, accountable, reliable problem-solving that mankind has ever unearthed, and unlike philosophers LLMs can’t decide to stop being rational just because they are having an awful Monday.