• 0 Posts
  • 6 Comments
Joined 1 year ago
Cake day: January 18th, 2025


  • LLMs do not reason in the human sense of maintaining internal truth states or causal chains, sure. They predict continuations of text, not proofs of thought. But that does not make the process ‘fake’. Through scale and training, they learn statistical patterns that encode the structure of reasoning itself, and when prompted to show their work they often reconstruct chains that reflect genuine intermediate computation rather than simple imitation.
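
    As a rough illustration, here is a minimal sketch of that autoregressive loop, assuming a hypothetical `next_token_probs` function standing in for the trained model:

    ```python
    # Minimal sketch of autoregressive generation (illustrative only).
    # `next_token_probs` is a hypothetical stand-in for a trained LLM:
    # it maps the tokens so far to a probability for each candidate next token.
    from typing import Callable

    def generate(prompt: list[str],
                 next_token_probs: Callable[[list[str]], dict[str, float]],
                 max_new_tokens: int = 50,
                 stop: str = "<eos>") -> list[str]:
        tokens = list(prompt)
        for _ in range(max_new_tokens):
            probs = next_token_probs(tokens)   # P(next token | everything so far)
            token = max(probs, key=probs.get)  # greedy: most plausible continuation
            if token == stop:
                break
            tokens.append(token)  # the new token simply becomes more context
        return tokens
    ```

    Each prediction conditions only on the visible text; there is no separate internal state being carried between steps.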

    Stating that some errors appear isolated is fair, but the conclusion drawn from it is not. Human reasoning also produces slips that fail to propagate because we rebuild coherence as we go. LLMs behave in a similar way at a linguistic level. They have no persistent beliefs to corrupt, so an error can vanish at the next token rather than spread. The absence of error propagation does not prove the absence of reasoning. It shows that reasoning in these systems is reconstructed on the fly rather than carried as a durable mental state.

    Calling it marketing misses what matters. LLMs generate text that functions as a working simulation of reasoning, and that simulation produces valid inferences across a broad range of problems. It is not human thought, but it is not empty performance either. It is a different substrate for reasoning: emergent, statistical, and language-based, and it can still yield coherent, goal-directed outcomes.


  • You’re assuming that transformation only counts when it yields visible scientific breakthroughs. That overlooks how many technologies reshape economies by compressing time, labor, and coordination across everyday work. When a tool removes friction from millions of small interactions, its cumulative effect can be structural even if each individual use feels modest, much like spreadsheets, search engines, or email once did.

    The distinction between predictive systems and LLMs is broadly right, but in practice the boundary is porous. Most high-impact AI systems still rely on classical predictive models, optimization methods, and domain-specific algorithms, while LLMs increasingly act as a control and translation layer. They map ambiguous human intent into structured actions, route tasks across tools, and integrate heterogeneous systems that previously required expert interfaces. This does not make LLMs the source of breakthroughs, but it does make them central to how breakthroughs scale, combine, and reach non-experts.
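
    A minimal sketch of that control-layer pattern, assuming a hypothetical `call_llm` function; the two tools are toy stand-ins for deterministic backends with hard guarantees:

    ```python
    # Sketch of the "LLM as control and translation layer" pattern.
    import json

    TOOLS = {
        "unit_convert": lambda args: args["value"] * 2.54,  # toy: inches -> cm
        "lookup_order": lambda args: {"order": args["id"], "status": "shipped"},
    }

    def route(user_request: str, call_llm) -> object:
        # Ask the model for a structured action, not free-form prose.
        raw = call_llm(
            'Map this request to JSON {"tool": ..., "args": ...} '
            f"using one of {list(TOOLS)}:\n{user_request}"
        )
        action = json.loads(raw)                      # structured intent
        return TOOLS[action["tool"]](action["args"])  # deterministic execution
    ```

    The model never computes the answer itself; it only translates ambiguous intent into a structured call that classical systems execute.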

    The reasoning critique is stronger when framed around control and guarantees rather than capability. LLMs do generalize to new problems, so their limitation is not simple memorization. Their reasoning emerges from next-token prediction, not from an explicit objective tied to truth, proof, or logical consistency. This architecture optimizes for plausibility and coherence, sometimes producing fluent but unfounded claims. The problem is not that LLMs reason poorly, but that they reason without dependable constraints.
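
    A toy illustration of that objective, with made-up probabilities: the per-step loss is cross-entropy on the observed token, so truth never appears in the optimization target:

    ```python
    # Toy illustration of the next-token training objective.
    import math

    def next_token_loss(probs: dict[str, float], observed: str) -> float:
        return -math.log(probs[observed])  # -log P(observed token | context)

    probs = {"Paris": 0.4, "Lyon": 0.4, "banana": 0.2}
    print(next_token_loss(probs, "Paris"))  # ~0.92
    print(next_token_loss(probs, "Lyon"))   # identical loss, right or wrong
    ```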

    The hallucination problem can be substantially reduced, but within a single LLM it cannot be eliminated. That limit, however, applies to models, not necessarily to systems. Multi-model and hybrid architectures already point toward ways of approaching near-perfect reliability. Retrieval and grounding modules can verify claims against live data, tool use can offload factual and computational tasks to systems with hard guarantees, and ensembles of models can cross-check, critique, and converge on shared answers. In such configurations, the LLM serves as a reasoning interface while external components enforce truth and precision. The remaining difficulty lies in coordination: ensuring that every step, claim, and interpretation remains tied to verifiable evidence. Even then, edge cases, underspecified prompts, or novel domains can reintroduce small error rates. But in principle, hallucination can be driven to vanishingly low levels when language models are treated as parts of truth-preserving systems rather than isolated generators.
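
    A minimal sketch of such a truth-preserving wrapper, where `draft`, `extract_claims`, and `retrieve` are hypothetical stand-ins for a model call, a claim extractor, and a grounded data source:

    ```python
    # Minimal sketch of a grounding loop around a generator.
    def answer_with_grounding(question, draft, extract_claims, retrieve,
                              max_rounds=3):
        for _ in range(max_rounds):
            candidate = draft(question)
            unsupported = [c for c in extract_claims(candidate)
                           if not retrieve(c)]  # check claims against live data
            if not unsupported:
                return candidate               # every claim is evidence-backed
            # Feed the failures back so the next draft can revise them.
            question += f"\nRevise; unsupported claims: {unsupported}"
        return None  # abstain rather than ship a fluent but unfounded answer
    ```

    Abstaining when verification fails is the design choice that pushes residual error toward the retrieval layer, which can offer hard guarantees, instead of the generator, which cannot.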

    The compute and energy debate is directionally sensible but unsettled. It assumes progress through brute-force scaling toward brain-like complexity, yet history shows that architectural shifts, hybridization, and efficiency gains often reset apparent limits. Real constraints are likely, but their location and severity remain uncertain.

    Where your argument is strongest is on incentives. The current investment cycle undoubtedly rewards short-term monetisation and narrative dominance over long-term scientific and infrastructural progress. This dynamic can crowd out foundational research in safety, evaluation, and interpretability. Yet, as in past bubbles, the aftermath tends to leave behind useful assets (tools, datasets, compute capacity, and talent) that more serious work can build upon once the hype cools.


  • Weren't the reasoning models themselves the breakthrough in the ability to reason and understand?

    AI has solved 50-year-old grand challenges in biology. AlphaFold has predicted the structures of nearly all known proteins, a feat of “understanding” molecular geometry that will accelerate drug discovery by decades.

    We aren’t just seeing a “faster horse” in communication; we are seeing the birth of General Purpose Technologies that can perform cognitive labor. Stagnation is unlikely because, unlike the internet (which moved information), AI is beginning to generate solutions.

    1. Protein folding solved at near-experimental accuracy (AlphaFold), breaking a 50-year bottleneck in biology and turning structure prediction into a largely solved problem at scale.

    2. Prediction and public release of structures for nearly all known proteins via the AlphaFold Protein Structure Database, covering the entire catalogued proteome rather than a narrow benchmark set.

    3. Proteome-wide prediction of missense mutation effects (AlphaMissense), enabling large-scale disease-variant interpretation that was previously infeasible by human analysis alone.

    4. Weather forecasting models (GraphCast) that outperform leading physics-based systems on many accuracy metrics while running orders of magnitude faster.

    5. Probabilistic weather forecasting (GenCast) that exceeds the skill of top operational ensemble models, improving uncertainty estimation, not just point forecasts.

    6. Formal mathematical proof generation at Olympiad-level difficulty (AlphaProof and AlphaGeometry), producing verifiable proofs rather than heuristic or approximate solutions.

    7. Discovery of new low-level algorithms (AlphaDev), including faster sorting routines that were good enough to be merged into LLVM's libc++ standard library.

    8. Discovery of improved matrix multiplication algorithms (AlphaTensor), advancing a problem where progress had been extremely slow for decades.

    9. Superhuman long-horizon strategic planning in Go (AlphaGo), a domain where brute-force search is infeasible and abstraction is required.

    10. Identification of novel antibiotic candidates such as halicin by searching chemical spaces far beyond what human-led methods can feasibly explore.