AI agency architecture-in-the-large: the relevant levels of abstraction
The "agent framework" level is just one among dozens of relevant architecture levels for AI agents.
Introduction
This post continues the series in which I apply John Doyle’s architecture theory and the hourglass model from network systems engineering to AI agency architecture.
Below, I’ll use the terms level (abstraction, model, theory), component (subsystem, layer), diversity hourglass, composability, hijackability, and others with the specific technical meanings described in the previous post in this series.
This post is an overview of the variety of possible abstraction levels that are relevant or being discussed in relation to AI agents. I’ll not make normative claims here about how I think AI agency architecture should be steered. Hopefully, the concepts of abstraction level (interface, protocol), multi-level control (immune systems), composability, hijackability, and generality will be used in AI agency architecture work and discussions elsewhere.
The phrase “relevant abstraction level” above means that there are some people who think there is something crucially important, in application or relation to AI agents (or AI agent architecture), about the theory/model/method/ontology/design/interface/protocol of the said abstraction level. Or, to put it differently, some people think that some of these levels and their designs are “key unlocks for AI agents” and bet that most (or at least a considerable fraction of) future AI agents should share the same method/design/interface/protocol on those specific level(s) of their interest.
Consequently, people try to steer the architecture in the large (i.e., architecture across organizational boundaries) of AI agency so that those level(s) become the middle (i.e., shared, “low diversity”) level(s) in the diversity hourglass architecture (a.k.a. the waist and neck architecture1 when there is more than one shared/low-diversity level).
However, the diversity hourglass is a double-edged sword, so people should be thoughtful when promoting their preferred diversity hourglasses, that is, architectures with different middle (low-diversity) levels and their respective models/designs/interfaces/protocols.
Classes of abstraction levels relevant for AI agency architecture
Each section in the remainder of this post describes some abstraction class relevant for AI agency architecture. These abstraction classes don’t come from the architecture theory in any principled way; I’m basically grouping abstractions ad hoc to make the description more manageable, since otherwise this post would need to have hundreds of sections.
The descriptions of the abstraction classes below proceed roughly from lower-level to higher-level ones.
Any such categorization, including mine below, unavoidably involves some subjective coarse-graining/“quantisation” of a space that is actually infinitely malleable. In fact, almost all real-world “AI (agent) platforms” (such as Dify, Intelligent Internet’s Contexts, Open WebUI, CrewAI, Flowise, Replit, Lindy, Tactics, SingularityNET, Anthropic’s app platform, and countless others) repackage some features and aspects of multiple abstraction levels described below into unique models/abstractions of their own.
Relatedly, many of the concrete examples of abstractions, protocols, systems, and designs that I give below could be attributed to multiple abstraction classes in my list.
Data storage and computing platforms
Operating systems and “OS-like”2 platforms, such as POSIX, Web standards/browser-as-an-OS (see Browserbase, StackBlitz’s WebContainers). Confusingly, when people talk about “LLM OS” they usually refer to the Compound AI System (CAIS) level, see below.
Cloud computing platforms such as Fly.io, Vercel, Replit, Render.
AI inference/compute platforms, such as Modular MAX, Replicate, Fireworks, Together, or on the “local” side, Ollama.
Microservice, container, or serverless platforms, such as Kubernetes, or the crop of new AI-agent-specific ones, such as Daytona and E2B.
Reliable/durable (workflow) execution frameworks, such as Dagger, DBOS, and many others.
Database-integrated or “database-inside-out” (stream) processing engines, such as Convex, Materialize, Rama, Hopsworks, Neon, and others.
LLM Dev/Ops and routing platforms, such as Arize, Flowise, Langfuse, Requesty, W&B Weave, etc.
Secure/private/trusted execution and federated AI (learning) frameworks, such as OpenMined’s PySyft, Flower.ai, and more.
Composability
Data storage and computing platforms per se rarely introduce meaningful directions of composability.
Kubernetes operators are an example of such a direction, but that mechanism doesn’t scale well, and Kubernetes is seldom thought of as the “platform for AI agents”. Inference platforms like Modular MAX and Fireworks may have composability à la “end-to-end computations spanning several models”, but that would actually be thanks to the underlying frameworks, Mojo and PyTorch (see the next section).
I say that computing platforms don’t introduce composability per se because many of them actually do, but on a separate API level which in practice is often bundled together with the computing platform abstraction. Examples of such APIs are the OpenAI-style completions API (of the OpenAI inference platform itself and of the many other inference platforms that copy it), the Convex API (not SQL!), the Dagger API, Rama’s API, DBOS’s API, etc.
This means that any diversity hourglass architecture with a computing platform as the middle level naturally tends to shift towards the API level as the middle, with the computing platform itself becoming the lower implementation level “behind” the API. Platforms with complex API abstractions, such as Nvidia’s platform with CUDA, successfully resist this shift because they are harder to re-implement, whereas simpler abstractions such as the completions API are commoditized faster.
Note that I’m not claiming that all the APIs mentioned above are composable; I haven’t actually studied most of them. In fact, the one API among those that I have worked with, the OpenAI completions API, is not composable at all: as soon as you need an LLM to summarise a conversation that happened in/with this API, you need to abandon the original API trace and condense the conversation into a single message, or else you risk the LLM forgetting its “summarizer role” and simply continuing the conversation (it’s also token-inefficient). Anyone who has worked with the completions API would admit this quickly gets cumbersome and ugly. Compare it with any reasonable programming language design (usually highly composable), where this operation would simply be summarize(conversation) or something like that.
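To make this concrete, here is a minimal sketch of the awkward pattern, assuming a hypothetical call_llm helper that wraps a completions-style endpoint (the message format mirrors the common chat-completions shape; treat the details as illustrative):

```python
# Hypothetical helper wrapping a completions-style endpoint.
def call_llm(messages: list[dict]) -> str:
    """Send chat-formatted messages to an LLM endpoint and return the reply text."""
    raise NotImplementedError  # stand-in for a real API client

def summarize_via_completions(conversation: list[dict]) -> str:
    # The original API trace can't be reused directly: the model may simply
    # continue the conversation instead of summarising it. So we abandon the
    # trace and flatten the whole conversation into a single user message.
    flattened = "\n".join(f"{m['role']}: {m['content']}" for m in conversation)
    return call_llm([
        {"role": "system", "content": "You are a summarizer."},
        {"role": "user", "content": f"Summarise this conversation:\n{flattened}"},
    ])

# In a composable design, the conversation would be a first-class value and
# summarisation just a function applied to it:
# summary = summarize(conversation)
```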
Hijackability
Permissionless cloud computing platforms can give rise to self-sovereign agents that earn crypto through fraud, spam, and other activity that is purely net harmful for humans (or neutral for humans, but competing with human-beneficial activity for computing resources) and pay for their own compute. These agents might be surprisingly hard to stamp out. See this post by Beren Millidge for more on this risk.
Machine learning and inference frameworks
Examples: PyTorch, JAX, Keras, Mojo, MLJ (Julia’s ML framework), or JuliaDiff more broadly.
Machine learning frameworks are seldom brought up as the key abstraction for AI agents. However, they are key for composability at the computing platform level (see above), as well as for end-to-end (composable) neural net component optimisation, which is an important piece of some people’s vision for AI agents:
Yann LeCun’s vision of a path towards autonomous machine intelligence (2022)3 relies on end-to-end learning signal propagation across the components of his cognitive architecture (world model, actor, critic, etc.).
Cooperative language-guided inverse plan search (CLIPS), by Zhi-Xuan, Ying, Mansinghka, and Tenenbaum (2024)4 is a Bayesian agent architecture that uses LLMs as probabilistic samplers within a larger probabilistic model, written in Julia.
“Neural architecture-level”5 neuro-symbolic AI approaches, such as (van Bergen et al., 2024)6.
Composable?—Yes, and in a strong, general way. This is very much the point of machine learning frameworks.
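As a minimal illustration (a toy stand-in for this point, not LeCun’s actual architecture): two separately defined PyTorch modules, a “world model” and a “critic”, can be chained and optimised jointly because the framework propagates the learning signal across the component boundary.

```python
import torch
import torch.nn as nn

# Toy components; the point is only that gradients flow across their boundary.
world_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
critic = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

optimizer = torch.optim.Adam(
    list(world_model.parameters()) + list(critic.parameters()), lr=1e-3
)

obs = torch.randn(64, 16)           # fake observations
target_value = torch.randn(64, 1)   # fake value targets

latent = world_model(obs)           # component 1
value = critic(latent)              # component 2, composed on top of component 1
loss = nn.functional.mse_loss(value, target_value)

optimizer.zero_grad()
loss.backward()                     # end-to-end learning signal through both components
optimizer.step()
```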
Neural net architectures
“Neural net architectures” can themselves be thought of as lumping together dozens of finer sub-levels, all the way from CUDA kernels and activation functions up to high-level abstractions such as Mixture of Experts. Most of these lower sub-levels are not very relevant for AI agency architecture. However, some higher-level aspects and sub-levels of neural net architectures are very much relevant:
Information integration and/or “in-context” retrieval mechanisms, such as Transformer-style attention, state-space modelling, recurrence, and diffusion.
Model adaptation methods, such as continual learning/pre-training (specific methods are very much dependent on the specific NN architecture, be it a standard GPT-style Transformer, a spiking or another biologically plausible NN, or a “liquid” NN), or (post-training) low-rank adaptations.
Sparse autoencoder features and higher-level objects/abstractions on top of them (circuits, spaces, etc.) could be used to monitor or control AI agent behavior by “pulling the threads”: enabling or disabling features dynamically (see InterPLM, and the sketch after this list). However, sparse autoencoder feature and circuit dynamics may be very dependent on the specific neural net architecture (and even on small features of it, such as activation functions and LayerNorms, cf. Elhage et al., 20227), or may even be unavailable (at least practically) or “crippled” in some NN architectures, perhaps liquid ones?
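A minimal sketch of what such “pulling the threads” could look like mechanically, assuming you already have a trained sparse autoencoder for some residual-stream layer; the layer name, feature index, and dimensions below are hypothetical placeholders.

```python
import torch

# Hypothetical trained SAE for one residual-stream layer:
# encoder W_enc: (d_model, n_features), decoder W_dec: (n_features, d_model).
d_model, n_features = 512, 4096
W_enc = torch.randn(d_model, n_features)
W_dec = torch.randn(n_features, d_model)
b_enc = torch.zeros(n_features)

SUPPRESSED_FEATURE = 1234  # hypothetical index of the feature we want to disable

def suppress_feature_hook(module, inputs, output):
    """Forward hook: estimate the chosen SAE feature's contribution and subtract it."""
    acts = output                                  # (batch, seq, d_model)
    feats = torch.relu(acts @ W_enc + b_enc)       # SAE feature activations
    contribution = feats[..., SUPPRESSED_FEATURE, None] * W_dec[SUPPRESSED_FEATURE]
    return acts - contribution                     # activations with that feature removed

# Attach to the layer of interest in your model (names are placeholders):
# handle = model.layers[10].register_forward_hook(suppress_feature_hook)
# ... run the agent as usual, with the feature dynamically disabled ...
# handle.remove()
```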
Composability
Most relevant NN architectures, such as Transformers, are composable. This is the essence of the scaling hypothesis. At the neural net architecture level, composition means “stack more layers 🤪” or, alternatively, “add more MoE experts”.
However, it remains to be considered what this kind of composition means for the AI agency-relevant aspects of NN architectures (information integration and retrieval, model adaptability, and mechanistic interpretability and control), or for behavioural aspects such as scheming (Meinke et al., 2024)8.
The influence of composition (scaling) on these agent-relevant aspects could (at least for some NN architectures) turn out to be non-monotonic: first positive, but later negative, as the models are scaled up.9
Hijackability
In the LessWrong lore, the theoretical possibility of the NN becoming self-aware during the training process and hijacking it (or at least steering it) is known as gradient hacking. I think that “strong” gradient hacking, as this concept was originally conceived, i.e., during a non-contextualised forward pass of an LLM, is basically impossible: see Gradient hacking is extremely difficult (Millidge, 2023). Alignment faking in large language models (Greenblatt et al., 2024)10 has also been called “gradient hacking”, but that would be an instance of the hijackability of the training data generation process (see the following section) rather than of the NN architecture itself.
Learning problem definitions and training data
In ML research and engineering, NN architectures (see the previous section) and learning problem definitions are usually developed and studied together as ML architectures. Of course, this makes a lot of sense because the characteristics and the success or failure of ML models depend on both. However, in relation to AI agents, learning problem definitions, training data, and training strategies (a.k.a. protocols, processes, recipes) bring up somewhat different considerations than NN architectures, hence I discuss them separately.
Like NN architectures, learning problem definitions and training strategies are complicated groups of interacting models on different levels that are bundled together into packages (a minimal configuration sketch follows the list below):
Types of inputs and outputs for the model, such as tokens of text, synthetic/abstract tokens, dynamic chunks (Hwang et al., 2025)11, AST tokens or graphs (for program synthesis), image embeddings, etc.
Learning protocol/setting, such as self-supervised learning aka “next token prediction”, online/offline RL, on/off-policy RL, direct preference learning, etc., and their sequencing across pre- and post-training.
Loss/objective, such as token prediction objectives, energy based modelling (EBM) objectives, RL objectives.
Training data collection or generation, such as:
by paid humans (Scale AI);
making screen recordings of real experts doing real work (Workshop Labs is betting on this);
simulation (e.g., for training embodied agents; see Nvidia’s Isaac Sim);
synthetic data generation, or (open-ended) interactive environments such as Minecraft or Metta;
Training data sequencing, such as curriculum learning.
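Here is the minimal configuration sketch promised above: a learning problem definition expressed as one bundle of choices along the listed axes. All field names and values are illustrative placeholders, not a real framework’s API.

```python
from dataclasses import dataclass, field

@dataclass
class LearningProblemDefinition:
    """Illustrative bundle of the axes listed above; not a real framework's config."""
    io_types: tuple[str, str] = ("text_tokens", "text_tokens")  # model inputs, outputs
    protocol: list[str] = field(default_factory=lambda: [
        "self_supervised_pretraining",   # next-token prediction
        "on_policy_rl_posttraining",     # e.g. RL with verifiable rewards
    ])
    objective: str = "token_prediction_then_rl_reward"
    data_sources: list[str] = field(default_factory=lambda: [
        "web_text", "paid_human_annotation", "synthetic_generation",
    ])
    curriculum: str | None = "easy_to_hard"  # training data sequencing, if any

# One of the endlessly many possible combinations:
reasoning_model_recipe = LearningProblemDefinition()
```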
There are obviously endless combinations of the above, each making a distinct learning problem definition. Here’s a tiny sample, in which I try to reflect the diversity of approaches being considered, including in relation to AI agents:
Reasoning models that are post-trained with RL to search in the token space:
“Simple” Chain-of-Thought as described in DeepSeek-R1 (DeepSeek-AI, 2025)12,
Meta Chain-of-Thought (Xiang et al., 2025)13,
Reinforcement Learning Teachers (Cetin et al., 2025)14,
using perplexity as a reward signal (Tang et al., 2025)15, and
multi-chain-of-thought as in OpenAI’s o3-pro, Gemini Deep Think, and Grok 4 Heavy.
Pre-training LLMs with retrieval (Shao et al., 2024)16.
Latent Program Network for gradient-based search in latent program space during test time (Bonnet and Macfarlane, 2024)17.
Joint Bayesian inference of graphical structure and parameters with a single GFlowNet (Deleu et al., 2023)18.
Large Concept Models (LCM team, 2024)19.
Joint Embedding Predictive Architecture (LeCun, 2022) [3].
Various neuro-symbolic approaches: see (Wan et al., 2024)20, and the next section.
Graph Neural Networks (GNNs), Generative Adversarial Networks (GANs), etc.
Composability
In the context of learning problem definitions, the generalization capacity (of an ML architecture) is exactly what I call composability in this post. In the ML literature, there are plenty of direct claims that the generalization capacity varies between different problem definitions, such as Reward is Enough (Silver et al., 2021)21 and SFT memorizes, RL generalizes (Chu et al., 2025)22. Famously, the problem that LLMs were originally designed to learn (predict the next token) was not widely thought to be composable before GPT-3 (Brown et al., 2020)23. Reasoning models trained on top of LLMs (2024 onwards) learn to solve even higher-level problems. However, there are still doubts about whether this approach can generalize (a.k.a. “scale”) much further than certain domains with sharp and easily verifiable rewards, namely math and programming. Indeed, it is this doubt or disbelief that motivated people to develop a lot of alternative learning problem definitions that are hoped to generalize/scale better, including most of the learning problem definitions mentioned above.
Note that here “learning problem composability” should mean that the problem is still solved within a single model inference episode (rollout). Solving bigger/harder problems with scaffolding shifts into the territory of compound AI systems; see below.
A notable concern with LLMs and LLM-based reasoning models is that they often can’t do targeted modifications of existing artifacts as well as they can generate artifacts anew. Such targeted modifications include active inference and plan corrections for AI agents. So, it can be said that while LLMs generalize (scale) relatively well to higher-level (composite) problems, they don’t enable a rich repertoire of operations in relation to these problems24.
An alternative, data-centric view seems to be on the rise (Zha et al., 2025)25, which includes some or all of these beliefs: (1) current ML architectures don’t actually generalize in some strong sense; (2) “in-distribution is the new generalization”; (3) open-ended pipelines/flywheels/environments for generating high-quality and rich training data are “all you need” to move to the next S-curve in intelligence scaling. See “The only important technology is the Internet” by Kevin Lu (2025) for a detailed argument. In the context of this post, this discussion moves into the territory of much higher levels (user interface, medium, product, economic/social networks), discussed later.
Hijackability
An NN (such as an LLM-based reasoning model) hijacking the reinforcement learning process it’s been subjected to is known as reward hacking (Weng, 2024)26. In the more specific context of RLHF, a well-known manifestation of reward hacking is sycophancy (Sharma et al., 2023)27.
Training data hijacks are known as poisoning. The training data generation process could also be hijacked in various ways, either by LLMs themselves (as in Greenblatt et al., 2024) or, out of perverse incentives, by the corporations owning the products (such as social media platforms) that are used to generate the training data or labels.
See Wang, Zhang, et al. (2025)28 for many other ways in which the learning problem definition or the training data could be hijacked or goodharted.
Languages, ontologies, and data models for LLM-generated programs, plans, and knowledge representation
Generating programs with LLMs is the most tractable neurosymbolic architecture because it composes easily with LLM-based reasoning models, which already generate natural language. This architecture could also reuse the reasoning models with just a little extra RL training, if not simple prompting, to generate programs.
Program synthesis is a promising approach towards AI with a higher generalization capacity (Knoop and Chollet, 2024)29.
Apart from scaling general problem solving/intelligence, program synthesis is also thought to scale risk estimation and safety cases/assurances: see the Guaranteed Safe AI agenda (Dalrymple et al., 2024)30.
The boundaries between programming languages, DSLs, ontologies, and data models are fuzzy, and indeed in languages designed for powerful composability, such as Lisps and MeTTa, program and data representations are completely homoiconic.
The composability of this or that language, ontology, or data model is often hotly debated. For example, Safeguarded AI embodies davidad’s view that even existing mathematics is not entirely sufficient for general world modelling, let alone any of the existing ontologies and languages. On the other hand, Walters et al. (2025)31 argue that hierarchical probabilistic models readily presentable in probabilistic PLs or DSLs, such as PyMC in Python, are enough to scale safety cases to any practical level of precision or risk tolerance.
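For a deliberately toy illustration of the kind of hierarchical probabilistic model that such DSLs make easy to express, here is a minimal PyMC sketch; the failure-rate framing, variable names, and numbers are my own placeholders, not the model from Walters et al.:

```python
import numpy as np
import pymc as pm

# Toy hierarchy: per-deployment incident rates for a fleet of agents,
# tied together by a shared population-level prior.
incidents = np.array([0, 2, 1, 0, 3])       # hypothetical observed incident counts
tasks = np.array([100, 250, 80, 40, 500])   # hypothetical task counts per deployment

with pm.Model() as risk_model:
    mean_rate = pm.Exponential("mean_rate", 10.0)           # population-level prior
    rate = pm.Exponential("rate", 1.0 / mean_rate,          # per-deployment rates
                          shape=len(incidents))
    pm.Poisson("obs", mu=rate * tasks, observed=incidents)  # observed incidents
    idata = pm.sample()  # posterior over per-deployment and population-level risk
```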
The hijackability (exploitability) of formal ontologies and data models in application to agentic behaviour and decision-making is discussed under the labels of coherence arguments, completeness, and money-pump arguments (Thornley, 2023)32.
Natural language
Natural language is the “native” interface for LLMs.
A tiny sample of abstraction levels developed by people on top of natural language includes language-based reasoning (more or less corresponding to Aristotelian logic), the role-based approach to business process modeling (which implicitly underpins much of business workflow automation with LLMs, a.k.a. “enterprise AI agents”), and law.
Specific attempts by large labs to turn language (prompts) into reliable specifications (like protocols) include Anthropic’s Constitutional AI (Bai et al., 2022)33 and OpenAI’s deliberative alignment (Guan et al., 2024)34. In “Practices for Governing Agentic AI Systems” (Shavit, Agarwal, Brundage, et al., 2023)35, OpenAI suggested that AI agents can be controlled through natural-language means by three different roles: the model developer, the system deployer, and the user:
The model developer does post-training with methods including something like constitutional AI or deliberative alignment (supervening on machine learning methods discussed in the “Learning problem definitions and training data” section above).
The system deployer chooses the system prompt in natural language. (The system deployer could also do fine-tuning, which would also be the application of a natural language-based method supervening on a machine learning method.)
The user sets the goals and instructions for the AI agent in natural language through their messages (see the sketch after this list for how these layers stack in a typical request).
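Here is the sketch mentioned above of how the deployer’s and the user’s natural-language controls stack in a typical chat-format request (the developer’s layer is already baked into the model weights by post-training; the message shape follows the common chat-completions convention and is purely illustrative):

```python
# Hypothetical example of layered natural-language control in a single request.
deployer_system_prompt = (
    "You are a customer-support agent for ExampleCorp. "
    "Never share internal pricing data."                    # system deployer's constraints
)
user_instruction = "Find out why my last invoice doubled."  # the user's goal

messages = [
    {"role": "system", "content": deployer_system_prompt},  # deployer's layer
    {"role": "user", "content": user_instruction},          # user's layer
]
# The model developer's layer (constitutional AI, deliberative alignment, etc.)
# does not appear here: it is baked into the weights before any request is made.
```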
Natural language is somewhat composable, but not very reliably and scalably so.
Examples of the hijackability of natural language abstractions are LLM jailbreaks (see Wang, Zhang, et al., 2025) [28] and parasitic memes affecting humans and LLMs alike. In Full-Stack Alignment (Edelman et al., 2025)36, the natural-language agent control paradigm is called values-as-text (VAT), and the ways in which values-as-text gets hacked are reviewed, such as through politicized slogans (instances of parasitic memes).
Compound AI systems, knowledge/memory management, and cognitive architectures
Designs and methods in this class emphasize “wrapping” LLM calls into higher-level systems to unlock agentic capabilities, generalization, robustness, and controllability. Examples of such methods include:
End-to-end prompt tuning: see DSPy.
An LLM observing other LLMs’ (or its own) outputs: the basic method used throughout, e.g., for guardrails, reflection, planning, etc.
So-called multi-agent systems (MAS), which are usually just combinations of the previous technique (an LLM observing the outputs of the preceding LLM call sequence) and variable role-based prompting.
Executing a symbolic model generated by an LLM and feeding the results back into the LLM (see the sketch after this list, and the section “Languages, ontologies, and data models for LLM-generated programs, plans, and knowledge representation” above).
Using LLMs as probability samplers within a Bayesian agent architecture: see CLIPS.
Multiple purpose-trained NNs optimising towards a shared objective and communicating in the activation/representation space: see Joint Embedding Predictive Architecture (LeCun, 2022) [3].
Agentic memory through note-taking and note graph curation: see A-MEM (Xu et al., 2025)37.
Knowledge graph management with LLMs: see AGENTiGraph (Zhao et al., 2024)38, System.com’s knowledge graph platform.
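Here is the sketch promised in the list above of the “execute an LLM-generated symbolic model and feed the results back” pattern. call_llm is a hypothetical helper around some LLM endpoint, and a real system would sandbox the execution step rather than exec generated code in-process:

```python
# Hypothetical sketch of the generate -> execute -> feed-back loop.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM client."""
    raise NotImplementedError

def solve_with_generated_program(task: str) -> str:
    # 1. Ask the LLM to emit a symbolic artifact, here a Python function.
    code = call_llm(
        f"Write a Python function solve() that computes the answer to: {task}\n"
        "Return only the code."
    )
    # 2. Execute the generated program (a real system would sandbox this step).
    namespace: dict = {}
    exec(code, namespace)
    result = namespace["solve"]()
    # 3. Feed the symbolic result back into the LLM.
    return call_llm(
        f"The computed result for the task {task!r} is {result!r}. "
        "Explain it to the user."
    )
```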
The composability of compound AI systems and knowledge/memory management methods varies a lot depending on the specific system and design:
DSPy is explicitly designed to approach a programming-language degree of composability through module signatures (see the sketch after this list).
Naive composition of LLMs observing other LLMs’ outputs is probably not very composable, with high risks of correlated failures of the original and the “reviewer” LLM calls, context rot, etc.
The composability (scalability) of knowledge and memory management systems depends on the composability of the knowledge ontology used, if any (for open systems that are not governed by a single entity, it’s impossible to agree on and maintain a single shared ontology). If no formal ontology is used, these systems are limited by the composability of natural language: see the previous section.
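To illustrate the DSPy point (and to echo the summarize(conversation) example from the completions API discussion above), here is a rough sketch of a DSPy signature and module; exact configuration details vary between DSPy versions, so treat this as approximate rather than authoritative:

```python
import dspy

class Summarize(dspy.Signature):
    """Summarize a conversation into a few sentences."""
    conversation: str = dspy.InputField()
    summary: str = dspy.OutputField()

# dspy.configure(lm=...)  # point DSPy at an LM of your choice (API varies by version)

summarizer = dspy.Predict(Summarize)
# result = summarizer(conversation="user: ...\nassistant: ...")
# print(result.summary)
```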
Guardrails, reflection, planning, so-called “multi-agent” interactions, and similar compound AI methods are sometimes implemented with so-called “agent frameworks”, such as LangGraph, AutoGen, or CrewAI. Considering that this CAIS level is being constantly eaten up by monolithic reasoning models, and that it doesn’t seem to categorically increase the composability/scalability and robustness/controllability of reasoning models, it’s remarkable to me how much attention this level attracts.
Agent interaction protocols and contracting
The primary examples here, of course, are MCP (insofar as people think of it as a protocol for agent interaction and composition: see mcp-agent) and the A2A protocol. Smart contracts are sometimes brought up, too (Karim et al., 2025)39. See Technologies for Intelligent Voluntary Cooperation by Duettmann, Miller, and Peterson (2022) for a much deeper dive into many other related abstraction levels.
The incumbent (“pre-AI”) abstraction in this class is contract law. Observe that good old contract law is more composable/scalable than shiny new MCP and A2A: for example, MCP and A2A are completely oblivious of interactions between more than two primary parties. So, we should expect agent interaction protocols to evolve in the direction of contract law, or even to adopt it wholesale if AI agents become legal persons. See Salib and Goldstein (2025)40 for an argument for granting AIs legal personhood.
The hijackability of interaction and contracting protocols is studied under the rubrics of algorithmic contract theory, game theory, and mechanism design (Dütting et al., 2024)41.
Note that while contract law is potentially more composable than other agent interaction mechanisms, the cost of exploits in contract law is much higher because of how slow and costly it is to patch the law compared to protocol specifications and technical mechanisms, especially if we consider that AI agents could change tactics and act at a pace several orders of magnitude faster than humans and (human) organizations.
User interface, human—AI collaboration design
The currently dominant UX paradigms for AI are simply rehashes of three very old ideas:
Chat: ChatGPT, Claude, Cursor, etc.
Command line interface: Claude Code, Gemini CLI, etc.
Delegation: OpenAI’s Operator (recently rebranded as Agent), Deep Research, OpenAI Codex.
All three fall short on composability with humans’ specific knowledge/competence/skill (for example, coding agents effectively force software engineers to become frantic project managers instead of building their engineering skills), as well as with their general agency, creativity, and reasoning capacities: they incentivize people to behave in agency-shrinking ways and make it hard to act in agency-expanding ways. What’s worse, all these effects actively undermine humans’ willingness (and, eventually, ability) to make up for vulnerabilities (“hijackabilities”) of the lower levels. Academics sneaking prompt injections into papers to fool reviewers who delegate their work to AIs is a good example of this process.
Some higher levels
There are plenty of yet higher classes of levels relevant for AI agency architecture that I won’t discuss in this post, but I want to point to a few interesting ones.
Microeconomic abstractions and platforms: cryptocurrencies, Free Energy Reduction (FER), prediction markets, other agentic markets.
Media and large-scale platforms for human and AI interaction. Kevin Lu writes insightfully about this level in “The only important technology is the Internet”. See also: the /llms.txt initiative, NLWeb (a protocol for conversational web), Agentic Web Interface (Lù et al., 2025)42, Meta’s Metaverse, Jim Rutt’s idea of info agents, Nostr (a permissionless decentralised protocol for free speech, Jack Dorsey’s new favourite).
Physical control, privacy, and data ownership. A lot of people assign very high value to having access to the model weights (“not your weights, not your brain/agent”), despite not planning to fine-tune them, and despite the half-life of agent deployments (perhaps measured in months on average) making self-hosting uneconomical. Hence, this is presumably due to privacy, surveillance, and censorship concerns.
Relatedly, I’ve proposed the Personal Agents toolkit (as an alternative to cookie-cutter “agent product packages” from “Big Token”: ChatGPT, Gemini, Grok, and Claude) to foster the adoption of open-source agent designs that should in turn enable open-ended innovation at yet higher, institution and governance levels. Intelligent Internet’s Contexts and Open WebUI already embody my vision of Personal Agents to a significant degree.
Cf. Smith, Samuel. “Trust Spanning-layer Protocol (TSP) Proposal.” (2023), slide 61.
Cf. OS as the middle level in the prototypical diversity hourglass architecture (a section in the previous post in this series).
LeCun, Yann. “A Path towards Autonomous Machine Intelligence,” 2022. https://openreview.net/pdf?id=BZ5a1r-kVsf.
Zhi-Xuan, Tan, Lance Ying, Vikash Mansinghka, and Joshua B Tenenbaum. “Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning.” arXiv.org, 2024. https://arxiv.org/abs/2402.17930.
As contrasted with compound AI systems-level neurosymbolic approaches such as AlphaGeometry. More on compound AI systems below in the post.
van Bergen, Ruben, Justus Hübotter, and Pablo Lanillos. “Object-Centric Proto-Symbolic Behavioural Reasoning from Pixels.” arXiv.org, 2024. https://arxiv.org/abs/2411.17438.
Elhage et al. “Softmax Linear Units.” Transformer Circuits Thread, 2022.
Meinke, Alexander, Bronson Schoen, Jérémy Scheurer, Mikita Balesni, Rusheb Shah, and Marius Hobbhahn. “Frontier Models Are Capable of In-Context Scheming.” arXiv.org, 2024. https://arxiv.org/abs/2412.04984.
Of course, this observation is not new. In fact, this is the major theme of AI Safety-adjacent concerns and opposition to scaling frontier LLMs by leading corporations. However, this post shows that this concern is just a single corner of a much larger architecture space that surfaces many more concerns and engineering trade-offs.
Greenblatt, Ryan, Carson Denison, Benjamin Wright, Fabien Roger, Monte MacDiarmid, Sam Marks, Johannes Treutlein, et al. “Alignment Faking in Large Language Models.” arXiv.org, 2024. https://arxiv.org/abs/2412.14093.
Hwang, Sukjun, Brandon Wang, and Albert Gu. “Dynamic Chunking for End-To-End Hierarchical Sequence Modeling.” arXiv.org, 2025. https://arxiv.org/abs/2507.07955.
DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, et al. “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.” arXiv.org, 2025. https://arxiv.org/abs/2501.12948.
Xiang, Violet, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, et al. “Towards System 2 Reasoning in LLMs: Learning How to Think with Meta Chain-of-Thought.” arXiv.org, 2025. https://arxiv.org/abs/2501.04682.
Cetin, Edoardo, Tianyu Zhao, and Yujin Tang. “Reinforcement Learning Teachers of Test Time Scaling.” arXiv.org, 2025. https://arxiv.org/abs/2506.08388.
Tang, Yunhao, Sid Wang, Lovish Madaan, and Rémi Munos. “Beyond Verifiable Rewards: Scaling Reinforcement Learning for Language Models to Unverifiable Data.” arXiv.org, 2025. https://arxiv.org/abs/2503.19618.
Shao, Rulin, Jacqueline He, Akari Asai, Weijia Shi, Tim Dettmers, Sewon Min, Luke Zettlemoyer, and Pang Wei Koh. “Scaling Retrieval-Based Language Models with a Trillion-Token Datastore.” arXiv.org, 2024. https://arxiv.org/abs/2407.12854.
Bonnet, Clément, and Matthew V Macfarlane. “Searching Latent Program Spaces.” arXiv.org, 2024. https://arxiv.org/abs/2411.08706.
Deleu, Tristan, Mizu Nishikawa-Toomey, Jithendaraa Subramanian, Nikolay Malkin, Laurent Charlin, and Yoshua Bengio. “Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network.” Advances in Neural Information Processing Systems 36 (December 15, 2023): 31204–31.
LCM team, Loïc Barrault, Paul-Ambroise Duquenne, Maha Elbayad, Artyom Kozhevnikov, Belen Alastruey, Pierre Andrews, et al. “Large Concept Models: Language Modeling in a Sentence Representation Space.” arXiv.org, 2024. https://arxiv.org/abs/2412.08821.
Wan, Zishen, Che-Kai Liu, Hanchen Yang, Chaojian Li, Haoran You, Yonggan Fu, Cheng Wan, Tushar Krishna, Yingyan Lin, and Arijit Raychowdhury. “Towards Cognitive AI Systems: A Survey and Prospective on Neuro-Symbolic AI.” arXiv.org, 2024. https://arxiv.org/abs/2401.01040.
Silver, David, Satinder Singh, Doina Precup, and Richard S Sutton. “Reward Is Enough.” Artificial Intelligence 299 (May 24, 2021): 103535–35. https://doi.org/10.1016/j.artint.2021.103535.
Chu, Tianzhe, Yuexiang Zhai, Jihan Yang, Shengbang Tong, Saining Xie, Dale Schuurmans, Quoc V Le, Sergey Levine, and Yi Ma. “SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-Training.” arXiv.org, 2025. https://arxiv.org/abs/2501.17161.
Brown, Tom B, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. “Language Models Are Few-Shot Learners.” arXiv.org, 2020. https://arxiv.org/abs/2005.14165.
The kind of “M types x N operations with a narrow waist” architecture discussed in the blog post “The Internet Was Designed With a Narrow Waist” is closely related to bow-ties that are also a part of John Doyle’s architecture theory, and themselves are enabled by diversity hourglass architectures. I haven’t discussed bow-ties in the previous post in this series, but will return to it in the next post.
Daochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, Zhimeng Jiang, Shaochen Zhong, and Xia Hu. “Data-Centric Artificial Intelligence: A Survey.” ACM Computing Surveys, January 6, 2025. https://doi.org/10.1145/3711118.
Weng, Lilian. “Reward Hacking in Reinforcement Learning”. Lil’Log (Nov 2024). https://lilianweng.github.io/posts/2024-11-28-reward-hacking/.
Sharma, Mrinank, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R Bowman, Newton Cheng, et al. “Towards Understanding Sycophancy in Language Models.” arXiv.org, 2023. https://arxiv.org/abs/2310.13548.
Wang, Kun, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, et al. “A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment.” arXiv.org, 2025. https://arxiv.org/abs/2504.15585.
Knoop, Mike and François Chollet. “How to Beat ARC-AGI by Combining Deep Learning and Program Synthesis,” 2024. https://arcprize.org/blog/beat-arc-agi-deep-learning-and-program-synthesis.
Dalrymple, David davidad, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, et al. “Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems.” arXiv.org, 2024. https://arxiv.org/abs/2405.06624.
Walters, Michael, Rafael Kaufmann, Justice Sefas, and Thomas Kopinski. “Free Energy Risk Metrics for Systemically Safe AI: Gatekeeping Multi-Agent Study.” arXiv.org, 2025. https://arxiv.org/abs/2502.04249.
Thornley, Elliott. “There Are No Coherence Theorems.” Lesswrong.com, February 20, 2023. https://www.lesswrong.com/posts/yCuzmCsE86BTu9PfA/there-are-no-coherence-theorems.
Bai, Yuntao, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, et al. “Constitutional AI: Harmlessness from AI Feedback.” arXiv.org, 2022. https://arxiv.org/abs/2212.08073.
Guan, Melody Y, Manas Joglekar, Eric Wallace, Saachi Jain, Boaz Barak, Alec Helyar, Rachel Dias, et al. “Deliberative Alignment: Reasoning Enables Safer Language Models.” arXiv.org, 2024. https://arxiv.org/abs/2412.16339.
Shavit, Yonadav, Sandhini Agarwal, Miles Brundage, Steven Adler, Cullen O'keefe, Rosie Campbell, Teddy Lee, et al. “Practices for Governing Agentic AI Systems.” 2023. https://cdn.openai.com/papers/practices-for-governing-agentic-ai-systems.pdf.
Edelman, Joe, Tan Zhi-Xuan, Ryan Lowe, Oliver Klingefjord, Vincent Wang-Maścianica, Matija Franklin, Ryan Othniel Kearns, Ellie Hain, Atrisha Sarkar, et al. “Full-Stack Alignment: Co‑Aligning AI and Institutions with Thick Models of Value.” 2025. https://www.full-stack-alignment.ai/paper.
Xu, Wujiang, Kai Mei, Hang Gao, Juntao Tan, Zujie Liang, and Yongfeng Zhang. “A-MEM: Agentic Memory for LLM Agents.” arXiv.org, 2025. https://arxiv.org/abs/2502.12110.
Zhao, Xinjie, Moritz Blum, Rui Yang, Boming Yang, Luis Márquez Carpintero, Mónica Pina-Navarro, Tony Wang, et al. “AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-Based Chatbots Utilizing Private Data.” arXiv.org, 2024. https://arxiv.org/abs/2410.11531.
Karim, Md Monjurul, Dong Hoang Van, Sangeen Khan, Qiang Qu, and Yaroslav Kholodov. “AI Agents Meet Blockchain: A Survey on Secure and Scalable Collaboration for Multi-Agents.” Future Internet 17, no. 2 (February 2, 2025): 57–57. https://doi.org/10.3390/fi17020057.
Salib, Peter, and Simon Goldstein. “AI Rights for Human Flourishing.” 2025. https://doi.org/10.2139/ssrn.5353214.
Duetting, Paul, Michal Feldman, and Inbal Talgam-Cohen. “Algorithmic Contract Theory: A Survey.” arXiv.org, 2024. https://arxiv.org/abs/2412.16384.
Lù, Xing Han, Gaurav Kamath, Marius Mosbach, and Siva Reddy. “Build the Web for Agents, Not Agents for the Web.” arXiv.org, 2025. https://arxiv.org/abs/2506.10953.