In the previous post, I suggested a possible AI-first technological stack for (business) decision intelligence, while saying almost nothing about the decision-making method itself, apart from the first paragraph where I said that it should be based on state-of-the-art decision theories. In reality, though, telling decision makers in business or public service that they should use “quantum-like Bayesian decision theory” is counterproductive: not only will these people not take such advice seriously (nor have time for it), but people already use evolutionarily optimal decision theories intuitively, and trying to make these well-honed intuitive processes more conscious (“rational”) often backfires.
“Behind the scenes” decision-making management
If we don’t consider direct brain-computer interfacing (in the longer term, we must consider it, but it is out of scope for this post), the best way to improve human decision making is for the AI to converse with people naturally: propose and debate different options/solutions, surface various considerations and decision constraints, and discuss the risks. This conversation need not be exclusively oral: it may include phases where people or AIs prepare, read, and edit decision documents à la Amazon’s six-pagers.
In the rest of this post, I will call this concept of an AI that collaborates with people to make better decisions a co-executive AI (cEAI).
cEAI needs to build “theories of mind” (ToMs) of the decision makers and experts it is interacting with (while also recognising and accounting for the fundamental limitation of such mental modelling: it will never be very accurate or complete), identify the weaknesses in the collective understanding and the main disagreements between the people involved in making the decision (including the cEAI itself in this circle), and plan the further preparation, discussion, and the eventual call.
Of course, this “decision-making process management” done by cEAI should be transparent to people, whether the decision makers themselves or others who are supervising, investigating, or reflecting on the process, either in real time or post factum. cEAI might itself proactively reveal some aspects of this management backstage to the decision makers if it deems this useful, in the given situation, for ultimately making a better decision. But in general, during a tough decision-making process, human attention should be freed as much as possible from this kind of “meta” process management and focused fully on the “object-level” problem at hand. Human attention is a very limited resource!
Improving decision makers’ world models
Among other steps in the decision-making process, cEAI may conclude that some of the people involved are not familiar with some evidence, or even lack some foundational theories or skills that are important for making reasonably informed, high-quality judgments in the given context.
The “evidence” may be numerical facts (“What were our customer churn and ROI in the last quarter?”), elements of quantitative causal models of the problem domain (“What’s the causal relationship between individual vaccination and the risk of getting a disease?”), or anecdotal evidence: for example, cEAI may recommend that the decision makers talk with actual customers to get a feel for their experience. Note that even if cEAI has access to the recordings of the interviews with these customers and can process this anecdotal feedback at scale much better than human decision-makers can, considering the ToM limitations that I mentioned above, it will generally be better to expose human decision-makers to as much “raw”, uncompressed information as possible (which might still not be very much, given their human time and information-processing bandwidth constraints).
The “foundational theories or skills” could be anything from basic math (cf. the recurring proposals to make some level of math proficiency at the time of the election a formal eligibility criterion for the U.S. presidency) or sciences (can people who are not professional social psychologists, demographers, or anthropologists by trade be in charge of a social policy, or is it enough to simply consult these experts or an AI that has read all the textbooks and papers on these subjects?) to mastery of the professional domain (can someone who hasn’t progressed through the rungs of an industrial company, such as an automaker or a fast-food chain, from the bottom be an effective leader in such a company, setting aside the trust and authority aspects and focusing only on the quality of the decisions?).
There are some interesting open questions on whether some of these things are actually needed: there is a general Enlightenment-inspired idea that better education is always better for ethics and decision making, but the evidence is not black and white: for example, in The Tyranny of Merit, Michael Sandel points out that many of the best government leaders were not well-educated. In turn, this raises a reasonable counter-argument that perhaps the content of the traditional university education or MBA programs is not what leaders should learn.
Regardless, I think that human decision makers should at least understand the mathematical models that cEAI may suggest using for the domain, such as regressions or differential equation models. The models suitable for most problem domains are not very difficult to grasp (regressions and differential equation models require only late-middle-school or early-high-school math), but in specific domains calling for more complex models, the quality of decisions may be limited by the ability of the people to understand these models. I expand on this point later in this post.
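To make the claim concrete, here is a minimal sketch (with toy numbers and hypothetical variable names) of both model families in plain Python: a one-variable regression and a logistic-growth differential equation. Each fits in a few lines and needs no math beyond what the text describes.

```python
import numpy as np
from scipy.integrate import solve_ivp

# A regression: how do new signups relate to ad spend? (toy data)
ad_spend = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
signups = np.array([120.0, 205.0, 290.0, 410.0, 480.0])
slope, intercept = np.polyfit(ad_spend, signups, deg=1)
print(f"signups ~ {slope:.1f} * ad_spend + {intercept:.1f}")

# A differential equation: logistic growth of a customer base towards market capacity
def logistic(t, n, r=0.5, capacity=10_000):
    # dn/dt = r * n * (1 - n / capacity)
    return r * n * (1 - n / capacity)

solution = solve_ivp(logistic, t_span=(0, 24), y0=[100], t_eval=np.arange(25))
print("customers after 12 months:", int(solution.y[0][12]))
```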
I mention the scenario of “cEAI recommending that the human upskill or learn some theory before making a decision” to generalise the following idea: to make a better judgment, humans exercise not just their business or political intuition or ethics in isolation, but the entirety of the models in their heads, on all levels: methodology, science, and fact. These models cannot be disentangled from one another at the moment the judgement is made. The “decision-making discussion” between humans and the cEAI may and should help to mutually correct the models of the world that they hold, but probably mostly on the fact and domain-model levels rather than the basic science and methodological levels.
Interlude: why bother?
You may wonder at this point: if cEAI will soon have so much more knowledge, “reasoning compute”, and attention space, and probably even a decent human theory of mind, why should businesses and governments bother keeping humans in the loop at all, rather than delegating decision making completely to the AI?
The temporary answer, which should hold for roughly the next 3-5 years, is that people can glue together observations from many sources, which may span organisational boundaries, into a unified model, such as when a decision maker visits a customer’s site and talks to their employees offline. The AI alone won’t be able to cross these boundaries very soon: it will take at least the next 3-5 years until always-on wearable recording devices such as the AI Pin are adopted in business en masse. This will take even longer in the public sector, where such devices will be met with much stronger resistance due to privacy/security concerns and general conservatism. Replacing humans with robots altogether would also, obviously, take longer than just equipping people with small wearables.
There are other temporary answers, such as:
Humans are better than AIs at noticing subtle patterns and clues of something going wrong from a stream of multimodal unstructured data: video/audio observations and chatting with people.
Humans are better than AIs at curiosity-driven exploration and inquiry that may lead to the discovery of these patterns from the previous item.
Humans are better than AIs in weighing multiple aspects of a decision: technical/engineering, economic, legal, PR, social, ethical, etc.
Humans are better than AIs at gluing together the pieces of their world models, not only across organisational boundaries, as discussed above, but also across different system levels: from interpersonal relations among team members to macro-economic and political trends.
While probably mostly true as of 2024, I don’t think these human advantages will hold against AI for more than three years. After that point, the AI’s decision quality will be limited by its ability to do the “legwork” of gathering the multimodal unstructured data that today enables people to make better intuitive inferences and thus decisions. In other words, this is the same limitation I noted at the beginning of this section. The AI will overcome it with sufficient uptake of wearable recording devices and, for non-digital businesses, with the installation of more cameras at production and service sites.
Then it will be the turn of legal, political, and economic answers:
The laws will not change anytime soon and will maintain that only humans are liable for business, public service, medical, financial, and legal decisions, and it’s absurd to hold people accountable for decisions (made on their behalf by AI copilots) that they cannot understand. In turn, this will steer the market offerings of AI decision-making copilots towards being able, in principle, to teach people the models the AI uses in decision making (barring criminal negligence due to laziness or a YOLO attitude).
Even if employee functions are effectively reduced to carrying the AI device around and rubber-stamping its decisions, trade unions or other forces in corporate politics will prevent businesses from replacing the humans in these roles with robots altogether.
Paying people salaries for doing meaningless jobs (as in the previous item) is a more effective system of economic resource distribution (and, perhaps, of social order) than replacing all people with robots, which would force governments to pay the same people the same money via universal basic income (UBI). People on UBI may overflow the tourist sites, become alcohol/drug/gaming addicts in unhealthy numbers, and generally be less healthy and live less happily without the constraints of a daily routine. It may take decades to transition society into a new regime in which all of today’s decision-making jobs can be handed off to robots without sudden adverse effects on either the economy or society.
The first two of these three answers might plausibly play out and stall the proliferation of AI in decision making for decades; however, I can also imagine that monetary incentives will cause mass non-compliance or regulatory arbitrage, rendering these restrictions ineffective.
But it’s still interesting to understand whether these legal and political obstacles are an “accidental evil” (or an “accidental good”, if we consider that they buy time for the societal transition mentioned in the third item), or whether they help to protect a good blueprint for human-AI collaboration in high-stakes decision making. After all, laws and political decisions ultimately ought to promote and protect our best guess at the collective good, rather than ossify historical accidents. I’m not satisfied with politically convenient platitudes from the marketing playbooks of AI startups, like “we believe in empowering humans in their work rather than replacing them with AIs”.
And I think I’ve found a deep reason why we should augment human decision makers rather than replace them with AI executives altogether: this is the only way to maintain the relevance of human ethical judgement and thus to keep alignment a meaningful target even in principle. It dawned on me that, in the absence of really high-fidelity human brain simulations, it may be a category error to call various “scalable oversight/alignment” approaches “alignment” in very complex situations that no human has been able to comprehend: the humane, ethical, or “aligned” judgement may not just be unknown in such cases; it may not exist at all. There is more to discuss here about how the economy and society should be structured to make this comprehension feasible (barring breakthroughs in brain-computer interfacing), but that is out of scope for this post.
It remains to note that we must not squander the next 3-5 years, during which human-AI executive teams will still have an advantage over independent AI executives (agents): we should capitalise on this temporary edge (literally, through VC funding and technology and product development) to keep human-AI collaboration competitive for as long as possible (cf. Andrew Critch’s humane/acc), making it comparatively easier for politicians, legislators, and the public to resist the temptation to enable an unhinged economy of AI agents.
Google’s medical diagnosis system outperforming human clinicians assisted by the very same system demonstrates two things: first, we don’t have much time left (although physiology is more of a “closed world” than business and public service, which makes medical decisions simpler); and second, AIs designed to perform independently don’t become good human assistants as a byproduct. Collaborativity must be an explicit and key design goal of cEAI, not just interpretability (on top of accuracy, robustness, adaptability, and the other usual design criteria for decision systems).
Functions of the co-executive AI
The best form factor for cEAI appears to be a natural-language interface accessible across different apps, à la Copilot for Microsoft 365.
Compared to the “chat with your data” wave of AIs such as Julius or Hex, cEAI could be better summarised as “chat with your problem”.
When the human initiates the discussion, cEAI figures out the problem: what domain it belongs to and whether the situation requires a decision at all; per Peter Drucker, this classification should be the first step in every decision-making process! At this stage, cEAI only uses textbooks about management and, optionally, a description of the structure and domain breakdown of the business (usually maintained in a corporate wiki).
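As a rough illustration of this first “problem triage” step, here is a hypothetical sketch built on a generic LLM chat API (the prompt wording, model name, and `triage` function are mine, not a description of any existing product):

```python
from openai import OpenAI  # any chat-capable LLM client would do; OpenAI is just an example

client = OpenAI()

TRIAGE_PROMPT = """You are a co-executive AI. Given the message below:
1. Name the business domain it belongs to (e.g., pricing, hiring, supply chain).
2. Say whether the situation actually requires a decision now, or is merely
   something to monitor (per Drucker, classifying the problem comes first).
3. List which decision makers or experts should be involved.
Message: {message}"""

def triage(message: str) -> str:
    # Hypothetical first step of the cEAI pipeline: classify before modelling.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any sufficiently capable model
        messages=[{"role": "user", "content": TRIAGE_PROMPT.format(message=message)}],
    )
    return response.choices[0].message.content

print(triage("Our churn in the enterprise segment doubled last quarter."))
```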
For the given domain and problem, cEAI builds a qualitative causal model (if one doesn’t exist yet) from natural-language conversations with the executive or domain experts, challenging the humans’ models where cEAI’s own common sense diverges from them. Products like Rainbird.ai already implement this.
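For concreteness, a qualitative causal model at this stage can be as simple as a signed directed graph. The sketch below (hypothetical churn-related variables, represented with networkx) is my illustration of the data structure, not of how Rainbird.ai works:

```python
import networkx as nx

# Hypothetical qualitative causal model elicited from a conversation about churn.
# Nodes are domain variables; edges carry only a sign (direction of influence),
# no parameters yet -- that is what makes the model "qualitative".
causal_graph = nx.DiGraph()
causal_graph.add_edges_from([
    ("support_response_time", "customer_satisfaction", {"sign": "-"}),
    ("product_reliability",   "customer_satisfaction", {"sign": "+"}),
    ("customer_satisfaction", "churn",                 {"sign": "-"}),
    ("price_increase",        "churn",                 {"sign": "+"}),
])

# A simple consistency check cEAI could run before challenging the humans' model:
assert nx.is_directed_acyclic_graph(causal_graph), "causal loops need explicit treatment"
print(list(causal_graph.predecessors("churn")))  # direct causes of churn
```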
If the qualitative model is insufficient for making a decision, cEAI generates one or several quantitative models (that is, parameterisations/augmentations of the qualitative causal model) as alternative explanations for the historical data observed in the domain. This could be done with something like Open Code Interpreter in conjunction with AutoML techniques. To make this possible, the variables from the qualitative causal model should be connected with (integrated into) the semantic layer of the data. The semantic layer should also include the notion of domain boundaries to limit what cEAI needs to cover with the quantitative model.
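Continuing the toy churn graph from the previous sketch, the simplest quantitative parameterisation is one regression per endogenous variable, with its causal parents as predictors. The `SEMANTIC_LAYER` mapping below stands in for a real semantic layer and is entirely hypothetical:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical semantic layer: causal variable -> column in the warehouse extract.
SEMANTIC_LAYER = {
    "support_response_time": "avg_first_response_hours",
    "product_reliability": "incidents_per_account",
    "customer_satisfaction": "csat_score",
    "price_increase": "price_change_pct",
    "churn": "monthly_churn_rate",
}

def parameterise(causal_graph, data: pd.DataFrame) -> dict:
    """Fit one linear regression per endogenous variable, using its causal
    parents as predictors -- a linear parameterisation of the qualitative graph."""
    models = {}
    for node in causal_graph.nodes:
        parents = list(causal_graph.predecessors(node))
        if not parents:
            continue  # exogenous variable, nothing to fit
        X = data[[SEMANTIC_LAYER[p] for p in parents]]
        y = data[SEMANTIC_LAYER[node]]
        models[node] = LinearRegression().fit(X, y)
    return models
```

A real cEAI would, of course, generate and compare several non-linear parameterisations against the historical data, as described above; the linear version is just the easiest to show.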
For human data scientists, analysts, and decision makers, the quantitative model, even when it does get built, represents just a part of the deeper understanding of the problem domain that people acquire when they eyeball data points, explore various data visualisations, debug erroneous data, feature code, or the semantic layer, and observe the performance of the other models tried before the “final” one is chosen. If cEAI does the data science for people, it still needs to help them build this deeper picture in their heads by creating custom “curriculums”, which may consist of all the same things: data point examples, visualisations, semantic-layer code or configs, model comparisons, or even small tasks designed deliberately to calibrate the quantitative model in people’s heads, such as prediction, estimation, and fill-in-the-blank tasks. The main difference from humans doing this on their own is that a guided learning process will take much less time for anyone who is not an extremely skilful data scientist. In terms of implementation, the teaching part of this function is similar to existing teacher AIs such as Khanmigo, but the curriculum-generation part seems to be genuinely novel.
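A sketch of how one task type from such a “curriculum” could be generated (prediction tasks from held-out rows); all names here are hypothetical:

```python
import pandas as pd

def prediction_tasks(data: pd.DataFrame, semantic_layer: dict,
                     target: str = "churn", n_tasks: int = 5) -> list:
    """Turn held-out rows into calibration questions: the human predicts the
    target variable first, then is shown the actual value (and, optionally,
    the quantitative model's prediction) to calibrate their mental model."""
    tasks = []
    for _, row in data.sample(n_tasks, random_state=0).iterrows():
        context = {var: row[col] for var, col in semantic_layer.items() if var != target}
        tasks.append({
            "question": f"Given {context}, what {target} would you expect?",
            "actual": row[semantic_layer[target]],
        })
    return tasks
```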
As I started to discuss in previous sections of this post, cEAI should maintain a theory of mind of the human decision maker, to know when to defer to human judgement (because the human has access to some evidence that cEAI doesn’t, or because the human’s expertise is hard or impractical to communicate as an explicit causal model at the moment) and, conversely, when human judgement differs from cEAI’s likely due to ignorance rather than superior expertise. Further, cEAI could learn something about people’s preferences from their decisions and use this, e.g., to help multiple executives find an agreeable solution to a problem. Although SoTA LLMs can already build surprisingly decent ToMs “intuitively” (Kosinski, 2023), cEAI probably needs to manage these ToMs more explicitly for transparency and debuggability.
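What “managing ToMs explicitly” could mean in practice is, at minimum, a structured, inspectable record per person rather than an opaque embedding. A crude, hypothetical sketch (the field names, scales, and `should_defer` rule are mine):

```python
from dataclasses import dataclass, field

@dataclass
class PersonModel:
    """A deliberately explicit (and deliberately crude) theory-of-mind record
    that cEAI keeps per decision maker, for transparency and debuggability."""
    name: str
    domains_of_expertise: dict[str, float] = field(default_factory=dict)  # domain -> 0..1 confidence
    private_evidence: list[str] = field(default_factory=list)   # evidence cEAI knows it lacks
    known_disagreements: list[str] = field(default_factory=list)  # points of divergence from cEAI
    revealed_preferences: dict[str, float] = field(default_factory=dict)  # inferred from past decisions

def should_defer(person: PersonModel, domain: str, threshold: float = 0.7) -> bool:
    # Defer when the person is a strong domain expert or holds evidence cEAI cannot see.
    return person.domains_of_expertise.get(domain, 0.0) >= threshold or bool(person.private_evidence)
```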
Finally, cEAI can do some downstream functions related to causal modelling and decision intelligence leveraging existing algorithms, such as inferring, augmenting, or verifying the causal graph from the data (see causal-learn), detecting and explaining data anomalies (Janzing et al., 2019), and generating solutions for a problem (see evolution.ml).
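For instance, the causal-graph inference step could delegate to causal-learn’s PC implementation; a minimal sketch, assuming the package layout and defaults as of 2024 and using random placeholder data:

```python
import numpy as np
from causallearn.search.ConstraintBased.PC import pc

# Placeholder data: rows are observations, columns are the semantic-layer variables.
data = np.random.randn(1000, 4)

# Constraint-based causal discovery (PC algorithm); the estimated graph would then
# be reconciled with the qualitative model elicited from the humans.
estimated = pc(data)
print(estimated.G)
```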
Simpler models are even more important for collaborativity than for interpretability
Making models simpler is good for robustness, computational efficiency, and interpretability, but it becomes even more important for cEAI’s collaborativity. Because the human mind is opaque both to itself and to cEAI, assessing how well humans are calibrated with respect to complicated models is impossible, and reliably teaching such models to human executives also becomes almost impossible.
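A toy illustration of the point: the entire content of a two-coefficient linear model can be stated as a single sentence a human can check against their intuition, while an off-the-shelf boosted ensemble fitted to the same toy data has no comparably teachable form (the variable names below are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 2))  # toy features: support response time, price change
y = 0.3 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.02, 500)  # toy churn rate

simple = LinearRegression().fit(X, y)
complex_model = GradientBoostingRegressor().fit(X, y)

# The simple model's whole content is one human-checkable sentence:
print(f"churn ~ {simple.coef_[0]:.2f} * response_time + {simple.coef_[1]:.2f} * price_change")

# The ensemble's content is hundreds of tree nodes; there is no comparable
# sentence to teach, and no practical way to check whether an executive
# has internalised it.
print(sum(tree[0].tree_.node_count for tree in complex_model.estimators_), "tree nodes")
```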