The AI safety proposal “AI Scientists: Safe and Useful AI?” was published by Yoshua Bengio on May 7, 2023.
Below I quote the salient sections of the proposal and comment on them in turn.
Main thesis: safe AI scientists
[…] I would like to share my thoughts regarding the hotly debated question of long-term risks associated with AI systems which do not yet exist, where one imagines the possibility of AI systems behaving in a way that is dangerously misaligned with human rights or even loss of control of AI systems that could become threats to humanity. A key argument is that as soon as AI systems are given goals – to satisfy our needs – they may create subgoals that are not well-aligned with what we really want and could even become dangerous for humans.
The bottom line of the thesis presented here is that there may be a path to build immensely useful AI systems that completely avoid the issue of AI alignment, which I call AI scientists because they are modeled after ideal scientists and do not act autonomously in the real world, only focusing on theory building and question answering. The argument is that if the AI system can provide us benefits without having to autonomously act in the world, we do not need to solve the AI alignment problem. This would suggest a policy banning powerful autonomous AI systems that can act in the world (“executives” rather than “scientists”) unless proven safe. However, such a solution would still leave open the political problem of coordinating people, organizations and countries to stick to such guidelines for safe and useful AI. The good news is that current efforts to introduce AI regulation (such as the proposed bills in
the EU, but see action in the US as well) are steps in the right direction.
Bengio’s “AI scientists” are similar in spirit to tool AI, Oracle AI, and the “simulators” agenda. All these ideas share a weakness: tool AI may still cause a lot of harm in the hands of “misaligned” people, or people with poor ethics. I’m not sure which AI design would be on balance better: tool AI or agent AI, but my intuition, stemming perhaps from the “skin in the game” principle, tells me that agent AI would be on balance better (if some other aspects of the overall system design are also done right, which I touch upon below, in the last section).
Note that in the Gaia network architecture (Kaufmann & the Digital Gaia Team, 2023), the intelligence nodes (called “Ents”) are agents as well as Bayesian learners (which is a synonym for “scientists”), as in Bengio’s proposal. But Ents are supposed to be situated within a system of incentives and action affordances that prevents these agents (if they are misaligned) from amassing a lot of power, unless they can create a completely isolated supply chain, as in Yudkowsky’s self-replicating nanobots argument (which many people don’t consider realistic).
AI scientists and humans working together
It would be safe if we limit our use of these AI systems to (a) model the available observations and (b) answer any question we may have about the associated random variables (with probabilities associated with these answers). One should notice that such systems can be trained with no reference to goals nor a need for these systems to actually act in the world. The algorithms for training such AI systems focus purely on truth in a probabilistic sense. They are not trying to please us or act in a way that needs to be aligned with our needs. Their output can be seen as the output of ideal scientists, i.e., explanatory theories and answers to questions that these theories help elucidate, augmenting our own understanding of the universe. The responsibility of asking relevant questions and acting accordingly would remain in the hands of humans. These questions may include asking for suggested experiments to speed-up scientific discovery, but it would remain in the hands of humans to decide on how to act (hopefully in a moral and legal way) with that information, and the AI system itself would not have knowledge seeking as an explicit goal.
These systems could not wash our dishes or build our gadgets themselves but they could still be immensely useful to humanity: they could help us figure out how diseases work and which therapies can treat them; they could help us better understand how climate changes and identify materials that could efficiently capture carbon dioxide from the atmosphere; they might even help us better understand how humans learn and how education could be improved and democratized. A key factor behind human progress in recent centuries has been the knowledge built up through the scientific process and the problem-solving engineering methodologies derived from that knowledge or stimulating its discovery. The proposed AI scientist path could provide us with major advances in science and engineering, while leaving the doing and the goals and the moral responsibilities to humans.
I think this vision of human-AI collaboration, which I would summarise as “AI does science, humans do ethical deliberation”, has a big practical problem which I already touched upon above: some humans, including scientists, have very bad ethics. Both “The Wisdom Gap” and E. O. Wilson’s famous quote, “The real problem of humanity is the following: we have Paleolithic emotions, medieval institutions and godlike technology,” point to the fact that humans’ individual and collective ethics (as well as our innate biological capacity to act on those ethics, cf. Bertrand Russell’s “two kinds of morality”) are increasingly inadequate to the power of our technology and the complexity of our civilisation. Bengio’s proposal, however, would advance technology much further while not radically addressing the ethics question (apart from “helping to improve education”, which is a very slow route).
I think that all paths to safe superhuman AI (whether a “scientist” or an agent) must include turning ethics into science and then delegating it to AI as well. The only remaining role of humans would be to provide “phenomenological” evidence (i.e., reporting whether they perceive something as “good” or “bad”, or whether they feel that way at the neurological level, which AI could read directly, bypassing unreliable verbal reports), from which AIs would infer concrete ethical models.
But then, if AIs are simultaneously superhuman at science and have a superhuman conscience (ethics), it seems they could be made agents as well, and the results would probably be better than if agency remained in human hands while both epistemology and moral reasoning were delegated to AI.
In short: I think that Bengio’s proposal is insufficient for ensuring with reasonable confidence that civilisation’s AI transition goes well, even though it defines the “human–AI alignment problem” out of existence, because the human-to-human alignment problems (ethics, and wisdom) remain unsolved. On the other hand, if we do solve all these problems (which is hard), then formally withholding agency from AI seems unwarranted.
Bengio’s “AI scientists” are also similar to OpenAI’s “alignment MVP” and Conjecture’s CoEms, with the difference that neither the “alignment MVP” nor CoEms are meant as “relatively permanent” solutions; rather, they would be used mostly (or exclusively) to develop cognitive science, epistemology, ethics, and alignment science, after which the “AI scientist” would be scrapped and replaced with something it had itself helped to design.
The (not only) political challenge
However, the mere existence of a set of guidelines to build safe and useful AI systems would not prevent ill-intentioned or unwitting humans from building unsafe ones, especially if such AI systems could bring these people and their organizations additional advantages (e.g. on the battlefield, or to gain market share). That challenge seems primarily political and legal and would require a robust regulatory framework that is instantiated nationally and internationally. We have experience of international agreements in areas like nuclear power or human cloning that can serve as examples, although we may face new challenges due to the nature of digital technologies.
It would probably require a level of coordination beyond what we are used to in current international politics and I wonder if our current world order is well suited for that. What is reassuring is that the need for protecting ourselves from the shorter-term risks of AI should bring a governance framework that is a good first step towards protecting us from the long-term risks of loss of control of AI. Increasing the general awareness of AI risks, forcing more transparency and documentation, requiring organizations to do their best to assess and avoid potential risks before deploying AI systems, introducing independent watchdogs to monitor new AI developments, etc would all contribute not just to mitigating short-term risks but also helping with longer-term ones.
I would add that, apart from the legal and political aspects, there are also economic and infrastructural aspects of building up a civilisational immune system against misaligned AI developments and rogue actors.
Some economic and infrastructural restructuring ideas are presented in the Gaia network architecture paper which I already referred to earlier. I pointed out some more infrastructural and economic inadequacies here:
There are no systems of trust and authenticity verification at the root of internet communication (see Trust Over IP)
The storage of information is centralised enormously (primarily in the data centres of BigCos such as Google, Meta, etc.)
Money carries no trace of its origin, so one may earn it in arbitrary malicious or unlawful ways (i.e., gain instrumental power) and then use it to acquire resources from respectable places, e.g., paying for ML training compute at AWS or Azure and purchasing data from data providers. Formal regulations such as compute governance, data governance, and human-based KYC procedures can only go so far and could probably be social-engineered by a superhuman imposter or persuader AI.
See also “Information security considerations for AI and the long term future” (Ladish & Heim, 2022).
This post was originally published on LessWrong.
In technical terms, “better” here could be something like occupying a better Pareto front on the usefulness/efficiency vs. safety/robustness tradeoff chart, however, using the proper logic of risk taking may lead to a different formulation.
Later, AI might acquire such a powerful morphological intelligence that it could be able to model entire human brains and thus could in principle take away this “phenomenological” role from humans, too. See “Morphological intelligence, superhuman empathy, and ethical arbitration” on this topic.
The economic aspects may coalesce with the political ones, since political economy is often viewed as an indivisible system and area of study.