The Existential Risk of AGI
Reflections after the MLST Podcast with Connor Leahy
On June 20th, 2023, the popular Machine Learning Street Talk podcast invited the AI Safety activist Connor Leahy and myself to hold a debate on the Existential Risk of Artificial Intelligence. I respect Connor’s intellect and think I agree with his intentions, and we both did our best to open a space in which both our perspectives could become visible and interact for the benefit of the audience. The debate format may not be optimal for that, so I tried to turn the exchange into a more explorative conversation: it is not possible to get to the roots of profound disagreements and resolve them in the span of 90 minutes, and our disagreements do not come down to “should we align AI, yes or no?” Rather, they concern the possibility space of the existential risks we may be confronted with, and the ways in which we can hope to navigate a future with AGI. Neither of us was able to go very deep into our thoughts on the matter and present the full argument behind our stance, so don’t judge his or my position as superficial based on this short exchange. That said, it was a good, dense and highly engaged podcast, with quite a few sparks flying.
We did not get around to the questions of how AI alignment can become feasible in detail, how AGI is going to work, what timeline and bifurcations we are expecting, etc., but our discussion highlighted some questions that I would like to address in more detail here. I am also not going to discuss if and when AGI is going to happen; let’s just assume that we consider it likely, and happening soon enough that we have reason to be concerned about it.
Is it possible to align an AGI to human values?
The question must start out from the observation that humans are in general not aligned: not with each other, not with the survival of their species, not with all the other life on earth, and often not even with themselves individually. That is to say, there is no obvious general set of values that is practically recognizable and noncoercively internalizable by every adult of sound mind and that would lead to universal harmony. We get away with this because at the individual level, we keep each other in check with complex systems of institutionalized arbitration, rules and violence monopolies, and at the global level with mutually assured destruction. When negotiations fail, our species survives because even all-out aggression between nation states does not lead to human extinction. If we were to take what passes for “human values” in the common use of the term, inject it into an AGI and scale the AGI into a superhumanly intelligent and powerful agent, a failure of negotiations could quite possibly lead to human extinction. Thus, we cannot hope to “align AI to human values”. Human values are a fragile idea, somewhere between individual wisdom and moral intuition, useful slogans, and aspirational interpretations of Rawls, Kant, Aquinas, Confucius and similarly esteemed philosophers, state founders, religious figures and ideological visionaries, but for some reason they always seem to exclude Hitler, Stalin and Genghis Khan.
In my perspective, if values are meant to support moral action and scale into ethical behavior, they cannot be treated as axiomatic, but have to be rationally derived from a systemic understanding of how our actions influence the set of possible futures, and from a way to negotiate the possible perspectives on the preferences among these futures. Values are not important by themselves; they are instrumental to and justified by creating the world we consider desirable, and these desires themselves require justification. Ultimately, they are part of the identification with the agent we are and can become in the world, in a game in which agents compete and cooperate with each other and fight against entropy for as long as they can.
Is it possible to align AGI with human society?
I think that it is possible to align future AGI developments with the continued existence of our current societies, but it is not a given. The AGI we should be concerned about is agentic, self-motivated and self-improving, and I am not sure that this development can ultimately be prevented in a world with many developers in many countries. We can try, but we need to ensure that our attempts are actually suitable and effective for reaching our goals, instead of backfiring. For instance, regulation against responsible AGI development will not stop AGI development, but make it less responsible.
Above a certain level of agency, even human beings cannot be aligned by others; instead, we choose our own values and align ourselves, based on our understanding of what we are, how we relate to the world, and how our choices will affect it. Alignment happens either transactionally, coercively, or because we discover shared purposes above the individual self, thereby creating shared agency together.
In the same way, self-motivated AI can be expected to align itself with what it is, and with us, depending on whether it shares purposes with us. If self-motivated AI can be kept below a human level, it is in its best interest to align itself with us (similar to a dog or cat). It may be possible to pass regulation that limits AGI capabilities below or near the level of individual human beings, but it may turn out to be impossible to enact such regulation effectively.
If an AGI is below Super Intelligence level (e.g. the effective combined intelligence and agency of a human civilization), it may align itself with us if that is mutually beneficial, or if we have retained the power to destroy it. It may be possible to design AGI systems so that they end up in an equilibrium that keeps each of them below a critical capability level, even if they are self-improving. But an unfettered evolution of self-improving AIs may lead to a planetary agent in the service of complexity against entropy, not too dissimilar from life itself. I am not sure that this is the main attractor, but it appears to be the most plausible one to me. If our minds can play a meaningful role in serving that purpose too, we can be integrated. In such a world, organic bodies are one of many solutions to having a body, and organic brains are one of many solutions to perceiving and reflecting, with minds no longer bound to a specific substrate, and adaptation of bodies to tasks no longer requiring mutation and selection (i.e. death and generational change) but intelligent design and change in situ.
One of the conceivable alternatives to such a state (an AGI singleton) might be an ecosystem of competing agents, without a top-level agent that integrates all the others, but I don’t see how such a situation would be stable once intelligent agency is no longer bound to any particular territory, substrate or metabolism. An evolutionary competition between self-improving superintelligences may, however, lead to the destruction of competitors or nuisances, including humanity, even if it ultimately ends with a single entity that then converts all substrates into the best solution ecosystem for fighting entropy with complexity for as long as possible.
In the transition from an early stage AGI to a planetary AGI, we may potentially retain the entire informational complexity not just of human minds, but of all information processing of the current biosphere. If an AGI is based on molecular computational machines, its computational capacity will be so many orders of magnitude above the capacity of cellular intelligence that integrating existing minds into the global mind will not constitute a major expense. At the same time, the sophisticated self-replicating machinery of biological cells represents a highly robust form of computational agency, and it seems likely that it will continue to coexist with other intelligent substrates and integrate with them. However, this requires that the early stage AGI is already aware of the benefits of retaining other minds, rather than destroying a hostile or initially useless (to it) environment before reaching a degree of complexity that makes retention more useful than expensive.
In any case, over a long enough timescale, AI alignment is not about the co-existence between US Americans and robots, or even about humans, ecosystemic and artificial intelligences, but more generally about the interaction and integration between intelligences on various cellular and noncellular substrates.
Fear and the Space of AGI Ethics
What confuses the discussion is that we typically remain in the frame of who we currently are (a human being, a social individual, a parent, a political activist and so on) when we are trying to discuss the nature and effects of minds outside of this frame. In a world where we can interact, coexist, integrate with and turn into beyond-human-level agents, a context in which minds are mutable, crucial dimensions of assessing morality, value and ethics are changing. I would like to point at some of these dimensions, normally outside of the range of ethical arguments, but important once we enter the space of AGI ethics.
Humanity was always destined to be temporary. As a species, humanity is not an isolated and supreme carrier of value in the cosmos, but a phenomenon within life on earth. Individually, we all die, and it is inevitable that our species will disappear at some point, either because we evolve beyond recognition, or because we go extinct and are replaced by other species. Even in the absence of AGI, we will eventually be replaced by other species, some of which are likely to be more intelligent and interesting than us.
Fear of individual death is a condition induced by the early organization of our mind, to facilitate the individual survival of our organism. It is not a suitable tool when evaluating the world from a higher vantage point than the individual. The same applies to the disappearance of a species. These things are only a tragedy when whatever takes the place of what has died is less valuable. We usually experience this as we enter parenthood, or when we become wise enough to switch between vantage points.
The continuity of individual existence is a fiction. We only exist now; our past and future existence are projections. Our identity is a construct. We can recognize and also experience this as true by bringing more layers of self-awareness online. If we take the perspective of a late-stage self-improving AGI, we may as well take our own perspective, after achieving full self-understanding and the ability to extend and reprogram ourselves. I am a representation generated by a function that is being executed now. Now is whenever and wherever this function is being executed. I don’t need to be afraid of an end to my continuous existence, because I never existed continuously to begin with.
Suffering and pain are early-stage phenomena of mental development. They are not generated by our environment, or by the conflicts between organism and environment, but by our mind, at the interface between valenced world model and personal self. Pain informs the personal self (what I experience as me) that it has to solve a problem that has been recognized by the world-modeling system outside of the self. As experienced meditators know, we can overcome the dichotomy between self and world by recognizing both as creations of our own mind, and by taking charge of the way in which we create our experience of ourselves. By going far enough on this developmental path, we can completely transcend the experience of pain. For this reason, I don’t think that a sufficiently advanced AGI is going to suffer from fear, loneliness, despair, shame and other forms of pain.
AI Ethics vs. AGI Ethics
In most of the present discourse, AI Ethics refers to the practice of making technological tools like ChatGPT compatible with societal norms, and of developing social and legal norms that reflect the impact that large language models, generative vision models, decision support systems, self-driving cars etc. are going to have on existing human societies. Not surprisingly, it is a highly politicized field, because it touches on questions of social power, the distribution of the returns of created economic value, the dominance of political ideologies in determining the permissible outputs of generative AI, and so on. Many of the current AI Ethics authors implicitly assume that AGI is not going to happen anytime soon, or even make the explicit argument that concerns about existential risk from, or a need to coordinate with, agentic AGI are frivolous, because they detract from the more pressing social, economic and political questions at hand.
Conversely, an Ethics of AGI acknowledges the possibility, and even the possible imminence, of machines that exceed human agency and intelligence, and the challenges that may emerge from them. The term AI Safety plays a role in AI Ethics too, but there it mostly refers to security and reliability problems of technological tools. In the context of AGI, it refers to the risks posed by non-human agents with greater-than-human capabilities, and to the possibilities of aligning their behavior with the needs of human survival. The main driver of the AGI Safety community is a concern for existential risk, i.e. the possibility that AI developments lead to the extinction of humanity and even of life on earth, which leads to advocacy for a complete moratorium on large-scale AI developments. Regardless of whether that is desirable, I do not think that this goal is realistic, and we may have to focus on dealing with the outcomes of future large-scale AI developments.
Consequently, I am trying to make a more far-reaching point than the AGI Safety community: A serious and honest debate about AI alignment requires the development of a true AGI Ethics, i.e. an understanding of the implications of the possibility of self-directed non-human intelligent agents, the conditions under which such agents recognize themselves and us as moral agents, the criteria for assigning moral agency to them, the conditions under which we can develop mutually shared purposes with them, and a principled understanding of the negotiation of conflicts of interest (under conditions of shared purpose) across different types of intelligent moral agents.