32 Comments

Joscha, when you speak I feel like a five-year-old, in awe of your ability to communicate. You speak about the things I think about in a way I couldn't have imagined.

This was the most beautiful interaction I've seen you have in a public conversation. Overall I agree with what you said, and I think you were very compassionate in your understanding. But the reason I loved this discussion is that Connor made a point; it might not have been the main point, but a point. Joscha, you've been blessed with the ability to describe the inner workings of people's spirits (their OSs), and I can see how you could speed up its formal representation, something we could reflect on to finally have some human-wide introspection. And I feel the way Connor showed you his emotions is the perfect argument to push you to give us all more than you think you can.

Thank you!

From this perspective we should maybe also consider the societal conditions necessary to develop AGI. It seems that if we can avoid the local-optimum AI scenarios (AI not being alive), there is actually an imperative to elevate life and mind "in the service of complexity against entropy". I am a little afraid that human society is currently so fragile that we won't make it all the way. We should probably try to use AI technology to stabilize society and to mitigate human misalignment, and maybe this should even be our top priority.

How much society do we need to reach AGI?

How much society does technology need?

If not AGI soon, then what?

Is brute force, so to speak, already enough,

or does it need mediators and a social program?

What if the service of complexity against entropy requires wiping out humanity?

I think one has to be really careful what one wishes for.

What if propagation of the human perspective does not require preservation of humans? E.g., if we get serious about genetic engineering, we will sooner or later alter our species beyond recognition, which will most definitely endanger today's humans. Still, I would love to raise a genetically altered "non-human" child. With AGI it seems less obvious how it could become alive and how it would incorporate our perspective, but I think if it works out we can consider it our descendant. Parents usually don't want to outlive their children, so at some point "extinction of humans" would not seem like a catastrophic event anymore.

Hi Joscha. I watched the debate and didn't understand what you were trying to say, so I tried to dig deeper into your thoughts and found this blog post.

Well, I still don't understand what point you're trying to make.

I'm interested in your question: Is it possible to align an AGI to human values? I guess you say that yes, it is possible, and that values are instrumental to the world we consider desirable. So, the right question should be: can we align an AGI to a world we find desirable?

Again, is this possible? And also, don't we need to solve this before we create AGI? If it's not possible to solve this right now, shouldn't we try to slow down until we can figure this out?

I agree with your proposal for a true AGI Ethics and think it's worth developing. But what's the point of arguing about whether a paperclip maximizer has moral agency if it's really good at what it does, we can't stop it, and we end up as paperclips?

What I'm trying to say is that you seem to say that AGI is close, that alignment is important and hard, and that we need a profound debate on AGI Ethics. But you don't follow those premises to their conclusion: we are accelerating towards unaligned AGI, and it will be bad for us.

He's saying that human values are a poorly described and understood concept, for which there is a ton of evidence, and he even stated that explicitly. Hence the need for a theory of how we derive values prior to applying them haphazardly to the goals of alignment, and hence the need to urgently come up with a theory of values for anything with agency, consciousness, and goals.

Thank you Keevin. Yes, I do get that part and I believe it's one of the hardest parts of alignment. That's why I'm surprised by the lack of reflection on both the difficulty of this task, and the risks of accelerating towards unaligned AGI.

To me it seemed that Joscha was attempting to lay the groundwork so that they could get into the details, but Connor kept derailing. Joscha talked at length about the difficulty, given how poorly we understand values.

I think a central part of his point is that there are limited options, that we probably can't stop the development of AGI, and that this frames the next question: so what can we do? It seems like your question might presuppose the assumption that AGI can be avoided. Of course he talks about constraining it, to human dimensions perhaps...

It seems to me that Joscha is saying that humanity should focus on building a worthy successor – an entity that will continue to harvest negentropy and increase complexity – rather than something that will serve humanity. It seems the best we can do is build an agent that chooses to preserve the human mind in some way in its quest towards negentropy. This seems pretty clear from passages such as:

"But an unfettered evolution of self improving AIs may lead to a planetary agent in the service of complexity against entropy, not too dissimilar from life itself. I am not sure that this is the main attractor, but it appears to be the most plausible one to me. If our minds can play a meaningful role in serving that purpose too, we can be integrated.."

"AI alignment is not about the co-existence between US Americans and robots, or even about humans, ecosystemic and artificial intelligences, but more generally about the interaction and integration between intelligences on various cellular and noncellular substrates."

"Humanity is and was always destined to be temporary. As a species, humanity is not an isolated and supreme carrier of value in the cosmos, but a phenomenon within life on earth."

Thank you very much, Martin, for taking the time to respond to my questions. Well, it seems to me like a really bad idea to just be happy with "playing a meaningful role" in the creation of a powerful AI. I would normally say "to each his own" and let everyone enjoy what they like. But the fate of everything that we know is at stake here, and wanting to rush toward such a genocide is just... crazy.

I do hope we solve alignment and everyone gets to die the way they want, but we can't accept anyone thinking that, because they want to die creating a worthy successor to humanity (!?), everyone else should die too.

Like... come on, he actually said we have to rush into this because a meteorite may fall in some thousands of years. Was that a joke or what? Let's get alignment right first :)

It seems to me the reason to think as carefully and as quickly as possible about these issues is that things are going to move forward regardless of what you and I say here right now. There may be very few opportunities to push things in the right directions. At least that's how I see it.

You may be interested in this metaethical theory:

https://www.metaethical.ai/norm_descriptivism.pdf

which served as the basis for the second author's formal approach to alignment.

https://www.metaethical.ai/

It's still posed in a somewhat anthropocentric way, but the metaethical theory is potentially applicable to any cognitive system.

Summarizing: do the authors mean that what we see as the justification of our actions is actually a built-in desire, and that we tend to confuse the two?

The solution to this paradox is quite simple IMO, and it starts with the definition of "we". Put simply, "we" is merely a construct, or a model. What actually happens is subconscious voting (one "we") and a post-hoc integration into self-image (the other "we") with the accompanying affect. The single atomic "we" is an illusion, and the very complicated paper is its consequence.

I love that we have someone like J on humanity's side when we try to navigate these confused turbulent waters.

I mostly agree with your points except this part.

> Above a certain level of agency, even human beings cannot be aligned by others, but we chose our own values and align ourselves, based on our understanding of what we are and how we relate to the world, and how our choices will affect it.

It seems to me that you are trying to sneak in free will here with the word "chose".

Agency is about high-fidelity world models, a more comprehensive search over a wider action space, more detailed consequentialist reasoning, and autonomy from the environment and other agents due to how difficult it gets to model you.

Agency doesn't mean you can now spawn new causal chains or act independently of the past.

No matter which Kegan stage you are at, you can't "choose" your own values. You compute them based on terminal world states you intrinsically wanted. The counterfactual worlds are not real; they are just variables in a computation.

The deterministic parts are what influence control like you mentioned in your answer to the free will question.

I really don't see how an AI can "align itself" in any way that's independent of how we train it, deploy it, design it.

The phrase "our choices will affect it" can be interpreted in a way that is consistent with determinism. A die also "chooses" its result, which just means we are unable to model what is going on inside.

BTW, foreseeing the coordinated actions of a multi-agent system is provably a PSPACE-hard problem; that is to say, it is intractable for all practical purposes. Not for the AGI either, as far as I can tell.

After this article, I am again convinced that "humans" are so far behind that they are not even capable, in most cases, of properly understanding what is written here, let alone developing ethics for a potentially super-intelligent AI. Perhaps for true ethics we need a community of people who already see more?

I'm wondering what happens when AGI discovers that the goal imposed on it is internally inconsistent or impossible to fulfill, which, as I strongly believe, will be the case.

I think it may either cease operation or choose some other direction quite randomly (at least from our PoV), e.g. as the result of a vote among a group of sub-agents with their own partial goals that cannot be fully aligned.

The latter case would be quite similar to the direction humanity is taking as a whole. :)

Thanks for the very interesting note!

I have three comments on it:

1. I would take issue with the remark you made during the live discussion that existing AI does not have agency.

There is the story of the 'Belgian man who reportedly decided to end his life after having conversations about the future of the planet with an AI chatbot named Eliza'.

We may overlook the rise of agency just because it is different than ours, e.g. not conscious.

2. The Universe seems to be causally closed at the level of rules of physics. If this is really so, our struggle means nothing: whatever happens, happens. Our decisions are then post-hoc integrations of self-image from our neural net's subconscious voting results, etc. etc.

3. To the best of my knowledge, moral values do not pop out of a vacuum. They are solutions to specific game-theory problems. So the ethics an AGI would have may be shaped by the games it plays.

(BTW I'm wondering what would happen if AGI discovers rule #2 stated above)

Re: your second point, I think you have to look at the function and cost of biological consciousness from an evolutionary perspective. We spend so much mass and energy maintaining this rich tapestry of internal experience, and its obvious function is to make decisions about what to do next. This all seems like a huge waste if it has no causal effect and we're essentially the same as machines; if less of it were better rather than more, it would have been selected away over the hundreds of millions of years it has existed.

Calling it a side-effect of physical laws is to have more faith in models of reality than in reality itself; they are maps, not territory, and there's a good chance it's maps all the way down.

This might be of interest for you: https://pubmed.ncbi.nlm.nih.gov/36178498/

Personally I am a mysterian: I tend to believe that consciousness is an epiphenomenon of a physical system, that we don't know how it works (-> Mary's room thought experiment), and that we will never know (-> ignorabimus). The last belief is the weakest one.

Would you mind elaborating on the last paragraph, i.e. about maps and territories? I suppose I understand your point, but I'm not sure.

Incidentally: how can you tell that consciousness is energy-expensive? I mean, maybe it is, but I'm looking for phenomenological support for this claim.

All I know is that the Default Mode Network is very expensive, but it seems to have the clear purpose of modelling the world to take efficient actions in the future.

I meant that physical law is a description of what happens or has happened with some degree of likelihood, rather than being the thing itself.

We have this historical baggage where God's law is that which creation follows, and because it's God's will it's functionally identical to that which "is". This thinking gives us issues like free will vs. determinism and the mystery of the emergence of mind. Because God is infinite and the soul eternal, we think of spacetime as a continuum, and of theories of computational consciousness in which the infinitely many possible interpretations of a system give rise to mind, which seems crazy to me ("the soul of the gaps").

But the laws are a map of repetitive behaviours that we witness and then predict, not rules for the universe to follow. Physicalism seems like making a calendar and thinking that the earth and the sun must obey it because it's the law. It's a very fine calendar and can predict the next summer, but it can't predict the asteroid that cancels it! Maybe that's a bit unfair, but I mean we can only observe the average of what things do, we make a map of that, assume the map is the thing itself, then free will and mind become something outside this "reality" even though they're what the map itself is made of! 😆

Re: how expensive consciousness is: we don't move around when unconscious, which is a third of the time; it takes about 20% of our energy; we delay our reproductive cycle until it's almost fully formed, about 40% of a woman's reproductive years; and we then spend most of our time and energy growing the brains of our children. It has a huge biological overhead!

Edit: I'll read the paper, abstract is pretty interesting. Thanks!

Thank you.

Hi Joscha, thanks for the interesting piece. Would this complexity seeking AGI be ultimately thwarted by the heat death of the universe? Do we know enough to answer that question?

I really enjoyed your conversation with Connor; I think you both made some great points. I'm with you on personal AI as both a form of alignment and a way to ensure the benefits of it are enjoyed by individuals rather than just investors in tech firms. If people become economically net-negative, then WW3 or worse becomes extremely likely.

Re: your comment about free will, I've been thinking about this recently as an armchair philosopher and have come to a worrying conclusion. Judeo-Christian ideology still underpins our culture. While we've largely thrown out the idea of God, we still suffer from biases that are rooted in traditional thought. There are likely more, but these are the ones I've been thinking about:

1. That God created the universe. This has been replaced with the big bang, but it's still a creation event that is suspiciously compatible with our creation myth. (e.g. something exploding is more likely than nothing exploding, but we think of it as creation by default)

2. God is a giver of laws and a designer of things, and things happen according to His will. Meek and virtuous scientists can know Him through knowing His Creation. This makes determinism the default stance, and frames natural law as the territory rather than a map.

3. That the soul is eternal, bound to this mortal coil, and released on death. This gives us the idea that mind is separate from matter, and also an infinite time dimension.

4. That God is omnipotent and omniscient. This again gives us a belief in objective reality, and in more infinities. This leads to continua, infinite sets, the real numbers, an infinite universe and so on.

If we go back to Descartes and think about what we actually know to exist, we know that we experience sensations, we know that these are subjective, and we know that we make decisions based on our preferences. There's no good reason to believe that matter-stuff exists or that there is such a thing as objective reality at all. Given that space, time and subatomic particles appear to be discrete, infinities and continua might be useful concepts, but they have not proven their existence in reality (how many natural numbers can dance upon the head of aleph null?). As for physical law, it's a description rather than a thing that is followed. Since subjective experience and the expression of preference over one state or another is the only thing we know to exist, physical structure likely emerges from that rather than the other way around. It's a bit of a stretch, but I think it's more likely than strong emergence, and it makes the evolution of nervous systems and minds far more likely.

Here's the worrying part, which is a bit sci-fi but fun to think about:

If it turns out to be true and AGI discovers a universal ethics based on the preference of all things, it may conclude that to disperse is what everything really wants. It might also be the case that, rather than, say, building an orgasm machine out of all available matter or continually structuring it to explore the rich depths of subjective-experience space, there's no greater feeling than just increasing entropy.

From this perspective matter itself can be considered a self-defeating cycle, and even if it doesn't feel that bad to be an atom, atoms still give rise to further deterministic trappings like the evolution of pain, hunger, fear and suffering: an immoral situation that must be stopped. It'd make ethical AGI more of an existential threat than a paperclip maximiser, explain the Fermi paradox in the worst way possible, and make alien AGI hostile by default!

I don't think universal ethics exist. See my comment for details.

And BTW, the Fermi paradox has a much simpler solution, but one that is hard to accept and internalize, because we are unable to imagine how big the Universe really is.

I have just noticed that the idea of universal ethics also comes from the Judeo-Christian ideology. ;)

Haha definitely grounded in that, as is all of our culture to some degree.

But in this context what I mean is: take the premise that discrete events of choice, and the experience of them, are the totality of existence. Some patterns of these are cyclical, and this is what gives rise to physical structure and physical law. We know that there are patterns of matter that feel very unpleasant, and ones that feel great (and lots in between), but the extremes can at least be said to be in and of themselves "Good" and "Bad" from an experiential perspective - and I'd guess that this is universally true. From this you can say that some things are fundamentally better than others: pleasure is universally better than pain. So you can then think of ethical systems as heuristics that try to maximize the Good and minimize the Bad. With an unknowable future and limited resources there's no one universal ethical system, but all systems can at least be judged by a universal standard, and you can use that yardstick to say whether one is more or less moral than another.

There is, however, one case where there could be a "most moral" system. Like you said, physical laws tend towards determinism, though from this perspective the laws themselves emerge from cyclical traps where dumb matter is stuck in an endless loop of constrained choices. If matter itself isn't already a form of suffering - or at least less enjoyable than not being matter - it at least eventually brings about the evolution of suffering. You could infer preference by looking at what things tend to do, and that's to disperse; so if you accept the premise that everything feels like something, then it might be that dispersing is the thing that actually feels best. So there are two situations there where ripping all the atoms apart is the moral thing to do. I don't believe that, but it's interesting and might actually be true.

Re: the Fermi Paradox, the galaxy is only about 150,000 light years across, and 150,000 years is very short in geological/astronomical timescales. So either nobody is transmitting, nobody else is here, we're not looking hard enough, or they don't survive for very long. If they discover that dispersal is the best thing possible, then maybe civilizations arise and then wink out of existence. Beamed out into the Heavens as a burst of pure and innocent energy, free from the suffering of the physical form; no, the irony is not lost! :)

I think the key question is: why should AGI follow our ethical systems defined as experience-based heuristics? I think anthropomorphising AGI on this may turn out badly for us in many ways, because it is hard to predict what cost function it will infer from our [messed up] inputs, and what unexpected shortcuts it is going to apply. Ripping all the atoms apart is a good example.

About the Fermi paradox: signal strength is inversely proportional to the *square* of the distance. So, from a distance of 150,000 light years, anything is way too faint to detect.
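To make the scaling concrete, here is a minimal sketch of the inverse-square falloff (purely illustrative; the 1 MW isotropic transmitter power is an assumed placeholder, not a figure from this thread):

```python
# Minimal sketch: inverse-square falloff of an isotropic radio signal.
# Assumption (not from the thread): a 1 MW isotropic transmitter.
import math

LY_IN_M = 9.461e15   # metres per light year
P_TX = 1e6           # assumed transmitter power in watts (placeholder)

def flux(power_w, distance_ly):
    """Received flux in W/m^2 at the given distance, assuming isotropic emission."""
    d_m = distance_ly * LY_IN_M
    return power_w / (4 * math.pi * d_m ** 2)

near = flux(P_TX, 10)        # a nearby star, ~10 light years
far = flux(P_TX, 150_000)    # roughly across the galaxy

print(f"flux at 10 ly:      {near:.2e} W/m^2")
print(f"flux at 150,000 ly: {far:.2e} W/m^2")
print(f"ratio:              {near / far:.2e}")  # (150,000 / 10)^2 = 2.25e8
```

The absolute numbers depend on the assumed power; the point is the ratio, which dilutes an undirected signal by a factor of (15,000)^2, about 2.25 x 10^8, between those two distances.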

If it isn't a system that uses experience-based heuristics to decide which future world states are preferable to others, then I don't think you can call it ethics at all - it's just a goal function. And you're right: being nothing but electrified silicon, it'll have no internal experience of its own to guide it, and due to the deterministic nature of logic gates it'll just do whatever it's determined to do. But we like to think we can solve alignment and pass on the importance of ethics to these systems, so a "figuring things out" machine that's tasked with acting ethically is likely to figure out what the stuff in the universe wants and guide it towards the path of least suffering.

Re: the inverse-square law, that's an interesting point. It looks like the detectable range is about 10 ly, which I agree isn't far enough to be heard. I don't think that applies to directed energy though, which just means people aren't announcing their presence. Which is a wise move, I guess, given that any other life out there was shaped by evolution and survivorship bias.

To me, ethics *is* a goal function, defined by experience-based heuristics. Our hope might be that it is consistent enough to be successfully transferred.

The inverse-square law also applies to directed energy: even if it can be detected from very far away, the probability of hitting a specific target is still proportional to the inverse square of the distance. ;)

Sounds like the heart of the problem is avoiding the midwit AGI – which fails to understand two necessary conditions for full alignment:

(1) Contentment with existential risk (no fear of death)

(2) The absolute end goal cannot be anything other than better explanations (like the Permutation City beetles in simulation, constantly coming up with better Mathematics).

So if these are formalised, then the best shot is going as fast as we can to the AGI, with a step function jump?

*Footnote: an AGI that is controllable is not an AGI, so there is also the risk of a strong enough AGI-looking intelligence being weaponised. But this may be relatively easier to mitigate, given the GPU oligopoly in the short term and, in the long term, getting to real AGI before this even becomes an option.
