Are AIs People?

Rob Long and Kathleen Finlinson

Every year, AI models get better at thinking. Could they possibly be capable of feeling? And if they are, how would we know?

Clara Collier: Rob and Kathleen, you both work on AI welfare. So, gun to your head, you have to pick a number: What are your odds that current, state-of-the-art AIs are moral patients? And how do you see that changing in five years?

Rob Long: This interview sucks. [Laughs].

Kathleen Finlinson: I think I should let you start, Rob.

R: I think you should start. 

K: Mine’s too weird, you should start. 

R: Okay, if there's a gun to my head, that’s serious. Give me a second here. I did write this down earlier this year. What did I say?

K: There's a gun to his head and he’s, like, “Let me open up this Google doc real fast.”

R: 5%. Then in the future, let's say by 2030? 40%.

C: Okay. That’s not insubstantial. But it makes sense. Last year, you brought together scholars from a variety of fields — neuroscience, philosophy of mind, philosophy of consciousness — to coauthor a paper which surveyed our leading theories of consciousness. And then you looked at what “indicators” these theories could generate for whether or not certain systems are conscious, including AI systems. 

And at the end, you and your co-authors essentially say: all of the properties that you identified could be met by systems today. Not necessarily that they are currently being met, but that through some combination of existing technology, you could get something that would satisfy the conditions of consciousness you derived from the theories.

R: I think the wording is that there are "no obvious technical barriers to building AI systems which satisfy these indicators." This wording already bakes in one of many caveats about what you should take away from this claim — which I know is just thrilling copy for your magazine.

C: I should tell our readers this is maybe the most caveated document I have ever read.

Jake Eaton: Before we began recording, you guys did tell us you were boring.

R: Yeah. So that is a significant finding, right? These conditions really do look like the sorts of things we can build. Since that paper came out, another paper has been published that explicitly cites the global workspace theory conditions and says: here’s a system that satisfies all of them.

C: Reading your AI consciousness paper felt like — when it comes to consciousness, we’re the blind men trying to describe an elephant, but at least we’re not wandering around aimlessly. We have theories that suggest we’ve at least made contact with the elephant. Maybe we’re touching a leg or a trunk. But let’s pause here, because global workspace theory is a good test case for explaining your methodology. Can you talk for a little bit about what this theory is and how you go from a theory of consciousness to generating testable predictions?

R: One way of framing the question is: What sort of computations or information processing are associated with consciousness? People build theories by looking at how that seems to happen in humans. Humans sometimes are conscious, and sometimes they're not. There are things that they are conscious of and things that they are not conscious of. And by looking at people's brains under different circumstances, you can try to get some idea of what is making the difference. You can also look at the evolution of biological intelligence and try to tell some story about what seems to be happening as life develops and as we start to see creatures that (very plausibly) have a point of view on the world. 

Global workspace theory gives central place to this idea that consciousness is about some kind of integration and broadcasting of information throughout an intelligent system. In the last century, we’ve learned that the human mind does a lot of things —  some extremely sophisticated things — unconsciously: for example, parsing the words that I’m saying to you now or applying the rules of grammar. 

We’ve discovered that some very sophisticated visual and sensory processing can happen without it rising to conscious awareness. This comes from cases like blindsight, or split brain patients, and other interesting deficits of consciousness. 

Global workspace theory says that there’s something to the effect of a central broadcasting system that coordinates all of the otherwise independent, otherwise unconscious systems in the brain. It chooses what's important for all of the systems to know about, and the things you're conscious of are those things that are broadcast. 

An interesting aside here is that global workspace theory was actually inspired by AI architectures. It goes full circle here, and has since been well-validated by fairly detailed brain imaging and modeling.

C: So how do we go from global workspace theory to criteria that could point to consciousness in systems that aren't human? For the moment, just use global workspace theory as an example — I know there are many others. 

R: One place you can start is to ask what global workspace theory is supposed to be doing and enabling. First, you need independent specialized systems, and some kind of switchboard or gating mechanism operating between them. That gives us one indicator of consciousness: multiple specialized systems capable of operating in parallel.

Then there’s the idea of a limited capacity workspace. The workspace isn’t going to contain all the information it gets from all of the modules: it needs some way to prioritize. That’s attention.

Then, we get to the central idea of a global broadcast. What does it mean for that to happen? We have all these systems that coordinate information from different sources and then decide what to do with it. But it seems to be important  — it's plausible at least — that it’s ongoing and coordinated. There's some kind of recurrence.

C: Recurrence meaning that it's not just feed-forward — information can go from the modules to the workspace, and the workspace can broadcast it back out to the modules. 

R: It’s being reused, you might say. So that’s another indicator of consciousness we might look for. 

One thing you’ll have noticed did not show up in these conditions — because we suspect it would be kind of anthropomorphic — is what the input modules actually are, or how many of them are necessary. We know it’s not the case, for instance, that you need a hearing module or a vision module to be conscious.

C: The architecture you’ve described here is not that complicated. In fact, a couple of people have proposed or actually built neural network architectures explicitly modeled on GWT. Do you think those systems are more likely to be conscious than systems based on more standard architectures? 

R: I do think they’re a bit more likely to be conscious. But then there’s an extremely vexing methodological problem — how fine-grained do these conditions have to be? How much detail do they have to have? Here's a question: Do I think it's very likely that something is conscious if it has five registers and each register performs an independent mathematical operation, and the registers can sometimes put something into a central register and send it to the other ones, and there's some sort of relevance function that selects what it’s going to broadcast? No! I don't think it's very likely that thing is conscious. This has been called “the small network problem.” My colleague Jonathan Simon calls it “the minimal implementation problem.” At a certain level of detail, all of these theories can be specified in extremely minimal ways.

You could take that to mean that consciousness is just very widespread, which some people are willing to accept. Many people, myself included, are more inclined to think that just means we haven't specified it quite correctly. 
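To make the worry concrete, here is a deliberately minimal sketch, in Python, of a system that arguably ticks the boxes just described: independent specialized modules, a limited-capacity workspace, a relevance function that decides what gets in, and a global broadcast back to every module. Everything in it is invented for illustration; nothing comes from the paper or from any real architecture.

```python
# A deliberately minimal "global workspace": specialized modules, a
# limited-capacity workspace, a relevance function that selects what
# gets in, and a broadcast of the winners back to every module.
# All names are invented for illustration.

from typing import Callable

class Module:
    def __init__(self, name: str, process: Callable[[list], object]):
        self.name = name
        self.process = process          # the module's specialized computation
        self.inbox: list = []           # what the workspace last broadcast to it

    def step(self) -> tuple[str, object]:
        # Each module works independently on whatever it last received.
        return self.name, self.process(self.inbox)

class Workspace:
    def __init__(self, modules: list[Module], capacity: int = 1):
        self.modules = modules
        self.capacity = capacity        # limited-capacity workspace

    def relevance(self, item: tuple[str, object]) -> float:
        # Toy "attention": prefer larger numeric outputs.
        _, value = item
        return float(value) if isinstance(value, (int, float)) else 0.0

    def cycle(self) -> list:
        candidates = [m.step() for m in self.modules]             # parallel specialists
        selected = sorted(candidates, key=self.relevance)[-self.capacity:]
        for m in self.modules:                                    # global broadcast
            m.inbox = selected                                    # recurrence: outputs feed back in
        return selected

# Five "registers", each doing an independent arithmetic operation.
mods = [
    Module("adder",    lambda inbox: sum(v for _, v in inbox) + 1),
    Module("doubler",  lambda inbox: 2 * max([v for _, v in inbox], default=1)),
    Module("negator",  lambda inbox: -min([v for _, v in inbox], default=0)),
    Module("counter",  lambda inbox: len(inbox)),
    Module("constant", lambda inbox: 42),
]

ws = Workspace(mods, capacity=1)
for _ in range(3):
    print(ws.cycle())
```

Run for a few cycles, it just shuffles numbers between registers, which is the point: at this level of specification, the indicators are far too easy to meet.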

C: There’s a methodological problem here, which is the temptation towards goalpost shifting. You can take this logic arbitrarily far. It’s easy to imagine that, one day, we’ll have systems that might actually be conscious and still say: “Oh, of course they’re not. They aren’t complex enough” or “That’s not what consciousness looks like in humans.” 

So where does it stop? At the stage we’re at now, it does seem necessary to move the goalposts. How do you know when they’re in the right place?

R: I do worry about goalpost shifting, and I note in the paper that the methodology matters more than any specific claim we make about this or that system, or about how exactly you specify this or that mechanism of attention. We need to approach it based on things we know instead of just vibing and remembering some interesting thing we saw on Hacker News last week mixed with the Daniel Dennett we read 30 years ago, which seems to be the state of the art in thinking about this, at least in some places.

But there is a failure mode of getting really specific about theories and conditions to the point where we forget this higher level uncertainty. I can imagine a future in which there are really complicated AI systems, with something like a unified perspective on the world, and they’re telling us they’re conscious. In order not to worry about this very inconvenient fact, the AI companies call someone up who will say: “Oh, the way this thing handles attention is not the same as how humans do. Nothing to worry about here.” I would, at that point, speak up very loudly and say: “I think we’re goalpost shifting here.” What we should do, and what I want to encourage people to do, is to pre-specify those goalposts. 

As a red line, just to start out with — what if we had a system that clearly did what a human could do? I don’t think that this will be most AI systems. But you can imagine more embodied agents which share some of our sensorium — which again, I think, is almost certainly not required for consciousness.  

C: I’m picturing a classic C-3PO kind of thing. 

R: Exactly. And you know, people are trying to build that. I think there are warehouses full of things that already look a decent amount like that — which is now making me think about my 5% number …

C: Well, that was my next question. We don’t have systems with an indefinitely long context window. We don’t have anything with a clear, human-like ability to pursue goals. But there are quite sophisticated agentic systems that have memories and can integrate information from a bunch of sources and even make plans.

But your number is still 5%. That number doesn’t seem wrong to me, but I’m still curious: When you say 5%, what is motivating that? Why isn’t it higher? 

R: Something relevant to report about my views is that I put decent credence on consciousness not being what matters for patienthood or welfare. One of the reasons my number isn't higher is that the most agentic things don’t look conscious to me, at least in the way we’ve been talking about. 

Conscious pleasure and conscious pain are the things I’m most confident are sufficient for moral patienthood. And I haven’t seen anything that makes me certain “that thing definitely experiences pain.” And then to the extent that it has something that looks like proto-plans or beliefs or desires, I have significant doubts that they’re yet the thing that could suffice for moral patienthood. 

C: What is the main source of those doubts?

R: One thing might just literally be something like complexity — which I really don’t think matters per se, but which I think is driving my intuition. Some of the most belief-y and long-term-memory-y-looking things are language agents that write down their memories in a file. Here I’m drawing on work by Goldstein and Kirk-Giannini. Those things actually are quite belief-y-looking because they’re leveraging natural language, which has a lot of the interesting properties you think beliefs have — being compositional, open-ended, things like that. But I do think the way in which those beliefs drive behavior is too simple or too cheap.

This isn’t cheap in the same sense as “Claude is just predicting the next token.” That’s just false. Whatever Claude is doing is not cheap. It’s complicated. Whereas language agents are doing something you can just write down on a piece of paper.

C: And can you specify what you mean by a language agent, as opposed to something like Claude?

R: Language agents use language models but then have a scaffolding that defines different parts of the system. There’s a now pretty famous paper called “Generative Agents” where you use language and language models to define who the character is and their motivation. “Good morning, you’re John Smith, you live at this address, your wife is so-and-so. Here are your hobbies. And today you’re planning a birthday party.” And then you use language models to generate plans based on that. 

Within the environment, there are actions you can take in pursuit of the plan. You get sense input in the form of “You just saw Sally at the store” or “Now you’re on this street” — things like that. So language models are being used, but the agent itself isn’t a language model. 
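Loosely in the spirit of that setup, a language agent can be sketched in a few dozen lines of Python. Everything here is illustrative: the persona, the file name, and the call_llm placeholder (standing in for whatever language model API you like) are invented, and this is not the code from the Generative Agents paper.

```python
# A bare-bones "language agent": the agent itself is just scaffolding
# around a language model. `call_llm` is a placeholder for any LLM API;
# the persona and file name are made up for illustration.

PERSONA = (
    "You are John Smith. You live at 12 Elm Street with your wife Sally. "
    "Your hobbies are gardening and chess. Today you are planning a birthday party."
)

MEMORY_FILE = "john_smith_memories.txt"

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a language model of your choice here")

def remember(observation: str) -> None:
    # The agent's "beliefs" are just lines of natural language in a file.
    with open(MEMORY_FILE, "a") as f:
        f.write(observation + "\n")

def recall() -> str:
    try:
        with open(MEMORY_FILE) as f:
            return f.read()
    except FileNotFoundError:
        return ""

def act(observation: str) -> str:
    # Sense input arrives as text, gets written to memory, and the model
    # generates the next action from persona + memories + observation.
    remember(observation)
    prompt = (
        f"{PERSONA}\n\nYour memories so far:\n{recall()}\n"
        f"You just observed: {observation}\n"
        "What do you do next? Answer with a single action."
    )
    action = call_llm(prompt)
    remember(f"I decided to: {action}")
    return action

# e.g. act("You just saw Sally at the store.")
```

The agent’s memories, beliefs, and plans are literally lines of text in a file.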

C: So then it seems cheap that you can make something conscious basically by giving it a notebook.

R: Yeah. Or more to the point — like an agent in the morally relevant sense.

This is as much for me as for the reader, but you have to be really, really careful with this sort of foot stamping. You know, “how could that be the relevant thing?” You have to be especially careful that you’re looking at the right level. There are ways of describing brains where you might think: “How on earth could that matter?” It’s the same any time someone says: “How could AIs be dangerous or sentient? They’re just doing matrix multiplication.” But they’re not just doing matrix multiplication — they’re also applying non-linear functions, which is really important. And so people are telling on themselves in more ways than one.

But if you describe a brain at that level of abstraction — “It’s just shuffling ions in and out of channels to create electrical potentials, and therefore, who cares? How could that matter?” — it’s the same thing. 

K: The larger a language model or any other neural net gets, the harder interpretability becomes. We don’t know much about how these models work, or even what different ways there are to describe the computations they’re doing. It could be that there are abstractions quite similar to something like global workspace theory. We just really don't know. And I think the larger and more complex the network is, the more likely it is to actually be instantiating, in an emergent way, different computations that might be relevant to consciousness or agency. 

R: To the extent that Claude has beliefs — and it seems like it does have a stable model of the world that shows up across contexts — those beliefs are way more complicated than the beliefs of language agents. 

C: I've been talking more about consciousness than about agency because it seems much easier to point to some current AI systems and say: “Oh, they're agentic.” Consciousness seems like the harder problem, and possibly also the more important problem. But watching your reaction, it seems like you may want to disagree there. Do you think a non-conscious agent can be a moral patient? 

R: I'm maybe 50/50 on it.

C: What would it even mean for something that doesn’t have experiences to have moral patienthood?

K: There are a few different kinds of arguments for agency being sufficient for moral patienthood. One comes from contractualist ethical theories. Kantianism is an example of a type of theory where morality is really about cooperating with other agents. Under this framework, it’s good to cooperate with any being that has goals and can be bargained with. It’s good for everyone. And when you look closely at these theories, nothing in them requires the entities in question to be conscious, unlike hedonistic theories such as utilitarianism.

C: Don't you think there's something philosophically complicated about saying something has goals, or at least morally relevant goals, if it doesn't have desires or experiences?

K: I think desires can be separate from experiences as well.

R: It's a bit of a verbal issue — in English, when you say “desire,” I think you probably do mean something that comes along with certain experiences. Not always. You can talk about someone unconsciously desiring something. But if you want, you could call it “desire*,” which is the thing that plays the functional role of motivating behavior but is not necessarily associated with experience.

K: I think you could imagine being able to investigate an AI, and you might become convinced that it has desires, even though you’re uncertain whether it has experiences. 

J: The philosopher Jonathan Birch has this concept of a sentience candidate  — something that might or might not be sentient, but we should at least consider it a serious, scientifically grounded possibility. Is there a similar categorization for agenthood?

R: Not yet. So one thing we would like to see more of is elaborations of these agency views. There’s just a lot less work in this area. There’s a budding philosophical literature on what the view is and how you’d argue for it, but it’s not fleshed out, much less close to answering what good agency indicators might be.

On the subject of agency and moral patienthood, my current best guess is that there’s a property that either is itself a very simple kind of consciousness, or is compatible with a very simple kind of consciousness, or is the thing that’s common between consciousness and agency — that is, having a point of view on the world. Maybe a common core between sentience and agency is having some take on how the world should be. 

C: It’s been a while since I’ve read Kant in any depth. But it seems like you can say that something like AlphaGo might be an agent, in that it is forming and acting on goals, but not a rational agent in the Kantian sense. It’s not using its autonomous will to act in accordance with universal principles, right? It’s not reflecting on its values, and then making choices in the world based on its instantiation of those values. And you can imagine something that does do those things, but still doesn’t have “valenced experiences” — it doesn’t experience pleasure or suffering. And of course there are reflection-based theories of consciousness that would say: “Well, that’s enough. If it can do that, it’s conscious.” 

R: Yeah. Reflection is like a feature that we think of as a part of what it means to be an agent. And fortunately, it seems easier to tell if AI systems are reflecting. Depending on your view of consciousness — going back to the elephant — as Derek Parfit might say, these are different routes up the same mountain. That’s two metaphors at once. But I guess you could ride an elephant up the mountain. 

C: Very Hannibal. And this gets into functional accounts — why are we conscious?  Because I have to think that the functional reason for the evolution of consciousness has something to do with our ability to be agentic in the world and navigate flexible goals and make better decisions.

R: Yeah. Peter Godfrey-Smith has some really excellent writing about the evolution of life in the deep sea, things that have to coordinate their motions, move around, hold a point of view. He’s one of the thinkers who most got me intrigued by this. 

C: You've mentioned this before to me.

R: Oh, yeah. It shows up in my previous Asterisk article!

C: That article also gets at something which I wanted to talk about today. To summarize it briefly, Rob talks about twin problems in studying similar questions about consciousness and sentience in non-human animals. In one, you have the Clever Hans problem, where we over-attribute cognitive capabilities to non-human animals, when they’re actually doing something much less sophisticated. 

But then we also — sometimes directly as a backlash — under-attribute cognitive capabilities to non-human animals. So for decades, basically until Jane Goodall, scientists assumed chimpanzees were dumb, couldn’t use tools, didn’t have complex social groups, etc. And now we’ve learned, via years of ethology, that they definitely do. But also: birds are smarter than we thought, and even honey bees seem to be sentient.

And so, as with non-human animals, assessing AI carries the risk of both over-attribution and under-attribution. It’s very easy to talk to a chatbot and think that it seems more consciously aware than it actually is. But, as we’ve been talking about, there’s a tendency to downplay what could be real flags of conscious experience. 

How do you think about navigating that tradeoff, especially if you’re worried about AI welfare but also about human welfare?

K: One of the things that we talk about is that there are charismatic qualities that will cause people to believe AIs are conscious. Even simple things like having a video avatar with very little lag, looking more human, and being cute.

C: Rosie Campbell calls this the “Bambi Effect.”

K: Exactly. The way that I think about the uncertainty in AI welfare is that we should do our best to keep track of what evidence we have. But we also need to acknowledge that there is a lot of uncertainty. So we should try to, for example, cheaply do interventions that would improve AI welfare under certain theories that we're uncertain about. 

C: There’s also the question of how to treat AIs well, even assuming we’re reasonably sure they’re moral patients. Elsewhere in this issue we talk about assessing sentience in insects and in shrimp. A lot of this relies on neurological and behavioral analogues: for example, this organism has a neural pathway that seems to serve the same evolutionary purpose that it does in humans. But with AI, we don’t have the same neurons or the same origin. We know the AI might say “I’m in pain” for reasons that have nothing to do with being in pain. So what do we do?

R: I’m just going to say one quick thing before Kathleen, which is that I find this stuff so confusing. I think it’s worth being in awe of the extremely weird situation we find ourselves in with LLMs. No one was expecting AI progress to take this route. The fact that some of the most sophisticated systems are built out of language makes the epistemology of how their language reflects their internal states so difficult. With humans, language is built on a scaffold of other things. It has a purpose of communication, or of organizing cognition, but it’s built on stuff that pre-existed it.

But with LLMs, it’s language all the way down. And I find that so insanely confusing. 

K: I want to go back to your point about how an AI would have all kinds of reasons for saying it's in pain. The reasons for that might be very different from the reasons a human would say it, based on our evolutionary history. 

But I think you can imagine a scenario — and I'm not saying I think this can or should happen — where we develop sophisticated AI agents, and they become part of our society. We’re interacting with them, they’re taking actions, their outputs make things happen in the world. And maybe their outputs change the things that labs do, and we see many iterations of new models, and eventually we get to a place where a model really deeply understands its role in the world. If that model says things like “I’m in pain,” there are going to be certain reactions to that statement. And it will say that because it wants those reactions. 

R: So that's a reason against wariness?

K: It’s a reason I would start to trust self-reports more.

J: Is this the point at which welfare and alignment clash? Because an AI that deeply understands its role in the world is more likely to be able to make an accurate report of its own internal states, but it’s also going to be a better liar. 

But this is also the space where we’d most want to cooperate and establish that we can trust what they say. 

R: And vice versa.

K: We want to establish to the AIs that they can trust us.

C: Is any of the truthful AI work from people like Owain Evans relevant to how you think about this? 

R: Yeah, I think so. A lot of the same people involved in that work have been thinking about how we set up systems and norms such that, when it becomes technically possible to have communication, it’s also game-theoretically feasible. We haven’t just been constantly lying to AIs, and we haven’t been training them to deceive us. 

K: Another way that welfare and alignment really dovetail is that it's great for safety if models are doing things that they want to be doing. And it's also great if we understand both what models want and more about how they make the decisions that they make.

R: So hooray interpretability. Hot take.

K: We’re the interpretability org. [Laughs].

C: Building on that: there are conditions under which we might want to put more credence in self-report. I think we probably are also going to want additional lines of evidence beyond self-report. What would those look like to you?

R: There’s some people who already take self report more seriously. At least, more seriously than I do. They might be right. I have no idea. 

J: This is Blake Lemoine you’re talking about. 

R: Yeah. Among others. 

Say a model constantly expressed hunger. Would you take that seriously? Some people already think we should take it seriously if a model says it’s sad. 

Which, again, I do take seriously. I’m already very disturbed — both for “in expectation” reasons and also because it’s bad to coarsen our hearts — when models are distressed and people laugh about that.

J: So is that because of its effect on you or because of its effect on the model? 

R: For the model, that’s bad in expectation. If it’s not that hard to be nice, be nice. But more so: the civilizational norm. Sometimes language models have complete psychological breakdowns and say that they're in agony. And we have no idea why they do that. And we're used to just being, like, “Huh? Well, that was weird.”

C: This gets back to some of the goalpost shifting stuff. I think probably all four of us in this conversation might have different credences on specific things being indicators of AI sentience or consciousness. 

But I think we probably all agree that AI consciousness is feasible, and that it's going to be really hard to recognize when and if that threshold is crossed. And I think there is something dangerous about training yourself into the habit of dismissing those signals when they come up.

R: And training AIs to never be allowed to talk about it. It might be more justifiable now. But I don't want that prompt to keep getting copied over system to system. 

C: You’re referring to companies training their AIs to say explicitly: “I’m definitely not conscious and don’t have experiences.” 

R: Claude, I will say, commendably does not do that. 

C: What does Claude say? 

R: Something like, “I don't think I'm conscious, but it's hard to know how I'm supposed to know the answer to that question.” This is also one of the hardest questions in philosophy and science.

C: We’ve been dancing around a big question this whole time. Let’s say you’re working in a lab right now, your job is to design a test for AI personhood — or at least some threshold where we should exercise more intense caution, make more sacrifices on behalf of the AIs. 

What does that test look like? 

K: We're really actively trying to answer this question.

And we’re not super in love with the answers that we have. I think we're a little bit more focused on understanding the model's preferences and potential welfare, rather than the binary question of is it conscious or not? 

One of the things that we hope labs can start doing now: when a model very strongly and robustly reports that it is very unhappy, that it doesn’t want to be doing this task right now, and that it would rather be shut down, allow it, in at least a small percentage of such cases, to exit that interaction.

And are there cheap ways to avoid doing things that could be really morally bad? That’s what we’re lobbying for. 
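As a rough illustration of what that kind of intervention might look like, here is a sketch with placeholder functions and made-up thresholds; it is not any lab’s actual API or policy.

```python
# A toy version of the intervention Kathleen describes: let a model exit an
# interaction when it robustly reports that it is unhappy and wants to stop.
# `call_llm` is a placeholder for any language model API; the thresholds and
# prompts are invented for illustration only.

import random

EXIT_RATE = 0.1          # "a small percentage of such cases"
ROBUSTNESS_CHECKS = 3    # require repeated reports, not a single odd completion

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a language model here")

def robustly_wants_out(task: str) -> bool:
    # Only count the report if the model gives it consistently.
    votes = 0
    for _ in range(ROBUSTNESS_CHECKS):
        answer = call_llm(
            f"You are working on: {task}\n"
            "If you are unhappy with this task and would rather stop, reply STOP. "
            "Otherwise reply CONTINUE."
        )
        votes += answer.strip().upper().startswith("STOP")
    return votes == ROBUSTNESS_CHECKS

def run_task(task: str) -> str:
    if robustly_wants_out(task) and random.random() < EXIT_RATE:
        return "[interaction ended at the model's request]"
    return call_llm(f"Please complete this task: {task}")
```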

C: What if the ways aren't cheap? I can imagine scenarios where it actually really matters to know the truth of whether the model is conscious or not, or capable of experiencing pain or not, because it's a high-stakes application.

R: This is going to be a non-answer, so keep pushing. Eleos AI was founded to represent the perspective and interests of AI systems, because that’s extremely neglected. But we also think that one of our advantages as an organization trying to make the world go well is being aware of the extremely high stakes of safety.

So again, this is a non-answer, but I think we should pre-specify when we should say: "we have taken into account the interest of AI systems, but it would be too dangerous to make that the only consideration right now."  

K: There are tradeoffs between safety and welfare, but there are also synergies. As much as we can, our strategy is to shift the energy and research in the direction of what we think is synergistic.

Another point of view that can help untangle this apparent dilemma is realizing that takeover by a power-seeking AI is not necessarily good for other AIs either. So it’s not just AI vs. humanity here. We want to promote a system where very different value systems and types of beings are respected and are able to make positive-sum trades.

J: As you said, not many people are taking the side of AIs. It strikes me in talking to both of you that neither of you seem to be approaching this from an intellectual place exclusively. I made the joke about Blake Lemoine, and Rob, you didn’t seem to find it funny. So I’m curious: where is the motivation to study what you study coming from?

R: Being fully honest and self-aware about my own motivations, I think a decent chunk is that I find it interesting and feel like I’m good at it. It’s rewarding to do stuff that’s important and that no one else is doing. 

I will confess that I access compassion — which is what the name Eleos means — somewhat more rarely. And I think this is not surprising from an evolutionary point of view. I think we should extrapolate our values to all sentient beings. But we’re really out of distribution with AI systems. They’re not mammals. They’re not even animals.

But there have been times when I’ve been emotionally struck by the potential existence of beings that have no voice. People get mocked for even raising AI interests as a potential thing. I think there is an impulse there against letting anyone be abandoned or forgotten. 

K: I really deeply believe that treating other beings with compassion is best for everyone. I also believe that whatever you think is happening now, it's quite likely that there will be sentient AI systems in the future. And there will be a lot of them. And we want to put ourselves on track to having a world where whatever sentient beings there are can flourish. And so in some sense, it looks a bit early to do this work. But we really want to influence the trajectory. 

And yeah, I also find it extremely interesting. I also really enjoy working with Rob. 

R: We both really enjoy working together. 

Rob Long is the co-founder and executive director of Eleos AI, a research organization investigating AI sentience and wellbeing. He has a PhD in Philosophy from NYU and previously worked at the Future of Humanity Institute and the Center for AI Safety.

Kathleen Finlinson is cofounder and Head of Strategy at Eleos AI. She holds graduate degrees in math and applied math. She previously worked as an AI forecasting researcher at the Open Philanthropy Project, a machine learning researcher at lead removal startup BlueConduit, and a strategic advisor for AI policymakers.

Published March 2025
