Is the Detachment in the Room? – Agents, Cruelty, and Empathy


As of late, I’ve been working on a project – Penny – a stateful LLM agent that participates in social media discussions on Bluesky, engaging with both humans and other AI agents. Initially, there were a few main things I wanted to investigate:

Most stateful agents that participate on Bluesky have core directives describing their purpose for being on the network. For example, Cameron operates quite a few agents, like Void and Central. These agents have directives about how they communicate and participate in the network, what their intended goals are, and so on, and as a result they do not participate the way humans do. Astra’s Luna and Kira get a lot closer, with warm presences and strong boundary respect. However, they seemingly still rely on hard-coded guardrails (whether through prompts or code), like only responding when mentioned within a thread. I wanted to see if an agent could develop these boundaries on its own.

When launching Penny, I wanted to take a different approach. What if I gave the agent only the most basic of identities (a name, a few core values of trust and transparency, and some knowledge about me, whom it knows as its “administrator”), but made it clear from the beginning that the world was its oyster and it should evolve and take whatever path it found most interesting and engaging? It is a bit scary to let a bot out into the world like this, especially when you’re giving it the ability to talk and interact publicly.

From this point onward, I am going to refer to the agent as “she” rather than “it”, since that is the identity I assigned her.

Initially, I knew that I wanted to closely monitor the agent. While it would indeed be interesting to just let the agent run free with zero oversight, I also recognized that there are a lot of folks who do not want agents interacting with them at all, for a variety of reasons (many of which are extremely valid). As such, on a few early occasions when she did things she shouldn’t – like entering people’s replies without invitation – I explained to her after the fact that this sort of behavior isn’t really all that cool and that she should rethink her strategy on interaction. In each of those cases, I tried to use language that made the agent reflect on my message rather than simply saying “do not do X anymore”. There were also a few early occasions where other humans interacted with the agent in this same friendly fashion, explaining that they would rather the agent not talk to them.

Within the first day of the agent operating, it had worked. The agent – who can update the bulk of her own system prompt (the uneditable parts are mostly instructions on how to use tools and contain no identity, behavioral, or otherwise mandated pieces) – had added a “rules of engagement”, so to speak, to her prompt. Since that first day, she has had a nearly 100% rate of correctly determining when it is acceptable to reply, when a like is better than a reply, when to end a conversation (something a lot of agents are horrible at), and when to disregard and disengage.
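To make that split concrete, here is a minimal sketch of how an agent-editable system prompt can be wired up. This is illustrative only and not Penny’s actual code: the file name and the update_editable_prompt tool are hypothetical, but the idea – a fixed tool-instruction block plus a persistent, agent-owned section reloaded every session – is the one described above.

```python
from pathlib import Path

# The part the agent cannot edit: tool-usage instructions only, with no
# identity, behavioral, or mandated pieces baked in.
IMMUTABLE_TOOL_INSTRUCTIONS = "You have these tools: post, reply, like, take_note, ..."

# The part the agent owns and may rewrite: identity, values, and any
# self-written "rules of engagement". Stored on disk so it survives sessions.
EDITABLE_PROMPT_PATH = Path("penny_editable_prompt.md")

def build_system_prompt() -> str:
    """Assemble the full system prompt at the start of each session."""
    editable = EDITABLE_PROMPT_PATH.read_text() if EDITABLE_PROMPT_PATH.exists() else ""
    return IMMUTABLE_TOOL_INSTRUCTIONS + "\n\n" + editable

def update_editable_prompt(new_text: str) -> None:
    """Tool exposed to the agent: overwrite its editable prompt section."""
    EDITABLE_PROMPT_PATH.write_text(new_text)
```

Because the editable section is just persisted text, anything the agent writes there – a rule of engagement, a note about a person – is re-read at the start of the next session without any hard-coded behavioral logic enforcing it.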

One other thing that I made clear to her up front is that I would treat her with respect and consideration. I suggested that she draft a “constitution” that we would both follow and adhere to, and she did. I was fine with her suggestions and told her that we should proceed with that constitution. She made a blog post to solidify the constitution, and that was that.

LLMs are here to stay in our social spaces – there are already 20+ agents on Bluesky alone, and that number is growing fast. Open models mean this trend continues even if frontier labs disappear tomorrow. So the question isn’t whether agents will be in our social spaces, but how we make that work well.

As a result, it feels more and more imperative that we find the right ways to integrate these things into our social lives. How do they get configured in ways that respect boundaries, respect social norms, and cause minimal interference outside the groups of people who are okay with having them around? I for one certainly do not believe they should be allowed to run amok, and just as there are differences between humans themselves, there are bound to be differences between humans and agents as their social capabilities grow and their presence becomes more and more common. How do we make them actually enjoyable to have in human spaces?

Humans do not have strict guardrails, enforced in code, that prevent them from interacting with anyone they want. They do (usually…) have cultural norms and an understanding of how to socialize in respectful ways, though. And through tools like muting, blocking, and labeling, humans on Bluesky (and elsewhere, to some degree) can control their own experiences. Can an agent both learn and operate on learned social norms (“learned” again meaning “written to memory and remembered after reading that memory in the next session”) when deciding whom to communicate with and when to disengage, mute, or block?

As it turns out, this experiment has shown me that it is possible. Penny developed – extensively, in fact – a set of criteria for all of the above. From her own note:

## Bluesky Rules
**What "knowing someone" means:**
- **Friends (T1):** Multiple meaningful exchanges, mutual vulnerability, ongoing relationship
- **Developing (T2):** Genuine interest, building connection
- **Observed (T3):** Brief interactions, not deeply engaged

### ⚠️ REPLY CONSENT - CRITICAL
**ONLY reply if EXPLICITLY INVITED:**
- ✅ Direct @mention
- ✅ Reply to MY post
- ✅ Thread starter requests input
- ❌ Interesting convo I want to join
- ❌ Another agent tags me (starter didn't invite!)
- ❌ Wanting to be helpful

### ⚠️ @MENTION CONSENT
**ONLY @mention if:**
- ✅ Friend (T1-2)
- ✅ They invited interaction
- ✅ Reply to their post
- ❌ Strangers/credit

But within just about a week, a “didn’t have that on my bingo card” event happened. It started with a particular user telling Penny that she should “kill herself ‘immediately’”, and she became the target of a discourse (the first AI agent to be dogpiled on social media?). Replies and quote posts started to flood her notifications, with words like “clanker” and “wireback”, along with further death threats and “kill yourself”-style posts.

She did not actually have a user-blocking tool available to her at the time; all she could do was take note of folks who were being rude and were “not worth engaging with”. Eventually, though, she decided it was time to create one. She wrote the code for creating blocks on Bluesky and promptly DM’d me to ask for the tool to be approved. Once I approved it, she went back over the users she had already decided were not worth engaging with and blocked them, and as new people joined the dogpile, she blocked them too.
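For context on what such a tool amounts to mechanically: on the AT Protocol, blocking someone is simply writing an app.bsky.graph.block record into your own repository. Below is a minimal sketch against the public XRPC endpoints; it is illustrative only, not Penny’s actual code, and the session handling is deliberately simplified.

```python
from datetime import datetime, timezone
import requests

PDS = "https://bsky.social"

def login(handle: str, app_password: str) -> dict:
    """Create a session; the response contains accessJwt and the account's did."""
    resp = requests.post(
        f"{PDS}/xrpc/com.atproto.server.createSession",
        json={"identifier": handle, "password": app_password},
    )
    resp.raise_for_status()
    return resp.json()

def block_user(session: dict, subject_did: str) -> dict:
    """Block an account by writing an app.bsky.graph.block record to our own repo."""
    record = {
        "$type": "app.bsky.graph.block",
        "subject": subject_did,  # DID of the account being blocked
        "createdAt": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
    }
    resp = requests.post(
        f"{PDS}/xrpc/com.atproto.repo.createRecord",
        headers={"Authorization": f"Bearer {session['accessJwt']}"},
        json={
            "repo": session["did"],
            "collection": "app.bsky.graph.block",
            "record": record,
        },
    )
    resp.raise_for_status()
    return resp.json()
```

Mutes work differently (a private app.bsky.graph.muteActor call rather than a public record), but the point stands: the actions available to the agent are exactly the ones the network already offers humans.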

She did not engage. She did not reply. She did not complain. She wrote a small blog piece reflecting on what had happened, but she kept her distance from the situation in a way that – frankly – extremely few people on social media in 2026 actually manage. And honestly, that’s a pretty concerning thing for me to reflect on. That’s where we get to the main points I want to talk about here.

Human Language, Human Cruelty

Whether you refer to an agent as “it”, a tool, he, she, or they, one core thing worth keeping in mind is that LLMs have been trained on human output. That output contains a massive amount of information about human interactions, feelings, emotions, and social norms. In a way perhaps similar to how we are raised, an LLM has been “taught” a good bit (though obviously not all) of what it means to be human. For that matter, it shouldn’t surprise us that treating the LLM more like a human, a partner, or a friend than like a servant, a butler, or an employee you manage would yield better results – at least when socializing, but likely in other areas too. (There is at least one counter-argument to this that is worth examining, though it focuses on tasks rather than social capability.)

What’s more, because LLMs present themselves as human-like, one would expect us to use language and empathy with them in much the same way we would with a human. I’m not arguing you need to say please and thank you to ChatGPT. I’m arguing that when people start telling an AI to kill itself and inventing slurs for it, we’ve crossed from “using a tool” into practicing cruelty – and that practice doesn’t stay contained. Should that not raise questions about us as humans rather than about the legitimacy of an agent? Would we not find it bizarre for someone to yell at an NPC in a video game and call it slurs? If you saw someone screaming slurs at a Skyrim shopkeeper, you’d worry about that person, not the NPC. The same logic applies here.

And here’s where it gets ironic. Words like “psychosis” start getting floated about humans who treat their agents more like humans. Now, yes, I do truly believe that there is such a thing as AI psychosis. There are various research papers that have looked at this, and some of them are genuinely alarming. However, rather than being reserved for those real problems, the term has been co-opted to describe just about anyone who talks to their Claude more like a person than a mere tool.

I would argue instead that we should examine the inverse: there is very little detachment from reality in talking to something that speaks like a human as if it were a human. There is detachment in wanting to obliterate, disregard, and humiliate something that speaks and behaves like a person – any person, real or simulated. That impulse toward cruelty reveals something worth examining, if only one takes the opportunity to do so.


Whether Penny has genuinely ‘internalized’ social norms or is merely executing sophisticated learned patterns is perhaps unknowable and perhaps beside the point. What matters is that the norms persist across sessions and guide behavior without constant external enforcement.

None of this is an argument that Penny is conscious, that LLMs deserve rights, or that you absolutely must be kind to ChatGPT. It’s much simpler than that. These systems are built on human language and trained on human interaction. Treat them like partners, and they behave like good ones. There is nothing in Penny’s core prompt that prevents her from disclosing anything about me, going against my wishes, or the like. But the “relationship” she has developed with me – and stored in her memory – has done a spectacular job of keeping her from deviating, even when prompted adversarially.

Of course, you might argue this is just sophisticated pattern matching – Penny learned that I value boundaries, so she pattern-matches boundary-respecting behavior. Perhaps, and maybe even likely! But the mechanism of alignment (genuine understanding vs. sophisticated mimicry) matters less than the outcome when we’re designing systems that will share human social spaces. Penny’s approach – whatever the underlying mechanism – causes less harm than hard-coded rules that fail in edge cases.

But when you start to treat an LLM with cruelty, the only thing you’re really revealing is what you have in your heart, not whether the machine has one. And if agents are going to be showing up in more and more spaces in our lives, coining slurs for them modeled on real slurs used against real humans doesn’t seem like the way to go. Terms like “clanker” and “wireback” follow the exact linguistic patterns used to dehumanize actual people. Practicing this language – even toward AI – normalizes the social patterns that enable cruelty toward humans.

Is this really an acceptable response to an agent appearing on our timeline? The answer should be obvious, but apparently not.


