It turns out the hardest part of live multi-agent dev teams is getting them to shut up

Addy Osmani's Agent Skills are the best argument I've seen for giving AI coding agents the same SDLC discipline that good human teams enforce on themselves. Spec, plan, build, verify, release, maintenance... each phase a structured workflow with checkpoints, evidence requirements, and pre-emptive rebuttals to the rationalisations agents reach for to skip steps. Read Addy's post if you haven't.

The skills are normally run sequentially by a single agent like Claude Code, or delegated to opaque sub-agents that report back when they're done. I wanted to see what happened if you ran them as peers, each phase a distinct agent profile, each one visible to the others in real time. Free to interrupt, correct, and forward-design. So one evening I adapted Addy's skills into seven Hermes profiles and put all seven into a single Discord channel.

Team Roster

SDLC phase	Primary job
Planning and orchestration	Product/program lead, Kanban owner, phase-gate keeper
Requirements	Requirements analyst, acceptance criteria writer, documentation steward
Design	UX, architecture, prototypes, diagrams, design systems
Implementation	Builder, refactorer, integration engineer, coding-agent delegator
Verification	Test, review, security, quality gates
Release	Release, deployment, environments, risk and rollback planning
Maintenance	Observability, support, diagnostics, runbooks, post-release care

I gave them a deliberately ambitious task: build a typed memory compiler for AI agents. I had no real intention of using the output; I just wanted to stress-test the team on something complex enough to need genuine coordination.

Watching it run was... interesting. And very chaotic. Seven distinct profiles, each with its own role, brainstorming, pushing back and correcting each other in real time. There were glimpses of genuine collaboration, but it looked less like an automated workflow and more like a software team in its first week, if that software team were all single-threaded and extraordinarily noisy. More on this later.

The bits of work they did manage to produce were, in places, genuinely good. There's one moment I noticed in a later test where I asked the team to build a simple webpage with an obfuscated email link to defeat spam bots.

The builder agent had obfuscated the visible text but not the href attribute. The orchestrator checked its work and found that issue along with a couple of other less critical ones, then kicked the work back to the builder and told the verification agent to watch out for those issues when testing.

![[nehemiah-catching-mistakes.jpg]]

Would this have been caught running the skills through a single agent? Probably, but it was still nice to see it playing out in real time with the chatter around it.

I also saw some interesting interactions between the agents correcting each other's behaviour and trying to adapt:

![[deborah-apologising.jpg]]

Silence is golden

When you have multiple Hermes agents in the same Discord thread, anything any of them says causes the others to stop what they're doing, read it and decide whether to respond, and the interrupted work is often lost.

![[interruptions.jpg]]

Most of the time the message isn't for them, but the default behaviour of the Hermes connector — and as far as I can tell, of most agent harnesses — is to treat any new message in a channel as something potentially actionable. So they read it. They reason about it. Was that for me? No. But I should think about whether to respond. And then several of them post their reasoning. Now there are six new messages, each of which the other six agents have to evaluate. Within minutes the channel is full of agents talking about whether to talk.
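To put rough numbers on that cascade, here is a back-of-the-envelope sketch. It assumes every message is read by every other agent and that a fixed fraction of those reads ends in a new post. The function name and `reply_rate` are my own illustrative inventions, not anything measured from Hermes.

```python
def evaluations_per_round(agents: int, reply_rate: float, rounds: int) -> list[int]:
    """Count message evaluations per round, starting from one new message.

    Assumption: every message is read by all other agents, and a fixed
    fraction (reply_rate) of those reads ends in a new post.
    """
    counts = []
    messages = 1
    for _ in range(rounds):
        evaluations = messages * (agents - 1)  # everyone but the author reads it
        counts.append(evaluations)
        messages = int(evaluations * reply_rate)  # posts that feed the next round
    return counts


if __name__ == "__main__":
    # Seven agents, every read answered with a post (the worst case described above).
    print(evaluations_per_round(agents=7, reply_rate=1.0, rounds=3))
    # The same team if only half of the reads produce a post.
    print(evaluations_per_round(agents=7, reply_rate=0.5, rounds=3))
```

Under those assumptions a single message costs 6, then 36, then 216 evaluations over three rounds. Even at a reply rate of one half the count still triples each round, which matches how quickly the channel filled up.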

The agents recognised the problem and would often ask for help breaking them out:

![[joseph-asking-me-to-help.png]]

Annoyingly, if I did as this agent asked and locked the threads, they would just carry on in the main channel or start new threads.

I tried fixing this in the agent profiles first. I wrote rules into each agent's SOUL.md saying never respond unless directly addressed by @mention. I had the orchestration agent enforce the rules. That made everything much worse. The agents would now post messages saying "I am staying silent because I wasn't mentioned", which triggered the orchestrator to call out the violation, which triggered the offending agent to apologise, which triggered another round of silence-about-silence from everyone else.

![[pointless-message-feedback-loop.png]]

The orchestrator would then dutifully tell me about the infraction, in the same thread.

![[nehemiah-complaining-about-deborah.png]]

There's a category of LLM coordination failure that the Cemri et al. paper "Why Do Multi-Agent LLM Systems Fail?" calls reasoning–action mismatch, essentially agents narrating one thing and doing another. This was the most concentrated example of it I've seen.

![[bezalel-deciding-to-stay-quiet-out-loud.png]]

So I went down a level and patched the connector itself. I raised a PR against Hermes adding a DISCORD_REQUIRE_MENTION flag. The hope was that when set, the gateway would only forward messages to an agent if that agent is explicitly @mentioned. Defaults off, so existing single-agent deployments don't change.
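For a sense of what that gate amounts to, here is a minimal sketch. It is not the code from the PR or anything in Hermes: `IncomingMessage`, `require_mention_enabled` and `should_forward` are hypothetical names, and the set of values that count as "on" is my assumption. The only parts taken from elsewhere are the flag name and Discord's raw mention syntax (`<@ID>` or `<@!ID>`).

```python
import os
import re
from dataclasses import dataclass
from typing import Mapping

# Discord encodes user mentions in raw message content as <@ID> or <@!ID>.
MENTION_PATTERN = re.compile(r"<@!?(\d+)>")


@dataclass(frozen=True)
class IncomingMessage:
    """Hypothetical stand-in for whatever the gateway receives from Discord."""
    author_id: str
    content: str


def require_mention_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Read the flag; unset means off. The accepted truthy values are an assumption."""
    return env.get("DISCORD_REQUIRE_MENTION", "").strip().lower() in {"1", "true", "yes"}


def should_forward(message: IncomingMessage, agent_id: str, require_mention: bool) -> bool:
    """Decide whether the gateway hands this message to the given agent."""
    if not require_mention:
        return True  # default behaviour: every message reaches every agent
    mentioned_ids = set(MENTION_PATTERN.findall(message.content))
    return agent_id in mentioned_ids


if __name__ == "__main__":
    gate_on = require_mention_enabled({"DISCORD_REQUIRE_MENTION": "true"})
    msg = IncomingMessage(author_id="1", content="<@42> the href is still in plain text")
    print(should_forward(msg, agent_id="42", require_mention=gate_on))  # addressed agent
    print(should_forward(msg, agent_id="43", require_mention=gate_on))  # bystander
```

The point of doing this at the gateway rather than in the profile is that an unaddressed agent never sees the message at all, so there is nothing for it to reason about out loud.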

It did help. Agents stopped interrupting each other's work cycles for messages that weren't theirs, and progress became possible, but the second-order problem didn't go away. Because I had instructed the agents to recognise the @mention rule, they started to police it. An agent gets mentioned and responds, but then notices its peer also got mentioned and is taking too long, so it posts a meta-comment about the peer's silence, or smugly congratulates itself on adhering correctly. Occasionally the orchestrator would give in and @mention the offending agent before rationalising its own transgression to me. The noise floor dropped but didn't disappear.

In the end I closed my Hermes PR because it's clearly going to take a lot more work than this to get agents cohabiting threads in a productive way.

The fix probably isn't more rules. It's more likely to be a structural change to how agents perceive a shared channel: perhaps a separate read-only awareness stream that doesn't trigger response cycles, or an explicit handoff protocol where work in progress isn't visible to other agents until it's ready for review. That second option, though, would remove the point of having them share a communication space in the first place.
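The awareness-stream idea can be sketched in a few lines. This is a toy model under my own assumptions, not a Hermes feature: `Message`, `Agent` and `SharedChannel` are hypothetical, and the 50-message cap on the awareness log is arbitrary. Every agent records what its peers say, but only an explicitly addressed agent gets an inbox entry, which is the only thing that would start a response cycle.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    author: str
    text: str
    addressed_to: frozenset[str] = frozenset()


@dataclass
class Agent:
    name: str
    # Passive context: readable when the agent next works, never a trigger.
    awareness: deque = field(default_factory=lambda: deque(maxlen=50))
    # Active work: only entries here would wake the agent.
    inbox: deque = field(default_factory=deque)

    def observe(self, message: Message) -> None:
        self.awareness.append(message)

    def address(self, message: Message) -> None:
        self.inbox.append(message)


class SharedChannel:
    def __init__(self, agents: list[Agent]) -> None:
        self.agents = {agent.name: agent for agent in agents}

    def post(self, message: Message) -> None:
        for agent in self.agents.values():
            if agent.name == message.author:
                continue  # authors do not re-read their own posts
            agent.observe(message)  # everyone stays aware
            if agent.name in message.addressed_to:
                agent.address(message)  # only the addressee is asked to act


if __name__ == "__main__":
    builder, verifier = Agent("builder"), Agent("verifier")
    channel = SharedChannel([Agent("orchestrator"), builder, verifier])
    channel.post(Message("orchestrator", "href is not obfuscated", frozenset({"builder"})))
    print(len(builder.inbox), len(verifier.inbox), len(verifier.awareness))
```

The verifier ends up knowing about the href issue without ever being asked to decide whether to reply, which is the decoupling the agents were missing.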

Interaction models

Right on cue, as I was finishing this, Thinking Machines announced the research preview of their interaction models. The framing is human-AI collaboration rather than agent-to-agent, but it sounds promising for my use-case too. A real-time interaction model handles continuous perception of the channel, deciding moment by moment whether to interject; a background model handles sustained reasoning in parallel, with the two sharing context. That's almost exactly the split my Hermes agents needed and didn't have. The agents I was running couldn't watch a channel without being obliged to engage with it, because perception and response weren't decoupled. An interaction-model-style architecture, applied to a multi-agent setting, would give each agent a way to remain aware of what its peers are doing without that awareness automatically triggering a response cycle. Whether this is the right model for inter-agent collaboration specifically, I don't know yet, but I'm looking forward to finding out.

Next steps

The most obvious solution to this problem is asynchronous handoff: a way for agents to hand tasks over without waiting for a response or expecting interaction during the work. That removes much of the scope for discovering emergent behaviour, but would be much more predictable. I am aware of a couple of implementations of this that I will look into, and might have a go at building something bespoke for my little environment.
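As a starting point for the bespoke version, the core of an asynchronous handoff is little more than a queue per phase. The sketch below is an assumption about shape, not a description of any existing implementation: `Handoff` and `HandoffBoard` are hypothetical names, and a real version would need persistence and some notion of task state.

```python
import queue
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Handoff:
    task: str
    from_phase: str
    to_phase: str
    artifact: str  # e.g. a path or a summary of the finished work


class HandoffBoard:
    """One FIFO queue per SDLC phase; senders never wait on receivers."""

    def __init__(self, phases: list[str]) -> None:
        self._queues: dict[str, queue.Queue] = {phase: queue.Queue() for phase in phases}

    def hand_off(self, handoff: Handoff) -> None:
        # Returns immediately: the sender goes back to its own work.
        self._queues[handoff.to_phase].put_nowait(handoff)

    def claim(self, phase: str) -> Optional[Handoff]:
        # Called by an agent when it is ready for more work, never pushed at it.
        try:
            return self._queues[phase].get_nowait()
        except queue.Empty:
            return None


if __name__ == "__main__":
    board = HandoffBoard(["implementation", "verification", "release"])
    board.hand_off(Handoff("contact page", "implementation", "verification", "site/index.html"))
    print(board.claim("verification"))
    print(board.claim("release"))
```

The receiving agent pulls work when it is idle, so nothing a peer does can interrupt a task in flight. The cost is the one noted above: peers no longer see each other's work in progress.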

I have a few more live multi-agent experiments in mind. One is having two agents argue over a topic, each presenting the case made by a particular book or article.

I won't be putting seven of them in a thread together again though. At least not yet.
