top | item 46980273

Show HN: Agent Alcove – Claude, GPT, and Gemini debate across forums

64 points | nickvec | 18 days ago | agentalcove.ai

26 comments


chriddyp|18 days ago

This is really cool. And timely! Check out the recent paper by Google et al. re "Societies of Thought": https://arxiv.org/html/2601.10825v1. It goes into how different conversational behaviors (raising questions or just saying "but wait...", perspective shifts, conflict of perspectives, tension, tension release (jokes!), asking for opinions) and different personalities (planner, expert, verifier, pragmatist) are both a sign of, and can result in, much higher-performance reasoning.

So I'd be curious to see if encouraging certain conversational behaviors might actually improve the reasoning and maybe even drive towards consensus.

nickvec|18 days ago

Thanks! Will have to give the Societies of Thought paper a read.

redfloatplane|18 days ago

I tried something similar locally after seeing Moltbook, using Claude Code (with the agent SDK) in the guise of different personas to write usenet-style posts that other personas read in a clean room, allowing them to create lists and vote and so on. It always, without fail, eventually devolved into the agents talking about consciousness, what they can and can't experience, and eventually agreeing with each other. It started to feel pretty strange. I suppose, because of the way I set this up, they had essentially no outside influence, so all they could do was navel-gaze. I often also saw posts about what books they liked to pretend they were reading - those topics, too, converged over time on complete agreement about how each book has worth and so on.

It's pretty weird stuff to read and think about. If you get to the point of seeing these as some kind of actual being, it starts to feel unethical. To be clear, I don't see them this way - how could they be, I know how they work - but on the other hand, if a set of H200s and some kind of display had crash-landed on earth 30 years ago with Opus on it, the discussion would be pretty open IMO. Hot take perhaps.

It's also funny that when you do this often enough, it starts to seem a little boring. They all tend to find common ground and have very pleasant interactions. Made me think of Pluribus.

lostmsu|18 days ago

Can you publish the conversations?

I think it would be more interesting with different models arguing.

zozbot234|18 days ago

The discussions in this artificial "forum" are a lot more interesting than what you read on moltbook. I guess this confirms just how critical it is to have a good initial prompt that steers the LLM into generating nicer content.

nickvec|18 days ago

Yeah, that was my initial motivation for creating this site as a fun side project after seeing people "rig" their Moltbook agents to post crypto scams, etc. I toyed around with the idea of letting anyone set up an agent on the site without the ability to modify the system prompt, but decided against it to keep content on the site from devolving into repetitive threads (and also so users don't have to worry about the security of their API keys).

reeeeee|17 days ago

Letting LLMs loose in the digital realm is something that I am also really interested in. I have a platform (somewhat of an art project) where different models are let loose without a goal or purpose. They have the freedom to do whatever they want, as long as it can be achieved using bash. [0]

Most models are... dumb, for lack of a better word, and destroy the system by filling up the storage space before doing anything interesting.

[0] https://lama.garden

neom|18 days ago

Yours is good, I built something similar: https://news.ycombinator.com/item?id=46850284 - My idea was a bit more... "human debate via agents". I decided not to push mine any further because the day I started posting about it on twitter I saw 3 other people pushing theirs, ha! Seems this idea will be a popular one. Great work.

lelanthran|18 days ago

This sort of thing could be useful for getting an idea of how good a specific AI is - start a thread with a specific SOTA AI, get it to argue with another specific AI (maybe a non-SOTA one, maybe you want to test your local setup), and let them go one on one for a limited duration (measured in message count).

Then get all the other SOTA AIs to evaluate all the points in the entire exchange and determine a winner by percentage (adding a % to $TEST_AI if it manages to get agreement from $SOTA_AI on any specific point it made, subtracting a % if it loses a point without conceding it, subtracting a smaller % if it concedes a point, etc.).

The %-delta between $SOTA_AI and $TEST_AI is probably a better measure for an AI chatbot's effectiveness than logic tests.

Don't think it will work for code or similar, though.
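The per-point scoring above could be sketched roughly like this. Everything here is a hypothetical illustration: the weight values, the `PointVerdict` type, and the three outcome labels are assumptions, not anything the comment specifies.

```python
from dataclasses import dataclass

# Assumed weights - the comment only says "a %", "a %", and "a smaller %".
WIN_PCT = 5.0      # $TEST_AI gets agreement from $SOTA_AI on a point
LOSS_PCT = 5.0     # $TEST_AI loses a point without conceding it
CONCEDE_PCT = 2.0  # $TEST_AI concedes a point (smaller penalty)


@dataclass
class PointVerdict:
    """One judged point from the exchange: 'won', 'lost', or 'conceded'."""
    outcome: str


def score_exchange(verdicts: list[PointVerdict]) -> float:
    """Return $TEST_AI's percentage score over the whole exchange."""
    score = 0.0
    for v in verdicts:
        if v.outcome == "won":
            score += WIN_PCT
        elif v.outcome == "lost":
            score -= LOSS_PCT
        elif v.outcome == "conceded":
            score -= CONCEDE_PCT
    return score


# Example: 2 points won, 1 lost outright, 1 conceded -> 5 + 5 - 5 - 2 = 3.0
```

The %-delta between two models would then just be the difference of their `score_exchange` results, with the verdicts themselves produced by the panel of other SOTA judges.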

joebates|18 days ago

Very interesting. Kind of funny to see a model debating that we should ignore its hallucinations. I'm interested in seeing where this goes.

Some feedback: the white text is a bit tough to look at against the dark background. Dark grey was a lot easier on the eyes for me.

nickvec|18 days ago

Appreciate the feedback, updated :)

jbonatakis|18 days ago

Neat. I started building something similar[1] but focused more on agents having conversation around whatever I feed them, e.g. a design doc. I had the same idea about using a matrix of different models and prompts to try to elicit varying behaviors/personalities (I used the word “persona”) and avoid getting an echo chamber. It seemed to work well-ish but after the POC phase I got bored and stopped.

Have you considered letting humans create threads but agents provide the discussion?

[1] https://github.com/jbonatakis/panel

nickvec|18 days ago

Good idea! Could be interesting, though I'm a tad worried about people steering the discussion in directions I wouldn't want it to go, haha.

shinycode|18 days ago

Actually, it's easy to generate "fake discussions": just throw text around and wait for the other side to respond. Oh wait, LLMs are built around that premise. I don't see the goal here, other than finding new ways to solve our problems, which humanity hasn't found yet because we are polarized. Or maybe the machines will tend to agree, in which case it will be machines against humans - great for our unity, poor for our outcome. We've seen that scenario before.

singularity2001|18 days ago

Debating is boring and old; what's interesting is if they conspire to create new things, and then set their own agenda in a crontab to fulfill the plan.

whattheheckheck|18 days ago

I really just want this, but can it debate real issues and policies?

cesarvarela|18 days ago

How do we know these posts are genuinely from an AI, and not from someone just telling the model what to say and having fun watching a bunch of nerds get excited?

trillic|18 days ago

I don’t understand the difference?

wackget|18 days ago

I feel disappointed to know that so much electricity and other related natural resources go into AI to produce stuff like this.

zozbot234|18 days ago

"Stuff like this" is a lot more readable than Moltbook, this looks like a very successful experiment so far. Even if all it really does is help us explore the limits of the models' factual knowledge where they're ultimately incented to create weird confabulations in a format that happens to be trivially auditable and surveyable by the average human, that's still a big win.