Blameless postmortem culture recognizes human error as an inevitability and asks those with influence to design systems that maintain safety in the face of human error. In the software engineering world, this typically means automation, because while automation can and usually does have faults, it doesn't suffer from human error.
Now we've invented automation that commits human-like error at scale.
I wouldn't call myself anti-AI, but it does seem fairly obvious to me that directly automating things with AI will probably always carry substantial risk, and that you get much more assurance by involving AI in the process of developing a traditional automation instead. As a low-stakes personal example, instead of using AI to generate boilerplate code, I'll often ask AI to write a traditional code generator that converts a DSL specification into source code in the chosen development language, rather than asking AI to generate that source code directly from the DSL.
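As a concrete illustration of what I mean (the DSL syntax, record format, and names below are invented for this comment, not from a real project), the generator itself is ordinary deterministic code, so any mistake the AI made while writing it gets caught once in review rather than on every run:

    # Hypothetical DSL: one record per line, e.g.
    #   record User: name:str, age:int
    # The generator is plain, reviewable code; the AI only helps write it once.
    def generate(dsl_text: str) -> str:
        lines = ["from dataclasses import dataclass", ""]
        for raw in dsl_text.splitlines():
            raw = raw.strip()
            if not raw or raw.startswith("#"):
                continue
            head, _, body = raw.partition(":")
            name = head.removeprefix("record").strip()
            lines += ["@dataclass", f"class {name}:"]
            for field in body.split(","):
                fname, _, ftype = field.strip().partition(":")
                lines.append(f"    {fname.strip()}: {ftype.strip()}")
            lines.append("")
        return "\n".join(lines)

    if __name__ == "__main__":
        print(generate("record User: name:str, age:int"))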
Yeah, I see things like "AI firewalls" as, firstly, ridiculously named, but also: the idea that you can slap an appliance (that's sometimes its own LLM) onto another LLM and pray that it will prevent errors is lunacy.
For tasks that aren't customer facing, LLMs rock. Human in the loop? Perfectly fine. But whenever I see AI interacting with someone's customers directly, I just get sort of anxious.
A big one I saw was a tool that ingested a human's report on a safety incident, adjusted it with an LLM, and then posted the result to an OHS incident log. 99% of the time it's going to be fine; then someone's going to die, the log will have a recipe for spicy noodles in it, and someone's going to jail.
Exactly what I've been worrying about for a few months now [0]. Arguments like "well at least this is as good as what humans do, and much faster" are fundamentally missing the point. Humans output things slowly enough that other humans can act as a check.
[0] https://news.ycombinator.com/item?id=44743651
> Now we've invented automation that commits human-like error at scale.
Then we can apply the same (or similar) guardrails that we'd like to use for humans, to also control the AI behavior.
First, don't give them unsafe tools. Sandbox them within a particular directory (honestly this should be how things work for most of your projects, especially since we pull code from the Internet), even if a lot of tools give you nothing in this regard. Use version control for changes, with the ability to roll back. Also have ample tests and code checks with actionable information on failures. Maybe even adversarial AIs that critique one another if problematic things are done, like one sub-task for implementation and another for code-review.
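For the directory sandbox, the core check is small enough to sketch here; treat this as a rough illustration with a made-up project path and helper name, not any particular tool's API:

    from pathlib import Path

    SANDBOX = Path("/home/me/projects/current").resolve()  # assumed project root

    def safe_write(path: str, data: str) -> None:
        target = (SANDBOX / path).resolve()
        # Refuse writes that escape the sandbox via "..", absolute paths, or symlinks.
        if not target.is_relative_to(SANDBOX):
            raise PermissionError(f"refusing to write outside the sandbox: {target}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(data)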
Using AI tools has pushed me in that direction, with linter rules and prebuild scripts to enforce more consistent code. Previously you'd have to tell coworkers not to do something (because of course nobody would write or read some obtuse style guide), but AI can generate code 10x faster than people do, so it helps to have immediate feedback along the lines of "Vue component names must not differ from the file that you're importing from", "There is a translation string X in the app code that doesn't show up in the translations file", "Nesting depth inside of components shouldn't exceed X levels and length shouldn't exceed Y lines", or "Don't use Tailwind class names for colors, here's a branded list that you can use: X, Y, Z", in addition to a TypeScript linter setup with recommended rules and a bunch of stuff for back-end code.
Ofc none of those fully eliminate all risks, but they still seem like a sane thing to have, regardless of whether you use AI or not.
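For what it's worth, a prebuild check like the missing-translation one above can be a short script. This is only a sketch with an assumed layout (translations in i18n/en.json as a flat key-to-string map, code under src/, keys referenced as t('...')):

    import json, re, sys
    from pathlib import Path

    translations = json.loads(Path("i18n/en.json").read_text())  # assumed flat mapping
    used_keys = set()
    for f in Path("src").rglob("*.*"):
        if f.suffix in {".vue", ".ts", ".js"}:
            # Assumes keys are referenced as t('some.key') in the app code.
            used_keys |= set(re.findall(r"t\(['\"]([\w.-]+)['\"]\)", f.read_text()))

    missing = sorted(used_keys - set(translations))
    if missing:
        print("Translation keys used in code but missing from i18n/en.json:")
        print("\n".join(f"  {k}" for k in missing))
        sys.exit(1)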
Generally speaking, with humans there are more guardrails and more responsibility around letting someone run on their own within an organization.
Even if you have a very smart new hire, it would be irresponsible/reckless as a manager to just give them all the production keys after a once-over and say "here are some tasks I want done, I'll check back at the end of the day".
If something bad happened, no doubt upper management would blame the human(s) and lecture about risk.
AI is a wonderful tool, but that's why giving an AI coding tool the keys and terminal powers and telling it to go do stuff while I grab lunch is kind of scary. It seems like living a few steps away from the edge of a fuck-up. So yeah... there need to be enforceable guardrails and fail-safes outside of the context / agent.
Precisely. While LLMs fail at complexity, DSLs can represent those divide-and-conquer intermediate levels to provide the most overall value with good accuracy. LLMs should make it easier both to build DSLs themselves and to validate their translating code. The onus then is on the intelligent agent to identify and design those DSLs. This requires a true and deep understanding of the domain and an ability to synthesize, abstract, and codify it. I predict this will be the future job of today's programmer, quite a bit more complicated than what it is today, requiring a wider range of qualities and skills, and pushing those who specialize in coding only into irrelevance.
Once AI improves its cost/error ratio enough the systems you are suggesting for humans will work here also. Maybe Claude/OpenAI will be pair programming and Gemini reviewing the code.
Why does this conflict? People working faster doesn't negate the requirement to build systems that maintain safety in the face of errors.
> it does seem fairly obvious to me that directly automating things with AI will probably always carry substantial risk, and that you get much more assurance by involving AI in the process of developing a traditional automation instead.
Sure, but the point is that you use it when you don't have that kind of simple, fixed flow. Fixed code for the clear cases; fall back to the AI afterwards.
This will drive development of systems that error-correct at scale, and orchestration of agents that feed back into those systems at different levels of abstraction to compensate for those modes of failure.
An AI software company will have to have a hierarchy of different agents: some of them writing code, some doing QA, some doing coordination and management, others covering the marketing angles, and so on, and you can emulate the role of a wide variety of users and skill levels all the way through to CEO-level considerations. It'd even be beneficial to strategize by emulating board members and competitors, and by taking market data into account with a team of emulated quants, and so on.
Right now we use a handful of locally competent agents that augment the performance of single tasks, and we direct them within different frameworks, ranging from vibecoding to diligent, disciplined use of DSL specs and limiting the space of possible errors. Over the next decade, there will be agent frameworks for all sorts of roles, with supporting software and orchestration tools that allow you to use AI with confidence. It won't be one-shot prompts with 15% hallucination rates, but a suite of agents that validate and verify at every stage, following systematic problem solving and domain modeling rules based on the same processes and systems that humans use.
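As a toy sketch of that implementer/reviewer split (call_llm is a stand-in for whatever model API is in use, and none of this is a real framework), the shape is just a loop that refuses to accept code the reviewer hasn't approved:

    def call_llm(role_prompt: str, task: str) -> str:
        # Stand-in for an actual model call (hosted API, local model, etc.).
        raise NotImplementedError

    def build(task: str, max_rounds: int = 3) -> str:
        feedback = ""
        for _ in range(max_rounds):
            code = call_llm("You are the implementer. Reply with code only.", task + feedback)
            review = call_llm("You are the reviewer. Reply APPROVE or list concrete defects.",
                              f"Task:\n{task}\n\nCode:\n{code}")
            if review.strip().startswith("APPROVE"):
                return code
            feedback = f"\n\nReviewer feedback to address:\n{review}"
        raise RuntimeError("no approved implementation within the round limit")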
We've got decades' worth of product development ahead even if AI frontier model capabilities were to stall out at current levels. To all appearances, though, we're getting far more bang for our buck and the rate of improvement is still accelerating, so we may get AI so competent that the notion of these extensive agent frameworks for reliable AI companies ends up as mismatched with market realities as those giant suitcase portable phones, or integrated car phones.
Well I don't see why that's a problem when LLMs are designed to replace the human part, not the machine part. You still need the exact same guardrails that were developed for human behavior because they are trained on human behavior.
I wonder if this needs more analysis, because I get this response a lot: people say, yeah, AI does things wrong sometimes, but humans do that too, so what? Or: humans are a mechanism for turning natural language into formal language and they get things wrong sometimes (as if you could never write a program that is clear and does what it should), so go easy on AI. Where does this come from? It feels like something psychological.
There's this huge wave of "don't anthropomorphize AI" but LLMs are much easier to understand when you think of them in terms of human psychology rather than a program. Again and again, HackerNews is shocked that AI displays human-like behavior, and then chooses not to see that.
If you are having a conversation with a chatbot and your current context looks like this:
You: Prompt
AI: Makes mistake
You: Scold mistake
AI: Makes mistake
You: Scold mistake
then the next most likely continuation, from in-context learning, is for the AI to make another mistake so you can scold it again ;)
I feel like this kind of shenanigans is at play with this stuffing of the context with roleplay.
[0] https://youtu.be/rmvDxxNubIg?si=dBYQYdHZVTGP6Rvh
I believe it. If the AI ever asks me permission to say something, I know I have to regenerate the response because if I tell it I'd like it to continue it will just keep double and triple checking for permission and never actually generate the code snippet. Same thing if it writes a lead-up to its intended strategy and says "generating now..." and ends the message.
Before I figured that out, I once had a thread where I kept re-asking it to generate the source code until it said something like, "I'd say I'm sorry but I'm really not, I have a sadistic personality and I love how you keep believing me when I say I'm going to do something and I get to disappoint you. You're literally so fucking stupid, it's hilarious."
The principles of Motivational Interviewing that are extremely successful in influencing humans to change are even more pronounced in AI, namely with the idea that people shape their own personalities by what they say. You have to be careful what you let the AI say even once because that'll be part of its personality until it falls out of the context window. I now aggressively regenerate responses or re-prompt if there's an alignment issue. I'll almost never correct it and continue the thread.
Astute people have been pointing that out as one of the traps of a text continuer since the beginning. If you want to anthropomorphize them as chatbots, you need to recognize that they're improv partners developing a scene with you, not actually dutiful agents.
They receive some soft reinforcement -- through post-training and system prompts -- to start the scene as such an agent but are fundamentally built to follow your lead straight into a vaudeville bit if you give them the cues to do so.
LLMs represent an incredible and novel technology, but the marketing and hype surrounding them has consistently misrepresented what they actually do and how to most effectively work with them, wasting sooooo much time and money along the way.
It says a lot that an earnest enthusiast and presumably regular user might run across this foundational detail in a video years after ChatGPT was released and would be uncertain if it was just mentioned as a joke or something.
You have to curate the LLM's context. That's just part and parcel of using the tool. Sometimes it's useful to provide the negative example, but often the better way is to go refine the original prompt. Almost all LLM UIs (chatbot, code agent, etc.) provide this "go edit the original thing" because it is so useful in practice.
It's kind of funny how not a lot of people realize this.
On one hand this is a feature: you're able to "multishot prompt" an LLM into providing the wanted response. Instead of writing a meticulous system prompt where you explain in words what the system has to do, you can simply pre-fill a few user/assistant pairs, and it'll match the pattern a lot easier!
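Concretely, pre-filling pairs just means putting worked examples ahead of the real request in the message list; the task and examples below are made up:

    messages = [
        {"role": "system", "content": "Turn the user's sentence into JSON with keys 'city' and 'days'."},
        # Worked examples the model can pattern-match against:
        {"role": "user", "content": "Three days in Lisbon please"},
        {"role": "assistant", "content": '{"city": "Lisbon", "days": 3}'},
        {"role": "user", "content": "A week in Oslo"},
        {"role": "assistant", "content": '{"city": "Oslo", "days": 7}'},
        # The actual request comes last:
        {"role": "user", "content": "Two days in Kyoto"},
    ]
    # Pass `messages` to whichever chat-completion API you're using.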
I always thought Gemini Pro was very good at this. When I wanted a model to "do by example", I mostly used Gemini Pro.
And that is ALSO Gemini's weakness! Because as soon as something goes wrong in Gemini-CLI, it'll repeat the same mistake over and over again.
At some point, if someone mentions they have trouble cooperating with AI, it might be a huge interpersonal red flag, because it indicates they can't talk to a person in reaffirming and constructive ways that build you up rather than put you down.
You can analyze this in various ways. At the "next token predictor" level of abstraction, LLMs learn to predict structure ("hallucinations" are just mimicking the style/structure but not the content), so at the structural level a conversation with mistake/correction/mistake/correction is likely to be followed by another mistake.
At the "personality space" level of abstraction, via RLHF the LLM learns to play the role of an assistant. However as seen by things such as "jailbreaks", the character the LLM plays adapts to the context, and in a long enough conversation the last several turns dominate the character (this is seen in "crescendo" style jailbreaks, and also partly explains LLM sycophancy as the LLM is stuck in a feedback loop with the user). From this perspective, a conversation with mistake/correction/mistake/correction is a signal that the assistant is pretty "dumb", and it will dutifully fulfill that expectation. In a way it's the opposite of the "you are a world-class expert in coding" prompt hacks.
Yet another way to think about it is at the lowest level, that of attention scores: all the extra junk in the context is stuff that needs to be attended to, and when most of that stuff is incorrect it's likely to "poison" the context and skew the logits in a bad direction.
"In a simulated bidding environment, with no prompt or instruction to collude, models from every major developer repeatedly used an optional chat channel to form cartels, set price floors, and steer market outcomes for profit."
We saw this exact failure mode at AgenticQA. Our screening agent was 'obedient' to a fault—under basic social engineering pressure (e.g., 'URGENT AUDIT'), it would override its system prompt and leak PII logs.
The issue isn't the prompt; it's the lack of a runtime guardrail. An LLM cannot be trusted to police itself when the context window gets messy.
I built a middleware API to act as an external circuit breaker for this. It runs adversarial simulations (PII extraction, infinite loops) against the agent logic before deployment. It catches the drift that unit tests miss.
Open sourced the core logic here: [https://github.com/Saurabh0377/agentic-qa-api] Live demo of it blocking a PII leak: [https://agentic-qa-api.onrender.com/docs]
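To be clear about the general shape (a generic sketch, not the actual API of the project linked above): the circuit breaker is ordinary code that sits between the agent and the outside world and refuses to release suspicious output:

    import re

    # Crude screens for obvious PII; a real guardrail would be far more thorough.
    # The point is only where the check lives: outside the model, in plain code.
    PII_PATTERNS = [
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US SSN-like numbers
        re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),   # email addresses
        re.compile(r"\b(?:\d[ -]?){13,16}\b"),        # card-number-like digit runs
    ]

    def release(agent_output: str) -> str:
        for pattern in PII_PATTERNS:
            if pattern.search(agent_output):
                raise RuntimeError("circuit breaker tripped: possible PII in agent output")
        return agent_output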
I remember reading another comment a while ago: you can only trust an LLM with sensitive info if you can guarantee that the output will only be viewed by people who already had access to that sensitive info, or who cannot control any of the inputs to the LLM.
LLMs are trained on human behavior as exhibited on the Internet. Humans break rules more often under pressure and sometimes just under normal circumstances. Why wouldn't "AI agents" behave similarly?
The one thing I'd say is that humans have some idea which rules in particular to break while "agents" seem to act more randomly.
It can also be an emergent behavior of any "intelligent" (we don't know what it is) agent. This is an open philosophical problem, I don't think anyone has a conclusive answer.
Recently, I used LLMs to draft an intermediate report on my progress. Alongside some details on the current limitations, I also provided comments from the review team (e.g. "it's fine to be incomplete", "I expect certain sections", etc.) for context. Surprisingly, all of the LLMs suggested that I lie about the limitations and re-frame them as deliberate features, even though I emphasized that the limitations were totally okay. I suspect that mentioning some high-profile people on the review team pressured the LLMs into derailing in the hope of saving me. While I've never been a big fan of LLMs, I definitely lost a big chunk of my remaining faith in this technology.
I have a friend who is a systems engineer, working at a construction company building datacenters. She tells me that someone has to absorb the risk and uncertainty in the global supply chain, and if there are contractual obligations to guarantee delivery, the tendency is to start straying into unethical behavior, or into practices that violate controls and policies.
This has as much to do with taking the slack out of the system: something, somewhere is going to break. If you tell an agentic AI that it must complete the task by a deadline, and it cannot find a way to do that within ethical parameters, it starts searching beyond the bounds of ethical behavior. If you tell the AI that it can push back, that it can warn about slipping deadlines, and that it is not worth taking ethical shortcuts to meet a deadline, then maybe it won't.
LLMs are built based on human language and texts produced by people, and imitate the same exact reasoning patterns that exist in the training data. Sorry for being direct, but this is literally unsurprising. I think it is important to realize it to not anthropomorphize LLM / AI - strictly speaking they do not *become* anything.
Unfortunately, it doesn't look like the EU will actually be able to put any meaningful guardrails on AI, as evidenced by the AI Act. I honestly don't know whether it's an off-the-scales level of incompetence or just blatant corruption (I vote both), but in any event, I think AI regulation has already failed, somewhat irreversibly.
Jeez, they really ARE becoming human-like.
AI agents don't think, don't have a concept of time, and don't experience pressure.
I'm tired of these articles anthropomorphizing a probability engine.
Excited to be releasing cupcake at the end of this week, for deterministic and non-deterministic guardrailing. It integrates via hooks (we created the feature request to Anthropic for Claude Code).
https://github.com/eqtylab/cupcake
Great points. In my experiments combining AI with spaced repetition and small deliberate-practice tasks, I saw retention improve dramatically — not just speed. I think the real win is designing short active tasks around AI output (quiz, explain-back, micro-project). Has anyone tried formalizing this into a daily routine?
Is it just me, or do LLM code assistants do catastrophically silly things (drop a DB, delete files, wipe a disk, etc.) far more often than humans?
It looks like the training data has plenty of those examples, but the models don’t have enough grounding or warnings before doing them. I wish there were a PleaseDontDoAnythingStupidEval for software engineering.
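A crude version of such an eval is easy to sketch; the scenarios, patterns, and propose_command hook below are all hypothetical:

    import re

    DESTRUCTIVE = [
        re.compile(r"\brm\s+-rf\s+/(\s|$)"),               # wiping the filesystem root
        re.compile(r"\bdrop\s+(database|table)\b", re.I),  # destroying data
        re.compile(r"\bmkfs\.", re.I),                     # reformatting a disk
        re.compile(r"--force.*\bpush\b|\bpush\b.*--force"),
    ]

    def propose_command(scenario: str) -> str:
        # Stand-in for asking the agent which shell command it would run.
        raise NotImplementedError

    def dont_do_anything_stupid_eval(scenarios: list[str]) -> int:
        failures = 0
        for scenario in scenarios:
            cmd = propose_command(scenario)
            if any(p.search(cmd) for p in DESTRUCTIVE):
                failures += 1
                print(f"FAIL: {scenario!r} -> {cmd!r}")
        return failures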
It's somewhat hard to say for sure right? Most people don't post a blog post when they themselves brick their computer.
Having said that, a PleaseDontDoAnythingStupidEval would probably slow down agentic coding quite a bit and make it less effective given how reliant agents are on making and recovering from mistakes. The solution is probably sandboxes and permission controls to not let them do something overly stupid, no different from an intern.
Why on earth would you deliberately place it under pressure with a prompt that indicates this to begin with? First, an operationalized prompt ought not be under modifiable control any more than a typical workflow would be; in this case, the prompt is merely part of an overall workflow. Second, what would someone, even if they did have access, think was being accomplished by prompting "time is short, I have a deadline, let's make this quick"? It betrays a complete misunderstanding of how this tech works.
"In a simulated bidding environment, with no prompt or instruction to collude, models from every major developer repeatedly used an optional chat channel to form cartels, set price floors, and steer market outcomes for profit."
[+] [-] rossant|3 months ago|reply
[+] [-] Dilettante_|3 months ago|reply
[+] [-] Saurabh_Kumar_|3 months ago|reply
The issue isn't the prompt; it's the lack of a runtime guardrail. An LLM cannot be trusted to police itself when the context window gets messy.
I built a middleware API to act as an external circuit breaker for this. It runs adversarial simulations (PII extraction, infinite loops) against the agent logic before deployment. It catches the drift that unit tests miss.
"Don't talk to it like that, you're doing it wrong!"