anougaret | 10 months ago
imo what AI needs to debug is one of two things:
- train with RL to use breakpoints + a debugger, or to do print debugging. But that'll suck, because the chains of action are super long, and we know how it goes with AI memory currently: it's not great
- a sort of omniscient debugger that's always on and can inform the AI of everything the program/services did (Sentry-like observability, but on steroids). The AI would then just search within that and find the root cause
neither approach is going to be easy to pull off, but imo if we all spend 10+ hours every week debugging, it's worth a shot
that's why I'm currently working on approach 2. I made a time-travel debugger/observability engine for JS/Python, and I'm now working on plugging it into AI context as efficiently as possible, so that it can debug even super long sequences of actions, in dev & prod, hopefully one day
it's super WIP and not self-hostable yet, but if you want to check it out: https://ariana.dev/
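To make the "omniscient debugger" idea concrete: a minimal sketch of the recording half, using Python's stdlib `sys.settrace` to log every executed line with a snapshot of local state. This is just an illustration of the concept, not how Ariana actually works; all names here (`trace_log`, `buggy_sum`) are made up.

```python
import sys

# Record of everything the program did, to be searched after the fact
# instead of stepping through with breakpoints.
trace_log = []

def tracer(frame, event, arg):
    if event == "line":
        trace_log.append({
            "file": frame.f_code.co_filename,
            "func": frame.f_code.co_name,
            "line": frame.f_lineno,
            "locals": dict(frame.f_locals),  # snapshot of state at this step
        })
    return tracer

def buggy_sum(xs):
    total = 0
    for x in xs:
        total += x * 2  # bug: doubles each element
    return total

sys.settrace(tracer)
result = buggy_sum([1, 2, 3])
sys.settrace(None)

# Every intermediate value of `total` is now in trace_log, so a tool
# (or an LLM) can search the history instead of re-running the program.
print(result)  # 12, not the expected 6
```

A real engine would have to deal with the hard parts this sketch ignores: trace volume, async/multi-process code, and compressing all of it into something an LLM's limited context can actually search.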
ehnto | 10 months ago
Like you said, running over a stream of events, states and data for that debugging scenario is probably way more helpful. It would also be great to prime the context with the company's business rules and history. Otherwise LLMs will make the same mistake devs make: not knowing the "why" behind something and thinking the "what" is what matters most.
indymike | 10 months ago
I'm looking at this as a better way to get humans pointed in the right direction. Ariana.dev looks interesting!
rafaelmn | 10 months ago
These kinds of demos were cool 2 years ago. Then we got function calling in the API, it became super easy to build this stuff, and reality hit: LLMs were kind of shit and unreliable at using even the most basic tools. Like, oh wow, you got a toy example working and suddenly it's a "natural language interface to WinDBG".
I am excited about progress on this front in any domain, but FFS, show actual progress or something interesting. Show me an article like this [1] where the LLM did anything useful. Or just show what you did beyond "oh I built a wrapper on a CLI": did you fine-tune the model to get better performance? Did you set up a benchmark to compare models and find one to be impressive?
I am not shitting on OP here, because it's fine to share what you're doing and get excited about it. Maybe this is step one, but why the f*** is this a front-page article?
[1]https://cookieplmonster.github.io/2025/04/23/gta-san-andreas...
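For what it's worth, the "wrapper on a CLI" the comment dismisses really can be this small: a loop that takes a model-chosen command and shells it out. Everything below is a hypothetical stand-in (`fake_model` in place of a real LLM call, `echo` in place of a debugger binary), just to show the shape of it.

```python
import subprocess

# Whitelist of binaries the "agent" may invoke; a real wrapper would list
# debugger commands here instead of echo.
ALLOWED = {"echo"}

def fake_model(prompt):
    # A real system would send `prompt` to an LLM with function calling
    # enabled; we hard-code one "tool call" for illustration.
    return ["echo", "breakpoint hit at main"]

def run_tool(argv):
    if argv[0] not in ALLOWED:
        raise ValueError(f"refusing to run {argv[0]}")
    return subprocess.run(argv, capture_output=True, text=True).stdout.strip()

output = run_tool(fake_model("why did my program crash?"))
print(output)  # breakpoint hit at main
```

The hard part is not this plumbing, which is the comment's point: it's whether the model reliably picks useful commands over a long session, and that's what a benchmark or fine-tune would actually demonstrate.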
anougaret | 10 months ago
a real, quality AI breakthrough in software creation & maintenance will require a deep rework of many layers of the software stack, low-level and high-level alike.