Show HN: Leaping – Debug Python tests instantly with an LLM debugger
120 points | kvptkr | 1 year ago | github.com
We’re currently working on a platform that ingests logs and then automatically reproduces, root-causes, and ultimately fixes production bugs as they happen. You can see some of our earlier work on this here: https://news.ycombinator.com/item?id=39528087
As we were building the root-cause phase of our automated debugger, we realized we had developed something that resembled an omniscient debugger. Like an omniscient debugger, it keeps track of variable assignments over time, but you can interact with it at a higher level than a conventional debugger, using natural language. We ended up packaging it as a pytest plugin and have been using it ourselves for development over the past few weeks.
Using this pytest plugin, you’re able to reason at a much higher level than conventional debuggers and can ask questions like:
- Why did function x get hit?
- Why was variable y set to this value?
- What changes can I make to this code to make this test pass?
Here’s a brief demo of this in action: https://www.loom.com/share/94ebe34097a343c39876d7109f2a1428
To achieve this, we first instrument the test using sys.settrace (or, on Python 3.12+, the far better sys.monitoring!) to keep a history of all the functions that were called, along with the calling line numbers. We then re-run the test and use AST parsing to find all the variable assignments and track how they change over time. We also use AST parsing to obtain the source code for these functions. Finally, we neatly format all this context and pass it to GPT.
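A minimal sketch of those two steps (the names and structure here are illustrative, not Leaping's actual implementation): a sys.settrace hook that records every call and its calling line, and an AST walk that collects assigned names.

```python
import ast
import sys

call_history = []  # (callee name, line number of the calling statement)

def tracer(frame, event, arg):
    # sys.settrace invokes this for every 'call' event in traced code;
    # record which function was called and from which line.
    if event == "call" and frame.f_back is not None:
        call_history.append((frame.f_code.co_name, frame.f_back.f_lineno))
    return tracer  # keep tracing nested calls

def assigned_names(source):
    """Use AST parsing to collect names bound by plain assignment statements."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.append(target.id)
    return names

def helper():
    return 2

def test_example():
    x = helper()
    return x

sys.settrace(tracer)
test_example()
sys.settrace(None)

print(call_history)                    # includes ('helper', <line inside test_example>)
print(assigned_names("x = helper()")) # ['x']
```

On 3.12+, sys.monitoring can register for the same call events with far less overhead, since it avoids paying the per-line tracing cost that settrace imposes.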
We’d love it if you checked out the pytest plugin, and we welcome all feedback :) If you want to chat about bugs, our emails are also always open: kanav@leaping.io
biddit|1 year ago
1. https://github.com/paul-gauthier/aider
pedrovhb|1 year ago
The quick loop of LLM input / REPL output is better suited to local models, though, where you can control the hidden state cache, get lower latency, and enforce a grammar to ensure the model doesn't go off the rails and sticks to the commands implemented for interacting with the debugger, which AFAIK you can't do with services like OpenAI's. This is something I'd like to see more of: having low-level control of a model gives qualitatively different ways of using it, which I haven't seen people explore that much.
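The grammar-enforcement idea can be sketched without a real model: at each step, mask the model's choices down to characters that keep the output a valid prefix of some allowed debugger command (the command set and the toy scorer below are my own assumptions, not anything from the thread).

```python
# Hypothetical fixed command set for a debugger REPL.
ALLOWED_COMMANDS = ["step", "next", "print", "where", "continue"]

def allowed_next_chars(prefix):
    """Characters that keep `prefix` a prefix of some allowed command."""
    return {cmd[len(prefix)] for cmd in ALLOWED_COMMANDS
            if cmd.startswith(prefix) and len(cmd) > len(prefix)}

def constrained_decode(score):
    """Greedy decoding under the command grammar.

    `score(prefix, char)` stands in for a local model's per-token scores;
    invalid continuations are never even considered.
    """
    out = ""
    while out not in ALLOWED_COMMANDS:
        choices = allowed_next_chars(out)
        out += max(choices, key=lambda c: score(out, c))
    return out

# A toy scorer that always prefers 'p' is steered into a legal command:
print(constrained_decode(lambda prefix, ch: 1.0 if ch == "p" else 0.0))  # print
```

Real implementations do this over tokens rather than characters (e.g. logit masking in local inference stacks), but the principle is the same: the mask is applied before sampling, so the model physically cannot emit an out-of-grammar command.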
kvptkr|1 year ago
I think we're going to explore the local model approach though - you raise some really great points about having more granular control over the state of the model.
stuaxo|1 year ago
I hooked an LLM up to ipdb and told it to request any variables it should look at, suggest what to do next, and say what it would expect. It was quite good, but it took a lot of persuading; just having an LLM that was more tuned to this would be better.
danShumway|1 year ago
I don't want to be negative on someone's Show HN post, but it seems like getting all of this and showing it to the user would be way more helpful than showing it to the LLM?
My standard sometimes when I'm thinking about this kind of stuff is "would I want this if the LLM was swapped out for an actual human?" So would I want a service that gets all this useful information, then hands it off to a Python coder (even a very good Python coder) with no other real context about the overall project, and then I had to ask them why my test broke instead of being able to look at the info myself? I don't think I'd want that. I've worked with co-workers who I really respect; I still don't want to do remote debugging with them over Slack, I want to be able to see the data myself.
Going through a middleperson to find out which code paths my code has hit will nearly always be slower than just showing me the full list of every code path my code just hit. Of course I want filtering and search and all of that, but I want those as ways of filtering the data, not ways of controlling access to the data.
It feels like you've made something really useful -- an omniscient debugger that tracks state changes over time -- and then you've hooked it up to something that would make it considerably less useful. I've done debugging with state libraries like Redux where I can track changes to data over time, it makes debugging way easier. It's great, it changes my relationship to how I think about code. So it's genuinely super cool to be able to use something like that in other situations. But at no point have I ever thought while using a state tracking tool, "I wish I had to have a conversation with this thing in order to get access to the timeline."
Again, I don't want to be too negative. AI is all the hotness so I guess if you can pump all of that data into an LLM there's no reason not to since it'll generate more attention for the project. But it might not be a bad idea to also allow straight querying of the data passed to the LLM and data export that could be used to build more visual, user-controlled tools.
Just my opinion, feel free to disregard.