I had a similar anecdotal experience a few weeks ago.
I was working on a blog entry in a VS Code window and I hadn't yet saved it to disk. Then I accidentally hit the close-window keyboard shortcut... and it was gone. The "open last closed window" feature didn't recover it.
On a hunch, I ran some rg searches through my Library folder for fragments of text I could remember from what I had written... and it turned out there was a VS Code Copilot log file with a bunch of JSON in it that recorded a recent transaction with their backend - and contained the text I had lost.
Really? After teaching/mentoring new devs and interns for the last two years at my job, I definitely think there's plenty of space and opportunity for improvement on version control systems over git. Large files and repos are one thing, but primarily user friendliness and accessibility, where even existing alternatives like Mercurial do a much nicer job in many ways.
Commit even as a WIP before cleaning up! I don't really like polluting the commit history like that but with some interactive rebase it can be as if the WIP version never existed.
(Side ask to people using Jujutsu: isn't it a use case where jujutsu shines?)
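The WIP-then-clean-up flow can be sketched in plain git. Everything below (file names, commit messages) is a made-up placeholder, and a throwaway repo is used so the demo is self-contained:

```shell
set -e
# throwaway repo so this demo doesn't touch a real project
repo=$(mktemp -d); cd "$repo"
git init -q .
git config user.email you@example.com
git config user.name "You"
git commit -q --allow-empty -m "last real commit"

# 1) snapshot the messy-but-working state immediately
echo "lr = 0.01  # the +5% magic" > model.py
git add -A && git commit -q -m "WIP: experiment"

# 2) keep hacking, snapshot again
echo "epochs = 40" >> model.py
git add -A && git commit -q -m "WIP: more tuning"

# 3) later: collapse the WIP commits into one clean commit,
#    so the snapshots never appear in shared history
git reset --soft HEAD~2
git commit -q -m "Improve model accuracy"
git log --oneline
```

`git rebase -i` achieves the same with finer control; and yes, this is roughly what Jujutsu gives you for free by snapshotting the working copy on every command.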
I think maybe there's one other lesson here, though I certainly agree with yours, and with the other commenters who point out the unreliability of this particular method. This feels like an argument for using an editor that autosaves history. "Disk is cheap," as they say -- so what if your undo buffer for any given file goes back seven days, or a month, as long as there's a good interface to browse through the history?
A fun anecdote, and I assume it's tongue in cheek, although you never know these days, but is the LLM guaranteed to give you back an uncorrupted version of the file? A lossy version control system seems to me to be only marginally better than having no VCS at all.
I frequently (basically every conversation) have issues with Claude getting confused about which version of the file it should be building on. Usually what causes it is asking it to do something, then manually editing the file to remove or change something myself and giving it back, telling it it should build on top of what I just gave it. It usually takes three or four tries before it will actually use what I just gave it, and from then on it keeps randomly trying to reintroduce what I deleted.
I'd say it's more likely guaranteed to give you back a corrupted version of the file.
I assume OP was lucky because the initial file seems like it was at the very start of the context window, but if it had been at the end it would have returned a completely hallucinated mess.
From experience, no. I’ve customized my agent instructions to explicitly forbid operations that involve one-shot rewriting code for exactly this reason. It will typically make subtle changes, some of which have had introduced logic errors or regressions.
When I used toolcalls with uuids in the name, tiny models like quantized qwen3-0.6B would occasionally get some digits in the UUID wrong. Rarely, but often enough to notice even without automation. Larger models are much better, but give them enough text and they also make mistakes transcribing it
Well, it'll give you what the tokenizer generated. This is often close enough for working software, but not exact. I notice it when asking claude for the line number of the code with specific implementation. It'll often be off by a few because of the way it tokenizes white space.
Indeed. OP, nothing is "in" an LLM's context window at rest. The old version of your file is just cached in whatever file stores your IDE's chat logs, and this is an expensive way of retrieving what's already on your computer.
Technically it doesn't have to be since that part of the context window would have been in the KV cache and the inference provider could have thrown away the textual input.
If you sent the python file to Gemini, wouldn't it be in your database for the chat? I don't think relying on uncertain context window is even needed here!
A big goal while developing Yggdrasil was for it to act as long term documentation for scenarios like you describe!
As LLM use increases, I imagine each dev generating so much more data than before, our plans, considerations, knowledge have almost been moved partially into the LLM's we use!
>give me the exact original file of ml_ltv_training.py i passed you in the first message
I don't get this kind of thinking. Granted I'm not a specialist in ML. Is the temperature always 0 or something for these code-focused LLMs? How are people so sure the AI didn't flip a single 0 to 1 in the diff?
Even more so when applied to other more critical industries, like medicine. I talked to someone who developed an AI-powered patient report summary or something like that. How can the doctor trust that AI didn't alter or make something up? Even a tiny, single digit mistake can be quite literally fatal.
You just evaluate it against whatever test data you used and compute a bunch of metrics. You decide to use the model if "bad things" happen at an acceptably low rate.
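As a toy illustration of that kind of gate (all data below is invented), compare generated fields against a reference set and accept the model only under a threshold:

```python
# Toy "acceptable error rate" gate. Compare model output against
# reference data field by field; accept the model only if the
# mismatch rate stays under a threshold. All values are invented.
reference = ["dose 5 mg", "no allergies", "BP 120/80"]
generated = ["dose 5 mg", "no allergies", "BP 128/80"]  # one silent digit flip

errors = sum(ref != gen for ref, gen in zip(reference, generated))
error_rate = errors / len(reference)

THRESHOLD = 0.01  # tolerate at most 1% mismatched fields
acceptable = error_rate <= THRESHOLD
print(f"error rate {error_rate:.1%}, acceptable: {acceptable}")
```

Which is exactly the parent's point: for a patient report, the only acceptable rate for that digit flip may be zero.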
> I refactored all the sketchy code into a clean Python package, added tests, formatted everything nicely, added type hints, and got it ready for production.
The fact that type hints are the last in the list, not first, suggests the level of experience with the language
I would have pressed Ctrl-Z in my editor like mad until I got the file. If I was using vim I could even grep for it through my history files, thanks to vim-persisted-undo.
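For anyone who wants that safety net, persistent undo in Vim is a few lines of config (the undodir path is an arbitrary choice, and the directory must exist):

```vim
" keep undo history on disk so it survives restarts
set undofile
set undodir=~/.vim/undo-history  " create this directory yourself
set undolevels=10000             " how many changes to remember
```

With this in place, `:earlier 2d` rewinds a buffer to how it looked two days ago, and `g-`/`g+` step through the undo tree.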
It's disabled by default, but even with the default setups, you can find large snippets of code in ~/.gemini/tmp.
tl;dr: Gemini cli saves a lot of data outside the context window that enables rollback.
I'm sure other agents do the same, I only happen to know about Gemini because I've looked at the source code and was thinking of designing my own version of the shadow repo before I realized it already existed.
I've had this exact thing happen, but with the LLM deciding to screw up code it previously wrote. I really love how Jujutsu commits every time I run a "jj status" (or even automatically, when anything changes), it makes it really easy to roll back to any point.
> And I never committed the changes that got me the +5%.
Damn, I forget how much of a noob I used to be. I used to lose changes sometimes. But I commit very, very very often now or at least git stash, which creates some type of hash to recover from.
I find git is just about the only thing you need to lock down when using AI. Don't let it mess with your git, but let it do whatever else it wants. Git is then a simple way to get a summary of what was edited.
Like the author, I've also found myself wanting to recover an accidentally deleted file. Luckily, some git operations, like `git add` and `git stash`, store files in the repo, even if they're not ultimately committed. Eventually, those files will be garbage collected, but they can stick around for some time.
Git doesn't expose tools to easily search for these files, but I was able to recover the file I deleted by using libgit2 to enumerate all the blobs in the repo, search them for a known string, and dump the contents of matching blobs.
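The same rummage also works with stock git plumbing, no libgit2 required. `NEEDLE` and the file contents below are placeholders, and a throwaway repo keeps the demo self-contained:

```shell
set -e
# demo repo: stage a file (creating a blob), then delete it uncommitted
repo=$(mktemp -d); cd "$repo"
git init -q .
printf 'def train_model():\n    pass\n' > lost.py
git add lost.py   # a blob now lives in .git/objects
rm lost.py        # working copy gone; the blob survives until gc

# search every blob in the object database for a remembered fragment
NEEDLE="def train_model"
found=$(git cat-file --batch-all-objects \
          --batch-check='%(objecttype) %(objectname)' \
        | awk '$1 == "blob" {print $2}' \
        | while read -r oid; do
            git cat-file -p "$oid" | grep -q "$NEEDLE" && echo "$oid"
          done)
echo "matching blobs: $found"
git cat-file -p "$found" > recovered.py   # dump the lost content
```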
It stands to reason that the OP doesn't understand the code or what he (more likely, the LLM) has written if he can't manage to reproduce his own results. We have all been there, but this kind of "try stuff" without understanding the cause and effect of your changes is a recipe for long-term disaster. Also noticeable is the lack of desire to understand what the actual change was, and the reinforcement of bad development practices.
Using LLMs as an extra backup buffer sounds really neat.
But I was trained with editing files in telnet over shaky connections, before Vim had auto-backup. You learn to hit save very frequently. After 3 decades I still reflexively hit save when I don’t need to.
I don’t forget to stage/commit in git between my prompts.
The new checkpoint and rollback features seem neat for people who don’t have those already. But they’re standard tools.
This isn't actually Gemini - a copy of the file was already stored locally by Cursor. Most modern editors, including VS Code, can recover files from local history without needing Git.
The interesting thing here is the common misconception that LLMs maintain internal state between sessions, which obviously they don't. They have no memory and they don't know about your files.
Over the years, I've heard so many stories like these without a happy ending - developers wasting days and sometimes even a week or two of work because they don't like to commit and use git often - that my long-upheld practice is to pretty much always create feature/develop branches and commit as often as possible, often multiple times per hour.
Git is not just for saving personal history. It's also, and more importantly, a collaboration tool. Your context window is no substitute for that, and can't even be relied on to be either complete or accurate over what might be years of development.
1M context is amazing, but even after 100k tokens Gemini 2.5 Pro is usually incapable of consistently reproducing a 300 LOC file without changing something in the process. And it actually takes a lot of effort to make sure it does not touch files it's not supposed to.
With Gemini I have found some weird issues with code gen that are presumably temperature related. Sometimes it will emit a large block of code with a single underscore where there should be a dash, or some similar very close match that would make sense as a global decision but is triggered for only that one instance.
Not to mention it sneaking functions back in after being told to remove them because they are defined elsewhere. I had a spell where it was reliably a two-prompt process for any change: 1) do the actual thing, 2) remove A, B and C, which you have reintroduced again.
I have had some very weird issues with Gemini 2.5 Pro where during a longer session it eventually becomes completely confused and starts giving me the response to the previous prompt instead of the current one. I absolutely would not trust it to handle larger amounts of data or information correctly.
I would recommend the Code Supernova model in Cursor if you want a 1M token context window. It's free right now since the model is being tested in stealth, but your data will be used by XAI or whoever it turns out the model creator is.
For the same reason, I run OpenCode under Mac's sandbox-exec command with some rules to prevent writes to the .git folder or outside of the project (but allowing writes to the .cache and opencode directories).
This is something I'm currently working on as a commercial solution - the whole codebase sits in a special context window controlled by agents. No need for classic SCM.
I'm waiting for the day someone builds a wrapper around LLM chats and uses it as a storage medium. It's already been done for GitHub, YouTube videos and Minecraft.
I suppose, if you want an extremely lossy storage medium that may or may not retrieve your data, stores less than a 3.5″ floppy, and needs to be continually refreshed as you access it.
It has nothing to do with the context window. It's Cursor storing gigabytes of data locally, including your requests and answers. It's classic RAG, not "long context".
I cannot wrap my head around the anecdote that opens the article:
> Lately I’ve heard a lot of stories of AI accidentally deleting entire codebases or wiping production databases.
I simply... I cannot. Someone connected a poorly understood AI to prod, and it ignored instructions, deleted the database, and tried to hide it. "I will never use this AI again," says this person, but I think he's not going far enough: he (the human) should be banned from production systems as well.
This is like giving full access to production to a new junior dev who barely understands best practices and is still in training. This junior dev is also an extraterrestrial with non-human, poorly understood psychology, selective amnesia and a tendency to hallucinate.
I mean... damn, is this the future of software? Have we lost our senses, and in our newfound vibe-coding passion forgotten all we knew about software engineering?
Please... stop... I'm not saying "no AI", I do use it. But good software practices remain as valid as ever, if not more!
The common story getting shared all over is from a guy named Jason Lemkin. He’s a VC who did a live vibe-coding experiment for a week on Twitter where he wanted to test if he, a non-programmer, could build and run a fake SaaS by himself.
The AI agent dropped the “prod” database, but it wasn’t an actual SaaS company or product with customers. The prod database was filled with synthetic data.
The entire thing was an exercise but the story is getting shared everywhere without the context that it was a vibe coding experiment. Note how none of the hearsay stories can name a company that suffered this fate, just a lot of “I’m hearing a lot of stories” that it happened.
It’s grist for the anti-AI social media (including HN) mill.
It's a matter of priorities. It's cheap and fast, and there is a chance that it will be OK. Even just OK until I move on. People often make risky choices for those reasons. Not just with IT systems - the crash of 2008 was largely the result of people betting (usually correctly) that the wheels would not fall off until after they had collected a few years of bonuses.
I use Crystal which archives all my old claude code conversations, I've had to do this a few times when I threw out code that I later realized I needed.
I find gemini 2.5 pro starts losing its shit around 50K tokens, using Roo Code. Between Roo's system prompt and my AGENTS.md there's probably about 10k used off the bat. So I have about 30-40k tokens to complete whatever task I assign it.
It's a workable limit but I really wish I could get more out of a single thread before it goes crazy. Does this match others' experience?
Sometimes I notice myself go a bit too long without a commit and get nervous. Even if I'm in a deep flow state, I'd rather `commit -m "wip"` than have to rely on a system not built for version control.
this shit is so depressing, having a "secret sauce" and it being just mystical and unknowable, a magic incantation which you hopefully scribbled down to remember later
Agreed. Most plausible reason they "can't remember" the good solution is because they were vibe coding and didn't really understand what they were doing. Research mode my ass.
If you're an engineer it can be quite shocking to see how people like the author work. It's much more like science than engineering. A lot of trial and error and swapping things around without fully understanding the implications etc. It doesn't interest me, but it's how all the best results are obtained in ML as far as I can tell.
simonw|4 months ago
I grabbed a copy of that file and ran it through my (vibe-coded) JSON string extraction tool https://tools.simonwillison.net/json-string-extractor to get my work back.
magicalhippo|4 months ago
Primarily because it taught me to save every other word or so, in case my ISR caused the machine to freeze.
[1]: https://wiki.osdev.org/Interrupt_Service_Routines
wseqyrku|4 months ago
No matter how that sentence ends, I weep for our industry.
bgwalter|4 months ago
"The phone/computer will just become an edge node for AI, directly rendering pixels with no real operating system or apps in the traditional sense."
dotancohen|4 months ago
This is an amusing anecdote. But the only lesson to be learned is to commit early, commit often.
BryantD|4 months ago
I'm sure there's an emacs module for this.
beefnugs|4 months ago
"Hey copilot, what are all my passwords and credit card numbers"
superxpro12|4 months ago
1. Commit 2. Push 3. Evacuate
zobzu|4 months ago
With that said, it's true that it works =)
desipenguin|4 months ago
He complained to me that he "could not find it in ChatGPT history as well"
I think @alexmolas was lucky
yggdrasil_ai|4 months ago
You can check out my project on git, still in early and active development - https://github.com/zayr0-9/Yggdrasil
didi_bear|4 months ago
Are people coding in Notepad?
tripplyons|4 months ago
sandbox-exec -p "(version 1)(allow default)(deny file-write* (subpath \"$HOME\"))(allow file-write* (subpath \"$PWD\") (subpath \"$HOME/.local/share/opencode\"))(deny file-write* (subpath \"$PWD/.git\"))(allow file-write* (subpath \"$HOME/.cache\"))" /opt/homebrew/bin/opencode
actinium226|4 months ago
OK, but if you'd used git properly, you wouldn't have had this problem in the first place.
f1shy|4 months ago
>(the human) should be banned from production systems as well.
The human may have learnt the lesson... if not, I would still be banned ;)[0]
[0] I did not delete a database, but cut power to the rack running the DB