
(no title)

Agreed, and its larger context window is fantastic. My workflow:

- Convert the whole codebase into a string

- Paste it into Gemini

- Ask a question

People seem to be very taken with "agentic" approaches where the model selects a few files to look at, but I've found it very effective and convenient just to give the model the whole codebase and then have a conversation with it, get it to output code, modify a file, etc.
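For anyone wondering what "convert the whole codebase into a string" can look like in practice, here is a minimal sketch. The skip list, the suffix list, and the `=====` header format are all assumptions made up for illustration, not something from the workflow above; adjust them for your project.

```python
from pathlib import Path

# Hypothetical defaults for illustration; tune these for your own repo.
SKIP_DIRS = {".git", "node_modules", "__pycache__"}
TEXT_SUFFIXES = {".py", ".js", ".ts", ".md", ".toml", ".json"}


def codebase_to_string(root: str) -> str:
    """Concatenate text files under root, each preceded by a path header."""
    root_path = Path(root)
    parts = []
    for path in sorted(root_path.rglob("*")):
        rel = path.relative_to(root_path)
        if not path.is_file():
            continue
        # Skip anything inside an excluded directory.
        if any(part in SKIP_DIRS for part in rel.parts):
            continue
        # Keep only files that look like source or docs.
        if path.suffix not in TEXT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f"===== {rel.as_posix()} =====\n{text}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Print the current directory as one pasteable string.
    print(codebase_to_string("."))
```

The path headers are there so the model can refer to files by name when you ask it to modify one.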


Galanwe|4 months ago

I usually do that in a two-step process. Instead of giving the full source code to the model, I ask it to write a comprehensive, detailed description of the architecture, intent, and details (including filenames) of the codebase to a Markdown file.

Then for each subsequent conversation I would ask the model to use this file as reference.

The overall idea is the same, but going through an intermediate file allows for manual amendments in case the model consistently forgets some things. It also makes it a bit easier for the model to find information and reason about the codebase, since everything is in a pre-summarized format.

It's sort of like giving a very rich metadata and index of the codebase to the model instead of dumping the raw data to it.

kridsdale3|4 months ago

My special hack on top of what you suggested: ask it to draw the whole codebase in Graphviz-compatible graph markup (DOT). There are various tools out there to render this as an SVG or whatever, to get an actual map of the system. Very helpful when diving into a big new area.
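To give an idea of the kind of markup being asked for, here is a small sketch that emits Graphviz DOT from a module dependency map. The `deps` dictionary and the module names in it are hypothetical; in the workflow above the model writes the DOT text itself, and this only shows the target format.

```python
from typing import Dict, List


def to_dot(deps: Dict[str, List[str]], name: str = "codebase") -> str:
    """Render a {module: [modules it depends on]} map as a DOT digraph."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    for module in sorted(deps):
        # Declare the node so modules without edges still show up.
        lines.append(f'  "{module}";')
        for target in sorted(deps[module]):
            lines.append(f'  "{module}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example modules.
    print(to_dot({"app": ["auth", "db"], "auth": ["db"], "db": []}))
```

If the output is saved as `graph.dot`, Graphviz can render it with `dot -Tsvg graph.dot -o graph.svg`.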

leetharris|4 months ago

For anyone wondering how to quickly get your codebase into a good "Gemini" format, check out repomix. Very cool tool and unbelievably easy to get started with. Just type `npx repomix` and it'll go.

Also, for the best results, use Google AI Studio rather than the regular Gemini plan. You'll have more control over results.

georgemcbay|4 months ago

> Convert the whole codebase into a string

When using the Gemini web app on a desktop system (this could differ depending on how you consume Gemini), select the + button in the bottom-left of the chat prompt area, select Import code, and then choose the "Upload folder" link at the bottom of the dialog that pops up. That brings up a file dialog where you choose a directory; it uploads all the files in that directory and all subdirectories (recursively), and you can then prompt it on that code.

The upload process for average sized projects is, in my experience, close to instantaneous (obviously your mileage can vary if you have any sort of large asset/resource type files commingled with the code).

If your workflow already works then keep with it, but for projects with a pretty clean directory structure, uploading the code via the Import system is very straightforward and fast.

(Obvious disclaimer: Depending upon your employer, the code base in question, etc, uploading a full directory of code like this to Google or anyone else may not be kosher, be sure any copyright holders of the code are ok with you giving a "cloud" LLM access to the code, etc, etc)

pdimitar|4 months ago

Well, I am not sure Gemini or any other LLM respects `.gitignore`, and uploading ignored files can immediately push the input past the maximum context window.

Tools like repomix[0] do this better, plus you can add your own extra exclusions on top. It also estimates token usage as part of its output, but I found the estimate too optimistic, i.e. it regularly says "40_000 tokens" but when uploading the resulting single XML file to Gemini it's actually e.g. 55k - 65k tokens.

[0] https://github.com/yamadashy/repomix/
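A rough sketch of the two ideas above, extra exclusions and a padded token estimate. This is not how repomix or Gemini work internally: the patterns are hypothetical, `fnmatch` globbing is simpler than real `.gitignore` semantics, and both the 4-characters-per-token ratio and the 1.5x safety margin are assumptions (the margin is picked only because it sits in the middle of the 40k vs 55k - 65k gap mentioned above).

```python
import math
from fnmatch import fnmatch

# Hypothetical extra exclusions on top of whatever your tool already skips.
EXTRA_EXCLUDES = ["*.lock", "*.min.js", "dist/*", "*.png"]


def is_excluded(rel_path: str, patterns=EXTRA_EXCLUDES) -> bool:
    """True if the relative path matches any glob pattern.

    Note: fnmatch's '*' also matches '/', unlike .gitignore rules.
    """
    return any(fnmatch(rel_path, pattern) for pattern in patterns)


def estimate_tokens(text: str, chars_per_token: float = 4.0,
                    safety_margin: float = 1.5) -> int:
    """Crude token estimate, padded because naive estimates run low.

    Both defaults are assumptions, not any model's real tokenizer.
    """
    return math.ceil(len(text) / chars_per_token * safety_margin)


if __name__ == "__main__":
    for p in ["src/main.py", "dist/bundle.js", "yarn.lock"]:
        print(p, "excluded" if is_excluded(p) else "kept")
    print(estimate_tokens("x" * 4000))
```

For a real number, check the token count the model's own UI or API reports after upload rather than trusting any offline estimate.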

asah|4 months ago

try codex and claude code - game changing ability to use CLI tools, edit/reorg multiple files, even interact with git.

8n4vidtmkvmk|4 months ago

Gemini cli is a thing that exists. Are you saying those specifically are better? Or CLIs are better?

xnx|4 months ago

Gemini CLI does all this too

Keyframe|4 months ago

I started using Gemini like that as well, but with Gemini CLI. Point it at the directory and then converse with it about the codebase. It's wonderful.

fennecbutt|4 months ago

Idk though, I've seen many issues occur because of a longer context. I mean, it makes sense: given there are only so many attention heads, the longer the context, the less chance attention will pick out the relevant tokens.

HDThoreaun|4 months ago

The CLI tools really are way faster. You can use them the same way if you want; you just don't have to copy-paste stuff around all the time.