top | item 46627195

(no title)

elliotto | 1 month ago

I worked on a startup experimenting with using gemini-2.0-flash (the year old model) using its full 1m context window to query technical documents. We found it to be extremely successful at needle-in-a-haystack type problems.

As we migrated to newer models (gemini-3.0 and the o4-mini models) we again found it performed even better with x00k tokens. Our system prompt grew to about 20k tokens and the bots were able to handle it perfectly. Our issue became time to first token with large context, rather than the bot quality.

The ultra large 1m+ llama models were reported to be ineffective at >1m context. But at this point, it becomes so cost prohibitive to use anyway.

I am continuing to have success using Cursor's Auto model, and GPT-5.1 with extremely long conversations. I use different chats for different problems moreso for my own compartmentalisation of thoughts, rather than as a necessity for the bot.

discuss

No comments yet.