
Reverse Engineering Cursor's LLM Client

159 points | paulwarren | 8 months ago | tensorzero.com

35 comments


serf|8 months ago

Cursor is the only product that I have cancelled in 20+ years due to a lack of customer service response.

Emailed them multiple times over weeks about billing questions -- not a single response. These weren't VS Code questions, either -- they needed Cursor staff intervention.

No problem getting promo emails though!

The quicker their 'value' can be spread to other services the better, imo. Maybe the next group will answer emails.

angst|8 months ago

Indeed.

Mail to hi@cursor.com gets a reply from “Sam from Cursor”, which is “Cursor's AI Support Assistant”, and after a few back-and-forths it says “I'm connecting you with a teammate who can better investigate”. Guess what? It’s been a month and no further communication whatsoever.

I don’t have high hopes for its customer service.

robkop|8 months ago

There is a lot missing from this prompt; tool call descriptors are the most obvious omission. See for yourself using even a year-old jailbreak [1]. There are some great ideas in how they’ve set up other pieces, such as Cursor rules.

[1]: https://gist.github.com/lucasmrdt/4215e483257e1d81e44842eddb...

GabrielBianconi|8 months ago

They use different prompts depending on the action you're taking. We provided just a sample because our ultimate goal here is to start A/B testing models, optimizing prompts + models, etc. We provide the code to reproduce our work so you can see other prompts!

The Gist you shared is a good resource too though!

ericrallen|8 months ago

Maybe there is some optimization logic that only appends tool details that are required for the user’s query?

I’m sure they are trying to slash tokens where they can, and removing potentially irrelevant tool descriptors seems like low-hanging fruit to reduce token consumption.
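Nobody in the thread knows what Cursor actually does here, but the idea of attaching only the tool descriptors a query needs is easy to sketch. The version below is hypothetical throughout: the `Tool` class, the keyword matching, and the tool names are illustrative assumptions, not Cursor's logic. A production system would more likely use embeddings or a small classifier than substring-free keyword overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    """Hypothetical tool descriptor: name, description, and trigger keywords."""
    name: str
    description: str
    keywords: set[str] = field(default_factory=set)
    always_include: bool = False


def select_tools(query: str, tools: list[Tool]) -> list[Tool]:
    """Keep tools that are always needed or whose keywords appear in the query.

    Naive tokenization: split on whitespace, strip punctuation, lowercase.
    """
    words = {w.strip(".,?!").lower() for w in query.split()}
    return [t for t in tools if t.always_include or t.keywords & words]


def render_tool_section(tools: list[Tool]) -> str:
    """Render the selected descriptors as the tool section of a system prompt."""
    return "\n".join(f"- {t.name}: {t.description}" for t in tools)


# Illustrative tool set; names and keywords are assumptions.
TOOLS = [
    Tool("read_file", "Read a file from the workspace.", always_include=True),
    Tool("run_terminal_cmd", "Run a shell command.", {"run", "install", "test", "build"}),
    Tool("grep_search", "Search files with a regex.", {"find", "search", "grep", "where"}),
    Tool("web_search", "Search the web.", {"docs", "latest", "web"}),
]
```

With this set, `select_tools("Where is the config loaded?", TOOLS)` keeps only `read_file` and `grep_search`, so the other two descriptors never reach the prompt. The catch is that a filter that guesses wrong hides a tool the model needed, which is one reason a vendor might send the full list anyway.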

CafeRacer|8 months ago

Soooo.... wireshark is no longer available or something?

vrm|8 months ago

Wireshark would work for seeing the requests from the desktop app to Cursor’s servers (which make the actual LLM requests). But if you’re interested in what the actual requests to LLMs look like from Cursor’s servers, you have to set something like this up. Plus, this lets us modify the requests and A/B test variations!
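For readers wondering what "something like this" involves at its simplest: a publicly reachable HTTP endpoint that speaks the OpenAI chat API, logs each request, and relays it upstream. The stdlib sketch below is not the TensorZero setup from the article. It assumes the client can be pointed at a custom OpenAI-compatible base URL, and `UPSTREAM`, `LOG_PATH`, and `serve` are hypothetical names. It buffers the whole upstream response, so streamed (`"stream": true`) replies arrive all at once, and it expects JSON request bodies.

```python
import json
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumptions: an OpenAI-compatible upstream and a local JSONL log file.
UPSTREAM = "https://api.openai.com"
LOG_PATH = "llm_requests.jsonl"


def summarize_request(body: dict) -> dict:
    """Extract the parts of an OpenAI-style chat request worth inspecting."""
    messages = body.get("messages", [])
    return {
        "model": body.get("model"),
        "system_prompts": [m.get("content") for m in messages if m.get("role") == "system"],
        "num_messages": len(messages),
        "tool_names": [t.get("function", {}).get("name") for t in body.get("tools", [])],
    }


class LoggingProxy(BaseHTTPRequestHandler):
    """Log each POSTed request, then relay it unchanged to UPSTREAM."""

    def do_POST(self) -> None:
        raw = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        with open(LOG_PATH, "a", encoding="utf-8") as log:
            log.write(json.dumps(summarize_request(json.loads(raw))) + "\n")

        req = urllib.request.Request(
            UPSTREAM + self.path,
            data=raw,
            headers={
                "Content-Type": "application/json",
                # Pass the caller's API key through untouched.
                "Authorization": self.headers.get("Authorization", ""),
            },
            method="POST",
        )
        try:
            with urllib.request.urlopen(req) as resp:
                status, ctype, payload = resp.status, resp.headers.get("Content-Type"), resp.read()
        except urllib.error.HTTPError as err:
            # Relay upstream errors (401, 429, ...) instead of crashing the handler.
            status, ctype, payload = err.code, err.headers.get("Content-Type"), err.read()

        self.send_response(status)
        self.send_header("Content-Type", ctype or "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Start the proxy; it must be publicly reachable if the caller is a remote server."""
    HTTPServer(("0.0.0.0", port), LoggingProxy).serve_forever()
```

Because the relay point sees the full request body, it is also the natural place to rewrite prompts or swap models for A/B tests, which is what a gateway like TensorZero adds on top of plain logging.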

bredren|8 months ago

Cursor and other IDE-modality solutions are interesting, but they train users into sloppy use of context.

From the extracted prompting Cursor is using:

> Each time the USER sends a message, we may automatically attach some information about their current state…edit history in their session so far, linter errors, and more. This information may or may not be relevant to the coding task, it is up for you to decide.

This is the context bloat that limits effectiveness of LLMs in solving very hard problems.

This particular .env example illustrates the low-stakes type of problem Cursor is great at solving, but it also lacks the complexity that will keep SWEs employed.

Instead, I suggest folks working with AI start at a chat interface and edit their conversations to keep contexts clean as they explore a truly challenging problem.

This often includes meeting and slack transcripts, internal docs, external content and code.

I’ve built a tool for surgical use of code called FileKitty: https://github.com/banagale/FileKitty and more recently slackprep: https://github.com/banagale/slackprep

These let a person be more intentional about the problem they are trying to solve by including only information relevant to it.

jacob019|8 months ago

I had this thought as well and find it a bit surprising. For my own agentic applications, I have found it necessary to carefully curate the context. Instead of including an instruction that we "may automatically attach", only include an instruction WHEN something is attached. Instead of "may or may not be relevant to the coding task, it is up for you to decide", provide explicit instructions to consider the relevance, and say what to do when it is relevant and when it is not. When the context is short, it doesn't matter as much, but when there is a difficult problem with a long context, fine-tuned instructions make all the difference. Cursor may be keeping instructions more generic to take advantage of cached token pricing, but the phrasing does seem rather sloppy. This is all still relatively new; I'm sure both the models and the prompts will see a lot more change before things settle down.
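A minimal sketch of "only include an instruction WHEN something is attached", with explicit per-kind guidance instead of a blanket relevance disclaimer. The attachment kinds, the `GUIDANCE` wording, and `build_system_prompt` are hypothetical illustrations, not anything extracted from Cursor's prompt.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    kind: str     # hypothetical labels, e.g. "linter_errors", "edit_history"
    content: str


# Explicit guidance per attachment kind (illustrative wording).
GUIDANCE = {
    "linter_errors": "Linter errors are attached below. Fix any that touch the code "
                     "you edit; ignore the rest.",
    "edit_history": "The user's recent edits are attached below. Use them to infer "
                    "intent; do not revert them.",
}

BASE_PROMPT = "You are a coding assistant."


def build_system_prompt(attachments: list[Attachment]) -> str:
    """Mention attached context only when it is actually present."""
    parts = [BASE_PROMPT]
    for att in attachments:
        guidance = GUIDANCE.get(att.kind)
        # Skip empty attachments and kinds we have no explicit instructions for.
        if guidance is None or not att.content.strip():
            continue
        parts.append(f"{guidance}\n<{att.kind}>\n{att.content}\n</{att.kind}>")
    return "\n\n".join(parts)
```

The trade-off the comment points at is real: a prompt whose text varies per request shares a shorter fixed prefix, so it benefits less from cached-token pricing than one generic instruction block that never changes.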

lyjackal|8 months ago

I've been curious to see the process for selecting relevant context from a long conversation. Has anyone reverse engineered what that looks like? How is the conversation history pruned, and how is the latest state of a file represented?

GabrielBianconi|8 months ago

We didn't look into that workflow closely, but you can reproduce our work (code in GitHub) and potentially find some insights!

We plan to continue investigating how it works (+ optimize the models and prompts using TensorZero).

notpushkin|8 months ago

Hmm, now that we have the prompts, would it be possible to reimplement Cursor servers and have a fully local (ahem pirated) version?

smcleod|8 months ago

Or you could just use Cline / Roo Code which are better for agentic coding and open source anyway...

handfuloflight|8 months ago

Were you really waiting for the prompts before embarking on this adventure?

tomr75|8 months ago

Presumably their apply model runs on their servers.

I wonder how hard it would be to build a local apply model; surely that would be faster on a MacBook.
