sestinj | 11 months ago
1. The Bitter Lesson extends to test-time compute (some call this the "Bitter-er Lesson" https://yellow-apartment-148.notion.site/AI-Search-The-Bitte...), and we've bet that agentic LLMs will transform how software is built. Agent mode (https://docs.continue.dev/agent/how-to-use-it) is here to stay. This means models will take extended action on your behalf: 1, 15, 60, or more minutes of work at a time without requiring intervention. As these trajectories grow longer, it becomes _more_, not less, important to give the model the correct initial conditions. This is the role of rules and prompts.
2. Cheap inference matters, and the trend in frontier models (for those watching) is distillation, not increased parameter count. There's good reason to believe we're headed toward a future where a model with a few billion parameters can contain all of the reasoning circuits necessary to solve difficult problems, and that, combined with a massive context window, it will become the "engine" in every AI tool. The difficult part is obtaining that context: if you watch how people at companies actually spend their time, the large majority of it goes to reading, writing, and sharing the right context with each other.
3. My co-founder Ty wrote a piece 2 years ago describing a path where language models automate increasing amounts of software work, and we use live coding interaction data to make them even better, in a positive feedback loop of automation: https://blog.continue.dev/its-time-to-collect-data-on-how-yo.... If you believe in this future, then you'll want to collect your own data to post-train on (e.g. https://arxiv.org/pdf/2502.18449v1) rather than letting another tool absorb all of the intellectual property without giving anything back. Those tools aren't going to train a model that knows the private details of every company's workflows; they'll train on a distribution that helps primarily with the most basic tech stacks.
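A minimal sketch of what collecting that interaction data could look like: log each model suggestion and whether the developer accepted it as JSONL, then filter the accepted ones as positive examples for a post-training pipeline. All names and fields here are hypothetical, not any tool's actual schema.

```python
# Hypothetical sketch: record coding-interaction events to a JSONL file
# that a later post-training job could consume. Field names are
# illustrative only, not any real tool's schema.
import json
import time
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class InteractionEvent:
    file_path: str    # file the suggestion applied to
    prefix: str       # code before the cursor (the model's input)
    suggestion: str   # what the model proposed
    accepted: bool    # did the developer keep it?
    timestamp: float


def log_event(event: InteractionEvent, log_file: Path) -> None:
    """Append one event as a JSON line; JSONL is easy to shard and stream."""
    with log_file.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(event)) + "\n")


def load_accepted(log_file: Path) -> list[dict]:
    """Filter to accepted suggestions -- the positive examples for tuning."""
    with log_file.open(encoding="utf-8") as f:
        return [e for line in f if (e := json.loads(line))["accepted"]]
```

The JSONL-plus-filter shape is the point: accepted completions are implicit labels you already own, so no extra annotation step is needed before fine-tuning.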
4. No matter how many parameters a foundation model has, there's no way for its weights to know that "We (some particular team within some larger company) organize our unit tests into separate files for selectors, actions, and tests" (e.g. https://hub.continue.dev/continuedev/playwright-e2e-test). This is purely team knowledge and preference, and is often private data. The next thought in the chain is "can't it just use tools to inspect the repository and find this out?" The answer is absolutely, but that quickly gets expensive, slow, and annoying, and you'll end up writing a rule to save both money and time. Next: can't the model just write the rules for me? Again, absolutely! We're working on this. To us, the natural outcome is that the model writes the rules and you share this potentially expensive "indexing" step with your team or the world.
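To make the playwright example concrete, here's a sketch of what such a rule could look like, loosely modeled on the linked hub entry. The content and frontmatter fields are illustrative; check your tool's docs for the exact file format it expects.

```markdown
---
name: Playwright e2e test conventions
---

When writing end-to-end tests:

- Put element selectors in `selectors/`, reusable user actions in
  `actions/`, and the test cases themselves in `tests/`.
- Never hard-code a selector inside a test file; import it from the
  corresponding selectors module.
```

Nothing here is discoverable from model weights alone, and rediscovering it via repository inspection on every run is exactly the expense the rule amortizes.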
5. Probably the most obvious point, but worth saying: advanced language models will use tools much more. Hooking up the right MCP servers is a non-negotiable part of getting out of the way so they can do their work.
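For readers who haven't hooked one up: MCP servers are typically registered in the client's config file as a name plus a launch command. The exact schema varies by client, but a sketch resembling the common shape (using the real @modelcontextprotocol/server-filesystem package; the path is a placeholder):

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    }
  }
}
```

The client launches each listed server as a subprocess and exposes its tools to the model, so adding a capability is usually just adding an entry here.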