wsxiaoys | 1 month ago | on: A 4-part deep dive on building AI code edits inside VS Code
wsxiaoys's comments
wsxiaoys | 2 months ago | on: Working Around VS Code APIs to Render LLM Suggestions
Most tools fork the editor or build a custom IDE so they can skip the hard interaction problems.
Our NES is a VS Code–native feature. That meant living inside strict performance budgets and interaction patterns that were never designed for LLMs proposing multi-line, structural edits in real time.
Inside those constraints, surfacing enough context for an AI suggestion to be actionable, without stealing the developer's attention, is much harder.
That pushed us toward a dynamic rendering strategy rather than a single AI-suggestion UI: each rendering path is deliberately scoped to the situations where it performs best, so every edit gets the least disruptive representation available.
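To make the idea concrete, here's a rough sketch of what such a dispatcher could look like. All names and thresholds are illustrative assumptions, not the actual implementation; the point is just that routing is a pure function of the edit's shape:

```typescript
// Hypothetical rendering-path dispatcher: route each proposed edit to the
// least disruptive surface that can represent it faithfully.

type ProposedEdit = {
  insertedLines: number;      // lines the edit adds
  deletedLines: number;       // lines the edit removes
  touchesCursorLine: boolean; // does it start at the cursor's line?
};

type RenderPath = "inline-ghost-text" | "line-decoration" | "diff-preview";

function pickRenderPath(edit: ProposedEdit): RenderPath {
  // Pure insertions at the cursor map cleanly onto ghost text.
  if (edit.deletedLines === 0 && edit.touchesCursorLine) {
    return "inline-ghost-text";
  }
  // Small replacements can be shown with decorations on the affected lines.
  if (edit.insertedLines + edit.deletedLines <= 3) {
    return "line-decoration";
  }
  // Large structural edits fall back to an explicit diff preview.
  return "diff-preview";
}
```

Keeping the routing logic a pure function like this also makes it easy to test each path's scope in isolation.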
If AI is going to live inside real editors, I think this is the layer that actually matters.
Happy to hear your thoughts!
wsxiaoys | 2 months ago | on: More Context Won't Fix Bad Timing in Tab Completion for Coding Agents
This was one of the more unexpectedly tricky layers of building real-time LLM suggestions, and I’d love to hear how others have approached timing, cancellation, and speculative prediction in their editors or agents.
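For anyone who hasn't hit this yet, here's a minimal sketch of the core timing problem (my own illustration, not code from the article): debounce keystrokes, and cancel any in-flight request the moment a newer one supersedes it, so a stale suggestion never renders:

```typescript
// Debounce + cancellation for real-time suggestions. Each keystroke
// supersedes the previous request: its timer is cleared, its fetch is
// aborted, and its pending promise settles to null (no suggestion).

function createSuggestionScheduler(
  fetchSuggestion: (prefix: string, signal: AbortSignal) => Promise<string>,
  debounceMs = 50,
) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let controller: AbortController | undefined;
  let settlePrevious: ((v: string | null) => void) | undefined;

  return function onKeystroke(prefix: string): Promise<string | null> {
    if (timer !== undefined) clearTimeout(timer);
    controller?.abort();      // cancel the superseded in-flight request
    settlePrevious?.(null);   // the superseded keystroke yields nothing

    const own = (controller = new AbortController());
    return new Promise((resolve) => {
      settlePrevious = resolve;
      timer = setTimeout(async () => {
        try {
          const suggestion = await fetchSuggestion(prefix, own.signal);
          resolve(own.signal.aborted ? null : suggestion);
        } catch {
          resolve(null);      // aborted or failed: render nothing
        }
      }, debounceMs);
    });
  };
}
```

The real version has more states to juggle (speculative prefetch, acceptance races), but cancellation-on-supersede is the backbone.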
wsxiaoys | 3 months ago | on: How we built context management for tab completion
The real challenge (and what ultimately determines whether NES feels "intent-aware") was how we manage context in real time while the developer is editing live. That's the part I think is most useful for anyone building real-time AI inside editors, IDEs, or interactive tools.
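As a toy illustration of the shape of the problem (names and budgets are hypothetical, not our implementation): keep a bounded log of recent edits and evict the oldest entries when a size budget is exceeded, so the prompt always reflects the freshest activity:

```typescript
// A bounded, recency-ordered edit history for prompt context.

type EditRecord = { file: string; snippet: string };

class EditHistory {
  private records: EditRecord[] = [];
  constructor(private maxChars: number) {}

  push(record: EditRecord): void {
    this.records.push(record);
    // Evict the oldest edits until the history fits the budget again.
    while (this.totalChars() > this.maxChars && this.records.length > 1) {
      this.records.shift();
    }
  }

  private totalChars(): number {
    return this.records.reduce((n, r) => n + r.snippet.length, 0);
  }

  // Most recent edits first, ready to serialize into the prompt.
  asContext(): EditRecord[] {
    return [...this.records].reverse();
  }
}
```

The hard part in practice isn't the buffer itself but deciding what counts as one "edit" and how aggressively to merge adjacent keystrokes before they enter it.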
I hope you find this interesting. Happy to answer any questions!
wsxiaoys | 3 months ago | on: Creating a Tab completion model from scratch
The cool part: instead of fine-tuning an OSS model, we fine-tuned Gemini Flash Lite with LoRA, which let us avoid the serving-infra overhead entirely and gave us faster responses at lower compute cost.
wsxiaoys | 4 months ago | on: Ask HN: What are you working on? (October 2025)
wsxiaoys | 1 year ago | on: Tabby: Self-hosted AI coding assistant
To make better use of multiple GPUs, we suggest employing a dedicated backend for serving the model. Please refer to https://tabby.tabbyml.com/docs/references/models-http-api/vl... for an example
wsxiaoys | 1 year ago | on: Tabby: Self-hosted AI coding assistant
Example: https://demo.tabbyml.com/search/how-to-configure-sso-in-tabb...
Settings page: https://demo.tabbyml.com/settings/providers/doc
wsxiaoys | 1 year ago | on: Tabby: Self-hosted AI coding assistant
Tabby has undergone significant development since its launch two years ago [0]. It is now a comprehensive AI developer platform featuring code completion and a codebase chat, with a team [1] / enterprise focus (SSO, Access Control, User Authentication).
Tabby's adopters [2][3] have found it to be the only platform offering a fully self-service onboarding experience as an on-prem product. It also delivers performance that rivals other options in the market. If you're curious, I encourage you to give it a try!
[1]: https://demo.tabbyml.com/search/how-to-add-an-embedding-api-...
[2]: https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ
[3]: https://www.linkedin.com/posts/kelvinmu_last-week-i-introduc...
wsxiaoys | 1 year ago | on: Rank Fusion for improved code context in RAG
We're satisfied with the quality and performance this approach yields, while still keeping everything embedded in Tabby's single binary.
[1] My binary vector search is better than your FP32 vectors: https://blog.pgvecto.rs/my-binary-vector-search-is-better-th...
[2] Tantivy: https://github.com/quickwit-oss/tantivy
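For readers unfamiliar with rank fusion: the standard reciprocal rank fusion (RRF) formula combines ranked lists from different retrievers by summing 1/(k + rank) per document. A generic sketch (illustrative only, not Tabby's exact code):

```typescript
// Reciprocal rank fusion: merge several ranked lists of document ids into
// one, rewarding documents that rank highly in multiple lists.

function reciprocalRankFusion(
  rankings: string[][], // each inner array: doc ids, best first
  k = 60,               // conventional damping constant for RRF
): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, index) => {
      const rank = index + 1;
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}
```

Its appeal here is that it needs only ranks, not comparable scores, so a BM25-style lexical ranking and a vector-similarity ranking can be fused without any score normalization.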
wsxiaoys | 1 year ago | on: Ask HN: Who is hiring? (May 2024)
Tabby strives to become the AI Intelligence Stack for the entire development lifecycle. We are a fully distributed, all-remote team.
Our tech stack includes:
* Frontend: TypeScript, React, Next.js
* Backend: Rust, GraphQL
* IDE/Extension: TypeScript, Node.js
* Tools: GitHub, Slack, Linear, Lark
Please apply here: https://tabbyml.vercel.app/
wsxiaoys | 1 year ago | on: Google CodeGemma: Open Code Models Based on Gemma [pdf]
Blog post on repository context: https://tabby.tabbyml.com/blog/2023/10/16/repository-context...
(Disclaimer: I started this project)
wsxiaoys | 2 years ago | on: Show HN: CodeDiagram – VSCode extension for quickly making code flow diagrams
It'd be really nice if the code diagram itself could be version-tracked together with the code.
wsxiaoys | 2 years ago | on: Show HN: Tabby back end in 20 Python lines (self-hosted AI coding assistant)
Yeah - ultimately, it won't be as performant or feature-rich compared to https://github.com/TabbyML/tabby, but it's still perfect for educational purposes!
I wrote a 4-part series on how we built the AI edit model behind Pochi’s coding agent.
It covers everything from real-time context management and request lifecycles to dynamically rendering code edits using only VS Code’s public APIs.
I’ve written this as openly and concretely as possible, with implementation details and trade-offs.
Full series:
1. The Edit Model Behind Tab Completion: https://docs.getpochi.com/developer-updates/how-we-created-n...
2. Real-Time Context Management in Your Code Editor: https://docs.getpochi.com/developer-updates/context-manageme...
3. Request Management Under Continuous Typing: https://docs.getpochi.com/developer-updates/request-manageme...
4. Dynamic Rendering Strategies for AI Code Edits: https://docs.getpochi.com/developer-updates/dynamic-renderin...