botirk's comments
botirk | 1 month ago | on: Nordlys Hypernova: 75.6% on SWE-Bench
botirk | 1 month ago | on: Show HN: Minikv – Distributed key-value and object store in Rust (Raft, S3 API)
botirk | 5 months ago | on: Show HN: I wrote an OS in 1000 lines of Zig
botirk | 5 months ago
For a while, Container Apps worked fine. Then we launched our AI model router demo, and everything changed.
In just two days, we spent over $250 on GPU compute. Two uni students, a side project, and suddenly we were paying production-level bills.
Autoscaling was slow. Cold starts were bad. Costs were unpredictable.
Then I watched a talk from one of Modal’s founders about GPU infra. We gave Modal a try.
Now we’re running the same workloads for under $100, with fast autoscaling and no lag.
Azure was stable, but Modal gave us speed, control, and real cost efficiency.
Anyone else switch from Azure (or AWS/GCP) to Modal for AI workloads? What was your experience?
botirk | 5 months ago
I needed to extend the OpenAI SDK / Anthropic SDK types with some extra fields.
In most languages, this is trivial.
In Go, it meant:
→ Embedding the original struct and hoping it wouldn’t break with the next SDK release.
→ Or recreating the types entirely, just to add fields.
That feels painful for something so basic.
But here’s the twist.
I also love how Go won’t let me build endless inheritance hierarchies or clever “extension” tricks that make a codebase unreadable.
The rigidity forces simplicity.
The problem is sometimes it becomes too simple.
When I want type-specific extensions, Go makes me fight the language instead of working with it.
That’s why I both hate and love Go’s type system.
It keeps my code clean — but makes it harder to grow.
botirk | 5 months ago | on: Lessons from building an intelligent LLM router
Our first attempt was to just use a large LLM itself to decide routing. It was too costly and the decisions were unreliable.
Next we tried training a small fine-tuned LLM as a router. It was cheaper, but the outputs were poor and not trustworthy.
Then we wrote heuristics to map prompt types to model IDs. That worked for a while, but it was brittle. Every API change or workload shift broke it.
Eventually we shifted to thinking in terms of model criteria instead of hardcoded model IDs. We benchmarked models across task types, domains, and complexity levels, and made routing decisions based on those profiles.
To estimate task type and complexity, we used NVIDIA’s Prompt Task and Complexity Classifier. It classifies prompts into categories like QA, summarization, code generation, and more. It also scores prompts along six dimensions such as creativity, reasoning, domain knowledge, contextual knowledge, constraints, and few-shots. From this it produces a weighted overall complexity score.
This gave us a structured way to decide when a prompt justified a premium model like Claude Opus 4.1 and when a smaller model like GPT-5-mini would perform just as well.
Now we are working on integrating this with Google’s UniRoute (https://arxiv.org/abs/2502.08773).
botirk | 5 months ago
It started as a project to understand Redis internals.
It turned into a complete implementation with real-world features.
Key features:
→ Full RESP protocol compatibility (works with redis-cli and Redis clients)
→ Master-slave replication with PSYNC
→ Streams support (XADD, XRANGE, XREAD)
→ Transactions (MULTI / EXEC / DISCARD)
→ Thread-safe concurrent handling with read-write locks
→ RDB persistence format
→ 256 tests with 100% pass rate
The focus was on memory safety, proper cleanup, and thread safety. The code is clean C with a modular architecture so you can actually follow how things work.
This could be useful as:
→ A learning resource for anyone curious about Redis internals
→ A lightweight alternative when you need Redis compatibility without the full Redis overhead
I would love feedback on the architecture, threading model, and implementation details.
botirk | 5 months ago
The problem is cost: running everything directly through Anthropic gets expensive fast.
We built Adaptive, a model routing platform that integrates with Claude Code as a drop-in replacement for the Claude API.
You keep the exact same Claude Code workflow, but Adaptive routes requests intelligently across models to cut costs by 60–80% while maintaining performance.
Setup takes one script install. Docs: https://docs.llmadaptive.uk/developer-tools/claude-code
botirk | 5 months ago | on: Show HN: I wrote an OS in 1000 lines of Zig
So I built one in Zig, keeping the whole thing under 1000 lines of code.
It can:
→ Boot from GRUB
→ Manage memory
→ Schedule simple tasks
→ Output text to VGA
The point was not to make it feature-rich, but to show how much is possible in under a thousand lines if you strip everything down to the essentials.
botirk | 6 months ago | on: Show HN: Semanticcache – A high-performance semantic caching library for Go
For example, Claude Opus 4.5 solves the most tasks overall, but a significant number of tasks it fails are solved by other models like Sonnet or Gemini. The reverse is also true. This suggests strong task-level specialization that a single-model baseline cannot exploit.
We built a simple routing system to test this idea. Instead of training a new foundation model, we embed each problem description, assign it to a semantic cluster learned from a separate general coding dataset, and route the task to the model with the highest historical success rate in that cluster.
Using this approach, the system exceeds single-model baselines on SWE-Bench Verified (75.6% versus ~74% for the best individual model).
A few clarifications up front: we did not train on SWE-Bench problems or patches. Clusters are derived from general coding data, not from SWE-Bench. SWE-Bench is used only to estimate per-cluster model success rates. At inference time, routing uses only the problem description and historical cluster statistics, with no repo execution or test-time search.
The main takeaway is not the absolute number, but the mechanism. Leaderboard aggregates hide complementary strengths between models, and even simple routing can capture a higher performance ceiling than any single model.