h14h | 5 days ago
Using Vultr for the VPS hosting, as well as their inference product, which AFAIK is by far the cheapest option for hosting models of this class ($10/mo for 50M tokens, and $0.20/M tokens after that). They also offer Vector Storage as part of their inference subscription, which makes it very convenient to get inference + durable memory & RAG with a single API key.
Their inference product is currently in beta, so not sure whether the price will stay this low for the long haul.
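To make the "single API key" point concrete, here's a minimal sketch of a chat call. The API looks OpenAI-compatible, but the base URL path and response shape here are assumptions based on that convention, not verified against Vultr's current docs:

```python
# Hedged sketch: one key for inference (and, per the docs, the RAG/vector
# endpoints on the same base URL). Paths/response shape assume an
# OpenAI-compatible API -- check Vultr's docs for the exact schema.
import json
import os
import urllib.request

BASE = "https://api.vultrinference.com/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble a single-turn OpenAI-style chat completion payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(api_key: str, model: str, prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat(os.environ["VULTR_INFERENCE_API_KEY"],
               "kimi-k2-instruct", "Say hi in five words."))
```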
ac29 | 5 days ago
What other models do they offer? The web page is very light on details
h14h | 4 days ago
And yeah, given Vultr inference is in beta, their docs ain't great. In addition to kimi-k2-instruct and gpt-oss-120b, they currently offer:
deepseek-r1-distill-llama-70b
deepseek-r1-distill-qwen-32b
qwen2.5-coder-32b-instruct
Best way to get accurate up-to-date info on supported models is via their api: https://api.vultrinference.com/#tag/Models/operation/list-mo...
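A quick sketch of querying that models endpoint programmatically. The exact path and the field holding the model list are assumptions (I've seen both "data" and "models" in OpenAI-style APIs), so the parser checks both:

```python
# Hedged sketch: fetch the live model list instead of trusting the docs.
# Endpoint path and payload shape are assumptions; see the linked API
# reference for the real schema.
import json
import os
import urllib.request

def extract_ids(payload: dict) -> list:
    """Pull model ids out of either a {"models": [...]} or {"data": [...]} payload."""
    return [m["id"] for m in payload.get("models", payload.get("data", []))]

def list_models(api_key: str) -> list:
    """GET the supported-models endpoint and return the model ids."""
    req = urllib.request.Request(
        "https://api.vultrinference.com/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_ids(json.load(resp))

if __name__ == "__main__":
    print(list_models(os.environ["VULTR_INFERENCE_API_KEY"]))
```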
K2 is the only one of the five that supports tool calling. In my testing, all five support RAG, but K2 loses knowledge of its registered tools when you access it through the RAG endpoint, forcing you to pick one capability or the other (I have a ticket open for this).
Also, the R1-distill models are annoying to use because reasoning tokens are included in the output wrapped in <think> tags instead of being parsed into the "reasoning_content" field on responses. And gpt-oss-120b has a "reasoning" field instead of "reasoning_content" like the R1 models.
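A client-side workaround I've used for these inconsistencies: normalize every response message into one shape, regardless of where the model put its reasoning. This assumes OpenAI-style message dicts; the field names are the ones mentioned above:

```python
# Hedged sketch: normalize reasoning across three observed conventions --
# inline <think> tags (R1 distills), a "reasoning" field (gpt-oss-120b),
# and a "reasoning_content" field (the usual convention).
import re

THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def normalize(message: dict) -> dict:
    """Return {"content", "reasoning"} with reasoning pulled into one place."""
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content") or message.get("reasoning")
    m = THINK_RE.search(content)
    if m and not reasoning:
        reasoning = m.group(1).strip()
        content = THINK_RE.sub("", content, count=1).strip()
    return {"content": content, "reasoning": reasoning}
```

With this in front of the API, downstream code never has to care which of the five models produced the response.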