h14h | 5 days ago
Using Vultr for the VPS hosting, as well as their inference product, which AFAIK is by far the cheapest option for hosting models of this class ($10/mo for 50M tokens, and $0.20/M tokens after that). They also offer Vector Storage as part of their inference subscription, which makes it very convenient to get inference + durable memory & RAG with a single API key.
Their inference product is currently in beta, so not sure whether the price will stay this low for the long haul.
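To make the "single API key" point concrete, here's a minimal sketch of a chat call. The API looks OpenAI-compatible, but the base URL path and response shape here are assumptions based on that convention, not verified against Vultr's current docs:

```python
# Hedged sketch: one key for inference (and, per the docs, the RAG/vector
# endpoints on the same base URL). Paths/response shape assume an
# OpenAI-compatible API -- check Vultr's docs for the exact schema.
import json
import os
import urllib.request

BASE = "https://api.vultrinference.com/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble a single-turn OpenAI-style chat completion payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(api_key: str, model: str, prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE}/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat(os.environ["VULTR_INFERENCE_API_KEY"],
               "kimi-k2-instruct", "Say hi in five words."))
```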
ac29 | 5 days ago
What other models do they offer? The web page is very light on details
h14h | 4 days ago
And yeah, given Vultr inference is in beta, their docs ain't great. In addition to kimi-k2-instruct and gpt-oss-120b, they currently offer:
deepseek-r1-distill-llama-70b
deepseek-r1-distill-qwen-32b
qwen2.5-coder-32b-instruct
Best way to get accurate up-to-date info on supported models is via their api: https://api.vultrinference.com/#tag/Models/operation/list-mo...
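A quick sketch of querying that models endpoint programmatically. The exact path and the field holding the model list are assumptions (I've seen both "data" and "models" in OpenAI-style APIs), so the parser checks both:

```python
# Hedged sketch: fetch the live model list instead of trusting the docs.
# Endpoint path and payload shape are assumptions; see the linked API
# reference for the real schema.
import json
import os
import urllib.request

def extract_ids(payload: dict) -> list:
    """Pull model ids out of either a {"models": [...]} or {"data": [...]} payload."""
    return [m["id"] for m in payload.get("models", payload.get("data", []))]

def list_models(api_key: str) -> list:
    """GET the supported-models endpoint and return the model ids."""
    req = urllib.request.Request(
        "https://api.vultrinference.com/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_ids(json.load(resp))

if __name__ == "__main__":
    print(list_models(os.environ["VULTR_INFERENCE_API_KEY"]))
```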
K2 is the only one of the five that supports tool calling. In my testing, all five support RAG, but K2 loses knowledge of its registered tools when you access it through the RAG endpoint, forcing you to pick one capability or the other (I have a ticket open for this).
Also, the R1-distill models are annoying to use because reasoning tokens are included in the output wrapped in <think> tags instead of being parsed into the "reasoning_content" field on responses. And gpt-oss-120b has a "reasoning" field instead of "reasoning_content" like the R1 models.
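A client-side workaround I've used for these inconsistencies: normalize every response message into one shape, regardless of where the model put its reasoning. This assumes OpenAI-style message dicts; the field names are the ones mentioned above:

```python
# Hedged sketch: normalize reasoning across three observed conventions --
# inline <think> tags (R1 distills), a "reasoning" field (gpt-oss-120b),
# and a "reasoning_content" field (the usual convention).
import re

THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def normalize(message: dict) -> dict:
    """Return {"content", "reasoning"} with reasoning pulled into one place."""
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content") or message.get("reasoning")
    m = THINK_RE.search(content)
    if m and not reasoning:
        reasoning = m.group(1).strip()
        content = THINK_RE.sub("", content, count=1).strip()
    return {"content": content, "reasoning": reasoning}
```

With this in front of the API, downstream code never has to care which of the five models produced the response.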