FlyingLawnmower | 5 months ago
We have support for Hugging Face Transformers, llama.cpp, vLLM, SGLang, and TensorRT-LLM, along with some smaller providers (e.g. mistral.rs). Using any of these libraries as an inference host means you can use an OSS model with the guidance backend for full support. Most open-source models will run on at least one of these backends, with vLLM probably being the most popular hosted solution and Transformers/llama.cpp the most popular local options.
We're also the backend used by OpenAI/Azure OpenAI for structured outputs on the closed source model side.