
Show HN: Python SDK for RamaLama AI Containers

1 point | ersatz_username | 1 month ago | github.com

TL;DR: An SDK for running AI on-device, even on the most non-standard hardware.

Hey, I’m one of the maintainers of RamaLama[1] which is part of the containers ecosystem (podman, buildah, skopeo). It’s a runtime-agnostic tool for coordinating local AI inference with containers.

I put together a Python SDK for programmatic control over local AI, using RamaLama under the hood. Being runtime-agnostic, you can use RamaLama with llama.cpp, vLLM, MLX, etc., so long as the underlying service exposes an OpenAI-compatible endpoint. This is especially powerful for users deploying to edge or other devices with atypical hardware/software configurations that, for example, require custom runtime compilations.

```
from ramalama_sdk import RamalamaModel

runtime_image = "quay.io/ramalama/ramalama:latest"
model_ref = "huggingface://ggml-org/gpt-oss-20b-GGUF"

with RamalamaModel(model_ref, base_image=runtime_image) as model:
    response = model.chat("How tall is Michael Jordan?")
    print(response["content"])
```
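Since the SDK only requires an OpenAI-compatible endpoint behind it, the chat() call above ultimately boils down to a standard chat-completions request. A minimal sketch of that request body (the helper name is mine for illustration, not part of ramalama_sdk):

```
import json

# Illustrative helper, not an SDK function: builds the JSON body for
# POST /v1/chat/completions on any OpenAI-compatible server
# (llama.cpp, vLLM, etc.).
def build_chat_payload(model: str, prompt: str) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_payload("gpt-oss-20b-GGUF", "How tall is Michael Jordan?")
print(json.dumps(payload, indent=2))
```

Any runtime that accepts this shape can sit underneath the SDK, which is what makes the runtime-agnostic part work.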

This SDK manages:

  - Pulling and verifying runtime images
  - Downloading models (HuggingFace, Ollama, ModelScope, OCI registries)
  - Managing the runtime process

It works with air-gapped deployments and private registries, and it also has async support.
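Model sources are distinguished by their URI scheme ("huggingface://" in the example above; the Ollama, ModelScope, and OCI transports follow the same pattern). A toy sketch of splitting such a reference into transport and path (the helper is hypothetical, not SDK API):

```
# Hypothetical helper for illustration: split a model reference like
# "huggingface://ggml-org/gpt-oss-20b-GGUF" into its transport scheme
# and the path the transport should resolve.
def split_model_ref(ref: str) -> tuple[str, str]:
    if "://" not in ref:
        raise ValueError(f"model reference has no transport scheme: {ref!r}")
    scheme, path = ref.split("://", 1)
    return scheme, path

print(split_model_ref("huggingface://ggml-org/gpt-oss-20b-GGUF"))
# -> ('huggingface', 'ggml-org/gpt-oss-20b-GGUF')
```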

If you want to learn more, the documentation is available here: https://docs.ramalama.com/sdk/introduction.

Otherwise, I hope this is useful to people out there, and I would appreciate feedback about where to prioritize next, whether that's specific language support, additional features (speech-to-text? RAG? MCP?), or something else.

1. github.com/containers/ramalama
