c6401 | 1 year ago

IMO the simplest option is llamafile (it's multi-platform thanks to the "cosmopolitan" library, so it should run on Windows too, but I haven't tried):

    wget https://huggingface.co/Mozilla/Llama-3.2-1B-Instruct-llamafile/resolve/main/Llama-3.2-1B-Instruct.Q6_K.llamafile
    chmod +x Llama-3.2-1B-Instruct.Q6_K.llamafile
    ./Llama-3.2-1B-Instruct.Q6_K.llamafile --server

c6401 | 1 year ago

It has a web UI, but this is how I use it from Python (sorry, I like Python, but a similar connection method should work from other languages too).

    import openai

    # Point the OpenAI client at the local llamafile server (no real key needed)
    ai = openai.AsyncOpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")
    response = await ai.chat.completions.create(
        messages=[
            {"role": "system", "content": "..."},
            {"role": "user", "content": "..."},
        ],
        max_tokens=100,
        model="Llama-3.2-1B-Instruct.Q6_K.gguf",
    )

    content = response.choices[0].message.content
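If you'd rather not install the openai package, the server speaks plain HTTP, so the standard library is enough. A minimal sketch, assuming the llamafile server is on its default port 8080 and exposes the usual OpenAI-compatible `/v1/chat/completions` route (the base URL and model name here mirror the example above; adjust to your setup):

```python
import json
import urllib.request

# Assumed llamafile server address (default port 8080)
BASE_URL = "http://localhost:8080/v1"

def build_chat_request(messages, model, max_tokens=100):
    """Build an OpenAI-style chat completion request body."""
    return {"model": model, "messages": messages, "max_tokens": max_tokens}

def chat(messages, model="Llama-3.2-1B-Instruct.Q6_K.gguf", max_tokens=100):
    """POST the request to the server and return the reply text."""
    body = json.dumps(build_chat_request(messages, model, max_tokens)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer sk-no-key-required",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage would be `chat([{"role": "user", "content": "Hello"}])` with the server running; the request body is the same JSON the openai client sends under the hood.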