
Gemma 2 on AWS Lambda with Llamafile

10 points | metaskills | 1 year ago | unremarkable.ai

3 comments


metaskills | 1 year ago

A small experiment to see if we are there yet with highly virtualized CPU compute and Small Language Models (SLMs). The answer is a resounding maybe, but most likely not. Huge thanks to Justine for her work on Llamafile, supported by Mozilla. Hope folks find this R&D useful.

noman-land | 1 year ago

Can you expand a bit on why not?

Does it produce bad results? Is it slow to respond? Slow to load?

I've been wanting to play around with llamafile-based edge functions but storing even small models in GitHub (for automated deploys) is a terrible and often impossible experience.
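One common workaround (a sketch, not something from the post) is to fetch the weights while building the Lambda container image instead of committing them to the repository, so git never holds the model at all. The URLs and file names below are illustrative placeholders, not real endpoints:

```dockerfile
# Sketch: Lambda container image that downloads the llamafile runtime
# and model weights at build time, keeping them out of git entirely.
# Both URLs below are placeholders, not taken from the post.
FROM public.ecr.aws/lambda/provided:al2023

# Downloaded during `docker build`, cached as an image layer.
ADD https://example.com/llamafile-<version> /opt/llamafile
ADD https://example.com/gemma-2-2b-it.Q4_K_M.gguf /opt/model.gguf

RUN chmod +x /opt/llamafile
# A bootstrap script would then invoke something like:
#   /opt/llamafile -m /opt/model.gguf ...
```

Since Lambda container images can be up to 10 GB, a small quantized GGUF plus the runtime fits comfortably, and automated deploys only need the Dockerfile in the repo.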

xhkkffbf | 1 year ago

This is great work. Has anyone used it enough to compare the Lambda costs with the cost of running a comparable model on, say, OpenAI?