fexelein's comments

fexelein | 6 months ago | on: Qwen3 30B A3B Hits 13 token/s on 4xRaspberry Pi 5

So I am running Ollama on Windows using a 10700K and a 3080 Ti. I'm using models like Qwen3-coder (4/8B) and 2.5-coder 15B, Llama 3 Instruct, etc. These models are very fast on my machine (~25-100 tokens per second, depending on the model).

My use case is custom software that I build and host that leverages LLMs, for example for home automation, where I use Apple Watch shortcuts to issue commands. I also created a VS2022 extension called Bropilot to replace Copilot with my locally hosted LLMs. Currently I'm looking at fine-tuning these types of models for my work as a senior dev in finance.

fexelein | 6 months ago | on: Qwen3 30B A3B Hits 13 token/s on 4xRaspberry Pi 5

I’m having a lot of fun using less capable versions of models on my local PC, integrated as a code assistant. There is still real value there, but also plenty of room for improvement. I envision us all running specialized lightweight LLMs locally/on-device at some point.

fexelein | 1 year ago | on: Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU

On my machine, I am able to create a prompt that suits my needs and chat with the model in real time. With 100% GPU offload, it replies within half a second. LM Studio provides an OpenAI-compatible API endpoint for my .NET software to use. This boosts my developer experience significantly. The Azure services are slow, and if you want to regenerate a series of responses (e.g. part of a conversation flow) it just takes too long. On my local machine I also don't worry about cloud costs.
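Since LM Studio's local server speaks the OpenAI chat-completions wire format, any HTTP client can use it. A minimal Python sketch of building the request and parsing the reply (the base URL and model name are assumptions; use whatever LM Studio shows for your loaded model):

```python
import json

# LM Studio's local server commonly listens on localhost:1234 (assumed
# default; check the "Local Server" tab in LM Studio for yours).
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt, model="llama-3-70b-instruct"):
    """Build the JSON body for a /chat/completions call.

    "llama-3-70b-instruct" is a placeholder model identifier.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,
    }

def extract_reply(response_body):
    """Pull the assistant's text out of an OpenAI-style response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]

# Sending it is an ordinary HTTP POST, e.g. with urllib:
#   req = urllib.request.Request(
#       f"{BASE_URL}/chat/completions",
#       data=json.dumps(build_chat_request("Hello")).encode(),
#       headers={"Content-Type": "application/json"})
#   reply = extract_reply(urllib.request.urlopen(req).read())
```

Because the endpoint is OpenAI-compatible, the same shape works from .NET or any other stack with an HTTP client.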

As a bonus, I also use this for a personal project where I use prompts and Llama 3 to control smart devices. JSON responses from the LLM are parsed and translated into smart-device commands on a Raspberry Pi. I control it using speech via my Apple Watch and Apple Shortcuts, which call the Raspberry Pi's API. It all works magically and fast, way faster than pulling up the app on my phone. And yes, the LLM is smart enough to control groups of devices using simple conversational AI.

Edit: here's a demo: https://www.youtube.com/watch?v=dCN1AnX8txM
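The JSON-to-command step described above can be sketched in a few lines of Python. The schema here ("action", "devices", "brightness") is a made-up example, not the actual format used in the project:

```python
import json

def parse_llm_reply(reply_text):
    """Turn an LLM's JSON reply into a list of smart-device commands.

    Expects a reply like:
      {"action": "on", "devices": ["kitchen_light"], "brightness": 80}
    where the field names are hypothetical.
    """
    data = json.loads(reply_text)
    commands = []
    for device in data.get("devices", []):
        cmd = {"device": device, "action": data["action"]}
        if "brightness" in data:
            cmd["brightness"] = data["brightness"]
        commands.append(cmd)
    return commands

reply = '{"action": "on", "devices": ["kitchen_light", "hall_light"], "brightness": 80}'
for cmd in parse_llm_reply(reply):
    # In the real setup, each command would be forwarded to the
    # smart-device API from the Raspberry Pi here.
    print(cmd)
```

Asking the model for structured JSON rather than free text is what makes group commands work: one reply can list several devices, and the loop fans it out.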

fexelein | 4 years ago | on: Code-First vs. Product-First

As a code-first developer, I never judge my work by code metrics or other abstract measures. What does that even mean? I just cannot agree with most of the things said here.

The definition of great code is not simplicity and test coverage.

The best developers I’ve known have always been code-first developers. They care about the thing they are building, more than just the result. You don’t want a car that just rolls off a slope; you want a car that was pieced together with blood, sweat, tears, and love.
