fexelein
|
6 months ago
|
on: Qwen3 30B A3B Hits 13 token/s on 4xRaspberry Pi 5
So I am running Ollama on Windows using a 10700K and a 3080 Ti. I'm using models like Qwen3-coder (4/8b) and 2.5-coder 15b, Llama 3 Instruct, etc. These models are very fast on my machine (~25-100 tokens per second, depending on the model).
My use case is custom software that I build and host that leverages LLMs, for example for home automation, where I use my Apple Watch shortcuts to issue commands. I also created a VS2022 extension called Bropilot to replace Copilot with my locally hosted LLMs. I'm currently looking at fine-tuning these types of models for work; I'm a senior dev in finance.
fexelein
|
6 months ago
|
on: Qwen3 30B A3B Hits 13 token/s on 4xRaspberry Pi 5
I’m having a lot of fun using less capable versions of models on my local PC, integrated as a code assistant. There is still real value there, but also plenty of room for improvement. I envision us all running specialized lightweight LLMs locally/on-device at some point.
fexelein
|
1 year ago
|
on: Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU
On my machine, I am able to create a prompt that suits my needs and chat with the model in real time. With 100% GPU offload, it replies within half a second. LM Studio provides an OpenAI-compatible API endpoint for my .NET software to use. This boosts my developer experience significantly. The Azure services are slow, and if you want to regenerate a series of responses (e.g. part of a conversation flow) it just takes too long. On my local machine I also don't worry about cloud costs.
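The nice part of an OpenAI-compatible endpoint is that swapping from a cloud service to a local model is mostly a matter of changing the base URL. A minimal sketch (Python for brevity; the port, model name, and prompt here are assumptions, not taken from my actual setup):

```python
import json

# Hypothetical local endpoint; adjust host/port to wherever the local
# server (e.g. LM Studio) is listening.
BASE_URL = "http://localhost:1234/v1/chat/completions"

def build_request(prompt, model="llama-3-8b-instruct"):
    """Build an OpenAI-style chat completion payload for a local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style response body."""
    return response_json["choices"][0]["message"]["content"]

payload = build_request("Turn on the living room lights")
# POST json.dumps(payload) to BASE_URL with any HTTP client; the response
# body follows the same schema as the hosted APIs, e.g.:
sample = {"choices": [{"message": {"role": "assistant", "content": "OK"}}]}
print(extract_reply(sample))  # -> OK
```

Because the request/response schema matches the hosted APIs, the same client code works against either backend.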
As a bonus: I also use this for a personal project where I use prompts and Llama 3 to control smart devices. JSON responses from the LLM are parsed and translated into smart device commands on a Raspberry Pi. I control it using speech via my Apple Watch, with Apple Shortcuts calling the Raspberry Pi's API. It all works like magic, and fast. Way faster than pulling up the app on my phone. And yes, the LLM is smart enough to control groups of devices using simple conversational AI.
Edit: here's a demo
https://www.youtube.com/watch?v=dCN1AnX8txM
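The parsing step described above can be sketched roughly like this: prompt the LLM to reply in a fixed JSON schema, then map the parsed command onto device calls. The schema, action names, and device names below are hypothetical, not the ones from my actual project:

```python
import json

# Registry mapping action names to handler functions.
HANDLERS = {}

def handler(action):
    """Decorator that registers a function for a given action name."""
    def register(fn):
        HANDLERS[action] = fn
        return fn
    return register

@handler("set_power")
def set_power(cmd):
    """Translate a power command into per-device instructions."""
    state = "on" if cmd["on"] else "off"
    return [f"{device}: power {state}" for device in cmd["devices"]]

def dispatch(llm_reply):
    """Parse the LLM's JSON reply and invoke the matching handler."""
    cmd = json.loads(llm_reply)
    return HANDLERS[cmd["action"]](cmd)

# Example LLM reply covering a group of devices in one command:
reply = '{"action": "set_power", "devices": ["lamp_hall", "lamp_desk"], "on": true}'
for line in dispatch(reply):
    print(line)
```

Keeping the LLM's output constrained to a small JSON schema is what makes the "groups of devices" part easy: the model just lists several device IDs in one command.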
fexelein
|
1 year ago
|
on: Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU
As a cloud solution developer who has to build AI on Azure, I have been using this instead of Azure OpenAI. It has sped up my development workflow a lot, and for my purposes it's comparable enough. I'm using LM Studio to load these models.
fexelein
|
2 years ago
|
on: Ask HN: How can I back up an old vBulletin forum without admin access?
It seems to me like you don’t own this data. If you want to preserve it, try asking the owner again?
fexelein
|
3 years ago
|
on: Your website should work without JavaScript (2021)
As recently as 2016 I was building some sites without any JavaScript. These weren’t small sites either. You can achieve a lot using some basic forms. It was quite fun.
fexelein
|
4 years ago
|
on: Show HN: 3D model file thumbnails for Windows Explorer
Hope it’s fast. I’m using a similar plugin that really is fast, but so many others are not!
fexelein
|
4 years ago
|
on: My 2022 high-end Linux PC
I’m sad that a high-end PC in 2022 is made from 2018 parts (GPU).
fexelein
|
4 years ago
|
on: James Webb Space Telescope launch [video]
Cool link, thanks.
fexelein
|
4 years ago
|
on: Implementing RSA in Python from Scratch
just make sure nobody ever uses it ;)
fexelein
|
4 years ago
|
on: Twitch source code and customer data has reportedly been leaked
Why not?
fexelein
|
4 years ago
|
on: Use of artificial intelligence for image analysis in breast cancer screening
It depends on the accuracy of the humans as well. Less accurate doesn’t mean inaccurate.
fexelein
|
4 years ago
|
on: Code-First vs. Product-First
As a code-first developer, I never judge my work by code metrics or other abstract measures. What does that even mean? I just cannot agree with most of the things said here.
The definition of great code is not simplicity and test coverage.
The best developers I’ve known have always been code-first developers. They care about the thing they are building, more than just the result. You don’t want a car that just rolls off a slope. You want a car that was pieced together with blood, sweat, tears and love.
fexelein
|
4 years ago
|
on: Show HN: A C# library to help you enforce a Given-When-Then structured Unit test
The examples look like a lot more work: declaring the ingredients and naming them into tuple variables, having to return the tuple type itself, three lambdas in there.
What benefit do these abstractions give me over simply coding Arrange/Act/Assert?