behohippy
|
6 months ago
|
on: Benchmark Framework Desktop Mainboard and 4-node cluster
Yeah 48g, sub 200W seems like a sweet spot for a single card setup. Then you can stack as deep as you want to get the size of model you want for whatever you want to pay for the power bill.
behohippy
|
6 months ago
|
on: Ask HN: Do you think differently about working on open source these days?
Sure, all the slop code projects I produce get MIT licensed on public repos. It wasn't mine to begin with, so I wouldn't prevent anyone from using it.
behohippy
|
7 months ago
|
on: Benchmark Framework Desktop Mainboard and 4-node cluster
Used 3090s have been getting expensive in some markets. Another option is dual 5060ti 16 gig. Mine are lower powered, single 8 pin power, so they max out around 180W. With that I'm getting 80t/s on the new qwen 3 30b a3b models, and around 21t/s on Gemma 27b with vision. Cheap and cheerful setup if you can find the cards at MSRP.
behohippy
|
9 months ago
|
on: Deepseek R1-0528
About 768 gigs of ddr5 RAM in a dual socket server board with 12 channel memory and an extra 16 gig or better GPU for prompt processing. It's a few grand just to run this thing at 8-10 tokens/s
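The 8-10 tokens/s figure falls out of simple memory-bandwidth arithmetic. A rough sketch, assuming DDR5-4800, 12 channels, DeepSeek R1's ~37B active parameters per token, and ~0.55 bytes/weight for a Q4-style quant (all of these are assumed round numbers, not measurements):

```python
# Back-of-envelope decode speed for a big MoE model running from system RAM.
# Every number here is an assumption, not a measurement.

channels = 12          # memory channels feeding the CPU
mts = 4800             # DDR5-4800 transfer rate (MT/s)
bus_bytes = 8          # 64-bit channel width
bandwidth = channels * mts * 1e6 * bus_bytes / 1e9  # GB/s, theoretical peak

active_params = 37e9   # MoE: ~37B params touched per generated token
bytes_per_param = 0.55 # roughly Q4 quantization including overhead

gb_per_token = active_params * bytes_per_param / 1e9
peak_tps = bandwidth / gb_per_token
realistic_tps = peak_tps * 0.4  # assume ~40% of theoretical bandwidth in practice

print(f"{bandwidth:.0f} GB/s peak, ~{realistic_tps:.0f} tokens/s realistic")
```

With those assumptions it lands right in the 8-10 tokens/s range.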
behohippy
|
11 months ago
|
on: DeepSeek-V3 Technical Report
These articles are gold, thank you. I used your gemma one from a few weeks back to get gemma 3 performing properly. I know you guys are all GPU but do you do any testing on CPU/GPU mixes? I'd like to see the pp and t/s on pure 12 channel epyc and the same with using a 24 gig gpu to accelerate the pp.
behohippy
|
1 year ago
|
on: Building a personal, private AI computer on a budget
I run the KV cache at Q8 even on that model. Is it not working well for you?
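The appeal of a Q8 KV cache is that it halves cache memory versus fp16 with very little quality loss. A rough sizing sketch (the layer/head/dim numbers below are illustrative, not any specific model's):

```python
# Rough KV-cache size: 2 tensors (K and V) per layer, with
# n_kv_heads * head_dim values stored per token per tensor.

def kv_cache_gb(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_val):
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_val / 1e9

ctx = 32768
fp16 = kv_cache_gb(60, 8, 128, ctx, 2)  # fp16 = 2 bytes per value
q8   = kv_cache_gb(60, 8, 128, ctx, 1)  # Q8 is roughly 1 byte per value

print(f"fp16: {fp16:.1f} GB, Q8: {q8:.1f} GB")
```

That saved VRAM goes straight back into model size or context length.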
behohippy
|
1 year ago
|
on: Building a personal, private AI computer on a budget
Qwen is a little fussy about the sampler settings, but it does run well quantized. If you were getting infinite repetition loops, try dropping the top_p a bit. I think qwen likes lower temps too
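To see why dropping top_p helps: nucleus sampling keeps only the smallest set of tokens whose cumulative probability reaches top_p, so a lower value cuts off more of the low-probability tail. A toy illustration (not llama.cpp's actual implementation, and the token probabilities are made up):

```python
# Toy nucleus (top_p) filter: keep the highest-probability tokens until
# their cumulative probability reaches top_p. Lowering top_p shrinks the
# candidate pool, which can help break degenerate repetition loops.

def top_p_filter(probs, top_p):
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for tok, p in ranked:
        kept.append(tok)
        total += p
        if total >= top_p:
            break
    return kept

probs = {"the": 0.5, "a": 0.25, "cat": 0.125, "zxq": 0.125}
print(top_p_filter(probs, 0.8))  # ['the', 'a', 'cat']
print(top_p_filter(probs, 0.6))  # ['the', 'a']
```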
behohippy
|
1 year ago
|
on: Building a personal, private AI computer on a budget
You probably won't be running fp16 anything locally. We typically run Q5 or Q6 quants to maximize the size of the model and context length we can run with the VRAM we have available. The quality loss is negligible at Q6.
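The VRAM math is straightforward: weight size is parameter count times bits per weight. A quick sketch, using approximate llama.cpp-style bits-per-weight averages for the common quant levels:

```python
# Rough weight-size estimate at different quantization levels.
# The bits-per-weight figures are approximate averages, not exact.

def weight_gb(n_params_b, bits_per_weight):
    return n_params_b * 1e9 * bits_per_weight / 8 / 1e9

for name, bpw in [("fp16", 16), ("Q6_K", 6.6), ("Q5_K_M", 5.7), ("Q4_K_M", 4.8)]:
    print(f"27B at {name}: {weight_gb(27, bpw):.1f} GB")
```

A 27B model at fp16 is ~54 GB of weights alone, but at Q6 it drops to ~22 GB, which fits on a 24 GB card with room left for KV cache.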
behohippy
|
1 year ago
|
on: Ask HN: Is anyone doing anything cool with tiny language models?
It's a 3b model so the creativity is pretty limited. What helped for me was prompting for specific stories in specific styles. I have a python script that randomizes the prompt and the writing style, including asking for specific author styles.
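The randomizer described above can be sketched in a few lines; the genre, style, and author lists here are made up for illustration:

```python
import random

# Build a randomized story prompt from genre, prose style, and author style.
GENRES  = ["noir mystery", "space opera", "cozy fantasy", "ghost story"]
STYLES  = ["sparse and punchy", "lush and descriptive", "dry and ironic"]
AUTHORS = ["Raymond Chandler", "Ursula K. Le Guin", "P. G. Wodehouse"]

def random_story_prompt(rng=random):
    genre = rng.choice(GENRES)
    style = rng.choice(STYLES)
    author = rng.choice(AUTHORS)
    return (f"Write a short {genre} story. Keep the prose {style}, "
            f"in the style of {author}.")

print(random_story_prompt())
```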
behohippy
|
1 year ago
|
on: Ask HN: Is anyone doing anything cool with tiny language models?
I have a mini PC with an n100 CPU connected to a small 7" monitor sitting on my desk, under the regular PC. I have llama 3b (q4) generating endless stories in different genres and styles. It's fun to glance over at it and read whatever it's in the middle of making. I gave llama.cpp one CPU core and it generates slow enough to just read at a normal pace, and the CPU fans don't go nuts. Totally not productive or really useful but I like it.
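A sketch of that setup: pin llama.cpp to a single thread so generation stays at reading speed, and loop over randomized prompts. The binary name, model path, and prompts below are placeholders; the flags are llama.cpp's standard -m/-p/-t/-n:

```python
import random
import subprocess

# Endless-story driver sketch: one llama.cpp thread keeps output at
# reading pace and the CPU fans quiet. Paths and prompts are placeholders.
PROMPTS = [
    "Write a short ghost story set in a lighthouse.",
    "Write a short space opera about a cargo pilot.",
]

def build_cmd(model_path, prompt, threads=1, n_predict=2048):
    # llama.cpp CLI: -m model, -p prompt, -t threads, -n tokens to generate
    return ["./llama-cli", "-m", model_path, "-p", prompt,
            "-t", str(threads), "-n", str(n_predict)]

def run_one_story(model_path):
    # subprocess.run streams tokens to the terminal as they generate;
    # wrap this call in `while True:` for the endless version.
    subprocess.run(build_cmd(model_path, random.choice(PROMPTS)))

print(" ".join(build_cmd("models/llama-3.2-3b-q4.gguf", PROMPTS[0])))
```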
behohippy
|
1 year ago
|
on: Phi-3 Technical Report
I had this same issue with incomplete answers on longer summarization tasks. If you ask it to "go on" it will produce a better completion, but I haven't seen this behaviour in any other model.
behohippy
|
2 years ago
|
on: Amazon-Unveils-Q
No joke, that would be an awesome LLM project name!
behohippy
|
2 years ago
|
on: Amazon-Unveils-Q
Top_p and top_k are pretty important concepts for LLMs, same as temperature, so P, K, C and F are underutilized.
behohippy
|
2 years ago
|
on: OpenLLaMA 13B Released
Hey emad, thanks for SD and this! What's the plan if Meta does Apache 2.0 for LLaMA? Just keep going and making the 30b and 65b or build different models?
behohippy
|
2 years ago
|
on: RedPajama 7B (an Apache 2.0-licensed LLaMa) is now available
Vicuna-13b (4bit) got the answer right, the first time as well.
behohippy
|
3 years ago
|
on: Tell HN: Impact of using 1 or 2 sticks of DDR5 on a 6800HX with 680M IG
I've noticed the same with my Asus TUF laptops. I've had 2 generations of the 15" models with Ryzen processors, and adding a second stick seemed to "wake" them up in a noticeable way. The Ryzen memory controller really seems to benefit from running in dual-channel mode (2 sticks).
behohippy
|
3 years ago
|
on: Do heat pumps work in cold climates?
My dad builds houses in northern Ontario, and heat pumps seem to be getting more popular on new builds. This is a place that regularly gets below -30C in the winter. The heat pump (usually a heat pump/AC combo unit) by itself doesn't work in those conditions; you'll always pair it with another system like a natural gas furnace or resistive electric heat.
My personal heat pump is an older unit: it works down to -10C, then the forced-air electric (resistive) furnace kicks in. That thing is pricey to run, so I also light up the wood stove at that temp to reduce the costs.