solarkraft | 1 day ago
Up until relatively recently, while people had long been making these claims, they came with the asterisk of „oh, but you can’t practically use more than a few K tokens of context“.
derekp7|1 day ago
Qwen 3.5 122b/a10b (at q3 using unsloth's dynamic quant) is so far the first model I've tried locally that gets a really usable RPN calculator app. Other models (even larger ones that I can run on my Strix Halo box) tend to either not implement the stack right, have non-functional operation buttons, or most commonly the keypad looks like a Picasso painting (i.e., the 10-key pad portion has buttons missing or mapped all over the keypad area).
This seems like such a simple test, but I even just tried it in ChatGPT (whatever model they serve up when you don't log in), and the result didn't even have any numerical input buttons. Claude Sonnet 4.6 did get it right, but that is the only other model I've used that passes this test.
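For reference, the stack discipline the test exercises is small enough to sketch in a few lines. This is a minimal illustration of RPN evaluation semantics, not the app the commenter asked the models for; the operator set and lack of error handling are simplifying assumptions.

```python
# Minimal sketch of the stack semantics an RPN calculator must get right.
# Operands push onto the stack; an operator pops two, pushes the result.
def rpn_eval(tokens):
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()  # right operand comes off the stack first
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack[-1]

print(rpn_eval(["5", "1", "2", "+", "4", "*", "+", "3", "-"]))  # → 14.0
```

Getting the pop order wrong for `-` and `/` is exactly the kind of "stack not implemented right" failure described above.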
airstrike|23 hours ago
if so, a better approach would be to ask it to first plan that entire task and give it some specific guidance
then once it has the plan, ask it to execute it, preferably by letting it call other subagents that take care of different phases of the implementation while the main loop just merges those worktrees back
it's how you should be using claude code too, btw
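The worktree-merging loop described above can be sketched with plain git. Everything here is illustrative: the repo, branch names, and phase split are made up, and the subagents' actual commits are elided.

```shell
# Hypothetical sketch of the "subagents in worktrees, main loop merges" flow.
# A throwaway demo repo stands in for the real project.
set -e
repo=$(mktemp -d)/demo
git init -q "$repo" && cd "$repo"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# One worktree (and branch) per implementation phase, each driven by a subagent.
git worktree add ../phase-backend  -b phase-backend
git worktree add ../phase-frontend -b phase-frontend

# ...subagents commit into their own worktrees here...

# The main loop merges each phase branch back, then cleans up.
git merge -q phase-backend
git merge -q phase-frontend
git worktree remove ../phase-backend
git worktree remove ../phase-frontend
```

Because each worktree has its own working directory and branch, the subagents can run concurrently without stepping on each other's files.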
rubyn00bie|1 day ago
The more I use the cloud-based frontier models, the more virtue I find in using local, open source/weights models, because they tend to create much simpler code. They require more direct interaction from me, but the end result tends to be less buggy, easier to refactor/clean up, and more precisely what I wanted. I am personally excited to try this new model out here shortly on my 5090. If I read the article correctly, it sounds like even the quantized versions have a “million”[1] token context window.
And to note, I’m sure I could use the same interaction loop for Claude or GPT, but the local models are free (minus the power) to run.
[1] I’m dubious it won’t shite itself at even 50% of that. But even 250k would be amazing for a local model when I “only” have 32GB of VRAM.
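The skepticism about very long contexts on a 32GB card comes down mostly to KV-cache memory, which grows linearly with context length. A back-of-envelope estimate, where every architecture number (layer count, KV heads, head size, cache precision) is an assumed placeholder rather than the real model's spec:

```python
# Rough KV-cache size estimate; all architecture numbers are illustrative
# assumptions, NOT the specs of any model discussed in the thread.
def kv_cache_gib(ctx_len, n_layers=48, n_kv_heads=8, head_dim=128,
                 bytes_per_elem=2):  # fp16 cache entries
    # Two tensors (K and V) per layer, per token.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_len
    return total_bytes / 2**30

print(f"{kv_cache_gib(250_000):.1f} GiB")  # → 45.8 GiB
```

Under these made-up numbers, a 250k-token cache alone would overflow 32GB before counting the weights, which is why quantized KV caches and grouped-query attention (few KV heads) matter so much for long contexts on consumer hardware.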