(no title)
chisleu | 7 months ago
I downloaded the 4bit quant to my mac studio 512GB. 7-8 minutes until first tokens with a big Cline prompt for it to chew on. Performance is exceptional. It nailed all the tool calls, loaded my memory bank, and reasoned about a golang code base well enough to write a blog post on the topic: https://convergence.ninja/post/blogs/000016-ForeverFantasyFr...
Writing blog posts is one of the tests I use for these models. It is a very involved process including a Q&A phase, drafting phase, approval, and deployment. The filenames follow a certain pattern. The file has to be uploaded to s3 in a certain location to trigger the deployment. It's a complex custom task that I automated.
Even the 4bit model was capable of this, but was incapable of actually working on my code, prefering to halucinate methods that would be convenient rather than admitting it didn't know what it was doing. This is the 4 bit "lobotomized" model though. I'm excited to see how it performs at full power.
No comments yet.