ach9l | 10 months ago
so i gave roo code a try, set up a few test cases, and proceeded to declutter, refactor, and rewrite the whole thing. i've never really written long apps in javascript, nor typescript for that matter, but i think 3k lines of code in a single file is just bad code, and i've been proven right. 3k lines wrecks your context really good. you can't use cline to code cline because it will ruin you financially one way or another. jesus, the old cline.ts file was responsible for basically the whole extension, over 3k lines, the kind of code i would have written 10 years ago as an intern.

anyway, i've added (and learned in the process) react.js components to build an interface for easily collecting the data for my own loras. honestly, if you are looking to integrate large local models into kilo, i'd love to collaborate. my forks mostly provide data analysis for fine-tuning on my own personal repositories, using years of commit history as training data, even bash history. i've benchmarked several tasks. i can basically fork roo code or cline, declutter it, and refactor it with a gemma or qwq running on a mac studio for a few watts.

i've been logging everything i do ever since we were granted api access to gpt-3 at a lab i coordinated about 5 years ago, so i've mastered filtering the completions api and reconstructing streams, all with airflow and python scripts. i added a couple of buttons, such as the download-task one you've also added, but more along the lines of "send this to the batch in the datacenter so we train a new gemma". filtering good solutions from not-so-good ones, the old thumbs-up/thumbs-down situation, helps a lot. i'm also adding a couple of mcp integrations for applying quick loras locally, plus test-driven development, aiming for reinforcement-learning-based loras. i built myself a very nice toy, or should i say, i bootstrapped a very nice tool that creates itself? anyway, thanks for sharing this.
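to make the stream-reconstruction and thumbs-up/down filtering bit concrete, here is a rough sketch of what that kind of pipeline step can look like. the log format and the `rating` field are my own assumptions for illustration, not the actual scripts:

```python
import json

def reconstruct_stream(chunk_lines):
    """Rebuild the full completion text from logged streaming chunks.

    Assumes each logged line is a JSON object in the OpenAI-style
    streaming format: {"choices": [{"delta": {"content": "..."}}]}.
    """
    parts = []
    for line in chunk_lines:
        chunk = json.loads(line)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

def filter_for_training(records, min_rating=1):
    """Keep only thumbs-up completions (rating >= min_rating) for the
    next fine-tuning batch; drop the thumbs-down ones."""
    return [r for r in records if r.get("rating", 0) >= min_rating]

# example: two logged chunks joined back into one completion
chunks = [
    '{"choices": [{"delta": {"content": "hel"}}]}',
    '{"choices": [{"delta": {"content": "lo"}}]}',
]
text = reconstruct_stream(chunks)  # "hello"
```

in a real setup each of these would be a task in an airflow dag, with the reconstructed completions written out before the filtering step runs.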
i think the next major thing that's going to happen with these tools is that they get free at home as new chips become cheaper. llama 4 running on mac studios or dgx stations is as fast as you can get today, and it is already good enough (if prepared correctly) to build any pre-covid, or even pre-chatgpt, yc startup codebase in a weekend. it will definitely happen. i'm wrapping up fixing llama 4 scout; allow me to mention that it has a tendency to fix bugs by commenting out code and adding TODOs. great architecture though, just what we needed, i mean for optimal local development. i'll try to publish results soon enough, optimized for the top mac studio though, since i haven't got a dgx yet. i'll prepare macbook versions too. the world needs more of this: a cline that fixes itself on battery power alone...
jawon | 10 months ago
ach9l | 10 months ago
for mac studios i'd found the sweet spot to be the largest gemma, up until llama scout was released, which fits the mac studio best. scout, although faster at generation, takes a while longer to fill in the long context, so you end up with basically the same usability speeds as with qwq or gemma 27b.
the refactoring is a test-driven task that i've programmed to run by itself, think deep research, until it passes the tests or exhausts the imposed trial limits. i wrote it by instructing gemini, r1, and claude. in short, i made gemini read the codebase and document proposals for refactoring, based on the way i code and the strict architectural patterns i find optimal for projects that combine an engine with views, such as the react.js views present in these vscode extensions.
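the "run until tests pass or trials are exhausted" loop is simple at its core. a minimal sketch, where `apply_proposal` and `proposals` stand in for however the model-generated edits are produced and applied (those names are my invention, not the actual code):

```python
import subprocess

def refactor_until_green(apply_proposal, proposals,
                         test_cmd="npm test", max_trials=5):
    """Self-driving refactor loop: apply one proposed change, run the
    test suite, and stop on the first green run or when the trial
    budget is exhausted."""
    for trial, proposal in enumerate(proposals[:max_trials], start=1):
        apply_proposal(proposal)
        result = subprocess.run(test_cmd, shell=True, capture_output=True)
        if result.returncode == 0:
            return {"status": "passed", "trials": trial}
    return {"status": "exhausted",
            "trials": min(len(proposals), max_trials)}
```

the real task does more (reverting failed proposals, logging every tool call for later training), but the control flow is just this: edit, test, repeat within a budget.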
gemini pro gets it really well and has enough context capacity to maintain several different branches of the same codebase, with these crazy long files, without losing context. once this task is completed, training a smaller model on the executed actions (by that i mean all the tool use: diff, insert, replace and, most importantly, testing) to perform the refactoring instructions is fairly easy.
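turning those executed actions into training data for the smaller model is mostly a mapping step. a sketch of the idea, with a made-up log schema (the `context`/`tool`/`args` keys are assumptions for illustration):

```python
def actions_to_examples(task_log):
    """Turn a completed refactoring task's action log into supervised
    pairs: (context before the action) -> (tool call emitted).

    Each action is assumed to look like
    {"context": "...", "tool": "diff"|"insert"|"replace"|"test",
     "args": {...}}. Only tasks whose final test run passed are kept,
    so the small model only sees trajectories that worked.
    """
    if not task_log.get("tests_passed"):
        return []
    return [
        {"prompt": step["context"],
         "completion": {"tool": step["tool"], "args": step["args"]}}
        for step in task_log["actions"]
    ]
```

filtering on `tests_passed` is the same thumbs-up/down idea as before, just automated: the test suite decides which trajectories are good enough to train on.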