top | item 42080038


haxton | 1 year ago

The demos I see for these types of tools are always some toy project and don't reflect the day-to-day work I do at all. Do you have any example PRs on larger more complex projects that have been written with codebuff and how much of that was human interactive?

The real problem I want someone to solve is helping me with the real niche/challenging portion of a PR, ex: new tiptap extension that can do notebook code eval, migrate legacy auth service off auth0, record and replay API GET requests and replay a % of them as unit tests, etc.

So many of these tools get stuck trying to help me "start" rather than help me "finish" or unblock the current problem I'm at.


jahooma|1 year ago

I hear you. This is actually a foundational idea for Codebuff. I made it to work within the large-ish codebase of my previous startup, Manifold Markets.

I want the demos to be of real work, but somehow they never seem as cool unless it's a neat front end toy example.

Here is the demo video I sent in my application to YC, which shows it doing real stuff: https://www.loom.com/share/fd4bced4eff94095a09c6a19b7f7f45c?...

tyre|1 year ago

This comment makes me think of Coke vs. Pepsi.

Historically, Pepsi won taste tests and people chose Coke. That's because Pepsi is sweeter, so that first sip tastes better. But it's less satisfying (too sweet) to drink a whole can.

The sexy demos don't, in my opinion and experience, win over the engineers and leaders you need. Lil startups, maybe, and engineers that love the flavor of the week. But for solving real, unsexy problems—that's where you'll pull in organizations.

reportgunner|1 year ago

Watching the demo it seems like it would be more effective to learn the skills you need rather than using this for a decade.

It takes 5+ seconds just to change one field to dark mode. I don't even want to imagine a situation where I have two fields and I want to explain that I need to change this field and not that field.

I'm not sure who the target audience for this is: people who want to be programmers without learning programming?

senko|1 year ago

My 2c as someone who worked on a similar product:

> it seems like it would be more effective to learn the skills you need rather than using this for a decade.

Think of it as a calculator. You do want to be able to do addition, but not necessarily to manually add 4-digit numbers in your head.

> It takes 5+ seconds just to change one field to dark mode

Our current LLMs are way too slow for this. I am chuckling every time someone says "we don't need LLMs to be faster because people can't read faster". Imagine this using Groq with a future model with similar capability level, and taking 0.5 seconds to do this small change.

People need to remember we're at the very beginning of using AI for coding. Of course it's suboptimal for the majority of cases. Unless you believe we're way past the halfway point of the sigmoid curve on AI improvements (which I don't), consider that this is the worst the AI is ever going to be for coding.

A year ago people were incredulous when told that AI could code. A year before that people would laugh you out of the room. Now we're at the stage where it kinda works, barely, sometimes. I'm bullish on the future.

bambax|1 year ago

In every experience I have had with LLMs generating code, they tend to follow the prompt much too closely and produce large amounts of convoluted code that in the end prove not only unnecessary but quite toxic.

Where LLMs shine is in being a personal Stack Overflow: asking a question and having a personalized, specific answer immediately, that uses one's data.

But solving actual, real problems still seems out of reach. And letting them touch my files sounds crazy.

(And yes, ok, maybe I just suck at prompting. But I would need detailed examples to be convinced this approach can work.)

brandonchen|1 year ago

I'm sure your prompting is great! It's just hard because LLMs tend to be very wordy by default. This was something we struggled with for a while, but I think we've done a good job at making Codebuff take a more minimal approach to code edits. Feel free to try it, let me know if it's still too wordy/convoluted for you.

handfuloflight|1 year ago

> LLMs tend to follow the prompt much too closely

> produce large amounts of convoluted code that in the end prove not only unnecessary but quite toxic.

What does that say about your prompting?

jeswin|1 year ago

> Do you have any example PRs on larger more complex projects that have been written with codebuff and how much of that was human interactive?

We have a lot of code in production which is AI-written. The important thing is that you need to consciously make a module or project AI-ready. This means that things like modularity and smaller files are even more important than they usually are.

I can't share those PRs, but projects on my profile page are almost entirely AI written (except the https://bashojs.org/ link). Some of them might meet your definition of niche based on the example you provided.

zh2408|1 year ago

This is so REAL: LLMs suck probably because your modularity sucks LOL

cratermoon|1 year ago

Kind of like "please describe the solution and I will write code to do it". That's not how programming works. Writing code and testing it against expectations to get to the solution, that's programming.

brandonchen|1 year ago

FWIW I don't find that I'm losing good engineering habits/thought processes. Codebuff is not at the stage where I'm comfortable accepting its work without reviewing, so I catch bugs it introduces or edge cases it's missed. The main difference for me is the speed at which I can build now. Instead of fussing over exact syntax or which package does what, I can keep my focus on the broader implications of a particular architecture or nuances of components, etc.

I will admit, however, that my context switching has increased a ton, and that's probably not great. I often tell Codebuff to do something, inevitably get distracted with something else, and then come back later barely remembering the original task.

mathgeek|1 year ago

Language is important here. Programming, at its basic definition, is just writing code that programs a machine. Software development or even design/engineering are closer to what you’re referring to.

Aeolun|1 year ago

> ex: new tiptap extension that can do notebook code eval

Claude wrote me a prosemirror extension doing a bunch of stuff that I couldn’t figure out how to do myself. It was very convenient.

craigds|1 year ago

+1; Ideally I want a tool I don't have to specify the context for. If I can point it via config files at my medium-sized codebase once (~2000 py files; 300k LOC according to `cloc`) then it starts to get actually usable.

Cursor Composer doesn't handle that and seems geared towards a small handful of handpicked files.

Would codebuff be able to handle a proper sized codebase? Or do the models fundamentally not handle that much context?

jahooma|1 year ago

Yes. Natively, the models are limited to 200k tokens which is on the order of dozens of files, which is way too small.

But Codebuff has a whole preliminary step where it searches your codebase to find relevant files to your query, and only those get added to the coding agent's context.

That's why I think it should work up to medium-large codebases. If the codebase is too large, then our file-finding step will also start to fail.

I would give it a shot on your codebase. I think it should work.
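
A minimal sketch of what a file-finding step like the one described above could look like. This is not Codebuff's actual method: it assumes plain keyword overlap for relevance and a rough 4-characters-per-token estimate, and every name here (`select_files`, `load_codebase`, `CHARS_PER_TOKEN`) is hypothetical. The only number taken from the comment is the 200k-token limit.

```python
import re
from pathlib import Path

TOKEN_BUDGET = 200_000  # context limit mentioned in the comment above
CHARS_PER_TOKEN = 4     # rough assumption, not a real tokenizer


def tokenize(text: str) -> set[str]:
    """Return lowercased identifier-like words of three or more characters."""
    return set(re.findall(r"[a-z_][a-z0-9_]{2,}", text.lower()))


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def select_files(query: str, files: dict[str, str],
                 budget: int = TOKEN_BUDGET) -> list[str]:
    """Pick the files that share the most words with the query,
    stopping at files with no overlap and skipping any file that
    would push the running total over the token budget."""
    query_terms = tokenize(query)
    ranked = sorted(
        ((len(query_terms & (tokenize(text) | tokenize(path))), path)
         for path, text in files.items()),
        key=lambda item: (-item[0], item[1]),  # best score first, then by path
    )
    chosen: list[str] = []
    used = 0
    for score, path in ranked:
        if score == 0:
            break  # everything after this is irrelevant to the query
        cost = estimate_tokens(files[path])
        if used + cost > budget:
            continue  # too big to fit; a smaller file may still fit
        chosen.append(path)
        used += cost
    return chosen


def load_codebase(root: str, pattern: str = "*.py") -> dict[str, str]:
    """Read every matching file under root into a path -> text mapping."""
    return {str(p): p.read_text(errors="ignore")
            for p in Path(root).rglob(pattern)}
```

Calling `select_files("fix the login password check", load_codebase("."))` would return the paths whose names or contents mention those words, most relevant first. A real tool would presumably use something stronger than word overlap, which is also where a step like this starts to fail on very large codebases.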

amethystcookie|1 year ago

It's pretty good for complex projects imo because codebuff can understand the structure of your codebase and which files to change to implement changes. It still struggles when there isn't good documentation, but it has helped me finish a number of projects.

handfuloflight|1 year ago

> It still struggles when there isn't good documentation

@Codebuff team, does it make sense to provide a documentation.md with exposition on the systems?

SpaghettiX|1 year ago

Absolutely! Imagine setting a bunch of css styles through a long winded AI conversation, when you could have an IDE do it in a few seconds. I don't need that.

The long tail of niche engineering problems is the time consuming bit now. That's not being solved at all, IMHO.

bung|1 year ago

> ... setting a bunch of css styles through a long winded AI conversation

Any links on this topic you rate/could share?

fragmede|1 year ago

Which IDE do you use for CSS editing/adjustment?

brandonchen|1 year ago

Great question – we struggled for a long time to put our demo together precisely for this reason. Codebuff is so useful in a practical setting, but we can't bore the audience with a ton of background on a codebase when we do demos, so we have to pick a toy project. Maybe in the future, we could start our demo with a half-built project?

Hopefully the demo on our homepage shows a little bit more of your day-to-day workflows than other codegen tools show, but we're all ears on ways to improve this!

To give a concrete example of usefulness, I was implementing a referrals feature in Drizzle a few weeks ago, and Codebuff was able to build out the cli app, frontend, backend, and set up db schema (under my supervision, of course!) because of its deep understanding of our codebase. Building the feature properly requires knowing how our systems intersect with one another and the right abstraction at each point. I was able to bounce back and forth with it to build this out. It felt akin to working with a great junior engineer, tbh!

EDIT: another user shared their use cases here! https://news.ycombinator.com/item?id=42079914

eterm|1 year ago

Why not take a large and complex code-base such as the firefox source code, feed that in and demonstrate how that goes?

echelon|1 year ago

If you're not worried about showing off little hints of your own codebase, record running it on one of your day-to-day engineering tasks. It's perfect dogfooding and would be a fun meta example.

> To give a concrete example of usefulness, I was implementing a referrals feature in Drizzle a few weeks ago, and Codebuff was able to build out the cli app, frontend, backend, and set up db schema

Record this!

Better yet, stream it on Twitch and/or YouTube and/or Discord and build a small community of followers.

People would love to watch you.