sarosh | 1 year ago
But we had a classic chicken-and-egg problem—we needed data to train the model, but we didn't have any real examples yet. So we started by having Claude generate about 50 synthetic examples that we added to our dataset. We then used that initial fine-tune to ship an early version of Zeta behind a feature flag and started collecting examples from our own team's usage.
...
This approach let us quickly build up a solid dataset of around 400 high-quality examples, which improved the model a lot!
I checked the training set, but couldn't quickly identify which examples were Claude-produced[2]. Would be interesting to see them distinguished.

[1]: https://zed.dev/blog/edit-prediction
[2]: https://huggingface.co/datasets/zed-industries/zeta
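One way to make that distinction recoverable would be to tag provenance when each example is collected. A minimal sketch of the idea; the field names and values here are hypothetical illustrations, not the actual Zeta dataset schema:

```python
# Hypothetical provenance tagging for a dataset that mixes synthetic
# (Claude-generated) and real (team-usage) examples.
def make_example(events, input_text, output_text, source):
    # Record where the example came from alongside the example itself.
    assert source in {"claude-synthetic", "team-usage"}
    return {
        "events": events,
        "input": input_text,
        "output": output_text,
        "source": source,  # enables filtering or reporting by provenance later
    }

dataset = [
    make_example("edit foo.rs", "fn foo( {", "fn foo() {", "claude-synthetic"),
    make_example("edit bar.rs", "let x = ", "let x = 0;", "team-usage"),
]

# Filtering synthetic examples back out is then trivial.
synthetic = [ex for ex in dataset if ex["source"] == "claude-synthetic"]
print(len(synthetic))  # 1
```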
hereonout2 | 1 year ago
The hardware, tooling and time required to do a LoRA fine-tune like this are extremely accessible.
Financially this is not a big expense either; I assume it would have cost on the order of a few hundred dollars in GPU rentals, possibly less if you exclude experimentation time.
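That estimate is easy to sanity-check with back-of-envelope arithmetic; every rate and duration below is an illustrative assumption, not a figure from the thread:

```python
# Back-of-envelope cost for a LoRA fine-tuning project.
# All numbers are assumed for illustration only.
gpu_hourly_rate = 2.50   # assumed $/hr for a single rented datacenter GPU
hours_per_run = 4        # assumed wall-clock time per training run
runs = 20                # assumed number of runs, including experimentation

total = gpu_hourly_rate * hours_per_run * runs
print(f"~${total:.0f} total")  # prints: ~$200 total
```

Even with generous assumptions about experimentation, the total stays in the hundreds, not thousands, of dollars.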
So what is the barrier to entry here? The data? Well, they didn't have that either, so they bootstrapped a dataset of only a few hundred examples to achieve the task.
I'm sure they spent some time on that, but again, it doesn't sound like an incredibly challenging task.
It's worth realising this if you've not delved into fine-tuning LLMs before: in terms of time, scale, and financial cost, there is a world of difference between building a product like this and building a base model.
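Part of that difference is that a LoRA adapter trains only a tiny fraction of the model's weights. The arithmetic below sketches why; the hidden size, rank, layer count, and number of adapted projections are all assumed values for illustration:

```python
# Rough comparison: trainable parameters in a LoRA adapter vs. the
# full parameter count of a 7B model. All shapes are assumptions.
def lora_params(d_in, d_out, rank):
    # LoRA replaces the update to a d_out x d_in weight with two low-rank
    # factors, B (d_out x rank) and A (rank x d_in), so the trainable
    # parameter count is rank * (d_in + d_out).
    return rank * (d_in + d_out)

hidden = 4096          # assumed hidden size
rank = 16              # a typical LoRA rank
layers = 32            # assumed number of transformer layers
targets_per_layer = 4  # e.g. adapting the q/k/v/o attention projections

trainable = layers * targets_per_layer * lora_params(hidden, hidden, rank)
full = 7_000_000_000   # total parameters in the base model

print(f"LoRA trainable: {trainable:,}")  # roughly 0.24% of the full model
```

Under these assumptions the adapter is about 17M parameters, which is why the training fits comfortably on rented hardware; pretraining the 7B base model is a different universe of compute entirely.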