saurabh20n | 3 years ago
A few questions that might help an enterprise customer: How big is your base model? Where did you find more datasets (even just a hint would be sufficient)? Are you using SantaCoder [3]? Anything you can say about your fine-tuning that makes it special? Totally on board with you that HumanEval/MBPP are not great benchmarks for real-world use, but do you have a suggested alternative that would help me see the value?
The calculus for an enterprise customer might be: "We could fine-tune a 6B model on our internal code and internal benchmarks (say with a month of work, a few thousand dollars in compute, and 2 people on task), but I'd rather buy an off-the-shelf solution like codecomplete.ai. They give us XYZ benefits." Articulate the XYZ for a technical decision maker, who will be your target audience.
* [1] https://huggingface.co/datasets/bigcode/the-stack
lumax15 | 3 years ago
I will expand a bit on fine-tuning. It's really hard to get this right, and the iteration speed is slow. Of course these companies can build their own, but we want to save them a lot of headache.
So far, we haven't found any off-the-shelf open source base model that works well enough for code completions. We've had to augment models with a huge amount of data to reach our current performance, and we ran into a lot of pain along the way.
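To give a flavor of where iteration time goes: even the data-prep step of a fine-tuning pipeline (before any GPU time) takes real engineering. A typical pipeline concatenates tokenized source files and slices the stream into fixed-length training sequences. A minimal sketch of that packing step, using a toy whitespace tokenizer as a stand-in for the model's real BPE tokenizer (the function name and separator token here are illustrative, not from any specific codebase):

```python
def pack_sequences(documents, tokenize, seq_len, sep_token="<|endoftext|>"):
    """Concatenate tokenized documents, separated by an end-of-text token,
    then slice the stream into fixed-length training sequences.
    The trailing remainder shorter than seq_len is dropped."""
    stream = []
    for doc in documents:
        stream.extend(tokenize(doc))
        stream.append(sep_token)
    return [stream[i:i + seq_len]
            for i in range(0, len(stream) - seq_len + 1, seq_len)]

# Toy example: whitespace "tokenizer" over two tiny source files.
docs = ["def add(a, b): return a + b", "print('hello')"]
chunks = pack_sequences(docs, str.split, seq_len=4)
# Each chunk is exactly seq_len tokens; the 2-token remainder is dropped.
```

Every choice in even this small step (separator token, whether to drop or pad the remainder, sequence length) interacts with downstream completion quality, which is part of why iteration is slow.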