stefanwebb's comments
stefanwebb | 4 months ago | on: Is more training data always better?
stefanwebb | 4 months ago | on: The case for the return of fine-tuning
I discuss "LoRA Land", a large-scale empirical study of fine-tuning 7B models to outperform GPT-4, and in the discussion section give some arguments making the case for the return of fine-tuning, i.e. what has changed in the past six months.
stefanwebb | 4 months ago | on: Small Fine-Tuned Models Are All You Need
stefanwebb | 5 months ago | on: Custom AI models in hours not months with auto Data Synth and LLM-as-a-Judge
In brief, we've developed an easy-to-use platform for fine-tuning custom models. We automate data synthesis for judging and training, as well as the judge prompt itself. The end result is that model development times and costs are drastically cut!
Check out our Substack article above if you're interested in learning more or signing up for early access :)
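The judging step described above can be sketched in miniature. This is a hypothetical illustration, not the platform's actual API: the prompt, function names, and the 1-5 scoring scheme are all assumptions, and the "LLM" is a stub so the example runs offline.

```python
# Hypothetical LLM-as-a-judge loop for filtering synthetic training data.
# The judge prompt and score threshold are illustrative assumptions.

JUDGE_PROMPT = (
    "You are a strict grader. Given an instruction and a candidate answer, "
    "reply with a single integer score from 1 (poor) to 5 (excellent)."
)

def judge(instruction: str, answer: str, ask_llm) -> int:
    """Score one (instruction, answer) pair with a judge model.

    `ask_llm` is any callable taking (system_prompt, user_prompt) and
    returning the model's text reply.
    """
    reply = ask_llm(JUDGE_PROMPT, f"Instruction: {instruction}\nAnswer: {answer}")
    digits = [c for c in reply if c.isdigit()]
    return int(digits[0]) if digits else 1  # fall back to the lowest score

def filter_synthetic_data(pairs, ask_llm, min_score=4):
    """Keep only the synthetic pairs the judge rates at or above min_score."""
    return [(i, a) for i, a in pairs if judge(i, a, ask_llm) >= min_score]

# Stub "LLM" so the sketch runs without an API key:
stub = lambda system, user: "5" if "good" in user else "2"
data = [("Summarize X", "a good summary"), ("Summarize Y", "nonsense")]
print(filter_synthetic_data(data, stub))  # keeps only the first pair
```

Swapping the stub for a real chat-completion call is the only change needed to make this a working filter.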
stefanwebb | 5 months ago | on: Voronoi map generation in Civilization VII
stefanwebb | 5 months ago | on: Llama-Factory: Unified, Efficient Fine-Tuning for 100 Open LLMs
stefanwebb | 11 months ago | on: Avoid manual sharding in vector db or any db
stefanwebb | 1 year ago | on: DeepSearcher: A local open-source Deep Research
I think the biggest one is the goal: HF's is to replicate the performance of Deep Research on the GAIA benchmark, whereas ours is to teach agentic concepts and show how to build research agents with open-source tools.
Also, we go into the design in a lot more detail than HF's blog post does. On the design side, HF uses code writing and execution as a tool, whereas we use prompt writing and calling as a tool. We do an explicit breakdown of the query into sub-queries, sub-sub-queries, etc., whereas HF uses a chain of reasoning to decide what to do next.
I think ours is the better approach for producing a detailed report on an open-ended question, whereas HF's is better for answering a specific, challenging question in short form.
stefanwebb | 1 year ago | on: DeepSearcher: A local open-source Deep Research
https://milvus.io/blog/i-built-a-deep-research-with-open-sou...
https://milvus.io/blog/introduce-deepsearcher-a-local-open-s...
"There are many things one needs to live a rich and fulfilled life (according to AI researchers). A good initialization [Mishkin and Matas, 2015], attention-based neural networks [Vaswani et al., 2017], and a good title for your research paper [Myself, just now], to name a few.
In this post, we discuss another piece of eternal wisdom from AI researchers: “less is more.” Specifically, how foundation models can be fine-tuned for new capabilities with small data, in many cases fewer than one thousand samples, and often outperform the same model fine-tuned on larger datasets. Meditate on that for a moment (suggested pose in figure above)."