nihit-desai | 2 years ago | on: Show HN: Autolabel, a Python library to label and enrich text data with LLMs
nihit-desai's comments
nihit-desai | 2 years ago | on: Show HN: Autolabel, a Python library to label and enrich text data with LLMs
The earlier post was a report summarizing LLM labeling benchmarking results. This post shares the open source library.
Neither is intended as an ad. Our hope in sharing them is to demonstrate how LLMs can be used for data labeling and to get feedback from the community.
nihit-desai | 2 years ago | on: LLMs can label data as well as human annotators, but 20 times faster
All the datasets and labeling configs used for these experiments are available in our GitHub repo (https://github.com/refuel-ai/autolabel), as mentioned in the report. Hope these are useful!
nihit-desai | 2 years ago | on: LLMs can label data as well as human annotators, but 20 times faster
Is there some noise in these labels? Sure! But since the LLM and the human annotators are scored against the same labels, their relative performance is still a valid evaluation.
nihit-desai | 2 years ago | on: LLMs can label data as well as human annotators, but 20 times faster
The need for labeled data for any kind of training is a constant, though :)
nihit-desai | 2 years ago | on: LLMs can label data as well as human annotators, but 20 times faster
From benchmarking, we've been positively surprised by how effective few-shot learning and PEFT (parameter-efficient fine-tuning) are at closing the domain gap.
"When it encounters novel data (value) it will likely perform poorly" -- is that not true of human annotators too? :)
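For anyone unfamiliar with the few-shot side of this: the idea is to put a handful of labeled in-domain examples into the prompt ahead of the item to be labeled. Here's a minimal sketch, assuming a plain "Input:/Label:" text format. `build_few_shot_prompt` is a hypothetical helper for illustration, not Autolabel's API.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    guidelines: str,
    examples: List[Tuple[str, str]],
    text: str,
) -> str:
    """Assemble a few-shot labeling prompt (hypothetical helper, not Autolabel's API)."""
    parts = [guidelines.strip(), ""]
    # Each labeled example shows the model the input/label format for the target domain.
    for example_text, label in examples:
        parts.append(f"Input: {example_text}")
        parts.append(f"Label: {label}")
        parts.append("")
    # The unlabeled item goes last; the model is expected to complete the final "Label:" line.
    parts.append(f"Input: {text}")
    parts.append("Label:")
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great battery life.", "positive"), ("Screen cracked in a week.", "negative")],
    "Works exactly as described.",
)
print(prompt)
```

Swapping in examples drawn from the new domain is what lets the same model adapt without any retraining.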
nihit-desai | 2 years ago | on: LLMs can label data as well as human annotators, but 20 times faster
For each of these datasets, we specify task guidelines/prompts for the LLM and human annotators, and compare each of their performance against ground truth labels.
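A rough sketch of that comparison, assuming a simple single-label classification task and plain accuracy as the metric. The function names and toy labels below are made up for illustration; the report may use other metrics per task.

```python
from typing import Dict, List


def accuracy(predicted: List[str], ground_truth: List[str]) -> float:
    """Fraction of items whose label matches the ground-truth label."""
    if len(predicted) != len(ground_truth):
        raise ValueError("label lists must be the same length")
    if not ground_truth:
        raise ValueError("no labels to compare")
    matches = sum(p == g for p, g in zip(predicted, ground_truth))
    return matches / len(ground_truth)


def compare_annotators(
    labels: Dict[str, List[str]], ground_truth: List[str]
) -> Dict[str, float]:
    # Every annotator (LLM or human) is scored against the same reference labels,
    # so any noise in those labels is common to all of them.
    return {name: accuracy(preds, ground_truth) for name, preds in labels.items()}


ground_truth = ["pos", "neg", "neg", "pos"]
scores = compare_annotators(
    {"llm": ["pos", "neg", "pos", "pos"], "human": ["neg", "neg", "pos", "pos"]},
    ground_truth,
)
print(scores)  # {'llm': 0.75, 'human': 0.5}
```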
nihit-desai | 2 years ago | on: Cloud GPU Resources and Pricing
nihit-desai | 2 years ago | on: Show HN: SpaceBadgers – Free and Libre SVG Badges
nihit-desai | 3 years ago | on: Show HN: New course on real-world ML systems
The format is 4 weeks of project-driven learning with a peer cohort of motivated, interesting learners. It takes about 10 hours per week in total, including interactive discussion time and project work. The first iteration of the course starts July 11th. We are offering a limited number of scholarships for the course (details on the course page).
Autolabel is quite orthogonal to this: it's a library that makes it easy to use LLMs to label text datasets for NLP tasks.
We are actively looking at integrating function calling into Autolabel, though, to improve label quality and support downstream processing.
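One way function calling could help label quality: declare the allowed labels as an `enum` in the function's JSON Schema, then validate whatever arguments come back. A minimal sketch, assuming an OpenAI-style function definition (`name`, `description`, `parameters` as JSON Schema). `assign_label`, `parse_label` and the label set are hypothetical, and this is not how Autolabel does (or will) implement it.

```python
import json
from typing import List

ALLOWED_LABELS: List[str] = ["positive", "negative", "neutral"]

# OpenAI-style function definition; "parameters" is a JSON Schema object.
# The "enum" constraint tells the model which label strings are valid.
LABEL_FUNCTION = {
    "name": "assign_label",  # hypothetical function name
    "description": "Assign exactly one label to the input text.",
    "parameters": {
        "type": "object",
        "properties": {
            "label": {"type": "string", "enum": ALLOWED_LABELS},
        },
        "required": ["label"],
    },
}


def parse_label(arguments_json: str) -> str:
    """Parse the model's function-call arguments and reject labels outside the allowed set."""
    args = json.loads(arguments_json)
    if not isinstance(args, dict):
        raise ValueError("expected a JSON object of arguments")
    label = args.get("label")
    # The model is not guaranteed to respect the schema, so check the label explicitly.
    if label not in ALLOWED_LABELS:
        raise ValueError(f"invalid label: {label!r}")
    return label


print(parse_label('{"label": "negative"}'))  # negative
```

The structured output also makes downstream processing simpler, since every response parses into the same shape.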