callesgg|2 years ago
1. Ask it things. Let it answer.
2. Ask it to find errors in its own answer and to correct them.
3. Use the original prompt and the corrected output as training data.
This should, with each iteration, make the model less and less likely to output statements that are self-contradictory or obviously wrong, until the model can no longer spot its own faults.
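A minimal sketch of the three steps above, assuming the model is exposed as a plain `generate(prompt) -> str` callable. The function name `build_self_correction_data`, the wording of the critique prompt, and the `max_rounds` cap are all illustrative assumptions, not part of any real library. The loop stops when the model returns its own answer unchanged, which is the "can no longer spot its own faults" condition.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any callable that maps a prompt string to a reply string.
Generate = Callable[[str], str]


def build_self_correction_data(
    prompts: List[str], generate: Generate, max_rounds: int = 3
) -> List[Tuple[str, str]]:
    """Collect (original prompt, self-corrected answer) pairs for fine-tuning."""
    data = []
    for prompt in prompts:
        answer = generate(prompt)  # step 1: ask, let it answer
        for _ in range(max_rounds):
            # Step 2: ask the same model to find and fix errors (assumed wording).
            critique_prompt = (
                "Find any errors in the answer below and reply with a corrected "
                f"answer only.\nQuestion: {prompt}\nAnswer: {answer}"
            )
            revised = generate(critique_prompt)
            if revised == answer:  # the model no longer changes its answer
                break
            answer = revised
        data.append((prompt, answer))  # step 3: original prompt + corrected output
    return data
```

The returned pairs would then be fed to whatever fine-tuning procedure is in use; that part is left out here because it depends entirely on the training stack.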
Drakim|2 years ago
But if you let an AI's approval be the metric, things turn a lot more fuzzy and subjective. The goal is no longer "to write a good answer without error" but "to write an answer that is approved by the AI". Those are very different goals, and the more you optimize, the further they diverge, until eventually the AI is answering complete nonsense that precisely hits certain sweet spots in the grading AI.
This divergence between the optimization target and the actual human goal is a pretty interesting problem in AI safety research. I love the example where an AI trained to stay alive as long as possible in Tetris realized that pausing the game was the best strategy.
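A toy illustration of that divergence, as a sketch under made-up assumptions: `proxy_grade` stands in for a flawed grading AI that rewards confident-sounding words, and `true_quality` stands in for what a human actually wants (a correct, concise answer). Neither function models a real grader; they only show how greedily optimizing the proxy can drive the true score down.

```python
from typing import List


def proxy_grade(answer: str) -> int:
    """Hypothetical flawed grader: counts confident-sounding words."""
    words = answer.split()
    return sum(words.count(w) for w in ("clearly", "therefore"))


def true_quality(answer: str) -> float:
    """Hypothetical human goal: contains the right answer ("4") and is concise."""
    words = answer.split()
    correct = 1.0 if "4" in words else 0.0
    return correct / max(len(words), 1)


def optimize_for_grader(answer: str, vocab: List[str], steps: int) -> str:
    """Greedily append whichever word raises the proxy score the most."""
    for _ in range(steps):
        best = max(vocab, key=lambda w: proxy_grade(answer + " " + w))
        answer = answer + " " + best
    return answer


start = "4"
final = optimize_for_grader(start, ["4", "clearly", "maybe"], steps=5)
print(final, proxy_grade(final), true_quality(final))
```

The grader's score climbs with every step while the answer gets padded with filler, so the quantity being optimized and the quantity people care about move in opposite directions.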
aqme28|2 years ago
But yeah, you’re going to need an objective metric or human input; otherwise the system is going to diverge in strange ways.
newswasboring|2 years ago
Dwedit|2 years ago
jkeisling|2 years ago
- Self-Instruct: Creates synthetic instruction-tuning data from an untuned base model, starting from a tiny seed set of prompts, and filters out the bad generations before fine-tuning. Manages to approach InstructGPT performance with only ~100 human labels. https://arxiv.org/abs/2212.10560
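A rough sketch of the generate-then-filter idea behind Self-Instruct, not the paper's actual code. The `propose` callable is a hypothetical stand-in for prompting the base model with examples from the pool, and `difflib.SequenceMatcher` is swapped in here as a simple similarity measure in place of the ROUGE-L overlap filter the paper describes. The 0.7 threshold is likewise only illustrative.

```python
from difflib import SequenceMatcher
from typing import Callable, List


def is_novel(candidate: str, pool: List[str], threshold: float = 0.7) -> bool:
    """Keep a candidate only if it is not too similar to anything already pooled."""
    return all(
        SequenceMatcher(None, candidate.lower(), existing.lower()).ratio() < threshold
        for existing in pool
    )


def grow_pool(
    seed: List[str], propose: Callable[[List[str]], List[str]], rounds: int
) -> List[str]:
    """Repeatedly ask the model for new instructions and keep only the novel ones."""
    pool = list(seed)
    for _ in range(rounds):
        for candidate in propose(pool):  # hypothetical call into the base model
            candidate = candidate.strip()
            if candidate and is_novel(candidate, pool):
                pool.append(candidate)
    return pool
```

The filtered pool is what would be used as fine-tuning data; empty and near-duplicate generations are dropped before they ever reach training.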
jointpdf|2 years ago
8jy89hui|2 years ago
tysam_and|2 years ago
This is somewhat similar to how GANs try to learn the density of the underlying data, but here you do not have the underlying data as a reference. It's sort of like filling a mattress with helium instead of air: sure, the mattress will be lighter, but that does not mean you will float on it.
Hope that helps as a cogent answer to this question.