lifis|6 months ago
Does anyone know how synthetic data is commonly generated? Do they just sample the model randomly starting from an empty state, perhaps with some filtering? Or do they somehow generate prompts automatically, and if so, how? Is there a feedback mechanism, e.g. do they test the model during training and generate data targeting the tests it performs poorly on?
LeoPanthera|6 months ago
Mars008|6 months ago
I suspect no larger models are trained purely on real-world data. They all use a mix of real and generated data.
janalsncm|6 months ago
Mars008|6 months ago
But in general it's a big secret, because the training data and techniques are the only difference between models, since the architecture is more or less settled.
duchenne|6 months ago
ethan_smith|6 months ago