napier|2 years ago
I’d like to see a model whose pretraining data has the effluent of the internet intelligently filtered out through LLM and human curation, with much more effort put into digitised archival sources, the entirety of books, and high-quality media transcripts. I imagine it would yield far better baseline outputs, with far less of the current “requirement” for (over)correction via ultimately disastrous RLHF masking.
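A minimal sketch of the kind of corpus filtering described above: score each candidate document with a quality model and keep only those clearing a threshold. The `quality_score` heuristic here is a hypothetical stand-in; in a real pipeline it would be an LLM judgment or a classifier distilled from LLM/human curation labels.

```python
def quality_score(doc: str) -> float:
    """Hypothetical quality heuristic: rewards letter-rich text.

    A real pipeline would replace this with an LLM-based judgment
    or a classifier trained on human curation labels.
    """
    if not doc:
        return 0.0
    letters = sum(c.isalpha() for c in doc)
    return letters / len(doc)

def filter_corpus(docs: list[str], threshold: float = 0.7) -> list[str]:
    """Keep only documents whose quality score clears the threshold."""
    return [d for d in docs if quality_score(d) >= threshold]

corpus = [
    "A carefully edited passage from a digitised archival source.",
    "click here!!! $$$ 1234 %%% buy now 9999",
]
kept = filter_corpus(corpus)  # drops the spam-like second document
```

The interesting design question is the threshold and the judge itself: too aggressive and you lose dialect, slang, and informal registers the model needs; too lax and the effluent stays in.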
jiggawatts|2 years ago
Or one tuned with every fiction novel ever written, along with every screenplay.