ajross|3 years ago

The article is quite literally a[1] review of exactly how we might evaluate that, with evidence of people who got results.

[1] To be fair, a far too wordy and blowhardated version. Alexander seems to be getting worse, not better. The core ideas here could be presented in about a third of the space.

astrange|3 years ago

serf|3 years ago

> The article is quite literally a[1] review of exactly how we might evaluate that, with evidence of people who got results.

The procedure seems more like a way to evaluate Anthropic-based AIs with different numbers of parameters than an across-the-board evaluation of fine-tuned chat AIs, and those results are then extrapolated to somehow say something about all AIs that are built similarly.

Unless I'm missing some key point here, it feels like a rather loose way to derive experimental data from the landscape.

nickthegreek|3 years ago