growdark | 4 months ago
I'd love to see a benchmark that tests different LLMs for slop, not necessarily limited to code. That might be even more interesting than ARC-AGI.

Bolwin | 4 months ago
See the writing benchmarks here: https://eqbench.com/creative_writing_longform.html

  Der_Einzige | 4 months ago
  Note this is the same first author

jampa | 4 months ago
Not a benchmark per se, but there is a "Not x, but y" Slop Leaderboard: https://www.reddit.com/r/LocalLLaMA/comments/1lv2t7n/not_x_b...

topaz0 | 4 months ago
100% of LLM output is slop. Done.

unknown | 4 months ago
[deleted]