top | item 45418259

rapfaria | 5 months ago

His "pelican riding a bicycle" tests are now a classic, and AI shops are benchmaxxing for them.

simonw|5 months ago

They need to benchmaxx a whole lot harder, the illustrations still universally suck!

lxgr|5 months ago

I fully expect a model to output an SVG made up of 1000x1000 rectangles (i.e. pixels) representing a raster image of a beautifully hand-drawn pelican riding a bicycle any day now :)
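
For illustration, here is a minimal sketch of what "an SVG made of pixel rectangles" could look like. The function name `pixels_to_svg` and the 2x2 sample data are made up for this example; it assumes the image is already available as rows of (r, g, b) tuples.

```python
def pixels_to_svg(pixels):
    """Render a grid of (r, g, b) tuples as an SVG with one 1x1 <rect> per pixel."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}" shape-rendering="crispEdges">'
    ]
    for y, row in enumerate(pixels):
        for x, (r, g, b) in enumerate(row):
            # Each pixel becomes its own unit square at (x, y).
            parts.append(
                f'<rect x="{x}" y="{y}" width="1" height="1" fill="rgb({r},{g},{b})"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical 2x2 "image"; a 1000x1000 grid would emit 1,000,000 rects.
tiny = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
svg = pixels_to_svg(tiny)
```

The output is technically a vector file, but it carries exactly the information of the raster grid it was built from.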

astrange|5 months ago

If they were testing that, it'd work more often.

Other things you can ask that they're still clearly not optimizing for are ASCII art and directions between different locations. Complete fabrications 100% of the time.

Sharlin|5 months ago

Well, I definitely hope they aren't trying to teach LLMs directions between locations, given how idiotic a use of compute and parameter space that would be. We already have excellent AIs for route planning. What they ought to optimize for is, of course, finally teaching them to say they don't know, or just automatically opting to call a route-planning API if the user asks for directions.
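
A rough sketch of the "call a route-planning API instead" idea, under loud assumptions: the regex is a crude stand-in for real intent detection, and `fake_route_planner` is a hypothetical stub, not any real service's API.

```python
import re

# Hypothetical pattern: matches "directions from X to Y" / "route from X to Y".
DIRECTIONS_RE = re.compile(
    r"\b(?:directions|route)\s+from\s+(?P<origin>.+?)\s+to\s+(?P<dest>.+?)\s*[?.!]*$",
    re.IGNORECASE,
)


def fake_route_planner(origin, dest):
    # Hypothetical stub; a real system would query an actual routing service here.
    return f"[route from {origin} to {dest} via routing service]"


def dispatch(message, route_planner=fake_route_planner):
    """Send directions requests to the route planner; leave everything else to the model."""
    match = DIRECTIONS_RE.search(message)
    if match:
        return ("tool", route_planner(match["origin"], match["dest"]))
    return ("model", message)


routed = dispatch("Give me directions from Helsinki to Tampere?")
passed_through = dispatch("Draw a pelican riding a bicycle")
```

The point of the design is that the model never has to generate the route itself; it only has to recognize the request and hand it off.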