top | item 47160537

(no title)

BugsJustFindMe | 4 days ago

Yes. "Show Code", not "Show CPU cycles". There's a difference. Writing code is not the same as running code. It looks to you like it ran the code. But you have no proof that it did. I've seen many times LLM systems from companies that claimed that their LLMs would run code and return the output claiming that they ran some code and returned the output but the output was not what the shown code actually produced when run.

discuss

order

Sophira|3 days ago

In my experience, models do not tend to write their own HTML output. They tend to output something like Markdown, or a modified version of it, and they wouldn't be able to write their own HTML that the browser would parse as such.

BugsJustFindMe|2 days ago

What, in your view, does sending one markup language instead of another markup language tell you about whether the back-end executed some code or only pretended to?

The front-end display is a representation of what the back-end sends it. Saying "but the back-end doesn't send HTML" is as meaningless as saying that about literally any other SPA website that builds its display from API requests that respond with JSON.

xVedun|4 days ago

Maybe the only way to be sure is to have it generate (not stable diffuse) an image with the value in there.

BugsJustFindMe|4 days ago

You cannot know that anything it shows you was generated by executing the code and isn't merely a simulacrum of execution output. That includes images.