Giving agents Linux has compounding benefits, in our experience. They're able to sort through weirdness that normal tooling wouldn't handle. For example, an agent can read an image, get an error back from the API, and see that the file wasn't in the expected format. It then reads the magic bytes, sees the file is actually a JPEG despite being named .png, and reads it correctly.
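Here is a minimal sketch of the check the agent effectively performed. It picks the media type from the file's leading bytes rather than its extension before the file is sent to an API. The signature table covers only a few common formats (PNG, JPEG, GIF, WebP). `sniff_media_type` and `upload_media_type` are hypothetical helper names, not part of any particular API.

```python
from __future__ import annotations

import mimetypes
from pathlib import Path

# Leading-byte signatures for a few common image formats.
# WebP is checked separately because its magic spans two offsets.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_media_type(header: bytes) -> str | None:
    """Return the MIME type implied by the first bytes, or None if unknown."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    # WebP: "RIFF" at offset 0 and "WEBP" at offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    return None


def upload_media_type(path: str | Path) -> str | None:
    """Hypothetical helper: trust the content first, the extension second."""
    with open(path, "rb") as f:
        header = f.read(16)
    sniffed = sniff_media_type(header)
    if sniffed is not None:
        return sniffed
    # Fall back to the filename only when the bytes are not recognised.
    guessed, _encoding = mimetypes.guess_type(str(path))
    return guessed
```

With this approach, a JPEG saved as `photo.png` reports `image/jpeg`, whereas `mimetypes.guess_type("photo.png")` alone would say `image/png`.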
storystarling|1 month ago
jtbayly|1 month ago
ndsipa_pomu|1 month ago
Maybe I'm missing something, but reading the magic bytes seems trivial to implement. I haven't tested it, but I'd expect most Linux image viewers/editors to handle misnamed files automatically, since that is almost entirely the purpose of magic bytes.
Personally, I blame Microsoft for everyone relying too much on file extensions; it was a bad idea that led to a lot of security issues.
lpcvoid|1 month ago
hex4def6|1 month ago
The whole point is that you are enabling the LLM through tool use. The prompt might be "Download all the images on the Wikipedia article for 'Ascetic', and print them on my dot matrix printer (the driver of which only accepts BMPs, so convert as needed)"
Your solution using file / curl is just one part of the potential higher-level problem statement. Yes, someone could write those lines easily. And they could write the wrapper around them with only a little more difficulty. And they could add the 404 detection logic with a bit more...
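For a sense of what the "wrapper" and "404 detection" pieces involve, here is a rough standard-library sketch. `fetch_image` and `classify_body` are hypothetical names, and the User-Agent string is a placeholder assumption. The BMP conversion step from the example prompt is left out because it would need an imaging library such as Pillow, and SVG (common on Wikipedia) is not detected.

```python
from __future__ import annotations

import urllib.error
import urllib.request

# Leading-byte signatures; "BM" (BMP) is short, so it is a loose check.
IMAGE_MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def classify_body(body: bytes) -> str | None:
    """Return an image format name based on the leading bytes, or None."""
    for magic, name in IMAGE_MAGIC.items():
        if body.startswith(magic):
            return name
    return None


def fetch_image(url: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Download url and return (format, bytes); raise if it isn't an image."""
    # Placeholder User-Agent (assumption); some sites reject the default one.
    req = urllib.request.Request(url, headers={"User-Agent": "image-fetch-sketch/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for non-2xx responses such as 404.
        raise RuntimeError(f"{url}: HTTP {err.code}") from err
    fmt = classify_body(body)
    if fmt is None:
        # A 200 response can still be an HTML error page or placeholder.
        raise RuntimeError(f"{url}: response is not a recognised image")
    return fmt, body
```

Each of these steps is easy on its own; the argument above is that an agent with shell access can assemble and debug all of them as part of a larger task.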
Are you arguing LLMs should only be used on 'hard' problems, and 'easy' problems (such as downloading with curl) should be done by humans? Or are you arguing LLMs should not be used for anything?
Because I think most people would suggest humans tackle the 'hard' problems, and let the tools (LLMs) tackle the 'easy' ones.
darknoon|1 month ago
toddmorey|1 month ago
Lerc|1 month ago
It stops an LLM from being blocked by the inability to do this thing. Removing this barrier might enable the LLM to complete a task that would be considerable work for a human.
For instance, identifying which files are PNG files containing pictures of birds, regardless of filename or the presence or absence of a suffix. An image-handling LLM can identify whether an image shows a bird much more easily than it can determine that an arbitrary file is a PNG. It could probably still do the latter, wasting a lot of tokens along the way, but using a few commands to decide which files are even worth looking at as images lets the LLM focus on what it is good at.
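Here is a sketch of the cheap pre-filter described here. It selects candidate PNGs by their 8-byte signature while ignoring names and suffixes, so only those files go to the model for the "is this a bird?" question. `find_png_files` is a hypothetical helper; running `file` over the directory would be a rough shell equivalent.

```python
from __future__ import annotations

from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def find_png_files(root: str | Path) -> list[Path]:
    """Return every regular file under root whose first 8 bytes are the PNG signature.

    Filenames and suffixes are ignored entirely.
    """
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            with path.open("rb") as f:
                if f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE:
                    matches.append(path)
        except OSError:
            # Unreadable files (e.g. permission errors) are skipped.
            continue
    return matches
```

The returned list is what would be handed to the image-capable model. Everything else is filtered out before any image tokens are spent.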