ep103|5 months ago
It got them all right. Except that when I really looked through the data, for three of the Excel cells it had clearly just made up new numbers. I found the first one by accident; finding the remaining two took longer than it would have taken to modify the file from scratch myself.
Watching my coworkers blindly trust output like this is concerning.
photonthug|5 months ago
My take-away re: chain-of-thought specifically is this: if the answer to "LLMs can't reason" is "use more LLMs", and the answer to the problems with that is to run the same process in parallel N times and vote/retry/etc., it just feels like a scam aimed at burning through more tokens.
Hopefully chain-of-code[2] is better, in that it at least tries to force LLMs into emulating a more deterministic abstract machine instead of rolling dice. Trying to eliminate things like code, formal representations, and explicit world-models in favor of implicit representations and inscrutable oracles might be good business, but it's bad engineering.
[0] https://en.wikipedia.org/wiki/Datasaurus_dozen
[1] https://towardsdatascience.com/how-metrics-and-llms-can-tric...
[2] https://icml.cc/media/icml-2024/Slides/32784.pdf
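For concreteness, the "run it N times in parallel and vote" pattern being criticized (often called self-consistency) can be sketched in a few lines. `noisy_model` here is a toy stand-in for a stochastic LLM call, not a real API:

```python
import random
from collections import Counter

def self_consistency(sample_fn, prompt, n=5):
    """Sample the same prompt n times and majority-vote the answers.

    This is the vote/retry pattern described above: every extra
    sample burns more tokens. sample_fn stands in for any
    stochastic model call.
    """
    answers = [sample_fn(prompt) for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n  # winning answer plus its vote share

# Toy stand-in: a "model" that answers correctly about 60% of the time.
random.seed(0)
def noisy_model(prompt):
    return "42" if random.random() < 0.6 else str(random.randint(0, 9))

answer, share = self_consistency(noisy_model, "6 * 7 = ?", n=25)
```

Note that the voting step only boosts accuracy when errors are uncorrelated; if the model is systematically wrong, 25 samples just cost 25x the tokens.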
dingnuts|5 months ago
IT IS A SCAM TO BURN MORE TOKENS. You will know when it is no longer a scam when you either:
1) pay a flat price with NO USAGE LIMITS
or
2) pay per token with the ability to mark a response as bullshit & get a refund for those wasted tokens.
Until then: the incentives are the same as a casino's which means IT IS A SCAM.
befictious|5 months ago
I have a growing tin-foil-hat theory that the business model of LLMs is the same as the 1-900-psychic numbers of old.
For just 25¢, 1-900-PSYCHIC will solve all your problems in just 5 minutes! Still need help? No problem! We'll work with you until you get your answers, for only 10¢ a minute, until you're happy!
eerily similar
jmogly|5 months ago
Maybe there is some way to do it based on the geometry of how the neural net activated for a token, or some other more statistics-based approach; idk, I'm not an expert.
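One simple statistics-based signal along these lines is per-token entropy: if the model's probability distribution over the next token is flat, it was guessing at that position. This is a sketch, assuming you can get per-token probabilities from the model; the example distributions are made up:

```python
import math

def token_entropy(probs):
    """Shannon entropy (in bits) of one token's probability distribution.

    High entropy means the model was unsure at that position, a
    purely statistical uncertainty signal that needs no access to
    the network's internal geometry.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical per-token distributions (assumed data, not real model output):
confident = [0.97, 0.01, 0.01, 0.01]   # near-certain token
uncertain = [0.25, 0.25, 0.25, 0.25]   # coin-flip among four tokens

e_low = token_entropy(confident)
e_high = token_entropy(uncertain)   # uniform over 4 options = 2.0 bits
```

Flagging spans of high-entropy tokens is a crude but cheap way to mark output a human should double-check.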
weinzierl|5 months ago
The LLM had a small suggestion for the last sentence and repeated the whole corrected version for me to copy and paste.
Only the last sentence was slightly modified, or so I thought, because it had also moved the date of the event in the first sentence by one day.
Luckily I caught it before posting, but it was a close call.
toss1|5 months ago
Just because every competent human we know would edit ONLY the specified parts, or move only the specified columns with a cut/paste (or a similarly deterministic, reliable operation), does not mean an LLM will do the same. In fact, it seems to prefer to regenerate everything on the fly. NO, just NO.
throwawayoldie|5 months ago
I'm struggling with trying to understand how using an LLM to do this seemed like a good idea in the first place.
spongebobstoes|5 months ago
I expect future models will be able to identify when a computational tool will work, and use it directly
epiccoleman|5 months ago
If I were trying to do something like this, I would ask the LLM to write a Python script, then validate the output by running it against the first handful of rows (e.g. `head -n 10 thing.csv | python transform-csv.py`).
There are times when statistical / stochastic output is useful. There are other times when you want deterministic output. A transformation on a CSV is the latter.
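The deterministic version of the workflow described above might look like this. The transformation rule here (uppercasing a `name` column) is an assumed placeholder; the point is that the script, unlike an LLM, produces identical output on every run:

```python
import csv
import io

def transform(reader, writer):
    """Deterministic CSV transform: same input always gives same output.

    Illustrative rule (assumed): uppercase the 'name' column and pass
    every other field through untouched.
    """
    rows = csv.DictReader(reader)
    out = csv.DictWriter(writer, fieldnames=rows.fieldnames)
    out.writeheader()
    for row in rows:
        row["name"] = row["name"].upper()
        out.writerow(row)

# Validate on a handful of rows first, as suggested above:
sample = io.StringIO("name,qty\nalice,3\nbob,5\n")
result = io.StringIO()
transform(sample, result)
```

Once the first few rows check out, running the same script over the full file carries none of the "silently invented a new number in row 3000" risk.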