quincepie|6 months ago
> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters
The problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand. I believe that this lack of prior understanding on the user's side is what can lead to taking LLM output as factual. If one side of the exchange knows nothing about the subject, the other side can use jargon and even present random or lossy facts, which is almost guaranteed to impress the other side.
> The way to solve this particular problem is to make a correct example available to it.
My question is how much effort would it take to make a correct example available for the LLM before it can output quality and useful data? If the effort I put in is more than what I would get in return, then I feel like it's best to write and reason it myself.
cj|6 months ago
I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication. "Provide dosage guidelines for medication [insert here]"
It spit back dosing guidelines that were an order of magnitude wrong (suggested 100mcg instead of 1mg). When I saw 100mcg, I was suspicious and said "I don't think that's right" and it quickly corrected itself and provided the correct dosing guidelines.
These are the kind of innocent errors that can be dangerous if users trust it blindly.
The main challenge is that LLMs aren't able to gauge confidence in their answers, so they can't adjust how confidently they communicate information back to you. It's like compressing a photo, and a photographer wrongly saying "here's the best quality image I have!" - do you take the photographer at their word, or do you challenge them to find a better quality image?
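For what it's worth, the "order of magnitude" gap above is easy to sanity-check once both figures are in the same unit. A minimal sketch of that arithmetic; the helper names `to_mcg` and `fold_difference` are made up for illustration, and this only compares two numbers, it says nothing about what the right dose of anything is:

```python
# Hypothetical helpers for comparing two dose figures given in different units.
# Conversion factors to micrograms (standard SI prefixes).
MCG_PER_UNIT = {"mcg": 1, "mg": 1_000, "g": 1_000_000}


def to_mcg(amount: float, unit: str) -> float:
    """Convert an amount in mcg, mg, or g to micrograms."""
    return amount * MCG_PER_UNIT[unit]


def fold_difference(a: tuple, b: tuple) -> float:
    """Return how many times larger the bigger figure is than the smaller one."""
    x, y = to_mcg(*a), to_mcg(*b)
    return max(x, y) / min(x, y)


# The two figures from the comment above: 100mcg vs 1mg.
ratio = fold_difference((100, "mcg"), (1, "mg"))
print(ratio)  # 10.0, i.e. one order of magnitude
```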
zehaeva|6 months ago
I do agree that to get the full usage out of an LLM you should have some familiarity with what you're asking about. If you didn't already have a sense of what a dosage is, why wouldn't 100mcg be the right one?
blehn|6 months ago
BeetleB|6 months ago
This use case is bad by several degrees.
Consider an alternative: Using Google to search for it and relying on its AI generated answer. This usage would be bad by one degree less, but still bad.
What about using Google and clicking on one of the top results? Maybe healthline.com? This usage would reduce the badness by one further degree, but still be bad.
I could go on and on, but for this use case, unless it's some generic drug (ibuprofen or something), the only correct use case is going to the manufacturer's web site, ensuring you're looking at the exact same medication (not some newer version or a variant), and looking at the dosage guidelines.
No, not Mayo clinic or any other site (unless it's a pretty generic medicine).
This is just not a good example to highlight the problems of using an LLM. You're likely not that much worse off than using Google.
SV_BubbleTime|6 months ago
kenjackson|6 months ago
This seems like a very tractable problem. And I think in many cases they can do that. For example, I tried your example with Losartan and it gave the right dosage. Then I said, "I think you're wrong", and it insisted it was right. Then I said, "No, it should be 50g." And it replied, "I need to stop you there". Then went on to correct me again.
I've also seen cases where it has confidence where it shouldn't, but there does seem to be some notion of confidence that does exist.
QuantumGood|6 months ago
naet|6 months ago
tuatoru|6 months ago
Aeolun|6 months ago
dncornholio|6 months ago
christkv|6 months ago
ljsprague|6 months ago
giancarlostoro|6 months ago
This is why I've said a few times here on HN and elsewhere, if you're using an LLM you need to think of yourself as an architect guiding a Junior to Mid Level developer. Juniors can do amazing things, they can also goof up hard. What's really funny is you can make them audit their own code in a new context window, and give you a detailed answer as to why that code is awful.
I use it mostly on personal projects especially since I can prototype quickly as needed.
skydhash|6 months ago
The thing is coding can (and should) be part of the design process. Many times, I thought I had a good idea of what the solution should look like, then while coding, I got more exposure to the libraries and other parts of the code, which led me to a more refined approach. This exposure is what you will miss, and it will quickly result in unfamiliar code.
bobbylarrybobby|6 months ago
puchatek|6 months ago
HarHarVeryFunny|6 months ago
It's also useful to have an intuition for what things an LLM is liable to get wrong/hallucinate. One of these is questions where the question itself suggests one or more obvious answers (which may or may not be correct): if the LLM doesn't "know", it may well hallucinate one of them, and it will sound reasonable.
felipeerias|6 months ago
netcan|6 months ago
I think there's a parallel here for the internet as an information source. It delivered on "unlimited knowledge at the tip of everyone's fingertips" but lowering the bar also lowered the bar.
That access "works" only when the user is capable of doing their part too. Evaluating sources, integrating knowledge. Validating. Cross examining.
Now we are just more used to recognizing that accessibility comes with its own problem.
Some of this is down to general education. Some to domain expertise. Personality plays a big part.
The biggest factor is, I think, intelligence. There's a lot of 2nd and 3rd order thinking required to simultaneously entertain a curiosity, consider how the LLM works, and exercise different levels of skepticism depending on the types of errors LLMs are likely to make.
Using LLMs correctly and incorrectly is.. subtle.
theshrike79|6 months ago
This is why simonw (the author) has his "pelican on a bike" test. It's not 100% accurate but it is a good indicator.
I have a set of my own standard queries and problems (no counting characters or algebra crap) that I feed to new LLMs I'm testing.
None of the questions exist outside of my own Obsidian note so they can't be gamed by LLM authors. And I've tested multiple different LLMs using them so I have a "feeling" on what the answer should look like. And I personally know the correct answer so I can immediately validate them.
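A private question set like this can be as simple as a list of prompts plus the facts you already know the answer must contain. A minimal sketch, not anyone's actual setup: `Case`, `run_suite`, and `fake_model` are hypothetical names, the two sample questions are placeholders, and the substring check is a crude stand-in for reading the answers yourself. Swap `fake_model` for whatever function calls the LLM you are testing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    prompt: str
    must_contain: list[str]  # facts the tester already knows belong in a correct answer


def run_suite(model: Callable[[str], str], cases: list[Case]) -> list[tuple[str, bool]]:
    """Ask the model every prompt and check each answer for the expected facts."""
    results = []
    for case in cases:
        answer = model(case.prompt).lower()
        passed = all(fact.lower() in answer for fact in case.must_contain)
        results.append((case.prompt, passed))
    return results


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: the real one is also str -> str)."""
    return "The capital of France is Paris."


cases = [
    Case("What is the capital of France?", ["Paris"]),
    Case("What is the boiling point of water at sea level in Celsius?", ["100"]),
]

results = run_suite(fake_model, cases)
for prompt, passed in results:
    print("PASS" if passed else "FAIL", "-", prompt)
```

The automated check only flags obvious misses; the "feeling" for what a good answer looks like still comes from reading the outputs.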
barapa|6 months ago
estimator7292|6 months ago
I can't think of any other tools like this. An LLM can multiply your efforts, but only if you were capable of doing it yourself. Wild.