quincepie | 6 months ago

I totally agree with the author. Sadly, I feel like that's not how the majority of LLM users tend to view LLMs. And it's definitely not how AI companies market them.

> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters

the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand. I believe that this lack of initial understanding on the user's part is what can lead to taking LLM output as factual. If one side of the exchange knows nothing about the subject, the other side can use jargon and even present random or lossy facts that are almost guaranteed to impress.

> The way to solve this particular problem is to make a correct example available to it.

My question is how much effort it would take to make a correct example available to the LLM before it can output quality, useful data. If the effort I put in is more than what I get in return, then I feel like it's best to write and reason through it myself.

cj|6 months ago

> the user will at least need to know something about the topic beforehand.

I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication. "Provide dosage guidelines for medication [insert here]"

It spit back dosing guidelines that were an order of magnitude wrong (suggested 100mcg instead of 1mg). When I saw 100mcg, I was suspicious and said "I don't think that's right" and it quickly corrected itself and provided the correct dosing guidelines.

These are the kind of innocent errors that can be dangerous if users trust it blindly.

The main challenge is that LLMs aren't able to gauge confidence in their answers, so they can't adjust how confidently they communicate information back to you. It's like compressing a photo and the photographer wrongly saying "here's the best quality image I have!" - do you trust the photographer at their word, or do you challenge them to find a better quality image?

zehaeva|6 months ago

What if you had told it again that you don't think that's right? Would it have stuck to its guns and said "oh, no, I am right here", or would it have backed down with "Oh, silly me, you're right, here's the real dosage!" and given you something wrong again?

I do agree that to get the full usage out of an LLM you should have some familiarity with what you're asking about. If you didn't already have a sense of what the dosage should be, why wouldn't 100mcg seem right?

blehn|6 months ago

Perhaps the absolute worst use-case for an LLM

BeetleB|6 months ago

> I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication.

This use case is bad by several degrees.

Consider an alternative: Using Google to search for it and relying on its AI generated answer. This usage would be bad by one degree less, but still bad.

What about using Google and clicking on one of the top results? Maybe healthline.com? This usage would reduce the badness by one further degree, but still be bad.

I could go on and on, but for this use case, unless it's some generic drug (ibuprofen or something), the only correct approach is going to the manufacturer's web site, ensuring you're looking at the exact same medication (not some newer version or a variant), and checking the dosage guidelines there.

No, not Mayo clinic or any other site (unless it's a pretty generic medicine).

This is just not a good example to highlight the problems of using an LLM. You're likely not that much worse off than using Google.

SV_BubbleTime|6 months ago

LANGUAGE model, not FACT model.

kenjackson|6 months ago

"The main challenge is LLMs aren't able to gauge confidence in its answers"

This seems like a very tractable problem, and I think in many cases they already can. For example, I tried your example with Losartan and it gave the right dosage. Then I said, "I think you're wrong", and it insisted it was right. Then I said, "No, it should be 50g." It replied, "I need to stop you there", and went on to correct me again.

I've also seen cases where it has confidence where it shouldn't, but some notion of confidence does seem to exist.

QuantumGood|6 months ago

With search and references, and without search and references, are two different tools. They're supposed to be close to the same thing, but are not. That isn't to say there's a guarantee of correctness with references, but in my experience accuracy is better, and seeing unexpected references is helpful when confirming.

naet|6 months ago

That is exactly the kind of question that I would never trust to chatgpt.

tuatoru|6 months ago

Modern Russian Roulette, using LLMs for dose calculations.

Aeolun|6 months ago

I feel like asking an LLM for medicine dosage guidelines is exactly what you should never use it for…

dncornholio|6 months ago

Using an LLM for medical research is just as dangerous as Googling it. Always ask your doctor!

christkv|6 months ago

I find if I force thinking mode and then force it to search the web it’s much better.

ljsprague|6 months ago

Maybe don't use an LLM for dosing guidelines.

giancarlostoro|6 months ago

> the user will at least need to know something about the topic beforehand.

This is why I've said a few times here on HN and elsewhere: if you're using an LLM you need to think of yourself as an architect guiding a Junior to Mid Level developer. Juniors can do amazing things; they can also goof up hard. What's really funny is you can make them audit their own code in a new context window and get a detailed answer as to why that code is awful.

I use it mostly on personal projects especially since I can prototype quickly as needed.

skydhash|6 months ago

> if you're using an LLM you need to think of yourself as an architect guiding a Junior to Mid Level developer.

The thing is, coding can (and should) be part of the design process. Many times I thought I had a good idea of what the solution should look like, then while coding I got exposed to more of the libraries and other parts of the code, which led me to a more refined approach. This exposure is what you miss, and it quickly results in unfamiliar code.

HarHarVeryFunny|6 months ago

> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters

It's also useful to have an intuition for what things an LLM is liable to get wrong or hallucinate. One of them is questions where the question itself suggests one or more obvious answers (which may or may not be correct); if the LLM doesn't "know", it may well hallucinate one of those and still sound reasonable.

felipeerias|6 months ago

LLMs are very sensitive to leading questions. A small hint of what the expected answer looks like will tend to produce exactly that answer.

netcan|6 months ago

>the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand. I believe that this lack of initial understanding on the user's part

I think there's a parallel here with the internet as an information source. It delivered on "unlimited knowledge at everyone's fingertips", but lowering the bar also lowered the bar.

That access "works" only when the user is capable of doing their part too. Evaluating sources, integrating knowledge. Validating. Cross examining.

Now we are just more used to recognizing that accessibility comes with its own problem.

Some of this is down to general education. Some to domain expertise. Personality plays a big part.

The biggest factor is, I think, intelligence. There's a lot of 2nd and 3rd order thinking required to simultaneously entertain a curiosity, consider how the LLM works, and exercise different levels of skepticism depending on the types of errors LLMs are likely to make.

Using LLMs correctly and incorrectly is... subtle.

theshrike79|6 months ago

> the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand

This is why simonw (the author) has his "pelican on a bike" test. It's not 100% accurate, but it is a good indicator.

I have a set of my own standard queries and problems (no counting characters or algebra crap) that I feed to new LLMs I'm testing.

None of the questions exist outside of my own Obsidian note, so they can't be gamed by LLM authors. I've tested multiple different LLMs with them, so I have a "feeling" for what the answer should look like. And I personally know the correct answers, so I can validate them immediately.
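
Running them doesn't take more than the note plus a tiny script. Rough sketch only - the questions and the ask_model hook below are made-up placeholders, not my real set or any particular API:

    # Sketch of a private eval set. `ask_model` stands in for whatever client
    # you use to call the model under test; the questions are placeholders.
    QUESTIONS = [
        ("prompt 1 ...", "phrase a correct answer must contain"),
        ("prompt 2 ...", "another expected phrase"),
    ]

    def run_suite(ask_model):
        for prompt, must_contain in QUESTIONS:
            answer = ask_model(prompt)
            ok = must_contain.lower() in answer.lower()
            print(("PASS" if ok else "CHECK MANUALLY") + ": " + prompt[:40])
            if not ok:
                print(answer)  # I know the correct answer, so eyeballing this is quick

The keyword check is deliberately crude; the point is just that the questions stay private and the final judgment stays manual.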

barapa|6 months ago

They are training on your queries. So they may have some exposure to them going forward.

estimator7292|6 months ago

It's really strange to me that the only way to effectively use LLMs is if you already have all the knowledge and skill to do the task yourself.

I can't think of any other tools like this. An LLM can multiply your efforts, but only if you were capable of doing it yourself. Wild.