WhyIsItAlwaysHN | 8 months ago

There's something I don't get in this analysis.

The queries for the LLM which were used to estimate costs don't make a lot of sense for LLMs.

You would not ask an LLM to tell you the baggage size for a flight because there might be a rule added a week ago that changes this or the LLM might hallucinate the numbers.

You would ask an LLM with web search included so it can find sources and ground the answer. This applies to any question where you need factual data; otherwise it's like asking a random stranger on the street about things that can cost you money. Then the token count balloons because the LLM needs to add entire websites to its context.

If you are not looking for a grounded answer, you might be doing something more creative, like writing a text. In that case, you might be iterating on the text, with the entire discussion resent as context on every turn so you can get the next answer. There might be caching, batching, etc., but the tokens required still grow very fast.
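To put rough numbers on the resend effect, here is a minimal sketch. It assumes every turn adds about the same number of tokens to the conversation and that no caching discount applies; the function name and the figures (10 turns, 500 tokens per turn) are hypothetical, not taken from the post.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens sent when every request resends the full history.

    Assumption: each turn grows the conversation by `tokens_per_turn`
    tokens, and nothing is cached or discounted.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # the conversation grows by one turn
        total += history            # the whole history is sent as input
    return total


# Hypothetical numbers: 10 turns of roughly 500 tokens each.
resent = cumulative_input_tokens(10, 500)  # 500 * (1 + 2 + ... + 10)
flat = 10 * 500                            # if only the new turn were sent
print(resent, flat)
```

Under these assumptions the input grows with the square of the number of turns, which is why a long editing session can cost far more than a single short question.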

In summary, I think the token estimates are likely quite off. But not to be all critical, I think it was a very informative post and in the end without real world consumption data, it's hard to estimate these things.


barrkel|8 months ago

Oh contraire, I ask questions about recent things all the time, because the LLM will do a web search and read the web page - multiple pages - for me, and summarize it all.

4o will always do a web search for a pointedly current question, give references in the reply that can be checked, and if it didn't, you can tell it to search.

o3 meanwhile will do many searches and look at the thing from multiple angles.

zambal|8 months ago

But in that case it's hard to argue that LLMs are cheap in comparison to search (the premise of the article).

WhyIsItAlwaysHN|8 months ago

But that was my point: then you need to include the entire websites in the context, and it won't be 506 tokens per question. It will be thousands.

pzo|8 months ago

But that's from the user's perspective. Check Google or OpenAI pricing if you want grounded results from their API: Google asks $45 per 1k grounded searches on top of tokens. If your business model is based on ads, you're unlikely to have a $45 CPM. Likewise, if you want to offer a free version of your product, it gets expensive.
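A back-of-the-envelope version of that comparison, as a sketch only: the $45 per 1k grounded searches is the figure quoted above, while the $10 ad CPM and the function names are hypothetical placeholders. Token costs are left out.

```python
def cost_per_query(price_per_thousand: float) -> float:
    """Convert a per-1,000 price into a per-query price."""
    return price_per_thousand / 1000


def margin_per_thousand(ad_cpm: float, grounding_price_per_1k: float = 45.0) -> float:
    """Ad revenue minus grounding fees per 1,000 queries.

    Assumption: one ad impression per query, token costs ignored.
    """
    return ad_cpm - grounding_price_per_1k


# $45 per 1k grounded searches, as quoted in the comment above.
print(cost_per_query(45.0))
# Hypothetical ad CPM of $10; a negative result means a loss per 1k queries.
print(margin_per_thousand(10.0))
```

With any CPM below $45, the grounding fee alone exceeds the ad revenue before a single token is billed.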

harperlee|8 months ago

Nitpick: Au contraire

skywhopper|8 months ago

Yeah, the point is that this behavior uses a lot more tokens than the OP says is a “typical” LLM query.

brookst|8 months ago

Just tried asking “what is the maximum carryon size for an American Airlines flight DFW-CDG” and it used a web search, provided the correct answer, and provided links to both the airline and FAA sites.

Why wouldn’t I use it like this?

ceejayoz|8 months ago

That search query brings up https://www.aa.com/i18n/travel-info/baggage/carry-on-baggage... for the first result, which says "The total size of your carry-on, including the handles and wheels, cannot exceed 22 x 14 x 9 inches (56 x 36 x 23 cm) and must fit in the sizer at the airport."

What benefit did the LLM add here, if you still had to vet the sources?

adrian_b|8 months ago

I do not see what added benefit the LLM provides in such cases, compared with doing that web search yourself, for free.

WhyIsItAlwaysHN|8 months ago

What I was saying is that you wouldn't use a raw LLM (so 506 tokens to get an answer). You would use it with web search so you can get the links.

The LLM has to read the websites to answer you, and since it has to include them in its input, that significantly increases the token count.