WhyIsItAlwaysHN | 8 months ago
The queries that were used to estimate LLM costs don't reflect how LLMs are actually used.
You would not ask a bare LLM for the baggage allowance on a flight, because a rule added a week ago might have changed it, or the LLM might hallucinate the numbers.
You would ask an LLM with web search enabled so it can find sources and ground the answer. This applies to any question where you need factual data; otherwise it's like asking a random stranger on the street about things that can cost you money. Then the token count balloons, because the LLM needs to add entire websites to its context.
If you are not looking for a grounded answer, you are probably doing something more creative, like writing a text. In that case you iterate on the text, and the entire discussion is sent as context with every request. Caching and batching may help, but the token count still grows very fast.
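To make the "grows very fast" claim concrete, here is a minimal sketch (with hypothetical turn sizes) of how cumulative billed input tokens grow when the full conversation history is resent on every turn of an iterative chat, with no caching:

```python
# Sketch with assumed numbers: if each turn adds ~turn_tokens of new
# context and the full history is resent as input on every request,
# cumulative input tokens grow quadratically in the number of turns.

def cumulative_input_tokens(turns: int, turn_tokens: int = 500) -> int:
    """Total input tokens billed across `turns` requests, no caching."""
    total = 0
    history = 0
    for _ in range(turns):
        history += turn_tokens  # the conversation grows each turn
        total += history        # the whole history is resent as input
    return total

# Ten turns of ~500 tokens each: 500 + 1000 + ... + 5000 = 27500
print(cumulative_input_tokens(10))
```

So a ten-turn editing session bills roughly 5.5x the tokens of the final context alone, which is why per-query estimates based on a single short prompt undershoot.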
In summary, I think the token estimates are likely quite off. Not to be overly critical, though: it was a very informative post, and without real-world consumption data these things are hard to estimate.
barrkel | 8 months ago
4o will always do a web search for a pointedly current question and give references in the reply that can be checked; if it doesn't, you can tell it to search.
o3, meanwhile, will do many searches and look at the question from multiple angles.
brookst | 8 months ago
Why wouldn’t I use it like this?
ceejayoz | 8 months ago
What benefit did the LLM add here, if you still had to vet the sources?
WhyIsItAlwaysHN | 8 months ago
The LLM has to read those websites to answer you, and since they are included in its input, that significantly increases the token count.
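As a rough illustration of that inflation, here is a sketch with assumed numbers (average page length and the common ~1.33-tokens-per-English-word heuristic are both assumptions, not measurements):

```python
# Sketch with heuristic numbers: how grounding a short question with
# web search inflates input tokens once fetched pages enter the context.

WORDS_PER_PAGE = 2000    # assumed average length of a fetched web page
TOKENS_PER_WORD = 1.33   # rough English tokenization rule of thumb

def grounded_input_tokens(question_tokens: int, pages_fetched: int) -> int:
    """Approximate input tokens: the question plus the fetched pages."""
    page_tokens = int(pages_fetched * WORDS_PER_PAGE * TOKENS_PER_WORD)
    return question_tokens + page_tokens

# A ~50-token baggage question with three fetched pages ends up around
# 8000 input tokens, i.e. the pages dominate the cost by ~160x.
print(grounded_input_tokens(50, 3))
```

Under these assumptions the question itself is a rounding error; nearly all the billed input comes from the grounding material, which is the gap the cost estimates miss.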