top | item 42748427

alexwebb2 | 1 year ago

I think your intuition on this might be lagging a fair bit behind the current state of LLMs.

System message: answer with just "service" or "product"

User message (variable): 20 bottles of ferric chloride

Response: product

Model: OpenAI GPT-4o-mini

$0.075/1Mt batch input * 27 input tokens * 10M jobs = $20.25

$0.300/1Mt batch output * 1 output token * 10M jobs = $3.00

It's a sub-$25 job.

You'd need to be doing 20 times that volume every single day to even start to justify hiring an NLP engineer instead.
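The whole setup above is a single chat-completion call per item. A minimal sketch using the OpenAI Python SDK (the `build_messages` helper and the `max_tokens=1` cap are illustrative additions, not from the comment; assumes `OPENAI_API_KEY` is set in the environment):

```python
def build_messages(item: str) -> list:
    # System + user messages exactly as described in the comment above
    return [
        {"role": "system", "content": 'answer with just "service" or "product"'},
        {"role": "user", "content": item},
    ]

def classify(client, item: str) -> str:
    # client is an openai.OpenAI() instance; one cheap call per item
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=build_messages(item),
        max_tokens=1,  # we only ever want a one-word label back
    )
    return resp.choices[0].message.content.strip().lower()
```

For 10M items you'd presumably submit these through the Batch API to get the discounted batch rates quoted above.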


simonw|1 year ago

You might be able to use an even cheaper model. Google Gemini 1.5 Flash 8B is Input: $0.04 / Output: $0.15 per 1M tokens.

17 input tokens and 2 output tokens * 10 million jobs = 170,000,000 input tokens ($6.80) and 20,000,000 output tokens ($3.00)... which costs a total of $9.80 https://tools.simonwillison.net/llm-prices

As for rate limits, https://ai.google.dev/pricing#1_5flash-8B says 4,000 requests per minute and 4 million tokens per minute - so you could run those 10 million jobs in about 2500 minutes or 42 hours. I imagine you could pull a trick like sending 10 items in a single prompt to help speed that up, but you'd have to test carefully to check the accuracy effects of doing that.
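The "several items per prompt" trick can be sketched like this (chunk size, prompt wording, and line-based parsing are all assumptions here; as the comment says, you'd still need to test what batching does to accuracy):

```python
def chunk(items, size=10):
    # Split the workload so each API call carries several items at once
    return [items[i:i + size] for i in range(0, len(items), size)]

def batch_prompt(group):
    # Numbered list; the model is told to answer one label per line
    lines = [f"{i + 1}. {item}" for i, item in enumerate(group)]
    return ("Classify each item as service or product, "
            "one answer per line:\n" + "\n".join(lines))

def parse_labels(response_text, group):
    # One label per non-empty line; a length mismatch means the model
    # dropped or merged lines, so fail loudly rather than misalign labels
    labels = [ln.strip().lower() for ln in response_text.splitlines() if ln.strip()]
    if len(labels) != len(group):
        raise ValueError("label count does not match item count")
    return labels
```

With 10 items per request you'd cut the request count (and the 4,000 RPM bottleneck) by 10x, at the cost of a slightly longer prompt per call.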

w10-1|1 year ago

The question is not average cost but marginal cost of quality - same as voice recognition, which had relatively low uptake even at ~2-4% error rates due to context switching costs for error correction.

So you'd have to account for the work of catching the residual 2-8%+ error rate from LLMs. I believe the premise is that for NLP that's just incremental work, but for LLMs it could be impossible to correct (i.e., the cost per next percentage point of correction explodes), for lack of easily controllable (or even understandable) models.

But it's most rational in business to focus on the easy majority with lower costs, and ignore hard parts that don't lead to dramatically larger TAM.

gf000|1 year ago

I am absolutely not an expert in NLP, but I wouldn't be surprised if, for many kinds of problems, LLMs had a far lower error rate than any classical NLP software.

Like, lemmatization is pretty damn dumb in classical NLP, while a good LLM will be orders of magnitude more accurate.

griomnib|1 year ago

This assumes you don’t care about our rapidly depleting carbon budget.

No matter how much energy you save personally, running your jobs on Sam A's earth-killing ten-thousand-GPU cluster is literally against your own self-interest in delaying climate disasters.

LLMs have huge negative externalities; there is a moral argument to only use them when other tools won't work.

amanaplanacanal|1 year ago

It's digging fossil carbon out of the ground that's the problem, not using electricity. Switch to electricity not from fossil carbon and you're golden.

renewiltord|1 year ago

Haha, this is pretty good. I’m going to take a plane to SF while I laugh at this.

elicksaur|1 year ago

How do you validate these classifications?

bugglebeetle|1 year ago

The same way you check performance for any problem like this: by creating one or more manually labeled test datasets, randomly sampled from the target data, and looking at the resulting precision, recall, F-scores, etc. LLMs change pretty much nothing about evaluation for most NLP tasks.
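Concretely, once you have a hand-labeled sample, the per-label metrics are a few lines of code. A minimal pure-Python sketch (in practice you'd likely reach for scikit-learn's `classification_report` instead):

```python
def prf(gold, pred, positive="service"):
    # Precision, recall, F1 for one label, comparing model predictions
    # against a manually labeled gold sample
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

The same function works whether the predictions came from an LLM or a classical NLP pipeline, which is the point: the evaluation step is model-agnostic.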

segmondy|1 year ago

The same way you validate it if you didn't use an LLM.

jeswin|1 year ago

Isn't it easier and cheaper to validate than to classify (which requires expensive engineers)? I mean, the skill isn't as expensive - many companies do this at scale.

scarface_74|1 year ago

You need a domain expert either way. I mentioned in another reply that one of my niches is implementing call centers with Amazon Connect and Amazon Lex (the NLP engine).

https://news.ycombinator.com/item?id=42748189

I don't know the domain they're working in beforehand, so I do validation testing with them.

axegon_|1 year ago

Yeah... Let's talk time needed for 10M prompts and how that fits into a daily pipeline. Enlighten us, please.

FloorEgg|1 year ago

Run them all in parallel with a cloud function in less than a minute?
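The fan-out is a few lines whether it's a cloud function or a local thread pool. A sketch with a stubbed `classify` function (real code would wrap the API call from the earlier comments and respect the provider's rate limits; the minute-scale claim depends entirely on those limits):

```python
from concurrent.futures import ThreadPoolExecutor

def classify_all(items, classify, max_workers=100):
    # Fan the per-item calls out in parallel; `classify` is assumed to be
    # a function wrapping one LLM API call, so the work is I/O-bound and
    # threads (or async) parallelize it well
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify, items))
```

With the 4,000 requests/minute limit quoted above, no amount of parallelism gets 10M single-item jobs done in under a minute; parallelism just lets you saturate whatever quota you have.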

LeafItAlone|1 year ago

>You'd need to be doing 20 times that volume every single day to even start to justify hiring an NLP engineer instead.

How much for the “prompt engineer”? Who is going to be doing the work and validating the output?

blindriver|1 year ago

You do not need a prompt engineer to create: 'answer with just "service" or "product"'.

Most classification prompts can be extremely easy and intuitive. The idea that you have to hire a completely separate prompt engineer is kind of funny. In fact, you might be able to get the LLM itself to help revise the prompt.

alexwebb2|1 year ago

All software engineers are (or can be) prompt engineers, at least for trivial jobs like this. It's just an API call and a one-liner instruction. Odds are very good that most companies already have someone on staff who can knock this out in short order. No specialized hiring required.

IanCal|1 year ago

Prompt engineering is less and less of an issue the simpler the job and the more capable the model. You also don't need someone with deep NLP knowledge to measure and understand the output.