
genmon | 11 months ago

My assumption is that models are getting cheaper, fast. So you can build now with OpenAI/Anthropic/etc and swap it out for a local or hosted model in a year.

This doesn't work for all use cases but data extraction is pretty safe. Treat it like a database query -- a slow but high availability and relatively cheap call.
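The swap-out genmon describes is cheapest if the provider sits behind a thin abstraction from day one. A minimal sketch, assuming a JSON-returning completion backend (all names here are illustrative, not any real SDK):

```python
import json
from typing import Callable

# Hypothetical abstraction: the application depends only on this
# signature, so the backing model (OpenAI, Anthropic, a local model)
# can be swapped later without touching extraction call sites.
ExtractFn = Callable[[str], dict]

def make_extractor(complete: Callable[[str], str]) -> ExtractFn:
    """Wrap any text-completion backend into a field extractor.
    `complete` is the only provider-specific piece."""
    def extract(document: str) -> dict:
        prompt = (
            "Extract the registrant name and expiry date from the "
            "following record as JSON:\n" + document
        )
        # Treat this like a slow, high-availability query: the caller
        # just gets a dict back, whatever model produced it.
        return json.loads(complete(prompt))
    return extract

# Stand-in backend for demonstration; a real one would call an API
# or a local model server.
fake_backend = lambda prompt: (
    '{"registrant": "Example Corp", "expiry": "2025-01-01"}'
)
extract = make_extractor(fake_backend)
result = extract("Registrant: Example Corp\nExpiry: 2025-01-01")
print(result)
```

Swapping providers in a year then means replacing only the `complete` callable, not the extraction logic.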


Cthulhu_ | 11 months ago

While it will become cheaper, it will never be as fast / efficient as 'just' parsing the data the old-fashioned way.

It feels like using AI to do computing tasks instead of writing code is just like when we moved to relatively inefficient web technology for front-ends, where we needed beefier systems to get the same performance as we used to have. Or like when cloud computing became a thing and efficiency / speed became a function of credit card limit instead of code efficiency.

Call me a Luddite, but I think as software developers we should do better: reduce waste, embrace mechanical sympathy, etc. Using AI to generate some code is fine - it's just the next step in the code generators I've been using throughout my career, IMO. But using AI to do tasks that can also be done 1000x more efficiently, like parsing / processing data, is going in the wrong direction.

relistan | 11 months ago

I know this particular problem space well. AI is a reasonable solution. WHOIS records are intentionally made human-readable and not machine-parseable without huge effort, because so many people were scraping them. So the same registrar may return records in a huge range of text formats. You can write code to handle them all if you really want to, but if you are not doing it en masse, AI is probably going to be the cheaper solution.

Example: https://github.com/weppos/whois is a very solid library for WHOIS parsing but cannot handle all servers, as they say themselves. And that has fifteen-plus years of work on it.
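The format variance relistan describes is why the hand-written approach never quite finishes. A sketch of what per-registrar parsing looks like, using invented record snippets (real registrar output varies even more widely):

```python
import re
from datetime import datetime

# Two made-up registrar formats expressing the same fact.
RECORD_A = "Expiry Date: 2025-06-30"
RECORD_B = "paid-till:    30.06.2025"

# One (pattern, date format) pair per known registrar style;
# every new style means another entry, indefinitely.
PATTERNS = [
    (re.compile(r"Expiry Date:\s*(\d{4}-\d{2}-\d{2})"), "%Y-%m-%d"),
    (re.compile(r"paid-till:\s*(\d{2}\.\d{2}\.\d{4})"), "%d.%m.%Y"),
]

def parse_expiry(record: str):
    """Return the expiry date, or None for a format we haven't met yet."""
    for pattern, fmt in PATTERNS:
        m = pattern.search(record)
        if m:
            return datetime.strptime(m.group(1), fmt)
    return None  # unknown registrar style: add yet another pattern

print(parse_expiry(RECORD_A))  # both records parse to the same date
print(parse_expiry(RECORD_B))
```

An LLM collapses this table of patterns into one prompt, at the cost of speed, money, and occasional wrong answers, which is exactly the trade-off being debated here.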

ohmygoodniche | 11 months ago

Requesting that people think before transferring mission critical code into the hands of LLMs is not being a Luddite lol.

Can you imagine how many ridiculous errors we would have if LLMs structured data into protobufs? Or if they compiled software?

It's more than 1000x more wasteful resource-wise, too. The LLM Swiss Army knife is the Balenciaga all-leather garbage bag option for the vast majority of use cases.

GTP | 11 months ago

Still, I wouldn't use an LLM for what's essentially a database query: by their very nature, LLMs will give you the right answer most of the time, but will sometimes return wrong information. Better to stick with a deterministic DB query in this case.

lucianbr | 11 months ago

As usual, arguments for LLMs are based on rosy assumptions about their future trajectory. How about we talk about data extraction at that point in the future when models are already cheap enough? And in the meantime, just assume the future is uncertain, as it obviously is.
