lsy|7 months ago
Despite the feeling that it's a fast-moving field, most of the differences in actual models over the last few years are in degree and not kind, and the majority of ongoing work is in tooling and integrations, which you can probably keep up with as it seems useful for your work. Remembering that it's a model of text and is ungrounded goes a long way toward discerning what kinds of work it's useful for (where verification of output is either straightforward or unnecessary), and what kinds of work it's not useful for.
crystal_revenge|7 months ago
I also have not experienced the post's claim that "Generative AI has been the fastest moving technology I have seen in my lifetime." I can't speak for the author, but I've been in this field from when "SVMs are the new hotness and neural networks are a joke!" through the entire explosion of deep learning and the insane number of DL frameworks in the 20-teens, all within a decade (remember implementing restricted Boltzmann machines and pre-training?). Similarly, I saw "don't use JS for anything other than enhancing the UX" give way to single-page webapps being the standard in the same timeframe.
Unless someone's aim is to be on that list of "high signal" people, it's far better to just keep your head down until you actually need these solutions. As an example, I left webdev work around the time of backbone.js, one of the first attempts at front-end MVC for single-page apps. Then the great React/Angular wars began, and I just ignored it. A decade later I was working with a webdev team and learned React in a few days, very glad I did not stress about "keeping up" during that period of non-stop change. Another example: just 5 years ago everyone was trying to learn how to implement LSTMs from scratch... only to have that model essentially become obsolete with the rise of transformers.
Multiple times over my career I've learned the lesson that "moving fast" is another way of saying "immature." One would find more success learning about the GLM (or, god forbid, learning to identify survival analysis problems) and all of its still-underappreciated uses for day-to-day problem solving (old does not imply obsolete) than learning the "prompt hack of the week".
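To make the GLM point concrete: logistic regression is just a GLM with a logit link, and you can fit a toy one in plain Python. Everything below (the data, learning rate, and epoch count) is invented for illustration; a real analysis would use a proper stats library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit y ~ Bernoulli(sigmoid(w*x + b)) by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # gradient of the negative log-likelihood for one sample
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# toy data: larger x -> more likely y = 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(sigmoid(w * 0.5 + b) < 0.5)  # low x -> predicts class 0
print(sigmoid(w * 4.5 + b) > 0.5)  # high x -> predicts class 1
```

Swapping the link function and the error distribution gives you Poisson regression, gamma regression, and so on — that's the whole GLM family, and it hasn't changed in fifty years.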
megh-khaire|7 months ago
However, this AI wave does feel a bit different. What stands out is the speed of progress in multiple directions. We’ve seen new model architectures, prompting techniques, and agent frameworks. And every time one of those advances, it opens up new possibilities that startups are quick to explore.
I’m with you that chasing every shiny thing isn’t practical or even useful most of the time. But as someone curious about the space, I still find it exciting.
thorum|7 months ago
- Someone made a slightly different tool for using LLMs (may or may not be useful depending on whether existing tools meet your needs)
- Someone made a model that is incrementally better at something, beating the previous state-of-the-art by a few % points on one benchmark or another (interesting to keep an eye on, but remember that this happens all the time and this new model will be outdated in a few months - probably no one will care about Kimi-K2 or GPT 4.1 by next January)
I think most people can comfortably ignore that kind of news and it wouldn’t matter.
On the other hand, some LLM news is:
- Someone figured out how to give a model entirely new capabilities.
Examples: RL and chain of thought. Coding agents that actually sort of work now. Computer use. True end-to-end multimodal models. Intelligent tool use.
Most people probably should be paying attention to those developments (and trying to look forward to what’s coming next). But the big capability leaps are rare and exciting enough that a cursory skim of HN posts with >500 points should keep you up-to-date.
I’d argue that, as with other tech skills, the best way to develop your understanding of LLMs and their capabilities is not through blogs or videos etc. It’s to build something. Experience for yourself what the tools are capable of, what does and doesn’t work, what is directly useful to your own work, etc.
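In that build-something spirit, the "tool use" capability mentioned above is less mysterious than it sounds: at its core it's a loop that parses the model's output, runs a tool, and feeds the result back in. The sketch below fakes the model with a stub so it's self-contained; the JSON format, tool names, and stub behavior are all invented for illustration.

```python
import json

# Hypothetical tools the "model" may call; in a real agent these would be
# real functions (search, file I/O, a calculator, ...).
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def fake_model(prompt):
    """Stand-in for an LLM: emits either a tool call or a final answer.
    A real model would be prompted to produce this JSON format."""
    if "TOOL_RESULT" not in prompt:
        return json.dumps({"tool": "add", "args": [2, 3]})
    return json.dumps({"answer": "2 + 3 = 5"})

def agent_loop(user_msg, max_steps=5):
    prompt = user_msg
    for _ in range(max_steps):
        msg = json.loads(fake_model(prompt))
        if "answer" in msg:                        # model is done
            return msg["answer"]
        result = TOOLS[msg["tool"]](*msg["args"])  # run the requested tool
        prompt += f"\nTOOL_RESULT: {result}"       # feed the result back
    raise RuntimeError("agent did not finish")

print(agent_loop("What is 2 + 3?"))  # → 2 + 3 = 5
```

Writing even this toy version makes it obvious where real agents fail: malformed tool calls, loops that never terminate, and tools with side effects.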
PaulHoule|7 months ago
A lot of people are feeling HN is saturated with AI posts whether it is how MCP is like USB-C (repeated so much you know it is NPCs) or how outraged people are that their sh1t fanfics are being hoovered up to train AI.
This piece is not “news”; it’s a summary, and a tepid one at best. I wish people had better judgment about what they vote up.
pyman|7 months ago
1. Stop living other people's experiences. Start having your own.
2. Stop reading blogs. Start building apps.
3. Everyone's experience depends on their use case or limitations. Don't follow someone's opinion or ideology without understanding why.
4. Don't waste time chasing employees or researchers on Twitter or Substack. Most of them are just promoting themselves or their company.
5. Don't let anxiety or FOMO take over your time. Focus on learning by doing. If something important comes out, you'll find out eventually.
6. Being informed matters, but being obsessed with information doesn't. Be smart about how you manage your time.
That's what I tell them.
godelski|7 months ago
I think it is easy for it to feel like the field is moving fast while it actually isn't. I learned this lesson when I lost basically a year to taking care of my partner. I thought I'd be way behind when coming back, but really not much had changed.
I think gaining this perspective can help you "keep up". Even if you are having a hard time now, that might just mean you don't have enough depth yet. Which is perfectly okay! It might encourage you to focus on different things so that you can keep up. You can't stay one step behind if you don't first know how to run. Or insert some other inspirational analogy here. The rush is in your head, not in reality.
alphazard|7 months ago
The minutiae of how next token prediction works is rarely appreciated by lay people. They don't care about dot products, or embeddings, or any of it. There's basically no advantage to explaining how that part works since most people won't understand, retain, or appreciate it.
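For anyone who does want the minutiae: a next-token step really is just dot products over embeddings followed by a softmax. The vocabulary, the 2-d embedding vectors, and the "context vector" below are all made up to keep the sketch tiny.

```python
import math

# Toy vocabulary with hypothetical 2-d embeddings (all numbers invented).
embeddings = {
    "cat":  [0.9, 0.1],
    "dog":  [0.8, 0.2],
    "meow": [0.9, 0.0],
    "bark": [0.7, 0.3],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def next_token_probs(context_vec):
    """Score every token by dot product with the context, then softmax."""
    scores = {tok: dot(context_vec, emb) for tok, emb in embeddings.items()}
    m = max(scores.values())  # subtract max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

# Pretend the model's hidden state after reading "the cat says" is this vector:
probs = next_token_probs([1.0, -0.5])
print(max(probs, key=probs.get))  # → meow
```

A real model computes the context vector with dozens of transformer layers, but the final step — dot the hidden state against every token embedding and normalize — looks exactly like this.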
Melonololoti|7 months ago
And then you have GenAI image models like Flux and all the open source projects.
I think it's beneficial to get a grasp of all of that and then keep an eye on it, so you catch the moment when it becomes relevant for you rather than being surprised and too late.
gammalost|7 months ago
Why? When you think you might need something, just search for it. There are too many models with only incremental improvements.
nerdsniper|7 months ago
I maintain a funnel sucking up all the PR stuff — but I skip straight to the papers, benchmarks, and GitHub repos.
qsort|7 months ago
Last week I showed some colleagues how to do some basic things with Claude Code and they were like "wow, I didn't even know this existed". Bro, what are you even doing.
There is definitely a lot of hype and the lunatics on Linkedin are having a blast, but to put it mildly I don't think it's a bad investment to experiment a bit with what's possible with the SOTA.
crystal_revenge|7 months ago
The trouble is that the advice in the post will have very little impact on "understanding how LLMs work". The number of people who talk about LLMs daily but have never run an LLM locally, and certainly never "opened it up to mess around", is very large.
A fun weekend exercise that anyone can do is to implement speculative decoding[0] using local LLMs. You'll learn a lot more about how LLMs work than reading every blog/twitter stream mentioned there.
0. https://research.google/blog/looking-back-at-speculative-dec...
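For a sense of what that exercise looks like, here is a deliberately tiny sketch of the idea: a cheap draft model proposes several tokens and the expensive target model verifies them. Both "models" below are just next-char lookup tables invented for illustration, and where the real method verifies stochastically via acceptance sampling, this sketch uses the simpler greedy-match rule.

```python
DRAFT  = {"h": "e", "e": "l", "l": "l"}  # fast but wrong after "l"
TARGET = {"h": "e", "e": "l", "l": "o"}  # the model whose output we want

def greedy(model, last, k):
    """Generate up to k next chars greedily, starting from `last`."""
    out = []
    for _ in range(k):
        nxt = model.get(last)
        if nxt is None:
            break
        out.append(nxt)
        last = nxt
    return out

def speculative_step(prefix, k=4):
    """One decode step: the draft proposes k chars, the target verifies."""
    proposed = greedy(DRAFT, prefix[-1], k)
    accepted = []
    last = prefix[-1]
    for d in proposed:
        t = TARGET.get(last)      # target's own next choice
        if t != d:                # first mismatch: keep target's char, stop
            if t is not None:
                accepted.append(t)
            break
        accepted.append(d)        # verified: accept the draft char for free
        last = d
    return prefix + "".join(accepted)

print(speculative_step("h"))  # → helo  (identical to the target's own output)
```

The payoff in the real algorithm is that the target verifies all k draft tokens in a single parallel forward pass instead of k sequential ones — implementing it with two actual local models makes that trade-off very tangible.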
layer8|7 months ago
That’s a nice way to put it, made me chuckle. :)
chamomeal|7 months ago
It is ridiculously cool, but I think any developer who is out of the loop could easily get back into the loop at any moment, without having to stay caught up most of the time.