ChatGPT was announced in November 2022, eight months ago. Time flies.
Question for HN: Where are we in the hype cycle on this?
We can run shitty clones slowly on Raspberry Pis and your phone. The educational implementations demonstrate the basics in under a thousand lines of brisk C. Great. At some point you have to wonder... well, so what?
Not one killer app has emerged. I for one am eager to be all hip and open-minded and pretend like I use LLMs all the time for everything and they are "the future", but novelty aside it seems like so far we have a demented Clippy and some sophomoric arguments about alignment and wrongthink.
It did generate a whole lot of breathless click-bait-y articles and gave people something to blab about. Ironically it also accelerated the value of that sort of gab and clicks towards zero.
As I am not a VC, politician, or opportunist, hand waving and telling me this is Frankenstein's monster about to come alive and therefore I need billions of dollars or "regulations" just makes folks sound like the crypto scammers.
Please HN, say something actually insightful, I beg you.
I work in tech diligence so I look at companies in detail. I have seen a couple where good machine learning is going to make a massive difference (whether it will keep them ahead of everyone is a separate question). I think it really boils down to:
"Is this a problem where an answer that is mostly right and sometimes wrong is still a great value proposition?"
This is what people don't get. If sometimes the answer is (catastrophically) wrong, and the cost of this is high, there's no market fit. So I think a lot of these early LLM related startups are going to be trainwrecks because they haven't figured this out. If the cost of an error is very high in your business, and human checking is what you are trying to avoid, these are not nearly as helpful.
I looked at one company in this scenario and they were dying. Couldn't get big customers to commit because the product was just not worth it if it couldn't be reliably right on something that a human was never going to get wrong (can't say what it was, NDAs and all that.) I also looked at one where they were doing very well because an answer that was usually close would save workers tons of time, and the nature of the biz was that eliminating the human verification step would make no sense anyway. Let's just say it was in a very onerous search problem, and it was trivial for the searcher to say "wrong wrong wrong, RIGHT, phew that saved me hours!". And that saving was going to add up to very significant cash.
So killer apps are going to be out there. But I agree that there is massive overhype and it's not all of them! (or even many!)
I'll say that I pretty firmly disagree with this. I've been using GitHub Copilot for about six months for my own work and it has fundamentally changed how I write code. Ignoring the ethics of Copilot, if I just need to read a file with some data, parse it, and render that data on screen, Copilot just _does_ most of that for me. I write a chunky comment explaining what I want, it writes a blob of code that I tab through, and I'm left with a nicely-documented, functioning piece of software. A one-off script that took me 30 minutes to write previously now takes me maybe a minute on a bad day.
For ages we've had Text Expander and key mappings and shortcuts and macros that render templates of pre-built code. Now I can just say what I'm trying to do, the language model considers the other code on the page, and it gets done.
If this isn't a "killer app" then I'm not sure what is. In my entire career I can think of maybe two things that I've come upon that have affected my workflow this much: source control and continuous integration. Which, frankly, is wild.
Separately, I use LLMs to generate marketing copy for my side hustle. I suck at marketing, but I can tell the damn thing what I want to market and it gives me a list of tweets back that sound like the extroverted CMO that I don't have. I can outsource creative tasks like brainstorming lists of names for products, or coming up with text categories for user feedback from a spreadsheet. I don't know if I'd call either of those things "killer apps" but I have a tool which can do thinking for me at a nominal cost, quickly, and with a high-enough quality bar that it's usually not a waste of my time.
I think the Microsoft GPT integration in Office is probably that app.
The ability to ask to have your emails summarised, or to get your Excel formulas configured with natural language, is an incredibly useful way to lower the barrier to entry for tools that already speed humans up so much.
I don't think the use of these tools is some life-redefining feature, but a friend of mine joked that a year from now you will write a simple sentence like "write polite work email with following request: Come to the meeting, you are late", then GPT will write the email, another GPT will send it, his GPT will summarise it, and he will instantly reply with another GPT-written apology that you, in turn, will only read as a summary. Leaving a trail of long, polite messages that no one will ever actually open.
Can we stop acting like the Gartner "hype cycle" is anything more than a marketing gimmick created by Gartner to validate their own consulting/research services?
While you can absolutely find cases that map to the "hype cycle", there is nothing whatsoever to validate this model as remotely accurate or valid for describing technology trends.
Where is crypto in the "hype cycle"? It went through at least 3 rounds of "peak of inflated expectations" and I'm not confident it will ever reach a meaningful "plateau of productivity".
Did mobile ever have "inflated expectations"? Yes, there was a lot of hype in the early days, but those people hyped about it, rushing to build mobile versions of their websites... were correct.
The "hype cycle" is a neat idea but doesn't really map to reality in a way that makes it useful. It's only useful for Gartner to create an illusion of credibility and sell their services.
That 8 months seems like a long time to you is indicative of just how fast tech has been moving lately. I expect at least another year before we have a good sense for where we actually are, probably more.
However, I'll hazard a guess: I think we haven't seen many real new apps since then because too many people are focused on packaging ChatGPT for X. A chatbot is a perfectly decent use case for some things, but I think the real progress will come when people stop trying to copy what OpenAI already did and start integrating LLMs in a more hands-off way that's more natural to their domains.
A great example that's changed my life is News Minimalist [0]. They feed all the news from a ton of sources into one of the GPT models and have it rate the story for significance and credibility. Only the highest rated stories make it into the newsletter. It's still rough around the edges, but being able to delegate most of my news consumption has already made a huge difference in my quality of life!
I expect successful and useful applications to fall in a similar vein to News Minimalist. They're not going to turn the world upside down like the hype artists claim, but there is real value to be made if people can start with a real problem instead of just adding a chatbot to everything.
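[0] https://www.newsminimalist.com/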
> Not one killer app has emerged. I for one am eager to be all hip and open-minded and pretend like I use LLMs all the time for everything and they are "the future", but novelty aside it seems like so far we have a demented Clippy and some sophomoric arguments about alignment and wrongthink.
In my mind I divide LLM usage into two categories, creation and ingestion.
Creation is largely a parlor trick that blew the minds of some people because it was their first exposure to generative AI. Now that some time has passed, most people can pattern-match GPT-generated content, especially content without sufficient "prompt engineering" to make it sound less like the default writing style. Nobody is impressed by "write a rap like a pirate" output anymore.
Ingestion is a lot less sexy and hasn't gotten nearly as much attention as creation. This is stuff like "summarize this document." And it's powerful. But people didn't get as hyped up on it because it's something that they felt like a computer was supposed to be able to do: transforming existing data from one format to another isn't revolutionary, after all.
But the world has a lot of unstructured, machine-inaccessible text. Legal documents saved in PDF format, consultant reports in Word, investor pitches in PowerPoint. And when I say "unstructured" I mean "there is data here that it is not easy for a machine to parse."
Being able to toss this stuff into ChatGPT (or another LLM) and prompt with things like "given the following legal document, give me the case number, the names of the lawyers, and the names of the defendants; the output must be JSON with the following schema..." and then save that information into a database is absolutely killer. Right now companies are recruiting armies of interns and contractors to do this sort of work, and it's time-consuming and awful.
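As a sketch of what that looks like in practice (untested; the schema and field names here are invented, and it assumes the OpenAI Python client):

    # Sketch: structured extraction from an unstructured legal document.
    # Schema/field names are hypothetical; assumes the v1 `openai` client.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def extract_case_info(document_text: str) -> dict:
        prompt = (
            "Given the following legal document, give me the case number, "
            "the names of the lawyers, and the names of the defendants. "
            'Output only JSON matching {"case_number": "...", '
            '"lawyers": ["..."], "defendants": ["..."]}.\n\n' + document_text
        )
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # keep extraction as deterministic as possible
        )
        # Will raise if the model wraps the JSON in prose; real code needs
        # validation and retries before anything touches the database.
        return json.loads(response.choices[0].message.content)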
Surely the “killer app” is ChatGPT itself? It has already put some copywriters and journalists out of work, or at least reduced their hours. The app is quite literally “killing” something, i.e. people’s jobs. For those people, it’s not just empty hype. It’s very real. Certainly it’s already more real than anything having to do with blockchain/crypto.
The killer app for large enterprises is Q&A against the corporate knowledgebase(s). Big companies have an insane amount of tribal knowledge locked away in documents sitting on Sharepoint, on Box, on file servers, etc. Best case scenario, their employees can do keyword search against a subset of those documents. Chunk those docs, run them through an embedding process, store the embeddings in a vector store, let employees ask questions, do a similarity search against the vector store, pass the top results and the question to the LLM, get an actual answer back to present to the employee. This unlocks a ton of knowledge and can be a massive productivity booster.
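A rough sketch of that loop, with brute-force cosine similarity standing in for a real vector store (helper names and chunk size are arbitrary; assumes the OpenAI Python client):

    # Sketch of retrieval-augmented Q&A over internal documents:
    # chunk, embed, store, search, then answer with the top matches.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(text: str) -> np.ndarray:
        resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
        return np.array(resp.data[0].embedding)

    corpus = ["...full text of document one...", "...full text of document two..."]
    # 1. Chunk the documents and embed each chunk (the "vector store").
    chunks = [doc[i:i + 1000] for doc in corpus for i in range(0, len(doc), 1000)]
    index = np.stack([embed(c) for c in chunks])

    def answer(question: str) -> str:
        # 2. Similarity search: cosine score of the question against each chunk.
        q = embed(question)
        scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
        context = "\n---\n".join(chunks[i] for i in np.argsort(scores)[-3:])
        # 3. Pass the top chunks plus the question to the LLM.
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content":
                       f"Answer using only this context:\n{context}\n\nQ: {question}"}],
        )
        return resp.choices[0].message.content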
There is definitely interesting and high-potential technology here. I do not think the current crop of "wrap ChatGPT in an API for XYZ business-case" startups will succeed - they will be total failures across the board. There is also the issue that anyone with an iota of experience or a degree in something tangential to AI or ML can be the "genius" behind a newly funded startup - a telltale sign of bubble mentality to me.
If LLMs in their current form as human-replacement agents are just cheaper versions of Fiverr / Mechanical Turk, and we all know there are very limited, bottom-of-the-barrel use cases for those cheap-labor services, then why would LLMs be a radical improvement? It's nonsensical.
I personally use copilot every day and I love it. It reduces the amount of typing I have to do, gives me lots of good suggestions for solving simple problems and has made working with unfamiliar languages so much easier.
I'd say we're maybe half or two-thirds of the way down from the peak of inflated expectations toward the trough of disillusionment. Before long, I think maybe in the next three months or so, certainly around the time we hit the one-year anniversary of ChatGPT's release, we'll start seeing mainstream takes along the lines of "ChatGPT and Bing's Sydney episode and such were good entertainment, but it's obvious in hindsight that it was a fad; nobody is posting funny screenshots of their conversations anymore, and all the pronouncements about a superhuman AGI apocalypse were obviously silly, it's clear ChatGPT has failed and this whole thing was the same old hype-y SV pointlessness".
And at that point, we will have reached the trough of disillusionment. I think funding will be less readily available, and we'll start seeing some of the bevy of single-purpose LLM-based products start closing up shop.
But more quietly, others will be (already are) traversing up the slope of enlightenment. As others have mentioned, this is stuff like features in Microsoft's and Google's productivity products (including those for software engineering productivity like GitHub Copilot), and some subset of products and features elsewhere that turn out to be compelling in a sticky way.
I expect 2024 and 2025 to be the more interesting part of this hype cycle. I don't think we're on the verge of waking up in a world nobody recognizes in a small number of days or months, but I think in a few years we're going to have a bunch of useful tools that we didn't have a year ago, some of which are the obvious ones we've already seen, but improved, and others that are not obvious right now.
Not sure if this was insightful enough for you :) Apologies if not.
We're still on the exponential rise of the hype cycle. If capabilities appear to plateau (no GPT-5/6 that are even more amazing), then the hype will not merely plateau but plummet. For now, anything seems possible.
As for a killer app, I'm another person for whom ChatGPT is it. I use GPT-4 something like Google, Wikipedia and Stack Overflow in one, but being very aware of the limitations. It feels a bit like circa 2000 when being good at googling things felt like a superpower. It doesn't do everything for you but can make you drastically more effective.
There are three levels of what's going on with AI at the moment, each with its own momentum and hype cycle: (1) the current generation of chat bots and image generators, which some of us would be using for the rest of our lives even with only minor refinements; (2) the prospect that new tools built on top of this and subsequent generations could remake the internet and how we interact with our gadgets; and (3) the prospect that the systems will keep getting smarter and smarter.
I wonder if language translation will be one of the "killer apps".
Especially if it can be done real-time and according to the context/level of the audience/listener. Even within the same language, translation from a more technical/expert level to a simplified summary helps education/communication/knowledge transfer significantly.
I mentioned the Stack Overflow Developer Survey once already today, but at the risk of sounding like a broken record, it has some data on this as well: https://survey.stackoverflow.co/2023/#ai
To save someone a click, around 44% of the respondents (some 39k out of 89k people) are already using "AI" solutions as a part of their workflow, another 25% (close to 23k people) are planning to do so soon.
The sentiment also seems mostly favorable, most aim to increase productivity or help themselves with learning and just generally knock out some more code, though there is a disconnect between what people want to use AI for (basically everything) and what they currently use it for (mostly just code).
There's also a section on the AI search tools in particular, about 83% of the respondents have at least had a look at ChatGPT, which is about as close to a killer app as you can probably get, even if it's cloud-based SaaS: https://survey.stackoverflow.co/2023/#section-most-popular-t...
> Where are we in the hype cycle on this?
I'm not sure about the specifics here, but the trend feels about as significant as Docker and other container technologies more or less taking the industry by storm and changing a bunch of stuff around (to the point where most of my server software is containers).
That said, we're probably still somewhere in the early stages of the hype cycle for AI (the drawbacks like hallucination will really become apparent to many in the following years).
Honestly, the technology itself seems promising for select use cases and it's still nice that we have models that can be self hosted and somehow the software has gotten decent enough that you can play around with reasonably small models on your machine even without a GPU: https://blog.kronis.dev/tutorials/self-hosting-an-ai-llm-cha...
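To give an idea of how low the barrier has gotten, a minimal sketch using the llama-cpp-python bindings (the model path is a placeholder for whatever quantized model you've downloaded):

    # Sketch: running a small quantized model on CPU, no GPU required.
    from llama_cpp import Llama

    llm = Llama(model_path="./models/llama-2-7b-chat.Q4_0.gguf", n_ctx=2048)
    out = llm("Q: What is a vector store? A:", max_tokens=128, stop=["Q:"])
    print(out["choices"][0]["text"])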
I'm cautiously optimistic about the current forms of LLM/AI, but fear that humanity will misuse the tech (as a cost cutting measure sometimes, without proper human review).
The killer app is ChatGPT. I'm not sure what you're expecting here, but it's been enormously useful while trying out new languages. For example, even if it's not 100% right, it has been a great help while working with nix, as I'm often ignorant of entire methods of solving a problem, and it's pretty good at suggesting the right method.
It's also super useful for things like "convert this fish shell snippet to bash" or "rewrite this Python class as a single function". It tends to really nail these sorts of grounded questions, and it legitimately saves me time.
I think 8 months is a little short for the utility of a new tech to be fully realized and utilized. I'm pretty sure there were still horses on the roads well more than 8 months after the Model T first went on sale.
I can't tell if this is satire or not. It sounds so much like an uninformed stock trader that, to be polite, I find it hard to believe this isn't some sort of meta-commentary on Hacker News conversations.
There are plenty of examples of where the technology can eventually lead in terms of entertainment, impact on society and news, knowledge work, and so on. It doesn't have to happen immediately. But to handwave the myriad articles about the subject away and just say "I don't believe any of it, what else you got" is a bit annoying.
I run through a lot of these concepts, specifically RLHF, in my latest coding stream, where I finetune Llama 2, if anyone's interested in an LLM deep dive: https://www.youtube.com/watch?v=TYgtG2Th6fI&t=4002s
Long story short, the size of the model and the reward mechanisms built on human annotation/feedback are the main differences between what we can do as independents in OSS vs OpenAI. BigCode's StarCoder (https://huggingface.co/bigcode/starcoder) has some human labor backing it (I believe; correct me if I'm wrong) but at the end of the day a company will always be able to gather people better.
Not knocking StarCoder; in fact I streamed how to fine-tune it the other day. However, it's important to mention some of the limitations in the OSS space right now (a big reason Meta pushing Llama 2 is nice to have).
I’ve been reading a pop neuroscience book called Incognito (2011).
In it, the author talks about how the brain is a group of competing sub-brains of many forms, and the brain might have several ways of doing the same thing (e.g. recognizing an object). The author also posited that the lack of AI progress back then was due to the fact that AI systems had no constantly competing sub-brains, whereas our brains are always adjusting and trying new scenarios.
I was struck by how similar these brain observations were to recent developments in AI and LLMs.
The book is full of cool stories, even if some of them are now recognized as non-reproducible. I recommend!
In the end, an AI should have these competing subsystems in one system, just as our brains are one system.
What I find extremely interesting is how perception and thinking differ from person to person too. It was a "taboo" topic to call this neurodiversity, just as with other genetic traits, but AI makes this more relevant than ever imo.
Sure, it's complicated and much comes from nurture (nurture vs nature, i.e. exposure/epigenetics vs genetics), but there sure are marked differences. The ones starting to stand out are e.g. ADHD / autistic people, but I'm sure it won't stay just there over time!
> The author also posited that the lack of AI progress back then was due to the fact that there are no constantly competing sub-brains.
That became popular in neural networks after the introduction of dropout regularization, which discouraged neurons from co-adapting and forced them to learn to do each others' jobs. Large, over-parameterized models also provide a natural setting for this kind of redundancy.
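(For reference, inverted dropout is only a few lines; a numpy sketch:)

    import numpy as np

    def dropout(activations: np.ndarray, p: float = 0.5, training: bool = True) -> np.ndarray:
        # Randomly silence each unit with probability p during training,
        # scaling survivors by 1/(1-p) so expected activations match inference.
        if not training:
            return activations
        mask = np.random.rand(*activations.shape) >= p
        return activations * mask / (1.0 - p)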
In fact, this is what psychoanalysis and the notion of the unconscious (as opposed to "subconscious processes") was all about. (And it's also, where the "talking cure" found its leverage.)
Specifically about RLHF, I find this video by Rob Miles still the best presentation of the ingenious original 2017(!) paper: https://youtube.com/watch?v=PYylPRX6z4Q
RLHF is actually older than GPT-1, which came out in 2018. It didn't get applied to language models until 2022 with InstructGPT, an approach which combined supervised instruction fine-tuning with RLHF.
How do you do science on LLMs? I would imagine that is super important, given their broad impact on the social fabric. But they're non-deterministic, very expensive to train, and subjective. I understand we have some benchmarks for roughly understanding a model's competence. But is there any work in the area of understanding, through repeatable experiments, why LLMs behave how they do?
> Transformers can be generally categorized into one of three categories: “encoder only” (a la BERT); “decoder only” (a la GPT); and having an “encoder-decoder” architecture (a la T5). Although all of these architectures can be rigged for a broad range of tasks (e.g. classification, translation, etc), encoders are thought to be useful for tasks where the entire sequence needs to be understood (such as sentiment classification), whereas decoders are thought to be useful for tasks where text needs to be completed (such as completing a sentence). Encoder-decoder architectures can be applied to a variety of problems, but are most famously associated with language translation.
There's a whole lot of "thought to be"s here. Is there a proper study of the relative effectiveness of encoder-only vs decoder-only vs encoder-decoder models for various tasks?
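For concreteness, here's how the three families surface in the HuggingFace transformers library (a sketch; the model choices are just illustrative defaults):

    from transformers import pipeline

    # Encoder-only (BERT-style): whole-sequence understanding, e.g. sentiment.
    classify = pipeline("sentiment-analysis",
                        model="distilbert-base-uncased-finetuned-sst-2-english")
    # Decoder-only (GPT-style): left-to-right completion.
    generate = pipeline("text-generation", model="gpt2")
    # Encoder-decoder (T5-style): sequence-to-sequence, e.g. translation.
    translate = pipeline("translation_en_to_fr", model="t5-small")

    print(classify("This hype cycle talk is exhausting."))
    print(generate("The killer app for LLMs is", max_new_tokens=20))
    print(translate("Where are we in the hype cycle?"))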
As a matter of fact, there are even more developers making a hard left into AI who have never touched crypto.
The interesting follow-up question is: what will they actually spend their time on? Training new models? Copy-pasting front ends onto ChatGPT? Fine-tuning models?
I think many of them will be scared off by how much of a hard science ML is compared to just spinning up old CRUD apps.
Given a set of instructions, an instruction fine-tuned/aligned LLM is able (conditional on size and training quality) to reason through a set of steps to produce a desired output.
This is plainly wrong. The model's growing size makes it better at guessing the outcome of a reasoning task, but little to no actual reasoning is performed.
It's trivial to prove this as well, as LLMs will still fail miserably at (larger) math problems that even basic computer algebra systems will handle with ease.
> The model's growing size makes it better at guessing the outcome of a reasoning task, but little to no actual reasoning is performed.
If there's no observable difference between the behaviours, why not call it as the post did?
> LLMs will still fail miserably at (larger) math problems
They're neither trained on such problems, nor is that a goal for LLMs. They can however tell you how to convert that problem into steps that can be run in an algebra system.
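For example, instead of grinding out digits itself, the model can emit a couple of lines for a CAS. A sympy sketch of the kind of thing it hands off:

    # Exact arithmetic an LLM would flub on its own, delegated to sympy.
    from sympy import symbols, solve

    x = symbols("x")
    print(solve(x**2 - 10**40, x))  # [-10**20, 10**20], exactly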
There's some argument to be made that a form of reasoning happens in a roundabout way when the AI is told to explain its reasoning.
For example if you tell it "Do <thing>" and then open a new context and say "Do <thing>, explain your reasoning beforehand." you will often get a more accurate response.
Granted, it's not that any "Hmm, let me think about that" Deep Thought-style reasoning occurs; it's simply that predicting what the reasoning would look like, and then predicting what comes after that reasoning, results in a more accurate - and ironically, more reasoned - response.
Kinda funny actually, it's a bit like how in Hitchhiker's Guide they just had to tell the probability machine to calculate the odds of an improbability drive in order to create it.
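A sketch of that two-prompt difference (the ask() helper here is a hypothetical thin wrapper around a chat completion; the bat-and-ball question is the classic example):

    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4", messages=[{"role": "user", "content": prompt}])
        return resp.choices[0].message.content

    q = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
         "than the ball. How much does the ball cost?")
    plain = ask(q)  # more likely to blurt out the reflexive (wrong) $0.10
    cot = ask(q + " Explain your reasoning step by step before answering.")
    # The second prompt tends to land on $0.05, because the final answer is
    # conditioned on the reasoning tokens the model has already generated.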
"Crypto VCs & ”builders” making a hard left into AI"
This is a humorous intro graphic caption, but this sentiment appears on here constantly and it's self-destructive. This response might seem a bit over the top to a funny graphic, but I am replying to the general "ha ha AI like crypto amirite?" sentiment that is incredibly boring and worn out.
When confronted with challenging new technology that we don't understand, some people's knee-jerk reaction is to be dismissive. As if that has any hope at all of changing outcomes.
It's especially weird when people who are clearly in the "I must desperately learn this as quickly as I can and try to present myself as some sort of expert" camp still incant the rhetoric -- "joking on the square", as it were -- as if they need to defend their prior dismissals. Constantly on here there is yet another trivial "intro to tokenization" blog entry that brays some tired crypto comparison.
Stop it.
The Venn diagram of people at the forefront of ML/LLM, and its advocates, is almost entirely separate from the web/crypto sphere. There is astonishingly little overlap. Crypto was hyped because some people truly saw a purpose, coupled with masses of scammers and get-rich-quick sorts. AI/LLM/ML is hyped because it is revolutionary and has already yielded infinitely more practical impact than crypto ever did.