item 36861580

getmeinrn | 2 years ago

How do you do science on LLMs? I would imagine that is super important, given their broad impact on the social fabric. But they're non-deterministic, very expensive to train, and subjective. I understand we have some benchmarks for roughly understanding a model's competence. But is there any work in the area of understanding, through repeatable experiments, why LLMs behave how they do?

Do we care?


api|2 years ago

I'm pretty much certain the cost of training and running large LLMs is going to come down, because it's only a matter of time before truly customized chips come out for these.

GPUs really aren't that ideal chip, though. They're massively parallel vector processors that turn out to be generally better than CPUs at running these models, but they're still not purpose-built for LLMs. The ideal would be an even more specialized parallel processor where almost all the silicon is dedicated to exactly the kinds of operations large LLMs use, with native support for quantization formats such as those found in the ggml/llama.cpp world. Being able to run and train natively on those formats would let gigantic 100B+ models fit in more reasonable amounts of RAM, and run faster too, since inference speed is largely bound by memory bandwidth and smaller weights mean less data to move per token.
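To give a sense of why native quantization matters for RAM, here's a back-of-envelope sketch (the function and figures are illustrative, counting weight storage only and ignoring activations and KV cache):

```python
# Approximate weight-storage footprint of a model at various precisions.
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

n = 100e9  # a 100B-parameter model
print(f"fp16: {weight_memory_gb(n, 16):.0f} GB")  # ~200 GB
print(f"int8: {weight_memory_gb(n, 8):.0f} GB")   # ~100 GB
print(f"4-bit: {weight_memory_gb(n, 4):.0f} GB")  # ~50 GB
```

A 4x reduction in bytes per weight is also roughly a 4x reduction in memory traffic per token, which is why quantized models generate faster on bandwidth-limited hardware.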

These chips, when they arrive, will be a lot cheaper than GPUs when compared in dollars per LLM performance. They'll be available for rent in the cloud and for purchase as accelerators.

I'd be utterly shocked if lots of chip companies don't have projects working on these chips, since at this point it's clear that LLMs are going to become a permanent fixture of computing.

DanHulton|2 years ago

I feel like it took practically no time for custom ASICs for bitcoin mining to show up, as soon as it was determined there was real money involved.

Given that there's already definitely real money involved here, I wonder what's holding up the custom AI ASICs?

gjm11|2 years ago

I would imagine it's a bit like doing science on human beings, who are also non-deterministic, expensive to train, and subjective. Perhaps there's scope for a scientific discipline corresponding to psychology but concerned with AI systems. We could call it robopsychology.

scottydog51834|2 years ago

There's a field called Interpretability (sometimes "Mechanistic Interpretability") which studies how the weights inside a neural network actually function. From what I can tell, Anthropic has the largest team working on this [0]. OpenAI has a small team inside their Superalignment org working on this. Alphabet has at least one team on it (not sure if that's DeepMind, Google DeepMind, or just Google). There are a handful of professors, PhD students, and independent researchers working on this (myself included), plus a few small labs.

At least half of this interest overlaps with Effective Altruism's fears that AI could one day cause considerable harm to the human race. Some researchers and labs are funded by EA charities such as Long Term Future Fund and Open Philanthropy.

There is the occasional hackathon on Interpretability [1].

Here's an overview talk by one of the best-known researchers in the field [2].

[0] https://transformer-circuits.pub/2021/framework/index.html

[1] https://alignmentjam.com/jam/interpretability

[2] https://drive.google.com/file/d/1hwjAK3lWnDRBtbk3yLFL2DCK1Dg...

scottydog51834|2 years ago

Some people (namely the EAs) care because they don't want AI to kill us.

Another reason is to understand how our models make important decisions. If we one day use models to help make medical diagnoses or loan decisions, we'd like to know why the decision was made to ensure accuracy and/or fairness.

Others care because understanding models could allow us to build better models.

ShamelessC|2 years ago

> At least half of this interest overlaps with Effective Altruism's fears that AI could one day cause considerable harm to the human race.

That’s a little depressing.