top | item 44833413

neltnerb | 6 months ago

That's great, but AlphaGo used artificial and constrained training materials. It's a lot easier to optimize things when you can actually define an objective score, and especially when your system is able to generate valid training materials on its own.

FrustratedMonky|6 months ago

"artificial and constrained training materials"

Are you simply referring to games having a defined win/loss reward function?

Because I'm pretty sure AlphaGo was also groundbreaking because it was self-taught, by playing itself; there were no training materials, unless you count the rules of the game itself as the constraint.

But even then, from move to move, there are huge decisions to be made that are NOT easily defined with a win/loss reward function. Especially early game, there are many moves to make that don't obviously have an objective score to optimize against.

You could make the big leap and say that Go is so open-ended that it does model life.

neltnerb|6 months ago

That quote was intended to mean --

"artificial" maybe I should have said "synthetic"? I mean the computer can teach itself.

"constrained" the game has rules that can be evaluated

And as to the other -- I don't know what to tell you; I don't think anything I said is inconsistent with the quotes below.

It's clearly not just a generic LLM, and it was only possible to generate a billion training examples for it to play against itself because the synthetic data is guaranteed valid. That synthetic data contains training examples no human has ever played, which is why it's not at all surprising that it did things humans would never try. An LLM would just try patterns that, at best, are published in human-generated Go game histories or synthesized from them. I think this inherently limits how much of the game space it can explore, and it would similarly be much less likely to generate novel moves.

https://en.wikipedia.org/wiki/AlphaGo

> As of 2016, AlphaGo's algorithm uses a combination of machine learning and tree search techniques, combined with extensive training, both from human and computer play. It uses Monte Carlo tree search, guided by a "value network" and a "policy network", both implemented using deep neural network technology.[5][4] A limited amount of game-specific feature detection pre-processing (for example, to highlight whether a move matches a nakade pattern) is applied to the input before it is sent to the neural networks.[4] The networks are convolutional neural networks with 12 layers, trained by reinforcement learning.[4]

> The system's neural networks were initially bootstrapped from human gameplay expertise. AlphaGo was initially trained to mimic human play by attempting to match the moves of expert players from recorded historical games, using a database of around 30 million moves.[21] Once it had reached a certain degree of proficiency, it was trained further by being set to play large numbers of games against other instances of itself, using reinforcement learning to improve its play.[5] To avoid "disrespectfully" wasting its opponent's time, the program is specifically programmed to resign if its assessment of win probability falls beneath a certain threshold; for the match against Lee, the resignation threshold was set to 20%.[64]
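The bootstrap-then-self-play pipeline the quote describes can be sketched at toy scale. This is an illustration of the principle, not AlphaGo's actual code: the game here is 21-stone Nim (take 1-3 stones, taking the last stone wins) rather than Go, and every name and hyperparameter is an assumption. The point it demonstrates is the one above: because the rules give an exact win/loss label, the agent can generate unlimited valid training games by playing itself.

```python
import random

PILE = 21
ACTIONS = (1, 2, 3)

def self_play_train(episodes=20000, eps=0.1, alpha=0.5, seed=0):
    """Monte-Carlo self-play: Q[s][a] estimates the mover's win probability."""
    rng = random.Random(seed)
    Q = {s: {a: 0.5 for a in ACTIONS if a <= s} for s in range(1, PILE + 1)}
    for _ in range(episodes):
        s, history = PILE, []
        while s > 0:
            acts = [a for a in ACTIONS if a <= s]
            if rng.random() < eps:                      # explore
                a = rng.choice(acts)
            else:                                       # exploit current estimates
                a = max(acts, key=lambda x: Q[s][x])
            history.append((s, a))
            s -= a
        # Whoever made the last move won; moves alternate between the two
        # players, so the exact terminal result labels every (state, action).
        for i, (st, ac) in enumerate(reversed(history)):
            target = 1.0 if i % 2 == 0 else 0.0
            Q[st][ac] += alpha * (target - Q[st][ac])
    return Q

def greedy(Q, s):
    """Best known move from a pile of s stones."""
    return max((a for a in ACTIONS if a <= s), key=lambda a: Q[s][a])
```

With enough self-play games the agent rediscovers the known winning strategy (leave the opponent a multiple of four), moves no human showed it, which is the same dynamic, writ very small, as AlphaGo producing novel lines of play.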

nopinsight|6 months ago

There are quite a few relatively objective criteria in the real world: real estate holdings, money and material possessions, power to influence people and events, etc.

The complexity of achieving those might make the "Centaur Era", in which humans plus computers are superior to either alone, last longer than the Centaur chess era, which spanned only one or two decades before engines like Stockfish made the human half superfluous.

However, in well-defined domains, like medical diagnostics, it seems reasoning models alone are already superior to primary care physicians, according to at least 6 studies.

Ref: When Doctors With A.I. Are Outperformed by A.I. Alone by Dr. Eric Topol https://substack.com/@erictopol/p-156304196

jimbo808|6 months ago

It makes sense. People said software engineers would be easy to replace with AI because our work runs on a computer and is easily tested. The disconnect is that the primary strength of LLMs is drawing on huge bodies of information, and that's not the primary skill programmers are paid for. It does help when you're doing trivial CRUD work or writing boilerplate, but every programmer eventually has to truly reason about code, and LLMs fundamentally cannot do that (not even the "reasoning" models).

Medical diagnosis relies heavily on knowledge, pattern recognition, a bunch of heuristics, educated guesses, luck, etc. These are all things LLMs do very well. They don't need a high degree of accuracy, because humans are already doing this work with a pretty low degree of accuracy. They just have to be a little more accurate.

jacquesm|6 months ago

Humans are, statistically speaking, static. We just find out more about them, but the humans themselves don't meaningfully change unless you look at much longer time scales. The state of the rest of the world is in constant flux and much harder to model.

ben_w|6 months ago

Sure, that does make things easier: one of the reasons Go took so long to solve is that one cannot define an objective score for Go beyond the end result being a boolean win or lose.

But IRL? Lots of measures exist, from money to votes to exam scores, and a big part of the problem is Goodhart's law — that the easy-to-define measures aren't sufficiently good at capturing what we care about, so we must not optimise too hard for those scores.
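Goodhart's law is easy to demonstrate with a toy optimiser. All the functions and numbers here are invented for illustration: give a hill-climber only the easy-to-define proxy, and it pours everything into the gameable term while the quantity we actually care about gets worse.

```python
def true_value(effort, gaming):
    # What we actually care about: genuine effort helps, gaming hurts.
    return effort - 0.5 * gaming

def proxy_score(effort, gaming):
    # The easy-to-define measure: it rewards both, and gaming pays more per unit.
    return effort + 2.0 * gaming

def optimise_proxy(steps=100, step=1.0):
    """Greedy hill-climb that can only see proxy_score, never true_value."""
    effort = gaming = 0.0
    for _ in range(steps):
        gain_effort = proxy_score(effort + step, gaming) - proxy_score(effort, gaming)
        gain_gaming = proxy_score(effort, gaming + step) - proxy_score(effort, gaming)
        if gain_gaming >= gain_effort:   # gaming always wins the comparison here
            gaming += step
        else:
            effort += step
    return effort, gaming
```

Run it and the proxy soars to 200.0 while the true value falls to -50.0: the measure stopped being a good target the moment it was optimised hard.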

Jensson|6 months ago

> Sure, that does make things easier: one of the reasons Go took so long to solve is that one cannot define an objective score for Go beyond the end result being a boolean win or lose.

Winning or losing a Go game is a much shorter-term objective than making or losing money at a job.

> But IRL? Lots of measures exist

No, none that are shorter-term than winning or losing a Go game. A game of Go is very short, much, much shorter than the time it takes for a human to get fired for incompetence.
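The timescale gap can be made concrete with a back-of-envelope calculation. The specific numbers (one second per self-play game, one clear job-performance signal every six months) are illustrative assumptions, not figures from the thread:

```python
# Reward signals per year: self-play Go vs. on-the-job feedback.
SECONDS_PER_YEAR = 365 * 24 * 3600      # 31,536,000

selfplay_game_seconds = 1   # assume a fast engine finishes a self-play game in ~1 s
job_feedback_months = 6     # assume one unambiguous performance signal per half-year

go_signals_per_year = SECONDS_PER_YEAR // selfplay_game_seconds
job_signals_per_year = 12 // job_feedback_months

# Roughly sixteen million times more labelled outcomes per year for the game.
ratio = go_signals_per_year // job_signals_per_year
print(ratio)
```

Under these assumptions the game player sees on the order of ten million outcomes for every one the employee sees, which is the heart of the objection above.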