item 18993322

rcheu | 7 years ago

This is really impressive; I didn't expect StarCraft to be played this well by a machine-learning-based AI. I'm excited to read the paper when it comes out!

That said, I'm not sure I agree that it was winning mainly due to better decision making. For context, I've been ranked in the top 0.1% of players and beaten pros in Starcraft 2, and also work as a machine learning engineer.

The stalker micro in particular looked to be above what's physically possible, especially in the game against Mana where they were fighting in many places at once on the map. Human players have attempted the mass stalker strategy against immortals before but haven't been able to make it work. The decisions in these fights aren't "interesting": human players know what they're supposed to do, but can't physically execute the actions to do it.

While they have similar APM to SC2 pros, the AI's actions are probably far more efficient and accurate, so I don't think APM parity alone is enough. For example, human players have difficulty macroing while they attack because it takes valuable time to switch context, but the AI didn't appear to suffer from that and was extremely aggressive in many games.

gamegoblin|7 years ago

In the mass stalker battles, the AI's APM exceeded 1000 a few times, and no doubt most of those actions were precisely targeted. A human doing 500 APM of micro, by contrast, is going to be far less precise.

I think a far more interesting limitation would be to cap APM at 150 or so, or to artificially limit action precision with some sort of virtual mouse that reduced accuracy as APM increased.
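A minimal sketch of the first idea, a hard APM cap, assuming a sliding 60-second window: any action beyond the budget is simply rejected, so the agent has to decide which actions matter. The `ApmLimiter` class and its parameters are hypothetical, not part of any StarCraft or DeepMind API.

```python
from collections import deque


class ApmLimiter:
    """Hypothetical sliding-window cap: at most `max_apm` actions in any 60 s window."""

    def __init__(self, max_apm: int = 150, window_s: float = 60.0):
        self.max_apm = max_apm
        self.window_s = window_s
        self._times = deque()  # timestamps (seconds) of accepted actions, oldest first

    def try_act(self, now: float) -> bool:
        # Forget accepted actions that have fallen out of the window.
        while self._times and now - self._times[0] >= self.window_s:
            self._times.popleft()
        if len(self._times) >= self.max_apm:
            return False  # over budget: the agent must drop or defer this action
        self._times.append(now)
        return True


limiter = ApmLimiter(max_apm=150)
# 1000 attempted actions spread evenly over (just under) one minute.
accepted = sum(limiter.try_act(i * 0.06) for i in range(1000))
print(accepted)  # 150
```

A sliding window (rather than resetting a counter every minute) stops the agent from saving its budget and spending 300 actions across a minute boundary during a fight.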

wnevets|7 years ago

>I think a far more interesting limitation would be to cap APM at 150 or so, or to artificially limit action precision with some sort of virtual mouse that reduced accuracy as APM increased.

IIRC OpenAI limits the reaction time to ~200ms when playing Dota 2. An AI employing better strategies than humans will always be more interesting than an AI that can out-click them.
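A reaction-time limit like this can be sketched as a delay buffer between the game and the agent: the agent may only act on observations that are at least ~200 ms old. This is a hypothetical wrapper (`DelayedObserver`), not OpenAI's actual implementation; timestamps are integer milliseconds to avoid floating-point edge cases.

```python
from collections import deque


class DelayedObserver:
    """Hypothetical wrapper: the agent only ever sees game states at least delay_ms old."""

    def __init__(self, delay_ms: int = 200):
        self.delay_ms = delay_ms
        self._pending = deque()  # (timestamp_ms, observation) pairs, oldest first
        self._visible = None     # newest observation the agent is allowed to act on

    def push(self, t_ms: int, obs) -> None:
        """Record what the game state looked like at time t_ms."""
        self._pending.append((t_ms, obs))

    def observe(self, now_ms: int):
        """Return the newest observation that is at least delay_ms old (None if none yet)."""
        while self._pending and now_ms - self._pending[0][0] >= self.delay_ms:
            self._visible = self._pending.popleft()[1]
        return self._visible


obs = DelayedObserver(delay_ms=200)
obs.push(0, "warp prism arrives")
print(obs.observe(100))  # None: the agent cannot react yet
print(obs.observe(200))  # warp prism arrives
```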

a_wild_dandan|7 years ago

Here's a graph of AlphaStar's APM versus a professional player's (https://i.imgur.com/TXeLkQK.png). Evidently AlphaStar also has a similar Economy of Attention (where the player focuses) to a professional player, at around 30 screens per minute. Additionally, AlphaStar's reaction time is around 350ms, a significant disadvantage compared with a pro.

The skepticism in this thread is absolutely justified, but I think it's important to note the lengths to which DeepMind has gone to address and assuage fears of superhuman mechanical skill being employed in these games.

taneq|7 years ago

How many of those 500 actions are actually useful? I haven't watched competitive StarCraft games for years, but back when I did, rates were more like 300 APM, and even then the players basically spam-clicked the background or selected random units non-stop and were probably only doing 50-100 actually effective actions per minute.

simmanian|7 years ago

> artificially limit action precision with some sort of virtual mouse that reduced accuracy as APM increased

I like the idea of having action noise that's linearly related to APM.
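A minimal sketch of that idea, assuming click error is isotropic Gaussian noise whose standard deviation grows linearly with current APM. The function names and the constants (1 px base error, 0.02 px per APM) are illustrative assumptions, not measured human values.

```python
import random
import statistics


def noise_std_px(apm: float, base_px: float = 1.0, px_per_apm: float = 0.02) -> float:
    """Hypothetical linear model: click error (std-dev, in pixels) grows linearly with APM."""
    return base_px + px_per_apm * apm


def noisy_click(x: float, y: float, apm: float, rng: random.Random) -> tuple:
    """Perturb an intended click at (x, y) with Gaussian error sized by the current APM."""
    sigma = noise_std_px(apm)
    return (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))


rng = random.Random(0)
for apm in (150, 1000):
    xs = [noisy_click(500.0, 300.0, apm, rng)[0] for _ in range(20_000)]
    # Model sigma vs. the empirical spread of the x coordinate.
    print(apm, round(noise_std_px(apm), 1), round(statistics.pstdev(xs), 1))
```

Under this model a 1000-APM burst is five times sloppier than 150 APM, so the agent has an incentive to slow down for the clicks that need precision.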

pesmhey|7 years ago

There would be an entirely new dimension of decision-making, in addition to good macro, where you have to prioritize actions. It will be interesting to see.

pmontra|7 years ago

I understand the spirit of the proposal, but that would be like limiting a computer to adding at most two numbers per second. It's OK if we want an interesting contest against humans, but it wouldn't be a fair estimate of a computer's math capability. It's also not the point of using computers to do math instead of a room full of accountants. I'm OK with the AI going as fast as it can and playing superhuman strategies because it can be that fast. After all, we won't limit an AI's output rate when we let it manage a country's power grid.

cm2012|7 years ago

In the show match, where they made the computer look at a regular screen to control its units, the stalker micro was much less impressive, and Mana won.

CydeWeys|7 years ago

For now. Give them another month. This is like AlphaGo vs. Fan Hui all over again: people knocked that accomplishment at the time because he was just a master, not one of the top players in the world. Well, not much later, AlphaGo beat Lee Sedol, the best player in the world.

The ceiling here is going to be incredibly high, much higher than the level of play that people are capable of, even when restricted to a single window.

Jyaif|7 years ago

The AI lost because it completely messed up the response to the immortal drop; it had nothing to do with micro.

Cookingboy|7 years ago

The results are obviously impressive, but even then there is a lot of work to do as far as learning efficiency goes:

"The AlphaStar league was run for 14 days, using 16 TPUs for each agent. During training, each agent experienced up to 200 years of real-time StarCraft play. "

MaNa has probably played less than 2-3 years of StarCraft in his whole life (by that I mean 24 hr x 365 d x 3), and he was learning with a much less focused/rigorous methodology.
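The comparison can be made concrete with the numbers from the quote and the comment above. The 3-year human figure is the commenter's generous estimate, the 200 years is an "up to" figure (so the results are upper bounds), and leap days are ignored.

```python
AGENT_YEARS = 200   # "up to 200 years of real-time StarCraft play" per agent (quoted)
LEAGUE_DAYS = 14    # "The AlphaStar league was run for 14 days" (quoted)
HUMAN_YEARS = 3     # the commenter's generous estimate for MaNa's lifetime play

human_hours = 24 * 365 * HUMAN_YEARS          # hours of nonstop play in 3 years
experience_ratio = AGENT_YEARS / HUMAN_YEARS  # agent experience vs. human experience
speedup = AGENT_YEARS * 365 / LEAGUE_DAYS     # game-days experienced per wall-clock day

print(human_hours, round(experience_ratio, 1), round(speedup))  # 26280 66.7 5214
```

So even against the most generous human estimate, each agent saw roughly 67 times more play, compressed into two weeks at up to ~5,200x real time.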

derefr|7 years ago

Another way to think about it is that a human brain is mostly doing transfer learning on top of a 99%-baked deep net that was wired up from our DNA during foetal development; that DNA-persisted model has "seen" hundreds of millions of years of training data.

Humans don't have to learn to process, recognize, and classify objects in visual sense-data, for example. We can do that from the moment we're born, because we already have hundreds of precisely tuned "layers" lying around in our brains for doing just that. We just need to transfer-learn the relevant classes.

nopinsight|7 years ago

One macro technique used by AlphaStar agents that is not used by human pros is building extra workers beyond currently exploitable capacity.

This gives them reserves when they are attacked and some workers are killed. They can also ramp up mining at a new base quickly by moving the extra workers there.

Apparently, for AlphaStar, the benefits of these extra workers outweigh their costs. It will be interesting to see whether some pros adopt the technique and whether it improves human performance as well.

Disclaimer: I do not have much Starcraft experience.

jammygit|7 years ago

Workers mine 40 minerals per minute and cost 50, taking... 15 seconds to build? I forget. Workers beyond 24 provide zero benefit (better to send them to the natural).

Let's say you make 4 extra at a cost of 200 minerals and then lose 4 workers to harassment. You are out 200 minerals in both cases, but in the prebuilt case the spare workers will mine an extra... 100 minerals? (40 + 30 + 20 + 10, roughly what the replacements would have missed while being rebuilt one at a time.)

This doesn't take chronoboost into account, though. I don't know; the gain is marginal, and the opportunity cost is having a smaller army (2 zealots, for example).

Please correct my numbers if I've made a mistake; I forget build times and haven't played since HotS.
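The 40 + 30 + 20 + 10 estimate can be reproduced with a small model, assuming the commenter's uncertain figures (40 minerals per minute per worker, 15 s build time) and that lost workers are rebuilt one at a time from a single nexus. Walking time, chrono boost, and saturation effects are ignored; `minerals_missed` is a hypothetical helper, and `build_s` can be changed to the real build time.

```python
def minerals_missed(lost: int, rate_per_min: float = 40.0, build_s: float = 15.0) -> float:
    """Mining income forgone while `lost` workers are rebuilt one at a time.

    Replacement k (1-based) starts mining only after k * build_s seconds, so it
    misses rate_per_min * (k * build_s / 60) minerals. Prebuilt spares miss nothing.
    """
    return sum(rate_per_min * k * build_s / 60.0 for k in range(1, lost + 1))


extra_cost = 4 * 50  # four spare workers at 50 minerals each
print(extra_cost, minerals_missed(4))  # 200 100.0
```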

TulliusCicero|7 years ago

Yeah, it looked like Mana was copying this behavior somewhat in the live game.

proc0|7 years ago

I bet there's a sweet spot in between that will come out of this, like saturating your natural to 24 workers minutes before expanding.

rkagerer|7 years ago

Yeah that stalker micro really showcases a particular advantage leveraged by the AI.

I'd love to watch the results of constraining the AI so instead of seeing the whole map at once it has to pan around the same way a human would to get updated information on each battle. Counting those "info-gathering" window pans against the actions tally might yield slightly fairer APM metrics. (EDIT: Turns out they built a new agent for game 11 to do just that)
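Counting camera pans toward APM could look like the sketch below, assuming each logged action carries a kind label. The `Action` record, its labels, and the `apm` helper are hypothetical; real replay formats differ, and the log is assumed to cover exactly `duration_s` seconds.

```python
from dataclasses import dataclass


@dataclass
class Action:
    t_s: float  # game time in seconds
    kind: str   # hypothetical labels: "command", "select" or "camera"


def apm(actions, duration_s: float, count_camera: bool = True) -> float:
    """Actions per minute over a log spanning duration_s, optionally counting camera pans."""
    n = sum(1 for a in actions if count_camera or a.kind != "camera")
    return n * 60.0 / duration_s


log = [Action(i * 0.25, "command") for i in range(30)] + [Action(float(i), "camera") for i in range(10)]
print(apm(log, 10.0), apm(log, 10.0, count_camera=False))  # 240.0 180.0
```

With pans counted, an agent that fights on three fronts at once pays for every look at a new front, which is the cost a human already pays.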

One of my biggest beefs with strategy games of this genre began around the time sprites went 3D and player viewports got smaller (presumably to showcase all the cosmetic detail, and because it became harder to distinguish between visuals when zoomed out farther). I always feel too constrained in modern games, like I can't see enough of the map at once. In my opinion, that "full-size viewport" gives the engine a multitasking edge that the player doesn't share (beyond the human cognitive overhead of context switching you already pointed out).

On the other hand, I find it fascinating that our AIs have become strong enough at our games that we're having to handicap them to avoid players crying foul that they're not fair.

fandango|7 years ago

I agree. Most RTS games feel constrained because of the limited viewport. Supreme Commander has a nice feature where you can zoom all the way out at any time.

sciyoshi|7 years ago

I would agree with that. If you take a look at the exhibition match replay, there are some cases where it makes objectively suboptimal decisions. We couldn't see this during the live stream, but the double immortal warp prism caused AlphaStar to bring back its entire army from across the map, when a few units at home would have been enough to defend. It even kept trying to blink its stalkers to a place where the warp prism couldn't be reached. Perhaps this version with the limited viewport hadn't been trained on enough games?

andreyk|7 years ago

Also worth noting: it starts with imitation learning from pros. I'd be curious to see whether the macro can be learned without imitation, which is a much harder challenge. Also, playing with full visibility, as was mostly the case in the demonstration, is quite lame...

hughzhang|7 years ago

I'll bet you that AlphaStarZero comes out in a year and just learns from scratch.

kolinko|7 years ago

It wasn't full visibility: AlphaStar still had fog of war. It just saw the whole map at the same time.

celeritascelery|7 years ago

Also, I wonder how it handles invisible units. As a human player, you can see the shimmer if you look closely. Can it see that, or are they just totally invisible to it?

freeflight|7 years ago

If you're going to learn, why not learn from the best, the pros? These people have already spent years figuring out what works and what doesn't. Why not draw from that pool of knowledge instead of spending extra time going through the same motions?

olliej|7 years ago

It seems that, at least in some cases, it didn't have to move the camera (it had direct interfaces). Some of the stalker micro battles (especially in game 3 or 4?) were larger than the screen; it would not have been possible to micro that well if your control interface limited what you could control or where you could place units.

methodover|7 years ago

This is a great point, and something that seems a bit lost in the discussion:

In StarCraft 2, the game IS the interface. That is to say, the developers have constructed the game to be difficult to control, and human mastery of the interface is a large percentage of the game. Strategy in the game is important, of course, but this is not chess, where human beings are not limited by the interface of the game. In StarCraft, you are intentionally given a limited interface to monitor and control a gigantic game under incredibly tight time controls.

And I should also note that Blizzard is extremely reluctant to add features that make the game easier to control. I have a friend who works on the StarCraft 2 team. We talked at length about one feature that he designed and proposed to the team to make a specific aspect of the game friendlier towards players. It was turned down for exactly the reason above: the game is the interface. Making the game easier to control disrupts the entire experience; a StarCraft 2 that is easier to control is no longer StarCraft 2.

notSupplied|7 years ago

It's a question of whether "played with human-level latency and precision" should be part of the rules of the game we are making the AI play.

I would say yes, because StarCraft was very clearly balanced for human players. We already saw some indication that with superhuman micro, mass blink stalkers is a stronger strategy than it is under human control. Without the active intervention of game balancing, RTS metas tend to devolve into "mass one or two units", which is what happened to every Command & Conquer game (and why SC is a respected eSport while C&C is not).

I suspect this will happen whenever agents play with parameters that don't match what the game was balanced for. The strategic landscape will shrivel up, and the game will cease to captivate us.

stared|7 years ago

APM is one thing. I am curious what would happen if it could only see a limited view (as in the last game with MaNa, which it lost) and had to use physical click dynamics (i.e. a click plus Gaussian noise as the action, instead of direct commands). That way there would be misclicks, preventing this super-efficient stalker micro.
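A sketch of that idea, assuming the agent's action is "click at the target's position", the cursor lands with Gaussian error, and the game selects whichever unit is nearest the cursor within a small radius. `click_on_unit`, the 8-pixel selection radius, and the unit layout are hypothetical, not how the real StarCraft II interface resolves clicks.

```python
import math
import random


def click_on_unit(target, units, sigma_px, rng, radius_px=8.0):
    """Aim at units[target], add Gaussian cursor error, and return the name of the
    unit nearest to where the cursor landed (None if nothing is within radius_px)."""
    tx, ty = units[target]
    cx = tx + rng.gauss(0.0, sigma_px)
    cy = ty + rng.gauss(0.0, sigma_px)
    best, best_d = None, radius_px
    for name, (ux, uy) in units.items():
        d = math.hypot(cx - ux, cy - uy)
        if d <= best_d:
            best, best_d = name, d
    return best


# A tight clump of stalkers, 6 px apart (hypothetical screen coordinates).
units = {"stalker_a": (0.0, 0.0), "stalker_b": (6.0, 0.0), "stalker_c": (0.0, 6.0)}
rng = random.Random(0)
picks = [click_on_unit("stalker_a", units, 6.0, rng) for _ in range(1000)]
print(picks.count("stalker_a"), "of 1000 clicks selected the intended stalker")
```

In a clumped army, a modest cursor error turns many precise blink orders into orders for the wrong stalker or for empty ground.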

sytelus|7 years ago

Also, these wins did not use the same inputs that humans receive (i.e. the on-screen image) or the outputs humans are limited to. They instead used the PySC2 APIs, which offer much more flexibility, perfect information, and no constraints from limited screen real estate and pixels. The article claims they have another version in training that uses only on-screen information, but I still don't know whether the AI is allowed to bypass the physical constraints of the controller. So if the AI has access to a superhuman controller, you will see it perform superhuman actions, as many commenters here have described.

ygra|7 years ago

Perfect information is a bit of a stretch. There was still fog of war. The AI just played as if the portion of the map visible and actionable at any point in time was the whole map. They retrained with a restriction to a given locus of attention that can change, akin to a screen the player is looking at and acting on.

kibibu|7 years ago

The final game in the video has this limitation. It does affect the performance of the agent.

Thaxll|7 years ago

This is exactly what I think. I'd like to see how AlphaStar reacts to a "cannon rush" or other weird build orders where you need to be "smart" to counter them, rather than relying on insane, non-human micro.

hughzhang|7 years ago

AlphaStar makes up for its slightly subpar macro with REALLY good micro. Thus, more micro-heavy counters like cheeses are unlikely to beat it.

javier2|7 years ago

I am really impressed it learned when to pull probes in that game against Mana where the AI was pressured back into its natural.

It was also extremely active with the stalkers, deciding to split them in three and not let Mana cross the map with his immortals.

throwawaymath|7 years ago

> For context, I've been ranked in the top 0.1% of players and beaten pros in Starcraft 2, and also work as a machine learning engineer.

What's that hireability like?

sidusknight|7 years ago

What was your SC2 alias? I played at a similar level as you.

ajuc|7 years ago

Mana tried to outblink an AI?

Damn I really need to watch these games :)

throwaway415415|7 years ago

Totally. What would be interesting to see is a low APM bot that still beats human players. A lot of that macro was unbeatable.

porky|7 years ago

And also, latency is lower.

pesmhey|7 years ago

In a nutshell: flawless AI micro makes up for suboptimal macro?

cjbprime|7 years ago

The macro seemed fine: AlphaStar usually had more workers than the human opponent and was producing more army. The suboptimality seemed to be in army composition (blink stalkers) and strategic decision-making (pulling all of a superior army back home to defend against a single warp prism drop).

ehsankia|7 years ago

> While they have similar APM to SC2 pros

Wasn't the APM closer to half that of the pros?

https://storage.googleapis.com/deepmind-live-cms/images/SCII...

arcticfox|7 years ago

This is super deceptive, and I'm kind of upset they posted this image, knowing it would mislead people not familiar with the game. The AI sits around during lulls at <30 APM; meanwhile, MaNa and TLO were literally spamming keys to keep their fingers warm, not actually doing anything.

During the fights, at the critical moments when MaNa would top out at ~600 humanly inaccurate APM (10 inputs per second), the AI would jump to over 1000. We don't know exactly what it was doing, but it was presumably pixel-precise. Meanwhile, the physical inertia of the mouse is a challenge for humans at that speed: imagine trying to click five totally different places with perfect precision in a single second.

mactrey|7 years ago

A huge part of a human's APM is meaningless spam, for example right-clicking the same unit multiple times to attack it, or setting the same waypoint thousands of times in the early game when there's nothing to do. The computer might be at double the human's effective APM, if only we had a credible way to measure that.
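Measuring effective APM credibly is the hard part. One crude heuristic, sketched here under assumptions: discard an action if it repeats the immediately preceding command less than half a second later, as with re-clicking the same rally point. The 0.5 s threshold and the `effective_actions` helper are hypothetical, not an established EAPM definition.

```python
def effective_actions(actions, min_gap_s: float = 0.5):
    """Hypothetical spam filter for (time_s, command) tuples sorted by time.

    An action counts unless it repeats the immediately preceding command
    less than min_gap_s seconds later.
    """
    kept = []
    prev = None
    for t, cmd in actions:
        if prev is None or cmd != prev[1] or t - prev[0] >= min_gap_s:
            kept.append((t, cmd))
        prev = (t, cmd)
    return kept


spam = [(i * 0.1, "rally main ramp") for i in range(10)]  # ten rapid identical clicks
log = spam + [(1.0, "attack-move natural")]
print(len(log), len(effective_actions(log)))  # 11 2
```

A filter like this would shrink a human's early-game spam far more than an agent's deliberate actions, which is the comparison the APM graph glosses over.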

biohazardpb4|7 years ago

There's a tail which shows that a small number of AlphaStar minutes had > 1k actions.