Custom bots for Unreal Tournament 2004 pass Turing test

67 points | ZoFreX | 13 years ago | eurekalert.org

31 comments

[+] techdmn|13 years ago|reply
It seems strange to me to invoke Turing's test when discussing a highly defined problem such as playing a FPS. I think the strength of the original test is that a conversation is by nature very open. When you change the test to a game with clear scoring criteria, both humans and computers will tend toward optimization strategies that may become difficult to distinguish.
[+] romaniv|13 years ago|reply
I think many people in IT want to believe in "AI that's as good as human" so much that they erode the notion of humanity just to let computers pass.

For example, a chat bot can certainly behave like the worst chat user, but that doesn't mean it can hold an intelligent conversation, which is really what we should care about. AI should simulate human intelligence, but very often all we're seeing is a simulation of human stupidity. That's what happened in this case as well.

To me, good AI would be characterized by the ability to handle highly unusual situations, not by mimicking irrational behavior.

[+] baddox|13 years ago|reply
Yes. The assumption behind the Turing test (a reasonable one, methinks) is that human language is as "strong" of a test as you can devise. That said, this article was about an AI that was designed to appear human, not one that tried to perform optimally. It would be trivial to create an FPS AI that performed literally optimally, but it would be extremely easy for a human player to notice this ("WTF, there's no way to get a headshot every single time!") and conclude that it's a bot.
[+] calinet6|13 years ago|reply
Yeah, they haven't passed the Turing test, which is a test of very complex and difficult problems (ones which humans do effortlessly but computers cannot emulate).

This is a much simpler set of behaviors for a computer to emulate. It's an interesting test of AI for sure, but it shouldn't be called a Turing test.

[+] ZoFreX|13 years ago|reply
I was debating whether to call it a modified Turing test in the title, but had already editorialised it quite a lot.

What was interesting, though, is that the bots and humans did not converge on optimization strategies, and part of appearing human was programming irrational behaviour (such as grudges) into the bots.
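A "grudge" mechanic like the one mentioned above could be as simple as biasing target selection toward whoever last killed the bot. This is purely an illustrative sketch, not the entrants' actual code; the class name and weights are made up:

```python
import random

class GrudgeBot:
    """Toy sketch of an 'irrational' bot behaviour: it over-prioritizes
    the opponent that most recently killed it, with the grudge fading
    over time instead of the bot always picking the tactically best target."""

    def __init__(self):
        self.grudges = {}  # opponent name -> grudge weight

    def on_killed_by(self, opponent):
        # Dying to someone builds up a grudge against them.
        self.grudges[opponent] = self.grudges.get(opponent, 0.0) + 5.0

    def tick(self, decay=0.9):
        # Grudges decay each game tick; tiny ones are forgotten entirely.
        self.grudges = {name: w * decay
                        for name, w in self.grudges.items()
                        if w * decay > 0.1}

    def pick_target(self, visible_opponents):
        # Every visible opponent gets a base weight of 1; grudges add on top.
        # A "rational" bot would weight by distance, health, or weapon instead.
        weights = [1.0 + self.grudges.get(o, 0.0) for o in visible_opponents]
        return random.choices(visible_opponents, weights=weights, k=1)[0]
```

So right after being killed by one player, the bot is about six times as likely to chase that player as anyone else, and the bias wears off over the next few ticks.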

[+] esolyt|13 years ago|reply
This is immediately what I thought when I saw the title. It is pretty difficult to tell a human from a bot when you only have a very small set of predetermined actions.
[+] Scriptor|13 years ago|reply
Have there ever been any studies done to look at the effects of any biases when you're actively trying to tell a bot apart from a human? My guess is that the judges end up looking for specific traits and characteristics. Bot developers can then include these in the AI. The issue is that those traits may not be even close to matching an actual person in other situations, but they're enough to provide a local maximum of sorts.

In this case, the fact that on average bots were scored as more "human" than actual humans is more a sign of a critical flaw in the judging system than of any great progress. It looks like they reduced typical human behavior to some very simple things, such as holding a grudge or other irrational behavior. If that was a major part of the judges' criteria (consciously or subconsciously), then all this contest proved is that bots can be programmed to be more irrational than human players.

[+] patrickk|13 years ago|reply
I guess the ultimate test would be if they let the bots loose in an online multiplayer FPS environment and see if anyone noticed.

Bonus points if the bot can sing annoying, off-key pop songs in the pre-game loading screen, to mimic the true CoD Xbox Live experience.

[+] omarchowdhury|13 years ago|reply
In Unreal Tournament 2004, you can spectate a user's screen (be it a bot or a human). The bot's cursor follows rigid, straight movements to hit the target, movements that would be impossible for a human to make (see: drawing a perfect circle on the first try).
[+] xibernetik|13 years ago|reply
I'm suspicious of the game being UT2k4. It's a fast game with a steep learning curve that's long past its glory days - meaning a small player base. To someone unfamiliar with the game, even the AI that came in the box could be mistaken as human. To an experienced player, it'll be easy to identify newbie-ish patterns and see where the bots are straying from typical new-player psychology once they start toying with them. If the bot is acting experienced... Even the movement during combat in the game is complex, and there are a ton of areas an AI could trip up in.

If the judges were at a competitive level, colour me impressed - but if it was their first time, or even their first week, I'm a little more skeptical. I don't think a novice player would understand the game well enough to judge well. It would be like attempting a traditional Turing test with humans who can't speak English fluently and were raised in a non-English culture: impressive, but no indicator of bots reaching human-like levels.

[+] pxlpshr|13 years ago|reply
I came here to post the same thing.

I used to play UT/CS competitively and worked for a startup that licensed our technology to id Software and Riot Games / League of Legends a long time ago.

UT2k4 was an amazing FPS game with a really steep learning curve. It's one of the only FPS games that I refer to as the "basketball" of online gaming. The diversity of movement, weapon tactics, and map control meant a seasoned gamer could really define their own style. But it also meant few people ever transitioned from public servers into competitive play because 1 pro could easily go Godlike and demolish an entire server, making it extremely frustrating and unexciting for casual gamers.

That being said, the videos included in this article suggest that these judges had no experience with UT2k4.

In a match with professional gamers, it wouldn't surprise me if those judges thought WE were the bots. 50%+ accuracy was not uncommon with prim shock or lightning gun.

[+] ZoFreX|13 years ago|reply
> If the judges were at a competitive level, colour me impressed - but if it was their first time, or even their first week, I'm a little more skeptical.

There's a pretty broad area between those two extremes - I'm nowhere near competitive level, but I played (very casually, mainly with friends rather than online) a few hours a week for years, and I definitely understand the game well and can identify good and bad plays, and humans and bots.

I do thoroughly agree, though, that this is nowhere near the level of a true Turing test - but I thought it was still quite interesting!

--

Aside: I think these days, with the rise of game spectating, commentary, and analysis of matches, more and more non-competitive players are gaining a deeper understanding of games they like, even if they couldn't necessarily pull it all off themselves.

For a more modern game like Starcraft II for example I think you could find a very large number of non-competitive players that could reliably identify bots from humans.

[+] jimrandomh|13 years ago|reply
Actually, it's even worse than that - the judges weren't spectators, they were players, so they'd probably be too busy dodging to look closely at what the bots were doing.
[+] talmand|13 years ago|reply
I need to see more evidence. The two videos they provided from the viewpoint of the "judges" show that the judges had no idea how to play the game. No serious player stands in one spot firing nonstop. One video showed the judge had no aiming skills whatsoever.

What were the criteria? "That one must be human because it can move and shoot at the same time!"

This reminds me of the awesome days of the ReaperBot, that cheating bastard.

[+] locci|13 years ago|reply
They don't even dodge, which is the staple movement in UT games.
[+] pfortuny|13 years ago|reply
If humans get a 'humanity' score of only 40%, we have a problem with the definition of 'human', or with the rules of the game, or whatever. I guess that in order to pass as human, the bots should have to reach the humans' 'standard' score of humanity. Either that or the rules are strange.
[+] biot|13 years ago|reply
In one online game I played, people accused me of cheating/using a bot [I never did]. The problem was that they actually were cheating and could see through walls and/or used an aimbot to improve their accuracy, but my reaction time was faster than theirs and I would get a headshot off before they could kill me. Of course, it didn't help that I used a shotgun which was ridiculously overpowered in the game.
[+] aidenn0|13 years ago|reply
"Human players received an average humanness rating of only 40 percent."

The judging system needs to be seriously reevaluated.

[+] pfortuny|13 years ago|reply
The judging system did not pass the Turing test...
[+] farinasa|13 years ago|reply
I don't like the Turing test. It is nonquantifiable and is used to test something we don't even fully understand yet. It is also based on previous computing mindsets. It's like using a horse and buggy to determine whether a car is suitable.
[+] laserDinosaur|13 years ago|reply
As long as the AI is running around like an idiot spamming rockets, getting run over by vehicles and generally wasting ammo, yep that sounds like most humans who played UT04. Not exactly the most difficult test to pass...
[+] icey|13 years ago|reply
If you like playing with chatbots and think this kind of stuff is cool, please shoot me an email at [email protected] - I want to talk to you!
[+] shocks|13 years ago|reply
Instagib CTF-FaceClassic ftw.