Preventing Bots from Playing Online Games

PHILIPPE GOLLE and NICOLAS DUCHENEAUT Palo Alto Research Center

As multiplayer online gaming gains in economic and social importance, an increasingly large number of players are beginning to rely on bots (automated player agents) to gain unfair advantages in games. In this article we study the problem of restricting participation in online games to human players, so that they can enjoy the game without interference from bots. We propose two broad approaches to prevent bots from playing online games. The first consists of seamlessly integrating software-based tests (known as reverse Turing tests or CAPTCHA tests) into online games to tell humans and computers apart. The second consists of hardware instantiations of CAPTCHA tests. Our techniques are applicable to a wide variety of online games, from poker to "shoot'em ups." They are cost-effective, immune to cheating, and preserve the human players' enjoyment of each game. We conclude with a discussion of how approaches to deter the use of bots may complement our techniques to detect bots.

Categories and Subject Descriptors: H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Artificial, augmented, and virtual realities

General Terms: Design, Experimentation

Additional Key Words and Phrases: Games, agents, bots, reverse Turing test, CAPTCHAs.

1. INTRODUCTION

Multiplayer computer games have become an increasingly important economic, social, and cultural phenomenon: nowadays, millions of players routinely gather online to play their game of choice [Woodcock 2005]. Online game genres are extremely diverse: first-person shooters (FPS), massively multiplayer online role-playing games (MMORPGs), and online card games (e.g., poker) are among the most popular. However, all share a similar characteristic: they pit human players against one another, unlike earlier forms of computer gaming where most opponents were computer-controlled. The inherent limitations of human physical and mental capabilities, the diverse ways in which these limitations manifest themselves in individuals, and the ability to transcend these limitations are all essential to the enjoyment of a multiplayer game. For instance, success at playing poker is largely predicated on one's ability to correctly assess probabilities; success in FPS depends in great part on hand-eye coordination and "twitch" reactivity; and success in MMORPGs depends on playing the game for long hours, collaborating with others to gain experience, and accumulating powerful objects. Yet some players do not accept the rules of the game and refuse to play only to the best of their abilities. Instead, they use various forms of automation to gain an unfair advantage. The use of so-called bots (or automated player agents) has recently become more prevalent due to the increasingly porous barriers between real and virtual economies [Castronova 2003]. For instance, writer Julian Dibbell recently told Wired

Authors' Address: Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304; emails: {pgolle,nicolas}@parc.com

magazine that he earned $3,917 in one month by buying and reselling cybergoods from Ultima Online on eBay. The process of collecting artifacts and/or in-game currency in MMORPGs, referred to as "farming," can easily be automated and can lead to substantial monetary gains. In online card rooms, bots can be used to play games based entirely on their statistical properties, giving their users an unfair advantage over imperfect human players [Brunker 2004]. Finally, in FPS, bots can be used to boost a player's performance, for instance by artificially increasing targeting accuracy (Counter-Strike's aimbot is one of the most blatant examples).

The problem of bots is not a superficial symptom of the limitations of existing games. Rather, it is an inherent consequence of the fact that repetition is a fundamental component of games [Koster 2004]. Mastering patterns is an important source of enjoyment, so games with infinite variations are perceived as chaotic or "noisy," and are poorly received by players. Thus bots cannot be eliminated by just designing richer, more diverse games (too much repetition attracts bots, too little drives humans away). Instead, techniques are needed to distinguish bots from human players so that bots can be eliminated from games.

At first, the problem of distinguishing human players from bots in online games may not seem very difficult. After all, there is a clear gap between the abilities of humans and computers (e.g., bots cannot carry on coherent and sustained conversations with humans). For our purposes, however, identifying this gap is not enough. We must exploit it to distinguish bots from human players in a way that is cost-effective, immune to cheating, and preserves the human players' enjoyment of the game. These three requirements are not easily satisfied together and, as we will show, the best solution often depends on the game.

We propose two broad approaches to keep bots out of online games. The first consists of seamlessly integrating software-based tests (known as reverse Turing tests or CAPTCHA tests [Ahn et al. 2003]) into online games to tell humans and computers apart. Our second approach is to propose hardware instantiations of CAPTCHA tests, which are extremely versatile and suitable for a wide variety of online games, from card and board games to "shoot'em ups." Finally, we end with a discussion of how approaches to deter the use of bots may complement our techniques to detect bots.

2. MODEL

We assume throughout this article that bots are undesirable, but a disclaimer is in order: not all uses of automation are malicious or detrimental to the quality of games. In fact, the use of automation is sometimes an integral part of a game. For instance, Second Life, a virtual world focused on content creation, provides extensive scripting capabilities so that players can easily create new content. These scripting capabilities could also be used to automatically control players, turning them into bots. In other cases, bots are tolerated when used to automate tedious tasks (such as macros in Star Wars Galaxies). But their use is generally forbidden when it spoils other players' enjoyment of the game (e.g., fleecing opponents in a poker game or instantaneously killing an opponent with an aimbot) or when it is detrimental to the financial interests of game publishers (e.g., when selling game content on open markets).

Of course, our techniques should only be applied to eliminate the problematic use of bots. The decision as to which uses are allowed and which are forbidden should be made by the game designers and/or the players. Our contribution is to offer tools to enforce these decisions. More precisely, our techniques help enforce one of two properties: human presence (the weaker property) or human play (the stronger property).

Human presence. A technique enforces human presence in a game if some input from a human is required to play the game. The requirement is only that a human contributes some of the interaction with the game, not necessarily all of it. In particular, the human may be assisted by a bot that controls the rest of the interaction with the game. In online poker for example, human presence means that a bot can only play under the supervision of a human. The cards may be chosen by the bot according to probabilities computed by the bot, but the bot cannot play without a human "baby-sitter." While human presence may seem like a weak property, it is actually surprisingly useful. Consider that the scope of bot involvement in online poker games, FPSs or MMORPGs would presumably decrease considerably if every single bot had to be supervised by a human player.

Human play. Human play is a stronger property that requires that all interaction with the game software come from a human, without any involvement from a bot.

Adversarial model. We assume that the game software is tamper-proof and that bots can only interact with the game via the input and output interfaces defined by the game. These assumptions are justified for online multiplayer games, since the game software resides on secure game servers under the control of a game operator. Players may succeed in reverse-engineering or tampering with the code running on their local machines, but the local code is used only to exchange data with the central servers. Tampering does not free players from the constraints of interacting with the games via well-defined input and output interfaces.

We make the standard assumption that players are rational and derive no utility from the intrinsic use of a bot (no defense appears possible against an adversary who has more intrinsic utility for using a bot than for winning the game).

3. CAPTCHA TESTS

There are a variety of automated tests for telling humans and computers apart in an online environment, known collectively as CAPTCHA tests (Completely Automated Public Turing Test to Tell Computers and Humans Apart) [Ahn et al. 2003]. A CAPTCHA test is a program that can generate and grade tests that most humans can pass but that current computer programs cannot. CAPTCHA tests are of necessity heuristic; there is no way to prove that a computer program cannot pass a test that a human can pass. CAPTCHA tests exploit gaps between the abilities of humans and computers that have not narrowed much over the past decades; but there is no guarantee that these gaps will not eventually disappear. The most widely used CAPTCHA tests currently rely on the ability of humans to recognize randomly distorted text or images [Coates et al. 2001; Ahn et al. 2004].

CAPTCHA tests can be used during a game to verify human presence. A wide range of testing strategies is possible. A single CAPTCHA challenge may be presented when a player joins a game, or else repeated CAPTCHA challenges may be presented to the player at random intervals over the course of a game. The game may on occasion allow the player not to reply or to give an incorrect answer. A player that fails one or several challenges may be suspended from the game, either for a limited time or permanently (in which case all progress made in the game is lost). CAPTCHA tests deter bots most effectively in long stateful games where players have a lot to lose if they fail a test; games played in short, stateless rounds may require more frequent tests.
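To make the testing strategy concrete, the following Python sketch shows one way a game server might schedule challenges at random intervals and suspend players who fail too many of them. All names and parameters are illustrative assumptions; the article prescribes no particular implementation.

    import random
    import time

    class CaptchaScheduler:
        """Serve CAPTCHA challenges at random intervals and suspend
        players who fail too many. Parameters are illustrative."""

        def __init__(self, min_gap=300, max_gap=1800, max_failures=2):
            self.min_gap = min_gap            # seconds between challenges (lower bound)
            self.max_gap = max_gap            # seconds between challenges (upper bound)
            self.max_failures = max_failures  # failures tolerated before suspension
            self.next_challenge = {}          # player id -> time of next challenge
            self.failures = {}                # player id -> count of failed challenges

        def on_join(self, player_id):
            # Challenge the player once on joining, then at random intervals.
            self.next_challenge[player_id] = time.time()
            self.failures[player_id] = 0

        def challenge_due(self, player_id):
            return time.time() >= self.next_challenge[player_id]

        def record_result(self, player_id, passed):
            """Return True if the player may keep playing, False if suspended."""
            if not passed:
                self.failures[player_id] += 1
                if self.failures[player_id] > self.max_failures:
                    return False  # suspend, temporarily or permanently
            # Schedule the next challenge at a random future time.
            gap = random.uniform(self.min_gap, self.max_gap)
            self.next_challenge[player_id] = time.time() + gap
            return True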

We call this use of CAPTCHAs "out-of-band" because the CAPTCHA test is not integrated into the world of the game but occurs outside of it. The main advantage of out-of-band CAPTCHAs is that they are extremely flexible and can be used with just about any game. However, out-of-band CAPTCHA tests suffer from the following limitations:

CAPTCHA tests are disruptive: They draw the player's attention away from the game. The interruption is only momentary (it takes only a few seconds for a human to solve a CAPTCHA), but it breaks the suspension of disbelief that is so important to gaming. CAPTCHAs can also adversely affect the pace of a game: frequent requests to solve CAPTCHA tests would be intolerable to human players, since pacing and a sense of flow are essential to the enjoyment of a game [Koster 2004]. We address this issue in Section 3.1.

CAPTCHA tests can be outsourced: A CAPTCHA must be solved by a human, but that human need not be the person playing the game. For example, a bot playing a game without human assistance may forward the CAPTCHA tests it receives to low-cost workers who specialize in solving them. Games are particularly vulnerable to these so-called CAPTCHA relay attacks [Stubblebine and van Oorschot 2004] because they make frequent, repeated use of CAPTCHAs. The cost of setting up a relay attack to solve a single CAPTCHA test may be prohibitive (e.g., when opening a new email account), but it becomes negligible if it can be amortized over many tests, as in online games. Traditional digital CAPTCHAs may thus not be very effective in preventing bot participation in online games. We address this issue in Section 4.2.

3.1 Embedding CAPTCHAs Into Games

With a little design effort, CAPTCHA tests can be embedded into games in ways that are much less disruptive than out-of-band CAPTCHAs. For most games, there is a trade-off between how well a CAPTCHA test distinguishes between a human and a bot, and how well it blends into the game environment. Correspondingly, game designers can choose from a spectrum of tests that range from most secure to least disruptive, depending on the severity of the threat posed by bots and the tolerance of players for changes to the game.

At one end of the spectrum, we can embed the same well-studied CAPTCHA tests that we proposed using in an out-of-band fashion in the previous section. These tests have withstood a lot of scrutiny and are widely believed to be secure, but they are difficult to integrate seamlessly into games. The best we can hope for is to minimize the disruption compared to the out-of-band use of CAPTCHAs. We illustrate this approach with the example of in-game resource collection or creation in MMORPGs. It would be consistent with the game's environment to require that players obtain licenses before they are allowed to perform certain actions. Licenses would take the form of an "exam" that is a CAPTCHA in disguise. For example, a license to craft items may require players to prove that they can read engineering blueprints (that is, solve a text-based CAPTCHA such as Gimpy [Ahn et al. 2004]); a license to hunt may require players to identify an animal in a picture (that is, solve an image-based CAPTCHA such as PIX [Ahn et al. 2004]); and so on. Obtaining the license and passing the exam would be almost instantaneous and would add only a simple preliminary step to the regular sequence of actions required to collect or create resources.
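As a rough illustration of such a disguised exam, the Python sketch below gates a license behind a challenge-response check. The MockCaptchaBackend is a hypothetical stand-in for a real CAPTCHA generator such as Gimpy or PIX; all names are assumptions for illustration.

    class MockCaptchaBackend:
        """Stands in for a real CAPTCHA generator such as Gimpy (text)
        or PIX (images); this mock is an assumption for illustration."""
        def generate(self, skill):
            # Returns (challenge shown to the player, expected answer).
            return ("distorted word: 'fjord'", "fjord")

    def grant_license(player_answer_fn, skill, backend):
        """Issue a license for a skill only if the disguised exam (a
        CAPTCHA) is passed. player_answer_fn delivers the challenge to
        a human and returns the answer; a bot would have to solve the
        CAPTCHA itself to obtain the license."""
        challenge, expected = backend.generate(skill)
        return player_answer_fn(challenge) == expected

    # A correct answer earns the crafting license; a wrong one does not.
    backend = MockCaptchaBackend()
    print(grant_license(lambda ch: "fjord", "crafting", backend))  # True
    print(grant_license(lambda ch: "fiord", "crafting", backend))  # False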

At the other end of the spectrum, we find tests that are far less secure but can be integrated virtually seamlessly into games. These tests are CAPTCHAs only in a weak sense: they make it far harder to program bots that play the game, but not impossible (they will not stop a determined adversary). The tests consist of adding random variations to key aspects of the game; e.g., the location of powerful items and resources in MMORPGs can be decided at runtime, when a player chooses a particular mission or quest. This prevents "spawn camping" at hard-coded locations, a practice that is highly susceptible to automation and leads to the unusually quick accumulation of valuable game resources. Several recent games rely on this mechanism (e.g., Star Wars Galaxies).
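A minimal sketch of this runtime randomization follows; the function and coordinates are assumptions for illustration, and no specific game engine is implied.

    import random

    # Candidate spawn points defined by the game designers
    # (coordinates are made up for illustration).
    CANDIDATE_LOCATIONS = [(120, 40), (87, 310), (455, 12), (301, 299), (16, 233)]

    def place_quest_resources(quest_seed, n_resources=2):
        """Pick resource locations at runtime, per quest instance, so
        that bots cannot rely on hard-coded coordinates for camping."""
        rng = random.Random(quest_seed)  # per-instance randomness
        return rng.sample(CANDIDATE_LOCATIONS, n_resources)

    # Each quest instance sees a different layout.
    print(place_quest_resources(quest_seed=101))
    print(place_quest_resources(quest_seed=202))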

Like out-of-band CAPTCHAs, embedded CAPTCHAs allow for a broad range of testing frequency and strategy. CAPTCHAs that are tightly integrated into, and consistent with, the game's environment do not negatively affect suspension of disbelief, and are thus also less likely to be outsourced.

Applicability. Embedding CAPTCHAs into games is a suitable approach to ensure human presence in games that are slow-paced, have high-entropy environments, and rules that are not too specific (most MMORPGs satisfy these requirements). A slow-paced game ensures that players have enough time to solve embedded CAPTCHAs. A high-entropy environment, such as a rich graphical world, allows embedded CAPTCHAs to blend naturally into the environment. Finally, the absence of very specific rules makes it possible to embed CAPTCHAs that are not noticeably out of place.

3.2 The Game as CAPTCHA

By extending this approach further, the game itself may be regarded as a CAPTCHA. The test to distinguish bots from humans then becomes none other than the game itself. This approach is counter-intuitive: the very presence of bots in a game would seem to suggest that the game is a poor distinguisher between human players and bots. But this is not the case. The critical observation is that successful bots perform as well as humans along a single measure of success: the scoring mechanism of the game. A variety of other measures such as, for instance, the time spent in certain game locations or patterns of activity may distinguish bots from human players. Game designers may rely on collaborative filtering or other models from artificial intelligence to learn how to compute clusters of bots and clusters of human players.
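As a sketch of this idea, the snippet below clusters players by illustrative behavioral features using k-means (here via scikit-learn, an assumed dependency; the article does not name a specific algorithm or toolkit, and the feature values are invented for illustration).

    import numpy as np
    from sklearn.cluster import KMeans

    # Each row is one player. Features are illustrative behavioral
    # metrics of the kind the text suggests:
    # [hours played per day, chat messages per hour, seconds idle per hour]
    players = np.array([
        [2.0, 40.0, 300.0],   # casual human
        [3.5, 55.0, 250.0],   # chatty human
        [23.5, 0.1, 2.0],     # suspected farming bot
        [23.9, 0.0, 1.0],     # suspected farming bot
    ])

    # Two clusters: one should gather bot-like behavior, the other human-like.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(players)
    print(labels)  # e.g., [0, 0, 1, 1]; accounts in the bot-like cluster
                   # could then be served an embedded CAPTCHA (Section 3.1)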

Interplayer conversations. We illustrate this approach with the example of interplayer conversations. Text-based and even audio conversation is naturally integrated in numerous multiplayer games. In fact, some MMORPGs are purposefully designed to maximize player-to-player interactions [Koster 2005]. Bots may engage in meaningless conversation with other bots, but they cannot engage in sustained conversation with human players (even some of the more sophisticated "chatterbots" such as A.L.I.C.E. quickly betray their robotic nature in conversation). Interaction metrics computed over the entire player population (see Ducheneaut and Moore [2004]) can separate clusters of bots from clusters of human players. The clustering algorithms could be fixed or adapted dynamically according to real-time analysis of the conversation dynamics in each specific game. It should be clear from this description that, from the viewpoint of the game operator, the use of conversation to distinguish humans from bots is entirely automated and requires no human intervention. All the human intervention required is provided, freely and unwittingly, by other players.

A potential limitation in computing metrics to distinguish bots from human players over the whole user population is that the resulting models may produce a high number of false positives and false negatives. We propose two approaches to deal with the expected high error rates:

Use models as filters. The first approach is to rely on clustering models as a first filtering step only. To improve accuracy, models may be combined with the CAPTCHA tests of Section 3.1: accounts suspected of being played by bots are served with an embedded CAPTCHA test.

Use models to adapt the game. Another (and perhaps complementary) approach is to rely on clustering models to dynamically adapt the game's level of difficulty to the players' performance, rather than to expel bots from the game. For instance, models can track a player's progress and make sure that, as soon as a task is mastered and can be performed repeatedly, it is replaced by a different, higher-level task. This amounts to a "softer" approach to eliminating bots, one that simultaneously preserves, and even enhances, the enjoyment of human players.
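A minimal sketch of this adaptive approach follows; the task names and mastery threshold are illustrative assumptions, not taken from the article.

    class AdaptiveTasks:
        """Replace a task with a higher-level one once it appears to be
        mastered. Thresholds and task names are illustrative."""

        TASK_LADDER = ["gather herbs", "brew potions",
                       "trade potions", "manage a guild shop"]

        def __init__(self, mastery_threshold=20):
            self.threshold = mastery_threshold  # repetitions counted as mastery
            self.level = 0
            self.completions = 0

        def current_task(self):
            return self.TASK_LADDER[self.level]

        def record_completion(self):
            self.completions += 1
            if (self.completions >= self.threshold
                    and self.level + 1 < len(self.TASK_LADDER)):
                # Mastered: rotate in a higher-level task. A bot scripted
                # for the old task stalls here; a human gets fresh content.
                self.level += 1
                self.completions = 0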

Limitations. If the models that detect bot patterns came to be known, or could be reconstructed from observation, they could be defeated. The battle between bot designers and bot detectors might then take on the flavor of the arms race that pits virus writers against anti-virus software. More importantly, perhaps, computing these metrics might be prohibitively expensive for some game producers, since it requires a significant amount of data logging and mining.

4. PHYSICAL CAPTCHAS

To move beyond the limitations of the approaches described above, we propose hardware-based "physical" CAPTCHA tests that distinguish human players from bots based on the humans' ability to interact with the physical world. Human players can easily perform physical actions such as pressing a button, moving a joystick, or tapping on a touch-sensitive screen; but it is difficult and expensive to design hardware that automates these physical tasks. Physical CAPTCHAs offer several advantages over the digital ones of the previous section: they are harder to outsource, they can be less intrusive, and they offer the stronger property of human play.

A physical CAPTCHA is a device, like a keyboard or a joystick, that accepts physical inputs (e.g., motion, pressure) and produces digital outputs. A player should only be able to learn the digital output produced by a physical CAPTCHA device in response to a given input at a given time by physically producing that input at that time. To guarantee this property, a physical CAPTCHA device must be tamper-proof and must authenticate its output.

Authentication. To prove that the output of the physical CAPTCHA device was produced at a given time, the output must be authenticated, either to a remote game server or to the game software running on the user's PC or game console.

Tamper-proof. A tamper-proof CAPTCHA device immediately loses its ability to authenticate outputs if an attempt is made to tamper with it (e.g., the authentication key is wiped from memory). This ensures that inputs can only be submitted to the physical CAPTCHA device via its physical input interface. Any attempt to bypass that interface (opening the device, rewiring its microcontrollers) causes the loss of the authentication key.
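One standard way to realize such output authentication is a keyed MAC over time-stamped input events, computed with a secret key held in the device's tamper-proof memory and shared with the game server. The sketch below uses HMAC-SHA256; the scheme and all names are assumptions, as the article does not specify a particular mechanism.

    import hashlib
    import hmac
    import json
    import time

    SECRET_KEY = b"provisioned-at-manufacture"  # held in tamper-proof memory

    def sign_event(event):
        """Device side: authenticate a physical input event with a
        timestamp, so the server can check both origin and freshness."""
        event = dict(event, ts=time.time())
        payload = json.dumps(event, sort_keys=True).encode()
        tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
        return payload, tag

    def verify_event(payload, tag, max_age=1.0):
        """Server side: reject events with bad tags or stale timestamps."""
        expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(tag, expected):
            return False
        return time.time() - json.loads(payload)["ts"] <= max_age

    # A joystick event signed by the device verifies at the server.
    payload, tag = sign_event({"control": "joystick", "axis_x": 0.7})
    assert verify_event(payload, tag)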

4.1 CAPTCHA Input Devices

One possible approach to implementing physical CAPTCHAs is to turn existing game input devices into CAPTCHA devices. Joysticks and other small game input devices can be rendered tamper-proof at little cost, and adding authentication capability is also inexpensive. Making larger devices, such as keyboards, tamper-proof would be comparatively more expensive (we propose an alternative solution for keyboard-based games in the next section). We call a tamper-proof joystick that authenticates its communication with the game software a "CAPTCHA joystick." Games played with a CAPTCHA joystick ensure that there is no way for a player to send inputs to the game other than by physical action on the joystick (pressing, pushing, etc.). A human can exert this physical action much faster, more reliably, and more easily than a machine.


