| About | Rules | Code | 2009 | 2010 Home | 2011 Home | Forum | The Future |
Imagine the following game. It is summer on Lemonade Island, and you need to make some cash. You decide to set up a lemonade stand on the beach (which goes all around the island), as do two others. There are twelve places to set up around the island like the numbers on a clock. Your price is fixed, and all people go to the nearest lemonade stand.
The game is repeated. Every night, everyone moves under cover of darkness (simultaneously) and in the morning, their locations are fixed. There is no cost to move. After 100 days of summer, the game is over. The utility of the repeated game is the sum of the utilities of single-shot games.
If all the lemonade stands are located at different spots, then your utility is the distance to the person clockwise you plus the distance to the person counterclockwise you, measured in spots.
For example, if Alice sets up at 3 o'clock location, Bob sets up at the 10 o'clock location, and Candy sets up at 6 o'clock, then first we arrange them clockwise from 1 o'clock (Alice, Candy, then Bob): there are 3 spots clockwise between Alice and Candy, 4 spots clockwise between Candy and Bob, and 5 spots clockwise between Bob and Alice. Therefore, Alice gets $8, Bob gets $9, and Candy gets $7.
If all the lemonade stands are located at the same spot, everybody gets $8.
If exactly two lemonade stands are located at the same spot, the two collocated stands get $6 each and the loner gets $12.
So, the total utility is always $24.
"Listen, here's the thing, if you can't spot the sucker in your first half hour at the table, then you are the sucker." - Rounders
The game has almost no structure. Probability theory and the theory of Nash equilibria do not provide a single solution to this game, but can you?
And yet, it is frighteningly hard: state-of-the art online algorithms fail when up against a simple strategy.
By symmetry, if everyone plays uniformly at random, they are in equilibrium. It is an equilibrium if Alice plays 12 o'clock, Bob plays 4 o'clock, and Candy plays 8 o'clock. Any equidistant actions form a pure strategy equilibrium.
Nash defined "solvable" games as games where one could mix and match the players' strategies that were parts of equilibria, and they would still be equilibria. For example, given any two equilibria A and B to play two-player Texas Hold em, if you combine the strategy for player 1 in A with the strategy of player 2 in B, this is a new equilibrium for Texas Hold em. In fact, all zero-sum games are solvable. This solvable property makes playing a Nash equilibrium an obvious technique for playing the game. This game is not solvable: as an unsolvable game, it is unlikely for players to begin in an equilibrium, regardless of their computational abilities. Therefore, it is a game where learning is not only necessary to exploit weak players, it is fundamental to not be exploited yourself.
What is more remarkable is that there are equilibria which are totally counterintuitive. For example, Alice at 12 o'clock, and Bob and Candy at 6 o'clock is an equilibrium. Moreover, Alice at 12 o'clock, Bob at 5 o'clock, and Candy at 6 o'clock is an equilibrium. If you considered a variant where one player moved after the other two, there is little the last player can do. For instance, if Alice chooses 12 o'clock, and Bob chooses 5 o'clock, then Candy's possible utilities are:
The bizarre aspect of this game is that your actions determine other people's utilities "more than" your own. To maximize short-term utility, you need only find the larger half of the island after the two other players have split the island in two. However, this leaves open a variety of ways to influence your long-term utility. For example, if Candy plays roughly equidistant from Alice and Bob, it is unlikely Alice or Bob will move, and Candy will be getting the least of the three. However, if she gets into either player's face by choosing the position near (adjacent, or even on top!) of one of them, then this may change the way Alice or Bob plays.
If Alice and Bob sit at opposite sides of the island 12 and 6, then Candy will get no more than $6, but she may be able to influence them to move.
One frustrating thing about poker research and auction research is that one can become so lost in the mechanism and the dynamics of the game, one forgets about the players inside the game. As researchers, we often focus on learning the game, then forget about the people playing it.
There are a variety of strategies that "minimize regret". I chose three: internal regret matching, external regret matching, and GIGA. These I call "intelligent", as they represent some (but not all) reasonable choices for an online game-playing algorithm. I also tried two equilibrium strategies: a strategy which played a constant action (sat in one spot the whole summer), and a random strategy (which choose a spot uniformly at random each day). I ran a tournament where every set of three played the game for 100 iterations.
I repeated this experiment 10,000 times to reduce variance.
The surprising result was that the constant had the highest total utility. At a human level, this is like one player staking out her ground in a dominant fashion. However, this strategy has an obvious weakness: if the two opponents of the constant strategy play on either side of her, then they can capture over 91% of the utility between them. And yet, these "intelligent" strategies, oblivious of the potential for collusion, struggle for scraps instead. More than anything else, what the game embodies is what is most lacking from AI: the ability to recognize the ability to collaborate with others.
The effort behind this competition is to unask the question of how to play the game, and instead focus on how to play the players in the game. In effect, this game is as simple as rock paper scissors, in that symmetries between actions leave nothing to the imagination. What makes the game complex is the people who are playing it.
The most obvious complexity added by humans is the natural connection they make between a particular location on adjacent time steps, whereas technically, the only meaning to this is due to people believing it to be so. But beyond that, a human's ability to evaluate the nature of their opponents would (hopefully) allow them to easily crush an opponent which stood still, rather than standing idly by while it crushed them.