The Lemonade Stand Game

2009 2010 2011 2012 Forum Code The Future

The Future of Multiagent Learning

It may seem odd, but my motivation behind this very empirical study is my interest in the theory of multiagent learning. The field has begun some deep soul-searching as of late, most importantly in the paper "If multi-agent learning is the answer, then what was the question?" by Shoham, Powers, and Grenager. In that paper, they isolate five objectives for multiagent learning:

As computer scientists, we have made great strides on computational problems. However, sometimes (and I myself am guilty of this) we try to view other aspects of the problem in a computational light. Arguably, this is natural given our discipline. However, I would argue that computer scientists have another way in which they can benefit the community. In particular, computer scientists are world experts at formally describing what they want to do.

There are many experiments that are performed on humans. This results in one of two things: either observations about a human beings behavior (from which a strategy or intent must then be derived), or a strategy which is explicitly laid out by the subject of the experiment. The latter is highly suspect: humans are notorious for saying they will do one thing, and then doing another (diets and exercise are two excellent examples of people laying out plans). But a computer program is a contract: it says what will be done in the future. So it distinguishes between what one would want to think that one would do and what one would do.

Even as we write programs, this distinction is significant: there is a big difference between an algorithm submitted to a conference and an algorithm submitted to a competition. Although the top three competitors in this competition were aware of regret minimization and other AI methods, they chose instead to apply completely different strategies.

Herein lies my reason for continuing this competition: observing these programs provides an opportunity to describe a whole new language for describing intelligent behavior. Beyond cooperation and competition, these agents are engaging in behavior that is not only a response to their environment, but they also recognize that their behavior forms an enviroment for others to respond to. Thus, I will play not just so that I can learn, but so that others can learn as well.

This lesson is not completely new: in fact, in Axelrod's iterated prisoner's dilemma competition, he made the same observation. So what is left to be done? In the iterated prisoner's dilemma, cooperation and defection were obvious (those were the names of the actions), and who you should try to cooperate with was obvious. In the real world, both of these things are unclear.

In this first lemonade stand game competition, we have observed that a very natural strategy is to play a Nash equilibrium, waiting for others to join you in playing it. This is similar to the AWESOME algorithm of Conitzer and Sandholm, but we actually observe people successfully using this methodology in the presence of other intelligent agents. The winning algorithm tried to estimate whether it would be better to play a constant strategy or respond to others playing a constant strategy.

At present, the game is simple, but it is one of very many simple games, which in aggregate are mind-boggling. Can we design algorithms for symmetric three-player, 12 action, constant-sum game of 100 rounds? What if there is communication? What if the game does not have a constant sum? What if there are more than three players? What if there is no symmetry?

Multi-agent theory is hamstrung by coming up with reasonable definitions of intelligence. But, as Dostoevsky so eloquently put it, what does reason know? To know what strategies humans would employ, we must work with each other on increasingly complex categories of tasks, to introspect honestly as to what behavior we believe will work. This behavior can then be categorized, and then we can understand it. In fact, if we restrict our definition of intelligence, then it can be empirically falsified:

Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.

Thus, while whether or not a particular property of intelligence can never be verified to hold for all humans, it certainly can be observed to hold for a set of algorithms that specify the choices of a specific set of people at a specific time. Thus, if we conjecture a certain property holds, we will see if it does when the competition actually occurs. Moreover, it also recognizes that we are not necessarily studying intelligence as a whole, nor even the intelligence of the group for all time, but specifically studying the ideas of a group at a point in time.

The biggest issue with studying the phenomenal intelligence of algorithms designed by machine learning experts is that it aggravates the Heisenberg Uncertainty Principle: the observations we make about past competitions will affect future competitions. The biggest potential gain is a new language of describing intelligent behavior, that can then be used in understanding human experiments, designing theoretical models of intelligence, and building intelligent agents in the future.