Introduction
Here, I introduce the Emulation Equation, a game-theoretic equation for emulation.
The equation represents all emulation, enables powerful features, and turns all gameplay (human or otherwise) into data for training an artificially intelligent game-playing agent.
For demonstration purposes, I've split the explanation into two theories:
The Special Theory of Emulation explains the fundamentals, and presents a simplified emulation equation. Using this, all emulation and many powerful features are possible.
The General Theory of Emulation uses the same fundamentals, and presents an extended emulation equation. Using this, all gameplay (human or otherwise) becomes data for training a reinforcement learner.
For the physics nerds, this is analogous to how Einstein introduced relativity:
In 1905, the Special Theory of Relativity explained moving bodies without gravity
In 1915, the General Theory of Relativity explained moving bodies in the presence of gravity
Background
Reinforcement learning is popular in game-playing AI because the reward signal is often sparse (not evident every frame) and depends on actions taken much earlier in the game.
The Q-learning algorithm is well-suited for teaching reinforcement learners how to play a video game because it does not require a model of the environment, which would be difficult to build even for the most basic computing machine.
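Recall the algorithm from Q-learning: its core is the standard one-step update rule (reproduced here for reference), where α is the learning rate and γ is the discount factor:

Q(S_t, A_t) ← Q(S_t, A_t) + α [ R_{t+1} + γ max_a Q(S_{t+1}, a) - Q(S_t, A_t) ]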
A full explanation of Q-learning is beyond the scope of this documentation. Just know that it learns over "emulation equations", which describe a series of frames in the discrete time domain.
Here I present two theories of emulation. The Special Theory of Emulation gives the smallest equation needed for emulation. The General Theory of Emulation expands on that equation, allowing it to be used for Q-learning.
Special Theory of Emulation
The Special Theory of Emulation presents the smallest equation (the emulation equation) needed to represent all emulation.
Emulation variables
Game-theoretic emulation uses two variables:
State (S) consists of video, audio, and memory regions (RAM, SRAM, real-time clock).
Action (A) is the combined state of all input devices.
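For illustration only, here is one way the two variables might be laid out in code; the field names and types are my own assumptions, not part of the equation:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class State:
    video: bytes = b""  # framebuffer contents for the frame
    audio: bytes = b""  # audio samples for the frame
    ram: bytes = b""    # main memory
    sram: bytes = b""   # save RAM
    rtc: int = 0        # real-time clock value

@dataclass
class Action:
    # Combined state of all input devices, e.g. {"A": False, "Up": True}
    buttons: Dict[str, bool] = field(default_factory=dict)
```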
Time series
Emulation occurs at discrete time steps, so every time step t has its own instance of these variables: a state S_t and an action A_t.
The emulation history is therefore a time series of tuples containing these two variables:
(S0, A0, S1, A1, ...)
Emulation model
Time steps occur by applying a set of functions to the emulation variables:
The PlayFrame() function takes the previous state, along with the most recent action, and produces a new state
The GetInput() function takes the previous action, along with the most recent state, and produces a new action
Emulation equation
The emulation equation is a time series model consisting of the initial conditions, as well as the model used for each time step.
The initial condition of both variables is the empty set: S_0 = ∅ and A_0 = ∅.
The variables then evolve by applying the functions in sequence: S_{t+1} = PlayFrame(S_t, A_t), followed by A_{t+1} = GetInput(A_t, S_{t+1}).
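To make the loop concrete, here is a minimal sketch of the special emulation equation; PlayFrame() and GetInput() are passed in as callables, and None stands in for the empty set:

```python
def run_emulation(play_frame, get_input, num_steps):
    """Unroll the special emulation equation and return (S0, A0, S1, A1, ...)."""
    state, action = None, None               # S0 = A0 = empty set
    history = [state, action]
    for _ in range(num_steps):
        state = play_frame(state, action)    # S_{t+1} = PlayFrame(S_t, A_t)
        action = get_input(action, state)    # A_{t+1} = GetInput(A_t, S_{t+1})
        history += [state, action]
    return history
```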
Summary
The emulation equation describes something fundamental to every emulator: play a frame, get input, repeat.
Interestingly, this fundamental concept was not assumed a priori; it emerged as a model while working through the algorithm.
Surprising facts also appear: A0 is empty, so the first frame is played with all buttons unpressed. On deeper inspection, this is because a Q-learner gets no value from an Action without a prior State observation.
Next, we present the General Theory of Emulation, which expands on these fundamentals to include two new concepts needed in the Q-learning algorithm.
The General Theory of Emulation
The General Theory of Emulation extends the emulation equation so that it can be used for Q-learning.
Note: I also wanted to choose strategies for my Q-learners, such as "walk up" or "reach level 2", so I extended Q-learning to depend on a Policy variable in the time series. When the strategy is the identity function (no strategy), this extended learning algorithm reduces to ordinary Q-learning.
Emulation variables
Reinforcement learning adds two variables:
Reward is used to train the function approximator that infers an action from the observed state. The reward can come from sniffing RAM (as the achievements at http://retroachievements.org do) or from reading a value out of video memory with OCR; a sketch of the RAM approach appears after this list.
Policy is used by the agent to choose the next move. The goal of reinforcement learning is to choose the policy that maximizes the reward.
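For illustration, a minimal sketch of the RAM-sniffing idea; the address, width, and encoding are hypothetical and game-specific, and state["ram"] is assumed to expose the emulated RAM as a bytes-like buffer:

```python
SCORE_ADDRESS = 0x00F0  # hypothetical location of the player's score in RAM

def get_reward(prev_reward, state, policy=None, action=None):
    """A RAM-sniffing GetReward(): the reward is the score gained this frame."""
    # The address, width, and little-endian encoding below are game-specific guesses;
    # policy and action are accepted but unused in this simple example.
    score = int.from_bytes(state["ram"][SCORE_ADDRESS:SCORE_ADDRESS + 2], "little")
    prev_score = prev_reward["score"] if prev_reward else 0
    return {"score": score, "value": score - prev_score}
```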
Time series
Emulation occurs at discrete time steps, so every time step t has its own instance of these variables: a state S_t, a reward R_t, a policy π_t, and an action A_t.
The emulation history is therefore a time series of tuples containing these four variables:
(S0, R0, π0, A0, S1, R1, π1, A1, ...)
Emulation model
Reinforcement learning also needs two more functions:
The GetReward() function takes the previous reward, along with the most recent values of the other variables, and produces a new reward
The Strategize() function takes the previous policy, along with the most recent values of the other variables, and produces a new policy
Emulation equation
The emulation equation is a time series model consisting of the initial conditions, as well as the model used for each time step.
The initial condition of all four variables is the empty set: S_0 = R_0 = π_0 = A_0 = ∅.
The variables then evolve by applying the functions in sequence: PlayFrame(), then GetReward(), then Strategize(), then GetInput(), each producing the next value of its variable from its previous value and the most recent values of the others.
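A minimal sketch of the extended loop follows; the exact argument lists are only my reading of "the most recent values of the other variables" above, so treat them as an assumption rather than a specification:

```python
def run_emulation_rl(play_frame, get_reward, strategize, get_input, num_steps):
    """Unroll the general emulation equation: (S0, R0, pi0, A0, S1, R1, pi1, A1, ...)."""
    state = reward = policy = action = None   # every variable starts as the empty set
    history = [state, reward, policy, action]
    for _ in range(num_steps):
        state = play_frame(state, action)                    # new state from PlayFrame()
        reward = get_reward(reward, state, policy, action)   # new reward from GetReward()
        policy = strategize(policy, state, reward, action)   # new policy from Strategize()
        action = get_input(action, state, reward, policy)    # new action from GetInput()
        history += [state, reward, policy, action]
    return history
```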
Right. So far, the math in this issue just describes the data we need to gather to make this happen. Then it can be uploaded to the cloud for training, and depending on the state of embedded TensorFlow, inference can be run locally, or in the cloud if we get netplay support.