Introduction
Here, I introduce the Emulation Equation, a game-theoretic equation for emulation.
The equation represents all emulation, enables powerful features, and turns all gameplay (human or otherwise) into data for training an artificially intelligent game-playing agent.
For demonstration purposes, I've split the explanation into two theories:
The Special Theory of Emulation explains the fundamentals, and presents a simplified emulation equation. Using this, all emulation and many powerful features are possible.
The General Theory of Emulation uses the same fundamentals, and presents an extended emulation equation. Using this, all gameplay (human or otherwise) becomes data for training a reinforcement learner.
For the physics nerds, this is analogous to how Einstein introduced relativity:
In 1905, the Special Theory of Relativity explained moving bodies without gravity
In 1915, the General Theory of Relativity explained moving bodies in the presence of gravity
Background
Reinforcement learning is popular in game-playing AI because the reward signal is often sparse (not evident every frame) and depends on actions taken much earlier in the game.
The Q-learning algorithm is well-suited for teaching reinforcement learners how to play a video game because it does not require a model of the environment, which would be difficult to build even for the most basic computing machine.
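Recall the algorithm from Q-learning: its core is the standard one-step update rule (reproduced here for reference), where α is the learning rate and γ is the discount factor:

Q(S_t, A_t) ← Q(S_t, A_t) + α [ R_{t+1} + γ max_a Q(S_{t+1}, a) - Q(S_t, A_t) ]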
A full explanation of Q-learning is beyond the scope of this documentation. Just know that it learns over "emulation equations", which describe a series of frames in the discrete time domain.
Here I present two theories of emulation. The Special Theory of Emulation gives the smallest equation needed for emulation. The General Theory of Emulation expands on that equation, allowing it to be used for Q-learning.
Special Theory of Emulation
The Special Theory of Emulation presents the smallest equation (the emulation equation) needed to represent all emulation.
Emulation variables
Game-theoretic emulation uses two variables:
State (S) consists of video, audio, and memory regions (RAM, SRAM, real-time clock).
Action (A) is the combined state of all input devices.
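For illustration only, here is one way the two variables might be laid out in code; the field names and types are my own assumptions, not part of the equation:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class State:
    video: bytes = b""  # framebuffer contents for the frame
    audio: bytes = b""  # audio samples for the frame
    ram: bytes = b""    # main memory
    sram: bytes = b""   # save RAM
    rtc: int = 0        # real-time clock value

@dataclass
class Action:
    # Combined state of all input devices, e.g. {"A": False, "Up": True}
    buttons: Dict[str, bool] = field(default_factory=dict)
```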
Time series
Emulation occurs at discrete time steps, so every time step t has its own instance of these variables: a state S_t and an action A_t.
The emulation history is therefore a time series of tuples containing these two variables:
(S0, A0, S1, A1, ...)
Emulation model
Time steps occur by applying a set of functions to the emulation variables:
The PlayFrame() function takes the previous state, along with the most recent action, and produces a new state
The GetInput() function takes the previous action, along with the most recent state, and produces a new action
Emulation equation
The emulation equation is a time series model consisting of the initial conditions, as well as the model used for each time step.
The initial condition of both variables is the empty set: S_0 = ∅ and A_0 = ∅.
The variables then evolve by applying the functions in sequence: S_{t+1} = PlayFrame(S_t, A_t), followed by A_{t+1} = GetInput(A_t, S_{t+1}).
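To make the loop concrete, here is a minimal sketch of the special emulation equation; PlayFrame() and GetInput() are passed in as callables, and None stands in for the empty set:

```python
def run_emulation(play_frame, get_input, num_steps):
    """Unroll the special emulation equation and return (S0, A0, S1, A1, ...)."""
    state, action = None, None               # S0 = A0 = empty set
    history = [state, action]
    for _ in range(num_steps):
        state = play_frame(state, action)    # S_{t+1} = PlayFrame(S_t, A_t)
        action = get_input(action, state)    # A_{t+1} = GetInput(A_t, S_{t+1})
        history += [state, action]
    return history
```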
Summary
The emulation equation describes something fundamental to every emulator: play a frame, get input, repeat.
Interestingly, this fundamental concept was not assumed a priori; it emerged as a model while working through the algorithm.
Surprising facts also appear: A0 is empty, so the first frame is played with all buttons unpressed. On deeper inspection, this is because a Q-learner gets no value from an Action without a prior State observation.
Next, we present the General Theory of Emulation, which expands on these fundamentals to include two new concepts needed in the Q-learning algorithm.
The General Theory of Emulation
The General Theory of Emulation extends the emulation equation so that it can be used for Q-learning.
Note: I also wanted to choose strategies for my Q-learners, such as "walk up" or "reach level 2", so I extended Q-learning to depend on a Policy variable in the time series. When the strategy is the identity function (no strategy), this extended learning algorithm reduces to ordinary Q-learning.
Emulation variables
Reinforcement learning adds two variables:
Reward is used to train the function approximator that infers an action from the observed state. The reward can come from sniffing RAM (as the achievements at http://retroachievements.org do) or from reading a value out of video memory with OCR; a sketch of the RAM approach appears after this list.
Policy is used by the agent to choose the next move. The goal of reinforcement learning is to choose the policy that maximizes the reward.
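For illustration, a minimal sketch of the RAM-sniffing idea; the address, width, and encoding are hypothetical and game-specific, and state["ram"] is assumed to expose the emulated RAM as a bytes-like buffer:

```python
SCORE_ADDRESS = 0x00F0  # hypothetical location of the player's score in RAM

def get_reward(prev_reward, state, policy=None, action=None):
    """A RAM-sniffing GetReward(): the reward is the score gained this frame."""
    # The address, width, and little-endian encoding below are game-specific guesses;
    # policy and action are accepted but unused in this simple example.
    score = int.from_bytes(state["ram"][SCORE_ADDRESS:SCORE_ADDRESS + 2], "little")
    prev_score = prev_reward["score"] if prev_reward else 0
    return {"score": score, "value": score - prev_score}
```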
Time series
Emulation occurs at discrete time steps, so every time step t has its own instance of these variables: a state S_t, a reward R_t, a policy π_t, and an action A_t.
The emulation history is therefore a time series of tuples containing these four variables:
(S0, R0, π0, A0, S1, R1, π1, A1, ...)
Emulation model
Reinforcement learning also needs two more functions:
The GetReward() function takes the previous reward, along with the most recent values of the other variables, and produces a new reward
The Strategize() function takes the previous policy, along with the most recent values of the other variables, and produces a new policy
Emulation equation
The emulation equation is a time series model consisting of the initial conditions, as well as the model used for each time step.
The initial condition of all four variables is the empty set: S_0 = R_0 = π_0 = A_0 = ∅.
The variables then evolve by applying the functions in sequence: PlayFrame(), then GetReward(), then Strategize(), then GetInput(), each producing the next value of its variable from its previous value and the most recent values of the others.
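A minimal sketch of the extended loop follows; the exact argument lists are only my reading of "the most recent values of the other variables" above, so treat them as an assumption rather than a specification:

```python
def run_emulation_rl(play_frame, get_reward, strategize, get_input, num_steps):
    """Unroll the general emulation equation: (S0, R0, pi0, A0, S1, R1, pi1, A1, ...)."""
    state = reward = policy = action = None   # every variable starts as the empty set
    history = [state, reward, policy, action]
    for _ in range(num_steps):
        state = play_frame(state, action)                    # new state from PlayFrame()
        reward = get_reward(reward, state, policy, action)   # new reward from GetReward()
        policy = strategize(policy, state, reward, action)   # new policy from Strategize()
        action = get_input(action, state, reward, policy)    # new action from GetInput()
        history += [state, reward, policy, action]
    return history
```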
Right. So far, the math in this issue just describes the data we need to gather to make this happen. Then it can be uploaded to the cloud for training, and depending on the state of embedded TensorFlow, inference can be run locally, or in the cloud if we get netplay support.