From 461a0f6374f06402efe8a82093939e075d1bed79 Mon Sep 17 00:00:00 2001 From: Diego Ferigo Date: Sat, 10 Dec 2022 11:53:40 +0100 Subject: [PATCH 1/5] Address internal feedback --- Chapters/Part_1/chapter_1.tex | 39 ++++----- Chapters/Part_1/chapter_2.tex | 155 ++++++++++++++++++++++------------ Chapters/Part_1/chapter_3.tex | 68 ++++++++------- Chapters/Part_1/chapter_4.tex | 69 +++++++-------- Chapters/Part_2/chapter_5.tex | 16 ++-- Chapters/Part_2/chapter_6.tex | 24 ++++-- Chapters/Part_2/chapter_7.tex | 39 ++++----- Chapters/Part_2/chapter_8.tex | 97 ++++++++++----------- Chapters/epilogue.tex | 44 +++++----- Chapters/prologue.tex | 20 ++--- FrontBackmatter/abstract.tex | 2 +- FrontBackmatter/contents.tex | 3 + 12 files changed, 323 insertions(+), 253 deletions(-) diff --git a/Chapters/Part_1/chapter_1.tex b/Chapters/Part_1/chapter_1.tex index 0012a7f..8dad428 100644 --- a/Chapters/Part_1/chapter_1.tex +++ b/Chapters/Part_1/chapter_1.tex @@ -11,8 +11,8 @@ \section{Simulators for robotics} In the robotics domain, rigid-body simulators became a standard component included in any practitioner's toolbox. Simulators allow studying the properties of robotics systems starting from their early design stage, anticipating possible errors that could lead to delays and failures, together with their corresponding cost. -Limiting the domain to robot control, simulations became an important technology enabler thanks to the possibility of designing and prototyping algorithms on a model that captures the principal dynamics of the real system. -Other important benefits include the execution in a controlled and repeatable environment, not subject to tear and wear, and not being vulnerable to costly damage in case of design failures. +Restricting the domain to robot control, simulations became an important technology enabler thanks to the possibility of designing and prototyping algorithms on a model that captures the principal dynamics of the real system. +Other important benefits include the execution in a controlled and repeatable environment, not subject to wear and tear, and not being vulnerable to costly damage in case of design failures. This section describes the high-level software architecture of a robot simulator. We introduce and describe the main components existing simulators typically implement, and then identify a list of properties that could be used as a means of comparison. @@ -27,8 +27,8 @@ \subsection{Components} Accordingly to the robot modelling provided in the previous chapter, a robot can be described as a set of \emph{links} (or bodies) interconnected by a set of \emph{joints}, that apply motion constraints and can provide actuation. Simulating a robot requires knowledge of its \emph{kinematics}, encoding the topology and geometry of the links and joints, and its \emph{dynamics}, encoding their physical properties. This information is usually provided in textual form by a structured \emph{robot description}. -Examples of common descriptions are the \ac{URDF}, \ac{SDF}, and \ac{MJCF}. -These descriptions also typically include additional information about the robot, for example, how its visual appearance is rendered, how it collides with either itself or other simulated robots, the type and location of the on-board sensors, the joint actuation type and limits, \etc +Examples of common descriptions are the \ac{URDF}, \ac{SDF}, \ac{MJCF}, \ac{USD}. 
+These descriptions also typically include additional information about the robot, for example, how its visual appearance is rendered, how it collides with either itself or other simulated objects, the type and location of the on-board sensors, the joint actuation type and limits, \etc
 
 Simulators typically support one or more of these description formats, allowing them to import robots and create scenes where they can operate.
 The \emph{description parser} is the component that reads the supported textual descriptions and imports the robot properties into the simulation.
@@ -37,9 +37,9 @@ \subsection{Components}
 
 The central component of a simulator is its \emph{physics engine}, responsible for implementing the behaviours governed by the physics laws of motion.
 It uses the information parsed from the robot description to predict how the dynamics evolve over time.
 Depending on how the joint constraints are simulated, we can categorise physics engines in \emph{maximal coordinates} and \emph{reduced coordinates}.
-Using maximal coordinates, each simulated body is treated separately in the Cartesian space, and the overall robot's dynamics is computed by solving an optimisation problem that applies explicit constraints to enforce the effects of the joints.
+Using maximal coordinates, each simulated body is treated separately in the Cartesian space, and the overall robot's dynamics is computed by solving an optimisation problem that applies explicit constraints to enforce the effects of the joints on the kinematics.
 Instead, using reduced coordinates, the system dynamics considers the mechanical structure as a whole and it implicitly enforces motion constraints induced by the joints.
-The physics engine usually also includes routines of \emph{collision detection} that, exploiting geometrical properties of the robot's collisions shapes, allow to assess if bodies are in contact, and \emph{contact models}, that compute interaction forces between colliding bodies.
+The physics engine usually also includes routines of \emph{collision detection} that, exploiting the geometrical properties of the links' collision shapes, allow assessing whether bodies are in contact, and \emph{contact models}, that compute interaction forces between colliding bodies.
 A general-purpose simulator either implements or interfaces with at least one physics engine, and it is not uncommon to find simulators exposing multiple physics engines.
 
 \paragraph{Public APIs}
@@ -51,8 +51,8 @@ \subsection{Components}
 
 \paragraph{Sensors}
 
-The most advanced simulators include the possibility to generate data from virtual sensors that mimic the behaviour of those mounted on the real robot, like \acp{IMU}, cameras, force-torque sensors, \etc
-Often, in order to reduce differences with the actual setup, they also allow adding noise to sensor readings.
+The most advanced simulators include the possibility to generate data from virtual sensors that mimic the behaviour of those mounted on the real robot, like \acp{IMU}, cameras, \ac{F/T} sensors, \etc
+Often, in order to reduce differences with the actual setup, they might allow injecting noise into sensor readings.
 
 \paragraph{Rendering}
 
@@ -79,13 +79,13 @@ \subsection{Properties}
 
 \paragraph{Reproducibility}
 
-A simulator is reproducible if different executions of a scene starting from the same initial state and applying the same inputs yield exactly the same final state.
+A simulator is reproducible if consecutive executions of a scene starting from the same initial state and applying the same inputs yield exactly the same trajectory and final state.
 
 The main component that typically undermines reproducibility is a subtle consequence of the client-server architecture widely used by many simulators.
-Often, the physics engine and the user code that reads and writes values reside on different threads or processes.
+Often, the physics engine and the user code that reads data and sends commands reside on different threads or processes.
 The communication between them relies on sockets whose processes, depending on the system's load and the operating system's scheduler, can be preempted.
 Without complex synchronisation protocols, the user code might think to have stepped the simulator and read the most recent measurement even if the data might have been buffered.
 
-Therefore, even if the underlying physics engines, when called programmatically, would provide reproducible trajectories, simulators using socket-based protocols could lose their reproducibility.
+Therefore, even if the underlying physics engines provide reproducible trajectories when called programmatically, simulators exposing their functionality through socket-based protocols could compromise this reproducibility.
 
 \paragraph{Parallel Simulation}
 
@@ -97,7 +97,7 @@ \subsection{Properties}
 The ratio between real and simulated time is known as \ac{RTF}.
 A \ac{RTF} of 1 means that one simulated second is executed in $1$ real-world second, and a \ac{RTF} of 2 means that the same simulated second is executed in $0.5$ real-world seconds.
 
-Considering the typical usage for traditional robotic applications, many simulators aim to achieve a \ac{RTF} of 1 then executed with all their features, rendering included.
+Considering the typical usage for traditional robotic applications, many simulators aim to achieve a \ac{RTF} of 1 when executed with all their features, rendering included.
 Usually, the \ac{RTF} value defaults to 1 even when the simulation could run faster, and it is a configurable simulation parameter.
 
 \paragraph{Headless Simulation}
 
@@ -107,6 +107,7 @@ \subsection{Properties}
 The physics can be executed in both settings regardless.
 However, not all simulators can render the scene on a headless setup when rendering is involved.
 
+\pagebreak
 \section{Enabling technologies}
 
 In this section, we describe the technology enablers that made developing the experiments presented in this thesis possible.
 
@@ -116,15 +117,15 @@ \subsection{Gazebo Sim}
 
 Gazebo~\parencite{koenig_design_2004}, developed by Open Robotics, is among the most used and widely adopted simulators by the robotics community.
 It interfaces with multiple physics engines like ODE\footnote{\url{https://www.ode.org}}, bullet~\parencite{coumans_pybullet_2016}, and DART~\parencite{lee_dart_2018}.
 It supports loading descriptions of robots defined either with the \ac{URDF} or the \ac{SDF}.
-It also supports the simulation of a wide range of commonly used sensors like \acp{IMU}, cameras, Force/Torque sensors, \etc
+It also supports the simulation of a wide range of commonly used sensors like \acp{IMU}, cameras, \ac{F/T} sensors, \etc
 
-In the Artificial Mechanical laboratory, we always based our simulation stack on Gazebo.
-Over the years, we developed the model description of our robots and built an entire infrastructure for this simulator.
+In the Artificial Mechanical Intelligence\footnote{\url{https://ami.iit.it/}} laboratory, we have always based our simulation stack on Gazebo. +Over the years, we developed the model description of our robots\footnote{\url{https://github.com/robotology/icub-models}} and built an entire infrastructure\footnote{\url{https://github.com/robotology/gazebo-yarp-plugins}} for this simulator. However, while previous attempts\footnote{\url{http://wiki.ros.org/openai_ros}}~\parencite{zamora_extending_2017,lopez_gym-gazebo2_2019} tried to integrate Gazebo into a \ac{RL} training pipeline, the performance that could be obtained together with the need to execute it in a separated process, always prevented its wide adoption by the robot learning community. After more than 15 years of development, Open Robotics started the development of a new simulator, representing the next generation of Gazebo, that from now we will call \emph{Gazebo Classic} for the sake of clarity. The new simulator, initially known as Ignition Gazebo and later rebranded as \emph{Gazebo Sim}, is a modular suite of libraries partially extracted from the monolithic architecture of its predecessor. -Gazebo Sim, contrarily to its old generation, offers programmatic \acp{API} to instantiate and step the simulator, enabling users to obtain a finer control of the simulation cycle. +Gazebo Sim, contrarily to its predecessor, offers programmatic \acp{API} to instantiate and step the simulator, enabling users to obtain a finer control of the simulation cycle. One of the simulation architectures presented in this thesis is based on Gazebo Sim. A more detailed overview of the features that motivated the adoption of the simulator and why they represent a valid choice for the contributed architecture will be discussed in more detail in Section~\ref{sec:scenario_gazebo}. @@ -141,7 +142,7 @@ \subsection{The iCub humanoid robot} For the work presented in this thesis, only details about its description for simulation purposes are necessary to be specified. The kinematics, the inertial parameters, and the visual meshes of the robot are automatically generated\footnote{\url{https://github.com/robotology/icub-models-generator}} from its CAD design. Their accuracy has always been sufficient for developing highly-dynamic balancing controllers~\parencite{pucci_highly_2016} and walking algorithms~\parencite{dafarra_control_2018}. -The official \ac{URDF} models\footnote{\url{https://github.com/robotology/icub-models/}} have been slighlty updated\footnote{\url{https://github.com/ami-iit/gym-ignition-models}} to be imported in Gazebo Sim. +The official \ac{URDF} models\footnote{\url{https://github.com/robotology/icub-models/}} have been slightly updated\footnote{\url{https://github.com/ami-iit/gym-ignition-models}} to be imported in Gazebo Sim. \subsection{JAX} @@ -162,7 +163,7 @@ \subsection{JAX} \item Any logic operating on \jax arrays can be parallelized on the target hardware accelerator thanks to the native support of \emph{auto-vectorization}. This feature enables the development of algorithms that can be seamlessly scaled horizontally by just providing inputs with an additional dimension representing the batch axis. -To maximize performance, vectorized code can also be combined with \ac{JIT} compilation. +To maximise performance, vectorized code can also be combined with \ac{JIT} compilation. \item Any logic operating on \jax arrays can be automatically differentiated using either forward or reverse mode~\parencite{blondel_efficient_2022}. 
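To make the three \jax transformations listed above concrete, the following minimal sketch (illustrative only, not part of the thesis software) composes \texttt{grad}, \texttt{vmap}, and \texttt{jit} on a toy scalar function; the function and the shapes are arbitrary placeholders.

\begin{verbatim}
# Illustrative sketch (not thesis code): composing the three JAX
# transformations on a toy scalar function with arbitrary shapes.
import jax
import jax.numpy as jnp

def energy(q: jnp.ndarray) -> jnp.ndarray:
    # Placeholder for any logic operating on JAX arrays.
    return 0.5 * jnp.sum(q ** 2) + jnp.sum(jnp.sin(q))

# Reverse-mode automatic differentiation of the scalar function.
d_energy = jax.grad(energy)

# Auto-vectorize over a leading batch axis, then JIT-compile for XLA.
d_energy_batched = jax.jit(jax.vmap(d_energy))

batch = jnp.ones((32, 7))              # e.g. 32 samples of a 7-dimensional vector
print(d_energy_batched(batch).shape)   # -> (32, 7)
\end{verbatim}

The extra leading dimension handled by \texttt{jax.vmap} is precisely the batch axis mentioned above that allows scaling the same logic horizontally.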
@@ -173,4 +174,4 @@ \subsection{JAX} \noindent At the time of writing, the most advanced \ac{XLA} backends are those targeting \acp{TPU} and \acp{GPU}. -The \ac{JIT} compilation of code for \acp{CPU}, depending on the complexity of the logic, could take a long time. +The \ac{JIT} compilation of code for \acp{CPU}, depending on the complexity of the logic, could take a long time as it can use just one compilation thread. diff --git a/Chapters/Part_1/chapter_2.tex b/Chapters/Part_1/chapter_2.tex index c91ca83..a9d165e 100644 --- a/Chapters/Part_1/chapter_2.tex +++ b/Chapters/Part_1/chapter_2.tex @@ -13,7 +13,7 @@ \chapter{Robot Modelling} The purpose of the resulting floating-base model is twofold. Firstly, assuming we know the kinematics and dynamics properties of a real robot, it allows us to compute many relevant quantities required by control algorithms. Secondly, it enables the definition of a dynamic system that can be used to create and perform simulations of real robots operating in an environment. -We also provide the necessary to develop widely used \ac{RBDA} that efficiently compute relevant quantities and the system's dynamics. +We also provide the necessary notions to develop widely used \ac{RBDA} that efficiently compute relevant quantities of the system's dynamics. We adopt the unified view of the \acp{EoM} proposed by \textcite{traversaro_unied_2017}, slightly adapted to employ the notation summarised in~\textcite{traversaro_multibody_2019}. Minor modifications are introduced to suit the simulation setting better. @@ -24,7 +24,7 @@ \section{Notation} \item The set of real numbers is denoted by $\mathbb{R}$. -Let $\mathbf{u}$ and $\mathbf{v}$ be two n-dimensional column vectors of real numbers, \ie $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$, then their inner product is denoted as $\mathbf{u}^\top \mathbf{u}$, with ${(\cdot)}^\top$ the transpose operator. +Let $\mathbf{u}$ and $\mathbf{v}$ be two n-dimensional column vectors of real numbers, \ie $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$, then their inner product is denoted as $\mathbf{u}^\top \mathbf{u}$, where ${(\cdot)}^\top$ is the transpose operator. \item The identity matrix of size $n$ is denoted by $\eye_n \in \realn^{n \times n}$; the zero column vector of dimension $n$ is denoted by $\zeros_n \in \realn^n$; the zero matrix of dimension $n \times m$ is denoted by $\zeros_{n \times m} \in \realn^{n \times m}$; the all-ones matrix of dimension $n \times m$ is denoted by $\ones_{n \times m} \in \realn^{n \times m}$. @@ -76,6 +76,20 @@ \section{Notation} % The vee operator is the inverse of the hat operator. +\item Given two 3D vectors $\mathbf{a}, \mathbf{b} \in \realn^3$, the hat operator can be used to compute their cross product: +% +\begin{equation*} + \mathbf{a} \times \mathbf{b} = \mathbf{a}^ \wedge \, \mathbf{b} = - \mathbf{b}^\wedge \, \mathbf{a} + . +\end{equation*} + +\item Given a vector $\mathbf{w} \in \realn^3$ and a matrix $\rot \in \SO(3)$, the following property of the hat operator holds: +% +\begin{equation*} + \left( \rot \mathbf{w} \right)^\wedge = \rot \mathbf{w}^\wedge \, \rot^\top + . +\end{equation*} + \item Given a vector $\velsix = (\vellin, \velang) \in \realn^6$ with $\vellin, \velang \in \realn^3$, we define % @@ -98,6 +112,7 @@ \section{Notation} % \end{itemize} +\pagebreak \section{Points, Frames, Rotations, Transformations} Let's consider a \emph{point} $\pos$. 
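As a numerical sanity check of the two hat-operator identities added to the notation above, the following illustrative sketch verifies them on random vectors; the \texttt{hat} helper and the Rodrigues-formula rotation are assumptions of this example, not thesis code.

\begin{verbatim}
# Numerical check of: a x b = a^ b = -b^ a   and   (R w)^ = R w^ R^T.
import numpy as np

def hat(v):
    # Skew-symmetric matrix such that hat(v) @ u == np.cross(v, u).
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

a, b, w = np.random.randn(3, 3)

assert np.allclose(np.cross(a, b), hat(a) @ b)
assert np.allclose(np.cross(a, b), -hat(b) @ a)

# Rotation about the z axis built with the Rodrigues formula.
k, theta = np.array([0.0, 0.0, 1.0]), 0.3
R = np.eye(3) + np.sin(theta) * hat(k) + (1 - np.cos(theta)) * hat(k) @ hat(k)

assert np.allclose(hat(R @ w), R @ hat(w) @ R.T)
\end{verbatim}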
@@ -209,15 +224,18 @@ \section{Frame Velocity} , \end{equation*} % -where we have introduced the left $S_l = \rot^\top \rotd \in \so(3)$ and the right $S_r = \rotd \rot^\top \in \so(3)$. +where we have introduced the left $S_l = \rot^\top \rotd \in \so(3)$ and the right\linebreak $S_r = \rotd \rot^\top \in \so(3)$. Since both matrices are skew-symmetric, they can be parameterized by a vector $S^\vee = \velang_{A,B} \in \realn^3$. -The matrix product is not commutative, therefore the vectors for the left and right cases must be different: +The matrix product is not commutative, therefore the vectors of the left and right cases must be different: % -\begin{align*} +\begin{align} +\begin{split} + \label{eq:R_dot} \rotd &= \rot S_l = \rot \velang[B]_{A,B}^\wedge ,\\ \rotd &= S_r \rot = \velang[A]_{A,B}^\wedge \rot , -\end{align*} +\end{split} +\end{align} % where we have introduced the angular velocity $\velang_{A,B}$ between frames $A$ and $B$, expressed either in $A$ or $B$ coordinates. % @@ -260,7 +278,7 @@ \subsection{Left-trivialized velocity} \end{bmatrix} = \begin{bmatrix} \rot[A]_B^\top \orid[A]_B \\ - \rot[A]_B^\top \rotd[A]_B + \left( \rot[A]_B^\top \rotd[A]_B \right)^\vee \end{bmatrix} \in \realn^6 , @@ -314,7 +332,7 @@ \subsection{Right-trivialized velocity} \end{bmatrix} = \begin{bmatrix} \orid[A]_B - \rotd[A]_B \rot[A]_B^\top \ori[A]_B \\ - \rotd[A]_B \rot[A]_B^\top + \left( \rotd[A]_B \rot[A]_B^\top \right)^\vee \end{bmatrix} \in \realn^6 , @@ -333,7 +351,9 @@ \subsection{Right-trivialized velocity} % \begin{equation} \label{eq:vel_lin_inertial_fixed} - \vellin[A]_{A,B} = \orid[A]_B + \ori[A]_B^\wedge \velang[A]_{A,B} + \vellin[A]_{A,B} = + \orid[A]_B + \ori[A]_B^\wedge \velang[A]_{A,B} = + \orid[A]_B + \ori[A]_B \times \velang[A]_{A,B} , \end{equation} % @@ -363,19 +383,23 @@ \subsection{Expressing 6D velocities in different frames} . \end{equation} % +% Proof: +% https://www.wikiwand.com/en/Block_matrix#/Block_matrix_inversion +% https://math.stackexchange.com/a/1353960 +% Recalling that $\ori[A]_B = -\rot[A]_B \ori[B]_A$, it can be shown that the inverse velocity transformation is simply $\veladj[B]_A = \veladj[A]_B^{-1}$. \subsection{Mixed velocity representation} -We have seen that the angular component of the left- and right-trivialized velocities correspond to the classic concept of angular velocity, as introduced in Definition~\ref{definition:R_dot}. +We have seen that the angular component of the left- and right-trivialized velocities correspond to the classic concept of \emph{angular} velocity, as introduced in Definition~\ref{definition:R_dot}. The different representations only relate the angular velocity $\velang_{A,B}$ to reference frames, which could either be $A$ or $B$. -The linear velocity, instead, is less intuitive. -While it corresponds to the time derivative of $\ori_B$ in the left-trivialized representation, the expression in the right-trivialization includes an additional term as reported in Equation~\eqref{eq:vel_lin_inertial_fixed}. +The \emph{linear} velocity, instead, is less intuitive. +While it corresponds to the time derivative of $\ori_B$ in the left-trivialized representation, the expression in the right-trivialization includes an additional term as reported in Equation~\eqref{eq:vel_lin_inertial_fixed} to account for non-inertial effects. There are situations in which we desire to express the 6D velocity of a frame with just the time derivatives $\orid[A]_B$ and $\velang[A]_{A,B}$. 
We can obtain such special 6D velocity by introducing a new frame $B[A] = (\ori_B, [A])$, that is a frame whose origin coincides with the origin of frame $B$, with the orientation frame of $A$. -In this frame, we can define: +In this frame, we can define the \emph{mixed} velocity as follows: % \begin{equation} \label{eq:mixed_velocity} @@ -393,8 +417,6 @@ \subsection{Mixed velocity representation} \end{bmatrix} . \end{equation} -% -that will be referred as \emph{mixed} velocity. \subsection{Cross product on $\realn^6$} @@ -449,8 +471,8 @@ \subsection{Accelerations} , \end{equation} % -where it can be noticed that in this case $\velsixd[C]_{A,B} \neq \transvel[C]_B \velsixd[B]_{A,B}$. -However, either if $C=A$ or $C=B$, the cross-product term is zero, and the equality holds. +where it can be noticed that in this case the expected mnemonic-friendly form $\velsixd[C]_{A,B} = \transvel[C]_B \velsixd[B]_{A,B}$ does \emph{not} hold. +However, if either $C=A$ or $C=B$, the cross-product term is zero, and the equality holds. Under these conditions, we can define the following acceleration: % \begin{equation*} @@ -479,7 +501,7 @@ \subsection{Accelerations} \subsection{Forces} -Let's consider a 6D force $\forcesix$ composed by stacking a linear a linear force $\forcelin \in \realn^3$ and an angular force $\forceang \in \realn^3$. +Let's consider a 6D force $\forcesix$ composed by stacking a linear force $\forcelin \in \realn^3$ and a torque $\forceang \in \realn^3$. Its coordinates \wrtl a frame $B$ are denoted as: % \begin{equation*} @@ -532,10 +554,10 @@ \subsection{Forces} . \end{equation*} % -It can be noted that the application of $\forcesix[B]$ in a different frame $A$, beyond rotating its components $(\forcelin[B], \forceang[B])$ with $\rot[A]_B$, requires the introduction of an additional angular term proportional to $\forcelin[B]$. +It can be notices that the application of $\forcesix[B]$ to a different frame $A$, beyond rotating its components $(\forcelin[B], \forceang[B])$ with $\rot[A]_B$, requires the introduction of an additional angular term proportional to $\forcelin[B]$. This behaviour can be explained considering a simplified case of applying a pure linear force $\forcesix[B] = (\forcelin[B], \zeros_3)$ to the origin of frame $A$. Changing the application point would produce a torque due to the moment arm between the origins of frames $A$ and $B$. -Since the transformation should not alter the physical effect, $\transfor[A]^B$ introduces the additional term $\ori[A]_B^\wedge \left( \rot[A]_B \forcelin[B] \right)$ that compensates the moment arm. +Since the transformation should not alter the physical effect (for example, a resulting acceleration of the frame), $\transfor[A]^B$ introduces the additional term $\ori[A]_B^\wedge \left( \rot[A]_B \forcelin[B] \right)$ that compensates the moment arm. % \end{remark*} @@ -546,7 +568,7 @@ \subsection{Forces} , \end{equation*} % -where the last term is the matrix representation of the cross-product of $\realn^6$, defined as: +where the last term is the matrix representation of the cross-product in $\realn^6$, defined as: % \begin{equation*} \crossforsix[{\velsix[B]_{A,B}}] = @@ -608,6 +630,9 @@ \subsection{Inertial parameters} In this section, we introduce all the inertial parameters necessary to describe the dynamics of a rigid body. Given a body $B$, in order to simplify the notation, we denote the coordinates of a point $\pos_i$ belonging to the body and expressed in $B$ as $\mathbf{r} = \pos[B]_i$. 
+\begin{itemize} +% +\item The \emph{total mass} of the rigid body can be calculated by introducing the function $\rho(\cdot): \realn^3 \mapsto \realn^+$ that maps each point of the body to its density, and integrating over the volume occupied by the body: % \begin{equation*} @@ -616,6 +641,7 @@ \subsection{Inertial parameters} . \end{equation*} +\item The \ac{CoM} of the body can be calculated as the average point of its density: % \begin{equation} @@ -627,15 +653,29 @@ \subsection{Inertial parameters} . \end{equation} +\item The \emph{inertia tensor} of the body, describing all moments of inertia of a body rotating around a specific axis, can be computed as: % \begin{equation*} - I = -\iiint_{\realn^3} \rho(\mathbf{r}) (\mathbf{r}^\wedge)^2 \dd{\mathbf{r}} + I = -\iiint_{\realn^3} \rho(\mathbf{r}) \left( \mathbf{r}^\wedge \right)^2 \dd{\mathbf{r}} \in \realn^{3\times3} - . + , \end{equation*} +% +resulting from the following computation of the body \emph{angular momentum} $\mathbf{h}^\omega \in \realn^3$: +% +\begin{align*} + \mathbf{h}^\omega = I \velang + &= \iiint_{\realn^3} \rho(\mathbf{r}) \left( \mathbf{r} \times \vellin \right) \dd{\mathbf{r}} + = \iiint_{\realn^3} \rho(\mathbf{r}) \left( \mathbf{r} \times \velang \times \mathbf{r} \right) \dd{\mathbf{r}} \\ + &= \iiint_{\realn^3} \rho(\mathbf{r}) \, \mathbf{r}^\wedge \left( \velang^\wedge \mathbf{r} \right) \dd{\mathbf{r}} + = -\iiint_{\realn^3} \rho(\mathbf{r}) \, \mathbf{r}^\wedge \left( \mathbf{r}^\wedge \velang \right) \dd{\mathbf{r}} \\ + &= -\iiint_{\realn^3} \rho(\mathbf{r}) \left( \mathbf{r}^\wedge \right)^2 \dd{\mathbf{r}} \, \velang + . +\end{align*} -Finally, the \emph{6D inertia matrix} of the body that unifies all the previous inertial properties can be defined as follows: +\item +The \emph{6D inertia matrix} of the body that unifies all the previous inertial properties can be defined as follows: % \begin{equation*} \masssix = @@ -647,8 +687,10 @@ \subsection{Inertial parameters} . \end{equation*} +\end{itemize} + \begin{remark*} - When we need to denote multiple inertia matrices, we use a subscript $\masssix_{B1}, \masssix_{B2},$ \etc + When we need to denote inertia matrices of different bodies, we use subscripts $\masssix_{B1}, \masssix_{B2},$ \etc Note that the 6D inertia matrix definition is valid only in body frame, and we should have specified it with an additional prescript $\masssix[B]_B$. In this thesis, we only need 6D inertia matrices expressed in body frames, therefore we will always omit the prescript. \end{remark*} @@ -719,12 +761,12 @@ \subsection{Equations of Motion} , \end{equation} % -where we used the relation $\homod = \homo \velsix^\wedge$ of Equation~\eqref{eq:sixd_velocity_body_fixed}, and introduced the \emph{trivialized kinetic energy} $k$ that can be shown to only depend on $\velsix$. +where we used the relation $\homod = \homo \velsix^\wedge$ of Equation~\eqref{eq:sixd_velocity_body_fixed}, and introduced the \emph{trivialized kinetic energy} $k(\cdot)$ that can be shown to only depend on $\velsix$. 
The trivialized kinetic energy and the potential energy can be computed as: % \begin{align} \label{eq:kinetic_and_potential_energies} - k(\velsix) &= \frac{1}{2} \velsix^T \masssix \velsix , \\ + k(\velsix) &= \frac{1}{2} \velsix^\top \masssix \velsix , \\ U(\homo) &= \begin{bmatrix}\gravity^\top & \zeros_{1\times3}\end{bmatrix} m \homo @@ -778,10 +820,10 @@ \subsection{Equations of Motion} % we can simplify this equation using the proper acceleration: % -\begin{equation*} +\begin{equation} \masssix_B {}^B \bar{\mathbf{a}}_{W,B} + \velsix[B]_{W,B} \bar{\times}^* \masssix_B \velsix[B]_{W,B} = \forcesix[B]^{ext} . -\end{equation*} +\end{equation} % This equation will be useful in the definition of recursive algorithms for rigid body dynamics. % @@ -802,12 +844,12 @@ \section{Joint Model} The second fundamental element for modelling a multibody system is the \emph{joint}. All links part of the system are characterised by 6~\ac{DoF} that, ignoring for the moment possible collisions that could occur, are free to evolve in space independently of each other. Joints can be used to connect links together and act as constraints that limit their relative motion. -Each joint is characterised by its number of \aclp{DoF}, which can range from 0 to 6 and, considering the relative position between two links as a local topological space, describes its dimension. +Each joint is characterised by its number of \acp{DoF}, which can range from 0 to 6 and, considering the relative position between two links as a local topological space, describes its dimension. \begin{assumption*} % This thesis only considers multibody systems modelled with 1~\ac{DoF} joints. -All the theories and algorithms proposed in the following chapters can be extended with minor modifications to other less common joint types~\parencite{featherstone_rigid_2008}. +All the theories and algorithms proposed in the following chapters can be extended with minor modifications to other less common multi-\ac{DoF} joint types~\parencite{featherstone_rigid_2008}. % \end{assumption*} @@ -846,7 +888,7 @@ \section{Joint Model} % where $\dot{\shape} \in \realn$ is the \emph{joint velocity}. Also in this case, it will be convenient to express the relative velocity as a 6D velocity. -If $X$ is a placeholder that selects any of the velocity representations introduced in the previous sections, we express the relative velocity between links $P$ and $C$ as follows: +If $X$ is a placeholder that selects any of the velocity representations introduced in Section~\ref{sec:frame_velocities}, we express the relative velocity between links $P$ and $C$ as follows: % \begin{equation} \label{eq:joint_model_velocity_propagation} @@ -873,7 +915,7 @@ \section{Joint Model} \begin{assumption*} % \label{assumption:subspace_independent_configuration} -In this thesis, we assume that motion subspaces are independent from the joint configuration, \ie $\dv{\subspacee[X]_{P,C}}{\shape} = \zeros_6$, from what $\subspacee[X]_{P,C}(\shape) = \subspacee[X]_{P,C}$ follows. +In this thesis, we assume that motion subspaces are independent from the joint configuration, \ie $\dv{\subspacee[X]_{P,C}}{\shape} = \zeros_6$, from which $\subspacee[X]_{P,C}(\shape) = \subspacee[X]_{P,C}$ follows. % \end{assumption*} @@ -886,7 +928,7 @@ \section{Joint Model} \end{remark} The relative acceleration between links $P$ and $C$ can be obtained by differentiating Equation~\eqref{eq:joint_model_velocity_propagation}. 
-Considering Assumption~\ref{assumption:subspace_independent_configuration} and Remark~\ref{remark:subspaces_independent_representation}, it can be shown the following resulting relation: +Considering Assumption~\ref{assumption:subspace_independent_configuration} and Remark~\ref{remark:subspaces_independent_representation}, it can be shown that the following resulting relation holds: % \begin{equation} \label{eq:joint_model_acceleration_propagation} @@ -924,7 +966,7 @@ \subsubsection{Kinematics and dynamics propagation} \begin{definition*}[Propagation of 6D Velocities] % Given the 6D velocity of the parent link $P$, and the joint velocity, we want to calculate the velocity of the child link $C$. -In $C$ coordinates, if $\velsix[P]_{W,C}$ is the body-fixed velocity of the parent link, and $\velsix[C]_{P,C}$ is the 6D velocity induced by the joint motion, we can write the following relation: +In $C$ coordinates, if $\velsix[P]_{W,P}$ is the body-fixed velocity of the parent link, and $\velsix[C]_{P,C}$ is the 6D velocity induced by the joint motion, we can write the following relation: % \begin{equation*} \velsix[C]_{W,C} @@ -941,7 +983,7 @@ \subsubsection{Kinematics and dynamics propagation} \label{definition:propagation_accelerations} % Given the 6D acceleration of the parent link $P$, and the joint acceleration, we want to calculate the acceleration of the child link $C$. -In $C$ coordinates, if $\velsixd[P]_{W,C}$ is the body-fixed apparent velocity of the parent link, and $\velsixd[C]_{P,C}$ is the 6D apparent velocity induced by the joint motion, we can write the following relation for the intrinsic acceleration: +In $C$ coordinates, if $\velsixd[P]_{W,P}$ is the body-fixed apparent acceleration of the parent link, and $\velsixd[C]_{P,C}$ is the 6D apparent acceleration induced by the joint motion, we can write the following relation for the intrinsic acceleration: % \begin{equation*} \accsix[C]_{W,C} = \velsixd[C]_{W,C} @@ -1076,7 +1118,7 @@ \subsection{Generalised position and velocity} The configuration of a free-floating mechanical system can be modelled as the set formed by the poses of all links. However, the existence of the joints that induce motion constraints enables to determine the system configuration as a pair composed of the pose of a \emph{base link} and the generalised joints positions. These two modelling choices are known, respectively, as \emph{maximal coordinates} and \emph{reduced coordinates}. -In this thesis, we focus on the case of reduced coordinates, since it enables the application of efficient algorithms~\parencite{featherstone_rigid_2008} to operate on the system's kinematics and dynamics. +In this thesis, we focus on the case of reduced coordinates, since it enables the application of efficient iterative algorithms~\parencite{featherstone_rigid_2008} to operate on the system's kinematics and dynamics. Furthermore, interesting properties of the mathematical model that can be computed in reduced coordinates can be exploited for designing control systems. In reduced coordinates, we can formalise the generalised position and the generalised velocity of the floating-base multibody system as follows: @@ -1090,7 +1132,8 @@ \subsection{Generalised position and velocity} % where we introduced the \emph{joint positions} $\Shape \in \realn^n$, also called \emph{shape}. Under the assumption of having only 1-\ac{DoF} joints, $n$ is the overall number of internal \aclp{DoF} of the system, matching the number of joints $n_J$. 
-Note that this limitation can be removed, and a more general formulation can be found in \parencite{featherstone_rigid_2008}. +Note that this limitation can be removed. +A more general formulation can be found in \parencite{featherstone_rigid_2008}. Similarly to what we observed for a single rigid body, it can be more convenient to represent the system's velocity as a column vector: % @@ -1103,12 +1146,12 @@ \subsection{Generalised position and velocity} , \end{equation*} % -where $\velsix[X]_{A,B}$ is the velocity of the base link, $\Shaped \in \realn^n$ are the \emph{joint velocities}, and the generic frame $X$ is a placeholder to select one among the \emph{body-fixed} $X=B$, \emph{inertial-fixed} $X=W$, or \emph{mixed} $X=B[W]$ representations. +where $\velsix[X]_{W,B}$ is the velocity of the base link, $\Shaped \in \realn^n$ are the \emph{joint velocities}, and the generic frame $X$ is a placeholder to select one among the \emph{body-fixed} $X=B$, \emph{inertial-fixed} $X=W$, or \emph{mixed} $X=B[W]$ representations. \subsection{Kinematics} -In this section, we describe how we can relate the pose $\homo[W]_E$ and the velocity $\velsix_{W,E}$ between the world frame and a generic link $E$ of the multibody mechanical structure, commonly called also \emph{kinematic chain}, with the generalised position $\Qu$ and generalised velocity $\Nu$ of the system. -The link $E$ can be thought of as the \emph{end-effector} frame, even if it applies to any generic link $L$ of the model and, more generally, any frame rigidly attached to one of the links. +In this section, we describe how we can relate the pose $\homo[W]_E$ and the velocity $\velsix_{W,E}$ between the world frame $W$ and a generic link $E$ of the multibody mechanical structure with the generalised position $\Qu$ and generalised velocity $\Nu$ of the system. +The link $E$ can be thought of as the \emph{end-effector} frame, even if it applies to any generic link $L \in \mathcal{L}$ of the model and, more generally, any frame rigidly attached to any link. \subsubsection{Link pose} @@ -1146,11 +1189,11 @@ \subsubsection{Link velocity} \end{equation*} % where we introduced $\jac_{W,E} \in \realn^{6\times(6+n)}$ as the \emph{floating-base Jacobian} of link $E$. -We can notice that two different velocity representations characterie this relation, denoted by the $X$ and $Y$ placeholder frames: $X$ related to the input system velocity $\Nu[X]$, and $Y$ related to the output link velocity $\velsix[Y]_{W,E}$. -We will show the derivation of the \emph{left-trivialized Jacobian} $\jac[Y]_{W,Y/X}$, and then introduce the appropriate transformations to change the two representations. +We can notice that two different velocity representations characterise this relation, denoted by the $X$ and $Y$ placeholder frames: $X$ is related to the input system velocity $\Nu[X]$, and $Y$ is related to the output link velocity $\velsix[Y]_{W,E}$. +We will show the derivation of the \emph{left-trivialized Jacobian} $\jac[Y]_{W,Y/X}$, and then introduce the appropriate transformations to change the representations of $X$ and $Y$. The velocity of link $E$ \wrt the world frame can be computed differentiating Equation~\eqref{eq:multibody_kinematics_pose}. 
-Instead of proceeding with this calculation, we follow the equivalent approach of decomposing the velocity $\velsix_{W, E}$ as the sum of the base velocity and the velocity between the base and the link $E$: +Instead of proceeding with this calculation, we follow the equivalent approach of decomposing the velocity $\velsix_{W, E}$ as the sum of the base velocity and the velocity between the base $B$ and the link $E$: % \begin{equation*} \velsix[E]_{W,E} = \velsix[E]_{W,B} + \velsix[E]_{B,E} @@ -1220,7 +1263,7 @@ \subsubsection{Link velocity} \label{remark:jacobian_and_6d_forces} % The duality between 6D velocities and 6D forces also propagates to the definition of the floating-base Jacobian. -If we consider a 6D force $\forcesix[C_i]_i$, that could be for example the force applied to the frame $C_i$ associated to the contact point $i$ of link $L$ which pose is defined by a transform $\homo[L]_{C_i}$, we can use Equation~\eqref{eq:relative_jacobian} to compute its projection to the floating-base configuration space $\jac[C_i]_{W,L/X}^\top(\Qu) \forcesix[C_i]_i = (\transvel[C_i]_L \jac[L]_{W,L/X}(\Qu))^\top \forcesix[C_i]_i \in \realn^{6+n}$. +If we consider a 6D force $\forcesix[C_i]_i$, that could be for example the force applied to the frame $C_i = (\ori_{C_i}, [W])$ associated to the contact point $i$ of link $L$ which pose is defined by a transform $\homo[L]_{C_i}$, we can use Equation~\eqref{eq:relative_jacobian} to compute its projection to the floating-base configuration space as $\jac[C_i]_{W,L/X}^\top(\Qu) \forcesix[C_i]_i = (\transvel[C_i]_L \jac[L]_{W,L/X}(\Qu))^\top \forcesix[C_i]_i \in \realn^{6+n}$. This verbose notation results particularly useful in this case, because in the example of a rolling contact, the frame $C_i$ is time-varying, and the most appropriate contact Jacobian is $\jac[C_i]_{W,L}$ instead of $\jac[C_i]_{W, C_i}$, regardless of the system's velocity representation $X$. % \end{remark} @@ -1236,8 +1279,8 @@ \subsection{Dynamics} l(\Qu, \Nu) &= k(\Qu, \Nu) - U(\Qu), \\ k(\Qu, \Nu) &= \frac{1}{2} \sum_{L \in \mathcal{L}} \velsix[L]_{W,L}^\top \masssix_L \velsix[L]_{W,L}, \\ U(\Qu) &= - \sum_{L \in \mathcal{L}} \begin{bmatrix}\gravity[W]^\top & \zeros_{1\times3}\end{bmatrix} - \homo[W]_L - \begin{bmatrix}m_L {}^L\mathbf{c} \\ m_L \end{bmatrix} + m_L \, \homo[W]_L + \begin{bmatrix} {}^L\mathbf{c} \\ 1 \end{bmatrix} . \end{align*} % @@ -1246,7 +1289,7 @@ \subsection{Dynamics} \begin{align*} k(\Qu, \Nu) &= \frac{1}{2} \Nu^\top M(\Qu) \Nu, \\ U(\Qu) &= -\begin{bmatrix}\gravity^\top & \zeros_{1\times3}\end{bmatrix} - m \homo[W]_B + m \, \homo[W]_B \begin{bmatrix}{}^B\mathbf{c}(\Shape) \\ 1 \end{bmatrix} , \end{align*} @@ -1265,7 +1308,7 @@ \subsection{Dynamics} , \end{equation*} % -and ${}^B\mathbf{c}(\Shape) \in \realn^3$ is the \ac{CoM} of the mechanical system: +and ${}^B\mathbf{c}(\Shape) \in \realn^3$ is its \ac{CoM} expressed in the coordinates of the base frame $B$: % \begin{equation*} \begin{bmatrix}{}^B\mathbf{c}(\Shape) \\ 1\end{bmatrix} = @@ -1273,10 +1316,10 @@ \subsection{Dynamics} . \end{equation*} % -We introduced $\jac_L(\Qu) = \jac[L]_{W,L/B}$ as the floating-base left-trivialized Jacobian of link $L$ for left-trivialized velocities $\Nu[B]$, as defined in Equation~\eqref{eq:jacobian_left_trivialized}. +We introduced $\jac_L(\Qu) = \jac[L]_{W,L/B}$ as the floating-base left-trivialized Jacobian of link $L$ for left-trivialized generalised velocities $\Nu[B]$, as defined in Equation~\eqref{eq:jacobian_left_trivialized}. 
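Since every link velocity satisfies $\velsix[L]_{W,L} = \jac_L(\Qu) \, \Nu[B]$, comparing the two expressions of the kinetic energy above gives $M(\Qu) = \sum_{L \in \mathcal{L}} \jac_L^\top(\Qu) \, \masssix_L \, \jac_L(\Qu)$. The following shape-only sketch assembles a mass matrix in this way from random placeholder Jacobians and 6D inertias (stand-ins for the quantities that a rigid-body library would provide) and checks the expected symmetry and positive definiteness.

\begin{verbatim}
# Shape-only sketch: M(q) = sum_L J_L(q)^T M_L J_L(q).
# Jacobians and 6D inertias are random placeholders, not real model data.
import numpy as np

n_joints, n_links = 23, 10
rng = np.random.default_rng(0)

M = np.zeros((6 + n_joints, 6 + n_joints))
for _ in range(n_links):
    J_L = rng.standard_normal((6, 6 + n_joints))  # left-trivialized Jacobian
    A = rng.standard_normal((6, 6))
    M_L = A @ A.T + 1e-3 * np.eye(6)              # placeholder SPD 6D inertia
    M += J_L.T @ M_L @ J_L

assert np.allclose(M, M.T)                        # symmetric
assert np.all(np.linalg.eigvalsh(M) > 0.0)        # positive definite
\end{verbatim}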
-The \aclp{EoM} of the multibody system, considering that its configuration is an element of $\SE(3)\times\realn^n$, can be obtained by applying the Hamel equations~\parencite{marsden_jerrold_e_introduction_2010, maruskin_dynamical_2018}, that can be seen as the combination of the Euler-Poincarè equation for the base variables in $\SE(3)$ and the classical Euler-Lagrange equation for the joint variables in $\realn^n$. -The left-trivialized Lagrangian plugged into the Hamel equations gives the \acp{EoM} of the multibody system: +The \aclp{EoM} of the multibody system, considering that its configuration $\Qu$ is an element of $\SE(3)\times\realn^n$, can be obtained by applying the Hamel equations~\parencite{marsden_jerrold_e_introduction_2010, maruskin_dynamical_2018}, that can be seen as the combination of the Euler-Poincarè equation for the base variables in $\SE(3)$ and the classical Euler-Lagrange equation for the joint variables in $\realn^n$. +The left-trivialized Lagrangian plugged into the Hamel equations gives the \acp{EoM} of the multibody system~\parencite[Appendix~A.4]{traversaro_modelling_2017}: % \begin{equation} \label{eq:equation_of_motion_multibody} @@ -1349,7 +1392,7 @@ \subsection{Change of base variables} \begin{equation*} \begin{cases} \dot{\tilde{\Qu}} = \left( \homod(\Qu, \Nu) \homo_T(\Shape) + \homo(\Qu) \dot{\homo}_T(\Shape, \Shaped), \Shaped \right) \\ - \tilde{M}(\tilde{\Qu})\dot{\tilde{\Nu}} + \tilde{C}(\tilde{\Qu}, \tilde{\Nu}) \tilde{\Nu} + \tilde{g}(\tilde{\Qu}) = B \Torques + \sum_{\mathcal{L}} \tilde{J}_L(\Qu) \forcesix[L]_L^{ext} + \tilde{M}(\tilde{\Qu}) \dot{\tilde{\boldsymbol{\nu}}} + \tilde{C}(\tilde{\Qu}, \tilde{\Nu}) \tilde{\Nu} + \tilde{g}(\tilde{\Qu}) = B \Torques + \sum_{\mathcal{L}} \tilde{J}_L(\Qu) \forcesix[L]_L^{ext} \end{cases} , \end{equation*} @@ -1361,10 +1404,12 @@ \subsection{Change of base variables} \begin{aligned} \tilde{M} &= T^{-\top} M T^{-1} ,\\ \tilde{C} &= T^{-\top} \left( M \dv{t}(T^{-1}) + CT^{-1} \right) ,\\ - \tilde{g} &= T^{-\top} G ,\\ - \tilde{J}_L &= J_L T. + \tilde{g} &= T^{-\top} g ,\\ + \tilde{J}_L &= J_L T, \end{aligned} \end{equation} +% +and omitted their dependencies to improve readability. \begin{remark} % diff --git a/Chapters/Part_1/chapter_3.tex b/Chapters/Part_1/chapter_3.tex index bb7f3c6..7c089f1 100644 --- a/Chapters/Part_1/chapter_3.tex +++ b/Chapters/Part_1/chapter_3.tex @@ -8,14 +8,14 @@ \chapter{Basics of Reinforcement Learning} \acl{RL} operates on a unified setting decoupled as two systems interacting sequentially over time, illustrated in Figure~\ref{fig:rl_setting}. The \emph{environment} is the world the \emph{agent} interacts with. -Differently from other domains of \acl{ML}, the learned \ac{RL} policy generates actions that may affect not only the current instant (immediate reward), but also the new configuration in which the environment transitions and its corresponding rewards. +Differently from other domains of \ac{ML}, the learned \ac{RL} policy generates actions that may affect not only the current instant (immediate reward), but also the new configuration in which the environment transitions and its corresponding reward. The trial-and-error nature together with delayed rewards, which give rise to the \emph{credit assignment problem}, are two among the essential features that characterise \acs{RL}~\parencite{sutton_reinforcement_2018}. 
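The sequential agent--environment interaction just described can be made concrete with the following toy sketch; the \texttt{reset}/\texttt{step} naming follows the common Gym-style convention and is only an illustrative assumption, not the software developed in this thesis.

\begin{verbatim}
# Toy agent-environment loop: a point on a line, with a reward penalising
# the distance from the origin. Illustrative only.
import numpy as np

class ToyEnv:
    def reset(self):
        self.x = np.random.uniform(-1.0, 1.0)
        return self.x                          # observation = state

    def step(self, action):
        self.x += 0.1 * float(action)          # state-transition function
        reward = -abs(self.x)                  # immediate reward r_t
        done = abs(self.x) > 2.0               # episode termination
        return self.x, reward, done, {}

def rollout(env, policy, max_steps=200):
    obs, ret = env.reset(), 0.0
    for _ in range(max_steps):
        action = policy(obs)                   # the agent only sees the current state
        obs, reward, done, _ = env.step(action)
        ret += reward                          # undiscounted episode return
        if done:
            break
    return ret

print(rollout(ToyEnv(), policy=lambda obs: -np.sign(obs)))
\end{verbatim}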
This chapter, based on the theory and notation from~\textcite{achiam_spinning_2018} and \textcite{dong_deep_2020}, describes in greater detail the \acl{RL} setting, introduces the terminology and the mathematical framework used throughout the thesis, and presents algorithms to solve the corresponding reward maximisation problem. Despite being a generic setting from a high-level perspective, all the elements part of Figure~\ref{fig:rl_setting} may have different properties altering the formulation of the specific problem at hand. This thesis focuses on the application of \ac{RL} to the robotics domain, constraining the nature of environment modelling, the structure of the policies that can be learned, and the family of algorithms that can be employed. In particular, this chapter is aimed at introducing the theory of operating on environments modelled with unknown stochastic continuous dynamics, providing continuous rewards and receiving continuous actions generated by stochastic policies implemented as neural networks. -The setting adopting these policies is known as \ac{DRL}. +The setting adopting this family of policies is known as \ac{DRL}. We will use throughout the thesis the \ac{RL} and \ac{DRL} terminology interchangeably. \begin{figure} @@ -57,17 +57,17 @@ \subsection{Environment} \end{equation*} \acl{RL}, contrary to other approaches that find optimal policies in a comparable setting, assumes that both the state-transition function $\mathcal{P}$ and the reward function $\mathcal{R}$ are unknown. -It focuses on methods to learn how to act optimally only by sampling sequences of states-actions-rewards without assuming any knowledge of the process generating data. +It focuses on methods to learn how to act optimally only by sampling sequences of states-actions-rewards without assuming any knowledge of the process generating their data. \begin{remark*}[Episodic and Continuing tasks] % The iterative nature of \ac{RL} requires the environment to be sampled continuously over different \emph{episodes}. -An episode is defined as the environment evolution from an initial state $s_0$ to the termination. +An \emph{episode} is defined as the environment evolution from an initial state $s_0$ to its termination. We can identify two episode termination strategies, corresponding to the underlying \emph{task} -- the objective the reward is maximising. \emph{Episodic tasks} are characterised by a --possibly variable-- finite length $T$, \ie they do not continue after reaching a terminal state $s_T$. \emph{Continuing tasks}, instead, are characterised by an infinite horizon and they never terminate. On some occasions, it is useful considering episodic tasks as continuing tasks by assuming that in time steps $t > T$, the last state $s_T$ becomes a special \emph{absorbing state}, that is a state that can only transition to itself regardless to the action generating only rewards of zero. -On others, instead, it is useful to truncate the evolution of a continuing task at a certain time step~\parencite{pardo_time_2022}. +On other occasions, instead, it is useful to truncate the evolution of a continuing task at a certain time step~\parencite{pardo_time_2022}. % \end{remark*} @@ -85,7 +85,7 @@ \subsection{Environment} \subsection{Agent} The agent is the -- rather abstract -- entity of the \ac{RL} setting that learns and interacts with the environment with the objective of maximising the received reward signal. 
-In our setting, depending on the \ac{RL} algorithm selected for training, agents are at least composed of a function approximator used to generate the action, \ie a \emph{policy}, and a method for optimising the function given the observed states and received rewards. +In our setting, depending on the \ac{RL} algorithm selected for training, agents are at least composed of a function approximator used to generate the action, \ie a \emph{policy}, and a method for optimising this function given the observed states and received rewards. \subsection{Policy} @@ -118,7 +118,7 @@ \subsection{Policy} = \frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{1}{2}\frac{(x - \mu)^2}{\sigma^2} \right). \end{equation} % -We denote sampling a single real-valued random variable $x \in \mathbb{R}$ from this distribution as $x \sim \mathcal{N}(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are, respectively, the \emph{mean} and the \emph{variance} of the distribution. +We denote sampling a single real-valued random variable $x \in \mathbb{R}$ from this distribution as $x \sim \mathcal{N}(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are the \emph{mean} and the \emph{variance} of the distribution, respectively. We can use the same distribution also in the multivariate case, in which $x, \mu \in \realn^k$ and $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \dots) \in \realn^{k\times k}$, where we assumed a diagonal covariance matrix parameterized with diagonal values denoted as $\sigma \in \realn^k$. A stochastic policy could be implemented as a parameterized multivariate diagonal Gaussian distribution. During the training phase, exploration is achieved by sampling from the distribution. @@ -188,7 +188,7 @@ \subsection{Trajectory} \begin{remark*}[Reward trajectory] % -The literature is not uniform on the time subscript of the immediate reward. +The literature is not uniform on the time subscript of the immediate reward $r_{(\cdot)}$. In this thesis, we consider the data-generating process illustrated in Figure~\ref{fig:data_generating_process_mdp}. The transition $t \rightarrow t+1$ corresponds to the data $(s_t, a_t) \rightarrow (r_t, s_{t+1})$. Note that the subscript of the reward is associated to the time $t$ of the tuple $(s_t, a_t)$ that generated it. @@ -228,6 +228,7 @@ \subsection{Return} \end{equation*} \begin{remark*}[Variance in the bias-variance trade-off] +\label{remark:variance_in_biasvariance} % During the training phase, many \ac{RL} algorithms try to estimate the expected return of each state of the environment. If we take multiple trajectories starting from the same state but evolving differently, the computed return of that state for each trajectory may differ considerably. @@ -241,7 +242,7 @@ \subsection{Return} \subsection{The Reinforcement Learning Problem} \label{section:reinforcement_learning_problem} -Combining of all these concepts allows defining the \ac{RL} problem more pragmatically. +The combination of all these concepts allows defining the \ac{RL} problem more pragmatically. In our setting, it is defined as follows: % \begin{displayquote} @@ -294,11 +295,11 @@ \subsection{Markov Decision Processes} we define as \ac{MDP} the tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{R}, \mathcal{P}, \mathcal{\rho}_0 \rangle$. % The two key features of the iterative \ac{RL} settings are correctly modelled by a \ac{MDP}. -Firstly, \ac{MDP} satisfy the \emph{Markov Property}. +Firstly, \acp{MDP} satisfy the \emph{Markov Property}. 
% \begin{definition*}[Markov Property] % -A stochastic process has the Markov property if the conditional probability distribution of future states of the process depends only upon the present state. +A stochastic process satisfies the Markov property if the conditional probability distribution of future states of the process depends only upon the present state. Given the present state, the future does not depend on the past ones. A process with this property is said to be Markovian or a Markov process. % @@ -306,13 +307,13 @@ \subsection{Markov Decision Processes} % It follows that the dynamics of a \ac{MDP} is uniquely modelled by $\mathcal{P}$. Given a state-action pair $(s_t, a_t) \in \mathcal{S} \times \mathcal{A}$, the probability of transitioning to the next state $s_{t+1}$ is $\mathcal{P}(s_{t+1} \given s_t, a_t)$. -Concerning the support of delayed rewards to solve the credit assignment problem, \acp{MDP} exploit \emph{value functions}. +Concerning the support of delayed rewards to solve the credit assignment problem, which represents the second key feature of iterative \ac{RL}, \acp{MDP} exploit \emph{value functions}. \subsection{Value functions} \label{section:value_functions} Given a \acl{MDP} $\mathcal{M}$, we can associate each state $s$ -- or state-action pair $(s, a)$ -- to a scalar value representing how rewarding it is for an agent to generate a trajectory passing through it. -Considering the objective of reward maximisation, a natural choice for this score is the expected return. +Considering the objective of reward maximisation, a natural choice for this score is the \emph{expected return}. We introduce the \emph{state-value function for policy $\pi$} as the following quantity: % \begin{equation} @@ -323,7 +324,7 @@ \subsection{Value functions} \end{equation} % It provides the expected return of a trajectory starting from $s$ and always acting following policy $\pi$. -The need to specify which is the active policy as superscript originates from the latter consideration. +This is also the reason why we need to specify the active policy with the superscript. Another important value function to introduce is the \emph{action-value function for policy $\pi$}: % @@ -356,7 +357,7 @@ \subsection{Value functions} % \begin{align} V^\pi(s) &= \E_{a \sim \pi} \left[ Q^\pi(s, a) \right] \label{equation:v_as_func_of_q}\\ - Q^\pi(s, a) &= \E_{s_{t+1} \sim \mathcal{P}} \left[ r_t + \gamma V^\pi(s_{t+1}) \given s_t=s, a_t=a \right] \label{equation:q_as_func_of_v} + Q^\pi(s, a) &= \E_{s_{t+1} \sim \mathcal{P}(\cdot \given s_t, a_t)} \left[ r_t + \gamma V^\pi(s_{t+1}) \given s_t=s, a_t=a \right] \label{equation:q_as_func_of_v} . \end{align} @@ -364,7 +365,7 @@ \subsection{Value functions} \label{remark:bias_in_biasvariance} % Many \ac{RL} algorithms operating on continuous spaces, during the training phase, try to fit a function to estimate the optimal value function (or functions). -Before reaching an acceptable convergence, the effect of using a wrong value function estimate that could affect training performance is called \emph{bias}, and it is the other element of the previously introduced \emph{bias-variance} trade-off that characterises many policy learning methods. 
+Before reaching an acceptable convergence, the effect of using a wrong value function estimate that could affect training performance is called \emph{bias}, and it is the other element of the previously introduced \emph{bias-variance} trade-off that characterises many policy learning methods (Remark~\ref{remark:variance_in_biasvariance}). % \end{remark*} @@ -382,7 +383,7 @@ \subsection{Value functions} \subsection{Bellman Equation} \label{section:bellman_equation} -An optimisation problem in discrete time like the \ac{RL} problem can be structured in a recursive form, \ie expressing the relationship between a quantity in one step and the same quantity in at the next one. +An optimisation problem in discrete time like the \ac{RL} problem can be structured in a recursive form, \ie expressing the relationship between a quantity in one step and the same quantity at the next one. If a problem can be structured in this form, the equation expressing the update rule between the two periods is known as \emph{Bellman equation}. The value functions for policy $\pi$ introduced in Section~\ref{section:value_functions} can be transformed into a recursive form by noticing that we can express the return as follows: @@ -419,7 +420,7 @@ \subsection{Model-free and Model-based} The formulation of the \ac{RL} problem provided in Section~\ref{section:reinforcement_learning_problem} outlines that the agent has no knowledge of environment details. The state-transition probability $\mathcal{P}(\cdot|s_t, a_t)$ and the reward function $\mathcal{R}(s_t, a_t, s_{t+1})$ are usually unknown. -The agent needs to explore the environment through trial-and-error while trying to learn how to maximise the collected rewards. +The agent needs to explore the environment through trial-and-error while trying to learn how to maximise the collected immediate rewards. There are, however, algorithms that assume a --possibly partial-- knowledge of environment details that can be advantageous in different circumstances. The first major categorisation separates the algorithms in \emph{model-free} and \emph{model-based} methods, depending on whether the agent has access to a model of the environment. @@ -472,14 +473,14 @@ \subsection{On-policy and off-policy} On-policy methods usually learn a stochastic policy and use it both as behaviour and target. These methods are mainly either policy-based or actor-critic. Under the assumption that the policy's stochasticity is sufficient for environment exploration, on-policy methods share the properties of policy-based methods. -Contrarily to off-policy methods, during training, they expect data to be generated from the same policy, therefore their sampling efficiency is usually low. +Contrarily to off-policy methods, during training, they expect data to be generated from the same policy, preventing the usage of old data acquired in previous optimisation epochs and therefore showing a lower sampling efficiency. However, the most popular algorithms in this family implement techniques based on \emph{importance sampling}, enabling multiple optimisation steps per on-policy batch, mitigating the need for newly-sampled trajectories. \section{Policy Optimization} -Most of the \acl{RL} algorithms used in robotics belong to the family of \emph{policy gradients} methods, \ie model-free policy-based off-policy methods. +Most of the \acl{RL} algorithms used in robotics belong to the family of \emph{policy gradients} methods, \ie model-free policy-based on-policy methods. 
In this section, we first derive how we can compute the gradient of the policy performance function $J(\pi_{\boldsymbol{\theta}})$, already introduced in Equation~\eqref{equation:policy_performance_function}, \wrt its parameterization $\boldsymbol{\theta}$ directly from the trajectories $\tau$. -Then, we introduce \acl{PPO}, a widely used algorithm that exploits a local approximation of this gradient to converge towards $\pi^*$. +Then, we introduce \ac{PPO}, a widely used algorithm that exploits a local approximation of this gradient to let $\pi_{\boldsymbol{\theta}}$ converge towards $\pi^*$. \subsection{Policy Gradient} @@ -591,10 +592,11 @@ \subsection{Generalized Advantage Estimation} Policy gradient methods are not uniquely defined by the final forms of Equation~\eqref{equation:policy_gradient_final} and Equation~\eqref{equation:policy_gradient_reward_to_go}. They are just two specific cases of a more general formulation expressed in the following form: % -\begin{equation*} +\begin{equation} + \label{equation:policy_gradient_generic} \nabla_{\boldsymbol{\theta}} J(\pi_{\boldsymbol{\theta}}) = \E_{\tau \sim \pi_{\boldsymbol{\theta}}} \left[ \sum_{t=0}^{\infty} \nabla_{\boldsymbol{\theta}} \log \pi_{\boldsymbol{\theta}}(a_t \given s_t) \Psi_t \right] . -\end{equation*} +\end{equation} % Among all possible choices of $\Psi_t$, the one yielding the lowest variance~\parencite{schulman_high-dimensional_2018} is the \emph{advantage function}, already introduced in Equation~\eqref{equation:advantage_function}: % @@ -602,7 +604,7 @@ \subsection{Generalized Advantage Estimation} \Psi_t = A^\pi(s_t, a_t) = Q^\pi(s_t, a_t) - V^\pi(s_t) . \end{equation*} % -Using the advantage as log-likelihood weights can be intuitively understood from its properties to describe the relative quality of actions. +Using the advantage as log-likelihood's weight can be intuitively understood from its properties to describe the relative quality of actions. In fact, if $A^\pi(s, a) > 0$, the policy gradient $\hat{g}$ pushes the parameters $\boldsymbol{\theta}$ towards an increased action likelihood $\pi_{\boldsymbol{\theta}}$, with the consequence of reinforcing the probability to choose again $a_t$ over other actions. % The main reason for the favourable properties regarding the variance reduction introduced by using the advantage function originates from integrating a reinforcement effect based on a quantity that expresses relative feedback instead of an absolute value that could be noisy. @@ -645,7 +647,7 @@ \subsection{Generalized Advantage Estimation} \subsection{Proximal Policy Optimization} \label{sec:ppo} -In the previous sections, we obtained the generic equation of the policy gradient, whose advantage-based form can be empirically estimated over a finite batch of samples as follows: +In the previous sections, we obtained the generic Equation~\eqref{equation:policy_gradient_generic} of the policy gradient, whose advantage-based form can be empirically estimated over a finite batch of samples as follows: % \begin{equation*} \hat{g} = \hat{\E}_t \left[ \nabla_{\boldsymbol{\theta}} \log \pi_{\boldsymbol{\theta}}(a_t \given s_t) \hat{A}_t \right] . @@ -663,8 +665,8 @@ \subsection{Proximal Policy Optimization} Often, this process generates steps too large in the policy's parameters space, leading to updates that destroy the previously learned behaviour. 
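+As a concrete illustration of this failure mode, the following is a minimal, self-contained sketch of the naive procedure, \ie repeated gradient ascent steps on the surrogate objective computed from a single on-policy batch.
+The toy linear-Gaussian policy and the randomly generated batch are placeholders introduced only for this example, and do not correspond to the setup adopted in this thesis:
+%
+\begin{verbatim}
+import jax
+import jax.numpy as jnp
+
+def gaussian_logp(theta, obs, act):
+    # Toy linear-Gaussian policy: mean = obs @ theta, unit covariance
+    # (additive constants are dropped since they do not affect gradients).
+    return -0.5 * jnp.sum((act - obs @ theta) ** 2, axis=-1)
+
+def surrogate(theta, obs, act, adv):
+    # Empirical surrogate: batch mean of log pi_theta(a|s) * A_hat.
+    return jnp.mean(gaussian_logp(theta, obs, act) * adv)
+
+# Placeholder on-policy batch (observations, actions, advantage estimates).
+key = jax.random.PRNGKey(0)
+obs = jax.random.normal(key, (256, 8))
+act = jax.random.normal(key, (256, 2))
+adv = jax.random.normal(key, (256,))
+theta = jnp.zeros((8, 2))
+
+# Reusing the same batch for many ascent steps drives theta away from the
+# parameters that generated the data, invalidating the advantage estimates.
+for _ in range(10):
+    theta = theta + 0.1 * jax.grad(surrogate)(theta, obs, act, adv)
+\end{verbatim}
+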
Authors of~\parencite{schulman_trust_2017} have proposed -a modification of Equation~\eqref{equation:policy_gradient_loss_advantage} that guarantees monotonic improvement of a stochastic policy. -The \ac{TRPO} algorithm exploits importance sampling for correcting the probabilities of using trajectories produced by different policies, and limits how much these policies can change between two iterations using a hard constraint on their \ac{KL} divergence: +a modification of Equation~\eqref{equation:policy_gradient_loss_advantage} that guarantees the monotonic improvement of a stochastic policy. +The \ac{TRPO} algorithm exploits importance sampling for correcting the probabilities of trajectories produced by different policies, and limits how much these policies can change between two iterations using a hard constraint on their \ac{KL} divergence: % \begin{align*} \mathop{\text{maximize}}_{\boldsymbol{\theta}} \;\; & \hat{\E}_t \left[ \frac{\pi_{\boldsymbol{\theta}}(a_t \given s_t)}{\pi_{\boldsymbol{\theta}_{old}}(a_t \given s_t)} \hat{A}_t \right] \\ @@ -685,17 +687,23 @@ \subsection{Proximal Policy Optimization} L^{CLIP}(\boldsymbol{\theta}) = \hat{\E}_t \left[ \min \left\{ r_t(\boldsymbol{\theta}) \hat{A}_t, \; \operatorname{clip} (r_t(\boldsymbol{\theta}), 1 - \epsilon, 1+\epsilon) \hat{A}_t \right\} \right] . \end{equation*} % -While optimising over a dataset $\mathcal{D}$, after the first batch, $\boldsymbol{\theta} \neq \boldsymbol{\theta}_{old}$. +While optimising over a dataset $\mathcal{D}$, after the optimisation of the first batch, $\boldsymbol{\theta} \neq \boldsymbol{\theta}_{old}$. Clipping the likelihood ratio in the interval $1 \pm \epsilon$ has the effect of dampening potentially large steps towards the reinforcement direction ($r_t(\boldsymbol{\theta}) \gg 1$ if $\hat{A}_t>0$, and $r_t(\boldsymbol{\theta}) \ll 1$ if $\hat{A}_t < 0$). The penalty variant of the algorithm, instead, transforms the hard constraint of \ac{TRPO} into a soft constraint: % \begin{equation*} - L^{KLPEN}(\boldsymbol{\theta}) = \hat{\E}_t \left[ r_t(\boldsymbol{\theta}) \hat{A}_t - \beta D_{KL}\left[\pi_{\boldsymbol{\theta}_{old}}(\cdot|s_t) || \pi_{\boldsymbol{\theta}}(\cdot|s_t)\right] \right] . + L^{KLPEN}(\boldsymbol{\theta}) = \hat{\E}_t \left[ r_t(\boldsymbol{\theta}) \hat{A}_t - \beta D_{KL}\left[\pi_{\boldsymbol{\theta}_{old}}(\cdot|s_t) \,||\, \pi_{\boldsymbol{\theta}}(\cdot|s_t)\right] \right] . \end{equation*} % While selecting a constant $\beta$ is a possibility, the authors of \ac{PPO} proposed an adaptive parameter update. -After each policy update, the \ac{KL} divergence of the target policy from the behaviour policy can be computed (or, if the parameterized policy has no closed-form expression, estimated) as $d = D_{KL}\left[\pi_{\boldsymbol{\theta}_{old}}(\cdot|s_t) || \pi_{\boldsymbol{\theta}}(\cdot|s_t)\right]$. +After each policy update, the \ac{KL} divergence of the target policy $\pi_{\boldsymbol{\theta}}$ from the behaviour policy $\pi_{\boldsymbol{\theta}_{old}}$ can be computed (or, if the parameterized policy has no closed-form expression, estimated) as: +% +\begin{equation*} + d = D_{KL}\left[\pi_{\boldsymbol{\theta}_{old}}(\cdot|s_t) \, || \, \pi_{\boldsymbol{\theta}}(\cdot|s_t)\right] + . 
+\end{equation*} +% Then, the new $\beta$ parameter used in the following policy is adjusted as follows: % \begin{align*} diff --git a/Chapters/Part_1/chapter_4.tex b/Chapters/Part_1/chapter_4.tex index feed924..c499b19 100644 --- a/Chapters/Part_1/chapter_4.tex +++ b/Chapters/Part_1/chapter_4.tex @@ -19,23 +19,23 @@ \section{Review of Reinforcement Learning for Robot Locomotion} In the following period, in the late 1990s and the entire 2000s, the body of research mainly focused on trajectory optimisation, particularly targeting learning from demonstration~\parencite{schaal_learning_1996, atkeson_robot_1997, schaal_is_1999}. In order to limit the challenges posed by the computational constraints in conjunction with the available hardware, researchers widely exploited parameterized movement primitives~\parencite{schaal_dynamic_2006} and policy gradients methods~\parencite{peters_reinforcement_2003, peters_policy_2006, kober_policy_2008}, also attempting to bridge the latter with stochastic optimal control~\parencite{theodorou_generalized_2010}. Restricting the domain to robot locomotion, this decade was mainly characterised by the usage of quadrupeds~\parencite{kohl_policy_2004, honglak_lee_quadruped_2006, kolter_hierarchical_2007, theodorou_reinforcement_2010}. -\textcite{koberReinforcementLearningRobotics2013} provide a well-structured and extensive survey on the state and challenges of research in \ac{RL} applied to robotics in this period. +\textcite{koberReinforcementLearningRobotics2013} provide a well-structured and extensive survey on the state and challenges of research in \ac{RL} applied to robotics characterising this period. In the early 2010s, after the advent and initial success of \ac{DL}~\parencite{hinton_fast_2006}, hardware advancements enabled significant progress in speech recognition and computer vision research. The transfer of similar methodologies to \ac{RL}, whose performance was previously limited by computational constraints, at that point, was natural, and the interest in their combination, \ac{DRL}, exploded. \textcite{mnih_human-level_2015} showed that \ac{DRL} was suitable for becoming a generic and effective method to learn end-to-end policies also in high-dimensional problems. Similar ideas have been transferred to the control of simulated robots by~\textcite{lillicrap_continuous_2016}, showing that this category of algorithms can train end-to-end policies with performance comparable to those obtained by methods that can access the complete system's dynamics. -In the same years, the \ac{RL} community has been very prolific and produced a large number of algorithms, for example, \ac{TRPO}~\parencite{schulman_trust_2017}, \ac{PPO}~\parencite{schulman_proximal_2017}, and \ac{SAC}~\parencite{haarnoja_soft_2018}, to mention a few among those that have been widely adopted in the following years. +In the same years, the \ac{RL} community has been very prolific and produced a large number of algorithms, for example, \ac{TRPO}~\parencite{schulman_trust_2017}, \ac{PPO}~\parencite{schulman_proximal_2017}, and \ac{SAC}~\parencite{haarnoja_soft_2018}, to mention a few among those that have experienced a wide adoption. \pagebreak[1] The emergence of \ac{DRL}, together with new algorithms, provided a strong thrust also in their transfer to the robotics domain. -Limiting the scope to quadrupedal locomotion, many parallel research directions emerged over time. +Limiting the scope to \emph{quadrupedal locomotion}, many parallel research directions emerged over time. 
Researchers succeeded to transfer policies from simulation to real-world robots~\parencite{tan_sim--real_2018, hwangboLearningAgileDynamic2019s} introducing models of the actuation dynamics in the training process. Other studies, exploiting imitation learning from motion data, proposed techniques to learn policies for simulated robots~\parencite{peng_learning_2020} that were later adapted to their real-world counterparts~\parencite{smith_legged_2021}. Other methods integrated high-level planners and low-level controllers within the learning pipeline~\parencite{tsounis_deepgait_2020}, used the learning component as corrective action~\parencite{gangapurwala_real-time_2021}, performed control in Cartesian space~\parencite{bellegarda_robust_2021}, or exploited hardware accelerators to speed-up learning~\parencite{rudin_learning_2021}. -For what concerns research on bipedal locomotion, instead, the literature is more sparse. +For what concerns research on \emph{bipedal locomotion}, instead, the literature is more sparse. The work of \textcite{heess_emergence_2017} posed one of the first milestones in end-to-end learning of locomotion behaviours from simple reward signals, exploiting a distributed version of \ac{PPO} and a curriculum learning strategy that makes the learned task progressively more challenging. A second relevant work comes from the computer graphics community, that nowadays shares multiple research directions with robot learning. \textcite{peng_deepmimic_2018} proposed a method that, utilising motion capture data of highly-dynamic movements, produces a policy that can execute and adapt the collected trajectories on a simulated character. @@ -45,47 +45,47 @@ \section{Review of Reinforcement Learning for Robot Locomotion} \section{Review of Simulators for Robot Learning} \label{sec:review_simulators_robot_learning} -All works discussed in the previous section propose algorithms, architectures, and learning pipelines that, during the training phase of a policy, require a constant flow of new data sampled from the controlled robot. -In most cases, obtaining the necessary amount of data from physical robots would either take too long or be dangerous due to the high probability of damaging the system and its surroundings due to the inherent trial-and-error nature of \ac{RL}. +All works discussed in the previous section proposed algorithms, architectures, and learning pipelines that, during the training phase of a policy, require a constant flow of new data sampled from the controlled robot. +In most cases, obtaining the necessary amount of data from physical robots would either take too long or be dangerous due to the high probability of damaging the system and its surroundings as a consequence of the inherent trial-and-error nature of \ac{RL}. Many of the contributions in the robot learning domain exploit rigid-body simulators and robot models to generate enough state transitions for the learning process to converge to a satisfying solution. Due to the unavoidable introduction of approximations, the evolution of rigid-body simulations will differ from the evolution of the real system. 
-This mismatch, for example, could originate from the estimation error of the inertial parameters of the simulated robot, from the wrong assumption of a perfectly rigid body, from the simplistic modelling of the actuation dynamics, from the approximations of the contact dynamics and contact parameters, from the mismatch between simulated and real sensor noises, \etc +This mismatch, for example, could originate from the estimation error of the inertial parameters of the simulated robot, from the wrong assumption of a perfectly rigid body, from the simplistic modelling of the actuation dynamics, from the approximations of the contact dynamics and contact parameters, from the mismatch between simulated and real sensors noise, \etc Agents trained in environments characterised by such approximations, that could be referred all together as \emph{reality gap}, could learn to exploit modelling approximations to reach unrealistic performance that could never be transferred elsewhere. The most popular method to mitigate the occurrence of this behaviour is \emph{domain randomization}~\parencite{peng_sim--real_2018, muratore_robot_2022}, widely studied in sim-to-real research~\parencite{zhao_sim--real_2020}, that aims to regularise the simulation with different methods to prevent overfitting during training. -In the rest of this section, we pass over similar methodologies that could be applied to any simulation, and focus instead on providing a review of common simulators that could be used for robot learning and, particularly, locomotion applications. +In the rest of this section, we bypass similar methodologies that could be applied to any simulation, and focus instead on providing a review of common simulators that could be used for robot learning and, particularly, locomotion applications. The process of describing the evolution of a dynamical system is deeply rooted in control and systems theory. As regards physical robots, despite the theory behind the evolution of their rigid-body description has been known for centuries, the advent of general-purpose simulators did not occur until the early 2000s. -The most established simulator that is nowadays widely used is Gazebo~\parencite{koenig_design_2004}. -It interfaces with multiple physics engines, using ODE\footnote{\url{https://www.ode.org/}} as a default, and supports importing models and world descriptions from the \ac{URDF} and the \ac{SDF} files. +The most established simulator that nowadays is still widely used is Gazebo~\parencite{koenig_design_2004}. +It interfaces with multiple physics engines, using ODE\footnote{\url{https://www.ode.org/}} as a default, and supports importing models and world descriptions from the \ac{SDF} and the \ac{URDF} files. A second very popular simulator, mainly for robot learning, is Mujoco~\parencite{todorov_mujoco_2012}. It is specifically designed for model-based optimisation with a particular focus on contact dynamics. Another common simulator supporting multiple physics engines and common descriptions is CoppeliaSim~\parencite{rohmer_v-rep_2013}, formerly known as V-Rep. -It exposes Bullet~\parencite{coumans_pybullet_2016} as the default physics engine. +It features Bullet~\parencite{coumans_pybullet_2016} as the default physics engine. More recently, Nvidia released its general-purpose toolkit Nvidia Isaac\footnote{\url{https://developer.nvidia.com/isaac-sdk}} that integrates with their PhysX\footnote{\url{https://developer.nvidia.com/physx-sdk}} physics engine. 
-Beyond general-purpose simulators,many standalone physics engines can be used to simulate multibody systems. +Beyond general-purpose simulators, many standalone physics engines can be used to simulate multibody systems. PyBullet~\parencite{coumans_pybullet_2016}, despite its origins in videogame development, is another common option in robot learning research. DART~\parencite{lee_dart_2018}, also available in Gazebo, interfaces with several collision detectors and constraint solvers, and was recently used as the basis of Nimble Physics~\parencite{werling_fast_2021} which adds differentiable physics support. Differentiable physics originates from differentiable programming~\parencite{innes_differentiable_2019} and scientific machine learning~\parencite{rackauckas_universal_2021}, in which physics laws are implemented with \ac{AD}~\parencite{baydin_automatic_2018} frameworks that allow to propagate the gradients through their calculation. In the past few years, it has become a popular research direction that has yielded outstanding results in many fields. In the domain of robotics, differentiability is still under scrutiny~\parencite{suh_differentiable_2022} and research is active~\parencite{gillen_leveraging_2022}. -On the one hand, common \acp{RBDA} have been studied to compute analytical gradients~\parencite{carpentier_analytical_2018, belbute-peres_end--end_2018}. -On the other hand, the entire computational flow was implemented with an \ac{AD} framework~\parencite{freeman_brax_2021, howell_dojo_2022}. +On the one hand, common \acp{RBDA} have been studied to compute analytical gradients~\parencite{carpentier_analytical_2018, belbute-peres_end--end_2018, singh_efficient_2022}. +On the other hand, the entire computational flow was implemented with \ac{AD} frameworks~\parencite{freeman_brax_2021, howell_dojo_2022}. On a similar note, simulations implemented entirely with \ac{AD} frameworks could be executed directly on hardware accelerators like \acp{GPU} or \acp{TPU}, as an alternative to \acp{CPU} either on a single machine or a cluster. -Even though \ac{CPU} simulations could be optimized to run fast~\parencite{hwangbo_per-contact_2018}, hardware accelerators can provide a degree of scalability that outperforms any \ac{CPU}-based solution. +Even though \ac{CPU} simulations could be optimised to run fast~\parencite{hwangbo_per-contact_2018} under given circumstances, hardware accelerators can provide a degree of scalability that outperforms any \ac{CPU}-based solution. Thanks to their hardware, Nvidia pioneered this domain with PhysX. -More recent development showed remarkable learning speed with their Isaac Gym~\parencite{liangGPUAcceleratedRoboticSimulation2018s, makoviychuk_isaac_2021, rudin_learning_2021} framework. +More recent development showed remarkable learning speed with their Isaac Gym framework~\parencite{liangGPUAcceleratedRoboticSimulation2018s, makoviychuk_isaac_2021, rudin_learning_2021}. In this realm, the past few years have seen several other attempts to develop comparable solutions~\parencite{heiden_neuralsim_2021, qiao_efficient_2021, freeman_brax_2021}. -Regardless of the selected simulator, in reinforcement learning research it is common to abstract environments and agents, allowing to develop them independently from each other. -The most common environment abstraction is provided by OpenAI Gym~\parencite{brockman_openai_2016}. 
-Environments exposing the \verb|gym.Env| interface can be implemented with any of the simulators discussed in this section, and they can be -employed by any \ac{RL} agent of available frameworks supporting this interface. +Regardless of the selected simulator, in \ac{RL} research it is common to abstract environments and agents, allowing to develop them independently from each other. +The most common environment abstraction is provided by OpenAI Gym~\parencite{brockman_openai_2016} in Python. +Environments employing the \verb|gym.Env| interface can be implemented with any of the simulators discussed in this section, and they can be +employed by any \ac{RL} agent included in the available frameworks supporting this interface. -Considering the increasing interest in simulation and the related speed of development, The review provided in this section is far from complete. +Considering the increasing interest in simulations and its corresponding speed of advancements, the review provided in this section is far from complete. We conclude by pointing the interested reader to the surveys on the topic, from the early ones~\parencite{ivaldiToolsSimulatingHumanoid2014s,erezSimulationToolsModelbased2015s} to the most recent~\parencite{collins_review_2021, kim_survey_2021, korber_comparing_2021}, acknowledging that either they became outdated quickly, or use metrics that are valid only for a few selected use cases. \section{Review of Push-recovery Strategies} @@ -93,13 +93,13 @@ \section{Review of Push-recovery Strategies} Locomotion is among the most fundamental capabilities that legged robots need to master in order to become useful in the real world. In the past ten years, quadrupedal locomotion research achieved remarkable results, and nowadays, quadrupeds are able to autonomously navigate hazardous environments with great agility \parencite{lee_learning_2020, miki_learning_2022}. Despite decades of research, the situation of bipedal locomotion is quite different, especially when we compare the agility of most of the existing humanoid robots with human capabilities. -Many fundamental methods, techniques, and control architectures widely adopted by locomotion research on humanoid robots have been first studied in the simplified case of push recovery. +Many fundamental methods, techniques, and control architectures widely adopted by bipedal locomotion research have been first studied in the simplified case of push recovery. In fact, the ability to react appropriately to external perturbations is paramount for achieving robust locomotion capabilities, and often advances in push recovery research are preparatory for advances in locomotion research~\parencite{jeong_robust_2019}. Humans are able to employ different strategies to keep their balance, such as \emph{ankle}, \emph{hip}, and \emph{stepping} strategies~\parencite{nashner_organization_1985, maki_role_1997, stephens_humanoid_2007}. The adoption of these strategies follows an activation proportional to the magnitude of external disturbances. The effectiveness of human capabilities mainly stems from how different strategies are combined into a continuous motion~\parencite{mcgreavy_unified_2020}. -The applicability of these principles for the generation of control actions applied to robots traditionally relied on simplified models approximating their dynamics, such as the \ac{LIP} model~\parencite{kajita_3d_2001} and the \ac{CP}~\parencite{pratt_capture_2006}. 
+The applicability of these principles for the generation of control actions applied to robots traditionally relied on \emph{simplified models} approximating their dynamics, such as the \ac{LIP} model~\parencite{kajita_3d_2001} and the \ac{CP}~\parencite{pratt_capture_2006}. Together with the formulation of the \ac{ZMP}~\parencite{vukobratovic_contribution_1969, vukobratovic_zero-moment_2004} widely adopted as a stability criterion, simplified models became very popular and still used nowadays. Modern applications alternatively rely on the \ac{DCM}, that can be seen as an extension of the \ac{CP} theory~\parencite{shafiee_online_2019}. @@ -108,14 +108,14 @@ \section{Review of Push-recovery Strategies} Implementing an effective and robust blending of all the discrete strategies has been considered challenging and prone to failures, even with careful tuning~\parencite{mcgreavy_unified_2020}. Nonetheless, their usage still represents an active research area that can achieve promising results~\parencite{jeong_robust_2019}. -In the past few years, the robotics community attempted to develop methods based on robot learning and, in particular, \ac{RL} to generate a unified control action that automatically blends the discrete strategies. +In the past few years, the robotics community attempted to develop methods based on robot learning and, in particular, \ac{RL} to generate a unified control action that automatically blends the discrete strategies~\parencite{yang_learning_2018}. Early results have been demonstrated capable of controlling the lower limbs of a simulated humanoid robot~\parencite{kim_push_2019} and, more recently, a real exoskeleton~\parencite{duburcq_reactive_2022}. \section{Thesis Content} \label{sec:thesis_content} The previous sections outlined the history, the overall status, and the most recent breakthroughs in the domains of reinforcement learning for robot locomotion, rigid-body simulations for robot learning, and push-recovery strategies for humanoid robots. -In view of the subject of this thesis on how modern technology can help us generate synthetic data for humanoid robot planning and control, and considering the three reviewed research domains, we conclude this chapter by identifying problems still open and detailing how the contributions to knowledge provided in Part~\ref{part:contribution} aim to solve them. +In view of the subject of this thesis regarding how modern technology can help us generate synthetic data for humanoid robot planning and control, and considering the three reviewed research domains, we conclude this chapter by identifying problems still open and detailing how the contributions to knowledge provided in the next Part~\ref{part:contribution} aim to solve them. \subsection{\autoref{ch:rl_env_for_robotics}: Reinforcement Learning Environments for Robotics} @@ -136,9 +136,9 @@ \subsection{\autoref{ch:rl_env_for_robotics}: Reinforcement Learning Environment We currently provide a complete implementation for simulated robots interfacing with the new Gazebo Sim\footnote{\url{https://gazebosim.org/}} simulator, supporting the DART and Bullet physics engines. At the high level, the \emph{Gym-Ignition} Python package\footnote{In its earlier releases, the new Gazebo Sim simulator was called Ignition Gazebo, from what derives the name of our Python package.} provides abstraction layers of the different components that typically form a robotic environment: the Task and the Runtime. 
 On the one hand, the Task provides the necessary components to develop agnostic decision-making logic with the generic \scenario \acp{API}, and on the other, the Runtime provides the actual interfacing either with a simulator or a real robot.
-This whole architecture allows to develop the Tasks only once, and use them for training, executing, or refining a policy in any of the supported Runtimes.
+This whole architecture allows the user to develop the Tasks only once, and use them for training, executing, or refining a policy in any of the supported Runtimes.
 The selected Gazebo Sim simulator has multiple advantages over alternative options for generating synthetic data.
-Its plugin-based architecture allows third-party developers to integrate custom physics engines and rendering engines, enabling them to develop agnostic environments that select the desired engines during runtime.
+Its plugin-based architecture allows third-party developers to integrate custom physics engines and custom rendering engines, enabling them to develop agnostic environments that select the desired engines during runtime.
 For robot learning, if needed, it enables seamlessly switching engines, opening the possibility to add them as a whole in the domain randomization set.
 Furthermore, considering the wide adoption of Gazebo within the robotics community, it enables roboticists to create environments using familiar tools, guaranteeing that their execution remains reproducible thanks to a single-process architecture not possible with the previous generations of the simulator.
 
@@ -190,8 +190,8 @@ \subsection{\autoref{ch:learning_from_scratch}: Learning from scratch exploiting
 
 Generating the appropriate control action for highly-dynamic movements has always been challenging in robotics, especially when the controlled system is redundant and under-actuated like a humanoid robot.
 Traditional methods based on control theory and optimal control either rely on accurate descriptions of the dynamics, or exploit approximations in the form of either simplified or reduced models.
-The first ones strongly rely on the accuracy of the dynamic model, to the point of failure in the presence of a mismatch sufficiently large that controller's robustness is unable to compensate for it adequately.
-The second ones, instead, often constrain the space of individual control actions and, when dealing with multiple options, their automatic selection might require careful and often manual tuning.
+Methods based on control theory strongly rely on the accuracy of the dynamic model, to the point of failure in the presence of a mismatch sufficiently large that the controller's robustness is unable to compensate for it adequately.
+Instead, those based on optimal control often constrain the space of individual control actions and, when dealing with multiple options, their automatic selection might require careful and often manual tuning.
 Furthermore, with the increase in the controlled \acp{DoF} and the number of optimisation constraints, this family of methods is still facing computational challenges in real-time settings.
 An alternative direction for this category of problems is \ac{RL}, providing a unified learning framework that, given a meaningful reward signal, does not require the knowledge of the controlled system's dynamics.
 However, regardless of its accuracy, the dynamic model can provide interesting priors that could benefit learning.
@@ -247,8 +247,9 @@ \subsection{\autoref{ch:contact_aware_dynamics}: Contact-aware Multibody Dynamic
 
 \paragraph{Context of Contribution}
 
 We describe the multibody dynamics in reduced coordinates as a dynamical system in state-space capable of modelling free-floating robots.
-With locomotion applications in mind, we propose a dynamical system augmenting the multibody \acp{EoM} that supports contacts with uneven terrain, formulating a soft-contacts model for points-surface collisions without any approximation of the corresponding friction cone.
-We obtain a continuous, albeit stiff, state-space representation of the dynamical system whose trajectories can be computed by any numerical integration scheme.
+With locomotion applications in mind, we propose a dynamical system augmenting the multibody \acp{EoM} with contacts in the presence of smooth uneven terrain, formulating a soft-contacts model capable of handling both sticking and slipping states without approximating the corresponding friction cone.
+To this end, we extend existing soft-contact models developed for sphere-plane collisions to a more generic point-surface setting.
+When combined with the multibody \acp{EoM}, we obtain a continuous, albeit stiff, state-space representation of the dynamical system whose trajectories can be computed by any numerical integration scheme.
 
 \paragraph{Contribution Outputs}
 
@@ -268,7 +269,7 @@ \subsection{\autoref{ch:scaling_rigid_body_simulations}: Scaling Rigid Body Simu
 
 \paragraph{Open Problem}
 
 Training policies in a simulated setting is still the predominant choice in \ac{RL} research.
-Although sim-to-real methodologies are progressing rapidly, and the community is actively researching either learning directly from real-world robots or offline transitions, simulations remain a central component in this domain.
+Although sim-to-real methodologies are progressing rapidly, and the community is actively researching either learning directly from real-world robots or learning from offline transitions, simulations remain a central component in this domain.
 In the setting of robot locomotion, the training process involves sampling simulated trajectories of a multibody system endowed with a considerable number of \acp{DoF} interacting, at least, with the terrain surface.
 Most of the general-purpose simulators currently available perform computations entirely on \acp{CPU}.
 Despite the most advanced frameworks providing \ac{RL} agents support the parallel execution of multiple environments either on a single machine or on a cluster, sampling trajectories represents the main bottleneck of the entire learning pipeline.
@@ -282,7 +283,7 @@ \subsection{\autoref{ch:scaling_rigid_body_simulations}: Scaling Rigid Body Simu
 
 The key to the transparent execution on different devices is the exploitation of the contact-aware multibody dynamics introduced in Chapter~\ref{ch:contact_aware_dynamics}.
 It enables the development of efficient routines, not relying on any dynamic allocation that can be deployed on hardware accelerators.
 The possibility to execute the simulation entirely on hardware accelerators represents an essential feature for applications requiring the generation of a massive amount of data like those belonging to the robot learning domain.
-We also describe state-of-the-art \acp{RBDA} proposed by \textcite{featherstone_rigid_2008} with the notation introduced in Chapter~\ref{ch:robot_modelling} that can be executed on this hardware.
+We also describe state-of-the-art \acp{RBDA} proposed by \textcite{featherstone_rigid_2008} with the notation introduced in Chapter~\ref{ch:robot_modelling} that can be executed on this hardware, and provide an updated version of the \ac{RNEA} compatible with floating-base robots, that exactly corresponds to the inverse of forward dynamics algorithms like \ac{ABA}. \paragraph{Contribution Outputs} @@ -295,4 +296,4 @@ \subsection{\autoref{ch:scaling_rigid_body_simulations}: Scaling Rigid Body Simu \textbf{Daniele Pucci:} Resources, Supervision, Funding acquisition. \end{quote} % -The software components described in this chapter have been open-sourced and are publicly available at the following link: \url{https://github.com/ami-iit/jaxsim}. +The software components described in this chapter have been open-sourced and are publicly available at the following link:\linebreak \url{https://github.com/ami-iit/jaxsim}. diff --git a/Chapters/Part_2/chapter_5.tex b/Chapters/Part_2/chapter_5.tex index d15460d..987e129 100644 --- a/Chapters/Part_2/chapter_5.tex +++ b/Chapters/Part_2/chapter_5.tex @@ -105,7 +105,7 @@ \subsection{Existing frameworks} Its main advantage is an efficient contact solver that greatly speeds up the simulation. Due to its very recent release, there are not many examples available. Like other frameworks, its closed-source nature might limit applications. -It includes the Python framework RaisimGymTorch that allows creating \ac{RL} environment. +It includes the Python framework RaisimGymTorch\footnote{\url{https://raisim.com/sections/RaisimGymTorch.html}} that allows creating \ac{RL} environment. \item[PyRoboLearn]\!\parencite{delhaisse_pyrobolearn_2019} is another framework containing both robotic environments and RL algorithms. It focuses on modularity and flexibility to promote code reuse. @@ -128,11 +128,11 @@ \subsection{Existing frameworks} \item[Nvidia Isaac Gym]\!\parencite{makoviychuk_isaac_2021} is the RL component of Isaac, the new Nvidia toolbox for AI applications in robotics. -Simulations can be executed in either their PhysX engine and they provide state-of-the-art photorealistic rendering. +Simulations can be executed in their PhysX engine and they provide state-of-the-art photorealistic rendering. Nvidia Isaac\footnote{\url{https://developer.nvidia.com/isaac-sdk}} is one of the most promising projects that will provide a unified framework for robotics and AI, but unfortunately its closed source nature might limit the possibility of extending and customising it. \item[Brax]\!\parencite{freeman_brax_2021} is a differentiable physics engine that simulates rigid bodies in maximal coordinates. -It is written in JAX and is designed for use on acceleration hardware, enabling massively parallel simulations on multiple devices. +It is written in \jax and is designed for use on acceleration hardware, enabling massively parallel simulations on multiple devices. The physics engine, beyond the drawbacks in joint constraints enforcement due to the maximal coordinates, neglects some dynamical effects and therefore the simulation is approximate. Rendering capabilities are limited and the execution of environments provided in the framework is constrained to a simulated setting. @@ -199,12 +199,12 @@ \section{Design Goals} Having specified the design goals and the related considerations, we structured the framework in two different components: \emph{\scenario} and \emph{Gym-Ignition}. 
Considering that most of the robotics libraries and simulators are available in \cpp, we designed the low-level unified scene interfaces in this language. \scenario (\spacedlowsmallcaps{SCEN}e interf\spacedlowsmallcaps{A}ces for \spacedlowsmallcaps{R}obot \spacedlowsmallcaps{I}nput/\spacedlowsmallcaps{O}utput) defines the \verb|World|, \verb|Model|, \verb|Link|, and \verb|Joint| abstractions, and allows to implement them either in C++ or, through a set of bindings, in Python. -Instead, the most popular language is Python for the \ac{RL} logic. +Instead, the most popular language for the \ac{RL} logic is Python. Gym-Ignition is a pure Python package that provides the \verb|Runtime| and \verb|Task| interfaces, together with high-level helpers to compute model-based quantities based on the iDynTree~\parencite{nori_icub_2015} library. \section{ScenarIO: SCENe interfAces for Robot Input/Output} -\scenario\footnote{\url{https://github.com/robotology/gym-ignition/tree/master/scenario}} is a \cpp library acting as a Hardware Abstraction Layer over either simulated or real robots. +\scenario\footnote{\url{https://github.com/robotology/gym-ignition/tree/master/scenario}} is a \cpp library acting as a \ac{HAL} over either simulated or real robots. The abstraction of the scene is structured in different interfaces: % \begin{description} @@ -226,13 +226,13 @@ \section{ScenarIO: SCENe interfAces for Robot Input/Output} Furthermore, it allows querying the location of active contact points with their corresponding 6D contact force. % \item[\texttt{\MakeUppercase{J}oint}] -This interface returns all the parameters of the joint, including the number of \acp{DoF}, the type, the friction parameters, the position, velocity, and force limits, \etc It is also the entry point to get the joint variables like position, velocity, and acceleration, and set the corresponding targets used for motion control. +This interface returns all the parameters of the joint, including the number of \acp{DoF}, the type, the friction parameters, the position, velocity, force limits, \etc It is also the entry point to get joint variables like position, velocity, and acceleration, and set the corresponding targets used for motion control. The joint interface also controls the actuation type, exposing either a position or velocity PID controller with its parameters. % \end{description} Beyond these scene interfaces, \scenario also includes the \verb|Controller| interfaces, allowing development of runtime-agnostic whole-body controllers that can be enabled at the \verb|Model| level. -They can be used as a replacement for the low-level PID controllers associated with each of the model's joints. +They can be used as a replacement for the default low-level PID controllers associated with each of the model's joints. The abstraction layer provided by \scenario enables to develop either \cpp or Python code agnostic of the actual setting where the scene and its robots operate. For both robot control and \ac{RL} research, we implemented the \scenario interfaces to communicate with the Gazebo Sim simulator, obtaining the \emph{\scenario Gazebo} library. @@ -270,7 +270,7 @@ \subsection{\scenario Gazebo} These two features are particularly beneficial in the robot learning setting. The plugin-based architecture of physics engines allows running simulated environments transparently in any of the supported physics backends. 
It enables to add in the domain randomization set not only physics parameters, but also the entire physics engine, preventing policy overfitting to subtle implementation details. -Furthermore, ensuring reproducibility already at the simulation level is necessary condition to inherit this feature by simulated \ac{RL} environments. +Furthermore, ensuring reproducibility already at the simulation level is a necessary condition to inherit this feature by simulated \ac{RL} environments. % Other key features of Gazebo Sim are the following: % diff --git a/Chapters/Part_2/chapter_6.tex b/Chapters/Part_2/chapter_6.tex index 019926c..03906d9 100644 --- a/Chapters/Part_2/chapter_6.tex +++ b/Chapters/Part_2/chapter_6.tex @@ -51,7 +51,7 @@ \subsection{Action} Hands, wrists, and neck, which arguably play a minor role in balancing, are locked in their natural positions. The policy computes target joint velocities bounded in $[-180,180]$~deg/s at $25$~Hz. Commanding joint velocities rather than joint positions prevents target joint positions from being too distant from each other in consecutive steps. -Especially at training onset, this would lead to jumpy references that the \pid controllers cannot track, affecting the discovery of the relation between $\mathbf{x}_t$ and $\mathbf{x}_{t+1}$. +Especially at training onset, this would lead to jumpy references that the \pid controllers cannot track, affecting the discovery of the relation between the states $\mathbf{x}_t$ and $\mathbf{x}_{t+1}$. The integration process, instead, enables to use a policy that generates discontinuous actions while maintaining continuous \pid inputs with no need for additional filters. \subsection{State} @@ -78,8 +78,16 @@ \subsection{State} \end{table} Since no perception is involved, the state of the \ac{MDP} contains information about the robot's kinematics and dynamics. -It is defined as the tuple $\mathbf{x} := \langle \mathbf{q}, \mathbf{\boldsymbol{\nu}}, \mathbf{f}_L, \mathbf{f}_R \rangle \in \mathcal{X}$. -The observation, computed from the state $\mathbf{x}$, is defined as the tuple $\boldsymbol{o} := \langle \mathbf{o}_s, \mathbf{o}_{\dot{s}}, \mathbf{o}_h, \mathbf{o}_R, \mathbf{o}_c, \mathbf{o}_f, \mathbf{o}_F, \mathbf{o}_v \rangle \in \mathcal{O}$, where $\mathcal{O} := \mathbb{R}^{62}$. +It is defined as the tuple $\mathbf{x} := \langle \mathbf{q}, \mathbf{\boldsymbol{\nu}}, \mathbf{f}_L, \mathbf{f}_R \rangle \in \mathcal{X}$, where $(\mathbf{f}_L, \mathbf{f}_R)$ are the 6D forces exchanged between the feet and the terrain. +The observation, computed from the state $\mathbf{x}$ and the robot model $\mathcal{M}$, is defined as the tuple +% +\begin{equation*} + \boldsymbol{o} = + \boldsymbol{o}(\mathbf{x}, \mathcal{M}) := + \langle \mathbf{o}_s, \mathbf{o}_{\dot{s}}, \mathbf{o}_h, \mathbf{o}_R, \mathbf{o}_c, \mathbf{o}_f, \mathbf{o}_F, \mathbf{o}_v \rangle \in \mathcal{O} +\end{equation*} +% +where $\mathcal{O} := \mathbb{R}^{62}$. % The observation consists of the following terms: % @@ -117,12 +125,12 @@ \subsection{Other specifications}\label{sec:env-other} The initial state distribution $\rho(\mathbf{x}_0): \mathcal{X} \mapsto \mathcal{O}$ defines the value of the observation in which the agent begins each episode. Sampling the initial state from a distribution with small variance, particularly regarding joint positions and velocities, positively affects exploration without degrading the learning performance. 
-At the beginning of each episode, for each joint $j$ we sample its position $s_{j,0}$ from $\mathcal{N}(\mu=s_0, \sigma=10 \, \text{deg})$, where $s_{0}$ represents the fixed initial reference, and its velocity $\dot{s}_{j,0}$ from $\mathcal{N}(\mu=0, \sigma=90 \, \text{deg/s})$. +At the beginning of each episode, for each joint $j \in \mathcal{J}$ we sample its position $s_{j,0}$ from $\mathcal{N}(\mu=s_0, \sigma=10 \, \text{deg})$, where $s_{0}$ represents the fixed initial reference, and its velocity $\dot{s}_{j,0}$ from $\mathcal{N}(\mu=0, \sigma=90 \, \text{deg/s})$. As a result, the robot may or may not start with the feet in contact with the ground, encouraging the agent to learn how to land and deal with impacts. \paragraph{Exploration} -In order to promote exploration beyond the initial state distribution and favour the emergence of push-recovery strategies, we apply external perturbations in the form of a 3D force to the robot's base frame. +In order to promote exploration beyond the initial state distribution and favour the emergence of push-recovery strategies, we apply external perturbations in the form of a 3D force to the robot's base frame located over its pelvis. The applied force vector has a fixed magnitude of 200~N and is applied for 200~ms. Considering the weight of the iCub, approximately 33~kg, the normalised impulse sums up to 1.21~Ns/Kg. We sample the direction of the applied force from a uniform spherical distribution. @@ -141,7 +149,7 @@ \subsection{Other specifications}\label{sec:env-other} During the training process, the environment performs a domain randomisation step at the beginning of each new episode. The masses of the robot's links are sampled from a normal distribution $\mathcal{N}(\mu=m_0, \sigma=0.2 m_0)$, where $m_0$ is the nominal mass of the link defined in the model description. To avoid making assumptions about the feet's and ground's material properties, we randomise the Coulomb friction $\mu_c$ of the feet by sampling it from $\mathcal{U}(0.5, 3)$. -Finally, since the simulation does not include the real dynamics of the actuators, to increase robustness, we apply a delay to the position references fed to the \pid controllers, sampled from $\mathcal{U}(0, 20)$~ms. +Finally, since the simulation does not include the real dynamics of the actuators, to increase robustness, we apply a delay to the position references fed to the \pid controllers, sampled from $\mathcal{U}(0, 20)$~ms, and kept constant during the entire episode (until termination). \section{Agent} @@ -330,7 +338,7 @@ \subsubsection{Transient} \item[Feet contact forces]\!($r_f$)\; % -This reward term pushes the transient towards a steady-state pose in which the vertical forces at feet's \acp{CoP} $(f^{CoP}_L, f^{CoP}_R)$ assume the value of half of the robot's weight, distributing it equally on the two feet. +This reward term pushes the transient towards a steady-state pose in which the vertical forces at feet's \acp{CoP} $(f^{CoP}_L, f^{CoP}_R)$ (see Appendix~\ref{appendix:center_of_pressure}) assume the value of half of the robot's weight, distributing it equally on the two feet. \item[Feet CoP]\!($r_p$)\; % @@ -375,7 +383,7 @@ \subsection{Emerging behaviours} \subsection{Deterministic planar forces} -We evaluate the push-recovery performance from horizontal forces. +We evaluate the push-recovery performance by assessing the resilience from the application of external forces to the horizontal transverse plane of the robot. 
 Forces are applied for 0.2~s after 3~s from the simulation start, when the robot is stably standing still and front-facing.
 Success is defined if the robot is still standing after 7~s, defining as a standing state the configuration in which only the feet can be in contact with the ground.
 In Fig.~\ref{fig:force_polar}, success rates for forces pointing in 12 directions are reported.
diff --git a/Chapters/Part_2/chapter_7.tex b/Chapters/Part_2/chapter_7.tex
index 098c67c..af38c59 100644
--- a/Chapters/Part_2/chapter_7.tex
+++ b/Chapters/Part_2/chapter_7.tex
@@ -6,11 +6,11 @@ \chapter{Contact-aware Multibody Dynamics}
 The simulations were performed using the general-purpose Gazebo Sim, which provides a rich ecosystem supporting many types of robots, sensors, physics engines, and rendering capabilities.
 However, the benefits of exploiting general-purpose solutions often introduce a trade-off with the achievable performance.
 As we have experienced for the push-recovery policy presented in Chapter~\ref{ch:learning_from_scratch}, a single iteration of policy training could last multiple days, resulting in a long tuning process that could limit the search space.
-The bottleneck of the training pipeline is the computations performed by the rigid-body simulator, limiting the rate at which new trajectories can be sampled.
+The bottleneck of the training pipeline is the computation performed by the rigid-body simulator, limiting the rate at which new trajectories can be sampled.
 In the continuation of this thesis, we attempt to reply to the question: \textit{"How can we optimise sampling performance of synthetic data?"}.
 
 In this chapter, we derive the state-space representation of a floating-base multibody system interacting with a known ground surface.
-Assuming the knowledge of the terrain height at any point in space, this formulation of the dynamics enables calculating the robot's trajectory using a plain numerical integration scheme, regardless of the contact state.
+Assuming the knowledge of the terrain height at any point in space and the smoothness of the terrain surface, this formulation of the dynamics enables calculating the robot's trajectory using a plain numerical integration scheme, regardless of the contact state.
 To this end, we introduce a continuous soft-contacts model for resolving points-surface collisions supporting both static (sticking) and dynamic (slipping) contacts.
 Each contact point's dynamics state is structured such that it can be included in the state-space representation and integrated with the robot's dynamics.
 The resulting representation will be used, in the next chapter, as a base for a novel physics engine targeted to exploit modern hardware accelerators for maximising trajectory sampling.
@@ -27,7 +27,7 @@ \section{State-space Multibody Dynamics}
 In this chapter, when it is not required to decompose the Coriolis and gravitational terms explicitly, we use a more compact notation introducing the following vector of \emph{bias forces}:
 %
 \begin{equation*}
-    h(\Qu, \Nu) = C(\Qu, \Nu) \Nu + g(\Qu)
+    h(\Qu, \Nu) = C(\Qu, \Nu) \Nu + g(\Qu) \in \realn^{6+n}
     .
 \end{equation*}
 
@@ -93,7 +93,7 @@ \section{State-space Multibody Dynamics}
 ,
 \end{align*}
 %
-where we model the orientation of the base link with a quaternion $\quat = (w, \mathbf{r}) \in \mathbb{R}^4$, where $w \in R$ is its real part and $\mathbf{r} \in \mathbb{R}^3$ its imaginary part.
+where we model the orientation of the base link with a quaternion $\quat = (w, \mathbf{r}) \in \mathbb{R}^4$, where $w \in \realn$ is its real part and $\mathbf{r} \in \mathbb{R}^3$ its imaginary part. Quaternions are useful in the context of modelling an element of $\SO(3)$ because their description is a more practical $\realn^4$ vector. We can now structure the desired state-space form of the system: @@ -111,13 +111,13 @@ \section{State-space Multibody Dynamics} \dot{\mathbf{s}} \end{pmatrix} \\ - \operatorname{FD}(\mathcal{M}, \mathbf{q}, \boldsymbol{\nu}, \boldsymbol{\tau}, \forcesix_L) + \operatorname{FD}(\mathcal{M}, \mathbf{q}, \boldsymbol{\nu}, \boldsymbol{\tau}, \forcesix_\mathcal{L}^{ext}) \end{bmatrix} = f\left(\mathbf{x}(t), \mathbf{u}(t)\right) . \end{equation} % -We introduced the \emph{forward dynamics} function $\operatorname{FD}$ to model the dynamics of the system's velocity $\Nu$. +We introduced the \emph{forward dynamics} function $\operatorname{FD}(\cdot)$ to model the dynamics of the system's velocity $\Nu$. We will discuss in greater detail the computation of forward dynamics in Chapter~\ref{ch:scaling_rigid_body_simulations}. At this moment, assuming the knowledge of all model parameters forming the \acp{EoM}, we can think of isolating the generalized acceleration $\Nud$ from Equation~\eqref{eq:eom_multibody_compact}. It is worth noting that $\Qud \neq \Nu$, in fact we used the term $\posd[W]_B$ as derivative of the base position. @@ -156,7 +156,7 @@ \section{Contact model} For simplicity, we assume that only contacts between points belonging to the model and a terrain surface can occur. Considering our locomotion scenario, this assumption allows us to describe robots with collision shapes composed of a set of \emph{collidable points}. This approach enables a unified logic for shapes ranging from simple boxes to complex meshes. -Note, however, that point-surface collisions do not provide expected results when a primitive shape like a box, modelled for example with its eight corner points, falls over a triangle-shaped terrain surface. +Note, however, that point-surface collisions do not provide expected results when a primitive shape like a box, modelled for example with its eight corner points, falls over the tip of a triangle-shaped terrain surface. In this case, the collision detection should consider the box as a surface instead of a set of points. If this use case is relevant, a possible workaround would be adding new collidable points on the box's surface, at a higher computational cost. Despite this limitation, the point-surface model could suffice in many target scenarios. @@ -200,7 +200,7 @@ \subsection{Point-surface collisions} % where $h_z = z_T - z_C$ is the \emph{penetration depth}, positive only for collidable points below the ground surface. -In the following sections, we need to project the penetration vector $\mathbf{h}$ and the linear velocity $\vellin[W]_{W, C}$ of the collidable point into the parallel and normal directions of the ground surface. +In the following sections, we need to project the penetration vector $\mathbf{h}$ and the linear velocity $\vellin[W]_{W, C}$ of the collidable point into the parallel and normal directions \wrt the ground surface. 
We denote the magnitude of the normal deformation as $\delta \in \mathbb{R}$, and the normal and tangential components of the velocity as $\vellin[W]^{\perp}_{W, C}, \vellin[W]^{\parallel}_{W, C} \in \mathbb{R}^3$: % \begin{align*} @@ -228,7 +228,7 @@ \subsection{Normal forces} \begin{equation*} \begin{cases} \delta = {}^W \mathbf{h} \cdot \hat{\mathbf{n}}, \\ - \dot{\delta} = {}^W \dot{\mathbf{h}} \, \cdot \hat{\mathbf{n}} = -\posd[W]_{cp} \cdot \hat{\mathbf{n}} + \dot{\delta} = {}^W \dot{\mathbf{h}} \, \cdot \hat{\mathbf{n}} = -\posd[W]_{cp} \cdot \hat{\mathbf{n}} = -\vellin[W]_{W,C}^\perp \end{cases} . \end{equation*} @@ -294,7 +294,7 @@ \subsection{Tangential forces} % The same study we considered for the model of normal forces~\parencite{azad_modeling_2010} proposes a spring-damper-clutch system for the tangential forces, where the additional clutch component controls the sticking-slipping condition. -Extending their formulation to our point-surface setting, we can introduce the following relation between the tangential forces and the tangential material deformation: +Extending their 2D formulation to our 3D point-surface setting, we can introduce the following relation between the tangential forces and the tangential material deformation: % \begin{equation} \label{equation:tangential_contact_model} @@ -304,7 +304,7 @@ \subsection{Tangential forces} , \end{equation} % -where $\alpha, \beta \in \mathbb{R}$ are model parameters, $\mathbf{u} \in \mathbb{R}^3$ is the tangential deformation of the material as illustrated in Figure~\ref{fig:soft_contact_model}, and $\vellin[W]_{W,clutch}$ is the clutch velocity. +where $\alpha, \beta \in \mathbb{R}$ are model parameters, $\mathbf{u} \in \mathbb{R}^3$ is the 3D tangential deformation of the material as illustrated in Figure~\ref{fig:soft_contact_model}, and $\vellin[W]_{W,clutch}$ is the unknown clutch velocity. When sticking, the clutch velocity is zero and, assuming the knowledge of $\mathbf{u}$, the tangential force can be computed with Equation~\eqref{equation:tangential_contact_model}. Instead, when the magnitude of the sticking force exceeds the friction cone bounds, the clutch is unlocked and the collidable point starts sliding. @@ -335,6 +335,7 @@ \subsection{Tangential forces} \end{equation} % that can be numerically integrated to obtain $\mathbf{u}$. +It is worth noting that this formulation does not need to either know or keep track of the clutch velocity. \subsection{Augmented system dynamics} @@ -345,11 +346,11 @@ \subsection{Augmented system dynamics} \begin{bmatrix} \mathbf{q} \\ \boldsymbol{\nu} \\ \operatorname{vec}(\mathbf{U}) \end{bmatrix} - \in \mathbb{R}^{2n+2n_c+13} + \in \mathbb{R}^{2n+3n_c+13} . \end{equation*} % -We introduced the matrix $\mathbf{U} \in \mathbb{R}^{2 \times n_c}$ that stacks the tangential deformations corresponding to all the $n_c$ collidable points of the model. +We introduced the matrix $\mathbf{U} \in \mathbb{R}^{3 \times n_c}$ that stacks the tangential deformations corresponding to all the $n_c$ collidable points of the model. Its dynamics can be obtained from Equations~\eqref{equation:tangential_deformation_dynamics} and plugged in the following contacts-aware dynamics system: % \begin{equation} @@ -403,7 +404,7 @@ \section{Integrators} . 
\end{equation*}
%
-It is the most basic explicit integration method that can be seen, rearranging the equation, as an approximation of the forward finite difference formula:
+It is the most basic explicit integration method that can be seen, by rearranging the equation, as an approximation of the forward finite difference formula:
%
\begin{equation*}
\frac{\mathbf{x}(t + \dd{t}) - \mathbf{x}(t)}{\dd{t}} \approx \dot{\mathbf{x}}(t)
@@ -422,9 +423,9 @@ \section{Integrators}
,
\end{equation*}
%
-where $\mathbf{k}_i = f\left(\mathbf{x}_i(t), t_i\right)$ is an intermediate evaluation of the system's dynamics.
+where $\mathbf{k}_i = f\left(\mathbf{x}_i(t), t_i\right)$ is an intermediate evaluation of the system's dynamics and $\mathbf{x}_i(t)$ the corresponding intermediate state.
Intuitively, these methods reduce the integration error by using an averaged value of the state derivative over the interval.
-The most widely adopted instance of this family is the $4$th-order \ac{RK4}, which computes the slopes
+The most widely adopted instance of this family is the \ac{RK4}, a $4$th-order method, which computes the slopes
%
\begin{align*}
\mathbf{k}_1 &= f\left(\mathbf{x}(t), t\right) \\
@@ -507,7 +508,7 @@ \subsection{Sliding Box}
The simulation is configured with a standard gravity of $g = 9.8 \, m/s^2$.
In this setting, the threshold of the friction cone separating the sticking and the slipping regimes is $\mu f_\perp = 4.9 \, N$, averaged over the four contact points of the bottom box surface.
-Figure~\ref{fig:sliding_box} reports the plots of the $x$ components of the \ac{CoM} position and linear velocity.
+Figure~\ref{fig:sliding_box} reports the plots of the $x$ components of the \ac{CoM}'s position and linear velocity.
It can be seen that, as expected, when the applied force is smaller than the threshold, the box stays still.
As soon as the external force exceeds the threshold, the box starts accelerating.
As soon as the external force goes to zero, the frictional effects of the contact model produce a reaction force that decelerates the box with a fast transient until it reaches the sticking regime again.
@@ -534,7 +535,7 @@ \section{Conclusions}
While sphere and box collisions might seem trivial examples, they often represent the typical collision shapes adopted to simulate robots' feet.
The shown results remain valid also in the case of a smooth uneven terrain supported by the contact model, where a comparable behaviour occurs in a different plane.
-This setting presents different limitations, especially when compared to general-purpose simulators.
-In fact, it does not detect the collision between different bodies and does not consider joint limits and other types of constraints.
+The final contact-aware dynamical system combined with the point-surface collision detection logic presents different limitations, especially when compared to implementations provided by general-purpose simulators.
+Our solution does not support detecting collisions between different bodies and does not consider joint limits and other types of constraints.
To address the former, more advanced geometrical processing is necessary for detecting all ranges of collisions between points, primitive shapes, edges, surfaces, \etc
For the latter, instead, it is possible to introduce an additional phase in the simulation step that computes the generalized forces to apply to the system for enforcing those constraints.
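For completeness, the two explicit schemes compared in the hunks above can be sketched in a few lines of generic Python; this is an illustrative toy implementation under the stated definitions, not the integrator code shipped with the thesis software:
\begin{verbatim}
import numpy as np

def euler_step(f, x, t, dt):
    # Forward Euler: a single dynamics evaluation per step.
    return x + dt * f(x, t)

def rk4_step(f, x, t, dt):
    # Classic 4th-order Runge-Kutta: four slopes averaged over the interval.
    k1 = f(x, t)
    k2 = f(x + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(x + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(x + dt * k3, t + dt)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Toy usage on the scalar dynamics x_dot = -x.
f = lambda x, t: -x
x_euler = euler_step(f, np.array([1.0]), t=0.0, dt=0.01)
x_rk4 = rk4_step(f, np.array([1.0]), t=0.0, dt=0.01)
\end{verbatim}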
diff --git a/Chapters/Part_2/chapter_8.tex b/Chapters/Part_2/chapter_8.tex index e88f99d..b71f6a1 100644 --- a/Chapters/Part_2/chapter_8.tex +++ b/Chapters/Part_2/chapter_8.tex @@ -8,8 +8,8 @@ \chapter{Scaling rigid-body simulations} We have observed that such a pipeline was characterised by long training times, and identified the simulator as the main bottleneck. In this chapter, we propose our final simulation architecture for maximising the sampling performance of synthetic data for robot locomotion applications. -Starting from the contact-aware state-space representation proposed in Chapter~\ref{ch:contact_aware_dynamics}, we introduce state-of-the-art \acp{RBDA} for obtaining the terms forming the multibody \acp{EoM} and computing the \emph{forward dynamics} function $\operatorname{FD}(\mathcal{M}, \Qu, \Nu, \Torques, \forcesix_L)$ used in the $\dot{\mathbf{x}}(t)$ definition. -Excluding inverse dynamics, in which our formulation represents the actual inverse of the forward dynamics also for floating-base system, the implementation of the other \acp{RBDA} contains only minor differences compared to the reference formulation presented by \textcite{featherstone_rigid_2008}. +Starting from the contact-aware state-space representation proposed in Chapter~\ref{ch:contact_aware_dynamics}, we introduce state-of-the-art \acp{RBDA} for obtaining the terms forming the multibody \acp{EoM} and computing the \emph{forward dynamics} function $\operatorname{FD}(\mathcal{M}, \Qu, \Nu, \Torques, \forcesix_L)$ used in the $\dot{\mathbf{x}}(t)$ definition of Equation~\ref{equation:tangential_deformation_dynamics} and Equation~\ref{equation:floatig_base_dynamics_with_contacts_state_space}. +Excluding inverse dynamics, that we formulate as the true inverse of the forward dynamics also for floating-base systems, the implementations of the other \acp{RBDA} contain only minor differences compared to the reference formulation presented by \textcite{featherstone_rigid_2008}. Beyond being denoted with the notation introduced in Chapter~\ref{ch:robot_modelling} that makes explicit the reference frame of quantities like 6D velocities and 6D forces, we provide a unified implementation for both fixed-base and floating-base robots, and our inverse dynamics extends its acceleration to force mapping also to the base link. In the second part of this chapter, we propose \jaxsim, a new physics engine in reduced coordinates that, thanks to the implementation of these algorithms in \jax, enables running rigid-body simulations on modern hardware accelerators like \acsp{GPU} and \acsp{TPU}. In fact, the problem of the previous architectures was not really the general-purpose simulator, but the limited parallel capabilities of \acsp{CPU} in which it was executed. @@ -21,7 +21,7 @@ \section{Floating-base rigid-body dynamics algorithms} \label{section:fb_rbda} In previous sections, we described how the dynamics of a floating-base system interacting with a non-flat terrain surface can be described, and how its evolution over time can be obtained through numerical integration. -When we introduced the state-space representations, in Equation~\eqref{equation:floatig_base_dynamics_state_space} and Equation~\eqref{equation:floatig_base_dynamics_with_contacts_state_space}, we used the \emph{forward dynamics} function $\operatorname{FD}$ to describe the dynamics of the floating-base system velocity $\Nu$. 
+When we introduced the state-space representations, in Equation~\eqref{equation:floatig_base_dynamics_state_space} and Equation~\eqref{equation:floatig_base_dynamics_with_contacts_state_space}, we used the \emph{forward dynamics} function $\operatorname{FD}(\cdot)$ to describe the dynamics of the floating-base system velocity $\Nu$.
We also mentioned that it can be calculated by inverting the following \ac{EoM}, already defined in Equation~\eqref{eq:eom_multibody_compact}:
%
\begin{equation}
@@ -30,7 +30,7 @@ \section{Floating-base rigid-body dynamics algorithms}
,
\end{equation}
%
-assuming the knowledge, at any integration step, of the floating-base version of the mass matrix $M(\Qu)$, the bias-vector $h(\Qu, \Nu)$, and the Jacobians matrix $\jac_\mathcal{L}(\Qu)$.
+by assuming the knowledge, at any integration step, of the floating-base version of the mass matrix $M(\Qu)$, the bias-vector $h(\Qu, \Nu)$, and the Jacobians matrix $\jac_\mathcal{L}(\Qu)$.
Calculating the forward dynamics from the \acp{EoM} involves the inversion of the mass matrix that, depending on the number of \acp{DoF}, could be computationally expensive when performed in a simulation loop.
Furthermore, depending on the selected integrator, it might be necessary to evaluate the system's dynamics multiple times for each simulation step, computing and inverting $M(\mathbf{q})$ at every evaluation.
@@ -83,7 +83,7 @@ \subsection{Implementation differences}
\item We use the notation of 6D quantities introduced in Chapter~\ref{ch:robot_modelling} that makes explicit the reference frame of 6D quantities like velocities, accelerations, and forces.
%
\item All the input and output quantities are expressed in inertial representation, with the exclusion of the mass matrix $M(\Qu)$ and the Jacobian $\jac_\mathcal{L}(\Qu)$ that are computed for efficiency reasons in body-fixed representation.
-Section~\ref{sec:change_of_base_variables} provides the transformations to apply for changing the velocity representation of the base link.
+Section~\ref{sec:change_of_base_variables} provides the transformations to apply for changing the velocity representation of the computed quantities.
%
\item We assume that the model $\mathcal{M}$ only contains joints with 1 \ac{DoF}.
As a consequence, all joint quantities $\Shape, \Shaped, \Shapedd, \Torques \in \realn^n$, where $n = N_B - 1$.
@@ -91,19 +91,19 @@ \subsection{Implementation differences}
\item If the model description contains fixed joints, we assume they can be removed from the kinematic tree through a \emph{lumping} process.
If links $P$ and $C$ are connected with a fixed joint, the lumping process replaces the link pair $(P, C)$ with an equivalent link $\tilde{P}$ associated to the frame of $P$, having an equivalent 6D inertia $\masssix[\tilde{P}]_{\tilde{P}} = \masssix[P]_P + \transfor[P]^C \masssix[C]_C \transvel[C]_P$.
%
-\item The floating-base \ac{RNEA} implementation implemented in \textcite[Section 9.5]{featherstone_rigid_2008} is not a proper $\operatorname{ID}$ function as it was defined in Definition~\ref{def:id}, since it treats the base acceleration as an output instead of being an input.
+\item The floating-base \ac{RNEA} implemented in \textcite[Section 9.5]{featherstone_rigid_2008} is not a proper $\operatorname{ID}$ function as defined in Definition~\ref{def:id}, since it treats the base acceleration as an output instead of an input.
In other words, the reference $\operatorname{FD}$ and $\operatorname{ID}$ are not inverse functions.
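As a parenthetical illustration of the naive route mentioned in this hunk, the dense-algebra inversion of the \ac{EoM} can be sketched as follows; all names are hypothetical, and the propagation-based algorithms discussed in this chapter avoid forming and inverting $M(\Qu)$ explicitly:
\begin{verbatim}
import numpy as np

def fd_from_eom(M, h, tau, J_L, f_ext):
    # Solve M(q) nu_dot = S^T tau + J_L^T f_ext - h(q, nu) for nu_dot,
    # where S = [0 I] selects the actuated (joint) part of nu.
    n = tau.size
    S = np.hstack([np.zeros((n, 6)), np.eye(n)])
    rhs = S.T @ tau + J_L.T @ f_ext - h
    return np.linalg.solve(M, rhs)  # prefer solve() over an explicit inverse

# Toy numbers: 3 joints and a single 6D external wrench.
rng = np.random.default_rng(0)
nu_dot = fd_from_eom(
    M=np.eye(9) + 0.1 * np.diag(rng.random(9)),  # placeholder s.p.d. mass matrix
    h=rng.standard_normal(9),
    tau=rng.standard_normal(3),
    J_L=rng.standard_normal((6, 9)),
    f_ext=rng.standard_normal(6),
)
\end{verbatim}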
In fact, the reference implementation is presented as a hybrid-dynamics problem, where the base acceleration is computed through a forward pass.
Our \acs{RNEA} implementation instead accepts as additional input the acceleration of the base $\accsix[W]_{W,B}$, which forms the first six elements of the generalized acceleration $\Nud[W]$.
We assume this acceleration to be provided externally, either measured or estimated when applied on real robots.
-Our algorithm returns the 6D force $\forcesix[W]_B$ that, when exerted on the base link together with the optional external forces, produces the provided acceleration.
+Our algorithm returns the 6D force $\forcesix[W]_B$ that, when exerted on the base link together with the optional external forces, produces the provided base acceleration.
The effects of gravity are handled internally.
%
\end{enumerate}
\subsection{Model specification}
-The information about the kinematics, the inertial properties of all links, and the joint types, are included in the model specification $\mathcal{M}$:
+The information about the kinematics, the inertial properties of all links, and the joint types is included in the model specification $\mathcal{M}$\footnote{All information included in the model $\mathcal{M}$ could be directly computed from model descriptions like \acs{SDF}, \acs{URDF}, \acs{MJCF}, \etc}:
%
\begin{enumerate}
%
@@ -121,11 +121,11 @@ \subsection{Model specification}
%
\item The 6D inertia of all links $\masssix_L$, expressed in link frame.
%
-\item For each joint $i$, connecting the link pair $(\lambda(i), i)$, $\mathcal{M}$ includes the velocity transformation $\transvel[\operatorname{pre}(i)]_{\lambda(i)}$ that locates the predecessor frame $\operatorname{pre}(i)$ of joint $i$ from the frame $\lambda(i)$ of its parent link.
+\item For each joint $i$, connecting the link pair $(\lambda(i), i)$, $\mathcal{M}$ includes the velocity transformation $\transvel[\operatorname{pre}(i)]_{\lambda(i)}$ that locates the predecessor frame $\operatorname{pre}(i)$ of joint $i$ from the frame $\lambda(i)$ of its parent link (refer to Figure~\ref{fig:joint_model} for a visual reference).
%
-\item The type of all joints, retrieved through the \spacedlowsmallcaps{Jtype} function from the joint index.
+\item The type of all joints, retrieved through the $\text{\spacedlowsmallcaps{Jtype}}(\cdot)$ function from the joint index.
%
-\item A function \spacedlowsmallcaps{Jcalc} accepting a joint type and a joint position $\shape \in \realn$ that returns the motion subspace $\subspace$ and the joint velocity transform $\transvel[\operatorname{suc}(\cdot)]_{\operatorname{pre}(\cdot)}$ corresponding to the given joint position.
+\item A function $\text{\spacedlowsmallcaps{Jcalc}}(\cdot)$ accepting a joint type and a joint position $\shape \in \realn$ that returns the motion subspace $\subspace$ and the joint velocity transform $\transvel[\operatorname{suc}(\cdot)]_{\operatorname{pre}(\cdot)}$ corresponding to the given joint position (refer to Figure~\ref{fig:joint_model} for a visual reference).
%
\end{enumerate}
%
@@ -147,7 +147,7 @@ \subsection{Remarks}
Regarding joint accelerations, we assume the presence of joints with only 1~\ac{DoF} and a constant motion subspace, resulting in Equation~\eqref{eq:joint_model_acceleration_propagation} and the considerations reported in Definition~\ref{definition:propagation_accelerations}.
%
\item We use 1-based indexing for vectors and matrices.
-This means that, if $A \in \realn^{n \times m}$, the top-left element is $A_{(1, 1)}$ and the bottom-right $A_{(m,m)}$. +This means that, if $A \in \realn^{n \times m}$, the top-left element is $A_{(1, 1)}$ and the bottom-right $A_{(n,m)}$. Selectors of multiple elements use the $:$ delimiter, for example $A_{(1:3,1:3)}$ is the top-left $3 \times 3$ block, and $A_{(1:3,1)}$ its first column. Note that, while this notation simplifies pseudocode, it requires a careful implementation with 0-based array libraries. % @@ -161,7 +161,7 @@ \subsection{CRBA: Composite Rigid Body Algorithm} M_B(\Shape) = \begin{bmatrix} \masssix[B](\Shape) & F(\Shape) \\ - F^T(\Shape) & H(\Shape) + F^\top(\Shape) & H(\Shape) \end{bmatrix} \in \realn^{(6+n)\times(6+n)} . @@ -209,29 +209,29 @@ \subsection{CRBA: Composite Rigid Body Algorithm} \State $M = \boldsymbol{0}_{(6+n)\times(6+n)}$ -\State $M_{(1:6, 1:6)} = \masssix^c_0$ -\Comment{Compute $M_{bb}$} - \For{$i = N_B^{\mathcal{M}} \textbf{ to } 1$} \State $\masssix^c_{\lambda(i)} = \masssix^c_{\lambda(i)} + \transvel[i]_{\lambda(i)}^\top \masssix^c_i \transvel[i]_{\lambda(i)}$ \LComment{Compute $M_{jj}$} \State $\mathbf{F}_i = \masssix^c_i \mathbf{S}_i$ - \State $M_{(i+6,i+6)} = \mathbf{S}_i^T \mathbf{F}_i$ + \State $M_{(i+6,i+6)} = \mathbf{S}_i^\top \mathbf{F}_i$ \State $j = i$ \While{$\lambda(j) \neq 0$} - \State $\mathbf{F}_i = \transvel[j]_{\lambda(j)}^T \mathbf{F}_i$ + \State $\mathbf{F}_i = \transvel[j]_{\lambda(j)}^\top \mathbf{F}_i$ \State $j = \lambda(j)$ - \State $M_{(i+6,j+6)} = \mathbf{F}^T_i \mathbf{S}_j$ + \State $M_{(i+6,j+6)} = \mathbf{F}^\top_i \mathbf{S}_j$ \State $M_{(j+6,i+6)} = M_{(i+6,j+6)}$ \EndWhile \LComment{Compute $M_{jb}$ and $M_{bj}$} \State $\mathbf{F}_i = \transvel[j]_0^\top \mathbf{F}_i$ - \State $M_{(i+6, 1:6)} = \mathbf{F}_i^T$ + \State $M_{(i+6, 1:6)} = \mathbf{F}_i^\top$ \State $M_{(1:6, i+6)} = \mathbf{F}_i$ \EndFor +\State $M_{(1:6, 1:6)} = \masssix^c_0$ +\Comment{Compute $M_{bb}$} + \State \textbf{output} $M$ \end{algorithmic} @@ -603,7 +603,7 @@ \subsection{Soft-contacts Algorithm} \item $K, D \in \realn^+$: the parameters of the spring-damper model used for both the normal and tangential force calculation; \item $\mu \in \realn^+$: the static friction coefficient of the contact point; \item $\mathcal{H}: \realn^2 \mapsto \realn$: a function returning the terrain height $z_T$ at given $(x, y)$ coordinates; - \item $\mathcal{S}: \realn^2 \mapsto \realn^3$: a function returning the normal of the surface $\hat{\mathbf{n}}$ at given $(x, y)$ coordinates; + \item $\mathcal{S}: \realn^2 \mapsto \realn^3$: a function returning the normal of the surface $\hat{\mathbf{n}}$ at given $(x, y)$ coordinates\footnote{Under the assumption of smooth terrain, an approximation of $\mathcal{S}$ could be calculated from $\mathcal{H}$, \ie $\mathcal{S}(x, y) = f\left(\mathcal{H}(x, y)\right)$}; \end{itemize} % and provides the following outputs: @@ -626,6 +626,7 @@ \subsection{Soft-contacts Algorithm} \State $x_{cp}, y_{cp}, z_{cp} = \pos[W]_{cp}$ \State $z_T = \mathcal{H}(x_{cp}, y_{cp})$ \State ${}^W \dot{\mathbf{u}} = -(K / D) {}^W \mathbf{u}$ +\Comment{Material relaxation dynamics} \State $\forcesix[W]_{cp} = \zeros_6$ \If{$z_{cp} < z_T$} @@ -666,16 +667,17 @@ \subsection{Soft-contacts Algorithm} \end{algorithmic} \end{algorithm} +\pagebreak \section{Jaxsim} \label{section:jaxsim} -The previous sections described how we can model and simulate a floating-base robot locomoting in an environment efficiently compute relevant quantities useful, 
for example, for model-based control.
+The previous sections described how we can model and simulate a floating-base robot locomoting in an environment, and efficiently compute relevant quantities useful, for example, for model-based control.
For robot-learning applications, and particularly the domain of \ac{RL}, the execution of the dynamics on \ac{CPU} often represents the major bottleneck of the training pipeline.
We can identify two principal sources affecting performance in such a setup:
\begin{enumerate}
-    \item \acp{CPU} can only run concurrently albeit with great speed a small number of threads, that usually match the double of their physical cores;
-    \item When a training pipeline exploits \acp{GPU} for optimising its function approximators (in the form of \acp{NN}, for example), the data sampled from the simulation has to be moved from the \ac{CPU} to the \ac{GPU}, incurring in overheads related to the data transport.
+    \item \acp{CPU} can only run concurrently --albeit with great speed-- a small number of threads, usually twice the number of their physical cores;
+    \item When a training pipeline exploits \acp{GPU} for optimising its function approximators (in the form of \acp{NN}, for example), the data sampled from the simulation has to be moved from the \ac{CPU} to the \ac{GPU}, incurring overheads related to the data transport.
\end{enumerate}
In this section, we describe how to overcome these two limitations by introducing \jaxsim, a highly scalable rigid-body simulator of floating-base systems for robot locomotion research.
@@ -690,16 +692,16 @@ \subsection{Features}
\begin{itemize}
    \item Physics engine implemented in reduced coordinates;
-    \item Forward Euler and \ac{RK4} integrators;
-    \item Collision detection between points associated with the model's collision shapes and a customisable terrain surface;
-    \item Continuous soft-contacts model computing interaction forces without introducing friction cone approximations;
+    \item Forward Euler, Semi-implicit Euler, and \ac{RK4} integrators;
+    \item Collision detection between points associated with the model's collision shapes and a customisable smooth terrain surface;
+    \item Continuous soft-contacts model to compute interaction forces without introducing friction cone approximations;
    \item Support of the complete set of links' inertial parameters;
    \item Support of revolute, prismatic, and fixed joints;
-    \item Support of \ac{SDF} and \ac{URDF} model descriptions;
+    \item Support of \ac{SDF} and \ac{URDF} model descriptions\footnote{\url{https://github.com/ami-iit/rod}};
    \item Implemented in plain Python using the \jax framework for fast development and readability;
    \item Possibility to maximise runtime performance by \ac{JIT}-compiling Python code and executing physics transparently on \acp{CPU}, \acp{GPU}, and \acp{TPU};
    \item Seamless integration with \jax's auto-vectorization capability for parallelizing simulation steps on hardware accelerators;
-    \item High-level API for computing model-based kinematics and dynamics quantities based on the algorithms presented in Section~\ref{section:fb_rbda}.
+    \item High-level API for computing model-based kinematics and dynamics quantities based on the algorithms presented in Section~\ref{section:fb_rbda} and notation introduced in Chapter~\ref{ch:robot_modelling}.
\end{itemize} \begin{remark*}[\ac{AD} support] @@ -727,7 +729,7 @@ \subsection{Features} \textbf{Tiny Differentiable Simulator} \parencite{heiden_neuralsim_2021} & \cpp & Reduced & \ck & \ck & & \ck & \ck & \ck & & \ck \\ \textbf{Nimble Physics} \hspace{1.5cm} \textcite{werling_fast_2021} & \cpp & Reduced & \ck & & & \ck & \ck & \ck & \ck & \ck \\ \textbf{Nvidia ISAAC} \hspace{1cm} \textcite{makoviychuk_isaac_2021} & \cpp & Reduced & \ck & \ck & & & \ck & \ck & \ck & \\ - \textbf{Dojo} \hspace{3cm} \textcite{howell_dojo_2022} & Julia & Maximal & \ck & & & \ck & \ck & & & \ck \\ + \textbf{Dojo} \hspace{3.5cm} \textcite{howell_dojo_2022} & Julia & Maximal & \ck & & & \ck & \ck & & & \ck \\ \textbf{Brax} \hspace{3cm} \textcite{freeman_brax_2021} & Python & Maximal & \ck & \ck & \ck & \ck & \ck & \ck & & \ck \\ \textbf{\jaxsim} & Python & Reduced & \ck & \ck & \ck & $[*]$ & \ck & & \ck & \ck \\ \bottomrule @@ -788,7 +790,7 @@ \subsubsection{Accuracy} In the first phase, the model is actuated for 1 second with random torques starting from a configuration with zero velocity and, therefore, zero momentum. The simulation is performed with all lossy components (like joint friction) disabled, therefore, due to momentum conservation, the momentum should remain zero over the entire horizon. In the second phase, the model evolves for 100 seconds without actuation from the configuration reached in the previous phase. -Also in this case, not having any lossy effect, the total mechanical energy of the system should remain constant over the entire horizon due to energy conservation. +Also in this case, since there is no loss, the total mechanical energy of the system (\ie the sum of the kinetic and potential energies defined in Equation~\eqref{eq:kinetic_and_potential_energies}) should remain constant over the entire horizon due to energy conservation. We perform this experiment using the model of the iCub robot with a configuration characterised by 23 of its \acp{DoF}. The experiment is executed at different integration step sizes and with both the forward Euler and \ac{RK4} integrators. @@ -800,8 +802,8 @@ \subsubsection{Accuracy} The results of the momentum conservation and energy conservation experiments are shown, respectively, in Figure~\ref{fig:jaxsim_conservation_momentum} and Figure~\ref{fig:jaxsim_conservation_energy}. In the results of the first phase, it can be seen that in almost all configurations the 6D momentum drifts from its initial value. -Particularly, all configurations show a more substantial drift in conjunction with increasing the integration step size. -The configuration of \jaxsim with the \ac{RK4} integrator can outperform the others, showing acceptable drifts with steps below $20$~ms. +Particularly, all configurations show a more substantial drift in conjunction with larger integration steps. +The configuration of \jaxsim with the \ac{RK4} integrator can outperform the other ones, showing acceptable drifts with steps below $20$~ms. Considering as acceptable a drift of $0.1$\% after $1$~s of simulation, also the forward Euler integration scheme stepping at $0.001$~s falls within this limit. In the results of the second phase, a similar drift could be noticed in all the tested configurations. @@ -835,15 +837,15 @@ \subsubsection{Performance} We compute the total time necessary to compute the output of the \ac{CRBA}, \ac{RNEA}, and \ac{ABA} algorithms. 
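The timing protocol used for these measurements (a warm-up call to trigger \ac{JIT} compilation, then averaging over repeated executions) can be sketched as follows for a generic jitted function; the function below is an illustrative stand-in, not one of the benchmarked algorithms:
\begin{verbatim}
import time
import jax
import jax.numpy as jnp

@jax.jit
def dummy_algo(q):
    # Placeholder for a jitted rigid-body computation.
    return jnp.outer(q, q) + jnp.eye(q.size)

q = jnp.zeros(23)
dummy_algo(q).block_until_ready()  # warm-up: triggers JIT compilation

t0 = time.perf_counter()
for _ in range(1000):
    out = dummy_algo(q)
out.block_until_ready()            # wait for the asynchronous dispatch to finish
mean_us = (time.perf_counter() - t0) / 1000 * 1e6
print(f"mean runtime: {mean_us:.1f} us")
\end{verbatim}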
To assess how much the execution time is affected by the topology of the simulated robot, we consider three different robot models with an increasing number of \acp{DoF}.
In particular, we benchmark the 9~\acp{DoF} fixed-base Panda manipulator from Franka Emika, the 12~\acp{DoF} quadruped ANYmal C from ANYbotics, and the humanoid iCub from IIT.
-Before running any test with \jaxsim, we call each algorithm one time so that it can get \ac{JIT} compiled since in this experiment we are interested in their runtime performance.
-For each bar in the plot, we first compute the mean time taken by a single run of 1000 executions of the algorithms, and then report the average over 10 runs.
+Before running any test with \jaxsim, we call each algorithm once so that it can get \ac{JIT} compiled, since in this experiment we are interested in their runtime performance rather than compilation time.
+For each bar in the plot, we first compute the mean time taken by a single run of 1000 executions of the algorithms, and then report the average over 10 of these runs.
Figure~\ref{fig:jaxsim_benchmark_algos} shows the resulting mean, where the variance of the 10 runs has been omitted since it is negligible for all algorithms.
The results are comparable for all three tested robot models.
The execution times of the \jaxsim algorithms compared to the implementation of Pinocchio are about 10 times higher when executed on \ac{CPU}, and 100 times higher when executed on \ac{GPU}.
These numbers are expected since Pinocchio algorithms are implemented entirely in \cpp and have been optimised for almost a decade.
-For all three robot models, the \jaxsim algorithms executed on \ac{CPU} do not exceed $250~\mu s$, making them compatible with a real-time loop running at a target rate of 1~kHz, in which a model-based controller might have to compute the mass matrix $M(\Qu)$ through \ac{CRBA} and the bias forces $h(\Qu, \Nu)$ through \ac{RNEA}.
-Instead, the \ac{GPU} execution of a single instance of the algorithms exceeds 1~ms in most cases, making it incompatible with real-time usage.
+Regardless, for all three robot models, the \jaxsim algorithms executed on \ac{CPU} do not exceed $250~\mu s$, making them compatible with a real-time loop running at a target rate of 1~kHz, in which a model-based controller might have to compute the mass matrix $M(\Qu)$ through \ac{CRBA} and the bias forces $h(\Qu, \Nu)$ through \ac{RNEA}.
+Instead, the \ac{GPU} execution of a single instance of the algorithms exceeds 1~ms in most cases, making them incompatible with real-time usage.
\begin{figure}
\centering
@@ -869,19 +871,19 @@ \subsubsection{Scalability}
In this test, we evaluate the scalability of \jaxsim by increasing the number of parallel instances, exploiting the auto-vectorization capabilities of \jax.
Instead of testing parallel calls of the \acp{RBDA}, we consider a more practical scenario of a 1~ms simulation step with the forward Euler integration scheme.
We benchmark the performance on the 23~\acp{DoF} model of the iCub humanoid with 8 collidable points corresponding to the vertices of the two boxes that model its feet collision shapes.
-We compute the equivalent \ac{RTF} of the parallel integration, which consists of the ratio between the total simulated time and the time it took to compute it.
+We compute the \emph{equivalent} \ac{RTF} of the parallel integration, which consists of the ratio between the total simulated time and the time it took to compute it.
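In code, this measurement can be sketched as follows, with a toy step function standing in for the physics step and \texttt{jax.vmap} providing the auto-vectorization over parallel models:
\begin{verbatim}
import time
import jax
import jax.numpy as jnp

DT = 0.001  # 1 ms integration step

def step(x):
    # Placeholder dynamics; the real engine advances the full physics state.
    return x + DT * jnp.sin(x)

batched_step = jax.jit(jax.vmap(step))  # one call advances all models at once

n_models = 512
x = jnp.zeros((n_models, 64))           # toy state, one row per model
batched_step(x).block_until_ready()     # warm-up / compilation

t0 = time.perf_counter()
x = batched_step(x)
x.block_until_ready()
wall = time.perf_counter() - t0

rtf_equivalent = (n_models * DT) / wall  # total simulated time / wall-clock time
print(f"equivalent RTF: {rtf_equivalent:.1f}")
\end{verbatim}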
For example, if the parallel integration of 10 models takes 1~ms, the equivalent \ac{RTF} is 10. A higher \ac{RTF} corresponds to a better sampling efficiency. This test is performed on the same laptop as the previous tests, and also on a workstation with a more powerful \ac{GPU}, whose specifications are reported in Table~\ref{tab:specifications}. -For each point in the plot, we first compute the mean time of a single run over 100 integration steps, and then report the average over 10 runs. +For each point in the plot, we first compute the mean time of a single run over 100 integration steps, and then report the average over 10 of these runs. The results of the \ac{CPU} and \ac{GPU} executions are shown in Figure~\ref{fig:jaxsim_benchmark_parallel}, where the variance of the 10 runs has been omitted since it is negligible for all executions. On the laptop setup, the integration step on \ac{CPU} starts with a \ac{RTF} greater than 1 already with a single instance. -A single \ac{CPU} core is able to reach a \ac{RTF} of about 5 with 16 parallel models, showing some benefits of parallel integration also on this hardware. +A single \ac{CPU} core is able to reach a \ac{RTF} of about 5 with 16 parallel models, showing some benefits of parallel integration also on this type of hardware. With more than 16 models, increasing the number of models does not give any benefit to the equivalent \ac{RTF} as the execution time grows linearly with the number of models. The integration step of \ac{GPU} starts with a lower \ac{RTF} of 0.11. -This effect is expected since a single \ac{GPU} core is less powerful than a \ac{CPU} core. -However, the parallel integration on \ac{GPU} runs without overhead until 128 models. +This effect is expected since a single \ac{GPU} core is typically less powerful than a \ac{CPU} core. +However, the parallel integration on \ac{GPU} is able to scale without showing any overhead until 128 models. The equivalent \ac{RTF} on \ac{GPU} peaks at a value of about $19$ between 512 and 1024 parallel models. On this hardware, the 512-1024 range is justified by the number of CUDA cores equal to 640. @@ -900,24 +902,25 @@ \subsubsection{Scalability} \label{fig:jaxsim_benchmark_parallel} \end{figure} +\pagebreak \section{Conclusions} In this chapter, we proposed \jaxsim, a new physics engine capable of executing multibody simulations on modern hardware accelerators. \jaxsim simulates the dynamics of free-floating mechanical systems through a reduced coordinates approach, therefore guaranteeing that joint constraints are never violated. -Regarding the interaction with the environment, it is currently capable of detecting collisions between a smooth ground surface not necessarily planar, and points corresponding to the collision shapes of the links. +Regarding the interaction with the environment, it is currently capable of detecting collisions between a smooth ground surface not necessarily planar and points belonging to the collision shapes of the links. Being implemented in reduced coordinates, it allows to efficiently compute all terms of the \ac{EoM} that are fundamental for model-based control at any simulated step. We have validated its integration accuracy, obtaining either comparable or better results than Gazebo Sim, depending on the adopted integration scheme. -We also benchmarked the \acp{RBDA} performance, finding that \ac{JIT}-compiled Python code on \ac{CPU} runs 10 times slower compared to a state-of-the-art \cpp implementation, remaining compatible with real-time usage. 
-However, the best characteristics of \jaxsim emerge when modern hardware accelerators are exploited in highly parallel computations. +We also benchmarked the \acp{RBDA} performance, finding that \ac{JIT}-compiled Python code on \ac{CPU} runs 10 times slower compared to a state-of-the-art \cpp implementation, remaining in any case compatible with real-time usage. +Nonetheless, the best characteristics of \jaxsim emerge when modern hardware accelerators are exploited in highly parallel computations. We have shown that it can reach a \ac{RTF} of 20 on a laptop and 200 on a workstation. Applications requiring sampling experience with high throughput such as those characterising \ac{RL} research, would benefit the most from these performances. Furthermore, generating physics data directly in the same device where function approximators are optimised would remove any overhead related to the data transfer from the \ac{CPU}. We intend to investigate these directions in future activities. -The physics engine still presents multiple limitations in this first version. +\jaxsim still presents multiple limitations in this first version. Firstly, the shown performances were obtained in a setting where no exteroception was necessary. Integrating basic rendering functionality would surely affect performance. -Furthermore, \jaxsim does not yet support collisions between links, limiting its adoption for robot manipulation. +Furthermore, collisions between links are not yet supported, limiting the adoption for robot manipulation. Finally, it does not yet allow enforcing additional dynamic constraints like joint limits, and closed kinematic chains. To conclude, although the automatic differentiation capability provided by the \jax framework has not yet been thoroughly tested with the \jaxsim algorithms, its combination with the smooth dynamics given by the contact-aware state-space representation opens up exciting research directions that will be explored in the future. diff --git a/Chapters/epilogue.tex b/Chapters/epilogue.tex index 961a962..d77f54f 100644 --- a/Chapters/epilogue.tex +++ b/Chapters/epilogue.tex @@ -17,7 +17,7 @@ \section*{Summary} The thesis continued with Part~\ref{part:contribution}, describing the contributions to knowledge of this thesis. The first two chapters analysed the challenges of creating robotic environments for \ac{RL} research for the aim of sampling experience. -Motivated by the fragmented state of the existing frameworks providing environments for robotics and the desire to develop environments that could run deterministically on either simulated or real-time runtimes without the need for major refactoring, Chapter~\ref{ch:rl_env_for_robotics} presented \scenario and Gym-Ignition. +Motivated by the fragmented state of the existing frameworks providing environments for robotics and the desire to develop environments that could run on either simulated or real-time runtimes without the need for major refactoring, Chapter~\ref{ch:rl_env_for_robotics} presented \scenario and Gym-Ignition. \scenario (\spacedlowsmallcaps{SCEN}e interf\spacedlowsmallcaps{A}ces for \spacedlowsmallcaps{R}obot \spacedlowsmallcaps{I}nput/\spacedlowsmallcaps{O}utput) provides unified interfaces for entities part of a scene like World and Model. Gym-Ignition, instead, provides abstraction layers of different components that structure a robotic environment, like the Task and the Runtime. 
The combination of the two projects enables a modular development of environments for robotics, where the decision-making logic could be implemented only once and executed transparently on all the supported runtimes, either simulated or in real-time.
@@ -28,7 +28,7 @@ \section*{Summary}
To conclude, the last chapters of Part~\ref{part:contribution} analysed the problem of the high computational cost associated with the sampling of synthetic data from rigid multibody simulations.
Motivated by the long time required for training the policy presented in Chapter~\ref{ch:learning_from_scratch}, and the increased interest of the robot learning community in offloading and parallelizing computation on hardware accelerators, we proposed a simulation architecture to maximise the sampling performance of simulated data for robot locomotion applications.
Towards this aim, in Chapter~\ref{ch:contact_aware_dynamics}, we introduced a continuous state-space representation modelling the contact-aware dynamics of floating-base robots.
-Assuming the knowledge of the terrain surface's smooth profile, and formulating a soft-contacts model to compute the interaction forces for point-surface collisions, we obtained a continuous \acs{ODE} system that can be integrated numerically to simulate its dynamical evolution over time.
+Assuming the knowledge of the terrain surface's smooth profile, and utilising a soft-contacts model to compute the interaction forces for point-surface collisions, we obtained a continuous \acs{ODE} system that can be integrated numerically to simulate its dynamical evolution over time.
Finally, exploiting this representation, in Chapter~\ref{ch:scaling_rigid_body_simulations} we presented \jaxsim, a new physics engine capable of executing simulations of floating-base robots entirely on hardware accelerators like \acsp{GPU} and \acsp{TPU}.
We adapted widely used \aclp{RBDA} to run in this context, formulating them with the notation introduced in Chapter~\ref{ch:robot_modelling}.
Their definition was unified to be applicable also to fixed-base robots.
@@ -40,19 +40,19 @@ \section*{Discussions, Limitations, and Future work}
\subsection*{\autoref{ch:rl_env_for_robotics}: Reinforcement Learning Environments for Robotics}
-The proposed framework, composed by \scenario and Gym-Ignition, was initially aimed to be applicable to both simulated and real-world robots.
-We started the development with the simulation backend in late 2018, when we decided to leverage the Gazebo Sim general-purpose simulator that, at that time, did not even reach its first major release.
+The proposed framework, composed of \scenario and Gym-Ignition, was initially aimed to be applicable to both simulated and real-world robots.
+We started the development of the simulation backend in late 2018, when we decided to leverage the Gazebo Sim general-purpose simulator that, at that time, had not even reached its first major release.
Considering the status of \ac{RL} research for robotics in the same period, which was mainly performed either in PyBullet or the closed-source Mujoco, we saw the potential to obtain a complete, open-source, and actively maintained simulator on which to base our research.
The possibility to extend the simulator through custom plugins, the support of integrating third-party physics engines, the new architecture allowing to use it as a library through programmatic calls, and the knowledge and infrastructure already developed for its predecessor Gazebo Classic, were all appealing to the direction we envisioned for our research. The Gazebo \scenario backend enabled us to experiment with our first \ac{RL} problem described in Chapter~\ref{ch:learning_from_scratch}, whose concluding details will be discussed in the next section. -The framework remains a valid alternative to the other freely available online, especially nowadays since Gazebo Sim reached almost feature parity with its predecessor. -Upstram activities are currently ongoing to bridge Gazebo Sim with Nvidia Omniverse and the Isaac simulator, expecting significant advances particularly regarding rendering capabilities. -Considering the limitations of newer solutions like those presented in Chapter~\ref{ch:scaling_rigid_body_simulations}, general-purpose simulators will remain the standard when perception is required for the aim of sampling synthetic data for \ac{RL}-based robotics research. +The framework remains a valid alternative to the other options freely available online, especially nowadays since Gazebo Sim reached almost feature parity with its predecessor. +Upstream activities are currently ongoing to bridge Gazebo Sim with Nvidia Omniverse and the Isaac simulator, expecting significant advances particularly regarding rendering capabilities. +Considering the limitations of newer domain-specific solutions like those presented in Chapter~\ref{ch:scaling_rigid_body_simulations}, general-purpose simulators will remain the standard when perception is required for the aim of sampling synthetic data for \ac{RL}-based robotics research. Nonetheless, our Gazebo \scenario backend still has limitations. Our \acp{API} do not yet support the sensor interfaces of Gazebo Sim, therefore their data can only be gathered from the network, a solution that does not ensure the reproducibility of sampled data. Considering the generic \scenario project, instead, after we refocused the research project to rely mainly on simulations, the activities of a real-time backend to communicate with YARP-based robots like the iCub remained on hold since. The possibility of running the same environment implementation on both simulated and real-world robots is an interesting solution for sim-to-real research, and it will be considered for future activities. -From the Gym-Ignition point of view, instead, limitations are less impacting. +From the Gym-Ignition point of view, instead, limitations are less impactful. The framework is generic enough for developing most categories of robotic environments. What is still missing is a collection of canonical examples and benchmarks similar to the robotic environments provided by OpenAI. Access to a new collection would help the community to have a more diverse range of environments that, thanks to Gazebo Sim, could run on any physics engine supported by the simulator. @@ -64,16 +64,16 @@ \subsection*{\autoref{ch:learning_from_scratch}: Learning from scratch exploitin Starting from a simple reward structure, we tried to address undesired behaviour by iteratively adding new terms, until its final form. We decided to control most of the \acp{DoF} of the iCub robot, acknowledging that the kinematics is highly redundant, and the policy optimisation could have stalled to local optima. 
Parameter tuning is paramount for these applications, and details are too often left out from research discussions.
-For tuning our reward function (for each term, its weights and kernel parameters), we adopted a heuristic method in which we analysed the learning curves of individual terms of the reward, tuning the sensitivity of the corresponding kernel if the algorithm was not improving its performance, and then balancing the gain to obtain the desired trade-off among all the reward terms.
+For tuning our reward function (for each term, its weights and kernel parameters), we adopted a heuristic method in which we analysed the learning curves of individual terms of the reward, tuning the sensitivity of the corresponding kernel if the algorithm was not improving its performance, and then balancing the weight to obtain the desired trade-off among all the reward terms.
Nonetheless, each experiment was days long on powerful workstations, and parameter tuning resulted in a long and, at some point, quite strenuous process.
Much work also went into the training infrastructure, leveraging a cluster of machines and implementing the proper experiment deployment, with logging and synchronisation support.
-Beyond the training process, also our results show limitations, particularly when possible future sim-to-real applications are considered.
+Beyond the training process, our results also show limitations, particularly when possible future sim-to-real applications are considered.
Our control architecture relies on a policy providing velocity references that are integrated and given as inputs to independent \pid joint controllers, generating the final joint torques actuated by the simulator.
Beyond being difficult to tune with performance comparable to those of the real-world counterpart, the low-level \pid controllers present a trade-off between accuracy and compliance.
In our experiment, the position \pid controllers resulted in a stiff robot that, together with rigid contacts, limited the emergence of a natural, smooth, and more human-like behaviour.
We think two different directions can be considered for improvements.
-As first direction, \pid controllers could be replaced by a single whole-body controller that typically exploits the information of the model dynamics.
+As a first direction, \pid controllers could be replaced by a single whole-body controller that typically exploits the information of the model dynamics.
It would consider the entire robot as a whole and possibly reduce the differences from real robots.
However, such controllers are more complex to design, more computationally expensive, and making them work reliably on all the robot configurations allowed by the state space of the \ac{MDP} is difficult.
A second direction could consider different policy outputs (joint torques, velocities, positions, \etc), which means a different nature of the high-level trajectory.
@@ -81,17 +81,17 @@ \subsection*{\autoref{ch:learning_from_scratch}: Learning from scratch exploitin
In both cases, a change in policy output and its corresponding action space could have major effects on exploration.
Common \ac{RL} algorithms for continuous actions usually learn a distribution from which actions are sampled during the training process for exploration purposes.
The typical choice is a multivariate Gaussian distribution, but it does not play well with bounded spaces.
-Other studies have found other distributions like the Beta~\parencite{chou_improving_2017} that behave better in this context.
+Other studies have identified distributions like the Beta~\parencite{chou_improving_2017} that might behave better in this context.

Studies investigating robot control with \ac{RL} could be considered part of the bigger umbrella of trajectory optimisation, in which applications like push-recovery and locomotion are instances.
-As reviewed in Section~\ref{sec:review_rl_robot_locomotion}, application targeting quadrupeds already managed to successfully target real robots.
+As reviewed in Section~\ref{sec:review_rl_robot_locomotion}, applications targeting quadrupeds already managed to successfully transfer to real robots.
The situation for real bipeds, instead, has not progressed noticeably since our research project started in 2018.
The gap between the latest simulated results~\parencite{peng_ase_2022} and the few targeting real-world robots~\parencite{castillo_robust_2021,li_reinforcement_2021,rodriguez_deepwalk_2021,bloesch_towards_2022} is still wide.
Common characteristics of these studies are either the usage of large simulations with techniques of domain randomization and imitation learning, or the usage of curriculum learning and the introduction of non-ideal effects in simulation like actuation delays.
In view of this discussion, for locomotion purposes, we think that imitation learning could provide a suitable trade-off between exploration guidance, avoidance of local minima, and learning stability.
-Future activities will focus of integrating novel motion generation techniques~\parencite{viceconte_adherent_2022} within \ac{RL} environments.
-Practitioners working in \ac{RL} applied to robotics must be aware that this field suffers of mostly all the challenges that have been identified by the community~\parencite{dulac-arnold_empirical_2021}, and can learn from their successes and failures~\parencite{ibarz_how_2021}.
+Future activities will focus on integrating novel motion generation techniques~\parencite{viceconte_adherent_2022} within \ac{RL} environments.
+Practitioners working in \ac{RL} applied to robotics must be aware that this field suffers from most of the challenges that have been identified by the community~\parencite{dulac-arnold_empirical_2021}, and can learn from their successes and failures~\parencite{ibarz_how_2021}.
\subsection*{\autoref{ch:contact_aware_dynamics}: Contact-aware Multibody Dynamics}
@@ -105,7 +105,7 @@ \subsection*{\autoref{ch:contact_aware_dynamics}: Contact-aware Multibody Dynami
Another direction of possible improvements concerns the integration schemes.
As shown in the benchmarks of Chapter~\ref{ch:scaling_rigid_body_simulations}, the forward Euler scheme is the fastest (and simplest) integrator, but its accuracy is not as good as what is achieved by the \ac{RK4} scheme.
Other common methods often used in similar physics engines, like semi-implicit and implicit schemes, could provide performance similar to \ac{RK4} at a cost comparable to forward Euler.
-Finally, the state-space representation, in its default form, does not allow to enforce bounds to state space variables.
+Finally, the state-space representation, in its default form, does not allow enforcing bounds on state-space variables, which would be useful, for example, to enforce joint position limits.
Common workarounds involve introducing penalty-based continuous forces~\parencite{xu_accelerated_2022}, or exogenous variables mapping the unbounded joint positions to a bounded range.
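One concrete shape the second workaround could take is a smooth reparameterisation of the joint coordinates; the following tanh-based mapping is only an illustration of the idea, not something currently implemented in the proposed engine:
\begin{verbatim}
import numpy as np

def to_bounded(s_tilde, s_min, s_max):
    # Map an unbounded exogenous variable to the admissible joint range.
    return s_min + 0.5 * (s_max - s_min) * (np.tanh(s_tilde) + 1.0)

def to_unbounded(s, s_min, s_max, eps=1e-9):
    # Inverse mapping, useful to initialise the exogenous state.
    x = 2.0 * (s - s_min) / (s_max - s_min) - 1.0
    return np.arctanh(np.clip(x, -1.0 + eps, 1.0 - eps))

# Positions saturate smoothly within [-1.5, 1.5] rad.
s = to_bounded(np.array([0.0, 3.0, -3.0]), s_min=-1.5, s_max=1.5)
\end{verbatim}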
Future work will address all these limitations, extending the supported use-cases for the physics engine proposed in Chapter~\ref{ch:scaling_rigid_body_simulations}. @@ -116,8 +116,8 @@ \subsection*{\autoref{ch:scaling_rigid_body_simulations}: Scaling Rigid Body Sim Its key features are the possibility to compile kernels developed in Python with a \ac{JIT} approach, auto-vectorization support, NumPy compatibility, and high-order \ac{AD} support. All these features are inherited by \jaxsim, whose algorithms can be executed with all the benefits of the underlying technology. \jaxsim, however, also inherits the limitations of \jax. -The need to compile code at its first execution could take many minutes, depending on the complexity of the logic and the active hardware acceleration. -The \ac{GPU} and \ac{TPU} backends of \jax are much more optimised compared to the \ac{CPU} backend, that nonetheless, despite longer compiling time, is able to run code pretty fast compared to plain Python. +The need to compile code at its first execution could take several minutes, depending on the complexity of the logic and the active hardware acceleration. +The \ac{GPU} and \ac{TPU} backends of \jax are much more optimised compared to the \ac{CPU} backend, that nonetheless, despite longer compiling time, is able to run code faster than plain Python. We have not yet optimised our algorithms aggressively to improve compilation time, especially because we expect to see soon caching support of compiled code. Regarding scalability, our benchmarks considered two \acp{GPU} with 640 and 4608 CUDA cores. The potential of executing code on these types of hardware, when the problem permits parallelization, becomes year after year more appealing considering the technological progress. @@ -127,11 +127,11 @@ \subsection*{\autoref{ch:scaling_rigid_body_simulations}: Scaling Rigid Body Sim The activities to assess the support and implement \ac{AD} support are ongoing, and we expect they will enable us to start investigating all the new emerging methodologies involving differentiable simulations. Other activities planned for the near future involve the \ac{RL} stack built over \jaxsim. -The combination of environment interfacing with \jaxsim and \ac{RL} algorithms implemented in \jax would result in a single application whose data never leaves the hardware accelerator. +The combination of an environment interfacing with \jaxsim and \ac{RL} algorithms implemented in \jax would result in a single application whose data never leaves the hardware accelerator. Therefore, beyond the sampling performance of parallel simulations, the complete pipeline would also prevent the data transfer overhead that is always present when some computation has to happen on \acp{CPU}. -We already implemented a \jax version of \ac{PPO} and tested on the canonical examples of inverted pendulum and cartpole swing-up, but the results are too preliminary and have not been included. +We already implemented a \jax version of \ac{PPO} and tested on the canonical examples of inverted pendulum and cartpole swing-up, but the results are too preliminary and have not been included in this thesis. Future work will continue this activity, extending the investigation to contact-rich locomotion problems. -Finally, we would like to embed these environments in Gym-Ignition, creating a new \jaxsim \scenario \acp{API}, so that all the benefits of future real-time backends could be applicable on \jaxsim experiments. 
+Finally, we would like to embed these environments in Gym-Ignition, creating a new \jaxsim \scenario component, so that all the benefits of future real-time backends could be applicable on \jaxsim experiments. Towards this goal, Gym-Ignition should switch to the upcoming functional version of \verb|gym.Env| that has been recently proposed upstream. \vfill @@ -143,5 +143,5 @@ \section*{Conclusions} We believe that, particularly in this domain, the infrastructure plays such an essential role to the extent that those possessing a large enough technological advantage would stand out with ease. We hope to have helped readers reaching this final paragraph understand the challenges and research directions that are still necessary to obtain the next generation of robots capable of seamlessly operating around us. Simulation technology is evolving rapidly. -We believe that future progress that will inevitably characterise the next decades is going to set aside the real breakthroughs that all robotic practitioners have been waiting for long time. +We believe that future progress that will inevitably characterise the next decades is going to set aside the real breakthroughs long awaited by all robotic practitioners. We can not wait to keep contributing with all the enthusiasm that characterised the activities carried out for this thesis. diff --git a/Chapters/prologue.tex b/Chapters/prologue.tex index d24e75f..94b0260 100644 --- a/Chapters/prologue.tex +++ b/Chapters/prologue.tex @@ -32,7 +32,7 @@ \chapter*{Prologue} Machine intelligence is among the most fascinating problems currently researched by our society. \ac{DL}, which consists of the recent combination of \ac{ML} with deep \acp{NN} recently enabled by the computational power of modern computing, revolutionised domains like computer vision and natural language processing. Under the assumption of large enough datasets, the most advanced algorithms belonging to the Supervised Learning family have been demonstrated to be capable of training systems that have often shown super-human performance. -One could think of applying similar techniques to the field of robotics to obtaining comparable success without realising the inherent challenges posed by this domain. +One could think of applying similar techniques to the field of robotics to obtain comparable success without realising the inherent challenges posed by this domain. Robots are physical systems. Generating datasets as large as those that characterise the most recent research in computer vision and language processing would either take too long or not even be possible, without even considering the wear and tear of the hardware and the operational and maintenance cost. Furthermore, regardless of the feasibility of data collection, robotics is an interactive domain, and data sampled from robots has a sequential conformation. @@ -44,18 +44,18 @@ \chapter*{Prologue} The transfer to robotics was inevitable. However, \ac{RL} methods learn sequentially following a process similar to trial-and-error. When applied to robotics, during their training phase, these methods could generate control actions that may damage either the robot or its surroundings. -For this reason, the accomplishments in the past decade of \ac{RL} applied to robotics have been demonstrated mainly in simulations rather than with real-world robots. +For this reason, the accomplishments in the past decade of \ac{RL} applied to robotics have been demonstrated mainly in simulation rather than with real-world robots. 
Generating synthetic data from simulators is cheap and effective, can be scaled to dozens or hundreds of concurrent instances, and there is no risk of damage. Undeniably, synthetic data can only capture physical effects that can be described through equations executed by a computer program, and similarly to other domains, also robotic simulations suffer from the trade-off between accuracy and speed. -Particularly, contact dynamics have always been challenging to capture with high fidelity, and usually simulators neglect non-ideal effects like motor dynamics, actuation delays, and backlash. -However, the past decades have shown that advances in computing hardware and software methodologies have always had a substantial impact on reducing the effects of this trade-off, also called \emph{reality gap}, and we believe that also research in robotic simulations will continue in this direction. +Particularly, contact dynamics has always been challenging to capture with high fidelity, and usually simulators neglect non-ideal effects like motor dynamics, actuation delays, and backlash. +However, the past decades have shown that advances in computing hardware and software methodologies have always had a substantial impact on reducing the effects of this trade-off --also called \emph{reality gap}-- and we believe that also research in robotic simulations will continue in this direction. In the 2010s, the \ac{RL} community has been particularly prolific, producing a considerable number of new algorithms. Despite their improved properties like increased sample efficiency, faster and more stable convergence, \etc they always rely on a given amount of trial-and-error experience, with no exception. If, on the one hand, better algorithms are lower-bounded by the least amount of data describing the decision-making logic to learn, on the other, the amount of synthetic data that simulators can generate has no theoretical upper bound. The rapid adoption of \ac{RL} in the robotics domain forced practitioners to use the simulation technology from either robotics or gaming available at that time, which was never optimised for maximising execution speed. -In fact, a single simulation instance executing in real-time has always been more than enough for the original applications. +In fact, a single simulation instance executing in real-time has always been more than enough for the original purpose. It is not uncommon to find \ac{RL} experiments that require days, weeks, or even months worth of data, giving those with access to large computational resources a significant advantage. The robot learning community is aware of this limitation to the extent that today's research on domain-specific simulators that can be massively scaled on modern hardware accelerators, not necessarily as fully-fledged as in the past, is surging. @@ -72,13 +72,13 @@ \chapter*{Prologue} \subsubsection{Part I: Background and Fundamentals} -This part introduces the reader to the fundamental concepts behind the contributions of this thesis, reviews the state-of-the-art of corresponding domains, and provides a detailed overview of the contributions to knowledge. +This part introduces the reader to the fundamental concepts behind the contributions of this thesis, reviews the state-of-the-art of corresponding domains, and provides a detailed overview of research output supporting the contributions to knowledge. \begin{itemize} \item Chapter~\ref{ch:introduction} introduces robotic simulators, defining their main components and properties. 
It also describes the enabling technologies that made the work of this thesis possible, such as the Gazebo Sim simulator and the \jax framework. \item Chapter~\ref{ch:robot_modelling} introduces the notation and the equations governing rigid multibody dynamics, and how relevant dynamics and kinematics quantities could be computed and propagated through the model of a robotic mechanical structure. \item Chapter~\ref{ch:reinforcement_learning} introduces the main concepts and notation of \ac{RL}. It formulates the structure of the \ac{RL} problem, describes the main families of algorithms that could be used to compute a solution, and develops the theory behind policy optimisation, reaching the formulation of the \acl{PPO} algorithm. - \item Chapter~\ref{ch:sota} reviews the state-of-the-art of the domains of \acl{RL} applied to robot locomotion, simulators for robot learning, and methodologies for push recovery. It also details open problems that characterise these domains, and how the contributions to knowledge of this thesis aims to solve them. + \item Chapter~\ref{ch:sota} reviews the state-of-the-art of the domains of \acl{RL} applied to robot locomotion, simulators for robot learning, and methodologies for push recovery. It also details open problems that characterise these domains, and how the contributions to knowledge of this thesis aim to solve them. \end{itemize} \subsubsection{Part II: Contributions} @@ -195,11 +195,11 @@ \section*{Software Projects} \vspace{3mm} \noindent -\textbf{\texttt{scenario}} is an abstraction layer providing \acp{API} to interact with simulated and real robots. The goal of the project is to allow developing high-level code that can target all the available implementations using the same function calls. Currently, it provides a complete implementation for interfacing with robots simulated using the Gazebo Sim general-purpose simulator. The \acp{API} are developed and available in \cpp. Python bindings are also provided. The library is open-source, released under the \emph{LGPL v2.1 or any later version}, and it is available at \url{https://github.com/robotology/gym-ignition/tree/master/scenario}. +\textbf{\texttt{scenario}} is an abstraction layer providing \acp{API} to interact with simulated and real robots. The goal of the project is to allow developing high-level code that can target all the available implementations using the same function calls. Currently, it provides a complete implementation for interfacing with robots simulated using the Gazebo Sim general-purpose simulator. The \acp{API} are developed and available in \cpp. Python bindings are also provided. The library is open-source, released under the \emph{LGPL v2.1 or any later version}, and it is available at \linebreak \url{https://github.com/robotology/gym-ignition/tree/master/scenario}. \vspace{3mm} \noindent -\textbf{\texttt{gym-ignition}} is a Python framework to create reproducible robotics environment for \ac{RL} research. It exposes an abstraction layer providing \acp{API} that enable the development of \ac{RL} environment compatible with \texttt{gym.Env}. The resulting environments, if they exploit \texttt{scenario}, become agnostic from setting in which they execute (either simulated or in real-time). The project also provides helpful utilities to compute common quantities commonly used by robotic environments, includes support of environment randomization, and handles the correct propagation of randomness. 
The library is open-source, released under the \emph{LGPL v2.1 or any later version}, and it is available at \url{https://github.com/robotology/gym-ignition}.
+\textbf{\texttt{gym-ignition}} is a Python framework to create reproducible robotics environments for \ac{RL} research. It exposes an abstraction layer providing \acp{API} that enable the development of \ac{RL} environments compatible with \texttt{gym.Env}. The resulting environments, if they exploit \texttt{scenario}, become agnostic from the setting in which they execute (either simulated or in real-time). The project also provides helpful utilities to compute quantities commonly used by robotic environments, includes support for environment randomization, and handles the correct propagation of randomness. The library is open-source, released under the \emph{LGPL v2.1 or any later version}, and it is available at \linebreak \url{https://github.com/robotology/gym-ignition}.

\vspace{3mm}
\noindent
@@ -207,4 +207,4 @@ \section*{Software Projects}

\vspace{3mm}
\noindent
-\textbf{\texttt{jaxsim}} is a scalable physics engine in reduced coordinates implemented with \jax. It is developed in Python and supports most of the \jax features, including \ac{JIT} compilation and auto-vectorization. Simulations can be executed on all the hardware supported by \jax, including \acp{CPU}, \acp{GPU}, and \acp{TPU}. The library is open-source, released under the \emph{BSD-3-Clause}, and it is available at \url{https://github.com/ami-iit/jaxsim}.
+\textbf{\texttt{jaxsim}} is a scalable physics engine in reduced coordinates implemented with \jax. It is developed in Python and supports most of the \jax features, including \ac{JIT} compilation and auto-vectorization. Simulations can be executed on all the hardware supported by \jax, including \acp{CPU}, \acp{GPU}, and \acp{TPU}. The library is open-source, released under the \emph{BSD-3-Clause}, and it is available at \linebreak \url{https://github.com/ami-iit/jaxsim}.
diff --git a/FrontBackmatter/abstract.tex b/FrontBackmatter/abstract.tex
index 6201a46..47f51b6 100644
--- a/FrontBackmatter/abstract.tex
+++ b/FrontBackmatter/abstract.tex
@@ -44,7 +44,7 @@
Such progress revolutionised domains like computer vision and language processing, showing performance previously out of reach.
One may think that results could transfer straightforwardly to other fields like robotics until realising the existence of domain-specific characteristics and limitations hindering the potential of these learning methods.
Generating enough data from real-world robots is often too expensive or not even possible to the desired scale.
-Data sampled from robots have a sequential nature, and not all families of learning algorithms are effective in this context.
+Data sampled from robots has a sequential nature, and not all families of learning algorithms are effective in this context.
Furthermore, most algorithms that excel in this sequential setting, such as those belonging to the \ac{RL} family, learn by a trial-and-error process, which could lead to trajectories that damage either the robots or their surroundings.
In this thesis, we attempt to answer the question, diff --git a/FrontBackmatter/contents.tex b/FrontBackmatter/contents.tex index ac6f3f3..2ae1d62 100644 --- a/FrontBackmatter/contents.tex +++ b/FrontBackmatter/contents.tex @@ -92,9 +92,11 @@ \acro{DS}{Double Support} \acro{EoM}{Equation of Motion} \acrodefplural{EoM}[EoMs]{Equations of Motion} + \acro{F/T}{Force/Torque} \acro{GAE}{Generalized Advantage Estimator} \acro{GPU}{Graphics Processing Unit} \acro{GUI}{Graphical User Interface} + \acro{HAL}{Hardware Abstraction Layer} \acro{IMU}{Inertial Measurement Unit} \acro{JIT}{Just-in-time} \acro{KL}{Kullback–Leibler} @@ -121,6 +123,7 @@ \acro{TPU}{Tensor Processing Unit} \acro{TRPO}{Trust Region Policy Optimization} \acro{URDF}{Unified Robot Description Format} + \acro{USD}{Universal Scene Description} \acro{XLA}{Accelerated Linear Algebra} \acro{ZMP}{Zero Moment Point} \end{acronym} From d984a2af0f53dcaf9875f268ec631f4187ab9b30 Mon Sep 17 00:00:00 2001 From: Diego Ferigo Date: Mon, 22 May 2023 17:04:27 +0200 Subject: [PATCH 2/5] Address typos, minor corrections, and confined changes --- Chapters/Part_1/chapter_1.tex | 9 +- Chapters/Part_1/chapter_2.tex | 81 +- Chapters/Part_1/chapter_3.tex | 53 +- Chapters/Part_1/chapter_4.tex | 22 +- Chapters/Part_2/chapter_5.tex | 9 +- Chapters/Part_2/chapter_6.tex | 79 +- Chapters/Part_2/chapter_7.tex | 246 +- Chapters/Part_2/chapter_8.tex | 5 +- Chapters/covid_impact_statement.tex | 8 +- Chapters/epilogue.tex | 11 +- Chapters/prologue.tex | 5 +- FrontBackmatter/abstract.tex | 2 +- FrontBackmatter/contents.tex | 89 + classicthesis-config.tex | 16 + .../chapter_7/soft_contact_model.tikz | 8 +- new_commands.tex | 5 + thesis.tex | 3 + zotero.bib | 4784 +++++++++-------- 18 files changed, 2935 insertions(+), 2500 deletions(-) diff --git a/Chapters/Part_1/chapter_1.tex b/Chapters/Part_1/chapter_1.tex index 8dad428..49f99bc 100644 --- a/Chapters/Part_1/chapter_1.tex +++ b/Chapters/Part_1/chapter_1.tex @@ -1,4 +1,3 @@ -\acresetall \chapter{Simulators and Enabling Technologies} \label{ch:introduction} @@ -24,7 +23,7 @@ \subsection{Components} \paragraph{Description Parser} -Accordingly to the robot modelling provided in the previous chapter, a robot can be described as a set of \emph{links} (or bodies) interconnected by a set of \emph{joints}, that apply motion constraints and can provide actuation. +A robot can be described as a set of \emph{links} (or bodies) interconnected by a set of \emph{joints}, which apply motion constraints and can provide actuation. Simulating a robot requires knowledge of its \emph{kinematics}, encoding the topology and geometry of the links and joints, and its \emph{dynamics}, encoding their physical properties. This information is usually provided in textual form by a structured \emph{robot description}. Examples of common descriptions are the \ac{URDF}, \ac{SDF}, \ac{MJCF}, \ac{USD}. @@ -39,7 +38,7 @@ \subsection{Components} Depending on how the joint constraints are simulated, we can categorise physics engines in \emph{maximal coordinates} and \emph{reduced coordinates}. Using maximal coordinates, each simulated body is treated separately in the Cartesian space, and the overall robot's dynamics is computed by solving an optimisation problem that applies explicit constraints to enforce on the kinematics the effects of the joints. Instead, using reduced coordinates, the system dynamics considers the mechanical structure as a whole and it implicitly enforces motion constraints induced by the joints. 
-The physics engine usually also includes routines of \emph{collision detection} that, exploiting geometrical properties of the link's collisions shapes, allow to assess if bodies are in contact, and \emph{contact models}, that compute interaction forces between colliding bodies.
+The physics engine usually also includes routines of \emph{collision detection} that, exploiting geometrical properties of the links' collision shapes, allow assessing whether bodies are in contact, and \emph{contact models}, which compute interaction forces between colliding bodies.
A general-purpose simulator either implements or interfaces with at least one physics engine, and it is not uncommon to find simulators exposing multiple physics engines.

\paragraph{Public APIs}
@@ -123,12 +122,12 @@ \subsection{Gazebo Sim}
Over the years, we developed the model description of our robots\footnote{\url{https://github.com/robotology/icub-models}} and built an entire infrastructure\footnote{\url{https://github.com/robotology/gazebo-yarp-plugins}} for this simulator.
However, while previous attempts\footnote{\url{http://wiki.ros.org/openai_ros}}~\parencite{zamora_extending_2017,lopez_gym-gazebo2_2019} tried to integrate Gazebo into a \ac{RL} training pipeline, the performance that could be obtained together with the need to execute it in a separated process, always prevented its wide adoption by the robot learning community.
-After more than 15 years of development, Open Robotics started the development of a new simulator, representing the next generation of Gazebo, that from now we will call \emph{Gazebo Classic} for the sake of clarity.
+After more than 15 years of development, Open Robotics started the development of a new simulator representing the next generation of Gazebo; for the sake of clarity, from now on we will refer to the original simulator as \emph{Gazebo Classic}.
The new simulator, initially known as Ignition Gazebo and later rebranded as \emph{Gazebo Sim}, is a modular suite of libraries partially extracted from the monolithic architecture of its predecessor.
Gazebo Sim, contrarily to its predecessor, offers programmatic \acp{API} to instantiate and step the simulator, enabling users to obtain a finer control of the simulation cycle.

One of the simulation architectures presented in this thesis is based on Gazebo Sim.
-A more detailed overview of the features that motivated the adoption of the simulator and why they represent a valid choice for the contributed architecture will be discussed in more detail in Section~\ref{sec:scenario_gazebo}.
+A more detailed overview of the features that motivated the adoption of the simulator, and why they represent a valid choice for the contributed architecture, is provided in Section~\ref{sec:scenario_gazebo}.
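To make the notion of programmatic stepping more concrete, the following minimal sketch outlines what a user-driven simulation loop could look like from Python. It is only an illustrative example under assumed names: the objects and methods it calls (\texttt{simulator.run()}, \texttt{robot.joint\_positions()}, \etc) are placeholders introduced for the sake of the sketch and are not taken verbatim from the Gazebo Sim or \scenario documentation.

def run_episode(simulator, robot, controller, num_steps=1000):
    # Hypothetical user-driven loop: the caller, not a background process,
    # decides when simulated time advances by one physics step.
    for _ in range(num_steps):
        # Read the current joint state and compute the control action.
        tau = controller(robot.joint_positions(), robot.joint_velocities())
        robot.set_joint_generalized_force_targets(tau)
        # Advance the physics engine by exactly one step.
        simulator.run(paused=False)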
\subsection{The iCub humanoid robot} diff --git a/Chapters/Part_1/chapter_2.tex b/Chapters/Part_1/chapter_2.tex index a9d165e..3613734 100644 --- a/Chapters/Part_1/chapter_2.tex +++ b/Chapters/Part_1/chapter_2.tex @@ -1,4 +1,3 @@ -\acresetall \chapter{Robot Modelling} \label{ch:robot_modelling} @@ -253,11 +252,11 @@ \subsection{Left-trivialized velocity} \end{bmatrix} \begin{bmatrix} \rotd[A]_B & \orid[A]_B \\ \zeros_{1\times 3} & 0 - \end{bmatrix} =\\ + \end{bmatrix} \\ &= \begin{bmatrix} \rot[A]_B^\top \rotd[A]_B & \rot[A]_B^\top \orid[A]_B \\ \zeros_{1\times 3} & 0 - \end{bmatrix} =\\ + \end{bmatrix} \\ &= \begin{bmatrix} \velang[B]_{A,B}^\wedge& \vellin[B]_{A,B} \\ @@ -308,12 +307,11 @@ \subsection{Right-trivialized velocity} \begin{bmatrix} \rot[A]_B^\top & -\rot[A]_B^\top \ori[A]_B \\ \zeros_{1\times3} & 1 - \end{bmatrix} - =\\ + \end{bmatrix} \\ &= \begin{bmatrix} \rotd[A]_B \rot[A]_B^\top & \orid[A]_B - \rotd[A]_B \rot[A]_B^\top \ori[A]_B \\ \zeros_{1\times 3} & 0 - \end{bmatrix} = \\ + \end{bmatrix} \\ &= \begin{bmatrix} \velang[A]_{A,B}^\wedge & \vellin[A]_{A,B} \\ @@ -583,6 +581,7 @@ \subsection{Forces} The combination of the overline with the star marks mnemonically the $(\cdot)^{-\top}$ operator. \section{Rigid-body Kinematics} +\label{sec:rigid_body_kinematics} This section provides a mathematical description of the kinematics of a \emph{rigid body}. We first provide the definition of a rigid body, and then define its position and velocity. @@ -699,13 +698,13 @@ \subsection{Equations of Motion} \label{sec:eom_rigid_body} We have seen that the kinematics of a rigid body can be described with the position and velocity of its corresponding frame: $\homo[W]_B$ and $\homod[W]_B$, respectively. -The dynamics of the rigid body can be derived from the basic principles of Lagrangian mechanics. +The dynamics of the rigid body can be derived from the formulation of Lagrangian mechanics founded on the principle of least action~~\parencite{bullo_geometric_2004, selig_geometric_2005, marsden_jerrold_e_introduction_2013, maruskin_dynamical_2018}. \begin{definition*}[Lagrangian mechanics] % -Lagrangian mechanics defines a \emph{mechanical system} with a pair $(\mathbb{Q}, L)$ of a \emph{configuration space} $\mathbb{Q}$ and a smooth function $L(\Qu, \Qud) = K - U$ called \emph{Lagrangian}. -The Lagrangian takes as input the system configuration and the system velocity $(\Qu, \Qud) \in \mathbb{Q} \times \mathbb{V}$, -where $\mathbb{V}$ is the \emph{algebra} of $\mathbb{Q}$, and computes the difference between the kinetic energy $K$ and potential energy $U$ of the system. +Lagrangian mechanics defines a \emph{mechanical system} with a pair $(\mathcal{Q}, L)$ of a \emph{configuration space} $\mathcal{Q}$ and a smooth function $L(\Qu, \Qud) = K - U$ called \emph{Lagrangian}. +The Lagrangian takes as input the system configuration and the system velocity $(\Qu, \Qud) \in \mathcal{Q} \times \mathcal{V}$, +where $\mathcal{V}$ is the \emph{tangent space} of $\mathcal{Q}$, and computes the difference between the \emph{kinetic energy} $K$ and \emph{potential energy} $U$ of the system. % \end{definition*} @@ -724,28 +723,30 @@ \subsection{Equations of Motion} \label{definition:euler_lagrange} % The trajectory $\Qu(t)$ is the stationary point of the action functional $\mathcal{S}$ if and only if it satisfies the Euler-Lagrange equation: -\begin{equation*} +\begin{equation} + \label{eq:euler_lagrange} \dv{t} \pdv{L}{\Qud} - \pdv{L}{\Qu} = 0 . 
-\end{equation*}
+\end{equation}
%
-This equation holds in Euclidean space, when $\mathbb{Q} = \mathbb{V} = \realn^n$.
+This equation holds in Euclidean space, when $\mathcal{Q} = \mathcal{V} = \realn^n$.
%
\end{definition*}

-Depending on the system to be described, there may be many possible choices of the \emph{generalised position} $\Qu \in \mathbb{Q}$ and its derivative $\Qud \in \mathbb{V}$, that we will refer as \emph{generalised velocity}.
+Depending on the system to be described, there may be many possible choices of the \emph{generalised position} $\Qu \in \mathcal{Q}$ and its derivative $\Qud \in \mathcal{V}$, which we will refer to as \emph{generalised velocity}.
In the case of a rigid body, we can describe the system configuration with the kinematic quantities of its corresponding frame, \ie using $\Qu = \homo[W]_B \in \SE(3)$ and $\Qud = \homod[W]_B$.
With this choice of variables, the Lagrangian of the system is the following:
%
\begin{equation*}
-    L(\homo[W]_B, \homod[W]_B) = K(\homo[W]_B, \homod[W]_B) - U(\homo[W]_B).
+    L \left( \homo[W]_B, \homod[W]_B \right) = K \left( \homo[W]_B, \homod[W]_B \right) - U \left( \homo[W]_B \right).
\end{equation*}

\begin{remark*}[Extension to non-Euclidean spaces]
%
It can be noted that the choice of $(\Qu, \Qud) = (\homo[W]_B, \homod[W]_B)$ implies that the configuration space is no longer Euclidean.
-In these circumstances, the Euler-Lagrange equation does not hold.
-It can be shown that, introducing Lie theory, the Lagrangian formulation can be generalised to any smooth manifold, and the \acp{EoM} of the system can be obtained by applying the Euler-Poincaré equation~\parencite{marsden_jerrold_e_introduction_2010, maruskin_dynamical_2018}.
+In these circumstances, the Euler-Lagrange equation~\eqref{eq:euler_lagrange} does not hold.
+It can be shown that, introducing Lie theory, the Lagrangian formulation can be generalised to any smooth manifold, and the \acp{EoM} of the system can be obtained by applying the Euler-Poincaré equation~\parencite{marsden_jerrold_e_introduction_2013, maruskin_dynamical_2018}.
+To this end, $\mathcal{Q}$ must be a Lie group and $\mathcal{V}$ must be related to its Lie algebra (respectively, the $\SE(3)$ matrix Lie group and the $\se(3)$ Lie algebra in our case).
In this background chapter, in order to help focus only on the most important theoretical results required to understand the topic discussed in this thesis, we will omit the mathematical details of differential geometry.
The interested readers are recommended to refer to specific textbooks~\parencite{warner_foundations_1983, selig_geometric_2005} for a rigorous derivation of the theory of Lie groups and differential manifolds.
%
@@ -757,16 +758,16 @@ \subsection{Equations of Motion}
%
\begin{equation}
	\label{eq:left_trivialized_lagrangian}
-	l(\homo, \velsix) = L(\homo, \homo \velsix^\wedge) = k(\velsix) - U(\homo)
+	\ell(\homo, \velsix) = L(\homo, \homo \velsix^\wedge) = \kappa(\velsix) - U(\homo)
	,
\end{equation}
%
-where we used the relation $\homod = \homo \velsix^\wedge$ of Equation~\eqref{eq:sixd_velocity_body_fixed}, and introduced the \emph{trivialized kinetic energy} $k(\cdot)$ that can be shown to only depend on $\velsix$.
+where we used the relation $\homod = \homo \velsix^\wedge$ of Equation~\eqref{eq:sixd_velocity_body_fixed}, and introduced the \emph{trivialized kinetic energy} $\kappa(\cdot)$ that can be shown to only depend on $\velsix$~\parencite[Remark~2.5]{traversaro_modelling_2017}.
The trivialized kinetic energy and the potential energy can be computed as: % \begin{align} \label{eq:kinetic_and_potential_energies} - k(\velsix) &= \frac{1}{2} \velsix^\top \masssix \velsix , \\ + \kappa(\velsix) &= \frac{1}{2} \velsix^\top \masssix \velsix , \\ U(\homo) &= \begin{bmatrix}\gravity^\top & \zeros_{1\times3}\end{bmatrix} m \homo @@ -776,7 +777,7 @@ \subsection{Equations of Motion} % where $\gravity = \gravity[W] \in \realn^3$ is the gravitational acceleration vector, and $\mathbf{c}$ the displacement between of the \ac{CoM} of the body and its frame, as defined in Equation~\eqref{eq:com_definition}. -We now note that the sets of the system's positions and velocities are connected by a relationship between $\homo[W]_B \in \SE(3)$ and $\velsix^\wedge \in \se(3)$. +We now note that the sets of the system's positions and velocities are connected by a relationship between $\homo[W]_B \in \SE(3)$ and $\velsix^\wedge_{W,B} \in \se(3)$. This relation enables the application of the Euler-Poincaré equation to obtain the system's dynamics. As shown by \textcite[Section~2.6.2]{traversaro_modelling_2017}, the resulting \acp{EoM} of the rigid body are the following: % @@ -858,7 +859,7 @@ \section{Joint Model} We model a joint as a time-varying transformation between the frames $P$ and $C$ of its \emph{parent} and \emph{child} links: % \begin{equation*} - \homo[P]_C(\shape) : \realn \mapsto \SE(3) + \homo[P]_C(\shape) : \realn \to \SE(3) . \end{equation*} % @@ -1123,12 +1124,13 @@ \subsection{Generalised position and velocity} In reduced coordinates, we can formalise the generalised position and the generalised velocity of the floating-base multibody system as follows: % -\begin{equation*} +\begin{equation} +\label{eq:floating_base_position} \begin{cases} - \Qu = \left(\homo[W]_B, \Shape\right) \in \mathbb{Q} = \SE(3) \times \realn^n \\ - \Qud = \left(\homod[W]_B, \Shaped\right) \in \mathbb{V} + \Qu = \left(\homo[W]_B, \Shape\right) \in \mathcal{Q} = \SE(3) \times \realn^n \\ + \Qud = \left(\homod[W]_B, \Shaped\right) \in \mathcal{V} \end{cases} -\end{equation*} +\end{equation} % where we introduced the \emph{joint positions} $\Shape \in \realn^n$, also called \emph{shape}. Under the assumption of having only 1-\ac{DoF} joints, $n$ is the overall number of internal \aclp{DoF} of the system, matching the number of joints $n_J$. @@ -1137,14 +1139,15 @@ \subsection{Generalised position and velocity} Similarly to what we observed for a single rigid body, it can be more convenient to represent the system's velocity as a column vector: % -\begin{equation*} +\begin{equation} +\label{eq:floating_base_velocity} \Nu[X] = \begin{bmatrix} \velsix[X]_{W,B} \\ \Shaped \end{bmatrix} \in \realn^{6+n} , -\end{equation*} +\end{equation} % where $\velsix[X]_{W,B}$ is the velocity of the base link, $\Shaped \in \realn^n$ are the \emph{joint velocities}, and the generic frame $X$ is a placeholder to select one among the \emph{body-fixed} $X=B$, \emph{inertial-fixed} $X=W$, or \emph{mixed} $X=B[W]$ representations. @@ -1156,7 +1159,7 @@ \subsection{Kinematics} \subsubsection{Link pose} The pose of a link $E$ \wrtl the world frame uniquely depends on the generalised position $\Qu$. 
-We can denote the pose as a function $\homo[W]_E(\Qu): \mathbb{Q} \mapsto \SE(3)$, defined as follows:
+We can denote the pose as a function $\homo[W]_E(\Qu): \mathcal{Q} \mapsto \SE(3)$, defined as follows:
%
\begin{equation}
\label{eq:multibody_kinematics_pose}
@@ -1276,8 +1279,8 @@ \subsection{Dynamics}
In the simplified setting of maximal coordinates, the Lagrangian of the overall system can be obtained as the combination of the left-trivialized Lagrangian of all its links:
%
\begin{align*}
-    l(\Qu, \Nu) &= k(\Qu, \Nu) - U(\Qu), \\
-    k(\Qu, \Nu) &= \frac{1}{2} \sum_{L \in \mathcal{L}} \velsix[L]_{W,L}^\top \masssix_L \velsix[L]_{W,L}, \\
+    \ell(\Qu, \Nu) &= \kappa(\Qu, \Nu) - U(\Qu), \\
+    \kappa(\Qu, \Nu) &= \frac{1}{2} \sum_{L \in \mathcal{L}} \velsix[L]_{W,L}^\top \masssix_L \velsix[L]_{W,L}, \\
	U(\Qu) &= - \sum_{L \in \mathcal{L}} \begin{bmatrix}\gravity[W]^\top & \zeros_{1\times3}\end{bmatrix} m_L \, \homo[W]_L
	\begin{bmatrix} {}^L\mathbf{c} \\ 1 \end{bmatrix}
@@ -1287,7 +1290,7 @@ \subsection{Dynamics}
In reduced coordinates, we can obtain an alternative and more compact expression of the energies:
%
\begin{align*}
-    k(\Qu, \Nu) &= \frac{1}{2} \Nu^\top M(\Qu) \Nu, \\
+    \kappa(\Qu, \Nu) &= \frac{1}{2} \Nu^\top M(\Qu) \Nu, \\
	U(\Qu) &= -\begin{bmatrix}\gravity^\top & \zeros_{1\times3}\end{bmatrix} m \, \homo[W]_B
	\begin{bmatrix}{}^B\mathbf{c}(\Shape) \\ 1 \end{bmatrix}
@@ -1318,14 +1321,14 @@ \subsection{Dynamics}
%
We introduced $\jac_L(\Qu) = \jac[L]_{W,L/B}$ as the floating-base left-trivialized Jacobian of link $L$ for left-trivialized generalised velocities $\Nu[B]$, as defined in Equation~\eqref{eq:jacobian_left_trivialized}.

The \aclp{EoM} of the multibody system, considering that its configuration $\Qu$ is an element of $\SE(3)\times\realn^n$, can be obtained by applying the Hamel equations~\parencite{marsden_jerrold_e_introduction_2013, maruskin_dynamical_2018}, that can be seen as the combination of the Euler-Poincarè equation for the base variables in $\SE(3)$ and the classical Euler-Lagrange equation for the joint variables in $\realn^n$.
The left-trivialized Lagrangian plugged into the Hamel equations gives the \acp{EoM} of the multibody system~\parencite[Appendix~A.4]{traversaro_modelling_2017}:
%
\begin{equation}
	\label{eq:equation_of_motion_multibody}
	\begin{cases}
		\Qud = \left( \homod[W]_B, \Shaped \right) \\
-		M(\Qu) \Nud + C(\Qu, \Nu) \Nu + g(\Qu) = B \Torques + \sum_{\mathcal{L}} J_{L}^\top(\Qu) \forcesix_L^{ext}
+		M(\Qu) \Nud + C(\Qu, \Nu) \Nu + g(\Qu) = B \Torques + \sum_{L \in \mathcal{L}} J_{L}^\top(\Qu) \forcesix_L^{ext}
	\end{cases}
\end{equation}
%
@@ -1336,6 +1339,16 @@ \subsection{Dynamics}
	g(\Qu) &= -M(\Qu) \begin{bmatrix}\rot[W]_B^\top \gravity[W] \\ \zeros_{3\times1} \\ \zeros_{n\times1}\end{bmatrix}
	.
\end{align*}
+%
+When the decomposition of gravity and Coriolis effects is not important, we will use the compact form of the \ac{EoM}:
+%
+\begin{equation}
+	\label{eq:equation_of_motion_multibody_compact}
+	M(\Qu) \Nud + h(\Qu, \Nu) = B \Torques + \sum_{L \in \mathcal{L}} J_{L}^\top(\Qu) \forcesix_L^{ext}
+	,
+\end{equation}
+%
+where we introduced the vector of \emph{bias forces} $h(\Qu, \Nu) \in \realn^{6+n}$.

\begin{remark*}
%
diff --git a/Chapters/Part_1/chapter_3.tex b/Chapters/Part_1/chapter_3.tex
index 7c089f1..f81a1ad 100644
--- a/Chapters/Part_1/chapter_3.tex
+++ b/Chapters/Part_1/chapter_3.tex
@@ -1,4 +1,3 @@
-\acresetall
\chapter{Basics of Reinforcement Learning}
\label{ch:reinforcement_learning}

@@ -12,7 +11,7 @@ \chapter{Basics of Reinforcement Learning}
The trial-and-error nature together with delayed rewards, which give rise to the \emph{credit assignment problem}, are two among the essential features that characterise \acs{RL}~\parencite{sutton_reinforcement_2018}.

This chapter, based on the theory and notation from~\textcite{achiam_spinning_2018} and \textcite{dong_deep_2020}, describes in greater detail the \acl{RL} setting, introduces the terminology and the mathematical framework used throughout the thesis, and presents algorithms to solve the corresponding reward maximisation problem.
-Despite being a generic setting from a high-level perspective, all the elements part of Figure~\ref{fig:rl_setting} may have different properties altering the formulation of the specific problem at hand.
+Despite being a generic setting from a high-level perspective, all the elements depicted in Figure~\ref{fig:rl_setting} may have different properties altering the formulation of the specific problem at hand.
This thesis focuses on the application of \ac{RL} to the robotics domain, constraining the nature of environment modelling, the structure of the policies that can be learned, and the family of algorithms that can be employed.
In particular, this chapter is aimed at introducing the theory of operating on environments modelled with unknown stochastic continuous dynamics, providing continuous rewards and receiving continuous actions generated by stochastic policies implemented as neural networks.
The setting adopting this family of policies is known as \ac{DRL}.
@@ -29,6 +28,8 @@ \chapter{Basics of Reinforcement Learning}
\section{Key concepts}
\label{section:key_concepts}

+This section provides the basic terminology and definitions of \ac{RL} used throughout the following chapters, mainly borrowed from~\parencite{sutton_reinforcement_2018, achiam_spinning_2018, dong_deep_2020}.
+
\subsection{Environment}

The environment is the -- rather abstract -- entity of the \ac{RL} setting that determines the evolution of a system as a consequence of an action, and the generation of the reward signal.
@@ -39,7 +40,7 @@ \subsection{Environment}
Therefore, the state's definition is enough to encode all the information about the past evolution of the environment.
State transition maps can assume different forms depending on the nature of the environment.
-Formally, if the environment is \emph{deterministic}, state transitions can be expressed with a \emph{state-transition function} $f: \mathcal{S} \times \mathcal{A} \mapsto \mathcal{S}$; instead, if the environment is \emph{stochastic}, state transitions can be expressed with the \emph{state-transition probability density function} $\mathcal{P}: \mathcal{S} \times \mathcal{A} \mapsto \operatorname{Pr}[\mathcal{S}]$:
+Formally, if the environment is \emph{deterministic}, state transitions can be expressed with a \emph{state-transition function} $f: \mathcal{S} \times \mathcal{A} \to \mathcal{S}$; instead, if the environment is \emph{stochastic}, state transitions can be expressed with the \emph{state-transition probability density function} $\mathcal{P}: \mathcal{S} \times \mathcal{A} \to \operatorname{Pr}[\mathcal{S}]$:
%
\begin{equation}
\label{eq:state_transition_equations}
@@ -50,7 +51,7 @@ \subsection{Environment}
\end{equation}

The environment is also responsible for generating the \emph{immediate reward} $r_t \in \mathbb{R}$.
-In its most generic form, the \emph{reward function} can be modelled as a function $\mathcal{R}: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \mapsto \mathbb{R}$:
+In its most generic form, the \emph{reward function} can be modelled as a function $\mathcal{R}: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \mathbb{R}$:
%
\begin{equation*}
r_t = \mathcal{R}(s_t, a_t, s_{t+1}) .
@@ -59,6 +60,9 @@ \subsection{Environment}
\acl{RL}, contrary to other approaches that find optimal policies in a comparable setting, assumes that both the state-transition function $\mathcal{P}$ and the reward function $\mathcal{R}$ are unknown.
It focuses on methods to learn how to act optimally only by sampling sequences of states-actions-rewards without assuming any knowledge of the process generating their data.

+The objective(s) of the learning process are known as \emph{tasks}.
+In the context of \ac{RL}, the reward function $\mathcal{R}$ is responsible for guiding the agent towards the fulfilment of the considered task(s).
+
\begin{remark*}[Episodic and Continuing tasks]
%
The iterative nature of \ac{RL} requires the environment to be sampled continuously over different \emph{episodes}.
@@ -66,7 +70,7 @@ \subsection{Environment}
We can identify two episode termination strategies, corresponding to the underlying \emph{task} -- the objective the reward is maximising.
\emph{Episodic tasks} are characterised by a --possibly variable-- finite length $T$, \ie they do not continue after reaching a terminal state $s_T$.
\emph{Continuing tasks}, instead, are characterised by an infinite horizon and they never terminate.
-On some occasions, it is useful considering episodic tasks as continuing tasks by assuming that in time steps $t > T$, the last state $s_T$ becomes a special \emph{absorbing state}, that is a state that can only transition to itself regardless to the action generating only rewards of zero.
+On some occasions, it is useful to consider episodic tasks as continuing tasks by assuming that in time steps $t > T$, the last state $s_T$ becomes a special \emph{absorbing state}, that is, a state that can only transition to itself regardless of the action, generating only rewards of zero.
On other occasions, instead, it is useful to truncate the evolution of a continuing task at a certain time step~\parencite{pardo_time_2022}.
% \end{remark*} @@ -75,10 +79,11 @@ \subsection{Environment} % It's common in the \ac{RL} literature to use the terms \emph{state} and \emph{observation} interchangeably, depending on the context. However, practically speaking, the agent does not always receive the same state $s$ that encodes the complete environment configuration, whose space $\mathcal{S}$ could also be non-Euclidean. -When necessary, we introduce the function $O: \mathcal{S} \mapsto \mathcal{O}$ that computes the observation $o \in \mathcal{O}$ from the state $s$. +When necessary, we introduce the function $O: \mathcal{S} \to \mathcal{O}$ that computes the observation $o \in \mathcal{O}$ from the state $s$. Oftentimes, we consider the observation $o$ as the input to the agent, and it is convenient to consider $\mathcal{O} = \mathbb{R}^n$. To keep a clean notation and reduce confusion, in the rest of this background chapter we just use the notation $s$. We only differentiate between $s$ and $o$ when they are explicitly different quantities. +Furthermore, we always assume \emph{fully observable} problems, meaning that the full state is always available to the agent~\parencite{lovejoy_survey_1991, jaakkola_reinforcement_1994}. % \end{remark*} @@ -93,13 +98,13 @@ \subsection{Policy} Policies can be either \emph{deterministic}, \ie given a state $s$, the agent always takes the same action: % \begin{equation*} - a_t(s_t) = \mu(s_t) \quad \text{, with } \mu(s_t): \mathcal{S} \mapsto \mathcal{A}, + a_t(s_t) = \mu(s_t) \quad \text{, with } \mu(s_t): \mathcal{S} \to \mathcal{A}, \end{equation*} % or \emph{stochastic}, \ie modelled as a probability distribution, from which the action is sampled: % \begin{equation*} - a_t \sim \pi(\cdot|s_t) \quad \text{, with } \pi(\cdot|s_t): \mathcal{S} \mapsto \operatorname{Pr}(\mathcal{A}). + a_t \sim \pi(\cdot|s_t) \quad \text{, with } \pi(\cdot|s_t): \mathcal{S} \to \operatorname{Pr}(\mathcal{A}). \end{equation*} % In most \ac{DRL} applications, policies assume the form of neural networks, whose parameters $\boldsymbol{\theta}$ are updated during the training procedure by an optimisation algorithm. @@ -127,12 +132,12 @@ \subsection{Policy} We define the policy being the corresponding parameterized density, denoted as: % \begin{equation*} - \pi(a_t \given s_t) = + \pi_{\boldsymbol{\theta}}(a_t \given s_t) = f(a_t \given \boldsymbol{\theta}(s_t)) = f_{\mathcal{N}}(a_t \given \mu_{\boldsymbol{\theta}}(s_t), \sigma_{\boldsymbol{\theta}}) . \end{equation*} % -A common choice is to compute only $\mu_{\boldsymbol{\theta}} =\mu_{\boldsymbol{\theta}}(s_t) $ from the forward pass of the \ac{NN}, that takes the state $s_t$ as input. +A common choice is to compute only $\mu_{\boldsymbol{\theta}} =\mu_{\boldsymbol{\theta}}(s_t) $ from the forward pass of the \ac{NN}, which takes the state $s_t$ as input. The standard deviation is often computed by standalone parameters independent of the \ac{NN} and consequently is state-independent. Furthermore, to prevent generating negative values of $\sigma_{\boldsymbol{\theta}}$, it's common practice to let these standalone parameters provide $\log \sigma_{\boldsymbol{\theta}}$, so that they could be optimised in $\realn$ and, upon exponentiation, return a standard deviation in $\realn^+$. @@ -151,7 +156,7 @@ \subsection{Policy} The former calculates the probability of $x$ to fall within a particular range of values considering constant distribution parameters. 
The latter, instead, also denoted as $\mathcal{L}(\mu, \sigma \given x)$, describes the probability of the observed data $x$ as a function of varying distribution parameters. As we will analyse later, calculations are often simpler when considering the \emph{log-likelihood} $\boldsymbol{\ell} = \log{\mathcal{L}}$ instead of the plain likelihood. -From a state $s$, assuming a diagonal multivariate Gaussian policy with mean $\mu = \mu_{\boldsymbol{\theta}}(s)$ and standard deviation $\sigma = \sigma_{\boldsymbol{\theta}}$, it can be shown that the log-likelihood of an action $a \in \mathbb{R}^k$ is the following: +From a state $s$, assuming a diagonal multivariate Gaussian policy with mean $\mu = \mu_{\boldsymbol{\theta}}(s)$ and standard deviation $\sigma = \sigma_{\boldsymbol{\theta}}$, it can be shown that the log-likelihood of an action $a \in \mathbb{R}^k$ is the following~\parencite{bishop_pattern_2006}: % \begin{equation} \boldsymbol{\ell}(\mu, \sigma \given a) = @@ -190,7 +195,7 @@ \subsection{Trajectory} % The literature is not uniform on the time subscript of the immediate reward $r_{(\cdot)}$. In this thesis, we consider the data-generating process illustrated in Figure~\ref{fig:data_generating_process_mdp}. -The transition $t \rightarrow t+1$ corresponds to the data $(s_t, a_t) \rightarrow (r_t, s_{t+1})$. +The transition $t \rightarrow t+1$ corresponds to the data $(s_t, a_t) \mapsto (r_t, s_{t+1})$. Note that the subscript of the reward is associated to the time $t$ of the tuple $(s_t, a_t)$ that generated it. While this might seem odd at a first glance, it keeps the notation consistent when building the dataset $\mathcal{D}$ of a trajectory $\tau$. In practice, it's common to store the experience in the form $\mathcal{D} = \{(s_0, a_0, r_0), (s_1, a_1, r_1), \dots, (s_{T-1}, a_{T-1}, r_{T-1}), (s_T, \cdot, 0)\}$. @@ -242,14 +247,14 @@ \subsection{Return} \subsection{The Reinforcement Learning Problem} \label{section:reinforcement_learning_problem} -The combination of all these concepts allows defining the \ac{RL} problem more pragmatically. +Combining all these concepts allows defining the \ac{RL} problem more pragmatically. In our setting, it is defined as follows: % \begin{displayquote} The maximisation of the infinite-horizon discounted return $R(\tau)$ of a trajectory $\tau$ sampled following a stochastic policy $\pi(\cdot|s_t)$ from an uncertain environment modelled with an unknown state transition probability $\mathcal{P}(\cdot|s_t, a_t)$ and unknown reward function $\mathcal{R}(s_t, a_t, s_{t+1})$. \end{displayquote} % -In this setting, we can compute the probability of a trajectory $\tau$ of an arbitrary length $T$ as\footnote{For continuing tasks, we can truncate the trajectory after $T$ steps and handle the return of the last state properly.}: +In this setting, we can compute the probability of a trajectory $\tau$~\parencite{sutton_reinforcement_2018, achiam_spinning_2018} of an arbitrary length $T$ as\footnote{For continuing tasks, we can truncate the trajectory after $T$ steps and handle the return of the last state properly.}: % \begin{equation} \label{equation:probability_of_trajectory} @@ -280,15 +285,15 @@ \section{Reinforcement Learning Formalism} \subsection{Markov Decision Processes} -\aclp{MDP} are one of the classical formulations of sequential decision-making, that introduce the mathematical framework of choice for the discrete-time stochastic setting described in Section~\ref{section:key_concepts}. 
+\aclp{MDP}~\parencite{puterman_markov_2005, sutton_reinforcement_2018} are one of the classical formulations of sequential decision-making, which introduce the mathematical framework of choice for the discrete-time stochastic setting described in Section~\ref{section:key_concepts}. With the notation introduced in the previous section: % \begin{itemize} \item The set of all valid states $\mathcal{S}$, \item The set of all valid actions $\mathcal{A}$, - \item The reward function $\mathcal{R}: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \mapsto \mathbb{R}$, - \item The state-transition probability function $\mathcal{P}: \mathcal{S} \times \mathcal{A} \mapsto \operatorname{Pr}[\mathcal{S}]$, + \item The reward function $\mathcal{R}: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \mathbb{R}$, + \item The state-transition probability function $\mathcal{P}: \mathcal{S} \times \mathcal{A} \to \operatorname{Pr}[\mathcal{S}]$, \item The initial state distribution $\rho_0$, \end{itemize} % @@ -305,7 +310,7 @@ \subsection{Markov Decision Processes} % \end{definition*} % -It follows that the dynamics of a \ac{MDP} is uniquely modelled by $\mathcal{P}$. +It follows that the dynamics of an \ac{MDP} is uniquely modelled by $\mathcal{P}$~\parencite{sutton_reinforcement_2018}. Given a state-action pair $(s_t, a_t) \in \mathcal{S} \times \mathcal{A}$, the probability of transitioning to the next state $s_{t+1}$ is $\mathcal{P}(s_{t+1} \given s_t, a_t)$. Concerning the support of delayed rewards to solve the credit assignment problem, which represents the second key feature of iterative \ac{RL}, \acp{MDP} exploit \emph{value functions}. @@ -339,7 +344,7 @@ \subsection{Value functions} In this case, we assign a scalar value to each state-action pair $(s, a) \in \mathcal{S} \times \mathcal{A}$. The definitions of these value functions now give us a helpful metric to define the performance of a policy $\pi$. -In fact, we can consider policy $\pi_A$ better than policy $\pi_B$ if $V^{\pi_A}(s) \geq V^{\pi_B}(s)$ for all states $s \in \mathcal{S}$. +In fact, we can consider policy $\pi_A$ better than policy $\pi_B$ if $V^{\pi_A}(s) > V^{\pi_B}(s)$ for all states $s \in \mathcal{S}$. An \ac{MDP} $\mathcal{M}$ always has at least one policy that performs better than all the others, and this is the optimal policy $\pi_*$. The optimal state-value function and the optimal action-value function of the policy $\pi_*$ (or policies, if the optimal policy is not unique) are: % @@ -353,7 +358,7 @@ \subsection{Value functions} \end{align*} It is often not needed to compute both value functions. -Two connections between the value functions are the following: +Two connections between the value functions are the following~\parencite{sutton_reinforcement_2018}: % \begin{align} V^\pi(s) &= \E_{a \sim \pi} \left[ Q^\pi(s, a) \right] \label{equation:v_as_func_of_q}\\ @@ -378,12 +383,12 @@ \subsection{Value functions} % It plays an important role when we need to describe the quality of an action in a relative sense. In fact, we can think of $V(s_t)$ as a function providing the expected return of state $s_t$ averaged over all the possible actions $a_t$ that can be taken in this state\footnote{This can be clearly seen from Equation~\eqref{equation:v_as_func_of_q}.}, and $Q(s_t, a_t)$ as providing the expected return of state $s_t$ considering that the action taken was $a_t$. 
-If this action $a_t$ performs better than average, expressed mathematically as $Q(s_t, a_t) > V(s_t) \rightarrow A(s_t, a_t) > 0$, we could use this information to reinforce the choice of $a_t$ the next time the trajectory evolves through $s_t$. +If this action $a_t$ performs better than average, expressed mathematically as $Q(s_t, a_t) > V(s_t) \implies A(s_t, a_t) > 0$, we could use this information to reinforce the choice of $a_t$ the next time the trajectory evolves through $s_t$. \subsection{Bellman Equation} \label{section:bellman_equation} -An optimisation problem in discrete time like the \ac{RL} problem can be structured in a recursive form, \ie expressing the relationship between a quantity in one step and the same quantity at the next one. +An optimisation problem in discrete time like the \ac{RL} problem can be structured in a recursive form, \ie expressing the relationship between a quantity in one step and the same quantity in the next one. If a problem can be structured in this form, the equation expressing the update rule between the two periods is known as \emph{Bellman equation}. The value functions for policy $\pi$ introduced in Section~\ref{section:value_functions} can be transformed into a recursive form by noticing that we can express the return as follows: @@ -419,7 +424,7 @@ \section{Algorithms} \subsection{Model-free and Model-based} The formulation of the \ac{RL} problem provided in Section~\ref{section:reinforcement_learning_problem} outlines that the agent has no knowledge of environment details. -The state-transition probability $\mathcal{P}(\cdot|s_t, a_t)$ and the reward function $\mathcal{R}(s_t, a_t, s_{t+1})$ are usually unknown. +The state-transition probability density function $\mathcal{P}(\cdot|s_t, a_t)$ and the reward function $\mathcal{R}(s_t, a_t, s_{t+1})$ are usually unknown. The agent needs to explore the environment through trial-and-error while trying to learn how to maximise the collected immediate rewards. There are, however, algorithms that assume a --possibly partial-- knowledge of environment details that can be advantageous in different circumstances. @@ -442,7 +447,7 @@ \subsection{Value-based and Policy-based} They introduce a function approximator whose parameters $\boldsymbol{\theta}$ can be iteratively optimised to maximise a performance function $J(\boldsymbol{\theta})$. Value-based methods learn a parameterized action-value function $Q_{\boldsymbol{\theta}}(s, a)$ usually applying updates based on the Bellman equations introduced in Section~\ref{section:bellman_equation}. -From the action-value function, actions can be deterministically computed as $a(s) = \argmax_a Q_{\boldsymbol{\theta}}(s, a)$. +From the action-value function, actions can be deterministically computed as $a^*(s) \in \argmax_a Q_{\boldsymbol{\theta}}(s, a)$. This equation, however, shows both the limitations of this family of methods. First, due to the action selection based on the $\argmax$, the computation could be expensive in high-dimensional discrete action spaces. Also, they are incompatible with continuous action spaces since the maximisation expects a finite set of actions. diff --git a/Chapters/Part_1/chapter_4.tex b/Chapters/Part_1/chapter_4.tex index c499b19..082267d 100644 --- a/Chapters/Part_1/chapter_4.tex +++ b/Chapters/Part_1/chapter_4.tex @@ -1,11 +1,10 @@ -\acresetall \chapter{State-of-the-Art and Thesis Content} \label{ch:sota} This chapter concludes the introductory part of this thesis. 
With the basic knowledge about robot simulators, robot modelling, and \ac{RL} provided by the previous chapters, we attempt to draw the state-of-the-art of domains covered by this thesis.
Then, from the picture of the current research status, we identify what we believe are problems still open, and outline how this thesis aims to solve them by providing the contributions to knowledge of this work.
-We start by reviewing the research on \ac{RL} for robot locomotion, that should provide a bird-eye view of how this methodology evolved and has been applied over the past three decades to the specific domain of interest of this thesis.
+We start by reviewing the research on \ac{RL} for robot locomotion, which should provide a bird's-eye view of how this methodology evolved and has been applied over the past three decades to the specific domain of interest of this thesis.
Then, we review the technological evolution of robot simulators, focusing on robot learning applications.
Finally, to provide the necessary context to motivate one of our contributions, we review the domain of push-recovery strategies applied to bipedal robots.

@@ -37,9 +36,10 @@ \section{Review of Reinforcement Learning for Robot Locomotion}
For what concerns research on \emph{bipedal locomotion}, instead, the literature is more sparse.
The work of \textcite{heess_emergence_2017} posed one of the first milestones in end-to-end learning of locomotion behaviours from simple reward signals, exploiting a distributed version of \ac{PPO} and a curriculum learning strategy that makes the learned task progressively more challenging.
-A second relevant work comes from the computer graphics community, that nowadays shares multiple research directions with robot learning.
+A second relevant work comes from the computer graphics community, which nowadays shares multiple research directions with robot learning.
\textcite{peng_deepmimic_2018} proposed a method that, utilising motion capture data of highly-dynamic movements, produces a policy that can execute and adapt the collected trajectories on a simulated character.
-Regarding the transfer of policies to real bipeds, instead, \textcite{xieIterativeReinforcementLearning2019s} proposed to learn locomotion policies leveraging data from generated from pre-existing controllers, \textcite{castillo_robust_2021} developed a cascade control architecture to train a trajectory planning policy that is deployed on a real robot, and \textcite{li_reinforcement_2021} were able to track high-level velocity commands on their bipedal robot with a policy learned in simulation with an extensive use of domain randomization.
+Regarding the transfer of policies to real bipeds, instead, \textcite{xieIterativeReinforcementLearning2019s} proposed to learn locomotion policies leveraging data generated from pre-existing controllers.
+\textcite{castillo_robust_2021} developed a cascade control architecture to train a trajectory planning policy that is deployed on a real robot, and \textcite{li_reinforcement_2021} were able to track high-level velocity commands on their bipedal robot with a policy learned in simulation with an extensive use of domain randomization.
Finally, \textcite{rodriguez_deepwalk_2021} proposed a methodology based on curriculum learning to train a single policy capable of controlling a humanoid robot for omnidirectional walking, and \textcite{bloesch_towards_2022} introduced an end-to-end method to learn walking gaits without relying on simulated data.
\section{Review of Simulators for Robot Learning}
@@ -50,8 +50,8 @@ \section{Review of Simulators for Robot Learning}
Many of the contributions in the robot learning domain exploit rigid-body simulators and robot models to generate enough state transitions for the learning process to converge to a satisfying solution.
Due to the unavoidable introduction of approximations, the evolution of rigid-body simulations will differ from the evolution of the real system.
This mismatch, for example, could originate from the estimation error of the inertial parameters of the simulated robot, from the wrong assumption of a perfectly rigid body, from the simplistic modelling of the actuation dynamics, from the approximations of the contact dynamics and contact parameters, from the mismatch between simulated and real sensors noise, \etc
-Agents trained in environments characterised by such approximations, that could be referred all together as \emph{reality gap}, could learn to exploit modelling approximations to reach unrealistic performance that could never be transferred elsewhere.
-The most popular method to mitigate the occurrence of this behaviour is \emph{domain randomization}~\parencite{peng_sim--real_2018, muratore_robot_2022}, widely studied in sim-to-real research~\parencite{zhao_sim--real_2020}, that aims to regularise the simulation with different methods to prevent overfitting during training.
+Agents trained in environments characterised by such approximations, which could be collectively referred to as the \emph{reality gap}, could learn to exploit modelling approximations to reach unrealistic performance that could never be transferred elsewhere.
+The most popular method to mitigate the occurrence of this behaviour is \emph{domain randomization}~\parencite{peng_sim--real_2018, muratore_robot_2022}, widely studied in sim-to-real research~\parencite{zhao_sim--real_2020}, which aims to regularise the simulation with different methods to prevent overfitting during training.
In the rest of this section, we bypass similar methodologies that could be applied to any simulation, and focus instead on providing a review of common simulators that could be used for robot learning and, particularly, locomotion applications.

The process of describing the evolution of a dynamical system is deeply rooted in control and systems theory.
@@ -99,13 +99,13 @@ \section{Review of Push-recovery Strategies}
Humans are able to employ different strategies to keep their balance, such as \emph{ankle}, \emph{hip}, and \emph{stepping} strategies~\parencite{nashner_organization_1985, maki_role_1997, stephens_humanoid_2007}.
The adoption of these strategies follows an activation proportional to the magnitude of external disturbances.
The effectiveness of human capabilities mainly stems from how different strategies are combined into a continuous motion~\parencite{mcgreavy_unified_2020}.
-The applicability of these principles for the generation of control actions applied to robots traditionally relied on \emph{simplified models} approximating their dynamics, such as the \ac{LIP} model~\parencite{kajita_3d_2001} and the \ac{CP}~\parencite{pratt_capture_2006}.
+The application of these principles to the generation of control actions for robots has traditionally relied on \emph{simplified models} approximating their dynamics, such as the \ac{LIP} model~\parencite{kajita_3d_2001} and the \ac{CP}~\parencite{pratt_capture_2006}.
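For context, and following the standard textbook formulation of these simplified models rather than the exact notation of the cited works, the \ac{LIP} dynamics with constant \ac{CoM} height $z_0$ and the associated instantaneous capture point can be summarised as:
%
\begin{equation*}
    \ddot{x} = \omega^2 \left( x - p \right),
    \qquad
    \omega = \sqrt{g / z_0},
    \qquad
    x_{cp} = x + \frac{\dot{x}}{\omega},
\end{equation*}
%
where $x$ and $\dot{x}$ are the horizontal \ac{CoM} position and velocity, $p$ is the position of the pendulum base on the ground, $g$ the gravitational acceleration, and $x_{cp}$ the point where the robot should step in order to eventually come to rest.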
Together with the formulation of the \ac{ZMP}~\parencite{vukobratovic_contribution_1969, vukobratovic_zero-moment_2004} widely adopted as a stability criterion, simplified models became very popular and still used nowadays. Modern applications alternatively rely on the \ac{DCM}, that can be seen as an extension of the \ac{CP} theory~\parencite{shafiee_online_2019}. The structure of control algorithms utilizing any of these models, however, typically considers only the \ac{CoM} of the robot. The generation of the actual joint strategy is usually achieved through either hierarchical~\parencite{feng_optimization_2014} or predictive~\parencite{wieber_trajectory_2006, aftab_ankle_2012} architectures. -Implementing an effective and robust blending of all the discrete strategies has been considered challenging and prone to failures, even with careful tuning~\parencite{mcgreavy_unified_2020}. +Implementing an effective and robust blending of all the discrete strategies (ankle, hip, stepping, \etc) has been considered challenging and prone to failures, even with careful tuning~\parencite{mcgreavy_unified_2020}. Nonetheless, their usage still represents an active research area that can achieve promising results~\parencite{jeong_robust_2019}. In the past few years, the robotics community attempted to develop methods based on robot learning and, in particular, \ac{RL} to generate a unified control action that automatically blends the discrete strategies~\parencite{yang_learning_2018}. @@ -125,17 +125,17 @@ \subsection{\autoref{ch:rl_env_for_robotics}: Reinforcement Learning Environment While the \verb|gym.Env| interface became the de-facto abstraction layer to isolate agents and environments, an appropriate structure for environments has never been properly outlined. In robot learning applications, the implementation of the decision-making logic related to the task to learn is often intertwined with the setting in which it is executed, that could be either in simulation or in the real world. Therefore, usually environments cannot be transferred between different settings without significant refactoring. -Furthermore, in simulated environments, the choice of the simulator and how it communicates with the environment implementation when sampling experience could undermine the reproducibility of the simulation, resulting in different outcomes at each execution. +Furthermore, in simulated environments, the choice of the simulator and how it communicates with the environment could undermine the reproducibility of the simulation, resulting in different outcomes at each execution. \paragraph{Context of Contribution} We present a framework for developing robotic environments for \acl{RL} research. The framework is composed of two main components. At the lower level, the \scenario \cpp abstraction layer provides different interfaces of entities part of a scene like World and Model, with additional Link and Joint interfaces to simplify the interaction. -\scenario is not strictly related to robot learning, it provides a unified interface that could be implemented to communicate either with simulated robots running in any simulator, or with real robots, passing through, \eg, a robotic middleware. +\scenario is not strictly related to robot learning, but it provides a unified interface that could be implemented to communicate either with simulated robots running in any simulator, or with real robots, passing through, \eg, a robotic middleware. 
We currently provide a complete implementation for simulated robots interfacing with the new Gazebo Sim\footnote{\url{https://gazebosim.org/}} simulator, supporting the DART and Bullet physics engines.
At the high level, the \emph{Gym-Ignition} Python package\footnote{In its earlier releases, the new Gazebo Sim simulator was called Ignition Gazebo, from which the name of our Python package derives.} provides abstraction layers of the different components that typically form a robotic environment: the Task and the Runtime.
-On the one hand, the Task provides the necessary components to develop agnostic decision-making logic with the generic \scenario \acp{API}, and on the other, the Runtime provides the actual interfacing either with a simulator or a real robot.
+On the one hand, the Task provides the necessary components to develop agnostic decision-making logic with the generic \scenario \acp{API}, and on the other hand, the Runtime provides the actual interfacing either with a simulator or a real robot.
This whole architecture allows the user to develop the Tasks only once, and to use them for training, executing, or refining a policy in any of the supported Runtimes.
The selected Gazebo Sim simulator has multiple advantages over alternative options for generating synthetic data.
Its plugin-based architecture allows third-party developers to integrate custom physics engines and custom rendering engines, enabling them to develop agnostic environments that select the desired engines during runtime.
diff --git a/Chapters/Part_2/chapter_5.tex b/Chapters/Part_2/chapter_5.tex
index 987e129..363d565 100644
--- a/Chapters/Part_2/chapter_5.tex
+++ b/Chapters/Part_2/chapter_5.tex
@@ -1,4 +1,3 @@
-\acresetall
 \chapter{Reinforcement Learning Environments for Robotics}
 \label{ch:rl_env_for_robotics}
 
@@ -34,9 +33,8 @@ \subsection{Properties}
 \item[Modularity]
 %
 \acl{RL} is one of the most generic learning frameworks.
-Its definition, a learning agent that interacts with an environment, facilitates the separation of the components in a \ac{RL} framework.
-In a sense, most of the learning-from-interaction processes can be represented by such scheme.
-In the field of robotics, however, we might benefit from a more fine-grained abstraction of the environment.
+Its structure is composed of a learning agent interacting with an environment, as illustrated in Figure~\ref{fig:rl_setting}.
+While most of the related software architectures already map their components onto this high-level structure, specific implementations for robotics might benefit from a more fine-grained abstraction, particularly regarding the environment.
 For instance, we might want to achieve the same learning goal on robots with different mechanical structures.
 In order to promote code reuse and minimise system integration effort, a modular design that abstracts the robot and the logic of the learned task is a valuable addition.
 %
@@ -300,7 +298,8 @@ \section{Gym-Ignition}
 	\resizebox{0.75\textwidth}{!}{
 		\includegraphics{images/contributions/chapter_5/scenario_and_gym_ignition.tikz}
 	}
-	\caption{Architecture of \scenario and Gym-Ignition. Users of the overall framework just need to provide the \acs{URDF} or \acs{SDF} description of their robot and implement the \texttt{Task} interface with the desired decision-making logic. The framework, following a top-down approach, exposes to the \texttt{Agent} algorithms the unified \texttt{gym.Env} interface. 
The provided \texttt{Runtime} classes either instantiate the simulator, or handle soft real-time logic for real-world robots. The runtimes are generic and can operate on any decision-making logic that exposes the \texttt{Task} interface. Finally, \texttt{Task} implementations use the \scenario \acp{API} to interact with the robots part of the environment.} + \caption{Architecture of \scenario and Gym-Ignition. Users of the overall framework just need to provide the \acs{URDF} or \acs{SDF} description of their robot and implement the \texttt{Task} interface with the desired decision-making logic. The framework, following a top-down approach, exposes to the \texttt{Agent} algorithms the unified \texttt{gym.Env} interface. The provided \texttt{Runtime} classes either instantiate the simulator, or handle soft real-time logic for real-world robots. The runtimes are generic and can operate on any decision-making logic that exposes the \texttt{Task} interface. Finally, \texttt{Task} implementations use the \scenario \acp{API} to interact with the robots part of the environment.\\ + A typical data flow starts with the agent setting the action with \texttt{gym.Env.step}. The processing of the action is a combination of logic inside the active runtime and the active task. In particular, the runtime receives the action and directly forwards it to the task for being processed. The task, by operating transparently over the \scenario \acp{API}, applies the action to the robot, and then waits the runtime to perform the time stepping. After this phase, the task computes the reward, packs the observation, detects if the environment reached the terminal state, and returns all this data back to the agent passing through the \texttt{gym.Env} \acp{API}.} \label{fig:scenario_and_gym_ignition} \end{figure} diff --git a/Chapters/Part_2/chapter_6.tex b/Chapters/Part_2/chapter_6.tex index 03906d9..41f97fa 100644 --- a/Chapters/Part_2/chapter_6.tex +++ b/Chapters/Part_2/chapter_6.tex @@ -1,4 +1,3 @@ -\acresetall \chapter{Learning from scratch exploiting Robot Models} \label{ch:learning_from_scratch} @@ -9,10 +8,6 @@ \chapter{Learning from scratch exploiting Robot Models} The resulting control action, simultaneously operating on both the upper- and lower-body joints, encodes multiple whole-body push-recovery strategies involving the usage of ankles, hips, momentum, and stepping strategies. The policy automatically selects and blends all these different strategies upon need. -% TODO: move to context of contribution? -% Motion control on humanoid robots, compared to other robot categories, is particularly challenging due to their narrow support surface, sparse mass distribution, redundancy due to the number of \acp{DoF}, and underactuation of their floating base. -% In this setting, the agent during the \ac{RL} training needs to navigate its state space properly balancing the exploration and the exploitation trade-off, preventing to obtain sub-optimal policies due to local minima. - The presented architecture adopts a \emph{reward shaping} methodology that exploits as prior information quantities computed from the robot description, such as whole-body momentum and data related to the robot's \ac{CoM}. This approach allows utilising the widely used family of model-free \ac{RL} algorithms while exploiting priors without changing the overall learning framework. The priors act only as exploration hints, therefore, the model description does not need to be excessively accurate. 
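As a minimal sketch of how such model-based priors can be computed and fed to the reward without changing the model-free learning setup, consider the fragment below; the \texttt{kindyn} helper and its method names are hypothetical placeholders for a rigid-body dynamics library such as iDynTree.
\begin{verbatim}
# Sketch: reward-shaping priors computed from the robot model. `kindyn` is a
# hypothetical helper wrapping a dynamics library (e.g. iDynTree); its
# method names are illustrative only.
import numpy as np

def prior_measurements(kindyn, base_pose, base_velocity, s, s_dot):
    # Update the internal robot state of the dynamics library.
    kindyn.set_robot_state(base_pose, base_velocity, s, s_dot)

    # Whole-body (centroidal) momentum, whose norm is driven towards zero.
    h_G = np.asarray(kindyn.centroidal_momentum())        # assumed shape (6,)
    momentum_norm2 = np.linalg.norm(h_G[0:3])**2 + np.linalg.norm(h_G[3:6])**2

    # Horizontal CoM position, driven towards the support polygon.
    com_xy = np.asarray(kindyn.com_position())[0:2]

    # These measurements are later compared with their targets through the
    # kernel-based reward terms; an approximate model is sufficient, since
    # they only act as exploration hints.
    return {"centroidal_momentum": momentum_norm2, "com_projection_xy": com_xy}
\end{verbatim}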
@@ -27,22 +22,24 @@ \section{Training Environment} \label{fig:hierarchical} \end{figure} -The decision-making task is structured as a continuous control task with early termination conditions. +The decision-making logic is structured as a continuous control task with early termination conditions. Its dynamics runs in the Gazebo Sim simulator embedded into the Gym-Ignition framework presented in Section~\ref{sec:gym_ignition}, exposing a \ac{RL} environment compatible with \verb|gym.Env|~\parencite{brockman_openai_2016}. The enabled physics engine is DART~\parencite{lee_dart_2018}. We selected iDynTree~\parencite{nori_icub_2015} for calculating rigid-body dynamics quantities that model the floating-base multibody system as described in Section~\ref{sec:multibody_dynamics}, whose dynamics is described by the following equation: % \begin{equation} - M(\Qu) \Nud + h(\Qu, \Nu) = B \Torques + \sum_{\mathcal{L}} J^\top(\Qu) \forcesix_L^{ext} - , + M(\Qu) \Nud + h(\Qu, \Nu) = B \Torques + \sum_{L \in \mathcal{L}} J_{W,L}^\top(\Qu) \forcesix_L^{ext} + . \end{equation} % -where we grouped the Coriolis and gravity terms into a new term $h(\Qu, \Nu) = C(\Qu, \Nu) \Nu + g(\Qu)$ for compactness. +The iDynTree library allows to load a model description encoded in \ac{URDF} and provides all the terms forming this Lagrangian representation of the floating-base \acp{EoM} together with additional quantities that will be included in the reward computation described in Section~\ref{sec:reward_shaping}. -The environment receives actions and provides observations and rewards at 25~Hz. +Figure~\ref{fig:hierarchical} reports a high-level overview of the proposed control system. +The environment receives actions from the agent, here represented as its underlying \ac{NN}, and produces observations and rewards at 25~Hz. The physics and the low-level \pid controllers run at 1000~Hz. During training, some properties of the environment are randomised (see Sec.~\ref{sec:env-other}). +\pagebreak \subsection{Action} The separation between agent and environment is defined by the action selection. @@ -89,7 +86,7 @@ \subsection{State} % where $\mathcal{O} := \mathbb{R}^{62}$. % -The observation consists of the following terms: +The observation consists of the following terms, each of them re-scaled\footnote{We also call the re-scaling operation of a bounded variable as \emph{normalization}.} into a given interval for better properties when used as \ac{NN} inputs: % \begin{itemize} % @@ -107,7 +104,7 @@ \subsection{State} % \item $\mathbf{o}_F$ is a tuple containing the positions of the feet \wrt the base frame, normalised in $[0, 0.78]$~m; % -\item $\mathbf{o}_v$ is the linear velocity of the \ac{CoM} expressed in $G$, normalised in $[0, 3]$~m/s. +\item $\mathbf{o}_v$ is the linear velocity of the \ac{CoM} expressed in the \ac{CoM} frame $G = (\pos_{CoM}, [B])$, normalised in $[0, 3]$~m/s. % \end{itemize} % @@ -123,7 +120,7 @@ \subsection{Other specifications}\label{sec:env-other} \paragraph{Initial State Distribution} -The initial state distribution $\rho(\mathbf{x}_0): \mathcal{X} \mapsto \mathcal{O}$ defines the value of the observation in which the agent begins each episode. +The initial state distribution $\rho(\mathbf{x}_0): \mathcal{X} \to \mathcal{O}$ defines the value of the observation in which the agent begins each episode. 
Sampling the initial state from a distribution with small variance, particularly regarding joint positions and velocities, positively affects exploration without degrading the learning performance. At the beginning of each episode, for each joint $j \in \mathcal{J}$ we sample its position $s_{j,0}$ from $\mathcal{N}(\mu=s_0, \sigma=10 \, \text{deg})$, where $s_{0}$ represents the fixed initial reference, and its velocity $\dot{s}_{j,0}$ from $\mathcal{N}(\mu=0, \sigma=90 \, \text{deg/s})$. As a result, the robot may or may not start with the feet in contact with the ground, encouraging the agent to learn how to land and deal with impacts. @@ -181,7 +178,9 @@ \section{Agent} \paragraph{Learning Algorithm} -We select \ac{PPO} as the candidate learning algorithm, in the variant using the losses of both the gradient clipping and the minimisation of the \ac{KL} divergence, introduced in Section~\ref{sec:ppo}. +We select \ac{PPO} as the candidate learning algorithm, in the variant that includes both the clipped surrogate and \ac{KL} penalty objectives introduced in Section~\ref{sec:ppo}. +We selected this algorithm since it provides a simple but effective implementation of policy optimization, widely used in comparable studies\footnote{This chapter is mainly focused on the reward shaping process, not on the specific algorithm used to train the agent.}. +A practical benefit of \ac{PPO} is its small number of parameters (only the clip parameter $\epsilon$ if the \ac{KL} penalty coefficient is adjusted dynamically) that does not excessively overload the parameter tuning process. \paragraph{Policy and Value Function} @@ -203,6 +202,7 @@ \section{Agent} We use the RLlib~\parencite{liang_rllib_2018} framework, OpenAI Gym, and distributed training. \section{Reward Shaping} +\label{sec:reward_shaping} \begin{sidewaystable} \center @@ -223,33 +223,24 @@ \section{Reward Shaping} Centroidal momentum & $r_h$ & 1 & $\Vert {}_{G} \mathbf{h}_l \Vert^2 + \Vert {}_{G} \mathbf{h}_\omega \Vert^2$ & 0 & 50.0 & kg m$^2$/s & \ck & \ck \\ Feet CoPs & $\{r_{p}^L, r_{p}^R\}$ & 20 & $\{\boldsymbol{p}_{L, CoP}, \boldsymbol{p}_{R, CoP}\}$ & $\{\bar{\boldsymbol{p}}^{xy}_{L, hull}, \bar{\boldsymbol{p}}^{xy}_{R, hull}\}$ & 0.3 & m & \ck & \ck \\ \rowcolor{black!10} Feet orientation & $\{r_{o}^L, r_{o}^R\}$ & 3 & $\{\mathbf{r}^{(z)}_L \cdot \mathbf{e}_z, \mathbf{r}^{(z)}_R \cdot \mathbf{e}_z\}$ & $1$ & 0.01 & - & \ck & \ck \\ - CoM projection & $r_{G}$ & 10 & $\boldsymbol{p}^{xy}_{G}$ & $\in$ CH of support polygon & \multicolumn{1}{r}{-} & - & & \ck \\ \rowcolor{black!10} + CoM projection & $r_{G}$ & 10 & $\boldsymbol{p}^{xy}_{G}$ & in the CH of support polygon & \multicolumn{1}{r}{-} & - & & \ck \\ \rowcolor{black!10} Feet in contact & $r_{c}$ & 2 & $c_L \land c_R$ & 1 & \multicolumn{1}{r}{-} & - & \ck & \ck \\ Links in contact & $r_l$ & -10 & $c_{l}$ & $0$ & \multicolumn{1}{r}{-} & - & \ck & \ck \\ \bottomrule \end{tabular} \end{sidewaystable} -The reward is a weighted sum of terms we categorised as regularisers, steady-state, and transient. -\emph{Regularisers} are terms often used in optimal control for minimising the control action and the joint torques. -\emph{Steady-state} components help to obtain the balancing behaviour in the absence of external perturbations, and are active only in \ac{DS}. -Finally, the \emph{transient} components favour the emergence of push-recovery whole-body strategies. 
- -The total reward comprises a weighted sum of scalar components $\sum_i \omega_i r_i$, where $r_i$ is the reward term and $w_i$ its weight. -In order to provide a similar scale for each of them and therefore improve the interpretability of the total reward, we process the real and vector components with a \ac{RBF} kernel~\parencite{yang_emergence_2017} with a dimension given by a cutoff parameter calculated from the desired sensitivity. -Table~\ref{tab:reward} includes the weights of each reward component and the kernel parameters, if active. - \subsection{RBF Kernel} -Radial basis function (RBF) kernels are widely employed functions in machine learning, defined as +\ac{RBF} kernels are widely employed functions in machine learning, defined as $$ K(\mathbf{x}, \mathbf{x}^*) = \exp \left( -\tilde\gamma ||\mathbf{x} - \mathbf{x}^*||^2 \right) \quad \in [0, 1] , $$ where $\tilde\gamma$ is the kernel bandwidth hyperparameter. -The RBF kernel measures similarities between input vectors. +The \ac{RBF} kernel measures similarities between input vectors. This can be useful for defining scaled reward components. In particular, if $\mathbf{x}$ is the current measurement and $\mathbf{x}^*$ is the target, the kernel provides a normalised estimate of their similarity. -$\tilde\gamma$ can be used to tune the bandwidth of the kernel, i.e.\ its sensitivity. +The variable $\tilde\gamma$ can be used to tune the bandwidth of the kernel, i.e.\ its sensitivity. In particular, we use $\tilde\gamma$ to select the threshold from which the kernel tails begin to grow. Introducing the pair $(x_c, \epsilon)$, with $x_c, \epsilon \in \mathbb{R}^+$ and $|\epsilon| \ll 1$, we can parameterise $\tilde\gamma = -\ln(\epsilon) / x_c^2$. This formulation results in the following properties: @@ -264,15 +255,33 @@ \subsection{RBF Kernel} The sensitivity of individual components is tuned by adjusting $x_c$. We refer to $x_c$ as \emph{cutoff} value of the kernel, since each norm of the distance in the input space bigger than $x_c$ yields output values smaller than $\epsilon$. This formulation eases the composition of the total reward $r_t$ when reward components are calculated from measurements of different dimensionalities and scales. -In fact, once the sensitivities have been properly tuned for each component, they can simply be weighted differently as -$ -r_t = \sum_i w_i K(\mathbf{x}^{(i)}_t, \mathbf{x}^*) \in \mathbb{R} -$ -where $\mathbf{x}^{(i)}_t$ is the $i$-th measurement sampled at time $t$, and $w_i \in \mathbb{R}$ the weight corresponding to the $i$-th reward component. +In fact, once the sensitivities have been properly tuned for each component, they can simply be weighted differently as: +% +\begin{equation} + \label{eq:weighted_reward_kernel} + r_t = \sum_i w_i K \left( \mathbf{x}_i(t), \mathbf{x}_i^*(t) \right) \in \mathbb{R} + , +\end{equation} +% +where $\mathbf{x}_{i}(t)$ is the $i$-th measurement sampled at time $t$, and $w_i \in \mathbb{R}$ the weight corresponding to the $i$-th reward component. \subsection{Reward} -\subsubsection{Regularises} +This section describes all reward terms forming the instantaneous reward generated by the environment. 
+Given their number, we divided the terms into three categories: \emph{regularisers} are terms often used in optimal control for minimising the control action and the joint torques; \emph{steady-state terms} help obtain the balancing behaviour in the absence of external perturbations, and are active only in \ac{DS}; \emph{transient terms} favour the emergence of push-recovery whole-body strategies.
+The logic of the task computes all reward terms at each environment step.
+Then, all terms are processed with a \ac{RBF} kernel, weighted, and summed together using Equation~\eqref{eq:weighted_reward_kernel}, obtaining the final instantaneous reward $r_t$ returned to the agent for its maximisation during the training phase.
+
+The fact that all individual reward terms $r_i$ share the same range, thanks to the filtering effect of the applied kernel, simplifies the parameter-tuning process.
+Each reward term introduces two parameters: its weight $w_i$ and the kernel cutoff $x_c$, where we assume the $\epsilon$ kernel parameter to be kept fixed.
+Instead of applying grid-like search methods, which would require an excessively large number of permutations, we proceeded with a heuristic tuning, observing after each training the learning curve of each reward term independently.
+Firstly, we tuned the cutoff parameter (\ie the sensitivity associated with the reward term) so that the agent could receive a perceivable increase in reward promoting the desired exploration direction.
+This process can be thought of as tuning the variance of a Gaussian curve whose mean is centred on the target value.
+Secondly, the reward terms with their tuned kernels are composed together by a weighted sum.
+We started the tuning process from a weight of $1.0$ for all reward terms, and increased individual weights heuristically in case we wanted to adjust their relative importance.
+Table~\ref{tab:reward} reports all the parameters of the reward function.
+
+\subsubsection{Regularisers}
 
 \begin{description}
 
@@ -467,8 +476,8 @@ \section{Conclusions}
 The learning pipeline described in this chapter presents different limitations and shortcomings.
 First, the training process to obtain a single policy requires an experience equivalent to approximately 7 simulated days, which is not surprising considering the low sample-efficiency typical of model-free \ac{RL} algorithms based on policy gradient.
 Two possible ways to mitigate this problem are either switching to algorithms with better sample-efficiency, or optimising the performance of the experience generation.
-The execution of the 7 simulated days, on a powerful workstation able to run 32 parallel simulations, lasts more than 2 real-world days, resulting to a long and extenuating parameters tuning process.
-A second limitation comes from the chosen low-level control architecture, composed by independent \pid controllers.
+The execution of the 7 simulated days, on a powerful workstation able to run 32 parallel simulations, lasts more than 2 real-world days, resulting in a long and exhausting parameter-tuning process.
+A second limitation comes from the chosen low-level control architecture, composed of independent \pid controllers.
 Despite the benefits of their simplicity, they introduce in the system a stiffness that can prevent the emergence of natural and more human-like motions.
 Finally, the applicability to real robots has yet to be assessed, since the simulation does not take into account second-order dynamic effects characterising real systems.
We tried to increase the robustness of the resulting policy by introducing multiple domain randomisation effects, but their effectiveness in the real world has not been assessed. diff --git a/Chapters/Part_2/chapter_7.tex b/Chapters/Part_2/chapter_7.tex index af38c59..cafe098 100644 --- a/Chapters/Part_2/chapter_7.tex +++ b/Chapters/Part_2/chapter_7.tex @@ -1,11 +1,10 @@ -\acresetall \chapter{Contact-aware Multibody Dynamics} \label{ch:contact_aware_dynamics} In the first part of this thesis contributions, we proposed a general framework for creating robotic environments and, with it, introduced a scheme for training a policy for generating push-recovery control signals balancing the iCub humanoid robot in the presence of external disturbances. The simulations were performed using the general-purpose Gazebo Sim, which provides a rich ecosystem supporting many types of robots, sensors, physics engines, and rendering capabilities. However, the benefits of exploiting general-purpose solutions often introduce a trade-off with the achievable performance. -As we have experienced for the push-recovery policy presented in Chapter~\ref{ch:learning_from_scratch}, a single iteration of policy training could last multiple days, resulting in a long tuning process that could limit the search space. +As we have experienced with the push-recovery policy presented in Chapter~\ref{ch:learning_from_scratch}, a single iteration of policy training could last multiple days, resulting in a long tuning process that could limit the search space. The bottleneck of the training pipeline are the computations performed by the rigid-body simulator, limiting the rate at which new trajectories can be sampled. In the continuation of this thesis, we attempt to reply to the question: \textit{"How can we optimise sampling performance of synthetic data?"}. @@ -15,16 +14,125 @@ \chapter{Contact-aware Multibody Dynamics} Each contact point's dynamics state is structured such that it can be included in the state-space representation and integrated with the robot's dynamics. The resulting representation will be used, in the next chapter, as a base for a novel physics engine targeted to exploit modern hardware accelerators for maximising trajectory sampling. -\section{State-space Multibody Dynamics} +\section{Notation} + +\subsection{Frame kinematics with quaternions} +\label{sec:frame_kinematics_quaternion} + +In Chapter~\ref{ch:robot_modelling}, we introduced the homogeneous transformation $\homo[A]_B \in \SE(3)$ to describe the pose of a frame (corresponding also to a rigid body). +An alternative representation of the $\SE(3)$ group is the tuple $(\pos[A]_B, {}^A \bar{q}_B)$, where $\pos[A]_B \in \realn^3$ is the \emph{position vector} and ${}^A \bar{q}_B$ a \emph{unit quaternion}. +A generic quaternion can be defined as follows: +% +\begin{equation*} + q \in \mathbb{H} := \left\{ q_w + \hat{i} q_x + \hat{j} q_y + \hat{k} q_z \;:\;\; q_w, q_x, q_y, q_z \in \realn \right\} + , +\end{equation*} +% +where $\Re(q) = q_w$ is the \emph{real} part of the quaternion and $\Imag(q) = \hat{i} q_x + \hat{j} q_y + \hat{k} q_z$ is the \emph{imaginary} part. +Elements of $\SO(3)$ describing frame rotations can be described by \emph{unit quaternions}: +% +\begin{equation*} + \bar{q} \in \Spin(3) := \left\{ q \in \mathbb{H} \;:\; |q| = 1 \right\} + . 
+\end{equation*}
+%
+A quaternion can also be described by a real vector of its coefficients:
+%
+\begin{equation*}
+    \quat = (w, \mathbf{r}) \in \realn^4
+    ,
+\end{equation*}
+%
+where $w = q_w \in \realn$ and $\mathbf{r} = (q_x, q_y, q_z) \in \realn^3$.
+In this chapter, we mainly use this last representation since it has a direct relation with its practical usage in algorithm design.
+The pose of a generic frame can therefore be described by a vector $(\pos[A]_B, \quat[A]_B) \in \realn^7$.
+%
+As regards the frame velocity, we can keep using any of the 6D vectors introduced in Section~\ref{sec:frame_velocities}, \ie $\velsix_{A,B} = (\vellin_{A,B}, \velang_{A,B}) \in \realn^6$.
+
+\subsection{Frame pose derivative and 6D velocity with quaternions}
+
+When the orientation of a frame is expressed with a quaternion, it is worth introducing the relation between the derivative of the pose $(\posd[A]_B, \quatd[A]_B) \in \realn^7$ and the 6D velocity $\velsix_{A,B} \in \realn^6$.
+In fact, particularly in this case with the quaternion, it is clear that the two cannot be related by a simple numerical differentiation, since they also have different dimensions.
+The following relations hold:
+%
+\begin{equation}
+    \label{eq:pose_dot_with_quaternion}
+    \begin{cases}
+    \posd[A]_B = \vellin[{A[B]}]_{A,B} \\
+    \quatd[A]_B = \frac{1}{2} S(\velang[B]_{A,B}) \quat[A]_B
+    \end{cases}
+    ,
+\end{equation}
+%
+where we notice that the derivative of the position is the linear part of the mixed velocity defined in Equation~\eqref{eq:mixed_velocity} (denoted as $\orid[A]_B$), and we defined the derivative of the quaternion~\parencite{sola_quaternion_2017} by introducing the following matrix:
+%
+\begin{equation*}
+    \operatorname{S}(\velang) =
+    \begin{bmatrix}
+    0 & -\velang^\top \\ \velang & -\velang^\wedge
+    \end{bmatrix}
+    .
+\end{equation*}
+
+The relations of Equation~\eqref{eq:pose_dot_with_quaternion} can be used to numerically integrate the frame pose, using one of the schemes that will be introduced in Section~\ref{sec:integrators}.
+The quaternion $\quat$, in this setting, has to be treated with care, because only unit quaternions can describe rotations.
+
+Numerical integration schemes do not enforce that this property is maintained along the trajectory, and numerical approximations could lead to unwanted instabilities.
+This problem can be either solved or mitigated in different ways.
+The most straightforward solution is normalising the quaternion after each integration step.
+Alternatively, a correction term orthogonal to the quaternion dynamics of the following form can be introduced:
+%
+\begin{equation*}
+    \quatd = \frac{1}{2} \operatorname{S}\left( \rot[B]_W \velang[W]_{W,B} \right) \quat + \frac{1}{2} K_{\quat} \quat \left( \norm{\quat}^{-1} - 1 \right)
+    ,
+\end{equation*}
+%
+which corresponds to a Baumgarte stabilization on $\SO(3)$~\parencite{gros_baumgarte_2015}, where $K_{\quat} \in \realn$ is the correction coefficient.
+This stabilization term progressively projects the quaternion towards a unit quaternion, restoring the norm in case of drifting.
+In the continuation of this thesis, we do not explicitly include any correction, considering its choice an implementation detail.
+
+\begin{remark*}[Geometric integration of quaternions]
+%
+Quaternions, like other representations of elements of $\SO(3)$, have strong geometrical properties based on the underlying symmetries.
+These properties can be exploited to obtain more rigorous integration schemes performed directly on the manifold~\parencite{andrle_geometric_2013}.
+The quaternion relation of Equation~\eqref{eq:pose_dot_with_quaternion} can be seen as a differential equation in $\quat[A]_B$.
+By exploiting properties of the matrix exponential together with quaternion properties, it can be shown~\parencite{andrle_geometric_2013, sola_quaternion_2017, sola_micro_2020} that the following relation implements a zeroth-order integration:
+%
+\begin{align*}
+    \quat_{t_{k+1}} = \quat_{t_k} \otimes \left( \cos(\norm{\velang} \Delta t / 2),\, \frac{\velang}{\norm{\velang}} \sin(\norm{\velang} \Delta t / 2) \right)
+    ,
+\end{align*}
+%
+where $\otimes$ denotes the quaternion multiplication, and $\velang$ is assumed constant within the integration interval $[t_k, t_{k+1}]$.
+
+While both integration approaches are valid options, albeit showing different numerical stability~\parencite{andrle_geometric_2013}, we develop the theory of this chapter using the numerical derivative.
+This choice allows treating the entire state of a multibody system, including the base quantities, as a plain real-valued vector, simplifying the equations of this chapter.
+If better numerical stability is required, practical implementations of the presented results could adopt the geometric integration, paying the price of treating the base orientation separately.
+%
+\end{remark*}
+
+\subsection{Multibody dynamics}
 
 The \acp{EoM} of a floating-base multibody system have been previously introduced in Section~\ref{sec:multibody_dynamics} in the following form:
 %
 \begin{equation*}
     M(\Qu) \Nud + C(\Qu, \Nu) \Nu + g(\Qu) = B \Torques + \sum_{\mathcal{L}} J_{L}^\top(\Qu) \forcesix_L^{ext}
-    .
+    ,
 \end{equation*}
 %
-In this chapter, when it is not required to decompose the Coriolis and gravitational terms explicitly, we use a more compact notation introducing the following vector of \emph{bias forces}:
+where $\Qu$ is the generalised position and $\Nu$ is the generalised velocity already introduced in Equation~\eqref{eq:floating_base_velocity}.
+%
+If a quaternion $\quat[W]_B$ is used to model the orientation of the base frame $B$, as described in Section~\ref{sec:frame_kinematics_quaternion}, we can expand the generalised position and velocity as follows:
+%
+\begin{align}
+    \label{eq:qu_nu_quaternion}
+    \mathbf{q} &= \left( \pos[W]_B, \quat[W]_B, \mathbf{s} \right) \in \mathbb{R}^{n+7} \\
+    \boldsymbol{\nu} &= \Nu[W] = \left( \vellin[W]_{W,B}, \velang[W]_{W,B}, \dot{\mathbf{s}} \right) \in \mathbb{R}^{n+6}
+    ,
+\end{align}
+%
+where we chose the inertial-fixed representation of the base velocity, introduced in Section~\ref{sec:right_trivialized_velocity}, just for practical reasons.
+
+When the explicit decomposition of the Coriolis and gravitational terms is not required, we combine their effects into the following vector of \emph{bias forces}:
 %
 \begin{equation*}
     h(\Qu, \Nu) = C(\Qu, \Nu) \Nu + g(\Qu) \in \realn^{6+n}
@@ -32,11 +140,11 @@ \section{State-space Multibody Dynamics}
 \end{equation*}
 
 We can compact the Lagrangian formulation of the system's dynamics even more by updating the terms related to the external 6D forces.
-We assume, for each link $L \in \mathcal{L}$, that an external 6D force $\forcesix[L]^{ext}_L$ always exists but is zero if the link has no interaction with the environment.
+We assume, for each link $L \in \mathcal{L}$, that an external 6D force $\forcesix^{ext}_L$ always exists but is zero if the link has no interaction with the environment.
 If we stack the Jacobians and the 6D forces of all links in the following new matrices:
 %
 \begin{align*}
-    J_\mathcal{L} =
+    J_\mathcal{L}(\Qu) =
     \begin{bmatrix}
         J_0 \\
         J_1 \\
@@ -62,47 +170,70 @@
     .
 \end{equation}
 
-Now, to simplify the numerical integration of this system, we should express this non-linear \ac{ODE} in a state-space form:
+\section{State-space Multibody Dynamics}
+
+The \acp{EoM} of a multibody system of Equation~\eqref{eq:eom_multibody_compact} are expressed as a non-linear \ac{ODE}.
+Modern control theory studies the properties of these systems by operating on their \emph{state-space} representation~\parencite{friedland_control_2005}, which assumes the following general form:
 %
-\begin{equation*}
-    \dot{\mathbf{x}}(t) = f\left(\mathbf{x}(t), \mathbf{u}(t)\right) .
-\end{equation*}
+\begin{equation}
+\label{eq:state_space_representation}
+\begin{cases}
+    \dot{\mathbf{x}}(t) &= f \left( t, \mathbf{x}(t), \mathbf{u}(t) \right) \\
+    \mathbf{y}(t) &= g \left( t, \mathbf{x}(t), \mathbf{u}(t) \right)
+\end{cases}
+,
+\end{equation}
 %
-We can obtain this form by grouping the variables in the \emph{state vector} $\mathbf{x}$ and the \emph{input vector} $\mathbf{u}$:
+where the first is the \emph{state equation} and the second is the \emph{output equation}.
+The variable $\mathbf{x}$ is called the \emph{state vector} and the variable $\mathbf{u}$ is called the \emph{input vector}.
+State-space models are particularly interesting for computing, starting from a given initial state $\mathbf{x}_0$, the future trajectory of an arbitrarily complex system, under the assumption of complete knowledge of its dynamics $f(\cdot)$ and inputs $\mathbf{u}(t)$, if any.
+In the most general case, even when closed-form solutions of Equation~\eqref{eq:state_space_representation} cannot be found, future trajectories can be computed through numerical integration~\parencite{cellier_continuous_2006}, as will be described in Section~\ref{sec:integrators}.
+
+In the case of a rigid multibody system, the \acp{EoM}~\eqref{eq:eom_multibody_compact} can be expressed in state-space representation by introducing the following state and input vectors:
 %
-\begin{align*}
+\begin{align}
+    \label{eq:input_and_state_vectors}
     \mathbf{x}(t) =
     \begin{bmatrix}
-        \mathbf{q} \\ \boldsymbol{\nu}
+        \Qu \\ \Nu
    \end{bmatrix}
    \in \mathbb{R}^{2n+13},
    &&
    \mathbf{u}(t) =
    \begin{bmatrix}
-        \boldsymbol{\tau} \\ \forcesix^{ext}_\mathcal{L}
+        \Torques \\ \forcesix^{ext}_\mathcal{L}
    \end{bmatrix}
    \in \mathbb{R}^{n+6n_L}.
-\end{align*}
+\end{align}
 %
-Most of the state-of-the-art \aclp{RBDA}~\parencite{featherstone_rigid_2008}, for efficiency reasons, exploit properties of the inertial-fixed velocity representation, introduced in Section~\ref{sec:right_trivialized_velocity}.
-In this representation, a possible formulation of the state vector is the following: +With these definitions, we can define the state equation of our system as: % -\begin{align*} - \mathbf{q} &= \left( \pos[W]_B, \quat[W]_B, \mathbf{s} \right) \in \mathbb{R}^{n+7} \\ - \boldsymbol{\nu} &= \left( \vellin[W]_{W,B}, \velang[W]_{W,B}, \dot{\mathbf{s}} \right) \in \mathbb{R}^{n+6} +\begin{equation*} + \dot{\mathbf{x}}(t) = + \begin{bmatrix} + \dot{\mathbf{q}} \\ \dot{\boldsymbol{\nu}} + \end{bmatrix} = + f\left(\mathbf{x}(t), \mathbf{u}(t)\right) , -\end{align*} +\end{equation*} % -where we model the orientation of the base link with a quaternion $\quat = (w, \mathbf{r}) \in \mathbb{R}^4$, where $w \in \realn$ is its real part and $\mathbf{r} \in \mathbb{R}^3$ its imaginary part. -Quaternions are useful in the context of modelling an element of $\SO(3)$ because their description is a more practical $\realn^4$ vector. +where we still need to find the equations of $\Qud$ and $\Nud$. + +The variable $\Nud \in \realn^{6+n}$ is the \emph{generalised acceleration} of the system. +Assuming the mass matrix $M(\Qu)$ non-singular, the dynamics of $\Nu$ can be extracted directly from the \acp{EoM}~\eqref{eq:eom_multibody_compact}. +As we will introduce in Chapter~\ref{ch:scaling_rigid_body_simulations}, the computation of $\Nud$ is also known as \emph{forward dynamics} of the multibody system, and the methodology that requires the inversion of the mass matrix is just one among the available methods, not necessarily the most computationally efficient. +In order to maintain a degree of generality, in this chapter we mark this computation with the $\operatorname{FD}(\cdot)$ function. -We can now structure the desired state-space form of the system: +For what regards $\Qud$, we can simply extend the base quantities of Equation~\eqref{eq:pose_dot_with_quaternion} with the joint velocities $\dot{\mathbf{s}}$. +The advantage of using the numerical integration form of the quaternion should now be clear, since it enables its direct inclusion in the state vector. + +The final form of the state-space representation can be found by combining all the previous elements: % \begin{equation} \label{equation:floatig_base_dynamics_state_space} \dot{\mathbf{x}}(t) = \begin{bmatrix} - \dot{\mathbf{q}} \\ \dot{\boldsymbol{\nu}} + \Qud \\ \Nud \end{bmatrix} = \begin{bmatrix} \begin{pmatrix} @@ -111,43 +242,24 @@ \section{State-space Multibody Dynamics} \dot{\mathbf{s}} \end{pmatrix} \\ - \operatorname{FD}(\mathcal{M}, \mathbf{q}, \boldsymbol{\nu}, \boldsymbol{\tau}, \forcesix_\mathcal{L}^{ext}) + \operatorname{FD}(\mathcal{M}, \Qu, \Nu, \Torques, \forcesix_\mathcal{L}^{ext}) \end{bmatrix} = f\left(\mathbf{x}(t), \mathbf{u}(t)\right) . \end{equation} % -We introduced the \emph{forward dynamics} function $\operatorname{FD}(\cdot)$ to model the dynamics of the system's velocity $\Nu$. -We will discuss in greater detail the computation of forward dynamics in Chapter~\ref{ch:scaling_rigid_body_simulations}. -At this moment, assuming the knowledge of all model parameters forming the \acp{EoM}, we can think of isolating the generalized acceleration $\Nud$ from Equation~\eqref{eq:eom_multibody_compact}. -It is worth noting that $\Qud \neq \Nu$, in fact we used the term $\posd[W]_B$ as derivative of the base position. -This value matches the linear component of the mixed velocity $\velsix[{B[W]}]_{W,B}$, denoted as $\orid[A]_B$ in Equation~\eqref{eq:mixed_velocity}. 
+Also in this case, it's worth noting that $\Qud \neq \Nu$, due to the nature of the angular variables of the base link. -\begin{remark*}[Quaternion derivative] -% -Details of the equation used to define the derivative of the quaternion can be found in~\cite{sola_quaternion_2017}. -In Equation~\eqref{equation:floatig_base_dynamics_state_space}, we used the following matrix: +Computing the evolution of this system requires the knowledge of its inputs, represented by the joint torques $\Torques$ and the external forces $\forcesix_{\mathcal{L}}^{ext}$. +The external forces could either be known 6D forces supplied by the user, or unknown 6D forces resulting from the interaction with the environment: % -\begin{equation*} - \operatorname{S}(\velang) = - \begin{bmatrix} - 0 & -\velang^\top \\ \velang & -\velang^\wedge - \end{bmatrix} - . -\end{equation*} -% -Due to numerical errors, the quaternion obtained from the integration of the system could lose its unitary norm. -We can introduce a correction term orthogonal to the quaternion dynamics corresponding to a Baumgarte stabilization on SO$(3)$~\citep{gros_baumgarte_2015}, that iteratively restores the norm in case of drifting: -% -\begin{equation*} - \dot\quat = \frac{1}{2} \operatorname{S}\left( \rot[B]_W \velang[W]_{W,B} \right) \quat + \frac{1}{2} K_{\quat} \quat \left( \norm{\quat}^{-1} - 1 \right) - , -\end{equation*} -% -where $K_{\quat} \in \mathbb{R}$ is the correction coefficient. -In the rest of this chapter, being only a workaround for numerical errors, we do not explicitly include this correction term in Equation~\eqref{equation:floatig_base_dynamics_state_space}. +\begin{equation} + \label{eq:decomposition_external_forces} + \forcesix_\mathcal{L}^{ext} = \forcesix_\mathcal{L}^{user} + \forcesix_\mathcal{L}^{contact} +\end{equation} % -\end{remark*} +In the next section, we describe a methodology to compute the unknown forces exchanged with the environment by assuming that the floating-base model only interacts with the terrain $\forcesix_\mathcal{L}^{contact} := \forcesix_\mathcal{L}^{terrain}$, assumption compatible with our locomotion setting. +In a more general setting, $\forcesix_\mathcal{L}^{contact}$ should also include the forces exchanged between bodies, for example to simulate either self-collisions or interaction with other bodies part of the scene, particularly useful for robot manipulation. \section{Contact model} \label{section:contact_model} @@ -178,13 +290,13 @@ \subsection{Point-surface collisions} \centering \resizebox{.63\textwidth}{!}{ \includegraphics{images/contributions/chapter_7/soft_contact_model.tikz}} - \caption{Illustration of the point-surface soft-contact model for non-planar terrains.} + \caption{Illustration of the point-surface soft-contact model for non-planar terrains. The collidable point follow a trajectory $\pos_{cp}(t)$, penetrating the ground in $\pos_{cp}^0 := \left( x^0, y^0, \mathcal{H}(x^0, y^0) \right)$. While penetrating the material, the point reaches a generic point $\pos_{cp}$, over which a local contact frame $C = (\pos_{cp}, [W])$ is positioned, with a linear velocity $\vellin_{W,C} \in \realn^3$. 
The figure reports also the \emph{penetration depth} $h \in \realn$, the \emph{normal deformation} $\delta \in \realn$, and the tangential deformation $\mathbf{u} \in \realn^3$, used for the calculation of the 3D reaction force $\forcelin[C]_{cp}$ with the proposed soft-contact model.} \label{fig:soft_contact_model} \end{figure} For each time-varying collidable point belonging to the simulated model, we introduce a new local frame $C = (\ori_C, [W])$, having its origin located over the point's time-varying position $\pos[W]_{cp}(t)$ and orientation of $W$, illustrated in Figure~\ref{fig:soft_contact_model}. -In this setup, \emph{collision detection} is as easy as assessing if the $z$ coordinate the collidable point is lower than the terrain height. +In this setup, \emph{collision detection} is as easy as assessing if the $z$ coordinate of the collidable point is lower than the terrain height. We can assume having a \emph{heightmap} function $\mathcal{H}: (x, y) \mapsto z$ providing the terrain height at any location. We also assume to know the direction of the terrain normal $\hat{\mathbf{n}}$ in world coordinates at any location of the terrain's surface\footnote{For smooth terrains, it can be shown that the normal can be estimated from $\mathcal{H}$.}. If $\pos[W]_T = (x_{cp}, y_{cp}, z_T)$ is the point on the terrain surface vertical to the collidable point, where $z_T = \mathcal{H}(x_{cp}, y_{cp})$, @@ -357,7 +469,7 @@ \subsection{Augmented system dynamics} \label{equation:floatig_base_dynamics_with_contacts_state_space} \dot{\mathbf{x}}(t) = \begin{bmatrix} - \dot{\mathbf{q}} \\ \dot{\boldsymbol{\nu}} \\ \operatorname{vec}(\dot{\mathbf{U}}) + \Qud \\ \Nud \\ \operatorname{vec}(\dot{\mathbf{U}}) \end{bmatrix} = \begin{bmatrix} \begin{pmatrix} @@ -365,7 +477,7 @@ \subsection{Augmented system dynamics} \frac{1}{2} \operatorname{S}\left( \rot[B]_W \velang[W]_{W,B} \right) \quat[W]_B \\ \dot{\mathbf{s}} \end{pmatrix} \\ - \operatorname{FD}(\mathcal{M}, \mathbf{q}, \boldsymbol{\nu}, \boldsymbol{\tau}, \forcesix_L) \\ + \operatorname{FD}(\mathcal{M}, \Qu, \Nu, \Torques, \forcesix_\mathcal{L}^{ext}) \\ \operatorname{vec}(\dot{\mathbf{U}}) \end{bmatrix} = f\left(\mathbf{x}(t), \mathbf{u}(t)\right) @@ -374,6 +486,22 @@ \subsection{Augmented system dynamics} % This final non-linear system, albeit being quite stiff when new contacts are made or broken, does not present any discontinuity. +In the considered locomotion setting, considering the decomposition of the external forces of Equation~\eqref{eq:decomposition_external_forces} and the assumption of knowing the terrain profile\footnote{The terrain profile could also be estimated by introducing perception algorithms.}, the input vector of Equation~\eqref{eq:input_and_state_vectors} can be simplified as follows: +% +\begin{equation*} + \mathbf{u}(t) = \left( \Torques,\, \forcesix_\mathcal{L}^{user} \right) + , +\end{equation*} +% +since the forces exchanged with the terrain can be computed from the kinematics and the terrain properties: +% +\begin{equation*} + \forcesix_\mathcal{L}^{contact} := \forcesix_\mathcal{L}^{terrain} = \forcesix_\mathcal{L}^{terrain}(\Qu, \Nu, \mathcal{H}, \mathcal{S}) + , +\end{equation*} +% +where $\mathcal{H}: (x, y) \mapsto z$ returns the terrain height and $\mathcal{S}: (x, y) \mapsto \hat{\mathbf{n}}$ returns the terrain normal. 
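To make the terrain interface concrete, the sketch below queries heightmap and normal functions playing the role of $\mathcal{H}$ and $\mathcal{S}$ to detect a penetrating collidable point and compute a spring-damper normal force; it is a simplified fragment that omits the tangential deformation and friction-cone handling of the full model, and the stiffness and damping values are arbitrary examples.
\begin{verbatim}
# Sketch: point-terrain collision detection with a heightmap and a
# spring-damper normal force. Simplified: tangential/friction terms omitted.
import numpy as np

def terrain_height(x, y):                 # plays the role of H: (x, y) -> z
    return 0.0                            # flat ground in this example

def terrain_normal(x, y):                 # plays the role of S: (x, y) -> n
    return np.array([0.0, 0.0, 1.0])

def normal_contact_force(p_cp, v_cp, K=1e4, D=1e3):
    """3D normal force at a collidable point with position p_cp and linear
    velocity v_cp (world coordinates). K, D are example spring-damper gains."""
    x, y, z = p_cp
    penetration = terrain_height(x, y) - z        # > 0 when below the surface
    if penetration <= 0.0:
        return np.zeros(3)                        # no contact: zero force
    n = terrain_normal(x, y)
    delta_dot = -float(np.dot(v_cp, n))           # rate of normal deformation
    f_n = max(K * penetration + D * delta_dot, 0.0)   # spring-damper, no adhesion
    return f_n * n

# A point 1 mm below flat ground, moving downwards at 0.1 m/s.
print(normal_contact_force(np.array([0.0, 0.0, -0.001]),
                           np.array([0.0, 0.0, -0.1])))
\end{verbatim}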
+ \section{Integrators} \label{sec:integrators} diff --git a/Chapters/Part_2/chapter_8.tex b/Chapters/Part_2/chapter_8.tex index b71f6a1..75d3b51 100644 --- a/Chapters/Part_2/chapter_8.tex +++ b/Chapters/Part_2/chapter_8.tex @@ -1,4 +1,3 @@ -\acresetall \chapter{Scaling rigid-body simulations} \label{ch:scaling_rigid_body_simulations} @@ -602,8 +601,8 @@ \subsection{Soft-contacts Algorithm} \item ${}^W \mathbf{u} \in \realn^3$: the tangential deformation of the terrain associated to the contact point; \item $K, D \in \realn^+$: the parameters of the spring-damper model used for both the normal and tangential force calculation; \item $\mu \in \realn^+$: the static friction coefficient of the contact point; - \item $\mathcal{H}: \realn^2 \mapsto \realn$: a function returning the terrain height $z_T$ at given $(x, y)$ coordinates; - \item $\mathcal{S}: \realn^2 \mapsto \realn^3$: a function returning the normal of the surface $\hat{\mathbf{n}}$ at given $(x, y)$ coordinates\footnote{Under the assumption of smooth terrain, an approximation of $\mathcal{S}$ could be calculated from $\mathcal{H}$, \ie $\mathcal{S}(x, y) = f\left(\mathcal{H}(x, y)\right)$}; + \item $\mathcal{H}: \realn^2 \to \realn$: a function returning the terrain height $z_T$ at given $(x, y)$ coordinates; + \item $\mathcal{S}: \realn^2 \to \realn^3$: a function returning the normal of the surface $\hat{\mathbf{n}}$ at given $(x, y)$ coordinates\footnote{Under the assumption of smooth terrain, an approximation of $\mathcal{S}$ could be calculated from $\mathcal{H}$, \ie $\mathcal{S}(x, y) = f\left(\mathcal{H}(x, y)\right)$}; \end{itemize} % and provides the following outputs: diff --git a/Chapters/covid_impact_statement.tex b/Chapters/covid_impact_statement.tex index e6f0111..abce30f 100644 --- a/Chapters/covid_impact_statement.tex +++ b/Chapters/covid_impact_statement.tex @@ -17,12 +17,12 @@ \section*{Covid-19 Impact Statement} In the following period, the research institute enforced a fixed quota of persons having access to the laboratories and equipment. Originally, the research project was structured in two stages: the first stage mainly focused on simulated results, and the second on practical activities with real robots, addressing sim-to-real applications. -Before the beginning of the pandemics, the first activities of the research project were directed towards the development of a software infrastructure, presented in Chapter~\ref{ch:rl_env_for_robotics}, to enable running experiments both on simulated and real robots. -The most acute phase of the pandemic occurred in the beginning of the second year of the research project, when we were achieving with the developed architecture the first results in simulation that will be presented in Chapter~\ref{ch:learning_from_scratch}. -In the same year, I was also personally affected by the disease, that forced me to stay home for 20 days. +Before the beginning of the pandemic, the first activities of the research project were directed towards the development of a software infrastructure, presented in Chapter~\ref{ch:rl_env_for_robotics}, to enable running experiments both on simulated and real robots. +The most acute phase of the pandemic occurred in the beginning of the second year of the research project, when we were achieving with the developed architecture the first results in simulation that are presented in Chapter~\ref{ch:learning_from_scratch}. +In the same year, I was also personally affected by the disease, which forced me to stay home for 20 days. 
Due to the limitations in accessing the laboratory, and the uncertainty of how the pandemic would have evolved in conjunction with new corresponding restrictions that could have been enforced nationwide, we decided to refocus the research project to rely only on activities that could have been performed and finalised without requiring physical access to the facilities.
In particular, we updated the remaining objectives of the research project to study methodologies to mitigate the sampling bottlenecks of \acl{RL} architectures applied to robot learning applications.
In this setting, the training methods typically gather synthetic data from rigid body simulators, which is later fed into an optimisation problem to synthesise the control action.
-Despite the need to train multiple \acl{NN}, and the computational cost of the optimisation algorithms, the sampling process often represents the major bottleneck of the complete learning pipeline.
+Despite the need to train multiple \aclp{NN}, and the computational cost of the optimisation algorithms, the sampling process often represents the major bottleneck of the complete learning pipeline.
For this reason, we decided to study in greater detail how synthetic data could be efficiently sampled from rigid body simulations, maintaining the focus on robot locomotion.
Chapter~\ref{ch:contact_aware_dynamics} and Chapter~\ref{ch:scaling_rigid_body_simulations} will present the outputs of these final objectives.
diff --git a/Chapters/epilogue.tex b/Chapters/epilogue.tex
index d77f54f..33dcf8f 100644
--- a/Chapters/epilogue.tex
+++ b/Chapters/epilogue.tex
@@ -3,7 +3,6 @@
 \addtocontents{toc}{\protect\vspace{\beforebibskip}}
 \addcontentsline{toc}{chapter}{\tocEntry{Epilogue}}
 
-\acresetall
 \chapter*{Epilogue}
 \chaptermark{Epilogue}
 
@@ -13,7 +12,7 @@ \section*{Summary}
 In particular, Chapter~\ref{ch:introduction} introduced robotic simulators, describing their main components and properties, and provided a brief description of the technologies that enabled the activities related to this thesis.
 Chapter~\ref{ch:robot_modelling}, after an initial overview of frame kinematics and rigid body dynamics, derived the \acp{EoM} of a rigid multibody system describing the dynamics of floating-base robots.
 Then, Chapter~\ref{ch:reinforcement_learning} provided a bird's-eye view of \ac{RL}.
-With the goal of formulating the theory behind \ac{PPO}, that is the \ac{RL} algorithm used in Part~\ref{part:contribution}, this second background chapter first defined all the elements necessary to formalise the \ac{RL} problem, then it described the taxonomy of the available algorithms available to compute a solution, and finally provided the theory of policy optimisation.
+With the goal of formulating the theory behind \ac{PPO}, which is the \ac{RL} algorithm used in Part~\ref{part:contribution}, this second background chapter first defined all the elements necessary to formalise the \ac{RL} problem, then it described the taxonomy of the algorithms available to compute a solution, and finally provided the theory of policy optimisation.
 
 The thesis continued with Part~\ref{part:contribution}, describing the contributions to knowledge of this thesis.
 The first two chapters analysed the challenges of creating robotic environments for \ac{RL} research with the aim of sampling experience.
@@ -36,7 +35,7 @@ \section*{Summary}
 
 \section*{Discussions, Limitations, and Future work}
 
-To conclude the thesis, we provide final discussion points of each contribution to knowledge, outlining pitfalls and future work directions.
+To conclude the thesis, we provide final discussion points for each contribution to knowledge, outlining pitfalls and future work directions.
 
 \subsection*{\autoref{ch:rl_env_for_robotics}: Reinforcement Learning Environments for Robotics}
 
@@ -60,7 +59,7 @@ \subsection*{\autoref{ch:rl_env_for_robotics}: Reinforcement Learning Environmen
 \subsection*{\autoref{ch:learning_from_scratch}: Learning from scratch exploiting robot models}
 
 In this chapter, we proposed a scheme to train a policy with \ac{RL} for balancing a simulated humanoid robot in the presence of external disturbances, sampling synthetic data from the framework proposed in the previous chapter.
-We adopted a process based on reward shaping to guide the the state space exploration throughout the training phase.
+We adopted a process based on reward shaping to guide the state space exploration throughout the training phase.
 Starting from a simple reward structure, we tried to address undesired behaviour by iteratively adding new terms, until reaching its final form.
 We decided to control most of the \acp{DoF} of the iCub robot, acknowledging that the kinematics is highly redundant, and the policy optimisation could have stalled in local optima.
 Parameter tuning is paramount for these applications, and details are too often left out from research discussions.
@@ -69,7 +68,7 @@ \subsection*{\autoref{ch:learning_from_scratch}: Learning from scratch exploitin
 Much work also went into the training infrastructure, leveraging a cluster of machines and implementing the proper experiment deployment, with logging and synchronisation support.
 
 Beyond the training process, our results also show limitations, particularly when possible future sim-to-real applications are considered.
-Our control architecture relies on a policy providing velocity references, that are integrated and given as inputs to independent \pid joint controllers, generating the final joint torques actuated by the simulator.
+Our control architecture relies on a policy providing velocity references, which are integrated and given as inputs to independent \pid joint controllers, generating the final joint torques actuated by the simulator.
 Beyond being difficult to tune with performance comparable to those of the real-world counterpart, the low-level \pid controllers present a trade-off between accuracy and compliance.
 In our experiment, the position \pid controllers resulted in a stiff robot that, together with rigid contacts, limited the emergence of a natural, smooth, and more human-like behaviour.
 We think two different directions can be considered for improvements.
@@ -95,7 +94,7 @@ \subsection*{\autoref{ch:learning_from_scratch}: Learning from scratch exploitin
 
 \subsection*{\autoref{ch:contact_aware_dynamics}: Contact-aware Multibody Dynamics}
 
-This chapter proposed a state-space representation modelling the dynamics of a floating-base robot in contact with a ground surface, that can be integrated numerically over time to compute the trajectory of the system.
+This chapter proposed a state-space representation modelling the dynamics of a floating-base robot in contact with a ground surface, which can be integrated numerically over time to compute the trajectory of the system.
Being formulated in reduced coordinates, the system dynamics is forced to evolve enforcing the joint constraints. While being quite generic and providing a compact formulation applicable to any articulated structure, it presents different limitations. The collision detection corresponding to the contact model considers only points rigidly attached to the links and a smooth ground surface. diff --git a/Chapters/prologue.tex b/Chapters/prologue.tex index 94b0260..202dac7 100644 --- a/Chapters/prologue.tex +++ b/Chapters/prologue.tex @@ -2,7 +2,6 @@ \phantomsection \addcontentsline{toc}{chapter}{\tocEntry{Prologue}} -\acresetall \chapter*{Prologue} \chaptermark{Prologue} @@ -53,7 +52,7 @@ \chapter*{Prologue} In the 2010s, the \ac{RL} community has been particularly prolific, producing a considerable number of new algorithms. Despite their improved properties like increased sample efficiency, faster and more stable convergence, \etc they always rely on a given amount of trial-and-error experience, with no exception. -If, on the one hand, better algorithms are lower-bounded by the least amount of data describing the decision-making logic to learn, on the other, the amount of synthetic data that simulators can generate has no theoretical upper bound. +If, on the one hand, better algorithms are lower-bounded by the least amount of data describing the decision-making logic to learn, on the other hand, the amount of synthetic data that simulators can generate has no theoretical upper bound. The rapid adoption of \ac{RL} in the robotics domain forced practitioners to use the simulation technology from either robotics or gaming available at that time, which was never optimised for maximising execution speed. In fact, a single simulation instance executing in real-time has always been more than enough for the original purpose. It is not uncommon to find \ac{RL} experiments that require days, weeks, or even months worth of data, giving those with access to large computational resources a significant advantage. @@ -199,7 +198,7 @@ \section*{Software Projects} \vspace{3mm} \noindent -\textbf{\texttt{gym-ignition}} is a Python framework to create reproducible robotics environment for \ac{RL} research. It exposes an abstraction layer providing \acp{API} that enable the development of \ac{RL} environment compatible with \texttt{gym.Env}. The resulting environments, if they exploit \texttt{scenario}, become agnostic from setting in which they execute (either simulated or in real-time). The project also provides helpful utilities to compute common quantities commonly used by robotic environments, includes support of environment randomization, and handles the correct propagation of randomness. The library is open-source, released under the \emph{LGPL v2.1 or any later version}, and it is available at \linebreak \url{https://github.com/robotology/gym-ignition}. +\textbf{\texttt{gym-ignition}} is a Python framework to create reproducible robotics environment for \ac{RL} research. It exposes an abstraction layer providing \acp{API} that enable the development of \ac{RL} environment compatible with \texttt{gym.Env}. The resulting environments, if they exploit \texttt{scenario}, become agnostic from the setting in which they execute (either simulated or in real-time). The project also provides helpful utilities to compute common quantities commonly used by robotic environments, includes support of environment randomization, and handles the correct propagation of randomness. 
The library is open-source, released under the \emph{LGPL v2.1 or any later version}, and it is available at \linebreak \url{https://github.com/robotology/gym-ignition}. \vspace{3mm} \noindent diff --git a/FrontBackmatter/abstract.tex b/FrontBackmatter/abstract.tex index 47f51b6..5a4ae64 100644 --- a/FrontBackmatter/abstract.tex +++ b/FrontBackmatter/abstract.tex @@ -53,7 +53,7 @@ In this context, we first introduce a software architecture allowing to structure learning environments for robotics that can be adopted to train and run \ac{RL} policies regardless of the simulated or real-world setting. With its underlying simulation technology and exploiting a scheme based on reward shaping, we validate the architecture by training with \ac{RL} a push-recovery controller capable of synthesising whole-body references for the humanoid robot iCub. -Then, motivated by overcoming the bottlenecks related to the poor sampling performance of traditional rigid-body simulators, we present a new physics engine in reduced coordinates that can simulate robots interacting with a ground surface on hardware accelerators like \acp{GPU} and \acp{TPU}. +Then, motivated by overcoming the bottlenecks related to the poor sampling performance of traditional rigid-body simulators, we present a new physics engine in reduced coordinates that can simulate robots interacting with a ground surface on hardware accelerators like \ac{GPU} and \ac{TPU}. To this end, we present a contact-aware continuous state-space representation describing the dynamical evolution of floating-base robots that can be numerically integrated for simulation purposes. We adopt the new general-purpose Gazebo Sim simulator as our first solution to sample synthetic data, and exploit \jax and its hardware support to scale the sampling performance for highly parallel problems. Furthermore, we implement and benchmark common \acp{RBDA} part of the proposed physics engine on hardware accelerators and assess their scalability properties on different \acp{GPU}. 
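The abstract above mentions exploiting \jax and its hardware support to scale sampling for highly parallel problems. The following is a minimal, purely illustrative sketch, not the physics engine nor the \acp{RBDA} developed in the thesis: it only shows the general mechanism, namely a toy single-\ac{DoF} dynamics step (an assumption made for this example) vectorised with jax.vmap and compiled with jax.jit so that thousands of simulation instances advance in parallel on a \ac{GPU} or \ac{TPU}.

import jax
import jax.numpy as jnp

def step(state, torque, dt=1e-3):
    # Toy 1-DoF double integrator with unit inertia: purely illustrative dynamics,
    # standing in for a full contact-aware multibody step.
    pos, vel = state
    acc = torque
    return (pos + dt * vel, vel + dt * acc)

# Vectorise the step over a batch of instances and compile it for the accelerator.
batched_step = jax.jit(jax.vmap(step, in_axes=((0, 0), 0)))

# 4096 parallel simulation instances advanced with a single call.
states = (jnp.zeros(4096), jnp.zeros(4096))
torques = jnp.ones(4096)
states = batched_step(states, torques)

The same batching pattern can be applied to richer step functions, which is the mechanism behind the parallel-sampling speed-ups discussed in the abstract.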
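For the \texttt{gym-ignition} project described in the Software Projects section above, the relevant interface is the standard \texttt{gym.Env} one. The snippet below is a hypothetical, self-contained environment written directly against that interface (classic four-value \texttt{step} signature); it is not taken from the \texttt{gym-ignition} \acp{API}, and the class name and toy dynamics are assumptions made only to illustrate what compatibility with \texttt{gym.Env} means in practice.

import gym
import numpy as np
from gym import spaces

class ToyBalancingEnv(gym.Env):
    """Hypothetical 1-DoF balancing task exposing the standard gym.Env interface."""

    def __init__(self):
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32)
        self.state = np.zeros(2, dtype=np.float32)

    def reset(self):
        self.state = np.zeros(2, dtype=np.float32)
        return self.state.copy()

    def step(self, action):
        pos, vel = self.state
        vel = vel + 0.01 * float(action[0])   # toy dynamics, purely illustrative
        pos = pos + 0.01 * vel
        self.state = np.array([pos, vel], dtype=np.float32)
        reward = -abs(pos)                     # keep the position close to zero
        done = bool(abs(pos) > 1.0)
        return self.state.copy(), reward, done, {}

An agent-side training loop only ever calls \texttt{reset} and \texttt{step}, which is what allows the same environment definition to be backed either by a simulator or by the real robot.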
diff --git a/FrontBackmatter/contents.tex b/FrontBackmatter/contents.tex index 2ae1d62..06ca8dd 100644 --- a/FrontBackmatter/contents.tex +++ b/FrontBackmatter/contents.tex @@ -128,4 +128,93 @@ \acro{ZMP}{Zero Moment Point} \end{acronym} + %******************************************************* + % List of Symbols + %******************************************************* + + \phantomsection + \pdfbookmark[1]{List of Symbols}{los} + + \nomenclature[A, 01]{$\pos$}{Point in space} + \nomenclature[A, 01]{$A, B, C, \dots$}{Frame names} + \nomenclature[A, 01]{$W$}{World (inertial) frame} + \nomenclature[A, 02]{$[A]$}{Orientation frame of frame $A$} + \nomenclature[A, 02]{$\ori_A \in \realn^3$}{Origin of frame $A$} + \nomenclature[A, 02]{$A = (\ori_A, [A])$}{Definition of frame $A$} + \nomenclature[A, 03]{$\pos[A] \in \realn^3$}{Coordinate vector of point $\pos$ expressed in frame $A$} + \nomenclature[A, 03]{$\rot[A]_B \in \SO(3)$}{Rotation matrix from orientation frame $[B]$ to $[A]$} + \nomenclature[A, 04]{$\homo[A]_B \in \SE(3)$}{Homogeneous transformation from frame $B$ to $A$} + \nomenclature[A, 04]{${}^{A}\tilde{\mathbf{p}} \in \realn^4$}{Homogeneous representation of coordinate vector $\pos[A]$} + \nomenclature[A, 05]{$\vellin[C]_{A,B} \in \realn^3$}{Linear velocity of frame $B$ relative to frame $A$, expressed in $C$} + \nomenclature[A, 06]{$\velang[C]_{A,B} \in \realn^3$}{Angular velocity of frame $B$ relative to frame $A$, expressed in $C$} + \nomenclature[A, 06]{$\velsix[C]_{A,B} \in \realn^6$}{6D velocity of frame $B$ relative to frame $A$, expressed in $C$} + \nomenclature[A, 06]{$\velsix[C]_{A,B}^\wedge \in \se(3)$}{Matrix representation of 6D velocity $\velsix[C]_{A,B}$} + \nomenclature[A, 07]{$\transvel[A]_B \in \realn^{6\times 6}$}{Velocity transformation from frame $B$ to frame $A$} + \nomenclature[A, 08]{$\velsix[C]_{A,B} \crossvelsix \in \realn^{6\times6}$}{Cross product operator on $\realn^6$ for 6D velocities} + \nomenclature[A, 09]{$\velsixd[C]_{A,B} \in \realn^6$}{Apparent acceleration of frame $B$ relative to $A$, expressed in $C$} + \nomenclature[A, 10]{$\accsix[C]_{A,B} \in \realn^6$}{Intrinsic acceleration of frame $B$ relative to $A$, expressed in $C$} + \nomenclature[A, 10]{$\accsixproper[C]_{A,B} \in \realn^6$}{Proper acceleration of frame $B$ relative to $A$, expressed in $C$} + \nomenclature[A, 10]{$\forcelin[A] \in \realn^3$}{Linear force expressed in frame $A$} + \nomenclature[A, 11]{$\forceang[A] \in \realn^3$}{Angular force (torque) expressed in frame $A$} + \nomenclature[A, 11]{$\forcesix[A] \in \realn^6$}{6D force expressed in frame $A$} + \nomenclature[A, 11]{$\transfor[A]^B \in \realn^{6\times 6}$}{Force transformation from frame $B$ to frame $A$} + \nomenclature[A, 11]{$\velsix[C]_{A,B} \crossforsix \in \realn^{6\times6}$}{Cross product operator on $\realn^6$ for 6D forces} + \nomenclature[A, 11]{$I \in \realn^{3\times3}$}{Inertia tensor} + \nomenclature[A, 12]{$\masssix[B] \in \realn^{6\times6}$}{6D inertia matrix, expressed in frame $B$} + \nomenclature[A, 12]{$g \in \realn^+$}{Standard gravity} + \nomenclature[A, 13]{$\gravity[W] \in \realn^3$}{Gravitational acceleration vector} + \nomenclature[A, 13]{$\subspacee[X]_{P,C} \in \realn^6$}{Joint motion subspace between frames $P$ and $C$, expressed in $X$} + \nomenclature[A, 13]{$n \in \mathbb{N}$}{Number of degrees of freedom of a multibody system} + \nomenclature[A, 14]{$\Qu \in \SE(3) \times \realn^n$}{Floating-base position} + \nomenclature[A, 15]{$\Nu[X] \in \realn^{6+n}$}{Floating-base velocity having base 
velocity $\velsix[X]_{W,B}$} + \nomenclature[A, 16]{$\Shape \in \realn^n$}{Joint positions} + \nomenclature[A, 16]{$\Shaped \in \realn^n$}{Joint velocities} + \nomenclature[A, 16]{$\Shapedd \in \realn^n$}{Joint accelerations} + \nomenclature[A, 16]{$\jac[Y]_{B,E} \in \realn^{6\times n}$}{Relative Jacobian of frame $E$ \wrt $B$, expressed in $Y$} + \nomenclature[A, 17]{$\jac[Y]_{W,E} \in \realn^{6\times (6+n)}$}{Free-floating Jacobian of frame $E$, expressed in $Y$} + \nomenclature[A, 18]{$\jac[Y]_{W,E/X} \in \realn^{6\times (6+n)}$}{Free-floating Jacobian of frame $E$, expressed in $Y$, for base velocity expressed in $X$} + \nomenclature[A, 19]{$M(\Qu) \in \realn^{(6+n)\times (6+n)}$}{Mass matrix} + \nomenclature[A, 20]{$C(\Qu, \Nu) \in \realn^{(6+n)\times(6+n)}$}{Coriolis matrix} + \nomenclature[A, 20]{$g(\Qu) \in \realn^{6+n}$}{Potential force vector (or gravity vector)} + \nomenclature[A, 20]{$h(\Qu, \Nu) \in \realn^{6+n}$}{Bias forces vector} + \nomenclature[A, 21]{$\Torques \in \realn^n$}{Joint generalized forces} + \nomenclature[A, 22]{$\mathcal{L}$}{Set of link indices} + \nomenclature[A, 23]{$n_L$}{Number of links} + \nomenclature[A, 24]{$\forcesix_\mathcal{L}^\text{ext} \in \realn^{6 \times n_L}$}{Vector stacking external forces applied to all links} + \nomenclature[A, 24]{$\jac_\mathcal{L} \in \realn^{6 n_L \times (6+n)}$}{Matrix stacking floating-base Jacobians of all links} + \nomenclature[A, 25]{$q \in \mathbb{H}$}{A quaternion} + \nomenclature[A, 26]{$\bar{q} \in \operatorname{Spin}(3)$}{A unit quaternion} + \nomenclature[A, 27]{$\quat = (w, \mathbf{r}) \in \realn^4$}{Quaternion coefficients} + \nomenclature[A, 28]{$\mathcal{H}$}{Function providing terrain height} + \nomenclature[A, 29]{$\mathcal{S}$}{Function providing terrain normal} + \nomenclature[A, 30]{$(\cdot)^\parallel, \, (\cdot)_\parallel$}{Component parallel to the terrain} + \nomenclature[A, 30]{$(\cdot)^\perp, \, (\cdot)_\perp$}{Component normal to the terrain} + + \nomenclature[L, 01]{$\mathcal{S}$}{The state space} + \nomenclature[L, 02]{$\mathcal{A}$}{The action space} + \nomenclature[L, 03]{$s_t \in \mathcal{S}$}{State of the environment at time $t$} + \nomenclature[L, 04]{$a_t \in \mathcal{A}$}{Action applied to the environment at time $t$} + \nomenclature[L, 05]{$\mathcal{P} \,:\, \mathcal{S} \times \mathcal{A} \to \operatorname{Pr}[\mathcal{S}]$}{State-transition probability density function} + \nomenclature[L, 06]{$\mathcal{R} \,:\, \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \realn$}{Reward function} + \nomenclature[L, 07]{$r_t \in \realn$}{Immediate reward at time $t$} + \nomenclature[L, 08]{$a = \mu(s)$}{Action taken from deterministic policy in state $s$} + \nomenclature[L, 09]{$a \sim \pi(\cdot \given s)$}{Action sampled from stochastic policy in state $s$} + \nomenclature[L, 10]{$\pi_{\boldsymbol{\theta}}$}{Stochastic policy parameterized with $\boldsymbol{\theta}$} + \nomenclature[L, 11]{$\tau = (s_0, a_0, s_1, a_1, \dots, s_T)$}{Trajectory of states and actions} + \nomenclature[L, 12]{$s_0 \sim \rho_0(\cdot)$}{Sampling state from initial state distribution} + \nomenclature[L, 13]{$R_t$}{Return at time $t$} + \nomenclature[L, 13]{$\hat{R}_t$}{Reward-to-go at time $t$} + \nomenclature[L, 14]{$R(\tau)$}{Discounted return of trajectory $\tau$} + \nomenclature[L, 15]{$J(\pi)$}{Performance function of stochastic policy $\pi$} + \nomenclature[L, 16]{$\pi^*$}{Optimal policy} + \nomenclature[L, 17]{$\langle \mathcal{S}, \mathcal{A}, \mathcal{R}, \mathcal{P}, \mathcal{\rho}_0 \rangle$}{Tuple 
defining a Markov Decision Process} + \nomenclature[L, 18]{$V^\pi(s)$}{State-value function for policy $\pi$ at state $s$} + \nomenclature[L, 18]{$Q^\pi(s, a)$}{Action-value function for policy $\pi$ at state-action pair $(s, a)$} + \nomenclature[L, 19]{$A^\pi(s, a)$}{Advantage function for policy $\pi$ at state-action pair $(s, a)$} + \nomenclature[L, 20]{$\mathbb{E}[\cdot]$}{Expected value of a random variable} + \nomenclature[L, 21]{$\hat{\mathbb{E}}[\cdot]$}{Empirical average estimating the expected value of a random variable from samples} + + \renewcommand{\nomname}{List of Symbols} + \printnomenclature + \endgroup diff --git a/classicthesis-config.tex b/classicthesis-config.tex index bd62dea..646fc92 100644 --- a/classicthesis-config.tex +++ b/classicthesis-config.tex @@ -336,3 +336,19 @@ \usepackage{algpseudocodex} \usepackage{algorithm} \usepackage{setspace} + +% List of symbols +% https://www.overleaf.com/learn/latex/Nomenclatures +\usepackage{nomencl} +\makenomenclature + +% Group separation in list of symbols +\usepackage{etoolbox} +\renewcommand\nomgroup[1]{% +\item[ + \bfseries + \ifstrequal{#1}{A}{Robot Kinematics and Dynamics}{% + \ifstrequal{#1}{L}{Reinforcement Learning}{}% + }% +]% +} diff --git a/images/contributions/chapter_7/soft_contact_model.tikz b/images/contributions/chapter_7/soft_contact_model.tikz index 87c9b1d..64fa96c 100644 --- a/images/contributions/chapter_7/soft_contact_model.tikz +++ b/images/contributions/chapter_7/soft_contact_model.tikz @@ -105,19 +105,19 @@ color(100bp)=(transparent!10) } % Text Node \draw (288,46.6) node [anchor=south east] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 65; green, 70; blue, 203 } ,opacity=1 ] {$\mathbf{p}_{cp}( t)$}; % Text Node -\draw (190,213.4) node [anchor=north] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 189; green, 16; blue, 224 } ,opacity=1 ] {$\boldsymbol{v}_{W,cp}$}; +\draw (190,213.4) node [anchor=north] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 189; green, 16; blue, 224 } ,opacity=1 ] {$\boldsymbol{v}_{W,C}$}; % Text Node \draw (221,111.6) node [anchor=south west] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 234; green, 81; blue, 100 } ,opacity=1 ] {$\mathbf{u}$}; % Text Node \draw (242,156.6) node [anchor=south west] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 67; green, 207; blue, 175 } ,opacity=1 ] {$\delta $}; % Text Node -\draw (178,190) node [anchor=east] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 99; green, 159; blue, 30 } ,opacity=1 ] {$\boldsymbol{v}_{W,cp}^{\parallel }$}; +\draw (178,190) node [anchor=east] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 99; green, 159; blue, 30 } ,opacity=1 ] {$\boldsymbol{v}_{W,C}^{\parallel }$}; % Text Node -\draw (232,190) node [anchor=west] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 74; green, 144; blue, 226 } ,opacity=1 ] {$\boldsymbol{v}_{W,cp}^{\perp }$}; +\draw (232,190) node [anchor=west] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 74; green, 144; blue, 226 } ,opacity=1 ] {$\boldsymbol{v}_{W,C}^{\perp }$}; % Text Node \draw (298,146.6) node [anchor=south east] [inner sep=0.75pt] [font=\tiny,color={rgb, 255:red, 0; green, 0; blue, 0 } ,opacity=1 ] {$W$}; % Text Node -\draw (218,166.6) node [anchor=south east] [inner sep=0.75pt] [font=\tiny,color={rgb, 255:red, 0; green, 0; blue, 0 } ,opacity=1 ] {$C[ W]$}; +\draw (218,166.6) node [anchor=south east] [inner sep=0.75pt] [font=\tiny,color={rgb, 255:red, 0; green, 0; blue, 0 } ,opacity=1 
] {$C$}; % Text Node \draw (148,140) node [anchor=east] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 139; green, 87; blue, 42 } ,opacity=1 ] {$h$}; % Text Node diff --git a/new_commands.tex b/new_commands.tex index 23e68cd..2cafec8 100644 --- a/new_commands.tex +++ b/new_commands.tex @@ -27,6 +27,7 @@ \newcommand{\homod}[1][]{\prescript{#1}{}{\dot{\mathbf{H}}}} \newcommand{\veladj}[1][]{\prescript{#1}{}{\mathbf{X}}} \newcommand{\quat}[1][]{\prescript{#1}{}{\mathtt{Q}}} +\newcommand{\quatd}[1][]{\prescript{#1}{}{\dot{\mathtt{Q}}}} \newcommand{\trans}[1][]{\prescript{#1}{}{\mathbf{H}}} \newcommand{\transvel}[1][]{\prescript{#1}{}{\mathbf{X}}} \newcommand{\transfor}[1][]{\prescript{}{#1}{\mathbf{X}}} @@ -78,6 +79,10 @@ \newcommand{\so}{\mathfrak{so}} \newcommand{\SE}{\text{SE}} \newcommand{\se}{\mathfrak{se}} +% \newcommand{\S}{\text{S}} +\newcommand{\Real}{\text{Re}} +\newcommand{\Imag}{\text{Im}} +\newcommand{\Spin}{\text{Spin}} \newcommand{\pid}{\spacedlowsmallcaps{PID}\xspace} diff --git a/thesis.tex b/thesis.tex index fc2cd1b..961f134 100644 --- a/thesis.tex +++ b/thesis.tex @@ -49,6 +49,9 @@ %******************************************************* %\hyphenation{put special hyphenation here} +% Sequential numbering document-wise of footnotes +\counterwithout{footnote}{chapter} + % ******************************************************************** % GO!GO!GO! MOVE IT! %******************************************************* diff --git a/zotero.bib b/zotero.bib index 3379dee..6f61393 100644 --- a/zotero.bib +++ b/zotero.bib @@ -1,60 +1,104 @@ -@inproceedings{ferigo_gym-ignition_2020, - title = {Gym-{Ignition}: {Reproducible} {Robotic} {Simulations} for {Reinforcement} {Learning}}, - copyright = {All rights reserved}, - shorttitle = {Gym-{Ignition}}, - doi = {10.1109/SII46433.2020.9025951}, - abstract = {This paper presents Gym-Ignition, a new framework to create reproducible robotic environments for reinforcement learning research. It interfaces with the new generation of Gazebo, part of the Ignition Robotics suite, which provides three main improvements for reinforcement learning applications compared to the alternatives: 1) the modular architecture enables using the simulator as a C++ library, simplifying the interconnection with external software; 2) multiple physics and rendering engines are supported as plugins, simplifying their selection during the execution; 3) the new distributed simulation capability allows simulating complex scenarios while sharing the load on multiple workers and machines. The core of Gym-Ignition is a component that contains the Ignition Gazebo simulator and exposes a simple interface for its configuration and execution. We provide a Python package that allows developers to create robotic environments simulated in Ignition Gazebo. Environments expose the common OpenAI Gym interface, making them compatible out-of-the-box with third-party frameworks containing reinforcement learning algorithms. Simulations can be executed in both headless and GUI mode, the physics engine can run in accelerated mode, and instances can be parallelized. Furthermore, the Gym-Ignition software architecture provides abstraction of the Robot and the Task, making environments agnostic on the specific runtime. 
This abstraction allows their execution also in a real-time setting on actual robotic platforms, even if driven by different middlewares.}, - booktitle = {{IEEE}/{SICE} {International} {Symposium} on {System} {Integration} ({SII})}, - author = {Ferigo, Diego and Traversaro, Silvio and Metta, Giorgio and Pucci, Daniele}, - year = {2020}, - note = {ISSN: 2474-2325}, - keywords = {Learning (artificial intelligence), Task analysis, Robots, Physics, Ignition, Real-time systems, own}, - pages = {885--890}, +@article{silver_reward_2021, + title = {Reward is enough}, + volume = {299}, + issn = {0004-3702}, + url = {https://www.sciencedirect.com/science/article/pii/S0004370221000862}, + doi = {10.1016/j.artint.2021.103535}, + abstract = {In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.}, + language = {en}, + urldate = {2022-02-09}, + journal = {Artificial Intelligence}, + author = {Silver, David and Singh, Satinder and Precup, Doina and Sutton, Richard S.}, + month = oct, + year = {2021}, + keywords = {Reinforcement learning, Artificial general intelligence, Artificial intelligence, Reward}, + pages = {103535}, + file = {ScienceDirect Snapshot:/home/dferigo/Zotero/storage/XUXAKHJ9/S0004370221000862.html:text/html;Silver et al_2021_Reward is enough.pdf:/home/dferigo/Zotero/storage/DE8PXLYA/Silver et al_2021_Reward is enough.pdf:application/pdf}, } -@inproceedings{ferigo_gym-ignition_2019, - title = {Gym-{Ignition}: {Reproducible} {Robotic} {Simulations} for {Reinforcement} {Learning}}, - copyright = {All rights reserved}, - abstract = {This paper presents Gym-Ignition, a new framework to create reproducible robotic environments for reinforcement learning research. It interfaces with the new generation of Gazebo, part of the Ignition Robotics suite. The modular architecture of Ignition Gazebo allows using the simulator as a library, simplifying the interconnection between the simulator and external software. Gym-Ignition enables the creation of environments compatible with OpenAI Gym that are executed in Ignition Gazebo. The Gym-Ignition software architecture can interact natively with programs developed in low-level languages. 
The advantage is twofold: it permits direct interaction with robots in real-time, and it simplifies the embedding of existing software such as low-level robotic controllers.}, - booktitle = {Robotics: {Science} and {Systems} ({RSS}) - {Workshop} on {Closing} the {Reality} {Gap} in {Sim2real} {Transfer} for {Robotic} {Manipulation}}, - author = {Ferigo, Diego and Traversaro, Silvio and Pucci, Daniele}, - year = {2019}, - keywords = {own}, +@article{cleach_fast_2021, + title = {Fast {Contact}-{Implicit} {Model}-{Predictive} {Control}}, + url = {http://arxiv.org/abs/2107.05616}, + abstract = {We present a general approach for controlling robotic systems that make and break contact with their environments. Contact-implicit model-predictive control (CI-MPC) generalizes linear MPC to contact-rich settings by relying on linear complementarity problems (LCP) computed using strategic Taylor approximations about a reference trajectory and retaining non-smooth impact and friction dynamics, allowing the policy to not only reason about contact forces and timing, but also generate entirely new contact mode sequences online. To achieve reliable and fast numerical convergence, we devise a structure-exploiting, path-following solver for the LCP contact dynamics and a custom trajectory optimizer for trajectory-tracking MPC problems. We demonstrate CI-MPC at real-time rates in simulation, and show that it is robust to model mismatch and can respond to disturbances by discovering and exploiting new contact modes across a variety of robotic systems, including a pushbot, hopper, and planar quadruped and biped.}, + language = {en}, + urldate = {2021-11-26}, + journal = {arXiv:2107.05616 [cs, eess]}, + author = {Cleac'h, Simon Le and Howell, Taylor and Schwager, Mac and Manchester, Zachary}, + month = sep, + year = {2021}, + note = {arXiv: 2107.05616}, + keywords = {Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control}, + file = {Cleac'h et al. - 2021 - Fast Contact-Implicit Model-Predictive Control.pdf:/home/dferigo/Zotero/storage/FCY6KR2I/Cleac'h et al. - 2021 - Fast Contact-Implicit Model-Predictive Control.pdf:application/pdf}, } -@article{ferigo_emergence_2021, - title = {On the {Emergence} of {Whole}-body {Strategies} from {Humanoid} {Robot} {Push}-recovery {Learning}}, - abstract = {Balancing and push-recovery are essential capabilities enabling humanoid robots to solve complex locomotion tasks. In this context, classical control systems tend to be based on simplified physical models and hard-coded strategies. Although successful in specific scenarios, this approach requires demanding tuning of parameters and switching logic between specifically-designed controllers for handling more general perturbations. We apply model-free Deep Reinforcement Learning for training a general and robust humanoid push-recovery policy in a simulation environment. Our method targets highdimensional whole-body humanoid control and is validated on the iCub humanoid. Reward components incorporating expert knowledge on humanoid control enable fast learning of several robust behaviors by the same policy, spanning the entire body. 
We validate our method with extensive quantitative analyses in simulation, including out-of-sample tasks which demonstrate policy robustness and generalization, both key requirements towards real-world robot deployment.}, - journal = {IEEE Robotics and Automation Letters}, - author = {Ferigo, Diego and Camoriano, Raffaello and Viceconte, Paolo Maria and Calandriello, Daniele and Traversaro, Silvio and Rosasco, Lorenzo and Pucci, Daniele}, +@article{roy_machine_2021, + title = {From {Machine} {Learning} to {Robotics}: {Challenges} and {Opportunities} for {Embodied} {Intelligence}}, + shorttitle = {From {Machine} {Learning} to {Robotics}}, + url = {http://arxiv.org/abs/2110.15245}, + abstract = {Machine learning has long since become a keystone technology, accelerating science and applications in a broad range of domains. Consequently, the notion of applying learning methods to a particular problem set has become an established and valuable modus operandi to advance a particular field. In this article we argue that such an approach does not straightforwardly extended to robotics -- or to embodied intelligence more generally: systems which engage in a purposeful exchange of energy and information with a physical environment. In particular, the purview of embodied intelligent agents extends significantly beyond the typical considerations of main-stream machine learning approaches, which typically (i) do not consider operation under conditions significantly different from those encountered during training; (ii) do not consider the often substantial, long-lasting and potentially safety-critical nature of interactions during learning and deployment; (iii) do not require ready adaptation to novel tasks while at the same time (iv) effectively and efficiently curating and extending their models of the world through targeted and deliberate actions. In reality, therefore, these limitations result in learning-based systems which suffer from many of the same operational shortcomings as more traditional, engineering-based approaches when deployed on a robot outside a well defined, and often narrow operating envelope. Contrary to viewing embodied intelligence as another application domain for machine learning, here we argue that it is in fact a key driver for the advancement of machine learning technology. In this article our goal is to highlight challenges and opportunities that are specific to embodied intelligence and to propose research directions which may significantly advance the state-of-the-art in robot learning.}, + language = {en}, + urldate = {2021-11-03}, + journal = {arXiv:2110.15245 [cs]}, + author = {Roy, Nicholas and Posner, Ingmar and Barfoot, Tim and Beaudoin, Philippe and Bengio, Yoshua and Bohg, Jeannette and Brock, Oliver and Depatie, Isabelle and Fox, Dieter and Koditschek, Dan and Lozano-Perez, Tomas and Mansinghka, Vikash and Pal, Christopher and Richards, Blake and Sadigh, Dorsa and Schaal, Stefan and Sukhatme, Gaurav and Therien, Denis and Toussaint, Marc and Van de Panne, Michiel}, + month = oct, year = {2021}, - keywords = {own}, + note = {arXiv: 2110.15245}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, + file = {Roy et al_2021_From Machine Learning to Robotics.pdf:/home/dferigo/Zotero/storage/CA7DHLB4/Roy et al_2021_From Machine Learning to Robotics.pdf:application/pdf}, } -@article{andrychowicz_what_2020, - title = {What {Matters} {In} {On}-{Policy} {Reinforcement} {Learning}? 
{A} {Large}-{Scale} {Empirical} {Study}}, - shorttitle = {What {Matters} {In} {On}-{Policy} {Reinforcement} {Learning}?}, - url = {http://arxiv.org/abs/2006.05990}, - abstract = {In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literature, leading to discrepancy between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress [Engstrom'20]. As a step towards filling that gap, we implement {\textgreater}50 such ``choices'' in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250'000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for on-policy training of RL agents.}, - urldate = {2021-05-27}, - journal = {arXiv:2006.05990 [cs, stat]}, - author = {Andrychowicz, Marcin and Raichuk, Anton and Stańczyk, Piotr and Orsini, Manu and Girgin, Sertan and Marinier, Raphael and Hussenot, Léonard and Geist, Matthieu and Pietquin, Olivier and Michalski, Marcin and Gelly, Sylvain and Bachem, Olivier}, +@techreport{traversaro_multibody_2019, + title = {Multibody dynamics notation}, + url = {https://pure.tue.nl/ws/portalfiles/portal/139293126/A_Multibody_Dynamics_Notation_Revision_2_.pdf}, + language = {en}, + author = {Traversaro, Silvio and Saccon, Alessandro}, + year = {2019}, + pages = {24}, +} + +@inproceedings{liu_robot_2021, + title = {Robot {Reinforcement} {Learning} on the {Constraint} {Manifold}}, + url = {https://openreview.net/forum?id=zwo1-MdMl1P}, + abstract = {Reinforcement learning in robotics is extremely challenging due to many practical issues, including safety, mechanical constraints, and wear and tear. Typically, these issues are not considered in...}, + language = {en}, + urldate = {2021-10-21}, + author = {Liu, Puze and Tateo, Davide and Ammar, Haitham Bou and Peters, Jan}, month = jun, - year = {2020}, - note = {arXiv: 2006.05990}, - keywords = {Computer Science - Machine Learning, Statistics - Machine Learning}, - file = {Andrychowicz et al_2020_What Matters In On-Policy Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/WZMAKPPK/Andrychowicz et al_2020_What Matters In On-Policy Reinforcement Learning.pdf:application/pdf;arXiv.org Snapshot:/home/dferigo/Zotero/storage/U5JYD637/2006.html:text/html}, + year = {2021}, + file = {Liu et al_2021_Robot Reinforcement Learning on the Constraint Manifold.pdf:/home/dferigo/Zotero/storage/8VD2XRMC/Liu et al_2021_Robot Reinforcement Learning on the Constraint Manifold.pdf:application/pdf}, } -@misc{schulman_nuts_2016, - title = {The {Nuts} and {Bolts} of {Deep} {RL} {Research}}, - url = {https://rll.berkeley.edu/deeprlcourse/docs/nuts-and-bolts.pdf}, +@article{acosta_validating_2021, + title = {Validating {Robotics} {Simulators} on {Real} {World} {Impacts}}, + url = {http://arxiv.org/abs/2110.00541}, + abstract = {A realistic simulation environment is an essential tool in every roboticist's toolkit, with uses ranging from planning and control to training policies with reinforcement learning. 
Despite the centrality of simulation in modern robotics, little work has been done to compare the performance of robotics simulators against real-world data, especially for scenarios involving dynamic motions with high speed impact events. Handling dynamic contact is the computational bottleneck for most simulations, and thus the modeling and algorithmic choices surrounding impacts and friction form the largest distinctions between popular tools. Here, we evaluate the ability of several simulators to reproduce real-world trajectories involving impacts. Using experimental data, we identify system-specific contact parameters of popular simulators Drake, MuJoCo, and Bullet, analyzing the effects of modeling choices around these parameters. For the simple example of a cube tossed onto a table, simulators capture inelastic impacts well while failing to capture elastic impacts. For the higher-dimensional case of a Cassie biped landing from a jump, the simulators capture the bulk motion well but the accuracy is limited by numerous model differences between the real robot and the simulators.}, + urldate = {2021-10-15}, + journal = {arXiv:2110.00541 [cs]}, + author = {Acosta, Brian and Yang, William and Posa, Michael}, + month = oct, + year = {2021}, + note = {arXiv: 2110.00541}, + keywords = {Computer Science - Robotics}, + file = {Acosta et al_2021_Validating Robotics Simulators on Real World Impacts.pdf:/home/dferigo/Zotero/storage/D4WET8GW/Acosta et al_2021_Validating Robotics Simulators on Real World Impacts.pdf:application/pdf;arXiv.org Snapshot:/home/dferigo/Zotero/storage/LU98NNTF/2110.html:text/html}, +} + +@inproceedings{gronauer_successful_2021, + address = {Montreal, Canada}, + title = {The {Successful} {Ingredients} of {Policy} {Gradient} {Algorithms}}, + isbn = {978-0-9992411-9-6}, + url = {https://www.ijcai.org/proceedings/2021/338}, + doi = {10.24963/ijcai.2021/338}, + abstract = {Despite the sublime success in recent years, the underlying mechanisms powering the advances of reinforcement learning are yet poorly understood. In this paper, we identify these mechanisms - which we call ingredients - in on-policy policy gradient methods and empirically determine their impact on the learning. To allow an equitable assessment, we conduct our experiments based on a unified and modular implementation. 
Our results underline the significance of recent algorithmic advances and demonstrate that reaching state-of-the-art performance may not need sophisticated algorithms but can also be accomplished by the combination of a few simple ingredients.}, language = {en}, - author = {Schulman, John}, - month = dec, - year = {2016}, - file = {Schulman - The Nuts and Bolts of Deep RL Research.pdf:/home/dferigo/Zotero/storage/NWLULPJC/Schulman - The Nuts and Bolts of Deep RL Research.pdf:application/pdf}, + urldate = {2021-10-14}, + booktitle = {Proceedings of the {Thirtieth} {International} {Joint} {Conference} on {Artificial} {Intelligence}}, + publisher = {International Joint Conferences on Artificial Intelligence Organization}, + author = {Gronauer, Sven and Gottwald, Martin and Diepold, Klaus}, + month = aug, + year = {2021}, + pages = {2455--2461}, + file = {Gronauer et al_2021_The Successful Ingredients of Policy Gradient Algorithms.pdf:/home/dferigo/Zotero/storage/B9A5Y9UV/Gronauer et al_2021_The Successful Ingredients of Policy Gradient Algorithms.pdf:application/pdf}, } @article{engstrom_implementation_2020, @@ -72,973 +116,935 @@ @article{engstrom_implementation_2020 file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/AI6US5L8/2005.html:text/html;Engstrom et al_2020_Implementation Matters in Deep Policy Gradients.pdf:/home/dferigo/Zotero/storage/HSDXPTLP/Engstrom et al_2020_Implementation Matters in Deep Policy Gradients.pdf:application/pdf}, } -@article{li_reinforcement_2021, - title = {Reinforcement {Learning} for {Robust} {Parameterized} {Locomotion} {Control} of {Bipedal} {Robots}}, - url = {http://arxiv.org/abs/2103.14295}, - abstract = {Developing robust walking controllers for bipedal robots is a challenging endeavor. Traditional model-based locomotion controllers require simplifying assumptions and careful modelling; any small errors can result in unstable control. To address these challenges for bipedal locomotion, we present a model-free reinforcement learning framework for training robust locomotion policies in simulation, which can then be transferred to a real bipedal Cassie robot. To facilitate sim-to-real transfer, domain randomization is used to encourage the policies to learn behaviors that are robust across variations in system dynamics. The learned policies enable Cassie to perform a set of diverse and dynamic behaviors, while also being more robust than traditional controllers and prior learning-based methods that use residual control. We demonstrate this on versatile walking behaviors such as tracking a target walking velocity, walking height, and turning yaw.}, - urldate = {2021-05-27}, - journal = {arXiv:2103.14295 [cs, eess]}, - author = {Li, Zhongyu and Cheng, Xuxin and Peng, Xue Bin and Abbeel, Pieter and Levine, Sergey and Berseth, Glen and Sreenath, Koushil}, - month = mar, - year = {2021}, - note = {arXiv: 2103.14295}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Systems and Control}, - file = {Li et al. - 2021 - Reinforcement Learning for Robust Parameterized Lo.pdf:/home/dferigo/Zotero/storage/CG5A5LGU/Li et al. 
- 2021 - Reinforcement Learning for Robust Parameterized Lo.pdf:application/pdf}, -} - -@article{collins_review_2021, - title = {A {Review} of {Physics} {Simulators} for {Robotic} {Applications}}, - volume = {9}, - issn = {2169-3536}, - doi = {10.1109/ACCESS.2021.3068769}, - abstract = {The use of simulators in robotics research is widespread, underpinning the majority of recent advances in the field. There are now more options available to researchers than ever before, however navigating through the plethora of choices in search of the right simulator is often non-trivial. Depending on the field of research and the scenario to be simulated there will often be a range of suitable physics simulators from which it is difficult to ascertain the most relevant one. We have compiled a broad review of physics simulators for use within the major fields of robotics research. More specifically, we navigate through key sub-domains and discuss the features, benefits, applications and use-cases of the different simulators categorised by the respective research communities. Our review provides an extensive index of the leading physics simulators applicable to robotics researchers and aims to assist them in choosing the best simulator for their use case.}, - journal = {IEEE Access}, - author = {Collins, J. and Chand, S. and Vanderkop, A. and Howard, D.}, - year = {2021}, - keywords = {review, Robots, Robot sensing systems, robotics, Legged locomotion, Physics, aerial robotics, field robotics, manipulation, marine robotics, Mobile robots, Navigation, robotic learning, Sensors, Simulation, soft robotics, surgical robotics}, - pages = {51416--51431}, +@misc{schulman_nuts_2016, + title = {The {Nuts} and {Bolts} of {Deep} {RL} {Research}}, + url = {https://rll.berkeley.edu/deeprlcourse/docs/nuts-and-bolts.pdf}, + language = {en}, + author = {Schulman, John}, + month = dec, + year = {2016}, + file = {Schulman - The Nuts and Bolts of Deep RL Research.pdf:/home/dferigo/Zotero/storage/NWLULPJC/Schulman - The Nuts and Bolts of Deep RL Research.pdf:application/pdf}, } -@inproceedings{seung-joon_yi_learning_2011, - address = {Shanghai, China}, - title = {Learning full body push recovery control for small humanoid robots}, - isbn = {978-1-61284-386-5}, - url = {http://ieeexplore.ieee.org/document/5980531/}, - doi = {10.1109/ICRA.2011.5980531}, - abstract = {Dynamic bipedal walking is susceptible to external disturbances and surface irregularities, requiring robust feedback control to remain stable. In this work, we present a practical hierarchical push recovery strategy that can be readily implemented on a wide range of humanoid robots. Our method consists of low level controllers that perform simple, biomechanically motivated push recovery actions and a high level controller that combines the low level controllers according to proprioceptive and inertial sensory signals and the current robot state. Reinforcement learning is used to optimize the parameters of the controllers in order to maximize the stability of the robot over a broad range of external disturbances. The controllers are learned on a physical simulation and implemented on the Darwin-HP humanoid robot platform, and the resulting experiments demonstrate effective full body push recovery behaviors during dynamic walking.}, +@article{andrychowicz_what_2020, + title = {What {Matters} {In} {On}-{Policy} {Reinforcement} {Learning}? 
{A} {Large}-{Scale} {Empirical} {Study}}, + shorttitle = {What {Matters} {In} {On}-{Policy} {Reinforcement} {Learning}?}, + url = {http://arxiv.org/abs/2006.05990}, + abstract = {In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literature, leading to discrepancy between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress [Engstrom'20]. As a step towards filling that gap, we implement {\textgreater}50 such ``choices'' in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250'000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for on-policy training of RL agents.}, urldate = {2021-05-27}, - booktitle = {2011 {IEEE} {International} {Conference} on {Robotics} and {Automation}}, - publisher = {IEEE}, - author = {{Seung-Joon Yi} and {Byoung-Tak Zhang} and Hong, Dennis and Lee, Daniel D.}, - month = may, - year = {2011}, - pages = {2047--2052}, - file = {Seung-Joon Yi et al. - 2011 - Learning full body push recovery control for small.pdf:/home/dferigo/Zotero/storage/N3N4IJG7/Seung-Joon Yi et al. - 2011 - Learning full body push recovery control for small.pdf:application/pdf}, + journal = {arXiv:2006.05990 [cs, stat]}, + author = {Andrychowicz, Marcin and Raichuk, Anton and Stańczyk, Piotr and Orsini, Manu and Girgin, Sertan and Marinier, Raphael and Hussenot, Léonard and Geist, Matthieu and Pietquin, Olivier and Michalski, Marcin and Gelly, Sylvain and Bachem, Olivier}, + month = jun, + year = {2020}, + note = {arXiv: 2006.05990}, + keywords = {Computer Science - Machine Learning, Statistics - Machine Learning}, + file = {Andrychowicz et al_2020_What Matters In On-Policy Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/WZMAKPPK/Andrychowicz et al_2020_What Matters In On-Policy Reinforcement Learning.pdf:application/pdf;arXiv.org Snapshot:/home/dferigo/Zotero/storage/U5JYD637/2006.html:text/html}, } -@article{bellegarda_robust_2021, - title = {Robust {High}-speed {Running} for {Quadruped} {Robots} via {Deep} {Reinforcement} {Learning}}, - url = {http://arxiv.org/abs/2103.06484}, - abstract = {Deep reinforcement learning has emerged as a popular and powerful way to develop locomotion controllers for quadruped robots. Common approaches have largely focused on learning actions directly in joint space, or learning to modify and offset foot positions produced by trajectory generators. Both approaches typically require careful reward shaping and training for millions of time steps, and with trajectory generators introduce human bias into the resulting control policies. In this paper, we instead explore learning foot positions in Cartesian space, which we track with impedance control, for a task of running as fast as possible subject to environmental disturbances. Compared with other action spaces, we observe less needed reward shaping, much improved sample efficiency, the emergence of natural gaits such as galloping and bounding, and ease of sim-to-sim transfer. 
Policies can be learned in only a few million time steps, even for challenging tasks of running over rough terrain with loads of over 100\% of the nominal quadruped mass. Training occurs in PyBullet, and we perform a sim-to-sim transfer to Gazebo, where our quadruped is able to run at over 4 m/s without a load, and 3.5 m/s with a 10 kg load, which is over 83\% of the nominal quadruped mass. Video results can be found at https://youtu.be/roE1vxpEWfw.}, +@article{korber_comparing_2021, + title = {Comparing {Popular} {Simulation} {Environments} in the {Scope} of {Robotics} and {Reinforcement} {Learning}}, + url = {http://arxiv.org/abs/2103.04616}, + abstract = {This letter compares the performance of four different, popular simulation environments for robotics and reinforcement learning (RL) through a series of benchmarks. The benchmarked scenarios are designed carefully with current industrial applications in mind. Given the need to run simulations as fast as possible to reduce the real-world training time of the RL agents, the comparison includes not only different simulation environments but also different hardware configurations, ranging from an entry-level notebook up to a dual CPU high performance server. We show that the chosen simulation environments benefit the most from single core performance. Yet, using a multi core system, multiple simulations could be run in parallel to increase the performance.}, language = {en}, - urldate = {2021-05-27}, - journal = {arXiv:2103.06484 [cs, eess]}, - author = {Bellegarda, Guillaume and Nguyen, Quan}, + urldate = {2022-04-20}, + journal = {arXiv:2103.04616 [cs]}, + author = {Körber, Marian and Lange, Johann and Rediske, Stephan and Steinmann, Simon and Glück, Roland}, month = mar, year = {2021}, - note = {sim-to-sim}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control, locomotion, sim-to-sim, quadrupeds, rl, pybullet, gazebo, simulation}, - file = {Bellegarda_Nguyen_2021_Robust High-speed Running for Quadruped Robots via Deep Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/5NMBKYDC/Bellegarda_Nguyen_2021_Robust High-speed Running for Quadruped Robots via Deep Reinforcement Learning.pdf:application/pdf}, + note = {arXiv: 2103.04616}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence}, + file = {Körber et al_2021_Comparing Popular Simulation Environments in the Scope of Robotics and.pdf:/home/dferigo/Zotero/storage/LWVBUMDD/Körber et al_2021_Comparing Popular Simulation Environments in the Scope of Robotics and.pdf:application/pdf}, } -@inproceedings{gronauer_successful_2021, - address = {Montreal, Canada}, - title = {The {Successful} {Ingredients} of {Policy} {Gradient} {Algorithms}}, - isbn = {978-0-9992411-9-6}, - url = {https://www.ijcai.org/proceedings/2021/338}, - doi = {10.24963/ijcai.2021/338}, - abstract = {Despite the sublime success in recent years, the underlying mechanisms powering the advances of reinforcement learning are yet poorly understood. In this paper, we identify these mechanisms - which we call ingredients - in on-policy policy gradient methods and empirically determine their impact on the learning. To allow an equitable assessment, we conduct our experiments based on a unified and modular implementation. 
Our results underline the significance of recent algorithmic advances and demonstrate that reaching state-of-the-art performance may not need sophisticated algorithms but can also be accomplished by the combination of a few simple ingredients.}, - language = {en}, - urldate = {2021-10-14}, - booktitle = {Proceedings of the {Thirtieth} {International} {Joint} {Conference} on {Artificial} {Intelligence}}, - publisher = {International Joint Conferences on Artificial Intelligence Organization}, - author = {Gronauer, Sven and Gottwald, Martin and Diepold, Klaus}, - month = aug, +@inproceedings{kim_survey_2021, + title = {A {Survey} on {Simulation} {Environments} for {Reinforcement} {Learning}}, + doi = {10.1109/UR52253.2021.9494694}, + abstract = {Most of the recent studies of reinforcement learning and robotics basically employ computer simulation due to the advantages of time and cost. For this reason, users have to spare time for investigation in order to choose optimal environment for their purposes. This paper presents a survey result that can be a guidance in user’s choice for simulation environments. The investigation result includes features, brief historical backgrounds, license policies and formats for robot and object description of the eight most popular environments in robot RL studies. We also propose a quantitative evaluation method for those simulation environments considering the features and a pragmatic point of view.}, + booktitle = {2021 18th {International} {Conference} on {Ubiquitous} {Robots} ({UR})}, + author = {Kim, Taewoo and Jang, Minsu and Kim, Jaehong}, + month = jul, year = {2021}, - pages = {2455--2461}, - file = {Gronauer et al_2021_The Successful Ingredients of Policy Gradient Algorithms.pdf:/home/dferigo/Zotero/storage/B9A5Y9UV/Gronauer et al_2021_The Successful Ingredients of Policy Gradient Algorithms.pdf:application/pdf}, + note = {ISSN: 2325-033X}, + keywords = {Analytical models, Computer simulation, Lead, Licenses, Reinforcement learning, Rendering (computer graphics), Software}, + pages = {63--67}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/K5USS9RE/9494694.html:text/html;Kim et al_2021_A Survey on Simulation Environments for Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/2HCEHSZB/Kim et al_2021_A Survey on Simulation Environments for Reinforcement Learning.pdf:application/pdf}, } -@article{acosta_validating_2021, - title = {Validating {Robotics} {Simulators} on {Real} {World} {Impacts}}, - url = {http://arxiv.org/abs/2110.00541}, - abstract = {A realistic simulation environment is an essential tool in every roboticist's toolkit, with uses ranging from planning and control to training policies with reinforcement learning. Despite the centrality of simulation in modern robotics, little work has been done to compare the performance of robotics simulators against real-world data, especially for scenarios involving dynamic motions with high speed impact events. Handling dynamic contact is the computational bottleneck for most simulations, and thus the modeling and algorithmic choices surrounding impacts and friction form the largest distinctions between popular tools. Here, we evaluate the ability of several simulators to reproduce real-world trajectories involving impacts. Using experimental data, we identify system-specific contact parameters of popular simulators Drake, MuJoCo, and Bullet, analyzing the effects of modeling choices around these parameters. 
For the simple example of a cube tossed onto a table, simulators capture inelastic impacts well while failing to capture elastic impacts. For the higher-dimensional case of a Cassie biped landing from a jump, the simulators capture the bulk motion well but the accuracy is limited by numerous model differences between the real robot and the simulators.}, - urldate = {2021-10-15}, - journal = {arXiv:2110.00541 [cs]}, - author = {Acosta, Brian and Yang, William and Posa, Michael}, - month = oct, - year = {2021}, - note = {arXiv: 2110.00541}, - keywords = {Computer Science - Robotics}, - file = {Acosta et al_2021_Validating Robotics Simulators on Real World Impacts.pdf:/home/dferigo/Zotero/storage/D4WET8GW/Acosta et al_2021_Validating Robotics Simulators on Real World Impacts.pdf:application/pdf;arXiv.org Snapshot:/home/dferigo/Zotero/storage/LU98NNTF/2110.html:text/html}, +@inproceedings{belbute-peres_end--end_2018, + title = {End-to-{End} {Differentiable} {Physics} for {Learning} and {Control}}, + language = {en}, + author = {Belbute-Peres, Filipe de A and Smith, Kevin A and Allen, Kelsey R and Tenenbaum, Joshua B and Kolter, J Zico}, + year = {2018}, + pages = {12}, + file = {Belbute-Peres et al_2018_End-to-End Differentiable Physics for Learning and Control.pdf:/home/dferigo/Zotero/storage/YX2WUZLM/Belbute-Peres et al_2018_End-to-End Differentiable Physics for Learning and Control.pdf:application/pdf}, } -@article{smith_legged_2021, - title = {Legged {Robots} that {Keep} on {Learning}: {Fine}-{Tuning} {Locomotion} {Policies} in the {Real} {World}}, - shorttitle = {Legged {Robots} that {Keep} on {Learning}}, - url = {http://arxiv.org/abs/2110.05457}, - abstract = {Legged robots are physically capable of traversing a wide range of challenging environments, but designing controllers that are sufficiently robust to handle this diversity has been a long-standing challenge in robotics. Reinforcement learning presents an appealing approach for automating the controller design process and has been able to produce remarkably robust controllers when trained in a suitable range of environments. However, it is difficult to predict all likely conditions the robot will encounter during deployment and enumerate them at training-time. What if instead of training controllers that are robust enough to handle any eventuality, we enable the robot to continually learn in any setting it finds itself in? This kind of real-world reinforcement learning poses a number of challenges, including efficiency, safety, and autonomy. To address these challenges, we propose a practical robot reinforcement learning system for fine-tuning locomotion policies in the real world. We demonstrate that a modest amount of real-world training can substantially improve performance during deployment, and this enables a real A1 quadrupedal robot to autonomously fine-tune multiple locomotion skills in a range of environments, including an outdoor lawn and a variety of indoor terrains.}, - urldate = {2021-10-18}, - journal = {arXiv:2110.05457 [cs]}, - author = {Smith, Laura and Kew, J. 
A derivation of the natural policy gradient is provided, proving that the average policy gradient of Kakade [10] is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. This algorithm converges to the nearest local minimum of the cost function with respect to the Fisher information metric under suitable conditions. The algorithm outperforms non-natural policy gradients by far in a cart-pole balancing evaluation, and for learning nonlinear dynamic motor primitives for humanoid robot control. It offers a promising route for the development of reinforcement learning for truly high-dimensionally continuous state-action systems.}, + language = {en}, + booktitle = {Proceedings of the third {IEEE}-{RAS} international conference on humanoid robots}, + author = {Peters, Jan and Vijayakumar, Sethu and Schaal, Stefan}, + year = {2003}, + pages = {20}, + file = {Peters et al_Reinforcement Learning for Humanoid Robotics.pdf:/home/dferigo/Zotero/storage/WVS83I5T/Peters et al_Reinforcement Learning for Humanoid Robotics.pdf:application/pdf}, } -@article{heiden_neuralsim_2021, - title = {{NeuralSim}: {Augmenting} {Differentiable} {Simulators} with {Neural} {Networks}}, - url = {http://arxiv.org/abs/2011.04217}, - abstract = {Differentiable simulators provide an avenue for closing the sim-to-real gap by enabling the use of efficient, gradient-based optimization algorithms to find the simulation parameters that best fit the observed sensor readings. Nonetheless, these analytical models can only predict the dynamical behavior of systems for which they have been designed. In this work, we study the augmentation of a novel differentiable rigid-body physics engine via neural networks that is able to learn nonlinear relationships between dynamic quantities and can thus learn effects not accounted for in traditional simulators.Such augmentations require less data to train and generalize better compared to entirely data-driven models. Through extensive experiments, we demonstrate the ability of our hybrid simulator to learn complex dynamics involving frictional contacts from real data, as well as match known models of viscous friction, and present an approach for automatically discovering useful augmentations. We show that, besides benefiting dynamics modeling, inserting neural networks can accelerate model-based control architectures. We observe a ten-fold speed-up when replacing the QP solver inside a model-predictive gait controller for quadruped robots with a neural network, allowing us to significantly improve control delays as we demonstrate in real-hardware experiments. 
We publish code, additional results and videos from our experiments on our project webpage at https://sites.google.com/usc.edu/neuralsim.},
- urldate = {2022-03-01},
- journal = {arXiv:2011.04217 [cs]},
- author = {Heiden, Eric and Millard, David and Coumans, Erwin and Sheng, Yizhou and Sukhatme, Gaurav S.},
- month = may,
- year = {2021},
- note = {arXiv: 2011.04217},
- keywords = {Computer Science - Robotics},
- file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/M9F3TAP8/2011.html:text/html;Heiden et al_2021_NeuralSim.pdf:/home/dferigo/Zotero/storage/BZ9QUVGR/Heiden et al_2021_NeuralSim.pdf:application/pdf},
+@article{gullapalli_acquiring_1994,
+ title = {Acquiring robot skills via reinforcement learning},
+ volume = {14},
+ issn = {1941-000X},
+ doi = {10.1109/37.257890},
+ abstract = {Skill acquisition is a difficult, yet important problem in robot performance. The authors focus on two skills, namely robotic assembly and balancing and on two classic tasks to develop these skills via learning: the peg-in hole insertion task, and the ball balancing task. A stochastic real-valued (SRV) reinforcement learning algorithm is described and used for learning control and the authors show how it can be used with nonlinear multilayer ANNs. In the peg-in-hole insertion task the SRV network successfully learns to insert a peg into a hole with extremely low clearance, in spite of high sensor noise. In the ball balancing task the SRV network successfully learns to balance the ball with minimal feedback.},
+ number = {1},
+ journal = {IEEE Control Systems Magazine},
+ author = {Gullapalli, V. and Franklin, J.A. and Benbrahim, H.},
+ month = feb,
+ year = {1994},
+ note = {Conference Name: IEEE Control Systems Magazine},
+ keywords = {Adaptive control, Control design, Control systems, Delay, Feedback, Robot control, Robotic assembly, Robust control, Supervised learning, Uncertainty},
+ pages = {13--24},
+ file = {Gullapalli et al_1994_Acquiring robot skills via reinforcement learning.pdf:/home/dferigo/Zotero/storage/D239HJKA/Gullapalli et al_1994_Acquiring robot skills via reinforcement learning.pdf:application/pdf},
}

-@article{sola_quaternion_2017,
- title = {Quaternion kinematics for the error-state {Kalman} filter},
- language = {en},
- author = {Sola, Joan},
- year = {2017},
- pages = {93},
+@inproceedings{kober_policy_2008,
+ title = {Policy {Search} for {Motor} {Primitives} in {Robotics}},
+ volume = {21},
+ url = {https://proceedings.neurips.cc/paper/2008/hash/7647966b7343c29048673252e490f736-Abstract.html},
+ abstract = {Many motor skills in humanoid robotics can be learned using parametrized motor primitives as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate reward case to episodic reinforcement learning. We show that this results into a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning by assuming a form of exploration that is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable in complex motor learning tasks. We compare this algorithm to alternative parametrized policy search methods and show that it outperforms previous methods. 
We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task using a real Barrett WAM robot arm.}, + urldate = {2022-04-15}, + booktitle = {Advances in {Neural} {Information} {Processing} {Systems}}, + publisher = {Curran Associates, Inc.}, + author = {Kober, Jens and Peters, Jan}, + year = {2008}, + file = {Kober_Peters_2008_Policy Search for Motor Primitives in Robotics.pdf:/home/dferigo/Zotero/storage/BFS6BPCJ/Kober_Peters_2008_Policy Search for Motor Primitives in Robotics.pdf:application/pdf}, } -@inproceedings{gros_baumgarte_2015, - address = {Osaka}, - title = {Baumgarte stabilisation over the {SO}(3) rotation group for control}, - isbn = {978-1-4799-7886-1}, - url = {http://ieeexplore.ieee.org/document/7402298/}, - doi = {10.1109/CDC.2015.7402298}, - abstract = {Representations of the SO(3) rotation group are crucial for airborne and aerospace applications. Euler angles is a popular representation in many applications, but yield models having singular dynamics. This issue is addressed via non-singular representations, operating in dimensions higher than 3. Unit quaternions and the Direction Cosine Matrix are the best known non-singular representations, and favoured in challenging aeronautic and aerospace applications. All nonsingular representations yield invariants in the model dynamics, i.e. a set of nonlinear algebraic conditions that must be fulfilled by the model initial conditions, and that remain fulfilled over time. However, due to numerical integration errors, these conditions tend to become violated when using standard integrators, making the model inconsistent with the physical reality. This issue poses some challenges when non-singular representations are deployed in optimal control. In this paper, we propose a simple technique to address the issue for classical integration schemes, establish formally its properties, and illustrate it on the optimal control of a satellite.}, +@inproceedings{honglak_lee_quadruped_2006, + address = {Orlando, FL, USA}, + title = {Quadruped robot obstacle negotiation via reinforcement learning}, + isbn = {978-0-7803-9505-3}, + url = {http://ieeexplore.ieee.org/document/1642158/}, + doi = {10.1109/ROBOT.2006.1642158}, + abstract = {Legged robots can, in principle, traverse a large variety of obstacles and terrains. In this paper, we describe a successful application of reinforcement learning to the problem of negotiating obstacles with a quadruped robot. Our algorithm is based on a two-level hierarchical decomposition of the task, in which the high-level controller selects the sequence of footplacement positions, and the low-level controller generates the continuous motions to move each foot to the specified positions. The high-level controller uses an estimate of the value function to guide its search; this estimate is learned partially from supervised data. The low-level controller is obtained via policy search. We demonstrate that our robot can successfully climb over a variety of obstacles which were not seen at training time.}, language = {en}, - urldate = {2022-03-02}, - booktitle = {2015 54th {IEEE} {Conference} on {Decision} and {Control} ({CDC})}, + urldate = {2022-04-15}, + booktitle = {Proceedings 2006 {IEEE} {International} {Conference} on {Robotics} and {Automation}, 2006. 
{ICRA} 2006.}, publisher = {IEEE}, - author = {Gros, Sebastien and Zanon, Marion and Diehl, Moritz}, - month = dec, - year = {2015}, - pages = {620--625}, + author = {{Honglak Lee} and {Yirong Shen} and {Chih-Han Yu} and Singh, G. and Ng, A.Y.}, + year = {2006}, + pages = {3003--3010}, + file = {Honglak Lee et al_2006_Quadruped robot obstacle negotiation via reinforcement learning.pdf:/home/dferigo/Zotero/storage/LE662D9H/Honglak Lee et al_2006_Quadruped robot obstacle negotiation via reinforcement learning.pdf:application/pdf}, } -@article{gilardi_literature_2002, - title = {Literature survey of contact dynamics modelling}, - volume = {37}, - issn = {0094114X}, - url = {https://linkinghub.elsevier.com/retrieve/pii/S0094114X02000459}, - doi = {10.1016/S0094-114X(02)00045-9}, - abstract = {Impact is a complex phenomenon that occurs when two or more bodies undergo a collision. This phenomenon is important in many different areas––machine design, robotics, multi-body analysis are just a few examples. The purpose of this manuscript is to provide an overview of the state of the art on impact and contact modelling methodologies, taking into account their different aspects, specifically, the energy loss, the influence of the friction model, solution approaches, the multi-contact problem and the experimental verification. The paper is intended to provide a review of results presented in literature and some additional insights into existing models, their interrelationship and the use of these models for impact/contact scenarios encountered in space robotic applications.}, +@inproceedings{kohl_machine_2004, + title = {Machine {Learning} for {Fast} {Quadrupedal} {Locomotion}}, + abstract = {For a robot, the ability to get from one place to another is one of the most basic skills. However, locomotion on legged robots is a challenging multidimensional control problem. This paper presents a machine learning approach to legged locomotion, with all training done on the physical robots. The main contributions are a specification of our fully automated learning environment and a detailed empirical comparison of four different machine learning algorithms for learning quadrupedal locomotion. The resulting learned walk is considerably faster than all previously reported hand-coded walks for the same robot platform.}, language = {en}, - number = {10}, - urldate = {2022-03-02}, - journal = {Mechanism and Machine Theory}, - author = {Gilardi, G. and Sharf, I.}, - month = oct, - year = {2002}, - pages = {1213--1239}, - file = {Gilardi and Sharf - 2002 - Literature survey of contact dynamics modelling.pdf:/home/dferigo/Zotero/storage/R9VJE4P2/Gilardi and Sharf - 2002 - Literature survey of contact dynamics modelling.pdf:application/pdf}, + author = {Kohl, Nate and Stone, Peter}, + year = {2004}, + pages = {6}, + file = {Kohl_Stone_2004_Machine Learning for Fast Quadrupedal Locomotion.pdf:/home/dferigo/Zotero/storage/ZF4WFM4N/Kohl_Stone_2004_Machine Learning for Fast Quadrupedal Locomotion.pdf:application/pdf}, } -@inproceedings{azad_model_2016, - title = {Model estimation and control of compliant contact normal force}, - doi = {10.1109/HUMANOIDS.2016.7803313}, - abstract = {This paper proposes a method to realize desired contact normal forces between humanoids and their compliant environment. By using contact models, desired contact forces are converted to desired deformations of compliant surfaces. To achieve desired forces, deformations are controlled by controlling the contact point positions. 
Parameters of contact models are assumed to be known or estimated using the approach described in this paper. The proposed methods for estimating the contact parameters and controlling the contact normal force are implemented on a LWR KUKA IV arm. To verify both methods, experiments are performed with the KUKA arm while its end-effector is in contact with two different soft objects.}, - booktitle = {2016 {IEEE}-{RAS} 16th {International} {Conference} on {Humanoid} {Robots} ({Humanoids})}, - author = {Azad, Morteza and Ortenzi, Valerio and Lin, Hsiu-Chin and Rueckert, Elmar and Mistry, Michael}, - month = nov, - year = {2016}, - note = {ISSN: 2164-0580}, - keywords = {Motion control, Dynamics, Humanoid robots, Acceleration, Force, Estimation}, - pages = {442--447}, +@article{zico_kolter_stanford_2011, + title = {The {Stanford} {LittleDog}: {A} learning and rapid replanning approach to quadruped locomotion}, + volume = {30}, + issn = {0278-3649, 1741-3176}, + shorttitle = {The {Stanford} {LittleDog}}, + url = {http://journals.sagepub.com/doi/10.1177/0278364910390537}, + doi = {10.1177/0278364910390537}, + abstract = {Legged robots offer the potential to navigate a wide variety of terrains that are inaccessible to wheeled vehicles. In this paper we consider the planning and control tasks of navigating a quadruped robot over a wide variety of challenging terrain, including terrain which it has not seen until run-time. We present a software architecture that makes use of both static and dynamic gaits, as well as specialized dynamic maneuvers, to accomplish this task. Throughout the paper we highlight two themes that have been central to our approach: 1) the prevalent use of learning algorithms, and 2) a focus on rapid recovery and replanning techniques; we present several novel methods and algorithms that we developed for the quadruped and that illustrate these two themes. We evaluate the performance of these different methods, and also present and discuss the performance of our system on the official Learning Locomotion tests.}, + language = {en}, + number = {2}, + urldate = {2022-04-15}, + journal = {The International Journal of Robotics Research}, + author = {Zico Kolter, J. and Ng, Andrew Y}, + month = feb, + year = {2011}, + pages = {150--174}, + file = {Zico Kolter_Ng_2011_The Stanford LittleDog.pdf:/home/dferigo/Zotero/storage/3SSTHW7F/Zico Kolter_Ng_2011_The Stanford LittleDog.pdf:application/pdf}, } -@article{azad_new_2014, - title = {A {New} {Nonlinear} {Model} of {Contact} {Normal} {Force}}, - volume = {30}, - issn = {1941-0468}, - doi = {10.1109/TRO.2013.2293833}, - abstract = {This paper presents a new nonlinear model of the normal force that arises during compliant contact between two spheres, or between a sphere and a flat plate. It differs from a well-known existing model by only a single term. 
The advantage of the new model is that it accurately predicts the measured values of the coefficient of restitution between spheres and plates of various materials, whereas other models do not.}, - number = {3}, - journal = {IEEE Transactions on Robotics}, - author = {Azad, Morteza and Featherstone, Roy}, +@article{schaal_is_1999, + title = {Is imitation learning the route to humanoid robots?}, + volume = {3}, + issn = {13646613}, + url = {https://linkinghub.elsevier.com/retrieve/pii/S1364661399013273}, + doi = {10.1016/S1364-6613(99)01327-3}, + abstract = {This review investigates two recent developments in artificial intelligence and neural computation: learning from imitation and the development of humanoid robots. It will be postulated that the study of imitation learning offers a promising route to gain new insights into mechanisms of perceptual motor control that could ultimately lead to the creation of autonomous humanoid robots. Imitation learning focuses on three important issues: efficient motor learning, the connection between action and perception, and modular motor control in form of movement primitives. It will be reviewed how research on representations of, and functional connections between action and perception have contributed to our understanding of motor acts of other beings. The recent discovery that some areas in the primate brain are active during both movement perception and execution has provided a hypothetical neural basis of imitation. Computational approaches to imitation learning will also be described, initially from the perspective of traditional AI and robotics, but also from the perspective of neural network models and statistical learning research. Parallels and differences between biological and computational approaches to imitation will be highlighted and an overview of current projects that actually employ imitation learning for humanoid robots will be given.}, + language = {en}, + number = {6}, + urldate = {2022-04-15}, + journal = {Trends in Cognitive Sciences}, + author = {Schaal, Stefan}, month = jun, - year = {2014}, - note = {Conference Name: IEEE Transactions on Robotics}, - keywords = {Data models, Mathematical model, Robots, Computational modeling, Predictive models, Force, Animation and simulation, compliant contact model, Materials, nonlinear damping, normal force}, - pages = {736--739}, + year = {1999}, + pages = {233--242}, + file = {Schaal_1999_Is imitation learning the route to humanoid robots.pdf:/home/dferigo/Zotero/storage/IN2PXD9V/Schaal_1999_Is imitation learning the route to humanoid robots.pdf:application/pdf}, } -@inproceedings{azad_modeling_2010, - title = {Modeling the contact between a rolling sphere and a compliant ground plane}, - copyright = {http://legaloffice.weblogs.anu.edu.au/content/copyright/}, - isbn = {978-0-9807404-1-7}, - url = {https://openresearch-repository.anu.edu.au/handle/1885/62522}, - abstract = {In this paper a complete 3D contact model is presented. This model has nonlinearity in determining both normal and friction forces. A new nonlinear normal force model is introduced and the differences between this new model and previous ones are described. Also, the characteristics of this model are discussed and compared with the classical models and empirical results. For calculating the friction force, a new nonlinear model which is able to calculate pre-sliding displacement and viscous friction is presented. 
It is shown that this model allows us to keep track of all the energy in the system and therefore, supports an energy audit. Also, the rolling motion of a sphere on a compliant ground plane is simulated, and the results are presented. Rolling motion is an integral part of simulating the general 3D motion of a robot's foot while in contact with the ground.},
- language = {en},
- urldate = {2022-03-03},
- booktitle = {Proceedings of {ACRA} 2010},
- publisher = {Australian Robotics and Automation Association},
- author = {Azad, Morteza and Featherstone, Roy},
- year = {2010},
- note = {Accepted: 2015-12-10T23:05:49Z
-Last Modified: 2020-05-19},
+@inproceedings{schaal_learning_1996,
+ title = {Learning from {Demonstration}},
+ volume = {9},
+ url = {https://proceedings.neurips.cc/paper/1996/hash/68d13cf26c4b4f4f932e3eff990093ba-Abstract.html},
+ abstract = {By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies how to approach a learning problem from instructions and/or demonstrations of other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speed-up after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30 second long demonstration of the human instructor.},
+ urldate = {2022-04-15},
+ booktitle = {Advances in {Neural} {Information} {Processing} {Systems}},
+ publisher = {MIT Press},
+ author = {Schaal, Stefan},
+ year = {1996},
}

-@misc{bradbury_james_jax_2018,
- title = {{JAX}: composable transformations of {Python}+{NumPy} programs},
- copyright = {Apache-2.0},
- shorttitle = {{JAX}},
- url = {https://github.com/google/jax},
- urldate = {2022-03-07},
- publisher = {Google},
- author = {{Bradbury, James} and {Frostig, Roy} and {Hawkins, Peter} and {Johnson, Matthew James} and {Leary, Chris} and {Maclaurin, Dougal} and {Necula, George} and {Paszke, Adam} and {VanderPlas, James} and {Wanderman-Milne, Skye} and {Zhang, Qiao}},
- year = {2018},
- note = {original-date: 2018-10-25T21:25:02Z},
- keywords = {jax},
+@incollection{schaal_dynamic_2006,
+ address = {Tokyo},
+ title = {Dynamic {Movement} {Primitives} -{A} {Framework} for {Motor} {Control} in {Humans} and {Humanoid} {Robotics}},
+ isbn = {978-4-431-24164-5},
+ url = {http://link.springer.com/10.1007/4-431-31381-8_23},
+ abstract = {Given the continuous stream of movements that biological systems exhibit in their daily activities, an account for such versatility and creativity has to assume that movement sequences consist of segments, executed either in sequence or with partial or complete overlap. 
Therefore, a fundamental question that has pervaded research in motor control both in artificial and biological systems revolves around identifying movement primitives (a.k.a. units of actions, basis behaviors, motor schemas, etc.). What are the fundamental building blocks that are strung together, adapted to, and created for ever new behaviors? This paper summarizes results that led to the hypothesis of Dynamic Movement Primitives (DMP). DMPs are units of action that are formalized as stable nonlinear attractor systems. They are useful for autonomous robotics as they are highly flexible in creating complex rhythmic (e.g., locomotion) and discrete (e.g., a tennis swing) behaviors that can quickly be adapted to the inevitable perturbations of a dynamically changing, stochastic environment. Moreover, DMPs provide a formal framework that also lends itself to investigations in computational neuroscience. A recent finding that allows creating DMPs with the help of well-understood statistical learning methods has elevated DMPs from a more heuristic to a principled modeling approach. Theoretical insights, evaluations on a humanoid robot, and behavioral and brain imaging data will serve to outline the framework of DMPs for a general approach to motor control in robotics and biology.}, + language = {en}, + urldate = {2022-04-15}, + booktitle = {Adaptive {Motion} of {Animals} and {Machines}}, + publisher = {Springer-Verlag}, + author = {Schaal, Stefan}, + year = {2006}, + doi = {10.1007/4-431-31381-8_23}, + pages = {261--280}, + file = {Schaal_2006_Dynamic Movement Primitives -A Framework for Motor Control in Humans and.pdf:/home/dferigo/Zotero/storage/NGSBJLMR/Schaal_2006_Dynamic Movement Primitives -A Framework for Motor Control in Humans and.pdf:application/pdf}, } -@article{frostig_compiling_2018, - title = {Compiling machine learning programs via high-level tracing}, - abstract = {We describe JAX, a domain-specific tracing JIT compiler for generating high-performance accelerator code from pure Python and Numpy machine learning programs. JAX uses the XLA compiler infrastructure to generate optimized code for the program subroutines that are most favorable for acceleration, and these optimized subroutines can be called and orchestrated by arbitrary Python. Because the system is fully compatible with Autograd, it allows forward- and reverse-mode automatic differentiation of Python functions to arbitrary order. Because JAX supports structured control flow, it can generate code for sophisticated machine learning algorithms while maintaining high performance. We show that by combining JAX with Autograd and Numpy we get an easily programmable and highly performant ML system that targets CPUs, GPUs, and TPUs, capable of scaling to multi-core Cloud TPUs.}, +@inproceedings{peters_policy_2006, + address = {Beijing, China}, + title = {Policy {Gradient} {Methods} for {Robotics}}, + isbn = {978-1-4244-0258-8 978-1-4244-0259-5}, + url = {http://ieeexplore.ieee.org/document/4058714/}, + doi = {10.1109/IROS.2006.282564}, + abstract = {The aquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots should ever leave precisely pre-structured environments. However, to date only few existing reinforcement learning methods have been scaled into the domains of highdimensional robots such as manipulator, legged or humanoid robots. Policy gradient methods remain one of the few exceptions and have found a variety of applications. 
Nevertheless, the application of such methods is not without peril if done in an uninformed manner. In this paper, we give an overview on learning with policy gradient methods for robotics with a strong focus on recent advances in the field. We outline previous applications to robotics and show how the most recently developed methods can significantly improve learning performance. Finally, we evaluate our most promising algorithm in the application of hitting a baseball with an anthropomorphic arm.}, language = {en}, - author = {Frostig, Roy and Johnson, Matthew James and Leary, Chris}, - year = {2018}, - pages = {3}, - file = {Frostig et al. - Compiling machine learning programs via high-level.pdf:/home/dferigo/Zotero/storage/R9ZJ9L5K/Frostig et al. - Compiling machine learning programs via high-level.pdf:application/pdf}, + urldate = {2022-04-15}, + booktitle = {2006 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems}}, + publisher = {IEEE}, + author = {Peters, Jan and Schaal, Stefan}, + month = oct, + year = {2006}, + pages = {2219--2225}, + file = {Peters_Schaal_2006_Policy Gradient Methods for Robotics.pdf:/home/dferigo/Zotero/storage/MAL8FFI9/Peters_Schaal_2006_Policy Gradient Methods for Robotics.pdf:application/pdf}, } -@book{featherstone_rigid_2008, - title = {Rigid body dynamics algorithms}, - isbn = {978-0-387-74314-1 978-0-387-74315-8}, - abstract = {"Rigid Body Dynamics presents the subject of computational rigid-body dynamics through the medium of spatial 6D vector notation. It explains how to model a rigid-body system and how to analyze it, and it presents the most comprehensive collection of the best rigid-body dynamics algorithms to be found in a single source. The use of spatial vector notation greatly reduces the volume of algebra which allows systems to be described using fewer equations and fewer quantities. It also allows problems to be solved in fewer steps, and solutions to be expressed more succinctly. In addition algorithms are explained simply and clearly, and are expressed in a compact form. The use of spatial vector notation facilitates the implementation of dynamics algorithms on a computer: shorter, simpler code that is easier to write, understand and debug, with no loss of efficiency." "Rigid Body Dynamics Algorithms is aimed at readers who already have some elementary knowledge of rigid-body dynamics, and are interested in calculating the dynamics of a rigid-body system. 
This book serves as an algorithms recipe book as well as a guide to the analysis and deeper understanding of rigid-body systems."--BOOK JACKET}, +@article{benbrahim_biped_1997, + title = {Biped dynamic walking using reinforcement learning}, + volume = {22}, + issn = {09218890}, + url = {https://linkinghub.elsevier.com/retrieve/pii/S0921889097000432}, + doi = {10.1016/S0921-8890(97)00043-2}, language = {en}, - publisher = {Springer}, - author = {Featherstone, Roy}, - year = {2008}, - note = {OCLC: ocn190774140}, - keywords = {Robots, Dynamics, Dynamics, Rigid, Recursive functions}, + number = {3-4}, + urldate = {2022-04-15}, + journal = {Robotics and Autonomous Systems}, + author = {Benbrahim, Hamid and Franklin, Judy A.}, + month = dec, + year = {1997}, + pages = {283--302}, + file = {Benbrahim_Franklin_1997_Biped dynamic walking using reinforcement learning.pdf:/home/dferigo/Zotero/storage/2JYIXAGY/Benbrahim_Franklin_1997_Biped dynamic walking using reinforcement learning.pdf:application/pdf}, } -@phdthesis{traversaro_modelling_2017, - title = {Modelling, {Estimation} and {Identification} of {Humanoid} {Robots} {Dynamics}}, - url = {https://github.com/traversaro/traversaro-phd-thesis}, - urldate = {2020-01-05}, - author = {Traversaro, Silvio}, - year = {2017}, -} - -@article{brockman_openai_2016, - title = {{OpenAI} {Gym}}, - url = {http://arxiv.org/abs/1606.01540}, - abstract = {OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.}, - urldate = {2019-05-22}, - journal = {arXiv:1606.01540 [cs]}, - author = {Brockman, Greg and Cheung, Vicki and Pettersson, Ludwig and Schneider, Jonas and Schulman, John and Tang, Jie and Zaremba, Wojciech}, - month = jun, - year = {2016}, - note = {arXiv: 1606.01540}, - keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/MPYQGNAJ/1606.html:text/html;Brockman et al_2016_OpenAI Gym.pdf:/home/dferigo/Zotero/storage/Q2ANSE79/Brockman et al_2016_OpenAI Gym.pdf:application/pdf}, +@phdthesis{watkins_christopher_learning_1989, + title = {Learning from {Delayed} {Rewards}}, + school = {King's College}, + author = {Watkins, Christopher}, + year = {1989}, + file = {Watkins, Christopher John Cornish Hellaby_Learning from Delayed Rewards.pdf:/home/dferigo/Zotero/storage/BV9UA3K9/Watkins, Christopher John Cornish Hellaby_Learning from Delayed Rewards.pdf:application/pdf}, } -@article{dulac-arnoldChallengesRealWorldReinforcement2019s, - title = {Challenges of {Real}-{World} {Reinforcement} {Learning}}, - abstract = {Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are often hard to leverage in realworld systems due to a series of assumptions that are rarely satisfied in practice. We present a set of nine unique challenges that must be addressed to productionize RL to real world problems. For each of these challenges, we specify the exact meaning of the challenge, present some approaches from the literature, and specify some metrics for evaluating that challenge. 
An approach that addresses all nine challenges would be applicable to a large number of real world problems. We also present an example domain that has been modified to present these challenges as a testbed for practical RL research.}, +@article{hinton_fast_2006, + title = {A {Fast} {Learning} {Algorithm} for {Deep} {Belief} {Nets}}, + volume = {18}, + issn = {0899-7667, 1530-888X}, + url = {https://direct.mit.edu/neco/article/18/7/1527-1554/7065}, + doi = {10.1162/neco.2006.18.7.1527}, + abstract = {We show how to use “complementary priors” to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.}, language = {en}, - urldate = {2019-05-22}, - author = {Dulac-Arnold, Gabriel and Mankowitz, Daniel and Hester, Todd}, - month = apr, - year = {2019}, - note = {arXiv: 1904.12901}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence, Statistics - Machine Learning}, - file = {Dulac-Arnold et al_2019_Challenges of Real-World Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/56EVTTDV/Dulac-Arnold et al_2019_Challenges of Real-World Reinforcement Learning.pdf:application/pdf}, -} - -@inproceedings{erezSimulationToolsModelbased2015s, - title = {Simulation tools for model-based robotics: {Comparison} of {Bullet}, {Havok}, {MuJoCo}, {ODE} and {PhysX}}, - shorttitle = {Simulation tools for model-based robotics}, - doi = {10.1109/ICRA.2015.7139807}, - abstract = {There is growing need for software tools that can accurately simulate the complex dynamics of modern robots. While a number of candidates exist, the field is fragmented. It is difficult to select the best tool for a given project, or to predict how much effort will be needed and what the ultimate simulation performance will be. Here we introduce new quantitative measures of simulation performance, focusing on the numerical challenges that are typical for robotics as opposed to multi-body dynamics and gaming. We then present extensive simulation results, obtained within a new software framework for instantiating the same model in multiple engines and running side-by-side comparisons. Overall we find that each engine performs best on the type of system it was designed and optimized for: MuJoCo wins the robotics-related tests, while the gaming engines win the gaming-related tests without a clear leader among them. The simulations are illustrated in the accompanying movie.}, - author = {Erez, T. and Tassa, Y. 
and Todorov, E.}, - year = {2015}, - keywords = {Mathematical model, middleware, Robot kinematics, Joints, Accuracy, Bullet, Computational modeling, control engineering computing, digital simulation, Engines, gaming engines, Havok, model-based robotics, MuJoCo, ODE, PhysX, robots, simulation tools}, - file = {Erez et al_2015_Simulation tools for model-based robotics.pdf:/home/dferigo/Zotero/storage/QIJZ3BFR/Erez et al_2015_Simulation tools for model-based robotics.pdf:application/pdf;IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/KH8DWX7R/7139807.html:text/html}, + number = {7}, + urldate = {2022-04-15}, + journal = {Neural Computation}, + author = {Hinton, Geoffrey E. and Osindero, Simon and Teh, Yee-Whye}, + month = jul, + year = {2006}, + pages = {1527--1554}, + file = {Hinton et al_2006_A Fast Learning Algorithm for Deep Belief Nets.pdf:/home/dferigo/Zotero/storage/IMEASWQY/Hinton et al_2006_A Fast Learning Algorithm for Deep Belief Nets.pdf:application/pdf}, } -@article{hwangbo_per-contact_2018, - title = {Per-{Contact} {Iteration} {Method} for {Solving} {Contact} {Dynamics}}, - volume = {3}, - issn = {2377-3766}, - doi = {10.1109/LRA.2018.2792536}, - abstract = {This letter introduces a new iterative method for contact dynamics problems. The proposed method is based on an efficient bisection method which iterates over each contact. We compared our approach to two existing ones for the same model and found that it is about twice as fast as the existing ones. We also introduce four different robotic simulation experiments and compare the proposed method to the most common contact solver, the projected Gauss-Seidel (PGS) method. We show that, while both methods are very efficient in solving simple problems, the proposed method significantly outperforms the PGS method in more complicated contact scenarios. Simulating one time step of an 18-DOF quadruped robot with multiple contacts took less than 20 μs with a single core of a CPU. This is at least an order of magnitude faster than many other simulators which employ multiple relaxation methods to the major dynamic principles in order to boost their computational speed. The proposed simulation method is also stable at 50 Hz due to its strict adherence to the dynamical principles. Although the accuracy might be compromised at such a low update rate, this means that we can simulate an 18-DOF robot more than thousand times faster than the real time.}, - number = {2}, - journal = {IEEE Robotics and Automation Letters}, - author = {Hwangbo, J. and Lee, J. 
and Hutter, M.}, - month = apr, - year = {2018}, - keywords = {iterative methods, Mathematical model, Robot kinematics, Friction, legged locomotion, Legged locomotion, legged robots, robot dynamics, 18-DOF quadruped robot, bisection method, collision avoidance, complicated contact scenarios, contact dynamics problems, contact modeling, contact solver, CPU, dynamical principles, frequency 50.0 Hz, Indexes, mechanical contact, multiple contacts, multiple relaxation methods, per-contact iteration method, PGS method, projected Gauss-Seidel method, robot kinematics, robotic simulation experiments, Simulation and animation, simulation method}, - pages = {895--902}, - file = {Hwangbo et al_2018_Per-Contact Iteration Method for Solving Contact Dynamics.pdf:/home/dferigo/Zotero/storage/EM7MBJHW/Hwangbo et al_2018_Per-Contact Iteration Method for Solving Contact Dynamics.pdf:application/pdf;IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/KYQ3U5SM/8255551.html:text/html}, +@techreport{rummery_-line_1994, + title = {On-{Line} {Q}-{Learning} {Using} {Connectionist} {Systems}}, + abstract = {Reinforcement learning algorithms are a powerful machine learning technique. However, much of the work on these algorithms has been developed with regard to discrete finite-state Markovian problems, which is too restrictive for many real-world environments. Therefore, it is desirable to extend these methods to high dimensional continuous state-spaces, which requires the use of function approximation to generalise the information learnt by the system. In this report, the use of back-propagation neural networks (Rumelhart, Hinton and Williams 1986) is considered in this context. We consider a number of different algorithms based around Q-Learning (Watkins 1989) combined with the Temporal Difference algorithm (Sutton 1988), including a new algorithm (Modified Connectionist Q-Learning), and Q() (Peng and Williams 1994). In addition, we present algorithms for applying these updates on-line during trials, unlike backward replay used by Lin (1993) that requires waiting until the end of each t...}, + author = {Rummery, G. A. and Niranjan, M.}, + year = {1994}, + file = {Citeseer - Snapshot:/home/dferigo/Zotero/storage/IBKMIYHG/summary.html:text/html;Rummery_Niranjan_1994_On-Line Q-Learning Using Connectionist Systems.pdf:/home/dferigo/Zotero/storage/RP9VWYGZ/Rummery_Niranjan_1994_On-Line Q-Learning Using Connectionist Systems.pdf:application/pdf}, } -@inproceedings{ivaldiToolsSimulatingHumanoid2014s, - title = {Tools for simulating humanoid robot dynamics: {A} survey based on user feedback}, - shorttitle = {Tools for simulating humanoid robot dynamics}, - doi = {10.1109/HUMANOIDS.2014.7041462}, - abstract = {The number of tools for dynamics simulation has grown substantially in the last few years. Humanoid robots, in particular, make extensive use of such tools for a variety of applications, from simulating contacts to planning complex motions. It is necessary for the humanoid robotics community to have a systematic evaluation to assist in choosing which of the available tools is best for their research. This paper surveys the state of the art in dynamics simulation and reports on the analysis of an online survey about the use of dynamics simulation in the robotics research community. The major requirements for robotics researchers are better physics engines and open-source software. 
Despite the numerous tools, there is not a general-purpose simulator which dominates the others in terms of performance or application. However, for humanoid robotics, Gazebo emerges as the best choice among the open-source projects, while V-Rep is the preferred commercial simulator. The survey report has been instrumental for choosing Gazebo as the base for the new simulator for the iCub humanoid robot.}, - booktitle = {{IEEE}-{RAS} {International} {Conference} on {Humanoid} {Robots}}, - author = {Ivaldi, S. and Peters, J. and Padois, V. and Nori, F.}, - month = nov, - year = {2014}, - keywords = {humanoid robots, Robots, open-source software, Open source software, iCub humanoid robot, control engineering computing, Engines, physics engine, Collision avoidance, Communities, dynamics simulation, Gazebo robotics, general-purpose simulator, humanoid robot dynamics, Physics, robotics research community, user feedback, V-Rep simulator}, - file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/3MNDLKGL/7041462.html:text/html;Ivaldi et al_2014_Tools for simulating humanoid robot dynamics.pdf:/home/dferigo/Zotero/storage/UHMX6A28/Ivaldi et al_2014_Tools for simulating humanoid robot dynamics.pdf:application/pdf}, +@article{williams_simple_1992, + title = {Simple statistical gradient-following algorithms for connectionist reinforcement learning}, + volume = {8}, + issn = {1573-0565}, + url = {https://doi.org/10.1007/BF00992696}, + doi = {10.1007/BF00992696}, + abstract = {This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. 
We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.},
+ language = {en},
+ number = {3},
+ urldate = {2022-04-15},
+ journal = {Machine Learning},
+ author = {Williams, Ronald J.},
+ month = may,
+ year = {1992},
+ pages = {229--256},
+ file = {Williams_1992_Simple statistical gradient-following algorithms for connectionist.pdf:/home/dferigo/Zotero/storage/TICDQE49/Williams_1992_Simple statistical gradient-following algorithms for connectionist.pdf:application/pdf},
}

-@inproceedings{koenig_design_2004,
- address = {Sendai, Japan},
- title = {Design and use paradigms for gazebo, an open-source multi-robot simulator},
+@article{sutton_learning_1988,
+ title = {Learning to predict by the methods of temporal differences},
 volume = {3},
- isbn = {978-0-7803-8463-7},
- url = {http://ieeexplore.ieee.org/document/1389727/},
- doi = {10.1109/IROS.2004.1389727},
- abstract = {Simulators have played a critical role in robotics research as tools for quick and efficient testing of new concepts, strategies, and algorithms. To date, most simulators have been restricted to 2D worlds, and few have matured to the point where they are both highly capable and easily adaptable. Gazebo is designed to fill this niche by creating a 3D dynamic multi-robot environment capable of recreating the complex worlds that will be encountered by the next generation of mobile robots. Its open source status, fine grained control, and high fidelity place Gazebo in a unique position to become more than just a stepping stone between the drawing board and real hardware: data visualization, simulation of remote environments, and even reverse engineering of blackbox systems are all possible applications.},
+ issn = {0885-6125, 1573-0565},
+ url = {http://link.springer.com/10.1007/BF00115009},
+ doi = {10.1007/BF00115009},
+ abstract = {This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.},
 language = {en},
- urldate = {2019-02-07},
- booktitle = {2004 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems} ({IROS}) ({IEEE} {Cat}. {No}.{04CH37566})},
- publisher = {IEEE},
- author = {Koenig, N. 
and Howard, A.},
- year = {2004},
- pages = {2149--2154},
- file = {Koenig_Howard_2004_Design and use paradigms for gazebo, an open-source multi-robot simulator.pdf:/home/dferigo/Zotero/storage/9KCY9GLI/Koenig_Howard_2004_Design and use paradigms for gazebo, an open-source multi-robot simulator.pdf:application/pdf},
+ number = {1},
+ urldate = {2022-04-15},
+ journal = {Machine Learning},
+ author = {Sutton, Richard S.},
+ month = aug,
+ year = {1988},
+ pages = {9--44},
+ file = {Sutton_1988_Learning to predict by the methods of temporal differences.pdf:/home/dferigo/Zotero/storage/XBWG7D4V/Sutton_1988_Learning to predict by the methods of temporal differences.pdf:application/pdf},
}

-@article{lee_dart_2018,
- title = {{DART}: {Dynamic} {Animation} and {Robotics} {Toolkit}},
- issn = {2475-9066},
- shorttitle = {{DART}},
- doi = {10.21105/joss.00500},
+@article{tesauro_td-gammon_1994,
+ title = {{TD}-{Gammon}, a {Self}-{Teaching} {Backgammon} {Program}, {Achieves} {Master}-{Level} {Play}},
+ volume = {6},
+ issn = {0899-7667, 1530-888X},
+ url = {https://direct.mit.edu/neco/article/6/2/215-219/5771},
+ doi = {10.1162/neco.1994.6.2.215},
+ abstract = {TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results, based on the TD(λ) reinforcement learning algorithm (Sutton, 1988). Despite starting from random initial weights (and hence random initial strategy), TD-Gammon achieves a surprisingly strong level of play. With zero knowledge built in at the start of learning (i.e. given only a "raw" description of the board state), the network learns to play at a strong intermediate level. Furthermore, when a set of hand-crafted features is added to the network's input representation, the result is a truly staggering level of performance: the latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.},
 language = {en},
- urldate = {2019-08-09},
- journal = {The Journal of Open Source Software},
- author = {Lee, Jeongseok and X. Grey, Michael and Ha, Sehoon and Kunz, Tobias and Jain, Sumit and Ye, Yuting and S. Srinivasa, Siddhartha and Stilman, Mike and Karen Liu, C.},
- month = feb,
- year = {2018},
- file = {Lee et al_2018_DART.pdf:/home/dferigo/Zotero/storage/VGHXAZPR/Lee et al_2018_DART.pdf:application/pdf},
+ number = {2},
+ urldate = {2022-04-15},
+ journal = {Neural Computation},
+ author = {Tesauro, Gerald},
+ month = mar,
+ year = {1994},
+ pages = {215--219},
+ file = {Tesauro_1994_TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play.pdf:/home/dferigo/Zotero/storage/9C9X9HWP/Tesauro_1994_TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play.pdf:application/pdf},
}

-@article{liangGPUAcceleratedRoboticSimulation2018s,
- title = {{GPU}-{Accelerated} {Robotic} {Simulation} for {Distributed} {Reinforcement} {Learning}},
- abstract = {Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While distributed training is often done on the GPU, simulation is not. In this work, we propose using GPU-accelerated RL simulations as an alternative to CPU ones. Using NVIDIA Flex, a GPU-based physics engine, we show promising speed-ups of learning various continuous-control, locomotion tasks. 
With one GPU and CPU core, we are able to train the Humanoid running task in less than 20 minutes, using 10 − 1000× fewer CPU cores than previous works. We also demonstrate the scalability of our simulator to multi-GPU settings to train more challenging locomotion tasks.}, +@article{lin_reinforcement_1993, + title = {Reinforcement {Learning} for {Robots} {Using} {Neural} {Networks}}, language = {en}, - urldate = {2019-06-19}, - author = {Liang, Jacky and Makoviychuk, Viktor and Handa, Ankur and Chentanez, Nuttapong and Macklin, Miles and Fox, Dieter}, - month = oct, - year = {2018}, - note = {arXiv: 1810.05762}, - keywords = {Computer Science - Robotics}, - file = {Liang et al_2018_GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/WHNFWU9F/Liang et al_2018_GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning.pdf:application/pdf}, -} - -@inproceedings{todorov_mujoco_2012, - title = {{MuJoCo}: {A} physics engine for model-based control}, - shorttitle = {{MuJoCo}}, - doi = {10.1109/IROS.2012.6386109}, - abstract = {We describe a new physics engine tailored to model-based control. Multi-joint dynamics are represented in generalized coordinates and computed via recursive algorithms. Contact responses are computed via efficient new algorithms we have developed, based on the modern velocity-stepping approach which avoids the difficulties with spring-dampers. Models are specified using either a high-level C++ API or an intuitive XML file format. A built-in compiler transforms the user model into an optimized data structure used for runtime computation. The engine can compute both forward and inverse dynamics. The latter are well-defined even in the presence of contacts and equality constraints. The model can include tendon wrapping as well as actuator activation states (e.g. pneumatic cylinders or muscles). To facilitate optimal control applications and in particular sampling and finite differencing, the dynamics can be evaluated for different states and controls in parallel. Around 400,000 dynamics evaluations per second are possible on a 12-core machine, for a 3D homanoid with 18 dofs and 6 active contacts. We have already used the engine in a number of control applications. It will soon be made publicly available.}, - booktitle = {2012 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems}}, - author = {Todorov, E. and Erez, T. 
and Tassa, Y.}, - month = oct, - year = {2012}, - keywords = {Mathematical model, optimal control, humanoid robots, Heuristic algorithms, application program interfaces, Dynamics, Computational modeling, control engineering computing, Engines, MuJoCo, 12-core machine, 3D humanoid, active contacts, actuator activation states, built-in compiler transforms, C++ language, data structures, finite difference methods, finite differencing, high-level C++ API, intuitive XML file format, model-based control, multijoint dynamics, optimal control applications, Optimization, optimized data structure, physics engine, program compilers, recursive algorithms, runtime computation, shock absorbers, spring-dampers, tendon wrapping, velocity-stepping approach, XML}, - pages = {5026--5033}, - file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/UYHBLVES/6386109.html:text/html;Todorov et al_2012_MuJoCo.pdf:/home/dferigo/Zotero/storage/MH7NEVNR/Todorov et al_2012_MuJoCo.pdf:application/pdf}, + author = {Lin, Long-Ji}, + year = {1993}, + pages = {168}, + file = {Lin_1993_Reinforcement Learning for Robots Using Neural Networks.pdf:/home/dferigo/Zotero/storage/Y2VMYRV8/Lin_1993_Reinforcement Learning for Robots Using Neural Networks.pdf:application/pdf}, } -@article{xieIterativeReinforcementLearning2019s, - title = {Iterative {Reinforcement} {Learning} {Based} {Design} of {Dynamic} {Locomotion} {Skills} for {Cassie}}, - abstract = {Deep reinforcement learning (DRL) is a promising approach for developing legged locomotion skills. However, the iterative design process that is inevitable in practice is poorly supported by the default methodology. It is difficult to predict the outcomes of changes made to the reward functions, policy architectures, and the set of tasks being trained on. In this paper, we propose a practical method that allows the reward function to be fully redefined on each successive design iteration while limiting the deviation from the previous iteration. We characterize policies via sets of Deterministic Action Stochastic State (DASS) tuples, which represent the deterministic policy state-action pairs as sampled from the states visited by the trained stochastic policy. New policies are trained using a policy gradient algorithm which then mixes RL-based policy gradients with gradient updates defined by the DASS tuples. The tuples also allow for robust policy distillation to new network architectures. We demonstrate the effectiveness of this iterative-design approach on the bipedal robot Cassie, achieving stable walking with different gait styles at various speeds. We demonstrate the successful transfer of policies learned in simulation to the physical robot without any dynamics randomization, and that variable-speed walking policies for the physical robot can be represented by a small dataset of 5-10k tuples.}, +@article{narendra_identification_1990, + title = {Identification and control of dynamical systems using neural networks}, + volume = {1}, + issn = {10459227}, + url = {http://ieeexplore.ieee.org/document/80202/}, + doi = {10.1109/72.80202}, + abstract = {The paper demonstrates that neural networks can be used effectively for the identification and control of nonlinear dynamical systems. The emphasis of the paper is on models for both identification and control. Static and dynamic back-propagation methods for the adjustment of parameters are discussed. 
In the models that are introduced, multilayer and recurrent networks are interconnected in novel configurations and hence there is a real need to study them in a unified fashion. Simulation results reveal that the identification and adaptive control schemes suggested are practically feasible. Basic concepts and definitions are introduced throughout the paper, and theoretical questions which have to be addressed are also described.}, language = {en}, - urldate = {2019-04-05}, - author = {Xie, Zhaoming and Clary, Patrick and Dao, Jeremy and Morais, Pedro and Hurst, Jonathan and van de Panne, Michiel}, + number = {1}, + urldate = {2022-04-15}, + journal = {IEEE Transactions on Neural Networks}, + author = {Narendra, K.S. and Parthasarathy, K.}, month = mar, - year = {2019}, - keywords = {Computer Science - Robotics}, - file = {Xie et al_2019_Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for.pdf:/home/dferigo/Zotero/storage/8MUYACWD/Xie et al_2019_Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for.pdf:application/pdf}, + year = {1990}, + pages = {4--27}, + file = {Narendra_Parthasarathy_1990_Identification and control of dynamical systems using neural networks.pdf:/home/dferigo/Zotero/storage/T9ITSBDR/Narendra_Parthasarathy_1990_Identification and control of dynamical systems using neural networks.pdf:application/pdf}, } -@inproceedings{yang_learning_2018, - title = {Learning {Whole}-{Body} {Motor} {Skills} for {Humanoids}}, - doi = {10.1109/HUMANOIDS.2018.8625045}, - abstract = {This paper presents a hierarchical framework for Deep Reinforcement Learning that acquires motor skills for a variety of push recovery and balancing behaviors, i.e., ankle, hip, foot tilting, and stepping strategies. The policy is trained in a physics simulator with realistic setting of robot model and low-level impedance control that are easy to transfer the learned skills to real robots. The advantage over traditional methods is the integration of high-level planner and feedback control all in one single coherent policy network, which is generic for learning versatile balancing and recovery motions against unknown perturbations at arbitrary locations (e.g., legs, torso). Furthermore, the proposed framework allows the policy to be learned quickly by many state-of-the-art learning algorithms. By comparing our learned results to studies of preprogrammed, special-purpose controllers in the literature, self-learned skills are comparable in terms of disturbance rejection but with additional advantages of producing a wide range of adaptive, versatile and robust behaviors.}, - booktitle = {2018 {IEEE}-{RAS} 18th {International} {Conference} on {Humanoid} {Robots} ({Humanoids})}, - author = {Yang, C. and Yuan, K. and Merkt, W. and Komura, T. and Vijayakumar, S. 
and Li, Z.}, - month = nov, - year = {2018}, - keywords = {Reinforcement learning, feedback, learning (artificial intelligence), humanoid robots, Aerospace electronics, Humanoid robots, legged locomotion, adaptive behaviors, ankle, arbitrary locations, balancing behaviors, Collision avoidance, Deep Reinforcement, feedback control, foot tilting, hierarchical framework, high-level planner, hip, humanoids, learning algorithms, low-level impedance control, path planning, PD control, preprogrammed purpose controllers, push recovery, realistic setting, recovery motions, robot model, robust behaviors, self-learned skills, single coherent policy network, special-purpose controllers, stepping strategies, unknown perturbations, versatile balancing, versatile behaviors, whole-body motor skills}, - pages = {270--276}, - file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/IDSJNM9R/8625045.html:text/html;Yang et al_2018_Learning Whole-Body Motor Skills for Humanoids.pdf:/home/dferigo/Zotero/storage/TNUZ5X3Q/Yang et al_2018_Learning Whole-Body Motor Skills for Humanoids.pdf:application/pdf}, +@article{koberReinforcementLearningRobotics2013, + title = {Reinforcement {Learning} in {Robotics}: {A} {Survey}}, + language = {en}, + journal = {International Journal of Robotics Research}, + author = {Kober, Jens and Bagnell, J Andrew and Peters, Jan}, + year = {2013}, + pages = {38}, + file = {Kober2013-Reinforcement_Learning_in_Robotics.pdf:/home/dferigo/Zotero/storage/858JLQET/Kober2013-Reinforcement_Learning_in_Robotics.pdf:application/pdf;Kober2013-Reinforcement_Learning_in_Robotics.pdf:/home/dferigo/Zotero/storage/W3LHRYT7/Kober2013-Reinforcement_Learning_in_Robotics.pdf:application/pdf}, } -@article{liang_rllib_2018, - title = {{RLlib}: {Abstractions} for {Distributed} {Reinforcement} {Learning}}, - abstract = {Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate the benefits of this principle through RLlib: a library that provides scalable software primitives for RL. These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available as part of the open source Ray project 1.}, - journal = {arXiv}, - author = {Liang, Eric and Liaw, Richard and Moritz, Philipp and Nishihara, Robert and Fox, Roy and Goldberg, Ken and Gonzalez, Joseph E. and Jordan, Michael I. and Stoica, Ion}, - year = {2018}, - keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Distributed, Parallel, and Cluster Computing}, +@article{chatzilygeroudis_survey_2020, + title = {A {Survey} on {Policy} {Search} {Algorithms} for {Learning} {Robot} {Controllers} in a {Handful} of {Trials}}, + volume = {36}, + issn = {1941-0468}, + doi = {10.1109/TRO.2019.2958211}, + abstract = {Most policy search (PS) algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? 
By analogy with the word “big-data,” we refer to this challenge as “micro-data reinforcement learning.” In this article, we show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based PS), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots, designing generic priors, and optimizing the computing time.}, + number = {2}, + journal = {IEEE Transactions on Robotics}, + author = {Chatzilygeroudis, Konstantinos and Vassiliades, Vassilis and Stulp, Freek and Calinon, Sylvain and Mouret, Jean-Baptiste}, + month = apr, + year = {2020}, + keywords = {Autonomous agents, learning and adaptive systems, micro-data policy search (MDPS), robot learning}, + pages = {328--347}, + file = {Chatzilygeroudis et al_2020_A Survey on Policy Search Algorithms for Learning Robot Controllers in a.pdf:/home/dferigo/Zotero/storage/VLFIF5C6/Chatzilygeroudis et al_2020_A Survey on Policy Search Algorithms for Learning Robot Controllers in a.pdf:application/pdf;IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/I636FFXG/8944013.html:text/html}, } -@article{lillicrap_continuous_2016, - title = {Continuous {Control} with {Deep} {Reinforcement} {Learning}}, - abstract = {We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies “end-to-end”: directly from raw pixel inputs.}, +@article{fabisch_survey_2019, + title = {A {Survey} of {Behavior} {Learning} {Applications} in {Robotics} -- {State} of the {Art} and {Perspectives}}, + url = {http://arxiv.org/abs/1906.01868}, + abstract = {Recent success of machine learning in many domains has been overwhelming, which often leads to false expectations regarding the capabilities of behavior learning in robotics. In this survey, we analyze the current state of machine learning for robotic behaviors. We will give a broad overview of behaviors that have been learned and used on real robots. Our focus is on kinematically or sensorially complex robots. That includes humanoid robots or parts of humanoid robots, for example, legged robots or robotic arms. We will classify presented behaviors according to various categories and we will draw conclusions about what can be learned and what should be learned. 
Furthermore, we will give an outlook on problems that are challenging today but might be solved by machine learning in the future and argue that classical robotics and other approaches from artificial intelligence should be integrated more with machine learning to form complete, autonomous systems.}, language = {en}, - journal = {ICLR}, - author = {Lillicrap, Timothy P and Hunt, Jonathan J and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan}, - year = {2016}, - keywords = {deepmind}, + urldate = {2019-06-27}, + journal = {arXiv:1906.01868 [cs]}, + author = {Fabisch, Alexander and Petzoldt, Christoph and Otto, Marc and Kirchner, Frank}, + month = jun, + year = {2019}, + note = {arXiv: 1906.01868}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, + file = {Fabisch et al_2019_A Survey of Behavior Learning Applications in Robotics -- State of the Art and.pdf:/home/dferigo/Zotero/storage/R9NX3LR5/Fabisch et al_2019_A Survey of Behavior Learning Applications in Robotics -- State of the Art and.pdf:application/pdf}, } -@article{metta_icub_2010, - title = {The {iCub} humanoid robot: {An} open-systems platform for research in cognitive development}, - shorttitle = {The {iCub} humanoid robot}, - abstract = {We describe a humanoid robot platform — the iCub — which was designed to support collaborative research in cognitive development through autonomous exploration and social interaction. The motivation for this effort is the conviction that significantly greater impact can be leveraged by adopting an open systems policy for software and hardware development. This creates the need for a robust humanoid robot that offers rich perceptuo-motor capabilities with many degrees of freedom, a cognitive capacity for learning and development, a software architecture that encourages reuse \& easy integration, and a support infrastructure that fosters collaboration and sharing of resources. The iCub satisfies all of these needs in the guise of an open-system platform which is freely available and which has attracted a growing community of users and developers. To date, twenty iCubs each comprising approximately 5000 mechanical and electrical parts have been delivered to several research labs in Europe and to one in the USA.}, +@article{busoniu_reinforcement_2018, + title = {Reinforcement learning for control: {Performance}, stability, and deep approximators}, + volume = {46}, + issn = {13675788}, + shorttitle = {Reinforcement learning for control}, + url = {https://linkinghub.elsevier.com/retrieve/pii/S1367578818301184}, + doi = {10.1016/j.arcontrol.2018.09.005}, + abstract = {Reinforcement learning (RL) offers powerful algorithms to search for optimal controllers of systems with nonlinear, possibly stochastic dynamics that are unknown or highly uncertain. This review mainly covers artificial-intelligence approaches to RL, from the viewpoint of the control engineer. We explain how approximate representations of the solution make RL feasible for problems with continuous states and control actions. Stability is a central concern in control, and we argue that while the control-theoretic RL subfield called adaptive dynamic programming is dedicated to it, stability of RL largely remains an open question. We also cover in detail the case where deep neural networks are used for approximation, leading to the field of deep RL, which has shown great success in recent years. 
With the control practitioner in mind, we outline opportunities and pitfalls of deep RL; and we close the survey with an outlook that – among other things – points out some avenues for bridging the gap between control and artificial-intelligence RL techniques.}, language = {en}, - urldate = {2020-07-22}, - journal = {Neural Networks}, - author = {Metta, Giorgio and Natale, Lorenzo and Nori, Francesco and Sandini, Giulio and Vernon, David and Fadiga, Luciano and von Hofsten, Claes and Rosander, Kerstin and Lopes, Manuel and Santos-Victor, José and Bernardino, Alexandre and Montesano, Luis}, - year = {2010}, -} - -@article{nori_icub_2015, - title = {{iCub} {Whole}-{Body} {Control} through {Force} {Regulation} on {Rigid} {Non}-{Coplanar} {Contacts}}, - abstract = {This paper details the implementation on the humanoid robot iCub of state-of-the-art algorithms for whole-body control. We regulate the forces between the robot and its surrounding environment to stabilize a desired robot posture. We assume that the forces and torques are exerted on rigid contacts. The validity of this assumption is guaranteed by constraining the contact forces and torques, e.g. the contact forces must belong to the associated friction cones. The implementation of this control strategy requires to estimate the external forces acting on the robot, and the internal joint torques. We then detail algorithms to obtain these estimations when using a robot with an iCub-like sensor set, i.e. distributed six-axis force-torque sensors and whole-body tactile sensors. A general theory for identifying the robot inertial parameters is also presented. From an actuation standpoint, we show how to implement a joint torque control in the case of DC brushless motors. In addition, the coupling mechanism of the iCub torso is investigated. The soundness of the entire control architecture is validated in a real scenario involving the robot iCub balancing and making contacts at both arms.}, - urldate = {2020-07-22}, - journal = {Frontiers in Robotics and AI}, - author = {Nori, Francesco and Traversaro, Silvio and Eljaik, Jorhabib and Romano, Francesco and Del Prete, Andrea and Pucci, Daniele}, - year = {2015}, - keywords = {floating base robots, force sensors., noncoplanar contact, Rigid contacts, Tactile sensors, whole-body control}, -} - -@article{peng_deepmimic_2018, - title = {{DeepMimic}: {Example}-{Guided} {Deep} {Reinforcement} {Learning} of {Physics}-{Based} {Character} {Skills}}, - abstract = {A longstanding goal in character animation is to combine data-driven specification of behavior with a system that can execute a similar behavior in a physical simulation, thus enabling realistic responses to perturbations and environmental variation. We show that well-known reinforcement learning (RL) methods can be adapted to learn robust control policies capable of imitating a broad range of example motion clips, while also learning complex recoveries, adapting to changes in morphology, and accomplishing user-specified goals. Our method handles keyframed motions, highly-dynamic actions such as motion-captured flips and spins, and retargeted motions. By combining a motion-imitation objective with a task objective, we can train characters that react intelligently in interactive settings, e.g., by walking in a desired direction or throwing a ball at a user-specified target. 
This approach thus combines the convenience and motion quality of using motion clips to define the desired style and appearance, with the flexibility and generality afforded by RL methods and physics-based animation. We further explore a number of methods for integrating multiple clips into the learning process to develop multi-skilled agents capable of performing a rich repertoire of diverse skills. We demonstrate results using multiple characters (human, Atlas robot, bipedal dinosaur, dragon) and a large variety of skills, including locomotion, acrobatics, and martial arts.}, - urldate = {2019-09-24}, - journal = {ACM Transactions on Graphics}, - author = {Peng, Xue Bin and Abbeel, Pieter and Levine, Sergey and van de Panne, Michiel}, - month = jul, + urldate = {2018-12-17}, + journal = {Annual Reviews in Control}, + author = {Buşoniu, Lucian and de Bruin, Tim and Tolić, Domagoj and Kober, Jens and Palunko, Ivana}, year = {2018}, - keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Graphics}, -} - -@article{peng_learning_2020, - title = {Learning {Agile} {Robotic} {Locomotion} {Skills} by {Imitating} {Animals}}, - url = {http://arxiv.org/abs/2004.00784}, - abstract = {Reproducing the diverse and agile locomotion skills of animals has been a longstanding challenge in robotics. While manually-designed controllers have been able to emulate many complex behaviors, building such controllers involves a time-consuming and difficult development process, often requiring substantial expertise of the nuances of each skill. Reinforcement learning provides an appealing alternative for automating the manual effort involved in the development of controllers. However, designing learning objectives that elicit the desired behaviors from an agent can also require a great deal of skill-specific expertise. In this work, we present an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals. We show that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire behaviors for legged robots. By incorporating sample efficient domain adaptation techniques into the training process, our system is able to learn adaptive policies in simulation that can then be quickly adapted for real-world deployment. To demonstrate the effectiveness of our system, we train an 18-DoF quadruped robot to perform a variety of agile behaviors ranging from different locomotion gaits to dynamic hops and turns.}, - urldate = {2020-05-07}, - journal = {arXiv:2004.00784 [cs]}, - author = {Peng, Xue Bin and Coumans, Erwin and Zhang, Tingnan and Lee, Tsang-Wei and Tan, Jie and Levine, Sergey}, - month = apr, - year = {2020}, - note = {arXiv: 2004.00784}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, -} - -@article{peng_learning_2017, - title = {Learning locomotion skills using {DeepRL}: does the choice of action space matter?}, - shorttitle = {Learning locomotion skills using {DeepRL}}, - abstract = {The use of deep reinforcement learning allows for high-dimensional state descriptors, but little is known about how the choice of action representation impacts learning and the resulting performance. 
We compare the impact of four different action parameterizations (torques, muscle-activations, target joint angles, and target joint-angle velocities) in terms of learning time, policy robustness, motion quality, and policy query rates. Our results are evaluated on a gait-cycle imitation task for multiple planar articulated figures and multiple gaits. We demonstrate that the local feedback provided by higher-level action parameterizations can significantly impact the learning, robustness, and motion quality of the resulting policies.}, - journal = {SIGGRAPH}, - author = {Peng, Xue Bin and van de Panne, Michiel}, - year = {2017}, - keywords = {motion control, locomotion skills, physics-based character animation}, + pages = {8--28}, + file = {Buşoniu et al_2018_Reinforcement learning for control.pdf:/home/dferigo/Zotero/storage/KWV5Z2VL/Buşoniu et al_2018_Reinforcement learning for control.pdf:application/pdf}, } -@article{schulman_trust_2017-1, - title = {Trust {Region} {Policy} {Optimization}}, - url = {http://arxiv.org/abs/1502.05477}, - abstract = {We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.}, +@inproceedings{stulp_reinforcement_2010, + title = {Reinforcement learning of full-body humanoid motor skills}, + isbn = {978-1-4244-8688-5}, + url = {http://ieeexplore.ieee.org/document/5686320/}, + doi = {10.1109/ICHR.2010.5686320}, + abstract = {Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and state and action spaces are continuous. Thus, most reinforcement learning algorithms would become computationally infeasible and require a prohibitive amount of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach, which is derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is modelfree, and performs numerically robustly in high dimensional learning problems. We demonstrate how PI2 is able to learn fullbody motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.}, language = {en}, - urldate = {2020-07-22}, - journal = {arXiv:1502.05477 [cs]}, - author = {Schulman, John and Levine, Sergey and Moritz, Philipp and Jordan, Michael I. 
and Abbeel, Pieter}, - month = apr, - year = {2017}, - note = {arXiv: 1502.05477}, - keywords = {Computer Science - Machine Learning}, -} - -@article{schulman_proximal_2017-1, - title = {Proximal {Policy} {Optimization} {Algorithms}}, - abstract = {We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.}, - urldate = {2020-05-07}, - journal = {arXiv}, - author = {Schulman, John and Wolski, Filip and Dhariwal, Prafulla and Radford, Alec and Klimov, Oleg}, - year = {2017}, - keywords = {Computer Science - Machine Learning}, + urldate = {2018-06-17}, + publisher = {IEEE}, + author = {Stulp, Freek and Buchli, Jonas and Theodorou, Evangelos and Schaal, Stefan}, + month = dec, + year = {2010}, + pages = {405--410}, + file = {Stulp et al_2010_Reinforcement learning of full-body humanoid motor skills.pdf:/home/dferigo/Zotero/storage/4H6NFGV6/Stulp et al_2010_Reinforcement learning of full-body humanoid motor skills.pdf:application/pdf}, } -@article{saccon_centroidal_2017, - title = {On {Centroidal} {Dynamics} and {Integrability} of {Average} {Angular} {Velocity}}, - volume = {2}, - issn = {2377-3766, 2377-3774}, - url = {http://arxiv.org/abs/1701.02514}, - doi = {10.1109/LRA.2017.2655560}, - abstract = {In the literature on robotics and multibody dynamics, the concept of average angular velocity has received considerable attention in recent years. We address the question of whether the average angular velocity defines an orientation frame that depends only on the current robot configuration and provide a simple algebraic condition to check whether this holds. In the language of geometric mechanics, this condition corresponds to requiring the flatness of the mechanical connection associated to the robotic system. Here, however, we provide both a reinterpretation and a proof of this result accessible to readers with a background in rigid body kinematics and multibody dynamics but not necessarily acquainted with differential geometry, still providing precise links to the geometric mechanics literature. This should help spreading the algebraic condition beyond the scope of geometric mechanics, contributing to a proper utilization and understanding of the concept of average angular velocity.}, +@article{sola_micro_2020, + title = {A micro {Lie} theory for state estimation in robotics}, + url = {http://arxiv.org/abs/1812.01537}, + abstract = {A Lie group is an old mathematical abstract object dating back to the XIX century, when mathematician Sophus Lie laid the foundations of the theory of continuous transformation groups. Its influence has spread over diverse areas of science and technology many years later. 
In robotics, we are recently experiencing an important trend in its usage, at least in the fields of estimation, and particularly in motion estimation for navigation. Yet for a vast majority of roboticians, Lie groups are highly abstract constructions and therefore difficult to understand and to use.}, language = {en}, - number = {2}, - urldate = {2022-04-13}, - journal = {IEEE Robotics and Automation Letters}, - author = {Saccon, Alessandro and Traversaro, Silvio and Nori, Francesco and Nijmeijer, Henk}, - month = apr, - year = {2017}, - note = {arXiv: 1701.02514}, + urldate = {2021-11-26}, + journal = {arXiv:1812.01537 [cs]}, + author = {Solà, Joan and Deray, Jeremie and Atchuthan, Dinesh}, + month = nov, + year = {2020}, + note = {arXiv: 1812.01537}, keywords = {Computer Science - Robotics}, - pages = {943--950}, - file = {Saccon et al_2017_On Centroidal Dynamics and Integrability of Average Angular Velocity.pdf:/home/dferigo/Zotero/storage/6P4LEJ7T/Saccon et al_2017_On Centroidal Dynamics and Integrability of Average Angular Velocity.pdf:application/pdf}, + file = {Solà et al_2020_A micro Lie theory for state estimation in robotics.pdf:/home/dferigo/Zotero/storage/IBW6W28A/Solà et al_2020_A micro Lie theory for state estimation in robotics.pdf:application/pdf}, } -@article{gillen_leveraging_2022, - title = {Leveraging {Reward} {Gradients} {For} {Reinforcement} {Learning} in {Differentiable} {Physics} {Simulations}}, - url = {http://arxiv.org/abs/2203.02857}, - abstract = {In recent years, fully differentiable rigid body physics simulators have been developed, which can be used to simulate a wide range of robotic systems. In the context of reinforcement learning for control, these simulators theoretically allow algorithms to be applied directly to analytic gradients of the reward function. However, to date, these gradients have proved extremely challenging to use, and are outclassed by algorithms using no gradient information at all. In this work we present a novel algorithm, cross entropy analytic policy gradients, that is able to leverage these gradients to outperform state of art deep reinforcement learning on a certain set of challenging nonlinear control problems.}, +@article{dulac-arnold_empirical_2021, + title = {An empirical investigation of the challenges of real-world reinforcement learning}, + url = {http://arxiv.org/abs/2003.11881}, + abstract = {Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real world problems. 
Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite which we propose as an open-source benchmark.},
	language = {en},
	urldate = {2022-04-13},
	journal = {arXiv:2003.11881 [cs]},
	author = {Dulac-Arnold, Gabriel and Levine, Nir and Mankowitz, Daniel J. and Li, Jerry and Paduraru, Cosmin and Gowal, Sven and Hester, Todd},
	month = mar,
	year = {2021},
	note = {arXiv: 2003.11881},
	keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence},
	file = {Dulac-Arnold et al_2021_An empirical investigation of the challenges of real-world reinforcement.pdf:/home/dferigo/Zotero/storage/FN6Z7NFN/Dulac-Arnold et al_2021_An empirical investigation of the challenges of real-world reinforcement.pdf:application/pdf},
}

@article{dulac-arnold_empirical_2020,
	title = {An empirical investigation of the challenges of real-world reinforcement learning},
	url = {http://arxiv.org/abs/2003.11881},
	abstract = {Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real world problems. 
Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite which we propose as an open-source benchmark.},
	language = {en},
	urldate = {2020-07-25},
	journal = {arXiv:2003.11881 [cs]},
	author = {Dulac-Arnold, Gabriel and Levine, Nir and Mankowitz, Daniel J. and Li, Jerry and Paduraru, Cosmin and Gowal, Sven and Hester, Todd},
	month = mar,
	year = {2020},
	note = {arXiv: 2003.11881},
	keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence},
	file = {Dulac-Arnold et al_2020_An empirical investigation of the challenges of real-world reinforcement.pdf:/home/dferigo/Zotero/storage/X77JWG9W/Dulac-Arnold et al_2020_An empirical investigation of the challenges of real-world reinforcement.pdf:application/pdf},
}

@article{haarnoja_learning_2019,
	title = {Learning to {Walk} via {Deep} {Reinforcement} {Learning}},
	url = {http://arxiv.org/abs/1812.11103},
	abstract = {Deep reinforcement learning (deep RL) holds the promise of automating the acquisition of complex controllers that can map sensory inputs directly to low-level actions. In the domain of robotic locomotion, deep RL could enable learning locomotion skills with minimal engineering and without an explicit model of the robot dynamics. Unfortunately, applying deep RL to real-world robotic tasks is exceptionally difficult, primarily due to poor sample complexity and sensitivity to hyperparameters. 
While hyperparameters can be easily tuned in simulated domains, tuning may be prohibitively expensive on physical systems, such as legged robots, that can be damaged through extensive trial-and-error learning. In this paper, we propose a sample-efficient deep RL algorithm based on maximum entropy RL that requires minimal per-task tuning and only a modest number of trials to learn neural network policies. We apply this method to learning walking gaits on a real-world Minitaur robot. Our method can acquire a stable gait from scratch directly in the real world in about two hours, without relying on any model or simulation, and the resulting policy is robust to moderate variations in the environment. We further show that our algorithm achieves state-of-the-art performance on simulated benchmarks with a single set of hyperparameters. Videos of training and the learned policy can be found on the project website.}, + urldate = {2020-05-07}, + journal = {arXiv:1812.11103 [cs, stat]}, + author = {Haarnoja, Tuomas and Ha, Sehoon and Zhou, Aurick and Tan, Jie and Tucker, George and Levine, Sergey}, month = jun, + year = {2019}, + note = {arXiv: 1812.11103}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence, Statistics - Machine Learning}, + file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/NVW3BT23/1812.html:text/html;Haarnoja et al_2019_Learning to Walk via Deep Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/X3F83MC6/Haarnoja et al_2019_Learning to Walk via Deep Reinforcement Learning.pdf:application/pdf}, +} + +@inproceedings{rohmer_v-rep_2013, + title = {V-{REP}: {A} versatile and scalable robot simulation framework}, + shorttitle = {V-{REP}}, + doi = {10.1109/IROS.2013.6696520}, + abstract = {From exploring planets to cleaning homes, the reach and versatility of robotics is vast. The integration of actuation, sensing and control makes robotics systems powerful, but complicates their simulation. This paper introduces a versatile, scalable, yet powerful general-purpose robot simulation framework called V-REP. The paper discusses the utility of a portable and flexible simulation framework that allows for direct incorporation of various control techniques. This renders simulations and simulation models more accessible to a general-public, by reducing the simulation model deployment complexity. It also increases productivity by offering built-in and ready-to-use functionalities, as well as a multitude of programming approaches. This allows for a multitude of applications including rapid algorithm development, system verification, rapid prototyping, and deployment for cases such as safety/remote monitoring, training and education, hardware control, and factory automation simulation.}, + booktitle = {2013 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems}}, + author = {Rohmer, Eric and Singh, Surya P. N. 
and Freese, Marc}, + month = nov, + year = {2013}, + note = {ISSN: 2153-0866}, + keywords = {Computational modeling, Hardware, Joints, Load modeling, Robots, Sensors, Shape}, + pages = {1321--1326}, + file = {Rohmer et al_2013_V-REP.pdf:/home/dferigo/Zotero/storage/P46T7D6L/Rohmer et al_2013_V-REP.pdf:application/pdf}, +} + +@article{ibarz_how_2021, + title = {How to {Train} {Your} {Robot} with {Deep} {Reinforcement} {Learning}; {Lessons} {We}'ve {Learned}}, + issn = {0278-3649, 1741-3176}, + url = {http://arxiv.org/abs/2102.02915}, + doi = {10.1177/0278364920987859}, + abstract = {Deep reinforcement learning (RL) has emerged as a promising approach for autonomously acquiring complex behaviors from low level sensor observations. Although a large portion of deep RL research has focused on applications in video games and simulated control, which does not connect with the constraints of learning in real environments, deep RL has also demonstrated promise in enabling physical robots to learn complex skills in the real world. At the same time,real world robotics provides an appealing domain for evaluating such algorithms, as it connects directly to how humans learn; as an embodied agent in the real world. Learning to perceive and move in the real world presents numerous challenges, some of which are easier to address than others, and some of which are often not considered in RL research that focuses only on simulated domains. In this review article, we present a number of case studies involving robotic deep RL. Building off of these case studies, we discuss commonly perceived challenges in deep RL and how they have been addressed in these works. We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting and are not often the focus of mainstream RL research. Our goal is to provide a resource both for roboticists and machine learning researchers who are interested in furthering the progress of deep RL in the real world.}, + urldate = {2021-04-08}, + journal = {The International Journal of Robotics Research}, + author = {Ibarz, Julian and Tan, Jie and Finn, Chelsea and Kalakrishnan, Mrinal and Pastor, Peter and Levine, Sergey}, + month = jan, year = {2021}, - note = {arXiv: 2103.16021}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Systems and Control}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/CHBP72VL/2103.html:text/html;Werling et al_2021_Fast and Feature-Complete Differentiable Physics for Articulated Rigid Bodies.pdf:/home/dferigo/Zotero/storage/EZU732MN/Werling et al_2021_Fast and Feature-Complete Differentiable Physics for Articulated Rigid Bodies.pdf:application/pdf}, + note = {arXiv: 2102.02915}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, + pages = {027836492098785}, + file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/89CMU29H/2102.html:text/html;Ibarz et al_2021_How to Train Your Robot with Deep Reinforcement Learning\; Lessons We've Learned.pdf:/home/dferigo/Zotero/storage/YXS79SEC/Ibarz et al_2021_How to Train Your Robot with Deep Reinforcement Learning\; Lessons We've Learned.pdf:application/pdf}, } -@article{qiao_efficient_2021, - title = {Efficient {Differentiable} {Simulation} of {Articulated} {Bodies}}, - url = {http://arxiv.org/abs/2109.07719}, - abstract = {We present a method for efficient differentiable simulation of articulated bodies. 
This enables integration of articulated body dynamics into deep learning frameworks, and gradient-based optimization of neural networks that operate on articulated bodies. We derive the gradients of the forward dynamics using spatial algebra and the adjoint method. Our approach is an order of magnitude faster than autodiff tools. By only saving the initial states throughout the simulation process, our method reduces memory requirements by two orders of magnitude. We demonstrate the utility of efficient differentiable dynamics for articulated bodies in a variety of applications. We show that reinforcement learning with articulated systems can be accelerated using gradients provided by our method. In applications to control and inverse problems, gradient-based optimization enabled by our work accelerates convergence by more than an order of magnitude.}, +@article{da_learning_2020, + title = {Learning a {Contact}-{Adaptive} {Controller} for {Robust}, {Efficient} {Legged} {Locomotion}}, + url = {http://arxiv.org/abs/2009.10019}, + abstract = {We present a hierarchical framework that combines model-based control and reinforcement learning (RL) to synthesize robust controllers for a quadruped (the Unitree Laikago). The system consists of a high-level controller that learns to choose from a set of primitives in response to changes in the environment and a low-level controller that utilizes an established control method to robustly execute the primitives. Our framework learns a controller that can adapt to challenging environmental changes on the fly, including novel scenarios not seen during training. The learned controller is up to 85 percent more energy efficient and is more robust compared to baseline methods. We also deploy the controller on a physical robot without any randomization or adaptation scheme.}, language = {en}, - urldate = {2022-04-13}, - journal = {arXiv:2109.07719 [cs]}, - author = {Qiao, Yi-Ling and Liang, Junbang and Koltun, Vladlen and Lin, Ming C.}, + urldate = {2020-10-01}, + journal = {arXiv:2009.10019 [cs]}, + author = {Da, Xingye and Xie, Zhaoming and Hoeller, David and Boots, Byron and Anandkumar, Animashree and Zhu, Yuke and Babich, Buck and Garg, Animesh}, month = sep, - year = {2021}, - note = {arXiv: 2109.07719}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Graphics}, - file = {Qiao et al_2021_Efficient Differentiable Simulation of Articulated Bodies.pdf:/home/dferigo/Zotero/storage/98MGA44B/Qiao et al_2021_Efficient Differentiable Simulation of Articulated Bodies.pdf:application/pdf}, + year = {2020}, + note = {arXiv: 2009.10019}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, + file = {Da et al_2020_Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion.pdf:/home/dferigo/Zotero/storage/UABUZD58/Da et al_2020_Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion.pdf:application/pdf}, } -@article{plancher_grid_2022, - title = {{GRiD}: {GPU}-{Accelerated} {Rigid} {Body} {Dynamics} with {Analytical} {Gradients}}, - shorttitle = {{GRiD}}, - url = {http://arxiv.org/abs/2109.06976}, - abstract = {We introduce GRiD: a GPU-accelerated library for computing rigid body dynamics with analytical gradients. 
GRiD was designed to accelerate the nonlinear trajectory optimization subproblem used in state-of-the-art robotic planning, control, and machine learning, which requires tens to hundreds of naturally parallel computations of rigid body dynamics and their gradients at each iteration. GRiD leverages URDF parsing and code generation to deliver optimized dynamics kernels that not only expose GPU-friendly computational patterns, but also take advantage of both fine-grained parallelism within each computation and coarse-grained parallelism between computations. Through this approach, when performing multiple computations of rigid body dynamics algorithms, GRiD provides as much as a 7.2x speedup over a state-of-the-art, multi-threaded CPU implementation, and maintains as much as a 2.5x speedup when accounting for I/O overhead. We release GRiD as an open-source library for use by the wider robotics community.}, +@article{yang_data_2019, + title = {Data {Efficient} {Reinforcement} {Learning} for {Legged} {Robots}}, + url = {http://arxiv.org/abs/1907.03613}, + abstract = {We present a model-based reinforcement learning framework for robot locomotion that achieves walking based on only 4.5 minutes of data collected on a quadruped robot. To accurately model the robot’s dynamics over a long horizon, we introduce a loss function that tracks the model’s prediction over multiple timesteps. We adapt model predictive control to account for planning latency, which allows the learned model to be used for real time control. Additionally, to ensure safe exploration during model learning, we embed prior knowledge of leg trajectories into the action space. The resulting system achieves fast and robust locomotion. Unlike model-free methods, which optimize for a particular task, our planner can use the same learned dynamics for various tasks, simply by changing the reward function.1 To the best of our knowledge, our approach is more than an order of magnitude more sample efficient than current model-free methods.}, language = {en}, urldate = {2022-04-13}, - journal = {arXiv:2109.06976 [cs]}, - author = {Plancher, Brian and Neuman, Sabrina M. and Ghosal, Radhika and Kuindersma, Scott and Reddi, Vijay Janapa}, - month = feb, - year = {2022}, - note = {arXiv: 2109.06976}, - keywords = {Computer Science - Robotics}, - file = {Plancher et al. - 2022 - GRiD GPU-Accelerated Rigid Body Dynamics with Ana.pdf:/home/dferigo/Zotero/storage/PS4UCZMJ/Plancher et al. - 2022 - GRiD GPU-Accelerated Rigid Body Dynamics with Ana.pdf:application/pdf}, + journal = {arXiv:1907.03613 [cs]}, + author = {Yang, Yuxiang and Caluwaerts, Ken and Iscen, Atil and Zhang, Tingnan and Tan, Jie and Sindhwani, Vikas}, + month = oct, + year = {2019}, + note = {arXiv: 1907.03613}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence}, + file = {Yang et al_2019_Data Efficient Reinforcement Learning for Legged Robots.pdf:/home/dferigo/Zotero/storage/XQMX3SIP/Yang et al_2019_Data Efficient Reinforcement Learning for Legged Robots.pdf:application/pdf}, } -@article{suh_differentiable_2022, - title = {Do {Differentiable} {Simulators} {Give} {Better} {Policy} {Gradients}?}, - url = {http://arxiv.org/abs/2202.00817}, - abstract = {Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. 
However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an α-order gradient estimator, with α ∈ [0, 1], which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zeroorder methods. We demonstrate the pitfalls of traditional estimators and the advantages of the α-order estimator on some numerical examples.}, +@article{hwangboLearningAgileDynamic2019s, + title = {Learning agile and dynamic motor skills for legged robots}, + issn = {2470-9476}, + doi = {10.1126/scirobotics.aau5872}, language = {en}, - urldate = {2022-04-13}, - journal = {arXiv:2202.00817 [cs]}, - author = {Suh, H. J. Terry and Simchowitz, Max and Zhang, Kaiqing and Tedrake, Russ}, - month = feb, - year = {2022}, - note = {arXiv: 2202.00817}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence}, - file = {Suh et al_2022_Do Differentiable Simulators Give Better Policy Gradients.pdf:/home/dferigo/Zotero/storage/8LC9ZL5P/Suh et al_2022_Do Differentiable Simulators Give Better Policy Gradients.pdf:application/pdf}, + urldate = {2019-05-20}, + journal = {Science Robotics}, + author = {Hwangbo, Jemin and Lee, Joonho and Dosovitskiy, Alexey and Bellicoso, Dario and Tsounis, Vassilios and Koltun, Vladlen and Hutter, Marco}, + month = jan, + year = {2019}, + file = {Hwangbo et al_2019_Learning agile and dynamic motor skills for legged robots.pdf:/home/dferigo/Zotero/storage/GDPEWKB2/Hwangbo et al_2019_Learning agile and dynamic motor skills for legged robots.pdf:application/pdf}, } -@article{rudin_learning_2021, - title = {Learning to {Walk} in {Minutes} {Using} {Massively} {Parallel} {Deep} {Reinforcement} {Learning}}, - url = {http://arxiv.org/abs/2109.11978}, - abstract = {In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. 
We open-source our training code to help accelerate further research in the field of learned legged locomotion: https://leggedrobotics.github.io/legged\_gym/.}, - language = {en}, - urldate = {2022-04-13}, - journal = {arXiv:2109.11978 [cs]}, - author = {Rudin, Nikita and Hoeller, David and Reist, Philipp and Hutter, Marco}, - month = oct, - year = {2021}, - note = {arXiv: 2109.11978}, +@article{tsounis_deepgait_2020, + title = {{DeepGait}: {Planning} and {Control} of {Quadrupedal} {Gaits} using {Deep} {Reinforcement} {Learning}}, + shorttitle = {{DeepGait}}, + url = {http://arxiv.org/abs/1909.08399}, + abstract = {This paper addresses the problem of legged locomotion in non-flat terrain. As legged robots such as quadrupeds are to be deployed in terrains with geometries which are difficult to model and predict, the need arises to equip them with the capability to generalize well to unforeseen situations. In this work, we propose a novel technique for training neural-network policies for terrain-aware locomotion, which combines state-of-the-art methods for model-based motion planning and reinforcement learning. Our approach is centered on formulating Markov decision processes using the evaluation of dynamic feasibility criteria in place of physical simulation. We thus employ policy-gradient methods to independently train policies which respectively plan and execute foothold and base motions in 3D environments using both proprioceptive and exteroceptive measurements. We apply our method within a challenging suite of simulated terrain scenarios which contain features such as narrow bridges, gaps and stepping-stones, and train policies which succeed in locomoting effectively in all cases.}, + urldate = {2020-07-28}, + journal = {arXiv:1909.08399 [cs]}, + author = {Tsounis, Vassilios and Alge, Mitja and Lee, Joonho and Farshidian, Farbod and Hutter, Marco}, + month = jan, + year = {2020}, + note = {arXiv: 1909.08399}, keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, - file = {Rudin et al_2021_Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/4C3M2URR/Rudin et al_2021_Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning.pdf:application/pdf}, + file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/NWSXQ664/1909.html:text/html;Tsounis et al_2020_DeepGait.pdf:/home/dferigo/Zotero/storage/UBF2BQVD/Tsounis et al_2020_DeepGait.pdf:application/pdf}, } -@book{coumans_pybullet_2016, - title = {Pybullet, a python module for physics simulation in robotics, games and machine learning}, - author = {Coumans, Erwin and Bai, Yunfei}, - year = {2016}, +@inproceedings{theodorou_reinforcement_2010, + title = {Reinforcement learning of motor skills in high dimensions: {A} path integral approach}, + isbn = {978-1-4244-5038-1}, + shorttitle = {Reinforcement learning of motor skills in high dimensions}, + url = {http://ieeexplore.ieee.org/document/5509336/}, + doi = {10.1109/ROBOT.2010.5509336}, + abstract = {Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has been largely impossible so far due to the computational difficulties that reinforcement learning encounters in high dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. 
While solidly grounded in optimal control theory and estimation theory, the update equations for learning are surprisingly simple and have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to highdimensional control problems. Finally, a learning experiment on a robot dog illustrates the functionality of our algorithm in a real-world scenario. We believe that our new algorithm, Policy Improvement with Path Integrals (PI2), offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL in robotics.}, + language = {en}, + urldate = {2018-06-17}, + publisher = {IEEE}, + author = {Theodorou, Evangelos and Buchli, Jonas and Schaal, Stefan}, + month = may, + year = {2010}, + pages = {2397--2403}, + file = {Theodorou et al_2010_Reinforcement learning of motor skills in high dimensions.pdf:/home/dferigo/Zotero/storage/JCDMGAN5/Theodorou et al_2010_Reinforcement learning of motor skills in high dimensions.pdf:application/pdf}, } -@article{rodriguez_deepwalk_2021, - title = {{DeepWalk}: {Omnidirectional} {Bipedal} {Gait} by {Deep} {Reinforcement} {Learning}}, - shorttitle = {{DeepWalk}}, - url = {http://arxiv.org/abs/2106.00534}, - abstract = {Bipedal walking is one of the most difficult but exciting challenges in robotics. The difficulties arise from the complexity of high-dimensional dynamics, sensing and actuation limitations combined with real-time and computational constraints. Deep Reinforcement Learning (DRL) holds the promise to address these issues by fully exploiting the robot dynamics with minimal craftsmanship. In this paper, we propose a novel DRL approach that enables an agent to learn omnidirectional locomotion for humanoid (bipedal) robots. Notably, the locomotion behaviors are accomplished by a single control policy (a single neural network). We achieve this by introducing a new curriculum learning method that gradually increases the task difficulty by scheduling target velocities. In addition, our method does not require reference motions which facilities its application to robots with different kinematics, and reduces the overall complexity. Finally, different strategies for sim-to-real transfer are presented which allow us to transfer the learned policy to a real humanoid robot.}, +@article{tan_sim--real_2018, + title = {Sim-to-{Real}: {Learning} {Agile} {Locomotion} {For} {Quadruped} {Robots}}, + abstract = {Designing agile locomotion for quadruped robots often requires extensive expertise and tedious manual tuning. In this paper, we present a system to automate this process by leveraging deep reinforcement learning techniques. Our system can learn quadruped locomotion from scratch using simple reward signals. In addition, users can provide an open loop reference to guide the learning process when more control over the learned gait is needed. The control policies are learned in a physics simulator and then deployed on real robots. In robotics, policies trained in simulation often do not transfer to the real world. We narrow this reality gap by improving the physics simulator and learning robust policies. We improve the simulation using system identification, developing an accurate actuator model and simulating latency. We learn robust controllers by randomizing the physical environments, adding perturbations and designing a compact observation space. 
We evaluate our system on two agile locomotion gaits: trotting and galloping. After learning in simulation, a quadruped robot can successfully perform both gaits in the real world.}, language = {en}, - urldate = {2022-04-13}, - journal = {arXiv:2106.00534 [cs]}, - author = {Rodriguez, Diego and Behnke, Sven}, - month = jun, - year = {2021}, - note = {arXiv: 2106.00534}, - keywords = {Computer Science - Robotics}, - file = {Rodriguez_Behnke_2021_DeepWalk.pdf:/home/dferigo/Zotero/storage/URLX8XDC/Rodriguez_Behnke_2021_DeepWalk.pdf:application/pdf}, + author = {Tan, Jie and Zhang, Tingnan and Coumans, Erwin and Iscen, Atil and Bai, Yunfei and Hafner, Danijar and Bohez, Steven and Vanhoucke, Vincent}, + year = {2018}, + pages = {10}, + file = {Tan et al_2018_Sim-to-Real.pdf:/home/dferigo/Zotero/storage/STASMH88/Tan et al_2018_Sim-to-Real.pdf:application/pdf}, } -@article{heess_emergence_2017, - title = {Emergence of {Locomotion} {Behaviours} in {Rich} {Environments}}, - abstract = {The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper explore how a rich environment can help to promote the learning of complex behavior. Specifically, we train agents in diverse environmental contexts, and find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We demonstrate this principle for locomotion – behaviours that are known for their sensitivity to the choice of reward. We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance. A visual depiction of highlights of the learned behavior can be viewed in this video.}, - author = {Heess, Nicolas and TB, Dhruva and Sriram, Srinivasan and Lemmon, Jay and Merel, Josh and Wayne, Greg and Tassa, Yuval and Erez, Tom and Wang, Ziyu and Eslami, S M Ali and Riedmiller, Martin and Silver, David}, - year = {2017}, +@article{lillicrap_continuous_2016, + title = {Continuous {Control} with {Deep} {Reinforcement} {Learning}}, + abstract = {We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. 
We further demonstrate that for many of the tasks the algorithm can learn policies “end-to-end”: directly from raw pixel inputs.}, + language = {en}, + author = {Lillicrap, Timothy P and Hunt, Jonathan J and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan}, + year = {2016}, keywords = {deepmind}, pages = {14}, - file = {Heess et al_2017_Emergence of Locomotion Behaviours in Rich Environments.pdf:/home/dferigo/Zotero/storage/H9E9RKMJ/Heess et al_2017_Emergence of Locomotion Behaviours in Rich Environments.pdf:application/pdf}, + file = {Lillicrap et al_2016_Continuous Control with Deep Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/4CRG6GXA/Lillicrap et al_2016_Continuous Control with Deep Reinforcement Learning.pdf:application/pdf}, } @article{heess_learning_2015, @@ -1050,1212 +1056,1367 @@ @article{heess_learning_2015 file = {Heess et al_2015_Learning Continuous Control Policies by Stochastic Value Gradients.pdf:/home/dferigo/Zotero/storage/3A8IMU4B/Heess et al_2015_Learning Continuous Control Policies by Stochastic Value Gradients.pdf:application/pdf}, } -@article{lillicrap_continuous_2016-1, - title = {Continuous {Control} with {Deep} {Reinforcement} {Learning}}, - abstract = {We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies “end-to-end”: directly from raw pixel inputs.}, - language = {en}, - author = {Lillicrap, Timothy P and Hunt, Jonathan J and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan}, - year = {2016}, +@article{heess_emergence_2017, + title = {Emergence of {Locomotion} {Behaviours} in {Rich} {Environments}}, + abstract = {The reinforcement learning paradigm allows, in principle, for complex behaviours to be learned directly from simple reward signals. In practice, however, it is common to carefully hand-design the reward function to encourage a particular solution, or to derive it from demonstration data. In this paper explore how a rich environment can help to promote the learning of complex behavior. Specifically, we train agents in diverse environmental contexts, and find that this encourages the emergence of robust behaviours that perform well across a suite of tasks. We demonstrate this principle for locomotion – behaviours that are known for their sensitivity to the choice of reward. We train several simulated bodies on a diverse set of challenging terrains and obstacles, using a simple reward function based on forward progress. Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment without explicit reward-based guidance. 
A visual depiction of highlights of the learned behavior can be viewed in this video.}, + author = {Heess, Nicolas and TB, Dhruva and Sriram, Srinivasan and Lemmon, Jay and Merel, Josh and Wayne, Greg and Tassa, Yuval and Erez, Tom and Wang, Ziyu and Eslami, S M Ali and Riedmiller, Martin and Silver, David}, + year = {2017}, keywords = {deepmind}, pages = {14}, - file = {Lillicrap et al_2016_Continuous Control with Deep Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/4CRG6GXA/Lillicrap et al_2016_Continuous Control with Deep Reinforcement Learning.pdf:application/pdf}, + file = {Heess et al_2017_Emergence of Locomotion Behaviours in Rich Environments.pdf:/home/dferigo/Zotero/storage/H9E9RKMJ/Heess et al_2017_Emergence of Locomotion Behaviours in Rich Environments.pdf:application/pdf}, } -@article{tan_sim--real_2018, - title = {Sim-to-{Real}: {Learning} {Agile} {Locomotion} {For} {Quadruped} {Robots}}, - abstract = {Designing agile locomotion for quadruped robots often requires extensive expertise and tedious manual tuning. In this paper, we present a system to automate this process by leveraging deep reinforcement learning techniques. Our system can learn quadruped locomotion from scratch using simple reward signals. In addition, users can provide an open loop reference to guide the learning process when more control over the learned gait is needed. The control policies are learned in a physics simulator and then deployed on real robots. In robotics, policies trained in simulation often do not transfer to the real world. We narrow this reality gap by improving the physics simulator and learning robust policies. We improve the simulation using system identification, developing an accurate actuator model and simulating latency. We learn robust controllers by randomizing the physical environments, adding perturbations and designing a compact observation space. We evaluate our system on two agile locomotion gaits: trotting and galloping. After learning in simulation, a quadruped robot can successfully perform both gaits in the real world.}, +@article{rodriguez_deepwalk_2021, + title = {{DeepWalk}: {Omnidirectional} {Bipedal} {Gait} by {Deep} {Reinforcement} {Learning}}, + shorttitle = {{DeepWalk}}, + url = {http://arxiv.org/abs/2106.00534}, + abstract = {Bipedal walking is one of the most difficult but exciting challenges in robotics. The difficulties arise from the complexity of high-dimensional dynamics, sensing and actuation limitations combined with real-time and computational constraints. Deep Reinforcement Learning (DRL) holds the promise to address these issues by fully exploiting the robot dynamics with minimal craftsmanship. In this paper, we propose a novel DRL approach that enables an agent to learn omnidirectional locomotion for humanoid (bipedal) robots. Notably, the locomotion behaviors are accomplished by a single control policy (a single neural network). We achieve this by introducing a new curriculum learning method that gradually increases the task difficulty by scheduling target velocities. In addition, our method does not require reference motions which facilities its application to robots with different kinematics, and reduces the overall complexity. 
Finally, different strategies for sim-to-real transfer are presented which allow us to transfer the learned policy to a real humanoid robot.}, language = {en}, - author = {Tan, Jie and Zhang, Tingnan and Coumans, Erwin and Iscen, Atil and Bai, Yunfei and Hafner, Danijar and Bohez, Steven and Vanhoucke, Vincent}, - year = {2018}, - pages = {10}, - file = {Tan et al_2018_Sim-to-Real.pdf:/home/dferigo/Zotero/storage/STASMH88/Tan et al_2018_Sim-to-Real.pdf:application/pdf}, + urldate = {2022-04-13}, + journal = {arXiv:2106.00534 [cs]}, + author = {Rodriguez, Diego and Behnke, Sven}, + month = jun, + year = {2021}, + note = {arXiv: 2106.00534}, + keywords = {Computer Science - Robotics}, + file = {Rodriguez_Behnke_2021_DeepWalk.pdf:/home/dferigo/Zotero/storage/URLX8XDC/Rodriguez_Behnke_2021_DeepWalk.pdf:application/pdf}, } -@inproceedings{theodorou_reinforcement_2010, - title = {Reinforcement learning of motor skills in high dimensions: {A} path integral approach}, - isbn = {978-1-4244-5038-1}, - shorttitle = {Reinforcement learning of motor skills in high dimensions}, - url = {http://ieeexplore.ieee.org/document/5509336/}, - doi = {10.1109/ROBOT.2010.5509336}, - abstract = {Reinforcement learning (RL) is one of the most general approaches to learning control. Its applicability to complex motor systems, however, has been largely impossible so far due to the computational difficulties that reinforcement learning encounters in high dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. While solidly grounded in optimal control theory and estimation theory, the update equations for learning are surprisingly simple and have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to highdimensional control problems. Finally, a learning experiment on a robot dog illustrates the functionality of our algorithm in a real-world scenario. We believe that our new algorithm, Policy Improvement with Path Integrals (PI2), offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL in robotics.}, +@article{rudin_learning_2021, + title = {Learning to {Walk} in {Minutes} {Using} {Massively} {Parallel} {Deep} {Reinforcement} {Learning}}, + url = {http://arxiv.org/abs/2109.11978}, + abstract = {In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. 
We open-source our training code to help accelerate further research in the field of learned legged locomotion: https://leggedrobotics.github.io/legged\_gym/.}, language = {en}, - urldate = {2018-06-17}, - publisher = {IEEE}, - author = {Theodorou, Evangelos and Buchli, Jonas and Schaal, Stefan}, - month = may, - year = {2010}, - pages = {2397--2403}, - file = {Theodorou et al_2010_Reinforcement learning of motor skills in high dimensions.pdf:/home/dferigo/Zotero/storage/JCDMGAN5/Theodorou et al_2010_Reinforcement learning of motor skills in high dimensions.pdf:application/pdf}, -} - -@article{tsounis_deepgait_2020, - title = {{DeepGait}: {Planning} and {Control} of {Quadrupedal} {Gaits} using {Deep} {Reinforcement} {Learning}}, - shorttitle = {{DeepGait}}, - url = {http://arxiv.org/abs/1909.08399}, - abstract = {This paper addresses the problem of legged locomotion in non-flat terrain. As legged robots such as quadrupeds are to be deployed in terrains with geometries which are difficult to model and predict, the need arises to equip them with the capability to generalize well to unforeseen situations. In this work, we propose a novel technique for training neural-network policies for terrain-aware locomotion, which combines state-of-the-art methods for model-based motion planning and reinforcement learning. Our approach is centered on formulating Markov decision processes using the evaluation of dynamic feasibility criteria in place of physical simulation. We thus employ policy-gradient methods to independently train policies which respectively plan and execute foothold and base motions in 3D environments using both proprioceptive and exteroceptive measurements. We apply our method within a challenging suite of simulated terrain scenarios which contain features such as narrow bridges, gaps and stepping-stones, and train policies which succeed in locomoting effectively in all cases.}, - urldate = {2020-07-28}, - journal = {arXiv:1909.08399 [cs]}, - author = {Tsounis, Vassilios and Alge, Mitja and Lee, Joonho and Farshidian, Farbod and Hutter, Marco}, - month = jan, - year = {2020}, - note = {arXiv: 1909.08399}, + urldate = {2022-04-13}, + journal = {arXiv:2109.11978 [cs]}, + author = {Rudin, Nikita and Hoeller, David and Reist, Philipp and Hutter, Marco}, + month = oct, + year = {2021}, + note = {arXiv: 2109.11978}, keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/NWSXQ664/1909.html:text/html;Tsounis et al_2020_DeepGait.pdf:/home/dferigo/Zotero/storage/UBF2BQVD/Tsounis et al_2020_DeepGait.pdf:application/pdf}, + file = {Rudin et al_2021_Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/4C3M2URR/Rudin et al_2021_Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning.pdf:application/pdf}, } -@article{hwangboLearningAgileDynamic2019s, - title = {Learning agile and dynamic motor skills for legged robots}, - issn = {2470-9476}, - doi = {10.1126/scirobotics.aau5872}, +@article{suh_differentiable_2022, + title = {Do {Differentiable} {Simulators} {Give} {Better} {Policy} {Gradients}?}, + url = {http://arxiv.org/abs/2202.00817}, + abstract = {Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients. 
However, it is yet unclear what factors decide the performance of the two estimators on complex landscapes that involve long-horizon planning and control on physical systems, despite the crucial relevance of this question for the utility of differentiable simulators. We show that characteristics of certain physical systems, such as stiffness or discontinuities, may compromise the efficacy of the first-order estimator, and analyze this phenomenon through the lens of bias and variance. We additionally propose an α-order gradient estimator, with α ∈ [0, 1], which correctly utilizes exact gradients to combine the efficiency of first-order estimates with the robustness of zeroorder methods. We demonstrate the pitfalls of traditional estimators and the advantages of the α-order estimator on some numerical examples.}, language = {en}, - urldate = {2019-05-20}, - journal = {Science Robotics}, - author = {Hwangbo, Jemin and Lee, Joonho and Dosovitskiy, Alexey and Bellicoso, Dario and Tsounis, Vassilios and Koltun, Vladlen and Hutter, Marco}, - month = jan, - year = {2019}, - file = {Hwangbo et al_2019_Learning agile and dynamic motor skills for legged robots.pdf:/home/dferigo/Zotero/storage/GDPEWKB2/Hwangbo et al_2019_Learning agile and dynamic motor skills for legged robots.pdf:application/pdf}, + urldate = {2022-04-13}, + journal = {arXiv:2202.00817 [cs]}, + author = {Suh, H. J. Terry and Simchowitz, Max and Zhang, Kaiqing and Tedrake, Russ}, + month = feb, + year = {2022}, + note = {arXiv: 2202.00817}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence}, + file = {Suh et al_2022_Do Differentiable Simulators Give Better Policy Gradients.pdf:/home/dferigo/Zotero/storage/8LC9ZL5P/Suh et al_2022_Do Differentiable Simulators Give Better Policy Gradients.pdf:application/pdf}, } -@article{yang_data_2019, - title = {Data {Efficient} {Reinforcement} {Learning} for {Legged} {Robots}}, - url = {http://arxiv.org/abs/1907.03613}, - abstract = {We present a model-based reinforcement learning framework for robot locomotion that achieves walking based on only 4.5 minutes of data collected on a quadruped robot. To accurately model the robot’s dynamics over a long horizon, we introduce a loss function that tracks the model’s prediction over multiple timesteps. We adapt model predictive control to account for planning latency, which allows the learned model to be used for real time control. Additionally, to ensure safe exploration during model learning, we embed prior knowledge of leg trajectories into the action space. The resulting system achieves fast and robust locomotion. Unlike model-free methods, which optimize for a particular task, our planner can use the same learned dynamics for various tasks, simply by changing the reward function.1 To the best of our knowledge, our approach is more than an order of magnitude more sample efficient than current model-free methods.}, +@article{plancher_grid_2022, + title = {{GRiD}: {GPU}-{Accelerated} {Rigid} {Body} {Dynamics} with {Analytical} {Gradients}}, + shorttitle = {{GRiD}}, + url = {http://arxiv.org/abs/2109.06976}, + abstract = {We introduce GRiD: a GPU-accelerated library for computing rigid body dynamics with analytical gradients. 
GRiD was designed to accelerate the nonlinear trajectory optimization subproblem used in state-of-the-art robotic planning, control, and machine learning, which requires tens to hundreds of naturally parallel computations of rigid body dynamics and their gradients at each iteration. GRiD leverages URDF parsing and code generation to deliver optimized dynamics kernels that not only expose GPU-friendly computational patterns, but also take advantage of both fine-grained parallelism within each computation and coarse-grained parallelism between computations. Through this approach, when performing multiple computations of rigid body dynamics algorithms, GRiD provides as much as a 7.2x speedup over a state-of-the-art, multi-threaded CPU implementation, and maintains as much as a 2.5x speedup when accounting for I/O overhead. We release GRiD as an open-source library for use by the wider robotics community.}, language = {en}, urldate = {2022-04-13}, - journal = {arXiv:1907.03613 [cs]}, - author = {Yang, Yuxiang and Caluwaerts, Ken and Iscen, Atil and Zhang, Tingnan and Tan, Jie and Sindhwani, Vikas}, - month = oct, - year = {2019}, - note = {arXiv: 1907.03613}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence}, - file = {Yang et al_2019_Data Efficient Reinforcement Learning for Legged Robots.pdf:/home/dferigo/Zotero/storage/XQMX3SIP/Yang et al_2019_Data Efficient Reinforcement Learning for Legged Robots.pdf:application/pdf}, + journal = {arXiv:2109.06976 [cs]}, + author = {Plancher, Brian and Neuman, Sabrina M. and Ghosal, Radhika and Kuindersma, Scott and Reddi, Vijay Janapa}, + month = feb, + year = {2022}, + note = {arXiv: 2109.06976}, + keywords = {Computer Science - Robotics}, + file = {Plancher et al. - 2022 - GRiD GPU-Accelerated Rigid Body Dynamics with Ana.pdf:/home/dferigo/Zotero/storage/PS4UCZMJ/Plancher et al. - 2022 - GRiD GPU-Accelerated Rigid Body Dynamics with Ana.pdf:application/pdf}, } -@article{da_learning_2020, - title = {Learning a {Contact}-{Adaptive} {Controller} for {Robust}, {Efficient} {Legged} {Locomotion}}, - url = {http://arxiv.org/abs/2009.10019}, - abstract = {We present a hierarchical framework that combines model-based control and reinforcement learning (RL) to synthesize robust controllers for a quadruped (the Unitree Laikago). The system consists of a high-level controller that learns to choose from a set of primitives in response to changes in the environment and a low-level controller that utilizes an established control method to robustly execute the primitives. Our framework learns a controller that can adapt to challenging environmental changes on the fly, including novel scenarios not seen during training. The learned controller is up to 85 percent more energy efficient and is more robust compared to baseline methods. We also deploy the controller on a physical robot without any randomization or adaptation scheme.}, +@article{qiao_efficient_2021, + title = {Efficient {Differentiable} {Simulation} of {Articulated} {Bodies}}, + url = {http://arxiv.org/abs/2109.07719}, + abstract = {We present a method for efficient differentiable simulation of articulated bodies. This enables integration of articulated body dynamics into deep learning frameworks, and gradient-based optimization of neural networks that operate on articulated bodies. We derive the gradients of the forward dynamics using spatial algebra and the adjoint method. 
Our approach is an order of magnitude faster than autodiff tools. By only saving the initial states throughout the simulation process, our method reduces memory requirements by two orders of magnitude. We demonstrate the utility of efficient differentiable dynamics for articulated bodies in a variety of applications. We show that reinforcement learning with articulated systems can be accelerated using gradients provided by our method. In applications to control and inverse problems, gradient-based optimization enabled by our work accelerates convergence by more than an order of magnitude.}, language = {en}, - urldate = {2020-10-01}, - journal = {arXiv:2009.10019 [cs]}, - author = {Da, Xingye and Xie, Zhaoming and Hoeller, David and Boots, Byron and Anandkumar, Animashree and Zhu, Yuke and Babich, Buck and Garg, Animesh}, + urldate = {2022-04-13}, + journal = {arXiv:2109.07719 [cs]}, + author = {Qiao, Yi-Ling and Liang, Junbang and Koltun, Vladlen and Lin, Ming C.}, month = sep, - year = {2020}, - note = {arXiv: 2009.10019}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, - file = {Da et al_2020_Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion.pdf:/home/dferigo/Zotero/storage/UABUZD58/Da et al_2020_Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion.pdf:application/pdf}, + year = {2021}, + note = {arXiv: 2109.07719}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Graphics}, + file = {Qiao et al_2021_Efficient Differentiable Simulation of Articulated Bodies.pdf:/home/dferigo/Zotero/storage/98MGA44B/Qiao et al_2021_Efficient Differentiable Simulation of Articulated Bodies.pdf:application/pdf}, } -@article{ibarz_how_2021, - title = {How to {Train} {Your} {Robot} with {Deep} {Reinforcement} {Learning}; {Lessons} {We}'ve {Learned}}, - issn = {0278-3649, 1741-3176}, - url = {http://arxiv.org/abs/2102.02915}, - doi = {10.1177/0278364920987859}, - abstract = {Deep reinforcement learning (RL) has emerged as a promising approach for autonomously acquiring complex behaviors from low level sensor observations. Although a large portion of deep RL research has focused on applications in video games and simulated control, which does not connect with the constraints of learning in real environments, deep RL has also demonstrated promise in enabling physical robots to learn complex skills in the real world. At the same time,real world robotics provides an appealing domain for evaluating such algorithms, as it connects directly to how humans learn; as an embodied agent in the real world. Learning to perceive and move in the real world presents numerous challenges, some of which are easier to address than others, and some of which are often not considered in RL research that focuses only on simulated domains. In this review article, we present a number of case studies involving robotic deep RL. Building off of these case studies, we discuss commonly perceived challenges in deep RL and how they have been addressed in these works. We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting and are not often the focus of mainstream RL research. 
Our goal is to provide a resource both for roboticists and machine learning researchers who are interested in furthering the progress of deep RL in the real world.}, - urldate = {2021-04-08}, - journal = {The International Journal of Robotics Research}, - author = {Ibarz, Julian and Tan, Jie and Finn, Chelsea and Kalakrishnan, Mrinal and Pastor, Peter and Levine, Sergey}, - month = jan, - year = {2021}, - note = {arXiv: 2102.02915}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, - pages = {027836492098785}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/89CMU29H/2102.html:text/html;Ibarz et al_2021_How to Train Your Robot with Deep Reinforcement Learning\; Lessons We've Learned.pdf:/home/dferigo/Zotero/storage/YXS79SEC/Ibarz et al_2021_How to Train Your Robot with Deep Reinforcement Learning\; Lessons We've Learned.pdf:application/pdf}, -} - -@inproceedings{rohmer_v-rep_2013, - title = {V-{REP}: {A} versatile and scalable robot simulation framework}, - shorttitle = {V-{REP}}, - doi = {10.1109/IROS.2013.6696520}, - abstract = {From exploring planets to cleaning homes, the reach and versatility of robotics is vast. The integration of actuation, sensing and control makes robotics systems powerful, but complicates their simulation. This paper introduces a versatile, scalable, yet powerful general-purpose robot simulation framework called V-REP. The paper discusses the utility of a portable and flexible simulation framework that allows for direct incorporation of various control techniques. This renders simulations and simulation models more accessible to a general-public, by reducing the simulation model deployment complexity. It also increases productivity by offering built-in and ready-to-use functionalities, as well as a multitude of programming approaches. This allows for a multitude of applications including rapid algorithm development, system verification, rapid prototyping, and deployment for cases such as safety/remote monitoring, training and education, hardware control, and factory automation simulation.}, - booktitle = {2013 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems}}, - author = {Rohmer, Eric and Singh, Surya P. N. and Freese, Marc}, - month = nov, - year = {2013}, - note = {ISSN: 2153-0866}, - keywords = {Robots, Hardware, Joints, Computational modeling, Sensors, Shape, Load modeling}, - pages = {1321--1326}, - file = {Rohmer et al_2013_V-REP.pdf:/home/dferigo/Zotero/storage/P46T7D6L/Rohmer et al_2013_V-REP.pdf:application/pdf}, -} - -@article{haarnoja_learning_2019, - title = {Learning to {Walk} via {Deep} {Reinforcement} {Learning}}, - url = {http://arxiv.org/abs/1812.11103}, - abstract = {Deep reinforcement learning (deep RL) holds the promise of automating the acquisition of complex controllers that can map sensory inputs directly to low-level actions. In the domain of robotic locomotion, deep RL could enable learning locomotion skills with minimal engineering and without an explicit model of the robot dynamics. Unfortunately, applying deep RL to real-world robotic tasks is exceptionally difficult, primarily due to poor sample complexity and sensitivity to hyperparameters. While hyperparameters can be easily tuned in simulated domains, tuning may be prohibitively expensive on physical systems, such as legged robots, that can be damaged through extensive trial-and-error learning. 
In this paper, we propose a sample-efficient deep RL algorithm based on maximum entropy RL that requires minimal per-task tuning and only a modest number of trials to learn neural network policies. We apply this method to learning walking gaits on a real-world Minitaur robot. Our method can acquire a stable gait from scratch directly in the real world in about two hours, without relying on any model or simulation, and the resulting policy is robust to moderate variations in the environment. We further show that our algorithm achieves state-of-the-art performance on simulated benchmarks with a single set of hyperparameters. Videos of training and the learned policy can be found on the project website.}, - urldate = {2020-05-07}, - journal = {arXiv:1812.11103 [cs, stat]}, - author = {Haarnoja, Tuomas and Ha, Sehoon and Zhou, Aurick and Tan, Jie and Tucker, George and Levine, Sergey}, +@article{werling_fast_2021, + title = {Fast and {Feature}-{Complete} {Differentiable} {Physics} for {Articulated} {Rigid} {Bodies} with {Contact}}, + url = {http://arxiv.org/abs/2103.16021}, + abstract = {We present a fast and feature-complete differentiable physics engine, Nimble (nimblephysics.org), that supports Lagrangian dynamics and hard contact constraints for articulated rigid body simulation. Our differentiable physics engine offers a complete set of features that are typically only available in non-differentiable physics simulators commonly used by robotics applications. We solve contact constraints precisely using linear complementarity problems (LCPs). We present efficient and novel analytical gradients through the LCP formulation of inelastic contact that exploit the sparsity of the LCP solution. We support complex contact geometry, and gradients approximating continuous-time elastic collision. We also introduce a novel method to compute complementarity-aware gradients that help downstream optimization tasks avoid stalling in saddle points. We show that an implementation of this combination in an existing physics engine (DART) is capable of a 87x single-core speedup over finite-differencing in computing analytical Jacobians for a single timestep, while preserving all the expressiveness of original DART.}, + urldate = {2022-04-13}, + journal = {arXiv:2103.16021 [cs, eess]}, + author = {Werling, Keenon and Omens, Dalton and Lee, Jeongseok and Exarchos, Ioannis and Liu, C. 
Karen}, month = jun, - year = {2019}, - note = {arXiv: 1812.11103}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence, Statistics - Machine Learning}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/NVW3BT23/1812.html:text/html;Haarnoja et al_2019_Learning to Walk via Deep Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/X3F83MC6/Haarnoja et al_2019_Learning to Walk via Deep Reinforcement Learning.pdf:application/pdf}, + year = {2021}, + note = {arXiv: 2103.16021}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Systems and Control}, + file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/CHBP72VL/2103.html:text/html;Werling et al_2021_Fast and Feature-Complete Differentiable Physics for Articulated Rigid Bodies.pdf:/home/dferigo/Zotero/storage/EZU732MN/Werling et al_2021_Fast and Feature-Complete Differentiable Physics for Articulated Rigid Bodies.pdf:application/pdf}, } -@article{dulac-arnold_empirical_2020, - title = {An empirical investigation of the challenges of real-world reinforcement learning}, - url = {http://arxiv.org/abs/2003.11881}, - abstract = {Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real world problems. Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite which we propose an as an open-source benchmark.}, +@article{howell_dojo_2022, + title = {Dojo: {A} {Differentiable} {Simulator} for {Robotics}}, + shorttitle = {Dojo}, + url = {http://arxiv.org/abs/2203.00806}, + abstract = {We present a differentiable rigid-body-dynamics simulator for robotics that prioritizes physical accuracy and differentiability: Dojo. The simulator utilizes an expressive maximal-coordinates representation, achieves stable simulation at low sample rates, and conserves energy and momentum by employing a variational integrator. A nonlinear complementarity problem, with nonlinear friction cones, models hard contact and is reliably solved using a custom primal-dual interiorpoint method. The implicit-function theorem enables efficient differentiation of an intermediate relaxed problem and computes smooth gradients from the contact model. We demonstrate the usefulness of the simulator and its gradients through a number of examples including: simulation, trajectory optimization, reinforcement learning, and system identification.}, language = {en}, - urldate = {2020-07-25}, - journal = {arXiv:2003.11881 [cs]}, - author = {Dulac-Arnold, Gabriel and Levine, Nir and Mankowitz, Daniel J. 
and Li, Jerry and Paduraru, Cosmin and Gowal, Sven and Hester, Todd}, + urldate = {2022-04-13}, + journal = {arXiv:2203.00806 [cs]}, + author = {Howell, Taylor A. and Cleac'h, Simon Le and Kolter, J. Zico and Schwager, Mac and Manchester, Zachary}, month = mar, - year = {2020}, - note = {arXiv: 2003.11881}, - keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence}, - file = {Dulac-Arnold et al_2020_An empirical investigation of the challenges of real-world reinforcement.pdf:/home/dferigo/Zotero/storage/X77JWG9W/Dulac-Arnold et al_2020_An empirical investigation of the challenges of real-world reinforcement.pdf:application/pdf}, + year = {2022}, + note = {arXiv: 2203.00806}, + keywords = {Computer Science - Robotics}, + file = {Howell et al_2022_Dojo.pdf:/home/dferigo/Zotero/storage/SRC36QEU/Howell et al_2022_Dojo.pdf:application/pdf}, } -@article{dulac-arnold_empirical_2021, - title = {An empirical investigation of the challenges of real-world reinforcement learning}, - url = {http://arxiv.org/abs/2003.11881}, - abstract = {Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real world problems. Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite which we propose an as an open-source benchmark.}, +@article{gillen_leveraging_2022, + title = {Leveraging {Reward} {Gradients} {For} {Reinforcement} {Learning} in {Differentiable} {Physics} {Simulations}}, + url = {http://arxiv.org/abs/2203.02857}, + abstract = {In recent years, fully differentiable rigid body physics simulators have been developed, which can be used to simulate a wide range of robotic systems. In the context of reinforcement learning for control, these simulators theoretically allow algorithms to be applied directly to analytic gradients of the reward function. However, to date, these gradients have proved extremely challenging to use, and are outclassed by algorithms using no gradient information at all. In this work we present a novel algorithm, cross entropy analytic policy gradients, that is able to leverage these gradients to outperform state of art deep reinforcement learning on a certain set of challenging nonlinear control problems.}, language = {en}, urldate = {2022-04-13}, - journal = {arXiv:2003.11881 [cs]}, - author = {Dulac-Arnold, Gabriel and Levine, Nir and Mankowitz, Daniel J. 
and Li, Jerry and Paduraru, Cosmin and Gowal, Sven and Hester, Todd}, + journal = {arXiv:2203.02857 [cs, eess]}, + author = {Gillen, Sean and Byl, Katie}, month = mar, - year = {2021}, - note = {arXiv: 2003.11881}, - keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence}, - file = {Dulac-Arnold et al_2021_An empirical investigation of the challenges of real-world reinforcement.pdf:/home/dferigo/Zotero/storage/FN6Z7NFN/Dulac-Arnold et al_2021_An empirical investigation of the challenges of real-world reinforcement.pdf:application/pdf}, + year = {2022}, + note = {arXiv: 2203.02857}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control}, + file = {Gillen_Byl_2022_Leveraging Reward Gradients For Reinforcement Learning in Differentiable.pdf:/home/dferigo/Zotero/storage/2SG33NTS/Gillen_Byl_2022_Leveraging Reward Gradients For Reinforcement Learning in Differentiable.pdf:application/pdf}, } -@article{sola_micro_2020, - title = {A micro {Lie} theory for state estimation in robotics}, - url = {http://arxiv.org/abs/1812.01537}, - abstract = {A Lie group is an old mathematical abstract object dating back to the XIX century, when mathematician Sophus Lie laid the foundations of the theory of continuous transformation groups. Its influence has spread over diverse areas of science and technology many years later. In robotics, we are recently experiencing an important trend in its usage, at least in the fields of estimation, and particularly in motion estimation for navigation. Yet for a vast majority of roboticians, Lie groups are highly abstract constructions and therefore difficult to understand and to use.}, +@article{saccon_centroidal_2017, + title = {On {Centroidal} {Dynamics} and {Integrability} of {Average} {Angular} {Velocity}}, + volume = {2}, + issn = {2377-3766, 2377-3774}, + url = {http://arxiv.org/abs/1701.02514}, + doi = {10.1109/LRA.2017.2655560}, + abstract = {In the literature on robotics and multibody dynamics, the concept of average angular velocity has received considerable attention in recent years. We address the question of whether the average angular velocity defines an orientation frame that depends only on the current robot configuration and provide a simple algebraic condition to check whether this holds. In the language of geometric mechanics, this condition corresponds to requiring the flatness of the mechanical connection associated to the robotic system. Here, however, we provide both a reinterpretation and a proof of this result accessible to readers with a background in rigid body kinematics and multibody dynamics but not necessarily acquainted with differential geometry, still providing precise links to the geometric mechanics literature. 
This should help spreading the algebraic condition beyond the scope of geometric mechanics, contributing to a proper utilization and understanding of the concept of average angular velocity.}, language = {en}, - urldate = {2021-11-26}, - journal = {arXiv:1812.01537 [cs]}, - author = {Solà, Joan and Deray, Jeremie and Atchuthan, Dinesh}, - month = nov, - year = {2020}, - note = {arXiv: 1812.01537}, + number = {2}, + urldate = {2022-04-13}, + journal = {IEEE Robotics and Automation Letters}, + author = {Saccon, Alessandro and Traversaro, Silvio and Nori, Francesco and Nijmeijer, Henk}, + month = apr, + year = {2017}, + note = {arXiv: 1701.02514}, keywords = {Computer Science - Robotics}, - file = {Solà et al_2020_A micro Lie theory for state estimation in robotics.pdf:/home/dferigo/Zotero/storage/IBW6W28A/Solà et al_2020_A micro Lie theory for state estimation in robotics.pdf:application/pdf}, + pages = {943--950}, + file = {Saccon et al_2017_On Centroidal Dynamics and Integrability of Average Angular Velocity.pdf:/home/dferigo/Zotero/storage/6P4LEJ7T/Saccon et al_2017_On Centroidal Dynamics and Integrability of Average Angular Velocity.pdf:application/pdf}, } -@inproceedings{stulp_reinforcement_2010, - title = {Reinforcement learning of full-body humanoid motor skills}, - isbn = {978-1-4244-8688-5}, - url = {http://ieeexplore.ieee.org/document/5686320/}, - doi = {10.1109/ICHR.2010.5686320}, - abstract = {Applying reinforcement learning to humanoid robots is challenging because humanoids have a large number of degrees of freedom and state and action spaces are continuous. Thus, most reinforcement learning algorithms would become computationally infeasible and require a prohibitive amount of trials to explore such high-dimensional spaces. In this paper, we present a probabilistic reinforcement learning approach, which is derived from the framework of stochastic optimal control and path integrals. The algorithm, called Policy Improvement with Path Integrals (PI2), has a surprisingly simple form, has no open tuning parameters besides the exploration noise, is modelfree, and performs numerically robustly in high dimensional learning problems. We demonstrate how PI2 is able to learn fullbody motor skills on a 34-DOF humanoid robot. To demonstrate the generality of our approach, we also apply PI2 in the context of variable impedance control, where both planned trajectories and gain schedules for each joint are optimized simultaneously.}, - language = {en}, - urldate = {2018-06-17}, - publisher = {IEEE}, - author = {Stulp, Freek and Buchli, Jonas and Theodorou, Evangelos and Schaal, Stefan}, - month = dec, - year = {2010}, - pages = {405--410}, - file = {Stulp et al_2010_Reinforcement learning of full-body humanoid motor skills.pdf:/home/dferigo/Zotero/storage/4H6NFGV6/Stulp et al_2010_Reinforcement learning of full-body humanoid motor skills.pdf:application/pdf}, +@article{schulman_proximal_2017, + title = {Proximal {Policy} {Optimization} {Algorithms}}, + abstract = {We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. 
The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.}, + urldate = {2020-05-07}, + journal = {arXiv}, + author = {Schulman, John and Wolski, Filip and Dhariwal, Prafulla and Radford, Alec and Klimov, Oleg}, + year = {2017}, + keywords = {Computer Science - Machine Learning}, } -@article{busoniu_reinforcement_2018, - title = {Reinforcement learning for control: {Performance}, stability, and deep approximators}, - volume = {46}, - issn = {13675788}, - shorttitle = {Reinforcement learning for control}, - url = {https://linkinghub.elsevier.com/retrieve/pii/S1367578818301184}, - doi = {10.1016/j.arcontrol.2018.09.005}, - abstract = {Reinforcement learning (RL) offers powerful algorithms to search for optimal controllers of systems with nonlinear, possibly stochastic dynamics that are unknown or highly uncertain. This review mainly covers artificial-intelligence approaches to RL, from the viewpoint of the control engineer. We explain how approximate representations of the solution make RL feasible for problems with continuous states and control actions. Stability is a central concern in control, and we argue that while the control-theoretic RL subfield called adaptive dynamic programming is dedicated to it, stability of RL largely remains an open question. We also cover in detail the case where deep neural networks are used for approximation, leading to the field of deep RL, which has shown great success in recent years. With the control practitioner in mind, we outline opportunities and pitfalls of deep RL; and we close the survey with an outlook that – among other things – points out some avenues for bridging the gap between control and artificial-intelligence RL techniques.}, +@article{schulman_trust_2017, + title = {Trust {Region} {Policy} {Optimization}}, + url = {http://arxiv.org/abs/1502.05477}, + abstract = {We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. 
Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.}, language = {en}, - urldate = {2018-12-17}, - journal = {Annual Reviews in Control}, - author = {Buşoniu, Lucian and de Bruin, Tim and Tolić, Domagoj and Kober, Jens and Palunko, Ivana}, - year = {2018}, - pages = {8--28}, - file = {Buşoniu et al_2018_Reinforcement learning for control.pdf:/home/dferigo/Zotero/storage/KWV5Z2VL/Buşoniu et al_2018_Reinforcement learning for control.pdf:application/pdf}, + urldate = {2020-07-22}, + journal = {arXiv:1502.05477 [cs]}, + author = {Schulman, John and Levine, Sergey and Moritz, Philipp and Jordan, Michael I. and Abbeel, Pieter}, + month = apr, + year = {2017}, + note = {arXiv: 1502.05477}, + keywords = {Computer Science - Machine Learning}, } -@article{fabisch_survey_2019, - title = {A {Survey} of {Behavior} {Learning} {Applications} in {Robotics} -- {State} of the {Art} and {Perspectives}}, - url = {http://arxiv.org/abs/1906.01868}, - abstract = {Recent success of machine learning in many domains has been overwhelming, which often leads to false expectations regarding the capabilities of behavior learning in robotics. In this survey, we analyze the current state of machine learning for robotic behaviors. We will give a broad overview of behaviors that have been learned and used on real robots. Our focus is on kinematically or sensorially complex robots. That includes humanoid robots or parts of humanoid robots, for example, legged robots or robotic arms. We will classify presented behaviors according to various categories and we will draw conclusions about what can be learned and what should be learned. Furthermore, we will give an outlook on problems that are challenging today but might be solved by machine learning in the future and argue that classical robotics and other approaches from artificial intelligence should be integrated more with machine learning to form complete, autonomous systems.}, - language = {en}, - urldate = {2019-06-27}, - journal = {arXiv:1906.01868 [cs]}, - author = {Fabisch, Alexander and Petzoldt, Christoph and Otto, Marc and Kirchner, Frank}, - month = jun, - year = {2019}, - note = {arXiv: 1906.01868}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, - file = {Fabisch et al_2019_A Survey of Behavior Learning Applications in Robotics -- State of the Art and.pdf:/home/dferigo/Zotero/storage/R9NX3LR5/Fabisch et al_2019_A Survey of Behavior Learning Applications in Robotics -- State of the Art and.pdf:application/pdf}, +@article{peng_learning_2017, + title = {Learning locomotion skills using {DeepRL}: does the choice of action space matter?}, + shorttitle = {Learning locomotion skills using {DeepRL}}, + abstract = {The use of deep reinforcement learning allows for high-dimensional state descriptors, but little is known about how the choice of action representation impacts learning and the resulting performance. We compare the impact of four different action parameterizations (torques, muscle-activations, target joint angles, and target joint-angle velocities) in terms of learning time, policy robustness, motion quality, and policy query rates. Our results are evaluated on a gait-cycle imitation task for multiple planar articulated figures and multiple gaits. 
We demonstrate that the local feedback provided by higher-level action parameterizations can significantly impact the learning, robustness, and motion quality of the resulting policies.}, + journal = {SIGGRAPH}, + author = {Peng, Xue Bin and van de Panne, Michiel}, + year = {2017}, + keywords = {motion control, locomotion skills, physics-based character animation}, } -@article{chatzilygeroudis_survey_2020, - title = {A {Survey} on {Policy} {Search} {Algorithms} for {Learning} {Robot} {Controllers} in a {Handful} of {Trials}}, - volume = {36}, - issn = {1941-0468}, - doi = {10.1109/TRO.2019.2958211}, - abstract = {Most policy search (PS) algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word “big-data,” we refer to this challenge as “micro-data reinforcement learning.” In this article, we show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based PS), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots, designing generic priors, and optimizing the computing time.}, - number = {2}, - journal = {IEEE Transactions on Robotics}, - author = {Chatzilygeroudis, Konstantinos and Vassiliades, Vassilis and Stulp, Freek and Calinon, Sylvain and Mouret, Jean-Baptiste}, +@article{peng_learning_2020, + title = {Learning {Agile} {Robotic} {Locomotion} {Skills} by {Imitating} {Animals}}, + url = {http://arxiv.org/abs/2004.00784}, + abstract = {Reproducing the diverse and agile locomotion skills of animals has been a longstanding challenge in robotics. While manually-designed controllers have been able to emulate many complex behaviors, building such controllers involves a time-consuming and difficult development process, often requiring substantial expertise of the nuances of each skill. Reinforcement learning provides an appealing alternative for automating the manual effort involved in the development of controllers. However, designing learning objectives that elicit the desired behaviors from an agent can also require a great deal of skill-specific expertise. In this work, we present an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals. We show that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire behaviors for legged robots. By incorporating sample efficient domain adaptation techniques into the training process, our system is able to learn adaptive policies in simulation that can then be quickly adapted for real-world deployment. 
To demonstrate the effectiveness of our system, we train an 18-DoF quadruped robot to perform a variety of agile behaviors ranging from different locomotion gaits to dynamic hops and turns.}, + urldate = {2020-05-07}, + journal = {arXiv:2004.00784 [cs]}, + author = {Peng, Xue Bin and Coumans, Erwin and Zhang, Tingnan and Lee, Tsang-Wei and Tan, Jie and Levine, Sergey}, month = apr, year = {2020}, - keywords = {Autonomous agents, learning and adaptive systems, robot learning, micro-data policy search (MDPS)}, - pages = {328--347}, - file = {Chatzilygeroudis et al_2020_A Survey on Policy Search Algorithms for Learning Robot Controllers in a.pdf:/home/dferigo/Zotero/storage/VLFIF5C6/Chatzilygeroudis et al_2020_A Survey on Policy Search Algorithms for Learning Robot Controllers in a.pdf:application/pdf;IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/I636FFXG/8944013.html:text/html}, + note = {arXiv: 2004.00784}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, } -@article{koberReinforcementLearningRobotics2013, - title = {Reinforcement {Learning} in {Robotics}: {A} {Survey}}, - language = {en}, - journal = {International Journal of Robotics Research}, - author = {Kober, Jens and Bagnell, J Andrew and Peters, Jan}, - year = {2013}, - pages = {38}, - file = {Kober2013-Reinforcement_Learning_in_Robotics.pdf:/home/dferigo/Zotero/storage/W3LHRYT7/Kober2013-Reinforcement_Learning_in_Robotics.pdf:application/pdf;Kober2013-Reinforcement_Learning_in_Robotics.pdf:/home/dferigo/Zotero/storage/858JLQET/Kober2013-Reinforcement_Learning_in_Robotics.pdf:application/pdf}, +@article{peng_deepmimic_2018, + title = {{DeepMimic}: {Example}-{Guided} {Deep} {Reinforcement} {Learning} of {Physics}-{Based} {Character} {Skills}}, + abstract = {A longstanding goal in character animation is to combine data-driven specification of behavior with a system that can execute a similar behavior in a physical simulation, thus enabling realistic responses to perturbations and environmental variation. We show that well-known reinforcement learning (RL) methods can be adapted to learn robust control policies capable of imitating a broad range of example motion clips, while also learning complex recoveries, adapting to changes in morphology, and accomplishing user-specified goals. Our method handles keyframed motions, highly-dynamic actions such as motion-captured flips and spins, and retargeted motions. By combining a motion-imitation objective with a task objective, we can train characters that react intelligently in interactive settings, e.g., by walking in a desired direction or throwing a ball at a user-specified target. This approach thus combines the convenience and motion quality of using motion clips to define the desired style and appearance, with the flexibility and generality afforded by RL methods and physics-based animation. We further explore a number of methods for integrating multiple clips into the learning process to develop multi-skilled agents capable of performing a rich repertoire of diverse skills. 
We demonstrate results using multiple characters (human, Atlas robot, bipedal dinosaur, dragon) and a large variety of skills, including locomotion, acrobatics, and martial arts.}, + urldate = {2019-09-24}, + journal = {ACM Transactions on Graphics}, + author = {Peng, Xue Bin and Abbeel, Pieter and Levine, Sergey and van de Panne, Michiel}, + month = jul, + year = {2018}, + keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Graphics}, } -@article{narendra_identification_1990, - title = {Identification and control of dynamical systems using neural networks}, - volume = {1}, - issn = {10459227}, - url = {http://ieeexplore.ieee.org/document/80202/}, - doi = {10.1109/72.80202}, - abstract = {The paper demonstrates that neural networks can be used effectively for the identification and control of nonlinear dynamical systems. The emphasis of the paper is on models for both identification and control. Static and dynamic back-propagation methods for the adjustment of parameters are discussed. In the models that are introduced, multilayer and recurrent networks are interconnected in novel configurations and hence there is a real need to study them in a unified fashion. Simulation results reveal that the identification and adaptive control schemes suggested are practically feasible. Basic concepts and definitions are introduced throughout the paper, and theoretical questions which have to be addressed are also described.}, +@article{metta_icub_2010, + title = {The {iCub} humanoid robot: {An} open-systems platform for research in cognitive development}, + shorttitle = {The {iCub} humanoid robot}, + abstract = {We describe a humanoid robot platform — the iCub — which was designed to support collaborative research in cognitive development through autonomous exploration and social interaction. The motivation for this effort is the conviction that significantly greater impact can be leveraged by adopting an open systems policy for software and hardware development. This creates the need for a robust humanoid robot that offers rich perceptuo-motor capabilities with many degrees of freedom, a cognitive capacity for learning and development, a software architecture that encourages reuse \& easy integration, and a support infrastructure that fosters collaboration and sharing of resources. The iCub satisfies all of these needs in the guise of an open-system platform which is freely available and which has attracted a growing community of users and developers. To date, twenty iCubs each comprising approximately 5000 mechanical and electrical parts have been delivered to several research labs in Europe and to one in the USA.}, language = {en}, - number = {1}, - urldate = {2022-04-15}, - journal = {IEEE Transactions on Neural Networks}, - author = {Narendra, K.S. 
and Parthasarathy, K.}, - month = mar, - year = {1990}, - pages = {4--27}, - file = {Narendra_Parthasarathy_1990_Identification and control of dynamical systems using neural networks.pdf:/home/dferigo/Zotero/storage/T9ITSBDR/Narendra_Parthasarathy_1990_Identification and control of dynamical systems using neural networks.pdf:application/pdf}, + urldate = {2020-07-22}, + journal = {Neural Networks}, + author = {Metta, Giorgio and Natale, Lorenzo and Nori, Francesco and Sandini, Giulio and Vernon, David and Fadiga, Luciano and von Hofsten, Claes and Rosander, Kerstin and Lopes, Manuel and Santos-Victor, José and Bernardino, Alexandre and Montesano, Luis}, + year = {2010}, } -@article{lin_reinforcement_1993, - title = {Reinforcement {Learning} for {Robots} {Using} {Neural} {Networks}}, +@article{lillicrap_continuous_2016-1, + title = {Continuous {Control} with {Deep} {Reinforcement} {Learning}}, + abstract = {We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies “end-to-end”: directly from raw pixel inputs.}, language = {en}, - author = {Lin, Long-Ji}, - year = {1993}, - pages = {168}, - file = {Lin_1993_Reinforcement Learning for Robots Using Neural Networks.pdf:/home/dferigo/Zotero/storage/Y2VMYRV8/Lin_1993_Reinforcement Learning for Robots Using Neural Networks.pdf:application/pdf}, + journal = {ICLR}, + author = {Lillicrap, Timothy P and Hunt, Jonathan J and Pritzel, Alexander and Heess, Nicolas and Erez, Tom and Tassa, Yuval and Silver, David and Wierstra, Daan}, + year = {2016}, + keywords = {deepmind}, } -@phdthesis{watkins_christopher_learning_1989, - title = {Learning from {Delayed} {Rewards}}, - school = {King's College}, - author = {Watkins, Christopher}, - year = {1989}, - file = {Watkins, Christopher John Cornish Hellaby_Learning from Delayed Rewards.pdf:/home/dferigo/Zotero/storage/BV9UA3K9/Watkins, Christopher John Cornish Hellaby_Learning from Delayed Rewards.pdf:application/pdf}, +@article{liang_rllib_2018, + title = {{RLlib}: {Abstractions} for {Distributed} {Reinforcement} {Learning}}, + abstract = {Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate the benefits of this principle through RLlib: a library that provides scalable software primitives for RL. These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. 
RLlib is available as part of the open source Ray project.},
+	journal = {arXiv},
+	author = {Liang, Eric and Liaw, Richard and Moritz, Philipp and Nishihara, Robert and Fox, Roy and Goldberg, Ken and Gonzalez, Joseph E. and Jordan, Michael I. and Stoica, Ion},
+	year = {2018},
+	keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Distributed, Parallel, and Cluster Computing},
 }
 
-@article{tesauro_td-gammon_1994,
-	title = {{TD}-{Gammon}, a {Self}-{Teaching} {Backgammon} {Program}, {Achieves} {Master}-{Level} {Play}},
-	volume = {6},
-	issn = {0899-7667, 1530-888X},
-	url = {https://direct.mit.edu/neco/article/6/2/215-219/5771},
-	doi = {10.1162/neco.1994.6.2.215},
-	abstract = {TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results, based on the TD(λ) reinforcement learning algorithm (Sutton, 1988). Despite starting from random initial weights (and hence random initial strategy), TD-Gammon achieves a surprisingly strong level of play. With zero knowledge built in at the start of learning (i.e. given only a "raw" description of the board state), the network learns to play at a strong intermediate level. Furthermore, when a set of hand-crafted features is added to the network's input representation, the result is a truly staggering level of performance: the latest version of TD-Gammon is now estimated to play at a strong master level that is extremely close to the world's best human players.},
+@inproceedings{yang_learning_2018,
+	title = {Learning {Whole}-{Body} {Motor} {Skills} for {Humanoids}},
+	doi = {10.1109/HUMANOIDS.2018.8625045},
+	abstract = {This paper presents a hierarchical framework for Deep Reinforcement Learning that acquires motor skills for a variety of push recovery and balancing behaviors, i.e., ankle, hip, foot tilting, and stepping strategies. The policy is trained in a physics simulator with realistic setting of robot model and low-level impedance control that are easy to transfer the learned skills to real robots. The advantage over traditional methods is the integration of high-level planner and feedback control all in one single coherent policy network, which is generic for learning versatile balancing and recovery motions against unknown perturbations at arbitrary locations (e.g., legs, torso). Furthermore, the proposed framework allows the policy to be learned quickly by many state-of-the-art learning algorithms. By comparing our learned results to studies of preprogrammed, special-purpose controllers in the literature, self-learned skills are comparable in terms of disturbance rejection but with additional advantages of producing a wide range of adaptive, versatile and robust behaviors.},
+	booktitle = {2018 {IEEE}-{RAS} 18th {International} {Conference} on {Humanoid} {Robots} ({Humanoids})},
+	author = {Yang, C. and Yuan, K. and Merkt, W. and Komura, T. and Vijayakumar, S. 
and Li, Z.}, + month = nov, + year = {2018}, + keywords = {Reinforcement learning, Aerospace electronics, Humanoid robots, humanoids, humanoid robots, learning (artificial intelligence), legged locomotion, feedback, adaptive behaviors, ankle, arbitrary locations, balancing behaviors, Collision avoidance, Deep Reinforcement, feedback control, foot tilting, hierarchical framework, high-level planner, hip, learning algorithms, low-level impedance control, path planning, PD control, preprogrammed purpose controllers, push recovery, realistic setting, recovery motions, robot model, robust behaviors, self-learned skills, single coherent policy network, special-purpose controllers, stepping strategies, unknown perturbations, versatile balancing, versatile behaviors, whole-body motor skills}, + pages = {270--276}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/IDSJNM9R/8625045.html:text/html;Yang et al_2018_Learning Whole-Body Motor Skills for Humanoids.pdf:/home/dferigo/Zotero/storage/TNUZ5X3Q/Yang et al_2018_Learning Whole-Body Motor Skills for Humanoids.pdf:application/pdf}, +} + +@article{xieIterativeReinforcementLearning2019s, + title = {Iterative {Reinforcement} {Learning} {Based} {Design} of {Dynamic} {Locomotion} {Skills} for {Cassie}}, + abstract = {Deep reinforcement learning (DRL) is a promising approach for developing legged locomotion skills. However, the iterative design process that is inevitable in practice is poorly supported by the default methodology. It is difficult to predict the outcomes of changes made to the reward functions, policy architectures, and the set of tasks being trained on. In this paper, we propose a practical method that allows the reward function to be fully redefined on each successive design iteration while limiting the deviation from the previous iteration. We characterize policies via sets of Deterministic Action Stochastic State (DASS) tuples, which represent the deterministic policy state-action pairs as sampled from the states visited by the trained stochastic policy. New policies are trained using a policy gradient algorithm which then mixes RL-based policy gradients with gradient updates defined by the DASS tuples. The tuples also allow for robust policy distillation to new network architectures. We demonstrate the effectiveness of this iterative-design approach on the bipedal robot Cassie, achieving stable walking with different gait styles at various speeds. 
We demonstrate the successful transfer of policies learned in simulation to the physical robot without any dynamics randomization, and that variable-speed walking policies for the physical robot can be represented by a small dataset of 5-10k tuples.},
 	language = {en},
-	number = {2},
-	urldate = {2022-04-15},
-	journal = {Neural Computation},
-	author = {Tesauro, Gerald},
+	urldate = {2019-04-05},
+	author = {Xie, Zhaoming and Clary, Patrick and Dao, Jeremy and Morais, Pedro and Hurst, Jonathan and van de Panne, Michiel},
 	month = mar,
-	year = {1994},
-	pages = {215--219},
-	file = {Tesauro_1994_TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play.pdf:/home/dferigo/Zotero/storage/9C9X9HWP/Tesauro_1994_TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play.pdf:application/pdf},
+	year = {2019},
+	keywords = {Computer Science - Robotics},
+	file = {Xie et al_2019_Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for.pdf:/home/dferigo/Zotero/storage/8MUYACWD/Xie et al_2019_Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for.pdf:application/pdf},
 }
 
-@article{sutton_learning_1988,
-	title = {Learning to predict by the methods of temporal differences},
-	volume = {3},
-	issn = {0885-6125, 1573-0565},
-	url = {http://link.springer.com/10.1007/BF00115009},
-	doi = {10.1007/BF00115009},
-	abstract = {This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.},
+@article{liangGPUAcceleratedRoboticSimulation2018s,
+	title = {{GPU}-{Accelerated} {Robotic} {Simulation} for {Distributed} {Reinforcement} {Learning}},
+	abstract = {Most Deep Reinforcement Learning (Deep RL) algorithms require a prohibitively large number of training samples for learning complex tasks. Many recent works on speeding up Deep RL have focused on distributed training and simulation. While distributed training is often done on the GPU, simulation is not. In this work, we propose using GPU-accelerated RL simulations as an alternative to CPU ones. Using NVIDIA Flex, a GPU-based physics engine, we show promising speed-ups of learning various continuous-control, locomotion tasks. With one GPU and CPU core, we are able to train the Humanoid running task in less than 20 minutes, using 10 − 1000× fewer CPU cores than previous works. 
We also demonstrate the scalability of our simulator to multi-GPU settings to train more challenging locomotion tasks.}, language = {en}, - number = {1}, - urldate = {2022-04-15}, - journal = {Machine Learning}, - author = {Sutton, Richard S.}, - month = aug, - year = {1988}, - pages = {9--44}, - file = {Sutton_1988_Learning to predict by the methods of temporal differences.pdf:/home/dferigo/Zotero/storage/XBWG7D4V/Sutton_1988_Learning to predict by the methods of temporal differences.pdf:application/pdf}, + urldate = {2019-06-19}, + author = {Liang, Jacky and Makoviychuk, Viktor and Handa, Ankur and Chentanez, Nuttapong and Macklin, Miles and Fox, Dieter}, + month = oct, + year = {2018}, + note = {arXiv: 1810.05762}, + keywords = {Computer Science - Robotics}, + file = {Liang et al_2018_GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/WHNFWU9F/Liang et al_2018_GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning.pdf:application/pdf}, } -@article{williams_simple_1992, - title = {Simple statistical gradient-following algorithms for connectionist reinforcement learning}, - volume = {8}, - issn = {1573-0565}, - url = {https://doi.org/10.1007/BF00992696}, - doi = {10.1007/BF00992696}, - abstract = {This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms, called REINFORCE algorithms, are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates or even storing information from which such estimates could be computed. Specific examples of such algorithms are presented, some of which bear a close relationship to certain existing algorithms while others are novel but potentially interesting in their own right. Also given are results that show how such algorithms can be naturally integrated with backpropagation. We close with a brief discussion of a number of additional issues surrounding the use of such algorithms, including what is known about their limiting behaviors as well as further considerations that might be used to help develop similar but potentially more powerful reinforcement learning algorithms.}, - language = {en}, - number = {3}, - urldate = {2022-04-15}, - journal = {Machine Learning}, - author = {Williams, Ronald J.}, - month = may, - year = {1992}, - pages = {229--256}, - file = {Williams_1992_Simple statistical gradient-following algorithms for connectionist.pdf:/home/dferigo/Zotero/storage/TICDQE49/Williams_1992_Simple statistical gradient-following algorithms for connectionist.pdf:application/pdf}, +@inproceedings{ivaldiToolsSimulatingHumanoid2014s, + title = {Tools for simulating humanoid robot dynamics: {A} survey based on user feedback}, + shorttitle = {Tools for simulating humanoid robot dynamics}, + doi = {10.1109/HUMANOIDS.2014.7041462}, + abstract = {The number of tools for dynamics simulation has grown substantially in the last few years. Humanoid robots, in particular, make extensive use of such tools for a variety of applications, from simulating contacts to planning complex motions. It is necessary for the humanoid robotics community to have a systematic evaluation to assist in choosing which of the available tools is best for their research. 
This paper surveys the state of the art in dynamics simulation and reports on the analysis of an online survey about the use of dynamics simulation in the robotics research community. The major requirements for robotics researchers are better physics engines and open-source software. Despite the numerous tools, there is not a general-purpose simulator which dominates the others in terms of performance or application. However, for humanoid robotics, Gazebo emerges as the best choice among the open-source projects, while V-Rep is the preferred commercial simulator. The survey report has been instrumental for choosing Gazebo as the base for the new simulator for the iCub humanoid robot.}, + booktitle = {{IEEE}-{RAS} {International} {Conference} on {Humanoid} {Robots}}, + author = {Ivaldi, S. and Peters, J. and Padois, V. and Nori, F.}, + month = nov, + year = {2014}, + keywords = {Robots, humanoid robots, iCub humanoid robot, Open source software, open-source software, control engineering computing, Collision avoidance, Engines, physics engine, Physics, Communities, dynamics simulation, Gazebo robotics, general-purpose simulator, humanoid robot dynamics, robotics research community, user feedback, V-Rep simulator}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/3MNDLKGL/7041462.html:text/html;Ivaldi et al_2014_Tools for simulating humanoid robot dynamics.pdf:/home/dferigo/Zotero/storage/UHMX6A28/Ivaldi et al_2014_Tools for simulating humanoid robot dynamics.pdf:application/pdf}, } -@techreport{rummery_-line_1994, - title = {On-{Line} {Q}-{Learning} {Using} {Connectionist} {Systems}}, - abstract = {Reinforcement learning algorithms are a powerful machine learning technique. However, much of the work on these algorithms has been developed with regard to discrete finite-state Markovian problems, which is too restrictive for many real-world environments. Therefore, it is desirable to extend these methods to high dimensional continuous state-spaces, which requires the use of function approximation to generalise the information learnt by the system. In this report, the use of back-propagation neural networks (Rumelhart, Hinton and Williams 1986) is considered in this context. We consider a number of different algorithms based around Q-Learning (Watkins 1989) combined with the Temporal Difference algorithm (Sutton 1988), including a new algorithm (Modified Connectionist Q-Learning), and Q() (Peng and Williams 1994). In addition, we present algorithms for applying these updates on-line during trials, unlike backward replay used by Lin (1993) that requires waiting until the end of each t...}, - author = {Rummery, G. A. and Niranjan, M.}, - year = {1994}, - file = {Citeseer - Snapshot:/home/dferigo/Zotero/storage/IBKMIYHG/summary.html:text/html;Rummery_Niranjan_1994_On-Line Q-Learning Using Connectionist Systems.pdf:/home/dferigo/Zotero/storage/RP9VWYGZ/Rummery_Niranjan_1994_On-Line Q-Learning Using Connectionist Systems.pdf:application/pdf}, +@inproceedings{erezSimulationToolsModelbased2015s, + title = {Simulation tools for model-based robotics: {Comparison} of {Bullet}, {Havok}, {MuJoCo}, {ODE} and {PhysX}}, + shorttitle = {Simulation tools for model-based robotics}, + doi = {10.1109/ICRA.2015.7139807}, + abstract = {There is growing need for software tools that can accurately simulate the complex dynamics of modern robots. While a number of candidates exist, the field is fragmented. 
It is difficult to select the best tool for a given project, or to predict how much effort will be needed and what the ultimate simulation performance will be. Here we introduce new quantitative measures of simulation performance, focusing on the numerical challenges that are typical for robotics as opposed to multi-body dynamics and gaming. We then present extensive simulation results, obtained within a new software framework for instantiating the same model in multiple engines and running side-by-side comparisons. Overall we find that each engine performs best on the type of system it was designed and optimized for: MuJoCo wins the robotics-related tests, while the gaming engines win the gaming-related tests without a clear leader among them. The simulations are illustrated in the accompanying movie.}, + author = {Erez, T. and Tassa, Y. and Todorov, E.}, + year = {2015}, + keywords = {Computational modeling, Joints, Robot kinematics, middleware, Mathematical model, control engineering computing, Engines, MuJoCo, Accuracy, Bullet, digital simulation, gaming engines, Havok, model-based robotics, ODE, PhysX, robots, simulation tools}, + file = {Erez et al_2015_Simulation tools for model-based robotics.pdf:/home/dferigo/Zotero/storage/QIJZ3BFR/Erez et al_2015_Simulation tools for model-based robotics.pdf:application/pdf;IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/KH8DWX7R/7139807.html:text/html}, } -@article{hinton_fast_2006, - title = {A {Fast} {Learning} {Algorithm} for {Deep} {Belief} {Nets}}, - volume = {18}, - issn = {0899-7667, 1530-888X}, - url = {https://direct.mit.edu/neco/article/18/7/1527-1554/7065}, - doi = {10.1162/neco.2006.18.7.1527}, - abstract = {We show how to use “complementary priors” to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.}, +@article{dulac-arnoldChallengesRealWorldReinforcement2019s, + title = {Challenges of {Real}-{World} {Reinforcement} {Learning}}, + abstract = {Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are often hard to leverage in realworld systems due to a series of assumptions that are rarely satisfied in practice. We present a set of nine unique challenges that must be addressed to productionize RL to real world problems. For each of these challenges, we specify the exact meaning of the challenge, present some approaches from the literature, and specify some metrics for evaluating that challenge. 
An approach that addresses all nine challenges would be applicable to a large number of real world problems. We also present an example domain that has been modified to present these challenges as a testbed for practical RL research.}, language = {en}, - number = {7}, - urldate = {2022-04-15}, - journal = {Neural Computation}, - author = {Hinton, Geoffrey E. and Osindero, Simon and Teh, Yee-Whye}, - month = jul, - year = {2006}, - pages = {1527--1554}, - file = {Hinton et al_2006_A Fast Learning Algorithm for Deep Belief Nets.pdf:/home/dferigo/Zotero/storage/IMEASWQY/Hinton et al_2006_A Fast Learning Algorithm for Deep Belief Nets.pdf:application/pdf}, + urldate = {2019-05-22}, + author = {Dulac-Arnold, Gabriel and Mankowitz, Daniel and Hester, Todd}, + month = apr, + year = {2019}, + note = {arXiv: 1904.12901}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence, Statistics - Machine Learning}, + file = {Dulac-Arnold et al_2019_Challenges of Real-World Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/56EVTTDV/Dulac-Arnold et al_2019_Challenges of Real-World Reinforcement Learning.pdf:application/pdf}, } -@inproceedings{peters_reinforcement_2003, - title = {Reinforcement {Learning} for {Humanoid} {Robotics}}, - abstract = {Reinforcement learning offers one of the most general framework to take traditional robotics towards true autonomy and versatility. However, applying reinforcement learning to high dimensional movement systems like humanoid robots remains an unsolved problem. In this paper, we discuss different approaches of reinforcement learning in terms of their applicability in humanoid robotics. Methods can be coarsely classified into three different categories, i.e., greedy methods, ‘vanilla’ policy gradient methods, and natural gradient methods. We discuss that greedy methods are not likely to scale into the domain humanoid robotics as they are problematic when used with function approximation. ‘Vanilla’ policy gradient methods on the other hand have been successfully applied on real-world robots including at least one humanoid robot [3]. We demonstrate that these methods can be significantly improved using the natural policy gradient instead of the regular policy gradient. A derivation of the natural policy gradient is provided, proving that the average policy gradient of Kakade [10] is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. This algorithm converges to the nearest local minimum of the cost function with respect to the Fisher information metric under suitable conditions. The algorithm outperforms non-natural policy gradients by far in a cart-pole balancing evaluation, and for learning nonlinear dynamic motor primitives for humanoid robot control. It offers a promising route for the development of reinforcement learning for truly high-dimensionally continuous state-action systems.}, +@inproceedings{azad_modeling_2010, + title = {Modeling the contact between a rolling sphere and a compliant ground plane}, + copyright = {http://legaloffice.weblogs.anu.edu.au/content/copyright/}, + isbn = {978-0-9807404-1-7}, + url = {https://openresearch-repository.anu.edu.au/handle/1885/62522}, + abstract = {In this paper a complete 3D contact model is presented. This model has nonlinearity in determining both normal and friction forces. 
A new nonlinear normal force model is introduced and the differences between this new model and previous ones are described. Also, the characteristics of this model are discussed and compared with the classical models and empirical results. For calculating the friction force, a new nonlinear model which is able to calculate pre-sliding displacement and viscous friction is presented. It is shown that this model allows us to keep track of all the energy in the system and therefore, supports an energy audit. Also, the rolling motion of a sphere on a compliant ground plane is simulated, and the results are presented. Rolling motion is an integral part of simulating the general 3D motion of a robot's foot while in contact with the ground.}, language = {en}, - booktitle = {Proceedings of the third {IEEE}-{RAS} international conference on humanoid robots}, - author = {Peters, Jan and Vijayakumar, Sethu and Schaal, Stefan}, - year = {2003}, - pages = {20}, - file = {Peters et al_Reinforcement Learning for Humanoid Robotics.pdf:/home/dferigo/Zotero/storage/WVS83I5T/Peters et al_Reinforcement Learning for Humanoid Robotics.pdf:application/pdf}, + urldate = {2022-03-03}, + booktitle = {Proceedings of {ACRA} 2010}, + publisher = {Australian Robotics and Automation Association}, + author = {Azad, Morteza and Featherstone, Roy}, + year = {2010}, + note = {Accepted: 2015-12-10T23:05:49Z +Last Modified: 2020-05-19}, } -@article{benbrahim_biped_1997, - title = {Biped dynamic walking using reinforcement learning}, - volume = {22}, - issn = {09218890}, - url = {https://linkinghub.elsevier.com/retrieve/pii/S0921889097000432}, - doi = {10.1016/S0921-8890(97)00043-2}, - language = {en}, - number = {3-4}, - urldate = {2022-04-15}, - journal = {Robotics and Autonomous Systems}, - author = {Benbrahim, Hamid and Franklin, Judy A.}, - month = dec, - year = {1997}, - pages = {283--302}, - file = {Benbrahim_Franklin_1997_Biped dynamic walking using reinforcement learning.pdf:/home/dferigo/Zotero/storage/2JYIXAGY/Benbrahim_Franklin_1997_Biped dynamic walking using reinforcement learning.pdf:application/pdf}, +@article{azad_new_2014, + title = {A {New} {Nonlinear} {Model} of {Contact} {Normal} {Force}}, + volume = {30}, + issn = {1941-0468}, + doi = {10.1109/TRO.2013.2293833}, + abstract = {This paper presents a new nonlinear model of the normal force that arises during compliant contact between two spheres, or between a sphere and a flat plate. It differs from a well-known existing model by only a single term. 
The advantage of the new model is that it accurately predicts the measured values of the coefficient of restitution between spheres and plates of various materials, whereas other models do not.}, + number = {3}, + journal = {IEEE Transactions on Robotics}, + author = {Azad, Morteza and Featherstone, Roy}, + month = jun, + year = {2014}, + note = {Conference Name: IEEE Transactions on Robotics}, + keywords = {Computational modeling, Robots, Mathematical model, Data models, Predictive models, Force, Animation and simulation, compliant contact model, Materials, nonlinear damping, normal force}, + pages = {736--739}, } -@incollection{schaal_dynamic_2006, - address = {Tokyo}, - title = {Dynamic {Movement} {Primitives} -{A} {Framework} for {Motor} {Control} in {Humans} and {Humanoid} {Robotics}}, - isbn = {978-4-431-24164-5}, - url = {http://link.springer.com/10.1007/4-431-31381-8_23}, - abstract = {Given the continuous stream of movements that biological systems exhibit in their daily activities, an account for such versatility and creativity has to assume that movement sequences consist of segments, executed either in sequence or with partial or complete overlap. Therefore, a fundamental question that has pervaded research in motor control both in artificial and biological systems revolves around identifying movement primitives (a.k.a. units of actions, basis behaviors, motor schemas, etc.). What are the fundamental building blocks that are strung together, adapted to, and created for ever new behaviors? This paper summarizes results that led to the hypothesis of Dynamic Movement Primitives (DMP). DMPs are units of action that are formalized as stable nonlinear attractor systems. They are useful for autonomous robotics as they are highly flexible in creating complex rhythmic (e.g., locomotion) and discrete (e.g., a tennis swing) behaviors that can quickly be adapted to the inevitable perturbations of a dynamically changing, stochastic environment. Moreover, DMPs provide a formal framework that also lends itself to investigations in computational neuroscience. A recent finding that allows creating DMPs with the help of well-understood statistical learning methods has elevated DMPs from a more heuristic to a principled modeling approach. Theoretical insights, evaluations on a humanoid robot, and behavioral and brain imaging data will serve to outline the framework of DMPs for a general approach to motor control in robotics and biology.}, - language = {en}, - urldate = {2022-04-15}, - booktitle = {Adaptive {Motion} of {Animals} and {Machines}}, - publisher = {Springer-Verlag}, - author = {Schaal, Stefan}, - year = {2006}, - doi = {10.1007/4-431-31381-8_23}, - pages = {261--280}, - file = {Schaal_2006_Dynamic Movement Primitives -A Framework for Motor Control in Humans and.pdf:/home/dferigo/Zotero/storage/NGSBJLMR/Schaal_2006_Dynamic Movement Primitives -A Framework for Motor Control in Humans and.pdf:application/pdf}, +@inproceedings{azad_model_2016, + title = {Model estimation and control of compliant contact normal force}, + doi = {10.1109/HUMANOIDS.2016.7803313}, + abstract = {This paper proposes a method to realize desired contact normal forces between humanoids and their compliant environment. By using contact models, desired contact forces are converted to desired deformations of compliant surfaces. To achieve desired forces, deformations are controlled by controlling the contact point positions. 
Parameters of contact models are assumed to be known or estimated using the approach described in this paper. The proposed methods for estimating the contact parameters and controlling the contact normal force are implemented on a LWR KUKA IV arm. To verify both methods, experiments are performed with the KUKA arm while its end-effector is in contact with two different soft objects.}, + booktitle = {2016 {IEEE}-{RAS} 16th {International} {Conference} on {Humanoid} {Robots} ({Humanoids})}, + author = {Azad, Morteza and Ortenzi, Valerio and Lin, Hsiu-Chin and Rueckert, Elmar and Mistry, Michael}, + month = nov, + year = {2016}, + note = {ISSN: 2164-0580}, + keywords = {Humanoid robots, Acceleration, Dynamics, Motion control, Force, Estimation}, + pages = {442--447}, } -@inproceedings{peters_policy_2006, - address = {Beijing, China}, - title = {Policy {Gradient} {Methods} for {Robotics}}, - isbn = {978-1-4244-0258-8 978-1-4244-0259-5}, - url = {http://ieeexplore.ieee.org/document/4058714/}, - doi = {10.1109/IROS.2006.282564}, - abstract = {The aquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots should ever leave precisely pre-structured environments. However, to date only few existing reinforcement learning methods have been scaled into the domains of highdimensional robots such as manipulator, legged or humanoid robots. Policy gradient methods remain one of the few exceptions and have found a variety of applications. Nevertheless, the application of such methods is not without peril if done in an uninformed manner. In this paper, we give an overview on learning with policy gradient methods for robotics with a strong focus on recent advances in the field. We outline previous applications to robotics and show how the most recently developed methods can significantly improve learning performance. Finally, we evaluate our most promising algorithm in the application of hitting a baseball with an anthropomorphic arm.}, +@article{gilardi_literature_2002, + title = {Literature survey of contact dynamics modelling}, + volume = {37}, + issn = {0094114X}, + url = {https://linkinghub.elsevier.com/retrieve/pii/S0094114X02000459}, + doi = {10.1016/S0094-114X(02)00045-9}, + abstract = {Impact is a complex phenomenon that occurs when two or more bodies undergo a collision. This phenomenon is important in many different areas––machine design, robotics, multi-body analysis are just a few examples. The purpose of this manuscript is to provide an overview of the state of the art on impact and contact modelling methodologies, taking into account their different aspects, specifically, the energy loss, the influence of the friction model, solution approaches, the multi-contact problem and the experimental verification. The paper is intended to provide a review of results presented in literature and some additional insights into existing models, their interrelationship and the use of these models for impact/contact scenarios encountered in space robotic applications.}, language = {en}, - urldate = {2022-04-15}, - booktitle = {2006 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems}}, - publisher = {IEEE}, - author = {Peters, Jan and Schaal, Stefan}, + number = {10}, + urldate = {2022-03-02}, + journal = {Mechanism and Machine Theory}, + author = {Gilardi, G. 
and Sharf, I.},
 	month = oct,
-	year = {2006},
-	pages = {2219--2225},
-	file = {Peters_Schaal_2006_Policy Gradient Methods for Robotics.pdf:/home/dferigo/Zotero/storage/MAL8FFI9/Peters_Schaal_2006_Policy Gradient Methods for Robotics.pdf:application/pdf},
+	year = {2002},
+	pages = {1213--1239},
+	file = {Gilardi and Sharf - 2002 - Literature survey of contact dynamics modelling.pdf:/home/dferigo/Zotero/storage/R9VJE4P2/Gilardi and Sharf - 2002 - Literature survey of contact dynamics modelling.pdf:application/pdf},
 }
 
-@inproceedings{schaal_learning_1996,
-	title = {Learning from {Demonstration}},
-	volume = {9},
-	url = {https://proceedings.neurips.cc/paper/1996/hash/68d13cf26c4b4f4f932e3eff990093ba-Abstract.html},
-	abstract = {By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies how to approach a learning problem from instructions and/or demonstrations of other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speed-up after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30 second long demonstration of the human instructor.},
-	urldate = {2022-04-15},
-	booktitle = {Advances in {Neural} {Information} {Processing} {Systems}},
-	publisher = {MIT Press},
-	author = {Schaal, Stefan},
-	year = {1996},
-}
-
-@inproceedings{atkeson_robot_1997,
-	title = {Robot {Learning} {From} {Demonstration}},
-	abstract = {The goal of robot learning from demonstration is to have a robot learn from watching a demonstration of the task to be performed. In our approach to learning from demonstration the robot learns a reward function from the demonstration and a task model from repeated attempts to perform the task. A policy is computed based on the learned reward function and task model. 
Lessons learned from an implementation on an anthropomorphic robot arm using a pendulum swing up task include 1) simply mimicking demonstrated motions is not adequate to perform this task, 2) a task planner can use a learned model and reward function to compute an appropriate policy, 3) this modelbased planning process supports rapid learning, 4) both parametric and nonparametric models can be learned and used, and 5) incorporating a task level direct learning component, which is non-model-based, in addition to the model-based planner, is useful in compensating for structural modeling errors and slow model learning.}, +@inproceedings{gros_baumgarte_2015, + address = {Osaka}, + title = {Baumgarte stabilisation over the {SO}(3) rotation group for control}, + isbn = {978-1-4799-7886-1}, + url = {http://ieeexplore.ieee.org/document/7402298/}, + doi = {10.1109/CDC.2015.7402298}, + abstract = {Representations of the SO(3) rotation group are crucial for airborne and aerospace applications. Euler angles is a popular representation in many applications, but yield models having singular dynamics. This issue is addressed via non-singular representations, operating in dimensions higher than 3. Unit quaternions and the Direction Cosine Matrix are the best known non-singular representations, and favoured in challenging aeronautic and aerospace applications. All nonsingular representations yield invariants in the model dynamics, i.e. a set of nonlinear algebraic conditions that must be fulfilled by the model initial conditions, and that remain fulfilled over time. However, due to numerical integration errors, these conditions tend to become violated when using standard integrators, making the model inconsistent with the physical reality. This issue poses some challenges when non-singular representations are deployed in optimal control. In this paper, we propose a simple technique to address the issue for classical integration schemes, establish formally its properties, and illustrate it on the optimal control of a satellite.}, language = {en}, - author = {Atkeson, Christopher G and Schaal, Stefan}, - year = {1997}, - pages = {9}, - file = {Atkeson_Schaal_Robot Learning From Demonstration.pdf:/home/dferigo/Zotero/storage/GNWQGNQS/Atkeson_Schaal_Robot Learning From Demonstration.pdf:application/pdf}, + urldate = {2022-03-02}, + booktitle = {2015 54th {IEEE} {Conference} on {Decision} and {Control} ({CDC})}, + publisher = {IEEE}, + author = {Gros, Sebastien and Zanon, Marion and Diehl, Moritz}, + month = dec, + year = {2015}, + pages = {620--625}, } -@article{schaal_is_1999, - title = {Is imitation learning the route to humanoid robots?}, - volume = {3}, - issn = {13646613}, - url = {https://linkinghub.elsevier.com/retrieve/pii/S1364661399013273}, - doi = {10.1016/S1364-6613(99)01327-3}, - abstract = {This review investigates two recent developments in artificial intelligence and neural computation: learning from imitation and the development of humanoid robots. It will be postulated that the study of imitation learning offers a promising route to gain new insights into mechanisms of perceptual motor control that could ultimately lead to the creation of autonomous humanoid robots. Imitation learning focuses on three important issues: efficient motor learning, the connection between action and perception, and modular motor control in form of movement primitives. 
It will be reviewed how research on representations of, and functional connections between action and perception have contributed to our understanding of motor acts of other beings. The recent discovery that some areas in the primate brain are active during both movement perception and execution has provided a hypothetical neural basis of imitation. Computational approaches to imitation learning will also be described, initially from the perspective of traditional AI and robotics, but also from the perspective of neural network models and statistical learning research. Parallels and differences between biological and computational approaches to imitation will be highlighted and an overview of current projects that actually employ imitation learning for humanoid robots will be given.}, +@article{schulman_trust_2017-1, + title = {Trust {Region} {Policy} {Optimization}}, + url = {http://arxiv.org/abs/1502.05477}, + abstract = {We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.}, language = {en}, - number = {6}, - urldate = {2022-04-15}, - journal = {Trends in Cognitive Sciences}, - author = {Schaal, Stefan}, - month = jun, - year = {1999}, - pages = {233--242}, - file = {Schaal_1999_Is imitation learning the route to humanoid robots.pdf:/home/dferigo/Zotero/storage/IN2PXD9V/Schaal_1999_Is imitation learning the route to humanoid robots.pdf:application/pdf}, + urldate = {2022-02-28}, + journal = {arXiv:1502.05477 [cs]}, + author = {Schulman, John and Levine, Sergey and Moritz, Philipp and Jordan, Michael I. and Abbeel, Pieter}, + month = apr, + year = {2017}, + note = {arXiv: 1502.05477}, + keywords = {Computer Science - Machine Learning}, } -@article{zico_kolter_stanford_2011, - title = {The {Stanford} {LittleDog}: {A} learning and rapid replanning approach to quadruped locomotion}, - volume = {30}, - issn = {0278-3649, 1741-3176}, - shorttitle = {The {Stanford} {LittleDog}}, - url = {http://journals.sagepub.com/doi/10.1177/0278364910390537}, - doi = {10.1177/0278364910390537}, - abstract = {Legged robots offer the potential to navigate a wide variety of terrains that are inaccessible to wheeled vehicles. In this paper we consider the planning and control tasks of navigating a quadruped robot over a wide variety of challenging terrain, including terrain which it has not seen until run-time. We present a software architecture that makes use of both static and dynamic gaits, as well as specialized dynamic maneuvers, to accomplish this task. Throughout the paper we highlight two themes that have been central to our approach: 1) the prevalent use of learning algorithms, and 2) a focus on rapid recovery and replanning techniques; we present several novel methods and algorithms that we developed for the quadruped and that illustrate these two themes. 
We evaluate the performance of these different methods, and also present and discuss the performance of our system on the official Learning Locomotion tests.}, +@article{schulman_high-dimensional_2018, + title = {High-{Dimensional} {Continuous} {Control} {Using} {Generalized} {Advantage} {Estimation}}, + url = {http://arxiv.org/abs/1506.02438}, + abstract = {Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main challenges are the large number of samples typically required, and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data. We address the first challenge by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias, with an exponentially-weighted estimator of the advantage function that is analogous to TD(λ). We address the second challenge by using trust region optimization procedure for both the policy and the value function, which are represented by neural networks.}, language = {en}, - number = {2}, - urldate = {2022-04-15}, - journal = {The International Journal of Robotics Research}, - author = {Zico Kolter, J. and Ng, Andrew Y}, - month = feb, - year = {2011}, - pages = {150--174}, - file = {Zico Kolter_Ng_2011_The Stanford LittleDog.pdf:/home/dferigo/Zotero/storage/3SSTHW7F/Zico Kolter_Ng_2011_The Stanford LittleDog.pdf:application/pdf}, + urldate = {2022-02-28}, + journal = {arXiv:1506.02438 [cs]}, + author = {Schulman, John and Moritz, Philipp and Levine, Sergey and Jordan, Michael and Abbeel, Pieter}, + month = oct, + year = {2018}, + note = {arXiv: 1506.02438}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control}, } -@inproceedings{kohl_machine_2004, - title = {Machine {Learning} for {Fast} {Quadrupedal} {Locomotion}}, - abstract = {For a robot, the ability to get from one place to another is one of the most basic skills. However, locomotion on legged robots is a challenging multidimensional control problem. This paper presents a machine learning approach to legged locomotion, with all training done on the physical robots. The main contributions are a specification of our fully automated learning environment and a detailed empirical comparison of four different machine learning algorithms for learning quadrupedal locomotion. The resulting learned walk is considerably faster than all previously reported hand-coded walks for the same robot platform.}, +@incollection{zhang_taxonomy_2020, + address = {Singapore}, + title = {Taxonomy of {Reinforcement} {Learning} {Algorithms}}, + isbn = {9789811540950}, + url = {https://doi.org/10.1007/978-981-15-4095-0_3}, + abstract = {In this chapter, we introduce and summarize the taxonomy and categories for reinforcement learning (RL) algorithms. Figure 3.1 presents an overview of the typical and popular algorithms in a structural way. We classify reinforcement learning algorithms from different perspectives, including model-based and model-free methods, value-based and policy-based methods (or combination of the two), Monte Carlo methods and temporal-difference methods, on-policy and off-policy methods. 
Most reinforcement learning algorithms can be classified under different categories according to the above criteria, hope this helps to provide the readers some overviews of the full picture before introducing the algorithms in detail in later chapters.},
	language = {en},
-	author = {Kohl, Nate and Stone, Peter},
-	year = {2004},
-	pages = {6},
-	file = {Kohl_Stone_2004_Machine Learning for Fast Quadrupedal Locomotion.pdf:/home/dferigo/Zotero/storage/ZF4WFM4N/Kohl_Stone_2004_Machine Learning for Fast Quadrupedal Locomotion.pdf:application/pdf},
+	urldate = {2022-02-24},
+	booktitle = {Deep {Reinforcement} {Learning}: {Fundamentals}, {Research} and {Applications}},
+	publisher = {Springer},
+	author = {Zhang, Hongming and Yu, Tianyang},
+	editor = {Dong, Hao and Ding, Zihan and Zhang, Shanghang},
+	year = {2020},
+	doi = {10.1007/978-981-15-4095-0_3},
+	keywords = {Model-based, Model-free, Monte Carlo (MC) methods, Off-policy, On-policy, Policy-based, Temporal-difference (TD) methods, Value-based},
+	pages = {125--133},
}
-@inproceedings{honglak_lee_quadruped_2006,
-	address = {Orlando, FL, USA},
-	title = {Quadruped robot obstacle negotiation via reinforcement learning},
-	isbn = {978-0-7803-9505-3},
-	url = {http://ieeexplore.ieee.org/document/1642158/},
-	doi = {10.1109/ROBOT.2006.1642158},
-	abstract = {Legged robots can, in principle, traverse a large variety of obstacles and terrains. In this paper, we describe a successful application of reinforcement learning to the problem of negotiating obstacles with a quadruped robot. Our algorithm is based on a two-level hierarchical decomposition of the task, in which the high-level controller selects the sequence of footplacement positions, and the low-level controller generates the continuous motions to move each foot to the specified positions. The high-level controller uses an estimate of the value function to guide its search; this estimate is learned partially from supervised data. The low-level controller is obtained via policy search. We demonstrate that our robot can successfully climb over a variety of obstacles which were not seen at training time.},
+@article{pardo_time_2022,
+	title = {Time {Limits} in {Reinforcement} {Learning}},
+	url = {http://arxiv.org/abs/1712.00378},
+	abstract = {In reinforcement learning, it is common to let an agent interact for a fixed amount of time with its environment before resetting it and repeating the process in a series of episodes. The task that the agent has to learn can either be to maximize its performance over (i) that fixed period, or (ii) an indefinite period where time limits are only used during training to diversify experience. In this paper, we provide a formal account for how time limits could effectively be handled in each of the two cases and explain why not doing so can cause state aliasing and invalidation of experience replay, leading to suboptimal policies and training instability. In case (i), we argue that the terminations due to time limits are in fact part of the environment, and thus a notion of the remaining time should be included as part of the agent’s input to avoid violation of the Markov property. In case (ii), the time limits are not part of the environment and are only used to facilitate learning. 
We argue that this insight should be incorporated by bootstrapping from the value of the state at the end of each partial episode. For both cases, we illustrate empirically the significance of our considerations in improving the performance and stability of existing reinforcement learning algorithms, showing state-of-the-art results on several control tasks.}, language = {en}, - urldate = {2022-04-15}, - booktitle = {Proceedings 2006 {IEEE} {International} {Conference} on {Robotics} and {Automation}, 2006. {ICRA} 2006.}, - publisher = {IEEE}, - author = {{Honglak Lee} and {Yirong Shen} and {Chih-Han Yu} and Singh, G. and Ng, A.Y.}, - year = {2006}, - pages = {3003--3010}, - file = {Honglak Lee et al_2006_Quadruped robot obstacle negotiation via reinforcement learning.pdf:/home/dferigo/Zotero/storage/LE662D9H/Honglak Lee et al_2006_Quadruped robot obstacle negotiation via reinforcement learning.pdf:application/pdf}, + urldate = {2022-02-14}, + journal = {arXiv:1712.00378 [cs]}, + author = {Pardo, Fabio and Tavakoli, Arash and Levdik, Vitaly and Kormushev, Petar}, + month = jan, + year = {2022}, + note = {arXiv: 1712.00378}, + keywords = {Computer Science - Machine Learning}, } -@article{theodorou_generalized_2010, - title = {A {Generalized} {Path} {Integral} {Control} {Approach} to {Reinforcement} {Learning}}, - volume = {11}, - abstract = {With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, this paper suggests to use the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parameterized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-JacobiBellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral which has no open algorithmic parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model free, depending on how the learning problem is structured. The update equations have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Our new algorithm demonstrates interesting similarities with previous RL research in the framework of probability matching and provides intuition why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a simulated 12 degree-of-freedom robot dog illustrates the functionality of our algorithm in a complex robot learning scenario. 
We believe that Policy Improvement with Path Integrals (PI2) offers currently one of the most efficient, numerically robust, and easy to implement algorithms for RL based on trajectory roll-outs.}, - language = {en}, - journal = {The Journal of Machine Learning Research}, - author = {Theodorou, Evangelos A and Buchli, Jonas and Schaal, Stefan and Org, Buchli}, - year = {2010}, - pages = {45}, - file = {Theodorou et al_A Generalized Path Integral Control Approach to Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/N8HXKEPV/Theodorou et al_A Generalized Path Integral Control Approach to Reinforcement Learning.pdf:application/pdf}, +@article{schulman_proximal_2017-1, + title = {Proximal {Policy} {Optimization} {Algorithms}}, + url = {http://arxiv.org/abs/1707.06347}, + abstract = {We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.}, + urldate = {2020-05-07}, + journal = {arXiv:1707.06347 [cs]}, + author = {Schulman, John and Wolski, Filip and Dhariwal, Prafulla and Radford, Alec and Klimov, Oleg}, + month = aug, + year = {2017}, + note = {arXiv: 1707.06347}, + keywords = {Computer Science - Machine Learning}, } -@inproceedings{kober_policy_2008, - title = {Policy {Search} for {Motor} {Primitives} in {Robotics}}, - volume = {21}, - url = {https://proceedings.neurips.cc/paper/2008/hash/7647966b7343c29048673252e490f736-Abstract.html}, - abstract = {Many motor skills in humanoid robotics can be learned using parametrized motor primitives as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate reward case to episodic reinforcement learning. We show that this results into a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning by assuming a form of exploration that is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable in complex motor learning tasks. We compare this algorithm to alternative parametrized policy search methods and show that it outperforms previous methods. 
We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task using a real Barrett WAM robot arm.},
-	urldate = {2022-04-15},
-	booktitle = {Advances in {Neural} {Information} {Processing} {Systems}},
-	publisher = {Curran Associates, Inc.},
-	author = {Kober, Jens and Peters, Jan},
-	year = {2008},
-	file = {Kober_Peters_2008_Policy Search for Motor Primitives in Robotics.pdf:/home/dferigo/Zotero/storage/BFS6BPCJ/Kober_Peters_2008_Policy Search for Motor Primitives in Robotics.pdf:application/pdf},
+@phdthesis{shulman_optimizing_2016,
+	title = {Optimizing {Expectations}: {From} {Deep} {Reinforcement} {Learning} to {Stochastic} {Computation} {Graphs}},
+	url = {http://joschu.net/docs/thesis.pdf},
+	urldate = {2019-06-07},
+	author = {Schulman, John},
+	year = {2016},
}
-@article{gullapalli_acquiring_1994,
-	title = {Acquiring robot skills via reinforcement learning},
-	volume = {14},
-	issn = {1941-000X},
-	doi = {10.1109/37.257890},
-	abstract = {Skill acquisition is a difficult , yet important problem in robot performance. The authors focus on two skills, namely robotic assembly and balancing and on two classic tasks to develop these skills via learning: the peg-in hole insertion task, and the ball balancing task. A stochastic real-valued (SRV) reinforcement learning algorithm is described and used for learning control and the authors show how it can be used with nonlinear multilayer ANNs. In the peg-in-hole insertion task the SRV network successfully learns to insert to insert a peg into a hole with extremely low clearance, in spite of high sensor noise. In the ball balancing task the SRV network successfully learns to balance the ball with minimal feedback.{\textless}{\textgreater}},
-	number = {1},
-	journal = {IEEE Control Systems Magazine},
-	author = {Gullapalli, V. and Franklin, J.A. and Benbrahim, H.},
-	month = feb,
-	year = {1994},
-	note = {Conference Name: IEEE Control Systems Magazine},
-	keywords = {Robust control, Control systems, Robot control, Adaptive control, Uncertainty, Control design, Delay, Feedback, Robotic assembly, Supervised learning},
-	pages = {13--24},
-	file = {Gullapalli et al_1994_Acquiring robot skills via reinforcement learning.pdf:/home/dferigo/Zotero/storage/D239HJKA/Gullapalli et al_1994_Acquiring robot skills via reinforcement learning.pdf:application/pdf},
+@article{merel_catch_2020,
+	title = {Catch \& {Carry}: reusable neural controllers for vision-guided whole-body tasks},
+	volume = {39},
+	issn = {0730-0301},
+	shorttitle = {Catch \& {Carry}},
+	url = {https://doi.org/10.1145/3386569.3392474},
+	doi = {10.1145/3386569.3392474},
+	abstract = {We address the longstanding challenge of producing flexible, realistic humanoid character controllers that can perform diverse whole-body tasks involving object interactions. This challenge is central to a variety of fields, from graphics and animation to robotics and motor neuroscience. Our physics-based environment uses realistic actuation and first-person perception - including touch sensors and egocentric vision - with a view to producing active-sensing behaviors (e.g. gaze direction), transferability to real robots, and comparisons to the biology. We develop an integrated neural-network based approach consisting of a motor primitive module, human demonstrations, and an instructed reinforcement learning regime with curricula and task variations. 
We demonstrate the utility of our approach for several tasks, including goal-conditioned box carrying and ball catching, and we characterize its behavioral robustness. The resulting controllers can be deployed in real-time on a standard PC.1}, + number = {4}, + urldate = {2022-11-16}, + journal = {ACM Transactions on Graphics}, + author = {Merel, Josh and Tunyasuvunakool, Saran and Ahuja, Arun and Tassa, Yuval and Hasenclever, Leonard and Pham, Vu and Erez, Tom and Wayne, Greg and Heess, Nicolas}, + month = aug, + year = {2020}, + keywords = {reinforcement learning, motor control, object interaction, physics-based character}, + pages = {39:39:1--39:39:12}, + file = {Merel et al_2020_Catch & Carry.pdf:/home/dferigo/Zotero/storage/E2BDLCYP/Merel et al_2020_Catch & Carry.pdf:application/pdf}, } -@inproceedings{kolter_hierarchical_2007, - title = {Hierarchical {Apprenticeship} {Learning} with {Application} to {Quadruped} {Locomotion}}, - abstract = {We consider apprenticeship learning—learning from expert demonstrations—in the setting of large, complex domains. Past work in apprenticeship learning requires that the expert demonstrate complete trajectories through the domain. However, in many problems even an expert has difficulty controlling the system, which makes this approach infeasible. For example, consider the task of teaching a quadruped robot to navigate over extreme terrain; demonstrating an optimal policy (i.e., an optimal set of foot locations over the entire terrain) is a highly non-trivial task, even for an expert. In this paper we propose a method for hierarchical apprenticeship learning, which allows the algorithm to accept isolated advice at different hierarchical levels of the control task. This type of advice is often feasible for experts to give, even if the expert is unable to demonstrate complete trajectories. This allows us to extend the apprenticeship learning paradigm to much larger, more challenging domains. In particular, in this paper we apply the hierarchical apprenticeship learning algorithm to the task of quadruped locomotion over extreme terrain, and achieve, to the best of our knowledge, results superior to any previously published work.}, - language = {en}, - author = {Kolter, J Z and Abbeel, Pieter and Ng, Andrew Y}, - year = {2007}, - pages = {8}, - file = {Kolter et al. - Hierarchical Apprenticeship Learning with Applicat.pdf:/home/dferigo/Zotero/storage/3YAK7MMU/Kolter et al. - Hierarchical Apprenticeship Learning with Applicat.pdf:application/pdf}, +@article{liu_motor_2022, + title = {From motor control to team play in simulated humanoid football}, + volume = {7}, + url = {https://www.science.org/doi/10.1126/scirobotics.abo0235}, + doi = {10.1126/scirobotics.abo0235}, + abstract = {Learning to combine control at the level of joint torques with longer-term goal-directed behavior is a long-standing challenge for physically embodied artificial agents. Intelligent behavior in the physical world unfolds across multiple spatial and temporal scales: Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals that are defined on much longer time scales and that often involve complex interactions with the environment and other agents. Recent research has demonstrated the potential of learning-based approaches applied to the respective problems of complex movement, long-term planning, and multiagent coordination. 
However, their integration traditionally required the design and optimization of independent subsystems and remains challenging. In this work, we tackled the integration of motor control and long-horizon decision-making in the context of simulated humanoid football, which requires agile motor control and multiagent coordination. We optimized teams of agents to play simulated football via reinforcement learning, constraining the solution space to that of plausible movements learned using human motion capture data. They were trained to maximize several environment rewards and to imitate pretrained football-specific skills if doing so led to improved performance. The result is a team of coordinated humanoid football players that exhibit complex behavior at different scales, quantified by a range of analysis and statistics, including those used in real-world sport analytics. Our work constitutes a complete demonstration of learned integrated decision-making at multiple scales in a multiagent setting.}, + number = {69}, + urldate = {2022-11-16}, + journal = {Science Robotics}, + author = {Liu, Siqi and Lever, Guy and Wang, Zhe and Merel, Josh and Eslami, S. M. Ali and Hennes, Daniel and Czarnecki, Wojciech M. and Tassa, Yuval and Omidshafiei, Shayegan and Abdolmaleki, Abbas and Siegel, Noah Y. and Hasenclever, Leonard and Marris, Luke and Tunyasuvunakool, Saran and Song, H. Francis and Wulfmeier, Markus and Muller, Paul and Haarnoja, Tuomas and Tracey, Brendan and Tuyls, Karl and Graepel, Thore and Heess, Nicolas}, + month = aug, + year = {2022}, + note = {Publisher: American Association for the Advancement of Science}, + pages = {eabo0235}, + file = {Liu et al_2022_From motor control to team play in simulated humanoid football.pdf:/home/dferigo/Zotero/storage/8776G3UK/Liu et al_2022_From motor control to team play in simulated humanoid football.pdf:application/pdf}, } -@inproceedings{kohl_policy_2004, - address = {New Orleans, LA, USA}, - title = {Policy gradient reinforcement learning for fast quadrupedal locomotion}, - isbn = {978-0-7803-8232-9}, - url = {http://ieeexplore.ieee.org/document/1307456/}, - doi = {10.1109/ROBOT.2004.1307456}, - abstract = {This paper presents a machine learning approach to optimizing a quadrupedal trot gait for forward speed. Given a parameterized walk designed for a specific robot, we propose using a form of policy gradient reinforcement learning to automatically search the set of possible parameters with the goal of finding the fastest possible walk. We implement and test our approach on a commercially available quadrupedal robot platform, namely the Sony Aibo robot. After about three hours of learning, all on the physical robots and with no human intervention other than to change the batteries, the robots achieved a gait faster than any previously known gait for the Aibo, significantly outperforming a variety of existing hand-coded and learned solutions.}, - language = {en}, - urldate = {2022-04-19}, - booktitle = {{IEEE} {International} {Conference} on {Robotics} and {Automation}, 2004. {Proceedings}. {ICRA} '04. 2004}, - publisher = {IEEE}, - author = {Kohl, N. 
and Stone, P.}, - year = {2004}, - pages = {2619--2624 Vol.3}, - file = {Kohl_Stone_2004_Policy gradient reinforcement learning for fast quadrupedal locomotion.pdf:/home/dferigo/Zotero/storage/6VYM3T8R/Kohl_Stone_2004_Policy gradient reinforcement learning for fast quadrupedal locomotion.pdf:application/pdf}, +@misc{bohez_imitate_2022, + title = {Imitate and {Repurpose}: {Learning} {Reusable} {Robot} {Movement} {Skills} {From} {Human} and {Animal} {Behaviors}}, + shorttitle = {Imitate and {Repurpose}}, + url = {http://arxiv.org/abs/2203.17138}, + doi = {10.48550/arXiv.2203.17138}, + abstract = {We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a movement skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our approach does not require extensive reward engineering to produce sensible and natural looking behavior at the time of reuse. This makes it easy to create well-regularized, task-oriented controllers that are suitable for deployment on real robots. We demonstrate how our skill module can be used for imitation, and train controllable walking and ball dribbling policies for both the ANYmal quadruped and OP3 humanoid. These policies are then deployed on hardware via zero-shot simulation-to-reality transfer. Accompanying videos are available at https://bit.ly/robot-npmp.}, + urldate = {2022-11-16}, + publisher = {arXiv}, + author = {Bohez, Steven and Tunyasuvunakool, Saran and Brakel, Philemon and Sadeghi, Fereshteh and Hasenclever, Leonard and Tassa, Yuval and Parisotto, Emilio and Humplik, Jan and Haarnoja, Tuomas and Hafner, Roland and Wulfmeier, Markus and Neunert, Michael and Moran, Ben and Siegel, Noah and Huber, Andrea and Romano, Francesco and Batchelor, Nathan and Casarini, Federico and Merel, Josh and Hadsell, Raia and Heess, Nicolas}, + month = mar, + year = {2022}, + note = {arXiv:2203.17138 [cs]}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence}, + file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/KUU2ESMB/2203.html:text/html;Bohez et al_2022_Imitate and Repurpose.pdf:/home/dferigo/Zotero/storage/9PRUNF9I/Bohez et al_2022_Imitate and Repurpose.pdf:application/pdf}, } -@article{mnih_human-level_2015, - title = {Human-level control through deep reinforcement learning}, - volume = {518}, - issn = {0028-0836, 1476-4687}, - url = {http://www.nature.com/articles/nature14236}, - doi = {10.1038/nature14236}, +@misc{gleeson_optimizing_2022, + title = {Optimizing {Data} {Collection} in {Deep} {Reinforcement} {Learning}}, + url = {http://arxiv.org/abs/2207.07736}, + abstract = {Reinforcement learning (RL) workloads take a notoriously long time to train due to the large number of samples collected at runtime from simulators. Unfortunately, cluster scale-up approaches remain expensive, and commonly used CPU implementations of simulators induce high overhead when switching back and forth between GPU computations. 
We explore two optimizations that increase RL data collection efficiency by increasing GPU utilization: (1) GPU vectorization: parallelizing simulation on the GPU for increased hardware parallelism, and (2) simulator kernel fusion: fusing multiple simulation steps to run in a single GPU kernel launch to reduce global memory bandwidth requirements. We find that GPU vectorization can achieve up to 1024× speedup over commonly used CPU simulators. We profile the performance of different implementations and show that for a simple simulator, ML compiler implementations (XLA) of GPU vectorization outperform a DNN framework (PyTorch) by 13.4× by reducing CPU overhead from repeated Python to DL backend API calls. We show that simulator kernel fusion speedups with a simple simulator are 11.3× and increase by up to 1024× as simulator complexity increases in terms of memory bandwidth requirements. We show that the speedups from simulator kernel fusion are orthogonal and combinable with GPU vectorization, leading to a multiplicative speedup.}, language = {en}, - number = {7540}, - urldate = {2018-06-23}, - journal = {Nature}, - author = {Mnih, Volodymyr and Kavukcuoglu, Koray and Silver, David and Rusu, Andrei A. and Veness, Joel and Bellemare, Marc G. and Graves, Alex and Riedmiller, Martin and Fidjeland, Andreas K. and Ostrovski, Georg and Petersen, Stig and Beattie, Charles and Sadik, Amir and Antonoglou, Ioannis and King, Helen and Kumaran, Dharshan and Wierstra, Daan and Legg, Shane and Hassabis, Demis}, - month = feb, - year = {2015}, - keywords = {deepmind}, - pages = {529--533}, - file = {Mnih et al_2015_Human-level control through deep reinforcement learning.pdf:/home/dferigo/Zotero/storage/WC7QJIMF/Mnih et al_2015_Human-level control through deep reinforcement learning.pdf:application/pdf}, + urldate = {2022-11-16}, + publisher = {arXiv}, + author = {Gleeson, James and Snider, Daniel and Yang, Yvonne and Gabel, Moshe and de Lara, Eyal and Pekhimenko, Gennady}, + month = jul, + year = {2022}, + note = {arXiv:2207.07736 [cs]}, + keywords = {Computer Science - Machine Learning}, + file = {Gleeson et al. - 2022 - Optimizing Data Collection in Deep Reinforcement L.pdf:/home/dferigo/Zotero/storage/AAZK89RA/Gleeson et al. - 2022 - Optimizing Data Collection in Deep Reinforcement L.pdf:application/pdf}, } -@article{haarnoja_soft_2018, - title = {Soft {Actor}-{Critic}: {Off}-{Policy} {Maximum} {Entropy} {Deep} {Reinforcement} {Learning} with a {Stochastic} {Actor}}, - shorttitle = {Soft {Actor}-{Critic}}, - url = {http://arxiv.org/abs/1801.01290}, - abstract = {Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy. That is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. 
By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.}, - urldate = {2020-05-07}, - journal = {arXiv:1801.01290 [cs, stat]}, - author = {Haarnoja, Tuomas and Zhou, Aurick and Abbeel, Pieter and Levine, Sergey}, - month = aug, - year = {2018}, - note = {arXiv: 1801.01290}, - keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/53U5GB7V/1801.html:text/html;Haarnoja et al_2018_Soft Actor-Critic.pdf:/home/dferigo/Zotero/storage/KI69MXY9/Haarnoja et al_2018_Soft Actor-Critic.pdf:application/pdf}, +@misc{singh_learning_2022, + title = {Learning {Bipedal} {Walking} {On} {Planned} {Footsteps} {For} {Humanoid} {Robots}}, + url = {http://arxiv.org/abs/2207.12644}, + abstract = {Deep reinforcement learning (RL) based controllers for legged robots have demonstrated impressive robustness for walking in different environments for several robot platforms. To enable the application of RL policies for humanoid robots in real-world settings, it is crucial to build a system that can achieve robust walking in any direction, on 2D and 3D terrains, and be controllable by a user-command. In this paper, we tackle this problem by learning a policy to follow a given step sequence. The policy is trained with the help of a set of procedurally generated step sequences (also called footstep plans). We show that simply feeding the upcoming 2 steps to the policy is sufficient to achieve omnidirectional walking, turning in place, standing, and climbing stairs. Our method employs curriculum learning on the complexity of terrains, and circumvents the need for reference motions or pre-trained weights. We demonstrate the application of our proposed method to learn RL policies for 2 new robot platforms - HRP5P and JVRC-1 - in the MuJoCo simulation environment. The code for training and evaluation is available online. †.}, + language = {en}, + urldate = {2022-10-04}, + publisher = {arXiv}, + author = {Singh, Rohan Pratap and Benallegue, Mehdi and Morisawa, Mitsuharu and Cisneros, Rafael and Kanehiro, Fumio}, + month = jul, + year = {2022}, + note = {arXiv:2207.12644 [cs]}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence}, + file = {Singh et al. - 2022 - Learning Bipedal Walking On Planned Footsteps For .pdf:/home/dferigo/Zotero/storage/ATTC2SAK/Singh et al. - 2022 - Learning Bipedal Walking On Planned Footsteps For .pdf:application/pdf}, } -@inproceedings{gangapurwala_real-time_2021, - address = {Xi'an, China}, - title = {Real-{Time} {Trajectory} {Adaptation} for {Quadrupedal} {Locomotion} using {Deep} {Reinforcement} {Learning}}, - isbn = {978-1-72819-077-8}, - url = {https://ieeexplore.ieee.org/document/9561639/}, - doi = {10.1109/ICRA48506.2021.9561639}, - abstract = {We present a control architecture for real-time adaptation and tracking of trajectories generated using a terrain-aware trajectory optimization solver. 
This approach enables us to circumvent the computationally exhaustive task of online trajectory optimization, and further introduces a control solution robust to systems modeled with approximated dynamics. We train a policy using deep reinforcement learning (RL) to introduce additive deviations to a reference trajectory in order to generate a feedback-based trajectory tracking system for a quadrupedal robot. We train this policy across a multitude of simulated terrains and ensure its generality by introducing training methods that avoid overfitting and convergence towards local optima. Additionally, in order to capture terrain information, we include a latent representation of the height maps in the observation space of the RL environment as a form of exteroceptive feedback. We test the performance of our trained policy by tracking the corrected set points using a model-based whole-body controller and compare it with the tracking behavior obtained without the corrective feedback in several simulation environments, and show that introducing the corrective feedback results in increase of the success rate from 72.7\% to 92.4\% for tracking precomputed dynamic long horizon trajectories on flat terrain and from 47.5\% to 80.3\% on a complex modular uneven terrain. We also show successful transfer of our training approach to the real physical system and further present cogent arguments in support of our framework.}, - language = {en}, - urldate = {2022-04-19}, - booktitle = {2021 {IEEE} {International} {Conference} on {Robotics} and {Automation} ({ICRA})}, - publisher = {IEEE}, - author = {Gangapurwala, Siddhant and Geisert, Mathieu and Orsolino, Romeo and Fallon, Maurice and Havoutis, Ioannis}, - month = may, - year = {2021}, - pages = {5973--5979}, - file = {Gangapurwala et al_2021_Real-Time Trajectory Adaptation for Quadrupedal Locomotion using Deep.pdf:/home/dferigo/Zotero/storage/8MU3IKJR/Gangapurwala et al_2021_Real-Time Trajectory Adaptation for Quadrupedal Locomotion using Deep.pdf:application/pdf}, +@misc{xu_accelerated_2022, + title = {Accelerated {Policy} {Learning} with {Parallel} {Differentiable} {Simulation}}, + url = {http://arxiv.org/abs/2204.07137}, + doi = {10.48550/arXiv.2204.07137}, + abstract = {Deep reinforcement learning can generate complex control policies, but requires large amounts of training data to work effectively. Recent work has attempted to address this issue by leveraging differentiable simulators. However, inherent problems such as local minima and exploding/vanishing numerical gradients prevent these methods from being generally applied to control tasks with complex contact-rich dynamics, such as humanoid locomotion in classical RL benchmarks. In this work we present a high-performance differentiable simulator and a new policy learning algorithm (SHAC) that can effectively leverage simulation gradients, even in the presence of non-smoothness. Our learning algorithm alleviates problems with local minima through a smooth critic function, avoids vanishing/exploding gradients through a truncated learning window, and allows many physical environments to be run in parallel. We evaluate our method on classical RL control tasks, and show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms. 
In addition, we demonstrate the scalability of our method by applying it to the challenging high-dimensional problem of muscle-actuated locomotion with a large action space, achieving a greater than 17x reduction in training time over the best-performing established RL algorithm.}, + urldate = {2022-07-28}, + publisher = {arXiv}, + author = {Xu, Jie and Makoviychuk, Viktor and Narang, Yashraj and Ramos, Fabio and Matusik, Wojciech and Garg, Animesh and Macklin, Miles}, + month = apr, + year = {2022}, + note = {arXiv:2204.07137 [cs]}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Graphics}, + file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/RZKKUGFX/2204.html:text/html;Xu et al_2022_Accelerated Policy Learning with Parallel Differentiable Simulation.pdf:/home/dferigo/Zotero/storage/AT2J2BBX/Xu et al_2022_Accelerated Policy Learning with Parallel Differentiable Simulation.pdf:application/pdf}, } -@inproceedings{rudin_learning_2022, - title = {Learning to {Walk} in {Minutes} {Using} {Massively} {Parallel} {Deep} {Reinforcement} {Learning}}, - url = {https://proceedings.mlr.press/v164/rudin22a.html}, - abstract = {In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. We open-source our training code to help accelerate further research in the field of learned legged locomotion: https://leggedrobotics.github.io/legged\_gym/.}, - language = {en}, - urldate = {2022-04-19}, - booktitle = {Proceedings of the 5th {Conference} on {Robot} {Learning}}, - publisher = {PMLR}, - author = {Rudin, Nikita and Hoeller, David and Reist, Philipp and Hutter, Marco}, - month = jan, +@article{peng_ase_2022, + title = {{ASE}: {Large}-{Scale} {Reusable} {Adversarial} {Skill} {Embeddings} for {Physically} {Simulated} {Characters}}, + volume = {41}, + issn = {0730-0301, 1557-7368}, + shorttitle = {{ASE}}, + url = {http://arxiv.org/abs/2205.01906}, + doi = {10.1145/3528223.3530110}, + abstract = {The incredible feats of athleticism demonstrated by humans are made possible in part by a vast repertoire of general-purpose motor skills, acquired through years of practice and experience. These skills not only enable humans to perform complex tasks, but also provide powerful priors for guiding their behaviors when learning new tasks. This is in stark contrast to what is common practice in physics-based character animation, where control policies are most typically trained from scratch for each task. In this work, we present a large-scale data-driven framework for learning versatile and reusable skill embeddings for physically simulated characters. 
Our approach combines techniques from adversarial imitation learning and unsupervised reinforcement learning to develop skill embeddings that produce life-like behaviors, while also providing an easy to control representation for use on new downstream tasks. Our models can be trained using large datasets of unstructured motion clips, without requiring any task-specific annotation or segmentation of the motion data. By leveraging a massively parallel GPU-based simulator, we are able to train skill embeddings using over a decade of simulated experiences, enabling our model to learn a rich and versatile repertoire of skills. We show that a single pre-trained model can be effectively applied to perform a diverse set of new tasks. Our system also allows users to specify tasks through simple reward functions, and the skill embedding then enables the character to automatically synthesize complex and naturalistic strategies in order to achieve the task objectives.}, + number = {4}, + urldate = {2022-07-30}, + journal = {ACM Transactions on Graphics}, + author = {Peng, Xue Bin and Guo, Yunrong and Halper, Lina and Levine, Sergey and Fidler, Sanja}, + month = jul, year = {2022}, - note = {ISSN: 2640-3498}, - pages = {91--100}, - file = {Rudin et al_2022_Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/UKAV6F4D/Rudin et al_2022_Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning.pdf:application/pdf}, + note = {arXiv:2205.01906 [cs]}, + keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Graphics}, + pages = {1--17}, } -@inproceedings{castillo_robust_2021, - address = {Prague, Czech Republic}, - title = {Robust {Feedback} {Motion} {Policy} {Design} {Using} {Reinforcement} {Learning} on a {3D} {Digit} {Bipedal} {Robot}}, - isbn = {978-1-66541-714-3}, - url = {https://ieeexplore.ieee.org/document/9636467/}, - doi = {10.1109/IROS51168.2021.9636467}, - abstract = {In this paper, a hierarchical and robust framework for learning bipedal locomotion is presented and successfully implemented on the 3D biped robot Digit built by Agility Robotics. We propose a cascade-structure controller that combines the learning process with intuitive feedback regulations. This design allows the framework to realize robust and stable walking with a reduced-dimensional state and action spaces of the policy, significantly simplifying the design and increasing the sampling efficiency of the learning method. The inclusion of feedback regulation into the framework improves the robustness of the learned walking gait and ensures the success of the sim-to-real transfer of the proposed controller with minimal tuning. We specifically present a learning pipeline that considers hardware-feasible initial poses of the robot within the learning process to ensure the initial state of the learning is replicated as close as possible to the initial state of the robot in hardware experiments. Finally, we demonstrate the feasibility of our method by successfully transferring the learned policy in simulation to the Digit robot hardware, realizing sustained walking gaits under external force disturbances and challenging terrains not incurred during the training process. 
To the best of our knowledge, this is the first time a learning-based policy is transferred successfully to the Digit robot in hardware experiments.}, +@inproceedings{chou_improving_2017, + title = {Improving {Stochastic} {Policy} {Gradients} in {Continuous} {Control} with {Deep} {Reinforcement} {Learning} using the {Beta} {Distribution}}, + abstract = {Recently, reinforcement learning with deep neural networks has achieved great success in challenging continuous control problems such as 3D locomotion and robotic manipulation. However, in real-world control problems, the actions one can take are bounded by physical constraints, which introduces a bias when the standard Gaussian distribution is used as the stochastic policy. In this work, we propose to use the Beta distribution as an alternative and analyze the bias and variance of the policy gradients of both policies. We show that the Beta policy is bias-free and provides significantly faster convergence and higher scores over the Gaussian policy when both are used with trust region policy optimization (TRPO) and actor critic with experience replay (ACER), the state-of-the-art on- and offpolicy stochastic methods respectively, on OpenAI Gym’s and MuJoCo’s continuous control environments.}, language = {en}, - urldate = {2022-04-19}, - booktitle = {2021 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems} ({IROS})}, - publisher = {IEEE}, - author = {Castillo, Guillermo A. and Weng, Bowen and Zhang, Wei and Hereid, Ayonga}, - month = sep, - year = {2021}, - pages = {5136--5143}, - file = {Castillo et al. - 2021 - Robust Feedback Motion Policy Design Using Reinfor.pdf:/home/dferigo/Zotero/storage/43DSYDWU/Castillo et al. - 2021 - Robust Feedback Motion Policy Design Using Reinfor.pdf:application/pdf}, + author = {Chou, Po-Wei and Maturana, Daniel and Scherer, Sebastian}, + year = {2017}, + pages = {10}, + file = {Chou et al_Improving Stochastic Policy Gradients in Continuous Control with Deep.pdf:/home/dferigo/Zotero/storage/AL92UVVR/Chou et al_Improving Stochastic Policy Gradients in Continuous Control with Deep.pdf:application/pdf}, } -@inproceedings{bloesch_towards_2022, - title = {Towards {Real} {Robot} {Learning} in the {Wild}: {A} {Case} {Study} in {Bipedal} {Locomotion}}, - shorttitle = {Towards {Real} {Robot} {Learning} in the {Wild}}, - url = {https://proceedings.mlr.press/v164/bloesch22a.html}, - abstract = {Algorithms for self-learning systems have made considerable progress in recent years, yet safety concerns and the need for additional instrumentation have so far largely limited learning experiments with real robots to well controlled lab settings. In this paper, we demonstrate how a small bipedal robot can autonomously learn to walk with minimal human intervention and with minimal instrumentation of the environment. We employ data-efficient off-policy deep reinforcement learning to learn to walk end-to-end, directly on hardware, using rewards that are computed exclusively from proprioceptive sensing. To allow the robot to autonomously adapt its behaviour to its environment, we additionally provide the agent with raw RGB camera images as input. By deploying two robots in different geographic locations while sharing data in a distributed learning setup, we achieve higher throughput and greater diversity of the training data. 
Our learning experiments constitute a step towards the long-term vision of learning “in the wild” for legged robots, and, to our knowledge, represent the first demonstration of learning a deep neural network controller for bipedal locomotion directly on hardware.}, +@inproceedings{reda_learning_2020, + address = {Virtual Event SC USA}, + title = {Learning to {Locomote}: {Understanding} {How} {Environment} {Design} {Matters} for {Deep} {Reinforcement} {Learning}}, + isbn = {978-1-4503-8171-0}, + shorttitle = {Learning to {Locomote}}, + url = {https://dl.acm.org/doi/10.1145/3424636.3426907}, + doi = {10.1145/3424636.3426907}, + abstract = {Learning to locomote is one of the most common tasks in physicsbased animation and deep reinforcement learning (RL). A learned policy is the product of the problem to be solved, as embodied by the RL environment, and the RL algorithm. While enormous attention has been devoted to RL algorithms, much less is known about the impact of design choices for the RL environment. In this paper, we show that environment design matters in significant ways and document how it can contribute to the brittle nature of many RL results. Specifically, we examine choices related to state representations, initial state distributions, reward structure, control frequency, episode termination procedures, curriculum usage, the action space, and the torque limits. We aim to stimulate discussion around such choices, which in practice strongly impact the success of RL when applied to continuous-action control problems of interest to animation, such as learning to locomote.}, language = {en}, - urldate = {2022-04-19}, - booktitle = {Proceedings of the 5th {Conference} on {Robot} {Learning}}, - publisher = {PMLR}, - author = {Bloesch, Michael and Humplik, Jan and Patraucean, Viorica and Hafner, Roland and Haarnoja, Tuomas and Byravan, Arunkumar and Siegel, Noah Yamamoto and Tunyasuvunakool, Saran and Casarini, Federico and Batchelor, Nathan and Romano, Francesco and Saliceti, Stefano and Riedmiller, Martin and Eslami, S. M. Ali and Heess, Nicolas}, - month = jan, - year = {2022}, - note = {ISSN: 2640-3498}, - pages = {1502--1511}, - file = {Bloesch et al_2022_Towards Real Robot Learning in the Wild.pdf:/home/dferigo/Zotero/storage/MUVLGR7I/Bloesch et al_2022_Towards Real Robot Learning in the Wild.pdf:application/pdf}, + urldate = {2022-07-30}, + booktitle = {Motion, {Interaction} and {Games}}, + publisher = {ACM}, + author = {Reda, Daniele and Tao, Tianxin and van de Panne, Michiel}, + month = oct, + year = {2020}, + pages = {1--10}, + file = {Reda et al_2020_Learning to Locomote.pdf:/home/dferigo/Zotero/storage/44VB7PI2/Reda et al_2020_Learning to Locomote.pdf:application/pdf}, } -@article{muratore_robot_2022, - title = {Robot {Learning} from {Randomized} {Simulations}: {A} {Review}}, - shorttitle = {Robot {Learning} from {Randomized} {Simulations}}, - url = {http://arxiv.org/abs/2111.00956}, - abstract = {The rise of deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data. Unfortunately, it is prohibitively expensive to generate such data sets on a physical platform. Therefore, state-of-the-art approaches learn in simulation where data generation is fast as well as inexpensive and subsequently transfer the knowledge to the real robot (sim-to-real). Despite becoming increasingly realistic, all simulators are by construction based on models, hence inevitably imperfect. 
This raises the question of how simulators can be modified to facilitate learning robot control policies and overcome the mismatch between simulation and reality, often called the ‘reality gap’. We provide a comprehensive review of sim-to-real research for robotics, focusing on a technique named ‘domain randomization’ which is a method for learning from randomized simulations.}, +@article{peng_learning_2017-1, + title = {Learning {Locomotion} {Skills} using {DeepRL}: does the choice of {Action} {Space} matter?}, + abstract = {The use of deep reinforcement learning allows for high-dimensional state descriptors, but little is known about how the choice of action representation impacts the learning difficulty and the resulting performance. We compare the impact of four different action parameterizations (torques, muscle-activations, target joint angles, and target joint-angle velocities) in terms of learning time, policy robustness, motion quality, and policy query rates. Our results are evaluated on a gaitcycle imitation task for multiple planar articulated figures and multiple gaits. We demonstrate that the local feedback provided by higher-level action parameterizations can significantly impact the learning, robustness, and quality of the resulting policies.}, language = {en}, - urldate = {2022-04-20}, - journal = {arXiv:2111.00956 [cs]}, - author = {Muratore, Fabio and Ramos, Fabio and Turk, Greg and Yu, Wenhao and Gienger, Michael and Peters, Jan}, - month = jan, - year = {2022}, - note = {arXiv: 2111.00956}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, - file = {Muratore et al_2022_Robot Learning from Randomized Simulations.pdf:/home/dferigo/Zotero/storage/BJEATQL9/Muratore et al_2022_Robot Learning from Randomized Simulations.pdf:application/pdf}, + author = {Peng, Xue Bin}, + year = {2017}, + pages = {16}, + file = {Peng_2017_Learning Locomotion Skills using DeepRL.pdf:/home/dferigo/Zotero/storage/2ZQWS6JU/Peng_2017_Learning Locomotion Skills using DeepRL.pdf:application/pdf}, } -@article{peng_sim--real_2018, - title = {Sim-to-{Real} {Transfer} of {Robotic} {Control} with {Dynamics} {Randomization}}, - doi = {10.1109/ICRA.2018.8460528}, - abstract = {Simulations are attractive environments for training agents as they provide an abundant source of data and alleviate certain safety concerns during the training process. But the behaviours developed by agents in simulation are often specific to the characteristics of the simulator. Due to modeling error, strategies that are successful in simulation may not transfer to their real world counterparts. In this paper, we demonstrate a simple method to bridge this "reality gap". By randomizing the dynamics of the simulator during training, we are able to develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained. This adaptivity enables the policies to generalize to the dynamics of the real world without any training on the physical system. Our approach is demonstrated on an object pushing task using a robotic arm. Despite being trained exclusively in simulation, our policies are able to maintain a similar level of performance when deployed on a real robot, reliably moving an object to a desired location from random initial configurations. 
We explore the impact of various design decisions and show that the resulting policies are robust to significant calibration error.}, - urldate = {2019-05-22}, - journal = {IEEE International Conference on Robotics and Automation (ICRA)}, - author = {Peng, Xue Bin and Andrychowicz, Marcin and Zaremba, Wojciech and Abbeel, Pieter}, - month = may, - year = {2018}, - note = {arXiv: 1710.06537}, - keywords = {Computer Science - Robotics, Computer Science - Systems and Control}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/2MBLKJ24/1710.html:text/html;Peng et al_2018_Sim-to-Real Transfer of Robotic Control with Dynamics Randomization.pdf:/home/dferigo/Zotero/storage/BUBU4LU3/Peng et al_2018_Sim-to-Real Transfer of Robotic Control with Dynamics Randomization.pdf:application/pdf}, +@article{romualdi_modeling_2021, + title = {Modeling of {Visco}-{Elastic} {Environments} for {Humanoid} {Robot} {Motion} {Control}}, + volume = {6}, + issn = {2377-3766, 2377-3774}, + url = {http://arxiv.org/abs/2105.14622}, + doi = {10.1109/LRA.2021.3067589}, + abstract = {This manuscript presents a model of compliant contacts for time-critical humanoid robot motion control. The proposed model considers the environment as a continuum of spring-damper systems, which allows us to compute the equivalent contact force and torque that the environment exerts on the contact surface. We show that the proposed model extends the linear and rotational springs and dampers - classically used to characterize soft terrains - to the case of large contact surface orientations. The contact model is then used for the real-time whole-body control of humanoid robots walking on visco-elastic environments. The overall approach is validated by simulating walking motions of the iCub humanoid robot. Furthermore, the paper compares the proposed whole-body control strategy and state of the art approaches. In this respect, we investigate the terrain compliance that makes the classical approaches assuming rigid contacts fail. We finally analyze the robustness of the presented control design with respect to non-parametric uncertainty in the contact-model.}, + number = {3}, + urldate = {2022-07-26}, + journal = {IEEE Robotics and Automation Letters}, + author = {Romualdi, Giulio and Dafarra, Stefano and Pucci, Daniele}, + month = jul, + year = {2021}, + note = {arXiv:2105.14622 [cs]}, + keywords = {Computer Science - Robotics}, + pages = {4289--4296}, + file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/PEPFYDNZ/2105.html:text/html;Romualdi et al_2021_Modeling of Visco-Elastic Environments for Humanoid Robot Motion Control.pdf:/home/dferigo/Zotero/storage/2VQZDSNI/Romualdi et al_2021_Modeling of Visco-Elastic Environments for Humanoid Robot Motion Control.pdf:application/pdf}, } -@article{zhao_sim--real_2020, - title = {Sim-to-{Real} {Transfer} in {Deep} {Reinforcement} {Learning} for {Robotics}: a {Survey}}, - shorttitle = {Sim-to-{Real} {Transfer} in {Deep} {Reinforcement} {Learning} for {Robotics}}, - url = {http://arxiv.org/abs/2009.13303}, - abstract = {Deep reinforcement learning has recently seen huge success across multiple areas in the robotics domain. Owing to the limitations of gathering real-world data, i.e., sample inefficiency and the cost of collecting it, simulation environments are utilized for training the different agents. This not only aids in providing a potentially infinite data source, but also alleviates safety concerns with real robots. 
Nonetheless, the gap between the simulated and real worlds degrades the performance of the policies once the models are transferred into real robots. Multiple research efforts are therefore now being directed towards closing this sim-toreal gap and accomplish more efficient policy transfer. Recent years have seen the emergence of multiple methods applicable to different domains, but there is a lack, to the best of our knowledge, of a comprehensive review summarizing and putting into context the different methods. In this survey paper, we cover the fundamental background behind sim-to-real transfer in deep reinforcement learning and overview the main methods being utilized at the moment: domain randomization, domain adaptation, imitation learning, meta-learning and knowledge distillation. We categorize some of the most relevant recent works, and outline the main application scenarios. Finally, we discuss the main opportunities and challenges of the different approaches and point to the most promising directions.}, +@phdthesis{traversaro_modelling_2017, + title = {Modelling, {Estimation} and {Identification} of {Humanoid} {Robots} {Dynamics}}, + url = {https://github.com/traversaro/traversaro-phd-thesis}, + urldate = {2020-01-05}, + author = {Traversaro, Silvio}, + year = {2017}, +} + +@article{farley_how_2022, + title = {How to pick a mobile robot simulator: {A} quantitative comparison of {CoppeliaSim}, {Gazebo}, {MORSE} and {Webots} with a focus on the accuracy of motion simulations}, + issn = {1569190X}, + shorttitle = {How to pick a mobile robot simulator}, + url = {https://linkinghub.elsevier.com/retrieve/pii/S1569190X22001046}, + doi = {10.1016/j.simpat.2022.102629}, + abstract = {The number of available tools for dynamic simulation of robots has been growing rapidly in recent years. However, to the best of our knowledge, there are very few reported quantitative comparisons of the most widelyused robot simulation tools. This article attempts to partly fill this gap by providing quantitative and objective comparisons of four widely-used simulation packages for mobile robots. The comparisons reported here were conducted by obtaining data from a real Husky A200 mobile robot driving on mixed terrains as ground truth and by simulating a 3D mobile robot model in a developed identical simulation world of these terrains for each simulator. We then compared the simulation outputs with real, measured results by weighted metrics. 
Based on our experiments and selected metrics, we conclude that CoppeliaSim is currently the best performing simulator, although Gazebo is not far behind and is a good alternative.}, language = {en}, - urldate = {2020-10-02}, - journal = {arXiv:2009.13303 [cs]}, - author = {Zhao, Wenshuai and Queralta, Jorge Peña and Westerlund, Tomi}, - month = sep, - year = {2020}, - note = {arXiv: 2009.13303}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, - file = {Zhao et al_2020_Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics.pdf:/home/dferigo/Zotero/storage/BA2MFGQZ/Zhao et al_2020_Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics.pdf:application/pdf}, + urldate = {2022-07-12}, + journal = {Simulation Modelling Practice and Theory}, + author = {Farley, Andrew and Wang, Jie and Marshall, Joshua A.}, + month = jul, + year = {2022}, + pages = {102629}, + file = {Farley et al_2022_How to pick a mobile robot simulator.pdf:/home/dferigo/Zotero/storage/5MEB9LW7/Farley et al_2022_How to pick a mobile robot simulator.pdf:application/pdf}, } -@misc{nvidia_nvidia_2011, - title = {Nvidia {PhysX}}, - url = {https://developer.nvidia.com/physx-sdk}, - author = {{NVIDIA}}, - year = {2011}, +@article{korber_comparing_2021-1, + title = {Comparing {Popular} {Simulation} {Environments} in the {Scope} of {Robotics} and {Reinforcement} {Learning}}, + url = {http://arxiv.org/abs/2103.04616}, + abstract = {This letter compares the performance of four different, popular simulation environments for robotics and reinforcement learning (RL) through a series of benchmarks. The benchmarked scenarios are designed carefully with current industrial applications in mind. Given the need to run simulations as fast as possible to reduce the real-world training time of the RL agents, the comparison includes not only different simulation environments but also different hardware configurations, ranging from an entry-level notebook up to a dual CPU high performance server. We show that the chosen simulation environments benefit the most from single core performance. Yet, using a multi core system, multiple simulations could be run in parallel to increase the performance.}, + urldate = {2022-04-20}, + journal = {arXiv:2103.04616 [cs]}, + author = {Körber, Marian and Lange, Johann and Rediske, Stephan and Steinmann, Simon and Glück, Roland}, + month = mar, + year = {2021}, + note = {arXiv: 2103.04616}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence}, + file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/43R53FNN/2103.html:text/html;Körber et al_2021_Comparing Popular Simulation Environments in the Scope of Robotics and.pdf:/home/dferigo/Zotero/storage/75YCB5Z3/Körber et al_2021_Comparing Popular Simulation Environments in the Scope of Robotics and.pdf:application/pdf}, } -@misc{nvidia_nvidia_2018, - title = {Nvidia {Isaac}}, - url = {https://developer.nvidia.com/isaac-sdk}, - author = {{NVIDIA}}, +@misc{achiam_spinning_2018, + title = {Spinning {Up} in {Deep} {RL}}, + url = {https://spinningup.openai.com}, + language = {en}, + urldate = {2022-06-23}, + author = {Achiam, Josh}, year = {2018}, + file = {Welcome to Spinning Up in Deep RL! 
— Spinning Up documentation:/home/dferigo/Zotero/storage/WYUQEUMW/latest.html:text/html}, } -@article{innes_differentiable_2019, - title = {A {Differentiable} {Programming} {System} to {Bridge} {Machine} {Learning} and {Scientific} {Computing}}, - url = {http://arxiv.org/abs/1907.07587}, - abstract = {Scientific computing is increasingly incorporating the advancements in machine learning and the ability to work with large amounts of data. At the same time, machine learning models are becoming increasingly sophisticated and exhibit many features often seen in scientific computing, stressing the capabilities of machine learning frameworks. Just as the disciplines of scientific computing and machine learning have shared common underlying infrastructure in the form of numerical linear algebra, we now have the opportunity to further share new computational infrastructure, and thus ideas, in the form of Differentiable Programming. We describe Zygote, a Differentiable Programming system that is able to take gradients of general program structures. We implement this system in the Julia programming language. Our system supports almost all language constructs (control flow, recursion, mutation, etc.) and compiles high-performance code without requiring any user intervention or refactoring to stage computations. This enables an expressive programming model for deep learning, but more importantly, it enables us to incorporate a large ecosystem of libraries in our models in a straightforward way. We discuss our approach to automatic differentiation, including its support for advanced techniques such as mixed-mode, complex and checkpointed differentiation, and present several examples of differentiating programs.}, - urldate = {2019-09-14}, - journal = {arXiv:1907.07587 [cs]}, - author = {Innes, Mike and Edelman, Alan and Fischer, Keno and Rackauckas, Chris and Saba, Elliot and Shah, Viral B. and Tebbutt, Will}, - month = jul, - year = {2019}, - note = {arXiv: 1907.07587}, - keywords = {Computer Science - Machine Learning, Computer Science - Programming Languages}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/GUIDB8WM/1907.html:text/html;Innes et al_2019_A Differentiable Programming System to Bridge Machine Learning and Scientific.pdf:/home/dferigo/Zotero/storage/5DV2YZ5K/Innes et al_2019_A Differentiable Programming System to Bridge Machine Learning and Scientific.pdf:application/pdf}, +@misc{carpentier_pinocchio_2015, + title = {Pinocchio: fast forward and inverse dynamics for poly-articulated systems}, + url = {https://stack-of-tasks.github.io/pinocchio}, + author = {Carpentier, Justin and Valenza, Florian and Mansard, Nicolas}, + year = {2015}, } -@inproceedings{carpentier_analytical_2018, - title = {Analytical {Derivatives} of {Rigid} {Body} {Dynamics} {Algorithms}}, - isbn = {978-0-9923747-4-7}, - url = {http://www.roboticsproceedings.org/rss14/p38.pdf}, - doi = {10.15607/RSS.2018.XIV.038}, - abstract = {Rigid body dynamics is a well-established frame-work in robotics. It can be used to expose the analytic form of kinematic and dynamic functions of the robot model. So far, two major algorithms, namely the recursive Newton-Euler algorithm (RNEA) and the articulated body algorithm (ABA), have been proposed to compute the inverse dynamics and the forward dynamics in a few microseconds. Evaluating their derivatives is an important challenge for various robotic applications (optimal control, estimation, co-design or reinforcement learning). 
However it remains time consuming, whether using finite differences or automatic differentiation. In this paper, we propose new algorithms to efficiently compute them thanks to closed-form formulations. Using the chain rule and adequate algebraic differentiation of spatial algebra, we firstly differentiate explicitly RNEA. Then, using properties about the derivative of function composition, we show that the same algorithm can also be used to compute the derivatives of ABA with a marginal additional cost. For this purpose, we introduce a new algorithm to compute the inverse of the joint-space inertia matrix, without explicitly computing the matrix itself. All the algorithms are implemented in our open-source C++ framework called Pinocchio. Benchmarks show computational costs varying between 3 microseconds (for a 7-dof arm) up to 17 microseconds (for a 36-dof humanoid), outperforming the alternative approaches of the state of the art.}, - language = {en}, - urldate = {2021-05-18}, - booktitle = {Robotics: {Science} and {Systems} {XIV}}, - publisher = {Robotics: Science and Systems Foundation}, - author = {Carpentier, Justin and Mansard, Nicolas}, - month = jun, - year = {2018}, - file = {Carpentier_Mansard_2018_Analytical Derivatives of Rigid Body Dynamics Algorithms.pdf:/home/dferigo/Zotero/storage/8XI5XBSA/Carpentier_Mansard_2018_Analytical Derivatives of Rigid Body Dynamics Algorithms.pdf:application/pdf}, +@inproceedings{carpentier_pinocchio_2019, + title = {The {Pinocchio} {C}++ library : {A} fast and flexible implementation of rigid body dynamics algorithms and their analytical derivatives}, + shorttitle = {The {Pinocchio} {C}++ library}, + doi = {10.1109/SII.2019.8700380}, + abstract = {We introduce Pinocchio, an open-source software framework that implements rigid body dynamics algorithms and their analytical derivatives. Pinocchio does not only include standard algorithms employed in robotics (e.g., forward and inverse dynamics) but provides additional features essential for the control, the planning and the simulation of robots. In this paper, we describe these features and detail the programming patterns and design which make Pinocchio efficient. We evaluate the performances against RBDL, another framework with broad dissemination inside the robotics community. 
We also demonstrate how the source code generation embedded in Pinocchio outperforms other approaches of state of the art.}, + booktitle = {2019 {IEEE}/{SICE} {International} {Symposium} on {System} {Integration} ({SII})}, + author = {Carpentier, Justin and Saurel, Guilhem and Buondonno, Gabriele and Mirabel, Joseph and Lamiraux, Florent and Stasse, Olivier and Mansard, Nicolas}, + month = jan, + year = {2019}, + note = {ISSN: 2474-2325}, + keywords = {Computational modeling, Kinematics, Robot kinematics, Heuristic algorithms, Libraries, Software algorithms}, + pages = {614--619}, + file = {Carpentier et al_2019_The Pinocchio C++ library.pdf:/home/dferigo/Zotero/storage/8XHJM8IS/Carpentier et al_2019_The Pinocchio C++ library.pdf:application/pdf;IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/HGDTSVGR/8700380.html:text/html}, } -@article{singh_efficient_2022, - title = {Efficient {Analytical} {Derivatives} of {Rigid}-{Body} {Dynamics} using {Spatial} {Vector} {Algebra}}, - volume = {7}, - issn = {2377-3766, 2377-3774}, - url = {http://arxiv.org/abs/2105.05102}, - doi = {10.1109/LRA.2022.3141194}, - abstract = {An essential need for many model-based robot control algorithms is the ability to quickly and accurately compute partial derivatives of the equations of motion. State of the art approaches to this problem often use analytical methods based on the chain rule applied to existing dynamics algorithms. Although these methods are an improvement over finite differences in terms of accuracy, they are not always the most efficient. In this paper, we contribute new closed-form expressions for the firstorder partial derivatives of inverse dynamics, leading to a recursive algorithm. The algorithm is benchmarked against chain-rule approaches in Fortran and against an existing algorithm from the Pinocchio library in C++. Tests consider computing the partial derivatives of inverse and forward dynamics for robots ranging from kinematic chains to humanoids and quadrupeds. Compared to the previous open-source Pinocchio implementation, our new analytical results uncover a key computational restructuring that enables efficiency gains. Speedups of up to 1.4x are reported for calculating the partial derivatives of inverse dynamics for the 50-dof Talos humanoid.}, - language = {en}, - number = {2}, - urldate = {2022-04-20}, - journal = {IEEE Robotics and Automation Letters}, - author = {Singh, Shubham and Russell, Ryan P. and Wensing, Patrick M.}, - month = apr, - year = {2022}, - note = {arXiv: 2105.05102}, - keywords = {Computer Science - Robotics}, - pages = {1776--1783}, - file = {Singh et al_2022_Efficient Analytical Derivatives of Rigid-Body Dynamics using Spatial Vector.pdf:/home/dferigo/Zotero/storage/EU2RG38L/Singh et al_2022_Efficient Analytical Derivatives of Rigid-Body Dynamics using Spatial Vector.pdf:application/pdf}, -} - -@article{rackauckas_universal_2021, - title = {Universal {Differential} {Equations} for {Scientific} {Machine} {Learning}}, - url = {http://arxiv.org/abs/2001.04385}, - abstract = {In the context of science, the well-known adage “a picture is worth a thousand words” might well be “a model is worth a thousand datasets.” In this manuscript we introduce the SciML software ecosystem as a tool for mixing the information of physical laws and scientific models with data-driven machine learning approaches. We describe a mathematical object, which we denote universal differential equations (UDEs), as the unifying framework connecting the ecosystem. 
We show how a wide variety of applications, from automatically discovering biological mechanisms to solving high-dimensional Hamilton-Jacobi-Bellman equations, can be phrased and efficiently handled through the UDE formalism and its tooling. We demonstrate the generality of the software tooling to handle stochasticity, delays, and implicit constraints. This funnels the wide variety of SciML applications into a core set of training mechanisms which are highly optimized, stabilized for stiff equations, and compatible with distributed parallelism and GPU accelerators.}, +@article{sola_quaternion_2017, + title = {Quaternion kinematics for the error-state {Kalman} filter}, language = {en}, - urldate = {2022-04-20}, - journal = {arXiv:2001.04385 [cs, math, q-bio, stat]}, - author = {Rackauckas, Christopher and Ma, Yingbo and Martensen, Julius and Warner, Collin and Zubov, Kirill and Supekar, Rohit and Skinner, Dominic and Ramadhan, Ali and Edelman, Alan}, - month = nov, - year = {2021}, - note = {arXiv: 2001.04385}, - keywords = {Computer Science - Machine Learning, Statistics - Machine Learning, Mathematics - Dynamical Systems, Quantitative Biology - Quantitative Methods}, - file = {Rackauckas et al_2021_Universal Differential Equations for Scientific Machine Learning.pdf:/home/dferigo/Zotero/storage/3ALYEQ65/Rackauckas et al_2021_Universal Differential Equations for Scientific Machine Learning.pdf:application/pdf}, + author = {Sola, Joan}, + year = {2017}, + pages = {93}, } -@inproceedings{belbute-peres_end--end_2018, - title = {End-to-{End} {Differentiable} {Physics} for {Learning} and {Control}}, - language = {en}, - author = {Belbute-Peres, Filipe de A and Smith, Kevin A and Allen, Kelsey R and Tenenbaum, Joshua B and Kolter, J Zico}, +@book{sutton_reinforcement_2018, + address = {Cambridge, MA}, + edition = {Second edition}, + series = {Adaptive computation and machine learning series}, + title = {Reinforcement learning: an introduction}, + isbn = {978-0-262-03924-6}, + shorttitle = {Reinforcement learning}, + abstract = {"Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives while interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the field's key ideas and algorithms."--}, + publisher = {The MIT Press}, + author = {Sutton, Richard S. and Barto, Andrew G.}, year = {2018}, - pages = {12}, - file = {Belbute-Peres et al_2018_End-to-End Differentiable Physics for Learning and Control.pdf:/home/dferigo/Zotero/storage/YX2WUZLM/Belbute-Peres et al_2018_End-to-End Differentiable Physics for Learning and Control.pdf:application/pdf}, + keywords = {Reinforcement learning}, } -@inproceedings{kim_survey_2021, - title = {A {Survey} on {Simulation} {Environments} for {Reinforcement} {Learning}}, - doi = {10.1109/UR52253.2021.9494694}, - abstract = {Most of the recent studies of reinforcement learning and robotics basically employ computer simulation due to the advantages of time and cost. For this reason, users have to spare time for investigation in order to choose optimal environment for their purposes. This paper presents a survey result that can be a guidance in user’s choice for simulation environments. 
The investigation result includes features, brief historical backgrounds, license policies and formats for robot and object description of the eight most popular environments in robot RL studies. We also propose a quantitative evaluation method for those simulation environments considering the features and a pragmatic point of view.}, - booktitle = {2021 18th {International} {Conference} on {Ubiquitous} {Robots} ({UR})}, - author = {Kim, Taewoo and Jang, Minsu and Kim, Jaehong}, - month = jul, - year = {2021}, - note = {ISSN: 2325-033X}, - keywords = {Reinforcement learning, Analytical models, Software, Computer simulation, Lead, Licenses, Rendering (computer graphics)}, - pages = {63--67}, - file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/K5USS9RE/9494694.html:text/html;Kim et al_2021_A Survey on Simulation Environments for Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/2HCEHSZB/Kim et al_2021_A Survey on Simulation Environments for Reinforcement Learning.pdf:application/pdf}, +@book{dong_deep_2020, + address = {Singapore}, + title = {Deep {Reinforcement} {Learning}: {Fundamentals}, {Research} and {Applications}}, + isbn = {9789811540943 9789811540950}, + shorttitle = {Deep {Reinforcement} {Learning}}, + url = {http://link.springer.com/10.1007/978-981-15-4095-0}, + language = {en}, + urldate = {2022-02-24}, + publisher = {Springer Singapore}, + editor = {Dong, Hao and Ding, Zihan and Zhang, Shanghang}, + year = {2020}, + doi = {10.1007/978-981-15-4095-0}, + file = {Dong et al_2020_Deep Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/DNN9KNNP/Dong et al_2020_Deep Reinforcement Learning.pdf:application/pdf}, } -@article{korber_comparing_2021, - title = {Comparing {Popular} {Simulation} {Environments} in the {Scope} of {Robotics} and {Reinforcement} {Learning}}, - url = {http://arxiv.org/abs/2103.04616}, - abstract = {This letter compares the performance of four different, popular simulation environments for robotics and reinforcement learning (RL) through a series of benchmarks. The benchmarked scenarios are designed carefully with current industrial applications in mind. Given the need to run simulations as fast as possible to reduce the real-world training time of the RL agents, the comparison includes not only different simulation environments but also different hardware configurations, ranging from an entry-level notebook up to a dual CPU high performance server. We show that the chosen simulation environments benefit the most from single core performance. 
Yet, using a multi core system, multiple simulations could be run in parallel to increase the performance.}, +@book{warner_foundations_1983, + title = {Foundations of {Differentiable} {Manifolds} and {Lie} {Groups}}, + volume = {94}, + url = {https://link.springer.com/book/10.1007/978-1-4757-1799-0}, language = {en}, - urldate = {2022-04-20}, - journal = {arXiv:2103.04616 [cs]}, - author = {Körber, Marian and Lange, Johann and Rediske, Stephan and Steinmann, Simon and Glück, Roland}, - month = mar, - year = {2021}, - note = {arXiv: 2103.04616}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence}, - file = {Körber et al_2021_Comparing Popular Simulation Environments in the Scope of Robotics and.pdf:/home/dferigo/Zotero/storage/LWVBUMDD/Körber et al_2021_Comparing Popular Simulation Environments in the Scope of Robotics and.pdf:application/pdf}, + urldate = {2022-06-23}, + publisher = {Springer Science \& Business Media}, + author = {Warner, Frank W.}, + year = {1983}, + file = {Snapshot:/home/dferigo/Zotero/storage/TRCEMDKF/978-1-4757-1799-0.html:text/html}, } -@article{baydin_automatic_2018, - title = {Automatic {Differentiation} in {Machine} {Learning}: a {Survey}}, - abstract = {Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply “autodiff”, is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences, and engineering design optimization. Until very recently, the fields of machine learning and AD have largely been unaware of each other and, in some cases, have independently discovered each other’s results. Despite its relevance, general-purpose AD has been missing from the machine learning toolbox, a situation slowly changing with its ongoing adoption under the names “dynamic computational graphs” and “differentiable programming”. We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. 
By precisely defining the main differentiation techniques and their interrelationships, we aim to bring clarity to the usage of the terms “autodiff”, “automatic differentiation”, and “symbolic differentiation” as these are encountered more and more in machine learning settings.}, +@book{selig_geometric_2005, + title = {Geometric {Fundamentals} of {Robotics}}, + volume = {128}, + url = {https://link.springer.com/book/10.1007/b138859}, language = {en}, - author = {Baydin, Atılım Gunes and Pearlmutter, Barak A and Radul, Alexey Andreyevich and Siskind, Jeffrey Mark}, - year = {2018}, - pages = {43}, - file = {Baydin et al_Automatic Differentiation in Machine Learning.pdf:/home/dferigo/Zotero/storage/9SE98F69/Baydin et al_Automatic Differentiation in Machine Learning.pdf:application/pdf}, + urldate = {2022-06-23}, + publisher = {Springer}, + author = {Selig, Jon M.}, + year = {2005}, + file = {Snapshot:/home/dferigo/Zotero/storage/22L6Y9MA/b138859.html:text/html}, } -@inproceedings{xia_gibson_2018, - title = {Gibson {Env}: {Real}-{World} {Perception} for {Embodied} {Agents}}, - isbn = {978-1-5386-6420-9}, - shorttitle = {Gibson {Env}}, - doi = {10.1109/CVPR.2018.00945}, +@book{maruskin_dynamical_2018, + title = {Dynamical {Systems} and {Geometric} {Mechanics}: {An} {Introduction}}, + isbn = {978-3-11-059780-6}, + shorttitle = {Dynamical {Systems} and {Geometric} {Mechanics}}, + url = {https://www.degruyter.com/document/doi/10.1515/9783110597806/html}, language = {en}, - urldate = {2019-05-21}, - booktitle = {{IEEE}/{CVF} {Conference} on {Computer} {Vision} and {Pattern} {Recognition}}, - author = {Xia, Fei and Zamir, Amir R. and He, Zhiyang and Sax, Alexander and Malik, Jitendra and Savarese, Silvio}, - month = jun, + urldate = {2022-03-30}, + publisher = {De Gruyter}, + author = {Maruskin, Jared}, + month = aug, year = {2018}, - file = {Xia et al_2018_Gibson Env.pdf:/home/dferigo/Zotero/storage/X7M9GV3X/Xia et al_2018_Gibson Env.pdf:application/pdf}, + doi = {10.1515/9783110597806}, + file = {Maruskin_2018_Dynamical Systems and Geometric Mechanics.pdf:/home/dferigo/Zotero/storage/E95RJ763/Maruskin_2018_Dynamical Systems and Geometric Mechanics.pdf:application/pdf}, } -@article{ramos_bayessim_2019, - title = {{BayesSim}: adaptive domain randomization via probabilistic inference for robotics simulators}, - shorttitle = {{BayesSim}}, - abstract = {We introduce BayesSim 1, a framework for robotics simulations allowing a full Bayesian treatment for the parameters of the simulator. As simulators become more sophisticated and able to represent the dynamics more accurately, fundamental problems in robotics such as motion planning and perception can be solved in simulation and solutions transferred to the physical robot. However, even the most complex simulator might still not be able to represent reality in all its details either due to inaccurate parametrization or simplistic assumptions in the dynamic models. BayesSim provides a principled framework to reason about the uncertainty of simulation parameters. Given a black box simulator (or generative model) that outputs trajectories of state and action pairs from unknown simulation parameters, followed by trajectories obtained with a physical robot, we develop a likelihood-free inference method that computes the posterior distribution of simulation parameters. This posterior can then be used in problems where Sim2Real is critical, for example in policy search. 
We compare the performance of BayesSim in obtaining accurate posteriors in a number of classical control and robotics problems. Results show that the posterior computed from BayesSim can be used for domain randomization outperforming alternative methods that randomize based on uniform priors.}, - language = {en}, - urldate = {2019-08-09}, - author = {Ramos, Fabio and Possas, Rafael Carvalhaes and Fox, Dieter}, - month = jun, - year = {2019}, - note = {arXiv: 1906.01728}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics}, - file = {Ramos et al_2019_BayesSim.pdf:/home/dferigo/Zotero/storage/2GPXV65N/Ramos et al_2019_BayesSim.pdf:application/pdf}, +@book{bullo_geometric_2004, + title = {Geometric {Control} of {Mechanical} {Systems}: {Modeling}, {Analysis}, and {Design} for {Simple} {Mechanical} {Control} {Systems}}, + volume = {49}, + publisher = {Springer Science \& Business Media}, + author = {Bullo, Francesco and Lewis, Andrew D.}, + year = {2004}, + file = {Bullo_Lewis_2004_Geometric Control of Mechanical Systems.pdf:/home/dferigo/Zotero/storage/AYJJLTUV/Bullo_Lewis_2004_Geometric Control of Mechanical Systems.pdf:application/pdf}, } -@inproceedings{parker_optix_2010, - title = {{OptiX}: {A} {General} {Purpose} {Ray} {Tracing} {Engine}}, - isbn = {978-1-4503-0210-4}, - shorttitle = {{OptiX}}, - doi = {10.1145/1833349.1778803}, - abstract = {The NVIDIA® OptiX™ ray tracing engine is a programmable system designed for NVIDIA GPUs and other highly parallel architectures. The OptiX engine builds on the key observation that most ray tracing algorithms can be implemented using a small set of programmable operations. Consequently, the core of OptiX is a domain-specific just-in-time compiler that generates custom ray tracing kernels by combining user-supplied programs for ray generation, material shading, object intersection, and scene traversal. This enables the implementation of a highly diverse set of ray tracing-based algorithms and applications, including interactive rendering, offline rendering, collision detection systems, artificial intelligence queries, and scientific simulations such as sound propagation. OptiX achieves high performance through a compact object model and application of several ray tracing-specific compiler optimizations. For ease of use it exposes a single-ray programming model with full support for recursion and a dynamic dispatch mechanism similar to virtual function calls.}, - urldate = {2019-05-22}, - author = {Parker, Steven G. and Bigler, James and Dietrich, Andreas and Friedrich, Heiko and Hoberock, Jared and Luebke, David and McAllister, David and McGuire, Morgan and Morley, Keith and Robison, Austin and Stich, Martin}, - year = {2010}, - keywords = {graphics hardware, graphics systems, ray tracing}, +@inproceedings{nava_stability_2016, + title = {Stability analysis and design of momentum-based controllers for humanoid robots}, + doi = {10.1109/IROS.2016.7759126}, + abstract = {Envisioned applications for humanoid robots call for the design of balancing and walking controllers. While promising results have been recently achieved, robust and reliable controllers are still a challenge for the control community dealing with humanoid robotics. Momentum-based strategies have proven their effectiveness for controlling humanoids balancing, but the stability analysis of these controllers is still missing. The contribution of this paper is twofold. 
First, we numerically show that the application of state-of-the-art momentum-based control strategies may lead to unstable zero dynamics. Secondly, we propose simple modifications to the control architecture that avoid instabilities at the zero-dynamics level. Asymptotic stability of the closed loop system is shown by means of a Lyapunov analysis on the linearized system's joint space. The theoretical results are validated with both simulations and experiments on the iCub humanoid robot.}, + booktitle = {2016 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems} ({IROS})}, + author = {Nava, Gabriele and Romano, Francesco and Nori, Francesco and Pucci, Daniele}, + month = oct, + year = {2016}, + note = {ISSN: 2153-0866}, + keywords = {Humanoid robots, Stability analysis, Legged locomotion, Robot kinematics, humanoid robots, legged locomotion, Acceleration, asymptotic stability, Asymptotic stability, balancing controller design, closed loop system asymptotic stability, closed loop systems, control system synthesis, iCub humanoid robot, linearisation techniques, linearized system joint space, Lyapunov analysis, Lyapunov methods, momentum, momentum-based controller design, robust control, robust controllers, stability analysis, unstable zero dynamics, walking controller design}, + pages = {680--687}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/2QU4BGKY/7759126.html:text/html;Nava et al_2016_Stability analysis and design of momentum-based controllers for humanoid robots.pdf:/home/dferigo/Zotero/storage/XPQJX7AK/Nava et al_2016_Stability analysis and design of momentum-based controllers for humanoid robots.pdf:application/pdf}, } -@article{lopez_gym-gazebo2_2019, - title = {gym-gazebo2, a toolkit for reinforcement learning using {ROS} 2 and {Gazebo}}, - url = {http://arxiv.org/abs/1903.06278}, - abstract = {This paper presents an upgraded, real world application oriented version of gym-gazebo, the Robot Operating System (ROS) and Gazebo based Reinforcement Learning (RL) toolkit, which complies with OpenAI Gym. The content discusses the new ROS 2 based software architecture and summarizes the results obtained using Proximal Policy Optimization (PPO). Ultimately, the output of this work presents a benchmarking system for robotics that allows different techniques and algorithms to be compared using the same virtual conditions. We have evaluated environments with different levels of complexity of the Modular Articulated Robotic Arm (MARA), reaching accuracies in the millimeter scale. 
The converged results show the feasibility and usefulness of the gym-gazebo 2 toolkit, its potential and applicability in industrial use cases, using modular robots.}, - urldate = {2019-03-18}, - journal = {arXiv:1903.06278 [cs]}, - author = {Lopez, Nestor Gonzalez and Nuin, Yue Leire Erro and Moral, Elias Barba and Juan, Lander Usategui San and Rueda, Alejandro Solano and Vilches, Víctor Mayoral and Kojcev, Risto}, - month = mar, - year = {2019}, - note = {arXiv: 1903.06278}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/ISB6DK6W/1903.html:text/html;Lopez et al_2019_gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo.pdf:/home/dferigo/Zotero/storage/QVQB9GJE/Lopez et al_2019_gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo.pdf:application/pdf}, +@inproceedings{yang_emergence_2017, + title = {Emergence of human-comparable balancing behaviours by deep reinforcement learning}, + doi = {10.1109/HUMANOIDS.2017.8246900}, + abstract = {This paper presents a hierarchical framework based on deep reinforcement learning that naturally acquires control policies that are capable of performing balancing behaviours such as ankle push-offs for humanoid robots, without explicit human design of controllers. Only the reward for training the neural network is specifically formulated based on the physical principles and quantities, and hence explainable. The successful emergence of human-comparable behaviours through the deep reinforcement learning demonstrates the feasibility of using an AI-based approach for humanoid motion control in a unified framework. Moreover, the balance strategies learned by reinforcement learning provides a larger range of disturbance rejection than that of the zero moment point based methods, suggesting a research direction of using learning-based controls to explore the optimal performance.}, + booktitle = {2017 {IEEE}-{RAS} 17th {International} {Conference} on {Humanoid} {Robotics} ({Humanoids})}, + author = {Yang, Chuanyu and Komura, Taku and Li, Zhibin}, + month = nov, + year = {2017}, + note = {ISSN: 2164-0580}, + keywords = {Legged locomotion, humanoid robots, learning (artificial intelligence), motion control, robot programming, Machine learning, Foot, control engineering computing, control policies, deep reinforcement learning, human-comparable balancing behaviours, humanoid motion control, Pelvis, Torso, zero moment point}, + pages = {372--377}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/4CTDBA8W/8246900.html:text/html;Yang et al_2017_Emergence of human-comparable balancing behaviours by deep reinforcement.pdf:/home/dferigo/Zotero/storage/WB86I9KP/Yang et al_2017_Emergence of human-comparable balancing behaviours by deep reinforcement.pdf:application/pdf}, } -@article{juliani_unity_2018, - title = {Unity: {A} {General} {Platform} for {Intelligent} {Agents}}, - shorttitle = {Unity}, - url = {http://arxiv.org/abs/1809.02627}, - abstract = {Recent advances in Deep Reinforcement Learning and Robotics have been driven by the presence of increasingly realistic and complex simulation environments. Many of the existing platforms, however, provide either unrealistic visuals, inaccurate physics, low task complexity, or a limited capacity for interaction among artificial agents. 
Furthermore, many platforms lack the ability to flexibly configure the simulation, hence turning the simulation environment into a black-box from the perspective of the learning system. Here we describe a new open source toolkit for creating and interacting with simulation environments using the Unity platform: Unity ML-Agents Toolkit. By taking advantage of Unity as a simulation platform, the toolkit enables the development of learning environments which are rich in sensory and physical complexity, provide compelling cognitive challenges, and support dynamic multi-agent interaction. We detail the platform design, communication protocol, set of example environments, and variety of training scenarios made possible via the toolkit.}, - urldate = {2019-05-03}, - journal = {arXiv:1809.02627 [cs, stat]}, - author = {Juliani, Arthur and Berges, Vincent-Pierre and Vckay, Esh and Gao, Yuan and Henry, Hunter and Mattar, Marwan and Lange, Danny}, - month = sep, - year = {2018}, - note = {arXiv: 1809.02627}, - keywords = {Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning, Computer Science - Neural and Evolutionary Computing}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/DT6IGGHT/1809.html:text/html;Juliani et al_2018_Unity.pdf:/home/dferigo/Zotero/storage/DBUW2T99/Juliani et al_2018_Unity.pdf:application/pdf}, +@incollection{cangelosi_robot_2022, + address = {Cambridge, MA, USA}, + series = {Intelligent {Robotics} and {Autonomous} {Agents} series}, + title = {Robot {Platforms} and {Simulators}}, + copyright = {All rights reserved}, + isbn = {978-0-262-04683-1}, + abstract = {The current state of the art in cognitive robotics, covering the challenges of building AI-powered intelligent robots inspired by natural cognitive systems.}, + language = {en}, + booktitle = {Cognitive {Robotics}}, + publisher = {MIT Press}, + author = {Ferigo, Diego and Parmiggiani, Alberto and Rampone, Elena and Tikhanoff, Vadim and Traversaro, Silvio and Pucci, Daniele and Natale, Lorenzo}, + editor = {Cangelosi, Angelo and Asada, Minoru and Arkin, Ronald C.}, + month = may, + year = {2022}, + file = {Open Access:/home/dferigo/Zotero/storage/MKCT3PSL/Cognitive-Robotics.html:text/html}, } -@inproceedings{krammer_standardized_2019, - title = {Standardized {Integration} of {Real}-{Time} and {Non}-{Real}-{Time} {Systems}: {The} {Distributed} {Co}-{Simulation} {Protocol}}, - shorttitle = {Standardized {Integration} of {Real}-{Time} and {Non}-{Real}-{Time} {Systems}}, - doi = {10.3384/ecp1915787}, - abstract = {Co-simulation techniques have evolved significantly over the last 10 years. System simulation and hardware-in-the-loop testing are used to develop complex products in many industrial sectors. The Functional Mock-Up Interface (FMI) represents a standardized solution for integration of simulation models, tools and solvers. In practice the integration and coupling of heterogeneous systems still require enormous efforts. Until now no standardized interface or protocol specification is available, which allows the interaction of real-time and non-real-time systems of different vendors. This paper presents selected technical aspects of the novel Distributed Co-simulation Protocol (DCP) and highlights primary application possibilities. The DCP consists of a data model, a finite state machine, and a communication protocol including a set of protocol data units. It supports a master-slave architecture for simulation setup and control. 
The DCP was developed in context of the ACOSAR project and was subsequently adopted by Modelica Association as a Modelica Association Project (MAP). It may be used in numerous industrial and scientific applications. The standardization of the DCP allows for a modular and interoperable development between system providers and integrators. In the end, this will lead to more efficient product development and testing.}, +@techreport{chen_imitation_2022, + title = {Imitation {Learning} via {Differentiable} {Physics}}, + url = {http://arxiv.org/abs/2206.04873}, + abstract = {Existing imitation learning (IL) methods such as inverse reinforcement learning (IRL) usually have a double-loop training process, alternating between learning a reward function and a policy and tend to suffer long training time and high variance. In this work, we identify the benefits of differentiable physics simulators and propose a new IL method, i.e., Imitation Learning via Differentiable Physics (ILD), which gets rid of the double-loop design and achieves significant improvements in final performance, convergence speed, and stability. The proposed ILD incorporates the differentiable physics simulator as a physics prior into its computational graph for policy learning. It unrolls the dynamics by sampling actions from a parameterized policy, simply minimizing the distance between the expert trajectory and the agent trajectory, and back-propagating the gradient into the policy via temporal physics operators. With the physics prior, ILD policies can not only be transferable to unseen environment specifications but also yield higher final performance on a variety of tasks. In addition, ILD naturally forms a single-loop structure, which significantly improves the stability and training speed. To simplify the complex optimization landscape induced by temporal physics operations, ILD dynamically selects the learning objectives for each state during optimization. In our experiments, we show that ILD outperforms state-of-the-art methods in a variety of continuous control tasks with Brax, requiring only one expert demonstration. 
In addition, ILD can be applied to challenging deformable object manipulation tasks and can be generalized to unseen configurations.}, language = {en}, - urldate = {2019-08-08}, - booktitle = {13th {International} {Modelica} {Conference}}, - author = {Krammer, Martin and Schuch, Klaus and Kater, Christian and Alekeish, Khaled and Blochwitz, Torsten and Materne, Stefan and Soppa, Andreas and Benedikt, Martin}, - month = feb, - year = {2019}, - file = {Krammer et al_2019_Standardized Integration of Real-Time and Non-Real-Time Systems.pdf:/home/dferigo/Zotero/storage/B7937HYG/Krammer et al_2019_Standardized Integration of Real-Time and Non-Real-Time Systems.pdf:application/pdf}, + number = {arXiv:2206.04873}, + urldate = {2022-06-18}, + institution = {arXiv}, + author = {Chen, Siwei and Ma, Xiao and Xu, Zhongwen}, + month = jun, + year = {2022}, + note = {arXiv:2206.04873 [cs] +type: article}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence}, + file = {Chen et al_2022_Imitation Learning via Differentiable Physics.pdf:/home/dferigo/Zotero/storage/AHDG85ED/Chen et al_2022_Imitation Learning via Differentiable Physics.pdf:application/pdf}, } -@inproceedings{delhaisse_pyrobolearn_2019, - title = {{PyRoboLearn}: {A} {Python} {Framework} for {Robot} {Learning} {Practitioners}}, - abstract = {On the quest for building autonomous robots, several robot learning frameworks with different functionalities have recently been developed. Yet, frameworks that combine diverse learning paradigms (such as imitation and reinforcement learning) into a common place are scarce. Existing ones tend to be robot-specific, and often require time-consuming work to be used with other robots. Also, their architecture is often weakly structured, mainly because of a lack of modularity and flexibility. This leads users to reimplement several pieces of code to integrate them into their own experimental or benchmarking work. To overcome these issues, we introduce PyRoboLearn, a new Python robot learning framework that combines different learning paradigms into a single framework. Our framework provides a plethora of robotic environments, learning models and algorithms. PyRoboLearn is developed with a particular focus on modularity, flexibility, generality, and simplicity to favor (re)usability. This is achieved by abstracting each key concept, undertaking a modular programming approach, minimizing the coupling among the different modules, and favoring composition over inheritance for better flexibility. 
We demonstrate the different features and utility of our framework through different use cases.}, - language = {en}, - author = {Delhaisse, Brian and Rozo, Leonel and Caldwell, Darwin G}, - year = {2019}, - pages = {11}, - file = {Delhaisse et al_2019_PyRoboLearn.pdf:/home/dferigo/Zotero/storage/DP65NMCS/Delhaisse et al_2019_PyRoboLearn.pdf:application/pdf}, +@misc{bradbury_james_jax_2018, + title = {{JAX}: composable transformations of {Python}+{NumPy} programs}, + copyright = {Apache-2.0}, + shorttitle = {{JAX}}, + url = {https://github.com/google/jax}, + urldate = {2022-03-07}, + publisher = {Google}, + author = {{Bradbury, James} and {Frostig, Roy} and {Hawkins, Peter} and {Johnson, Matthew James} and {Leary, Chris} and {Maclaurin, Dougal} and {Necula, George} and {Paszke, Adam} and {VanderPlas, James} and {Wanderman-Milne, Skye} and {Zhang, Qiao}}, + year = {2018}, + note = {original-date: 2018-10-25T21:25:02Z}, + keywords = {jax}, } -@article{lucchi_robo-gym_2020, - title = {robo-gym -- {An} {Open} {Source} {Toolkit} for {Distributed} {Deep} {Reinforcement} {Learning} on {Real} and {Simulated} {Robots}}, - url = {http://arxiv.org/abs/2007.02753}, - abstract = {Applying Deep Reinforcement Learning (DRL) to complex tasks in the field of robotics has proven to be very successful in the recent years. However, most of the publications focus either on applying it to a task in simulation or to a task in a real world setup. Although there are great examples of combining the two worlds with the help of transfer learning, it often requires a lot of additional work and fine-tuning to make the setup work effectively. In order to increase the use of DRL with real robots and reduce the gap between simulation and real world robotics, we propose an open source toolkit: robo-gym. We demonstrate a unified setup for simulation and real environments which enables a seamless transfer from training in simulation to application on the robot. We showcase the capabilities and the effectiveness of the framework with two real world applications featuring industrial robots: a mobile robot and a robot arm. The distributed capabilities of the framework enable several advantages like using distributed algorithms, separating the workload of simulation and training on different physical machines as well as enabling the future opportunity to train in simulation and real world at the same time. 
Finally we offer an overview and comparison of robo-gym with other frequently used state-of-the-art DRL frameworks.}, - urldate = {2022-05-02}, - journal = {arXiv:2007.02753 [cs]}, - author = {Lucchi, Matteo and Zindler, Friedemann and Mühlbacher-Karrer, Stephan and Pichler, Horst}, - month = nov, - year = {2020}, - note = {arXiv: 2007.02753}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Software Engineering}, - file = {Lucchi et al_2020_robo-gym -- An Open Source Toolkit for Distributed Deep Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/R5BMV45S/Lucchi et al_2020_robo-gym -- An Open Source Toolkit for Distributed Deep Reinforcement Learning.pdf:application/pdf}, +@inproceedings{darvish_whole-body_2019, + address = {Toronto, ON, Canada}, + title = {Whole-{Body} {Geometric} {Retargeting} for {Humanoid} {Robots}}, + copyright = {All rights reserved}, + isbn = {978-1-5386-7630-1}, + url = {https://ieeexplore.ieee.org/document/9035059/}, + doi = {10.1109/Humanoids43949.2019.9035059}, + abstract = {Humanoid robot teleoperation allows humans to integrate their cognitive capabilities with the apparatus to perform tasks that need high strength, manoeuvrability and dexterity. This paper presents a framework for teleoperation of humanoid robots using a novel approach for motion retargeting through inverse kinematics over the robot model. The proposed method enhances scalability for retargeting, i.e., it allows teleoperating different robots by different human users with minimal changes to the proposed system. Our framework enables an intuitive and natural interaction between the human operator and the humanoid robot at the configuration space level. We validate our approach by demonstrating whole-body retargeting with multiple robot models. Furthermore, we present experimental validation through teleoperation experiments using two state-of-the-art whole-body controllers for humanoid robots.}, + language = {en}, + urldate = {2021-10-28}, + booktitle = {2019 {IEEE}-{RAS} 19th {International} {Conference} on {Humanoid} {Robots} ({Humanoids})}, + publisher = {IEEE}, + author = {Darvish, Kourosh and Tirupachuri, Yeshasvi and Romualdi, Giulio and Rapetti, Lorenzo and Ferigo, Diego and Chavez, Francisco Javier Andrade and Pucci, Daniele}, + month = oct, + year = {2019}, + pages = {679--686}, } -@article{james_pyrep_2019, - title = {{PyRep}: {Bringing} {V}-{REP} to {Deep} {Robot} {Learning}}, - shorttitle = {{PyRep}}, - url = {http://arxiv.org/abs/1906.11176}, - abstract = {PyRep is a toolkit for robot learning research, built on top of the virtual robotics experimentation platform (V-REP). Through a series of modifications and additions, we have created a tailored version of V-REP built with robot learning in mind. The new PyRep toolkit offers three improvements: (1) a simple and flexible API for robot control and scene manipulation, (2) a new rendering engine, and (3) speed boosts upwards of 10,000x in comparison to the previous Python Remote API. 
With these improvements, we believe PyRep is the ideal toolkit to facilitate rapid prototyping of learning algorithms in the areas of reinforcement learning, imitation learning, state estimation, mapping, and computer vision.}, - urldate = {2022-05-02}, - journal = {arXiv:1906.11176 [cs]}, - author = {James, Stephen and Freese, Marc and Davison, Andrew J.}, - month = jun, +@inproceedings{latella_human_2019, + title = {A {Human} {Wearable} {Framework} for {Physical} {Human}-{Robot} {Interaction}}, + copyright = {Creative Commons Attribution 4.0 International, Open Access}, + url = {https://zenodo.org/record/4782543}, + doi = {10.5281/ZENODO.4782543}, + abstract = {Physical interaction between humans and robots is more and more frequently required in the modern society. We introduce a novel framework endowing robots with the ability to perceive humans while collaborating at the same task. The framework embodies a human wearable system (i.e., motion tracking suit, force/torque sensors shoes) and a probabilistic algorithm for the human floating-base whole-body kinematics and dynamics estimation.}, + language = {en}, + urldate = {2021-09-08}, + author = {Latella, Claudia and Tirupachuri, Yeshasvi and Rapetti, Lorenzo and Ferigo, Diego and Traversaro, Silvio and Sorrentino, Ines and Chavez, Francisco Javier Andrade and Nori, Francesco and Pucci, Daniele}, year = {2019}, - note = {arXiv: 1906.11176}, - keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/B7I7BMTG/1906.html:text/html;James et al_2019_PyRep.pdf:/home/dferigo/Zotero/storage/72C2E4NJ/James et al_2019_PyRep.pdf:application/pdf}, + note = {Conference Name: I-RIM}, + keywords = {Floating-base human dynamics, human wearable system, real-time human joint torque estimation}, } -@article{choi_use_2021, - title = {On the use of simulation in robotics: {Opportunities}, challenges, and suggestions for moving forward}, - copyright = {Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.}, - shorttitle = {On the use of simulation in robotics}, - url = {https://dspace.mit.edu/handle/1721.1/139616}, - abstract = {© 2021 National Academy of Sciences. All rights reserved. The last five years marked a surge in interest for and use of smart robots, which operate in dynamic and unstructured environments and might interact with humans. We posit that well-validated computer simulation can provide a virtual proving ground that in many cases is instrumental in understanding safely, faster, at lower costs, and more thoroughly how the robots of the future should be designed and controlled for safe operation and improved performance. Against this backdrop, we discuss how simulation can help in robotics, barriers that currently prevent its broad adoption, and potential steps that can eliminate some of these barriers. The points and recommendations made concern the following simulation-in-robotics aspects: simulation of the dynamics of the robot; simulation of the virtual world; simulation of the sensing of this virtual world; simulation of the interaction between the human and the robot; and, in less depth, simulation of the communication between robots. 
This Perspectives contribution summarizes the points of view that coalesced during a 2018 National Science Foundation/Department of Defense/National Institute for Standards and Technology workshop dedicated to the topic at hand. The meeting brought together participants from a range of organizations, disciplines, and application fields, with expertise at the intersection of robotics, machine learning, and physics-based simulation.}, +@article{ferigo_generic_2020, + title = {A generic synchronous dataflow architecture to rapidly prototype and deploy robot controllers}, + copyright = {All rights reserved}, + shorttitle = {A generic synchronous dataflow architecture to rapidly prototype and deploy robot controllers}, + url = {https://journals.sagepub.com/doi/10.1177/1729881420921625}, + doi = {10.1177/1729881420921625}, + abstract = {The article presents a software architecture to optimize the process of prototyping and deploying robot controllers that are synthesized using model-based desig...}, language = {en}, - urldate = {2022-05-06}, - journal = {PNAS}, - author = {Choi, HeeSun and Crump, Cindy and Duriez, Christian and Elmquist, Asher and Hager, Gregory and Han, David and Hearl, Frank and Hodgins, Jessica and Jain, Abhinandan and Leve, Frederick and Li, Chen and Meier, Franziska and Negrut, Dan and Righetti, Ludovic and Rodriguez, Alberto and Tan, Jie and Trinkle, Jeff}, - year = {2021}, - note = {Accepted: 2022-01-14T19:58:12Z -Publisher: Proceedings of the National Academy of Sciences}, - file = {Choi et al_2021_On the use of simulation in robotics.pdf:/home/dferigo/Zotero/storage/2Z2DTR3H/Choi et al_2021_On the use of simulation in robotics.pdf:application/pdf;Snapshot:/home/dferigo/Zotero/storage/D7Q7XUG8/139616.html:text/html}, + urldate = {2020-04-30}, + journal = {International Journal of Advanced Robotic Systems}, + author = {Ferigo, Diego and Traversaro, Silvio and Romano, Francesco and Pucci, Daniele}, + year = {2020}, } -@inproceedings{pratt_capture_2006, - title = {Capture {Point}: {A} {Step} toward {Humanoid} {Push} {Recovery}}, - shorttitle = {Capture {Point}}, - doi = {10.1109/ICHR.2006.321385}, - abstract = {It is known that for a large magnitude push a human or a humanoid robot must take a step to avoid a fall. Despite some scattered results, a principled approach towards "when and where to take a step" has not yet emerged. Towards this goal, we present methods for computing capture points and the capture region, the region on the ground where a humanoid must step to in order to come to a complete stop. The intersection between the capture region and the base of support determines which strategy the robot should adopt to successfully stop in a given situation. Computing the capture region for a humanoid, in general, is very difficult. However, with simple models of walking, computation of the capture region is simplified. We extend the well-known linear inverted pendulum model to include a flywheel body and show how to compute exact solutions of the capture region for this model. Adding rotational inertia enables the humanoid to control its centroidal angular momentum, much like the way human beings do, significantly enlarging the capture region. We present simulations of a simple planar biped that can recover balance after a push by stepping to the capture region and using internal angular momentum. 
Ongoing work involves applying the solution from the simple model as an approximate solution to more complex simulations of bipedal walking, including a 3D biped with distributed mass.}, - booktitle = {2006 6th {IEEE}-{RAS} {International} {Conference} on {Humanoid} {Robots}}, - author = {Pratt, Jerry and Carff, John and Drakunov, Sergey and Goswami, Ambarish}, - month = dec, - year = {2006}, - note = {ISSN: 2164-0580}, - keywords = {nonlinear control systems, humanoid robots, humanoid robot, Humanoid robots, Acceleration, legged locomotion, Legged locomotion, Computational modeling, Foot, torque control, Biological system modeling, biped robot, capture point, capture region, centroidal angular momentum, Closed-form solution, Cognition, flywheel body, Flywheels, humanoid push recovery, Humans, linear inverted pendulum model, position control, rotational inertia}, - pages = {200--207}, - file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/JNWUQT8M/4115602.html:text/html;Pratt et al_2006_Capture Point.pdf:/home/dferigo/Zotero/storage/2M2Q2H7D/Pratt et al_2006_Capture Point.pdf:application/pdf}, +@inproceedings{ferigo_gym-ignition_2019, + title = {Gym-{Ignition}: {Reproducible} {Robotic} {Simulations} for {Reinforcement} {Learning}}, + copyright = {All rights reserved}, + abstract = {This paper presents Gym-Ignition, a new framework to create reproducible robotic environments for reinforcement learning research. It interfaces with the new generation of Gazebo, part of the Ignition Robotics suite. The modular architecture of Ignition Gazebo allows using the simulator as a library, simplifying the interconnection between the simulator and external software. Gym-Ignition enables the creation of environments compatible with OpenAI Gym that are executed in Ignition Gazebo. The Gym-Ignition software architecture can interact natively with programs developed in low-level languages. The advantage is twofold: it permits direct interaction with robots in real-time, and it simplifies the embedding of existing software such as low-level robotic controllers.}, + booktitle = {Robotics: {Science} and {Systems} ({RSS}) - {Workshop} on {Closing} the {Reality} {Gap} in {Sim2real} {Transfer} for {Robotic} {Manipulation}}, + author = {Ferigo, Diego and Traversaro, Silvio and Pucci, Daniele}, + year = {2019}, + keywords = {own}, } -@article{shafiee-ashtiani_push_2017, - title = {Push {Recovery} of a {Position}-{Controlled} {Humanoid} {Robot} {Based} on {Capture} {Point} {Feedback} {Control}}, - url = {http://arxiv.org/abs/1710.10598}, - abstract = {In this paper, a combination of ankle and hip strategy is used for push recovery of a position-controlled humanoid robot. Ankle strategy and hip strategy are equivalent to Center of Pressure (CoP) and Centroidal Moment Pivot (CMP) regulation respectively. For controlling the CMP and CoP we need a torque-controlled robot, however most of the conventional humanoid robots are position controlled. In this regard, we present an efficient way for implementation of the hip and ankle strategies on a position controlled humanoid robot. We employ a feedback controller to compensate the capture point error. Using our scheme, a simple and practical push recovery controller is designed which can be implemented on the most of the conventional humanoid robots without the need for torque sensors. 
The effectiveness of the proposed approach is verified through push recovery experiments on SURENA-Mini humanoid robot under severe pushes.}, +@inproceedings{ferigo_gym-ignition_2020, + title = {Gym-{Ignition}: {Reproducible} {Robotic} {Simulations} for {Reinforcement} {Learning}}, + copyright = {All rights reserved}, + shorttitle = {Gym-{Ignition}}, + doi = {10.1109/SII46433.2020.9025951}, + abstract = {This paper presents Gym-Ignition, a new framework to create reproducible robotic environments for reinforcement learning research. It interfaces with the new generation of Gazebo, part of the Ignition Robotics suite, which provides three main improvements for reinforcement learning applications compared to the alternatives: 1) the modular architecture enables using the simulator as a C++ library, simplifying the interconnection with external software; 2) multiple physics and rendering engines are supported as plugins, simplifying their selection during the execution; 3) the new distributed simulation capability allows simulating complex scenarios while sharing the load on multiple workers and machines. The core of Gym-Ignition is a component that contains the Ignition Gazebo simulator and exposes a simple interface for its configuration and execution. We provide a Python package that allows developers to create robotic environments simulated in Ignition Gazebo. Environments expose the common OpenAI Gym interface, making them compatible out-of-the-box with third-party frameworks containing reinforcement learning algorithms. Simulations can be executed in both headless and GUI mode, the physics engine can run in accelerated mode, and instances can be parallelized. Furthermore, the Gym-Ignition software architecture provides abstraction of the Robot and the Task, making environments agnostic on the specific runtime. This abstraction allows their execution also in a real-time setting on actual robotic platforms, even if driven by different middlewares.}, + booktitle = {{IEEE}/{SICE} {International} {Symposium} on {System} {Integration} ({SII})}, + author = {Ferigo, Diego and Traversaro, Silvio and Metta, Giorgio and Pucci, Daniele}, + year = {2020}, + note = {ISSN: 2474-2325}, + keywords = {Real-time systems, Robots, Task analysis, Learning (artificial intelligence), Physics, Ignition, own}, + pages = {885--890}, +} + +@article{ferigo_emergence_2021, + title = {On the {Emergence} of {Whole}-body {Strategies} from {Humanoid} {Robot} {Push}-recovery {Learning}}, + abstract = {Balancing and push-recovery are essential capabilities enabling humanoid robots to solve complex locomotion tasks. In this context, classical control systems tend to be based on simplified physical models and hard-coded strategies. Although successful in specific scenarios, this approach requires demanding tuning of parameters and switching logic between specifically-designed controllers for handling more general perturbations. We apply model-free Deep Reinforcement Learning for training a general and robust humanoid push-recovery policy in a simulation environment. Our method targets highdimensional whole-body humanoid control and is validated on the iCub humanoid. Reward components incorporating expert knowledge on humanoid control enable fast learning of several robust behaviors by the same policy, spanning the entire body. 
We validate our method with extensive quantitative analyses in simulation, including out-of-sample tasks which demonstrate policy robustness and generalization, both key requirements towards real-world robot deployment.}, + journal = {IEEE Robotics and Automation Letters}, + author = {Ferigo, Diego and Camoriano, Raffaello and Viceconte, Paolo Maria and Calandriello, Daniele and Traversaro, Silvio and Rosasco, Lorenzo and Pucci, Daniele}, + year = {2021}, + keywords = {own}, +} + +@inproceedings{tirupachuri_towards_2020, + address = {Cham}, + series = {Advances in {Intelligent} {Systems} and {Computing}}, + title = {Towards {Partner}-{Aware} {Humanoid} {Robot} {Control} {Under} {Physical} {Interactions}}, + copyright = {All rights reserved}, + isbn = {978-3-030-29513-4}, + doi = {10.1007/978-3-030-29513-4_78}, + abstract = {The topic of physical human-robot interaction received a lot of attention from the robotics community because of many promising application domains. However, studying physical interaction between a robot and an external agent, like a human or another robot, without considering the dynamics of both the systems may lead to many shortcomings in fully exploiting the interaction. In this paper, we present a coupled-dynamics formalism followed by a sound approach in exploiting helpful interaction with a humanoid robot. In particular, we propose the first attempt to define and exploit the human help for the robot to accomplish a specific task. As a result, we present a task-based partner-aware robot control techniques. The theoretical results are validated by conducting experiments with two iCub humanoid robots involved in physical interaction.}, language = {en}, - urldate = {2022-05-21}, - journal = {arXiv:1710.10598 [cs]}, - author = {Shafiee-Ashtiani, Milad and Yousefi-Koma, Aghil and Mirjalili, Reihaneh and Maleki, Hessam and Karimi, Mojtaba}, - month = oct, - year = {2017}, - note = {arXiv: 1710.10598}, - keywords = {Computer Science - Robotics}, - file = {Shafiee-Ashtiani et al. - 2017 - Push Recovery of a Position-Controlled Humanoid Ro.pdf:/home/dferigo/Zotero/storage/BFPJNTZT/Shafiee-Ashtiani et al. - 2017 - Push Recovery of a Position-Controlled Humanoid Ro.pdf:application/pdf}, + booktitle = {Intelligent {Systems} and {Applications}}, + publisher = {Springer International Publishing}, + author = {Tirupachuri, Yeshasvi and Nava, Gabriele and Latella, Claudia and Ferigo, Diego and Rapetti, Lorenzo and Tagliapietra, Luca and Nori, Francesco and Pucci, Daniele}, + editor = {Bi, Yaxin and Bhatia, Rahul and Kapoor, Supriya}, + year = {2020}, + keywords = {Humanoids, Physical human-robot interaction, Physical robot-robot interaction}, + pages = {1073--1092}, } -@article{shafiee_online_2019, - title = {Online {DCM} {Trajectory} {Generation} for {Push} {Recovery} of {Torque}-{Controlled} {Humanoid} {Robots}}, - url = {http://arxiv.org/abs/1909.10403}, - abstract = {We present a computationally efficient method for online planning of bipedal walking trajectories with push recovery. In particular, the proposed methodology fits control architectures where the Divergent-Component-of-Motion (DCM) is planned beforehand, and adds a step adapter to adjust the planned trajectories and achieve push recovery. Assuming that the robot is in a single support state, the step adapter generates new positions and timings for the next step. The step adapter is active in single support phases only, but the proposed torque-control architecture considers double support phases too. 
The key idea for the design of the step adapter is to impose both initial and final DCM step values using an exponential interpolation of the time varying ZMP trajectory. This allows us to cast the push recovery problem as a Quadratic Programming (QP) one, and to solve it online with stateof-the-art optimisers. The overall approach is validated with simulations of the torque-controlled 33 kg humanoid robot iCub. Results show that the proposed strategy prevents the humanoid robot from falling while walking at 0.28 m/s and pushed with external forces up to 150 Newton for 0.05 seconds.}, +@article{duan_learning_2020, + title = {Learning to {Avoid} {Obstacles} {With} {Minimal} {Intervention} {Control}}, + volume = {7}, + issn = {2296-9144}, + url = {https://www.frontiersin.org/article/10.3389/frobt.2020.00060}, + doi = {10.3389/frobt.2020.00060}, + abstract = {Programming by demonstration has received much attention as it offers a general framework which allows robots to efficiently acquire novel motor skills from a human teacher. While traditional imitation learning that only focuses on either Cartesian or joint space might become inappropriate in situations where both spaces are equally important (e.g., writing or striking task), hybrid imitation learning of skills in both Cartesian and joint spaces simultaneously has been studied recently. However, an important issue which often arises in dynamical or unstructured environments is overlooked, namely how can a robot avoid obstacles? In this paper, we aim to address the problem of avoiding obstacles in the context of hybrid imitation learning. Specifically, we propose to tackle three subproblems: (i) designing a proper potential field so as to bypass obstacles, (ii) guaranteeing joint limits are respected when adjusting trajectories in the process of avoiding obstacles, and (iii) determining proper control commands for robots such that potential human-robot interaction is safe. By solving the aforementioned subproblems, the robot is capable of generalizing observed skills to new situations featuring obstacles in a feasible and safe manner. The effectiveness of the proposed method is validated through a toy example as well as a real transportation experiment on the iCub humanoid robot.}, + urldate = {2021-10-28}, + journal = {Frontiers in Robotics and AI}, + author = {Duan, Anqing and Camoriano, Raffaello and Ferigo, Diego and Huang, Yanlong and Calandriello, Daniele and Rosasco, Lorenzo and Pucci, Daniele}, + year = {2020}, + pages = {60}, +} + +@article{latella_simultaneous_2019, + title = {Simultaneous {Floating}-{Base} {Estimation} of {Human} {Kinematics} and {Joint} {Torques}}, + volume = {19}, + copyright = {http://creativecommons.org/licenses/by/3.0/}, + url = {https://www.mdpi.com/1424-8220/19/12/2794}, + doi = {10.3390/s19122794}, + abstract = {The paper presents a stochastic methodology for the simultaneous floating-base estimation of the human whole-body kinematics and dynamics (i.e., joint torques, internal and external forces). The paper builds upon our former work where a fixed-base formulation had been developed for the human estimation problem. The presented approach is validated by presenting experimental results of a health subject equipped with a wearable motion tracking system and a pair of shoes sensorized with force/torque sensors while performing different motion tasks, e.g., walking on a treadmill. 
The results show that joint torque estimates obtained by using floating-base and fixed-base approaches match satisfactorily, thus validating the present approach.}, language = {en}, - urldate = {2022-05-21}, - journal = {arXiv:1909.10403 [cs]}, - author = {Shafiee, Milad and Romualdi, Giulio and Dafarra, Stefano and Chavez, Francisco Javier Andrade and Pucci, Daniele}, - month = oct, + number = {12}, + urldate = {2021-10-28}, + journal = {Sensors}, + author = {Latella, Claudia and Traversaro, Silvio and Ferigo, Diego and Tirupachuri, Yeshasvi and Rapetti, Lorenzo and Andrade Chavez, Francisco Javier and Nori, Francesco and Pucci, Daniele}, year = {2019}, - note = {arXiv: 1909.10403}, - keywords = {Computer Science - Robotics}, - file = {Shafiee et al. - 2019 - Online DCM Trajectory Generation for Push Recovery.pdf:/home/dferigo/Zotero/storage/SC6TWNXN/Shafiee et al. - 2019 - Online DCM Trajectory Generation for Push Recovery.pdf:application/pdf}, + note = {Number: 12 +Publisher: Multidisciplinary Digital Publishing Institute}, + keywords = {floating-base dynamics estimation, human joint torque analysis, human wearable dynamics}, + pages = {2794}, } -@inproceedings{abdallah_biomechanically_2005, - title = {A {Biomechanically} {Motivated} {Two}-{Phase} {Strategy} for {Biped} {Upright} {Balance} {Control}}, - doi = {10.1109/ROBOT.2005.1570406}, - abstract = {Balance maintenance and upright posture recovery under unexpected environmental forces are key requirements for safe and successful co-existence of humanoid robots in normal human environments. In this paper we present a two-phase control strategy for robust balance maintenance under a force disturbance. The first phase, called the reflex phase, is designed to withstand the immediate effect of the force. The second phase is the recovery phase where the system is steered back to a statically stable “home” posture. The reflex control law employs angular momentum and is characterized by its counter-intuitive quality of “yielding” to the disturbance. The recovery control employs a general scheme of seeking to maximize the potential energy and is robust to local ground surface feature. Biomechanics literature indicates a similar strategy in play during human balance maintenance.}, - booktitle = {Proceedings of the 2005 {IEEE} {International} {Conference} on {Robotics} and {Automation}}, - author = {Abdallah, M. and Goswami, A.}, +@inproceedings{duan_learning_2019, + title = {Learning to {Sequence} {Multiple} {Tasks} with {Competing} {Constraints}}, + copyright = {All rights reserved}, + doi = {10.1109/IROS40897.2019.8968496}, + abstract = {Imitation learning offers a general framework where robots can efficiently acquire novel motor skills from demonstrations of a human teacher. While many promising achievements have been shown, the majority of them are only focused on single-stroke movements, without taking into account the problem of multi-tasks sequencing. Conceivably, sequencing different atomic tasks can further augment the robot's capabilities as well as avoid repetitive demonstrations. In this paper, we propose to address the issue of multi-tasks sequencing with emphasis on handling the so-called competing constraints, which emerge due to the existence of the concurrent constraints from Cartesian and joint trajectories. Specifically, we explore the null space of the robot from an information-theoretic perspective in order to maintain imitation fidelity during transition between consecutive tasks. 
The effectiveness of the proposed method is validated through simulated and real experiments on the iCub humanoid robot.}, + booktitle = {2019 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems} ({IROS})}, + author = {Duan, Anqing and Camoriano, Raffaello and Ferigo, Diego and Huang, Yanlong and Calandriello, Daniele and Rosasco, Lorenzo and Pucci, Daniele}, + year = {2019}, + note = {ISSN: 2153-0866}, + pages = {2672--2678}, +} + +@article{viceconte_adherent_2022, + title = {{ADHERENT}: {Learning} {Human}-like {Trajectory} {Generators} for {Whole}-body {Control} of {Humanoid} {Robots}}, + volume = {7}, + copyright = {All rights reserved}, + issn = {2377-3766}, + shorttitle = {{ADHERENT}}, + doi = {10.1109/LRA.2022.3141658}, + abstract = {Human-like trajectory generation and footstep planning represent challenging problems in humanoid robotics. Recently, research in computer graphics investigated machine-learning methods for character animation based on training human-like models directly on motion capture data. Such methods proved effective in virtual environments, mainly focusing on trajectory visualization. This letter presents ADHERENT, a system architecture integrating machine-learning methods used in computer graphics with whole-body control methods employed in robotics to generate and stabilize human-like trajectories for humanoid robots. Leveraging human motion capture locomotion data, ADHERENT yields a general footstep planner, including forward, sideways, and backward walking trajectories that blend smoothly from one to another. Furthermore, at the joint configuration level, ADHERENT computes data-driven whole-body postural reference trajectories coherent with the generated footsteps, thus increasing the human likeness of the resulting robot motion. 
Extensive validations of the proposed architecture are presented with both simulations and real experiments on the iCub humanoid robot, thus demonstrating ADHERENT to be robust to varying step sizes and walking speeds.}, + number = {2}, + journal = {IEEE Robotics and Automation Letters}, + author = {Viceconte, Paolo Maria and Camoriano, Raffaello and Romualdi, Giulio and Ferigo, Diego and Dafarra, Stefano and Traversaro, Silvio and Oriolo, Giuseppe and Rosasco, Lorenzo and Pucci, Daniele}, month = apr, - year = {2005}, - note = {ISSN: 1050-4729}, - keywords = {Robust control, Hip, Force control, Humanoid robots, Foot, Biological system modeling, Humans, Biomechanics, Biped robot, balance, disturbance rejection, posture recovery, potential energy, Potential energy, Robotics and automation}, - pages = {1996--2001}, - file = {Abdallah_Goswami_2005_A Biomechanically Motivated Two-Phase Strategy for Biped Upright Balance Control.pdf:/home/dferigo/Zotero/storage/CQLPYTNX/Abdallah_Goswami_2005_A Biomechanically Motivated Two-Phase Strategy for Biped Upright Balance Control.pdf:application/pdf;IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/IGRBKH2M/1570406.html:text/html}, + year = {2022}, + note = {Conference Name: IEEE Robotics and Automation Letters}, + keywords = {Computer architecture, Robots, Humanoid robots, Legged locomotion, Robot kinematics, Trajectory, Generators, Humanoid robot systems, machine learning for robot control, whole-body motion planning and control}, + pages = {2779--2786}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/WMUUTDFE/9676410.html:text/html;Viceconte et al_2022_ADHERENT.pdf:/home/dferigo/Zotero/storage/CKLBBNAL/Viceconte et al_2022_ADHERENT.pdf:application/pdf}, } -@inproceedings{aftab_ankle_2012, - title = {Ankle, hip and stepping strategies for humanoid balance recovery with a single {Model} {Predictive} {Control} scheme}, - doi = {10.1109/HUMANOIDS.2012.6651514}, - abstract = {While humans are highly efficient in dealing with balance perturbations, current biped humanoid robots are far from showing similar skills. This is mainly due to the limited capacity of current robot controllers to deal with the inherently complex dynamics of biped robots. Though Model Predictive Control schemes have shown improved robustness to perturbations, they still suffer from a few shortcomings such as not considering the upper body inertial effects or non-optimal step durations. We propose here a Model Predictive Control scheme that specifically addresses these shortcomings and generates human-like responses to perturbations, involving appropriate combinations of ankle, hip and stepping strategies, with automatically adjusted step durations. The emphasis of this paper is on modeling and analyzing the effects of different cost functions and coefficients on the behavior of the controller while leaving real-time implementations and experiments for later work.}, - booktitle = {2012 12th {IEEE}-{RAS} {International} {Conference} on {Humanoid} {Robots} ({Humanoids} 2012)}, - author = {Aftab, Zohaib and Robert, Thomas and Wieber, Pierre-Brice}, +@inproceedings{dafarra_control_2018, + title = {A {Control} {Architecture} with {Online} {Predictive} {Planning} for {Position} and {Torque} {Controlled} {Walking} of {Humanoid} {Robots}}, + doi = {10.1109/IROS.2018.8594277}, + abstract = {A common approach to the generation of walking patterns for humanoid robots consists in adopting a layered control architecture. 
This paper proposes an architecture composed of three nested control loops. The outer loop exploits a robot kinematic model to plan the footstep positions. In the mid layer, a predictive controller generates a Center of Mass trajectory according to the well-known table-cart model. Through a whole-body inverse kinematics algorithm, we can define joint references for position controlled walking. The outcomes of these two loops are then interpreted as inputs of a stack-of-task QP-based torque controller, which represents the inner loop of the presented control architecture. This resulting architecture allows the robot to walk also in torque control, guaranteeing higher level of compliance. Real world experiments have been carried on the humanoid robot iCub.}, + booktitle = {2018 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems} ({IROS})}, + author = {Dafarra, Stefano and Nava, Gabriele and Charbonneau, Marie and Guedelha, Nuno and Andrade, Francisco and Traversaro, Silvio and Fiorio, Luca and Romano, Francesco and Nori, Francesco and Metta, Giorgio and Pucci, Daniele}, + month = oct, + year = {2018}, + note = {ISSN: 2153-0866}, + keywords = {Computer architecture, Humanoid robots, Legged locomotion, Robot kinematics, Trajectory, humanoid robots, legged locomotion, Mathematical model, torque control, path planning, center of mass trajectory, control loops, footstep positions, gait analysis, iCub, inverse kinematics algorithm, layered control architecture, online predictive planning, position control, position controlled walking, predictive control, predictive controller, robot kinematic model, robot kinematics, stack-of-task QP-based torque controller, table-cart model, torque controlled walking}, + pages = {1--9}, + file = {Dafarra et al_2018_A Control Architecture with Online Predictive Planning for Position and Torque.pdf:/home/dferigo/Zotero/storage/LH9MSN77/Dafarra et al_2018_A Control Architecture with Online Predictive Planning for Position and Torque.pdf:application/pdf;IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/96C6RSJ4/8594277.html:text/html}, +} + +@inproceedings{pucci_highly_2016, + title = {Highly dynamic balancing via force control}, + doi = {10.1109/HUMANOIDS.2016.7803266}, + abstract = {This video shows the latest results in the whole-body control of humanoid robots achieved by the Dynamic Interaction Control Lab at the Italian Institute of Technology. In particular, the control architecture is composed of two nested control loops. The internal loop, which runs at 1 KHz, is in charge of stabilizing any desired joint torque. This task is achieved thanks to an off-line identification procedure providing us with a reliable model of friction and motor constants. The outer loop, which generates desired joint torques at 100 Hz, is a momentum based control algorithm in the context of free-floating systems. More precisely, the control objective for the outer loop is the stabilization of the robot's linear and angular momentum and the associated zero dynamics. The latter objective can be used to stabilize a desired joint configuration. The stability of the control framework is shown to be in the sense of Lyapunov. The contact forces and torques are regulated so as to break contacts only at desired configurations. Switching between several contacts is taken into account thanks to a finite-state-machine that dictates the constraints acting on the system. 
The control framework is implemented on the iCub humanoid robot.}, + booktitle = {2016 {IEEE}-{RAS} 16th {International} {Conference} on {Humanoid} {Robots} ({Humanoids})}, + author = {Pucci, Daniele and Romano, Francesco and Traversaro, Silvio and Nori, Francesco}, month = nov, - year = {2012}, + year = {2016}, note = {ISSN: 2164-0580}, - keywords = {Hip, Robots, Acceleration, Stability analysis, Foot, Predictive control, Linear programming}, - pages = {159--164}, - file = {Aftab et al_2012_Ankle, hip and stepping strategies for humanoid balance recovery with a single.pdf:/home/dferigo/Zotero/storage/FGKZFRTL/Aftab et al_2012_Ankle, hip and stepping strategies for humanoid balance recovery with a single.pdf:application/pdf;IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/LJPL5LQR/6651514.html:text/html}, + keywords = {Humanoid robots, Dynamics, Force control, Friction, Reliability, Torque}, + pages = {141--141}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/LG6JJYV2/7803266.html:text/html}, } -@article{jeong_robust_2019, - title = {A {Robust} {Walking} {Controller} {Based} on {Online} {Optimization} of {Ankle}, {Hip}, and {Stepping} {Strategies}}, - volume = {35}, - issn = {1941-0468}, - doi = {10.1109/TRO.2019.2926487}, - abstract = {In this paper, we propose a biped walking controller that optimized three push recovery strategies: the ankle, hip, and stepping strategies. We suggested formulations that related the effects of each strategy to the stability of walking based on the linear inverted pendulum with flywheel model. With these relations, we could set up an optimization problem that integrates all the strategies, including step time change. These strategies are not applied hierarchically, but applied according to each weighting factor. Various combinations of weighting factors can be used to determine how the robot should respond to an external push. The optimization problem derived here includes many nonlinear components, but it has been linearized though some assumptions and it can be applied to a robot in real time. Our method is designed to be robust to modeling errors or weak perturbations, by exploiting the advantages of the foot. Hence, it is very practical to apply this algorithm to a real robot. The effectiveness of the walking controller has been verified through abstracted model simulation, full dynamics simulation, and a practical robot experiments.}, - number = {6}, - journal = {IEEE Transactions on Robotics}, - author = {Jeong, Hyobin and Lee, Inho and Oh, Jaesung and Lee, Kang Kyu and Oh, Jun-Ho}, - month = dec, - year = {2019}, - note = {Conference Name: IEEE Transactions on Robotics}, - keywords = {Hip, humanoid robot, Humanoid robots, Torque, Legged locomotion, Optimization, Foot, Biped walking, divergent component of motion (DCM), walking control}, - pages = {1367--1386}, - file = {Jeong et al_2019_A Robust Walking Controller Based on Online Optimization of Ankle, Hip, and.pdf:/home/dferigo/Zotero/storage/J53BAQWP/Jeong et al_2019_A Robust Walking Controller Based on Online Optimization of Ankle, Hip, and.pdf:application/pdf}, +@techreport{blondel_efficient_2022, + title = {Efficient and {Modular} {Implicit} {Differentiation}}, + url = {http://arxiv.org/abs/2105.15183}, + abstract = {Automatic differentiation (autodiff) has revolutionized machine learning. It allows to express complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. 
More recently, differentiation of optimization problem solutions has attracted widespread attention with applications such as optimization layers, and in bi-level problems such as hyper-parameter optimization and meta-learning. However, so far, implicit differentiation remained difficult to use for practitioners, as it often required case-by-case tedious mathematical derivations and implementations. In this paper, we propose automatic implicit differentiation, an efficient and modular approach for implicit differentiation of optimization problems. In our approach, the user defines directly in Python a function F capturing the optimality conditions of the problem to be differentiated. Once this is done, we leverage autodiff of F and the implicit function theorem to automatically differentiate the optimization problem. Our approach thus combines the benefits of implicit differentiation and autodiff. It is efficient as it can be added on top of any state-of-the-art solver and modular as the optimality condition specification is decoupled from the implicit differentiation mechanism. We show that seemingly simple principles allow to recover many existing implicit differentiation methods and create new ones easily. We demonstrate the ease of formulating and solving bi-level optimization problems using our framework. We also showcase an application to the sensitivity analysis of molecular dynamics.}, + language = {en}, + number = {arXiv:2105.15183}, + urldate = {2022-06-01}, + institution = {arXiv}, + author = {Blondel, Mathieu and Berthet, Quentin and Cuturi, Marco and Frostig, Roy and Hoyer, Stephan and Llinares-López, Felipe and Pedregosa, Fabian and Vert, Jean-Philippe}, + month = may, + year = {2022}, + note = {arXiv:2105.15183 [cs, math, stat] +type: article}, + keywords = {Computer Science - Machine Learning, Statistics - Machine Learning, Mathematics - Numerical Analysis}, + file = {Blondel et al. - 2022 - Efficient and Modular Implicit Differentiation.pdf:/home/dferigo/Zotero/storage/QT85GLQN/Blondel et al. - 2022 - Efficient and Modular Implicit Differentiation.pdf:application/pdf}, } -@inproceedings{mcgreavy_unified_2020, - title = {Unified {Push} {Recovery} {Fundamentals}: {Inspiration} from {Human} {Study}}, - shorttitle = {Unified {Push} {Recovery} {Fundamentals}}, - doi = {10.1109/ICRA40945.2020.9196911}, - abstract = {Currently for balance recovery, humans outperform humanoid robots which use hand-designed controllers in terms of the diverse actions. This study aims to close this gap by finding core control principles that are shared across ankle, hip, toe and stepping strategies by formulating experiments to test human balance recoveries and define criteria to quantify the strategy in use. To reveal fundamental principles of balance strategies, our study shows that a minimum jerk controller can accurately replicate comparable human behaviour at the Centre of Mass level. 
Therefore, we formulate a general Model-Predictive Control (MPC) framework to produce recovery motions in any system, including legged machines, where the framework parameters are tuned for time-optimal performance in robotic systems.}, - booktitle = {2020 {IEEE} {International} {Conference} on {Robotics} and {Automation} ({ICRA})}, - author = {McGreavy, Christopher and Yuan, Kai and Gordon, Daniel and Tan, Kang and Wolfslag, Wouter J and Vijayakumar, Sethu and Li, Zhibin}, - month = may, +@article{sabne_xla_2020, + title = {{XLA}: {Compiling} {Machine} {Learning} for {Peak} {Performance}}, + shorttitle = {{XLA}}, + author = {Sabne, Amit}, year = {2020}, - note = {ISSN: 2577-087X}, - keywords = {Hip, Robots, Foot, Force, Modulation, Stability criteria}, - pages = {10876--10882}, - file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/2YFRVS3Q/9196911.html:text/html;McGreavy et al_2020_Unified Push Recovery Fundamentals.pdf:/home/dferigo/Zotero/storage/W2PNKYLM/McGreavy et al_2020_Unified Push Recovery Fundamentals.pdf:application/pdf}, + file = {Snapshot:/home/dferigo/Zotero/storage/GZ238JR2/pub50530.html:text/html}, } -@article{shafiee-ashtiani_push_2016, - title = {Push {Recovery} of a {Humanoid} {Robot} {Based} on {Model} {Predictive} {Control} and {Capture} {Point}}, - url = {http://arxiv.org/abs/1612.08034}, - abstract = {The three bio-inspired strategies that have been used for balance recovery of biped robots are the ankle, hip and stepping Strategies. However, there are several cases for a biped robot where stepping is not possible, e. g. when the available contact surfaces are limited. In this situation, the balance recovery by modulating the angular momentum of the upper body (Hip-strategy) or the Zero Moment Point (ZMP) (Ankle strategy) is essential. In this paper, a single Model Predictive Control (MPC) scheme is employed for controlling the Capture Point (CP) to a desired position by modulating both the ZMP and the Centroidal Moment Pivot (CMP). The goal of the proposed controller is to control the CP, employing the CMP when the CP is out of the support polygon, and/or the ZMP when the CP is inside the support polygon. The proposed algorithm is implemented on an abstract model of the SURENA III humanoid robot. Obtained results show the effectiveness of the proposed approach in the presence of severe pushes, even when the support polygon is shrunken to a point or a line.}, +@article{harris_array_2020, + title = {Array programming with {NumPy}}, + volume = {585}, + copyright = {2020 The Author(s)}, + issn = {1476-4687}, + url = {https://www.nature.com/articles/s41586-020-2649-2}, + doi = {10.1038/s41586-020-2649-2}, + abstract = {Array programming provides a powerful, compact and expressive syntax for accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It has an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, materials science, engineering, finance and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves1 and in the first imaging of a black hole2. Here we review how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data. NumPy is the foundation upon which the scientific Python ecosystem is constructed. 
It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Owing to its central position in the ecosystem, NumPy increasingly acts as an interoperability layer between such array computation libraries and, together with its application programming interface (API), provides a flexible framework to support the next decade of scientific and industrial analysis.}, language = {en}, - urldate = {2022-05-21}, - journal = {arXiv:1612.08034 [cs]}, - author = {Shafiee-Ashtiani, Milad and Yousefi-Koma, Aghil and Shariat-Panahi, Masoud and Khadiv, Majid}, + number = {7825}, + urldate = {2022-06-01}, + journal = {Nature}, + author = {Harris, Charles R. and Millman, K. Jarrod and van der Walt, Stéfan J. and Gommers, Ralf and Virtanen, Pauli and Cournapeau, David and Wieser, Eric and Taylor, Julian and Berg, Sebastian and Smith, Nathaniel J. and Kern, Robert and Picus, Matti and Hoyer, Stephan and van Kerkwijk, Marten H. and Brett, Matthew and Haldane, Allan and del Río, Jaime Fernández and Wiebe, Mark and Peterson, Pearu and Gérard-Marchant, Pierre and Sheppard, Kevin and Reddy, Tyler and Weckesser, Warren and Abbasi, Hameer and Gohlke, Christoph and Oliphant, Travis E.}, + month = sep, + year = {2020}, + note = {Number: 7825 +Publisher: Nature Publishing Group}, + keywords = {Software, Computational neuroscience, Computational science, Computer science, Solar physics}, + pages = {357--362}, + file = {Harris et al_2020_Array programming with NumPy.pdf:/home/dferigo/Zotero/storage/7ZRWE676/Harris et al_2020_Array programming with NumPy.pdf:application/pdf}, +} + +@article{van_der_walt_numpy_2011, + title = {The {NumPy} {Array}: {A} {Structure} for {Efficient} {Numerical} {Computation}}, + volume = {13}, + issn = {1558-366X}, + shorttitle = {The {NumPy} {Array}}, + doi = {10.1109/MCSE.2011.37}, + abstract = {In the Python world, NumPy arrays are the standard representation for numerical data and enable efficient implementation of numerical computations in a high-level language. As this effort shows, NumPy performance can be improved through three techniques: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.}, + number = {2}, + journal = {Computing in Science \& Engineering}, + author = {van der Walt, Stefan and Colbert, S. Chris and Varoquaux, Gael}, + month = mar, + year = {2011}, + note = {Conference Name: Computing in Science \& Engineering}, + keywords = {Arrays, Computational efficiency, Finite element methods, Numerical analysis, numerical computations, NumPy, Performance evaluation, programming libraries, Python, Resource management, scientific programming, Vector quantization}, + pages = {22--30}, + file = {van der Walt et al_2011_The NumPy Array.pdf:/home/dferigo/Zotero/storage/SMLVNJJN/van der Walt et al_2011_The NumPy Array.pdf:application/pdf}, +} + +@inproceedings{maclaurin_autograd_2015, + title = {Autograd: {Effortless} {Gradients} in {Numpy}}, + abstract = {Automatic differentiation can greatly speed up prototyping and implementation of machine learning models. However, most packages are implicitly domain-specific, requiring the use of a restricted mini-language for specifying functions. 
We introduce autograd, a package which differentiates standard Python and Numpy code, and can differentiate code containing while loops, branches, closures, classes and even its own gradient evaluations.}, + language = {en}, + author = {Maclaurin, Dougal and Duvenaud, David and Adams, Ryan P}, + year = {2015}, + pages = {3}, + file = {Maclaurin et al. - Autograd Effortless Gradients in Numpy.pdf:/home/dferigo/Zotero/storage/HB32N7M8/Maclaurin et al. - Autograd Effortless Gradients in Numpy.pdf:application/pdf}, +} + +@article{lee_dart_2018, + title = {{DART}: {Dynamic} {Animation} and {Robotics} {Toolkit}}, + issn = {2475-9066}, + shorttitle = {{DART}}, + doi = {10.21105/joss.00500}, + language = {en}, + urldate = {2019-08-09}, + journal = {The Journal of Open Source Software}, + author = {Lee, Jeongseok and X. Grey, Michael and Ha, Sehoon and Kunz, Tobias and Jain, Sumit and Ye, Yuting and S. Srinivasa, Siddhartha and Stilman, Mike and Karen Liu, C.}, + month = feb, + year = {2018}, + file = {Lee et al_2018_DART.pdf:/home/dferigo/Zotero/storage/VGHXAZPR/Lee et al_2018_DART.pdf:application/pdf}, +} + +@inproceedings{koenig_design_2004, + address = {Sendai, Japan}, + title = {Design and use paradigms for gazebo, an open-source multi-robot simulator}, + volume = {3}, + isbn = {978-0-7803-8463-7}, + url = {http://ieeexplore.ieee.org/document/1389727/}, + doi = {10.1109/IROS.2004.1389727}, + abstract = {Simulators have played a critical role in robotics research as tools for quick and efficient testing of new concepts, strategies, and algorithms. To date, most simulators have been restricted to 2D worlds, and few have matured to the point where they are both highly capable and easily adaptable. Gazebo is designed to fill this niche by creating a 3D dynamic multi-robot environment capable of recreating the complex worlds that will be encountered by the next generation of mobile robots. Its open source status, fine grained control, and high fidelity place Gazebo in a unique position to become more than just a stepping stone between the drawing board and real hardware: data visualization, simulation of remote environments, and even reverse engineering of blackbox systems are all possible applications.}, + language = {en}, + urldate = {2019-02-07}, + booktitle = {2004 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems} ({IROS}) ({IEEE} {Cat}. {No}.{04CH37566})}, + publisher = {IEEE}, + author = {Koenig, N. and Howard, A.}, + year = {2004}, + pages = {2149--2154}, + file = {Koenig_Howard_2004_Design and use paradigms for gazebo, an open-source multi-robot simulator.pdf:/home/dferigo/Zotero/storage/9KCY9GLI/Koenig_Howard_2004_Design and use paradigms for gazebo, an open-source multi-robot simulator.pdf:application/pdf}, +} + +@article{natale_icub_2017, + title = {{iCub}: {The} not-yet-finished story of building a robot child}, + volume = {2}, + shorttitle = {{iCub}}, + url = {https://www.science.org/doi/full/10.1126/scirobotics.aaq1026}, + doi = {10.1126/scirobotics.aaq1026}, + number = {13}, + urldate = {2022-05-31}, + journal = {Science Robotics}, + author = {Natale, Lorenzo and Bartolozzi, Chiara and Pucci, Daniele and Wykowska, Agnieszka and Metta, Giorgio}, month = dec, - year = {2016}, - note = {arXiv: 1612.08034}, - keywords = {Computer Science - Robotics}, - file = {Shafiee-Ashtiani et al. - 2016 - Push Recovery of a Humanoid Robot Based on Model P.pdf:/home/dferigo/Zotero/storage/V5B979ZC/Shafiee-Ashtiani et al. 
- 2016 - Push Recovery of a Humanoid Robot Based on Model P.pdf:application/pdf}, + year = {2017}, + note = {Publisher: American Association for the Advancement of Science}, + pages = {eaaq1026}, + file = {Natale et al_2017_iCub.pdf:/home/dferigo/Zotero/storage/FJ6ZI85X/Natale et al_2017_iCub.pdf:application/pdf}, } -@inproceedings{stephens_humanoid_2007, - address = {Pittsburgh, PA, USA}, - title = {Humanoid push recovery}, - isbn = {978-1-4244-1861-9}, - url = {http://ieeexplore.ieee.org/document/4813931/}, - doi = {10.1109/ICHR.2007.4813931}, - abstract = {We extend simple models previously developed for humanoids to large push recovery. Using these simple models, we develop analytic decision surfaces that are functions of reference points, such as the center of mass and center of pressure, that predict whether or not a fall is inevitable. We explore three strategies for recovery: 1) using ankle torques, 2) moving internal joints, and 3) taking a step. These models can be used in robot controllers or in analysis of human balance and locomotion.}, +@article{metta_open_2005, + title = {an open framework for research in embodied cognition}, + abstract = {This paper presents a research initiative on embodied cognition called RobotCub. RobotCub is an EUfunded project that aims at developing an open humanoid robotic platform and, simultaneously, following an original research path in synthetic cognition. We report on the motivations behind the realization of our humanoid robotic platform and the current status of the design just before the construction of the first full prototype.}, language = {en}, + author = {Metta, Giorgio and Sandini, Giulio and Vernon, David and Caldwell, Darwin and Tsagarakis, Nikolaos and Beira, Ricardo and Santos-Victor, Jose' and Ijspeert, Auke and Righetti, Ludovic and Cappiello, Giovanni and Stellin, Giovanni and Becchi, Francesco}, + year = {2005}, + pages = {9}, + file = {Metta et al. - 2005 - an open framework for research in embodied cogniti.pdf:/home/dferigo/Zotero/storage/ZWCLYQYE/Metta et al. - 2005 - an open framework for research in embodied cogniti.pdf:application/pdf}, +} + +@misc{noauthor_icub_nodate, + title = {{iCub}: {The} not-yet-finished story of building a robot child}, + url = {https://www.science.org/doi/10.1126/scirobotics.aaq1026}, + urldate = {2022-05-31}, +} + +@techreport{zamora_extending_2017, + title = {Extending the {OpenAI} {Gym} for robotics: a toolkit for reinforcement learning using {ROS} and {Gazebo}}, + shorttitle = {Extending the {OpenAI} {Gym} for robotics}, + url = {http://arxiv.org/abs/1608.05742}, + abstract = {This paper presents an extension of the OpenAI Gym for robotics using the Robot Operating System (ROS) and the Gazebo simulator. The content discusses the software architecture proposed and the results obtained by using two Reinforcement Learning techniques: Q-Learning and Sarsa. 
Ultimately, the output of this work presents a benchmarking system for robotics that allows different techniques and algorithms to be compared using the same virtual conditions.}, + number = {arXiv:1608.05742}, + urldate = {2022-05-31}, + institution = {arXiv}, + author = {Zamora, Iker and Lopez, Nestor Gonzalez and Vilches, Victor Mayoral and Cordero, Alejandro Hernandez}, + month = feb, + year = {2017}, + doi = {10.48550/arXiv.1608.05742}, + note = {arXiv:1608.05742 [cs] +type: article}, + keywords = {Computer Science - Robotics}, + file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/HH56KUXC/1608.html:text/html;Zamora et al_2017_Extending the OpenAI Gym for robotics.pdf:/home/dferigo/Zotero/storage/563JXI4P/Zamora et al_2017_Extending the OpenAI Gym for robotics.pdf:application/pdf}, +} + +@techreport{duburcq_reactive_2022, + title = {Reactive {Stepping} for {Humanoid} {Robots} using {Reinforcement} {Learning}: {Application} to {Standing} {Push} {Recovery} on the {Exoskeleton} {Atalante}}, + shorttitle = {Reactive {Stepping} for {Humanoid} {Robots} using {Reinforcement} {Learning}}, + url = {http://arxiv.org/abs/2203.01148}, + abstract = {State-of-the-art reinforcement learning is now able to learn versatile locomotion, balancing and push-recovery capabilities for bipedal robots in simulation. Yet, the reality gap has mostly been overlooked and the simulated results hardly transfer to real hardware. Either it is unsuccessful in practice because the physics is over-simplified and hardware limitations are ignored, or regularity is not guaranteed and unexpected hazardous motions can occur. This paper presents a reinforcement learning framework capable of learning robust standing push recovery for bipedal robots with a smooth out-of-the-box transfer to reality, requiring only instantaneous proprioceptive observations. By combining original termination conditions and policy smoothness conditioning, we achieve stable learning, sim-to-real transfer and safety using a policy without memory nor observation history. Reward shaping is then used to give insights into how to keep balance. We demonstrate its performance in reality on the lower-limb medical exoskeleton Atalante.}, + number = {arXiv:2203.01148}, urldate = {2022-05-21}, - booktitle = {2007 7th {IEEE}-{RAS} {International} {Conference} on {Humanoid} {Robots}}, - publisher = {IEEE}, - author = {Stephens, Benjamin}, - month = nov, - year = {2007}, - pages = {589--595}, - file = {Stephens - 2007 - Humanoid push recovery.pdf:/home/dferigo/Zotero/storage/W4D9H84Y/Stephens - 2007 - Humanoid push recovery.pdf:application/pdf}, + institution = {arXiv}, + author = {Duburcq, Alexis and Schramm, Fabian and Boéris, Guilhem and Bredeche, Nicolas and Chevaleyre, Yann}, + month = mar, + year = {2022}, + doi = {10.48550/arXiv.2203.01148}, + note = {arXiv:2203.01148 [cs] +version: 1 +type: article}, + keywords = {Computer Science - Robotics}, + file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/UBANG96C/2203.html:text/html;Duburcq et al_2022_Reactive Stepping for Humanoid Robots using Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/SQ73JWFF/Duburcq et al_2022_Reactive Stepping for Humanoid Robots using Reinforcement Learning.pdf:application/pdf}, +} + +@inproceedings{kim_push_2019, + title = {Push {Recovery} {Control} for {Humanoid} {Robot} {Using} {Reinforcement} {Learning}}, + doi = {10.1109/IRC.2019.00102}, + abstract = {A humanoid robot similar to a human is structurally unstable, so the push recovery control is essential. 
The proposed push recovery controller consists of a IMU sensor part, a highlevel push recovery controller and a low-level push recovery controller. The IMU sensor part measures the linear velocity and angular velocity and transmits it to the high-level push recovery controller. The high-level push recovery controller selects the strategy of the low-level push recovery controller based on the stability region. The stability region is improved using the DQN(Deep Q-Network) of the reinforcement learning method. The low-level push recovery controller consists of a ankle, hip and step strategies. Each strategy is analyzed using LIPM(Linear Inverted Pendulum Model). Based on the analysis, the actuators corresponding to each strategy are controlled.}, + booktitle = {2019 {Third} {IEEE} {International} {Conference} on {Robotic} {Computing} ({IRC})}, + author = {Kim, Harin and Seo, Donghyeon and Kim, Donghan}, + month = feb, + year = {2019}, + keywords = {Reinforcement learning, Humanoid robots, reinforcement learning, humanoid robots, learning (artificial intelligence), legged locomotion, humanoid robot, stability, motion control, robot programming, Mathematical model, Hip, linear systems, bipedal robot, control engineering computing, push recovery, Actuators, pendulums, Torso, angular velocity, balancing control, Deep Q-Network, DQN, IMU sensor part, linear inverted pendulum model, linear velocity, nonlinear systems, push recovery control, stability region}, + pages = {488--492}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/9R2B6JH9/8675597.html:text/html;Kim et al_2019_Push Recovery Control for Humanoid Robot Using Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/I52S4RI7/Kim et al_2019_Push Recovery Control for Humanoid Robot Using Reinforcement Learning.pdf:application/pdf}, +} + +@inproceedings{wieber_trajectory_2006, + title = {Trajectory {Free} {Linear} {Model} {Predictive} {Control} for {Stable} {Walking} in the {Presence} of {Strong} {Perturbations}}, + doi = {10.1109/ICHR.2006.321375}, + abstract = {A humanoid walking robot is a highly nonlinear dynamical system that relies strongly on contact forces between its feet and the ground in order to realize stable motions, but these contact forces are unfortunately severely limited. Model predictive control, also known as receding horizon control, is a general control scheme specifically designed to deal with such constrained dynamical systems, with the potential ability to react efficiently to a wide range of situations. Apart from the question of computation time which needs to be taken care of carefully (these schemes can be highly computation intensive), the initial question of which optimal control problems should be considered to be solved online in order to lead to the desired walking movements is still unanswered. A key idea for answering to this problem can be found in the ZMP preview control scheme. 
After presenting here this scheme with a point of view slightly different from the original one, we focus on the problem of compensating strong perturbations of the dynamics of the robot and propose a new linear model predictive control scheme which is an improvement of the original ZMP preview control scheme.}, + booktitle = {2006 6th {IEEE}-{RAS} {International} {Conference} on {Humanoid} {Robots}}, + author = {Wieber, Pierre-brice}, + month = dec, + year = {2006}, + note = {ISSN: 2164-0580}, + keywords = {Control systems, Humanoid robots, Legged locomotion, Trajectory, Optimal control, Force control, Motion control, Predictive models, Nonlinear dynamical systems, Predictive control}, + pages = {137--142}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/6NQ6937N/4115592.html:text/html;Wieber_2006_Trajectory Free Linear Model Predictive Control for Stable Walking in the.pdf:/home/dferigo/Zotero/storage/PHYMXB8L/Wieber_2006_Trajectory Free Linear Model Predictive Control for Stable Walking in the.pdf:application/pdf}, } -@inproceedings{koolen_balance_2016, - title = {Balance control using center of mass height variation: {Limitations} imposed by unilateral contact}, - shorttitle = {Balance control using center of mass height variation}, - doi = {10.1109/HUMANOIDS.2016.7803247}, - abstract = {Maintaining balance is fundamental to legged robots. The most commonly used mechanisms for balance control are taking a step, regulating the center of pressure (`ankle strategies'), and to a lesser extent, changing centroidal angular momentum (e.g., `hip strategies'). In this paper, we disregard these three mechanisms, instead focusing on a fourth: varying center of mass height. We study a 2D variable-height center of mass model, and analyze how center of mass height variation can be used to achieve balance, in the sense of convergence to a fixed point of the dynamics. In this analysis, we pay special attention to the constraint of unilateral contact forces. We first derive a necessary condition that must be satisfied to be able to achieve balance. We then present two control laws, and derive their regions of attraction in closed form. We show that one of the control laws achieves balance from any state satisfying the necessary condition for balance. Finally, we briefly discuss the relative importance of CoM height variation and other balance mechanisms.}, - booktitle = {2016 {IEEE}-{RAS} 16th {International} {Conference} on {Humanoid} {Robots} ({Humanoids})}, - author = {Koolen, Twan and Posa, Michael and Tedrake, Russ}, +@inproceedings{feng_optimization_2014, + title = {Optimization based full body control for the atlas robot}, + doi = {10.1109/HUMANOIDS.2014.7041347}, + abstract = {One popular approach to controlling humanoid robots is through inverse kinematics (IK) with stiff joint position tracking. On the other hand, inverse dynamics (ID) based approaches have gained increasing acceptance by providing compliant motions and robustness to external perturbations. However, the performance of such methods is heavily dependent on high quality dynamic models, which are often very difficult to produce for a physical robot. IK approaches only require kinematic models, which are much easier to generate in practice. In this paper, we supplement our previous work with ID-based controllers by adding IK, which helps compensate for modeling errors. The proposed full body controller is applied to three tasks in the DARPA Robotics Challenge (DRC) Trials in Dec. 
2013.}, + booktitle = {2014 {IEEE}-{RAS} {International} {Conference} on {Humanoid} {Robots}}, + author = {Feng, Siyuan and Whitman, Eric and Xinjilefu, X and Atkeson, Christopher G.}, month = nov, - year = {2016}, + year = {2014}, note = {ISSN: 2164-0580}, - keywords = {Trajectory, Lips, Dynamics, Legged locomotion, Convergence, Two dimensional displays}, - pages = {8--15}, - file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/9A9HYM9S/7803247.html:text/html;Koolen et al_2016_Balance control using center of mass height variation.pdf:/home/dferigo/Zotero/storage/NNEK5L6N/Koolen et al_2016_Balance control using center of mass height variation.pdf:application/pdf}, + keywords = {Joints, Robots, Kinematics, Dynamics, Torque, Foot, Jacobian matrices}, + pages = {120--127}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/QSUVBIRV/7041347.html:text/html}, } -@article{miki_learning_2022, - title = {Learning robust perceptive locomotion for quadrupedal robots in the wild}, - volume = {7}, - url = {https://www.science.org/doi/abs/10.1126/scirobotics.abk2822}, - doi = {10.1126/scirobotics.abk2822}, - number = {62}, - urldate = {2022-05-21}, - journal = {Science Robotics}, - author = {Miki, Takahiro and Lee, Joonho and Hwangbo, Jemin and Wellhausen, Lorenz and Koltun, Vladlen and Hutter, Marco}, +@article{vukobratovic_contribution_1969, + title = {Contribution to the {Synthesis} of {Biped} {Gait}}, + volume = {BME-16}, + issn = {1558-2531}, + doi = {10.1109/TBME.1969.4502596}, + abstract = {The connection between the dynamics of an object and the algorithmic level has been modified in this paper, based on two-level control. The central modification consists in introducing feedbacks, that is, a system of regulators at the level of the formed typed of gait only. Such a modification originates from the assumption that a very narrow class of gait types needs to be taken into account when generating the gait. In the paper the gait has been formed on the basis of a fixed program having a kinematic-dynamic character. The kinematic part concerns the kinematic programnmed connections for activating the lower extremities, while the dynamic part exposes appropriate changes in the characteristic coordinates of the compensation system. 
Such a connection with a minimum number of coordinates extends the possibility of solving the problem of equilibrium in motion for one type of gait without any particular algorithm that would take into account the motion coordinates and form out of them a stable motion at a higher algebraic level.}, + number = {1}, + journal = {IEEE Transactions on Biomedical Engineering}, + author = {Vukobratovic, Miomir and Juricic, Davor}, month = jan, - year = {2022}, - note = {Publisher: American Association for the Advancement of Science}, - pages = {eabk2822}, - file = {Miki et al_2022_Learning robust perceptive locomotion for quadrupedal robots in the wild.pdf:/home/dferigo/Zotero/storage/VF6EF3ZW/Miki et al_2022_Learning robust perceptive locomotion for quadrupedal robots in the wild.pdf:application/pdf}, + year = {1969}, + note = {Conference Name: IEEE Transactions on Biomedical Engineering}, + keywords = {Control systems, Kinematics, Motion control, Aerodynamics, Leg, Automatic control, Control system synthesis, Extremities, MIMO, Organisms}, + pages = {1--6}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/VPRS9676/4502596.html:text/html}, } -@article{lee_learning_2020, - title = {Learning quadrupedal locomotion over challenging terrain}, - volume = {5}, - issn = {2470-9476}, - url = {https://www.science.org/doi/10.1126/scirobotics.abc5986}, - doi = {10.1126/scirobotics.abc5986}, - abstract = {A learning-based locomotion controller enables a quadrupedal ANYmal robot to traverse challenging natural environments. - , - Legged locomotion can extend the operational domain of robots to some of the most challenging environments on Earth. However, conventional controllers for legged locomotion are based on elaborate state machines that explicitly trigger the execution of motion primitives and reflexes. These designs have increased in complexity but fallen short of the generality and robustness of animal locomotion. Here, we present a robust controller for blind quadrupedal locomotion in challenging natural environments. Our approach incorporates proprioceptive feedback in locomotion control and demonstrates zero-shot generalization from simulation to natural environments. The controller is trained by reinforcement learning in simulation. The controller is driven by a neural network policy that acts on a stream of proprioceptive signals. The controller retains its robustness under conditions that were never encountered during training: deformable terrains such as mud and snow, dynamic footholds such as rubble, and overground impediments such as thick vegetation and gushing water. The presented work indicates that robust locomotion in natural environments can be achieved by training in simple domains.}, +@article{vukobratovic_zero-moment_2004, + title = {Zero-moment point — thirty five years of its life}, + volume = {01}, + issn = {0219-8436, 1793-6942}, + url = {https://www.worldscientific.com/doi/abs/10.1142/S0219843604000083}, + doi = {10.1142/S0219843604000083}, + abstract = {This paper is devoted to the permanence of the concept of Zero-Moment Point, widely-known by the acronym ZMP. Thirty-five years have elapsed since its implicit presentation (actually before being named ZMP) to the scientific community and thirty-three years since it was explicitly introduced and clearly elaborated, initially in the leading journals published in English. 
Its first practical demonstration took place in Japan in 1984, at Waseda University, Laboratory of Ichiro Kato, in the first dynamically balanced robot WL-10RD of the robotic family WABOT. The paper gives an in-depth discussion of source results concerning ZMP, paying particular attention to some delicate issues that may lead to confusion if this method is applied in a mechanistic manner onto irregular cases of artificial gait, i.e. in the case of loss of dynamic balance of a humanoid robot. + After a short survey of the history of the origin of ZMP a very detailed elaboration of ZMP notion is given, with a special review concerning "boundary cases" when the ZMP is close to the edge of the support polygon and "fictious cases" when the ZMP should be outside the support polygon. In addition, the difference between ZMP and the center of pressure is pointed out. Finally, some unresolved or insufficiently treated phenomena that may yield a significant improvement in robot performance are considered.}, language = {en}, - number = {47}, + number = {01}, urldate = {2022-05-21}, - journal = {Science Robotics}, - author = {Lee, Joonho and Hwangbo, Jemin and Wellhausen, Lorenz and Koltun, Vladlen and Hutter, Marco}, - month = oct, - year = {2020}, - pages = {eabc5986}, - file = {Lee et al. - 2020 - Learning quadrupedal locomotion over challenging t.pdf:/home/dferigo/Zotero/storage/U25M66GV/Lee et al. - 2020 - Learning quadrupedal locomotion over challenging t.pdf:application/pdf}, + journal = {International Journal of Humanoid Robotics}, + author = {Vukobratović, Miomir and Borovac, Branislav}, + month = mar, + year = {2004}, + pages = {157--173}, + file = {Vukobratović and Borovac - 2004 - ZERO-MOMENT POINT — THIRTY FIVE YEARS OF ITS LIFE.pdf:/home/dferigo/Zotero/storage/7BQEUGQN/Vukobratović and Borovac - 2004 - ZERO-MOMENT POINT — THIRTY FIVE YEARS OF ITS LIFE.pdf:application/pdf}, } -@article{nashner_organization_1985, - title = {The organization of human postural movements: a formal basis and experimental synthesis}, - volume = {8}, - shorttitle = {The organization of human postural movements}, - number = {1}, - journal = {Behavioral and brain sciences}, - author = {Nashner, Lewis M. and McCollum, Gin}, - year = {1985}, - note = {Publisher: Cambridge University Press}, - pages = {135--150}, +@article{vukobratovic_zero-moment_2004-1, + title = {Zero-moment point — thirty five years of its life}, + volume = {01}, + issn = {0219-8436}, + url = {https://www.worldscientific.com/doi/abs/10.1142/S0219843604000083}, + doi = {10.1142/S0219843604000083}, + abstract = {This paper is devoted to the permanence of the concept of Zero-Moment Point, widely-known by the acronym ZMP. Thirty-five years have elapsed since its implicit presentation (actually before being named ZMP) to the scientific community and thirty-three years since it was explicitly introduced and clearly elaborated, initially in the leading journals published in English. Its first practical demonstration took place in Japan in 1984, at Waseda University, Laboratory of Ichiro Kato, in the first dynamically balanced robot WL-10RD of the robotic family WABOT. The paper gives an in-depth discussion of source results concerning ZMP, paying particular attention to some delicate issues that may lead to confusion if this method is applied in a mechanistic manner onto irregular cases of artificial gait, i.e. in the case of loss of dynamic balance of a humanoid robot. 
+
+After a short survey of the history of the origin of ZMP a very detailed elaboration of ZMP notion is given, with a special review concerning "boundary cases" when the ZMP is close to the edge of the support polygon and "fictious cases" when the ZMP should be outside the support polygon. In addition, the difference between ZMP and the center of pressure is pointed out. Finally, some unresolved or insufficiently treated phenomena that may yield a significant improvement in robot performance are considered.},
+ number = {01},
+ urldate = {2022-05-21},
+ journal = {International Journal of Humanoid Robotics},
+ author = {Vukobratović, Miomir and Borovac, Branislav},
+ month = mar,
+ year = {2004},
+ note = {Publisher: World Scientific Publishing Co.},
+ keywords = {zero-moment point, Biped locomotion, dynamically balanced gait, support polygon},
+ pages = {157--173},
+}
+
+@inproceedings{kajita_3d_2001,
+ address = {Maui, HI, USA},
+ title = {The {3D} linear inverted pendulum mode: a simple modeling for a biped walking pattern generation},
+ volume = {1},
+ isbn = {978-0-7803-6612-1},
+ shorttitle = {The {3D} linear inverted pendulum mode},
+ url = {http://ieeexplore.ieee.org/document/973365/},
+ doi = {10.1109/IROS.2001.973365},
+ abstract = {For 3D walking control of a biped robot we analyze the dynamics of a three-dimensional inverted pendulum in which motion is constrained to move along an arbitrarily defined plane. This analysis leads us a simple linear dynamics, the Three-Dimensional Linear Inverted Pendulum Mode (3D-LIPM). Geometric nature of trajectories under the 3D-LIPM and a method for walking pattern generation are discussed. A simulation result of a walking control using a 12 d.o.f. biped robot model is also shown.},
+ language = {en},
+ urldate = {2022-05-21},
+ booktitle = {Proceedings 2001 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems}. {Expanding} the {Societal} {Role} of {Robotics} in the the {Next} {Millennium} ({Cat}. {No}.{01CH37180})},
+ publisher = {IEEE},
+ author = {Kajita, S. and Kanehiro, F. and Kaneko, K. and Yokoi, K. and Hirukawa, H.},
+ year = {2001},
+ pages = {239--246},
+ file = {Kajita et al. - 2001 - The 3D linear inverted pendulum mode a simple mod.pdf:/home/dferigo/Zotero/storage/UWHTKZPV/Kajita et al.
- 2001 - The 3D linear inverted pendulum mode a simple mod.pdf:application/pdf}, } @article{maki_role_1997, @@ -2272,602 +2433,613 @@ @article{maki_role_1997 month = may, year = {1997}, pmid = {9149760}, - keywords = {Humans, Arm, Leg, Adaptation, Physiological, Adult, Aged, Aging, Middle Aged, Movement, Postural Balance, Posture, Sensation Disorders}, + keywords = {Humans, Leg, Arm, Adaptation, Physiological, Adult, Aged, Aging, Middle Aged, Movement, Postural Balance, Posture, Sensation Disorders}, pages = {488--507}, file = {Maki_McIlroy_1997_The role of limb movements in maintaining upright stance.pdf:/home/dferigo/Zotero/storage/I2REVTU8/Maki_McIlroy_1997_The role of limb movements in maintaining upright stance.pdf:application/pdf}, } -@inproceedings{kajita_3d_2001, - address = {Maui, HI, USA}, - title = {The {3D} linear inverted pendulum mode: a simple modeling for a biped walking pattern generation}, - volume = {1}, - isbn = {978-0-7803-6612-1}, - shorttitle = {The {3D} linear inverted pendulum mode}, - url = {http://ieeexplore.ieee.org/document/973365/}, - doi = {10.1109/IROS.2001.973365}, - abstract = {F or 3D walkingcontr ol of a biped robot we analyze the dynamics of a three-dimensional inverted pendulum in which motion is constr aine d to move along an arbitr arily de ned plane. This analysis leads us a simple line ar dynamics, the There-Dimensional Linear Inverte d Pendulum Mode (3D-LIPM). Geometric nature of trajectories under the 3D-LIPM and a method for walking pattern gener ationare discusse d. A simulation r esult of a walking contr ol using a 12 d.o.f. biped robot model is also shown.}, +@article{nashner_organization_1985, + title = {The organization of human postural movements: a formal basis and experimental synthesis}, + volume = {8}, + shorttitle = {The organization of human postural movements}, + number = {1}, + journal = {Behavioral and brain sciences}, + author = {Nashner, Lewis M. and McCollum, Gin}, + year = {1985}, + note = {Publisher: Cambridge University Press}, + pages = {135--150}, +} + +@article{lee_learning_2020, + title = {Learning quadrupedal locomotion over challenging terrain}, + volume = {5}, + issn = {2470-9476}, + url = {https://www.science.org/doi/10.1126/scirobotics.abc5986}, + doi = {10.1126/scirobotics.abc5986}, + abstract = {A learning-based locomotion controller enables a quadrupedal ANYmal robot to traverse challenging natural environments. + , + Legged locomotion can extend the operational domain of robots to some of the most challenging environments on Earth. However, conventional controllers for legged locomotion are based on elaborate state machines that explicitly trigger the execution of motion primitives and reflexes. These designs have increased in complexity but fallen short of the generality and robustness of animal locomotion. Here, we present a robust controller for blind quadrupedal locomotion in challenging natural environments. Our approach incorporates proprioceptive feedback in locomotion control and demonstrates zero-shot generalization from simulation to natural environments. The controller is trained by reinforcement learning in simulation. The controller is driven by a neural network policy that acts on a stream of proprioceptive signals. The controller retains its robustness under conditions that were never encountered during training: deformable terrains such as mud and snow, dynamic footholds such as rubble, and overground impediments such as thick vegetation and gushing water. 
The presented work indicates that robust locomotion in natural environments can be achieved by training in simple domains.}, language = {en}, + number = {47}, urldate = {2022-05-21}, - booktitle = {Proceedings 2001 {IEEE}/{RSJ} {International} {Conference} on {Intelligent} {Robots} and {Systems}. {Expanding} the {Societal} {Role} of {Robotics} in the the {Next} {Millennium} ({Cat}. {No}.{01CH37180})}, - publisher = {IEEE}, - author = {Kajita, S. and Kanehiro, F. and Kaneko, K. and Yokoi, K. and Hirukawa, H.}, - year = {2001}, - pages = {239--246}, - file = {Kajita et al. - 2001 - The 3D linear inverted pendulum mode a simple mod.pdf:/home/dferigo/Zotero/storage/UWHTKZPV/Kajita et al. - 2001 - The 3D linear inverted pendulum mode a simple mod.pdf:application/pdf}, + journal = {Science Robotics}, + author = {Lee, Joonho and Hwangbo, Jemin and Wellhausen, Lorenz and Koltun, Vladlen and Hutter, Marco}, + month = oct, + year = {2020}, + pages = {eabc5986}, + file = {Lee et al. - 2020 - Learning quadrupedal locomotion over challenging t.pdf:/home/dferigo/Zotero/storage/U25M66GV/Lee et al. - 2020 - Learning quadrupedal locomotion over challenging t.pdf:application/pdf}, } -@article{vukobratovic_zero-moment_2004, - title = {Zero-moment point — thirty five years of its life}, - volume = {01}, - issn = {0219-8436}, - url = {https://www.worldscientific.com/doi/abs/10.1142/S0219843604000083}, - doi = {10.1142/S0219843604000083}, - abstract = {This paper is devoted to the permanence of the concept of Zero-Moment Point, widely-known by the acronym ZMP. Thirty-five years have elapsed since its implicit presentation (actually before being named ZMP) to the scientific community and thirty-three years since it was explicitly introduced and clearly elaborated, initially in the leading journals published in English. Its first practical demonstration took place in Japan in 1984, at Waseda University, Laboratory of Ichiro Kato, in the first dynamically balanced robot WL-10RD of the robotic family WABOT. The paper gives an in-depth discussion of source results concerning ZMP, paying particular attention to some delicate issues that may lead to confusion if this method is applied in a mechanistic manner onto irregular cases of artificial gait, i.e. in the case of loss of dynamic balance of a humanoid robot. - -After a short survey of the history of the origin of ZMP a very detailed elaboration of ZMP notion is given, with a special review concerning "boundary cases" when the ZMP is close to the edge of the support polygon and "fictious cases" when the ZMP should be outside the support polygon. In addition, the difference between ZMP and the center of pressure is pointed out. 
Finally, some unresolved or insufficiently treated phenomena that may yield a significant improvement in robot performance are considered.}, - number = {01}, +@article{miki_learning_2022, + title = {Learning robust perceptive locomotion for quadrupedal robots in the wild}, + volume = {7}, + url = {https://www.science.org/doi/abs/10.1126/scirobotics.abk2822}, + doi = {10.1126/scirobotics.abk2822}, + number = {62}, urldate = {2022-05-21}, - journal = {International Journal of Humanoid Robotics}, - author = {Vukobratović, Miomir and Borovac, Branislav}, - month = mar, - year = {2004}, - note = {Publisher: World Scientific Publishing Co.}, - keywords = {zero-moment point, Biped locomotion, dynamically balanced gait, support polygon}, - pages = {157--173}, + journal = {Science Robotics}, + author = {Miki, Takahiro and Lee, Joonho and Hwangbo, Jemin and Wellhausen, Lorenz and Koltun, Vladlen and Hutter, Marco}, + month = jan, + year = {2022}, + note = {Publisher: American Association for the Advancement of Science}, + pages = {eabk2822}, + file = {Miki et al_2022_Learning robust perceptive locomotion for quadrupedal robots in the wild.pdf:/home/dferigo/Zotero/storage/VF6EF3ZW/Miki et al_2022_Learning robust perceptive locomotion for quadrupedal robots in the wild.pdf:application/pdf}, } -@article{vukobratovic_zero-moment_2004-1, - title = {Zero-moment point — thirty five years of its life}, - volume = {01}, - issn = {0219-8436, 1793-6942}, - url = {https://www.worldscientific.com/doi/abs/10.1142/S0219843604000083}, - doi = {10.1142/S0219843604000083}, - abstract = {This paper is devoted to the permanence of the concept of Zero-Moment Point, widely-known by the acronym ZMP. Thirty-five years have elapsed since its implicit presentation (actually before being named ZMP) to the scientific community and thirty-three years since it was explicitly introduced and clearly elaborated, initially in the leading journals published in English. Its first practical demonstration took place in Japan in 1984, at Waseda University, Laboratory of Ichiro Kato, in the first dynamically balanced robot WL-10RD of the robotic family WABOT. The paper gives an in-depth discussion of source results concerning ZMP, paying particular attention to some delicate issues that may lead to confusion if this method is applied in a mechanistic manner onto irregular cases of artificial gait, i.e. in the case of loss of dynamic balance of a humanoid robot. - After a short survey of the history of the origin of ZMP a very detailed elaboration of ZMP notion is given, with a special review concerning "boundary cases" when the ZMP is close to the edge of the support polygon and "fictious cases" when the ZMP should be outside the support polygon. In addition, the difference between ZMP and the center of pressure is pointed out. Finally, some unresolved or insufficiently treated phenomena that may yield a significant improvement in robot performance are considered.}, +@inproceedings{koolen_balance_2016, + title = {Balance control using center of mass height variation: {Limitations} imposed by unilateral contact}, + shorttitle = {Balance control using center of mass height variation}, + doi = {10.1109/HUMANOIDS.2016.7803247}, + abstract = {Maintaining balance is fundamental to legged robots. The most commonly used mechanisms for balance control are taking a step, regulating the center of pressure (`ankle strategies'), and to a lesser extent, changing centroidal angular momentum (e.g., `hip strategies'). 
In this paper, we disregard these three mechanisms, instead focusing on a fourth: varying center of mass height. We study a 2D variable-height center of mass model, and analyze how center of mass height variation can be used to achieve balance, in the sense of convergence to a fixed point of the dynamics. In this analysis, we pay special attention to the constraint of unilateral contact forces. We first derive a necessary condition that must be satisfied to be able to achieve balance. We then present two control laws, and derive their regions of attraction in closed form. We show that one of the control laws achieves balance from any state satisfying the necessary condition for balance. Finally, we briefly discuss the relative importance of CoM height variation and other balance mechanisms.}, + booktitle = {2016 {IEEE}-{RAS} 16th {International} {Conference} on {Humanoid} {Robots} ({Humanoids})}, + author = {Koolen, Twan and Posa, Michael and Tedrake, Russ}, + month = nov, + year = {2016}, + note = {ISSN: 2164-0580}, + keywords = {Legged locomotion, Trajectory, Dynamics, Lips, Convergence, Two dimensional displays}, + pages = {8--15}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/9A9HYM9S/7803247.html:text/html;Koolen et al_2016_Balance control using center of mass height variation.pdf:/home/dferigo/Zotero/storage/NNEK5L6N/Koolen et al_2016_Balance control using center of mass height variation.pdf:application/pdf}, +} + +@inproceedings{stephens_humanoid_2007, + address = {Pittsburgh, PA, USA}, + title = {Humanoid push recovery}, + isbn = {978-1-4244-1861-9}, + url = {http://ieeexplore.ieee.org/document/4813931/}, + doi = {10.1109/ICHR.2007.4813931}, + abstract = {We extend simple models previously developed for humanoids to large push recovery. Using these simple models, we develop analytic decision surfaces that are functions of reference points, such as the center of mass and center of pressure, that predict whether or not a fall is inevitable. We explore three strategies for recovery: 1) using ankle torques, 2) moving internal joints, and 3) taking a step. These models can be used in robot controllers or in analysis of human balance and locomotion.}, language = {en}, - number = {01}, urldate = {2022-05-21}, - journal = {International Journal of Humanoid Robotics}, - author = {Vukobratović, Miomir and Borovac, Branislav}, - month = mar, - year = {2004}, - pages = {157--173}, - file = {Vukobratović and Borovac - 2004 - ZERO-MOMENT POINT — THIRTY FIVE YEARS OF ITS LIFE.pdf:/home/dferigo/Zotero/storage/7BQEUGQN/Vukobratović and Borovac - 2004 - ZERO-MOMENT POINT — THIRTY FIVE YEARS OF ITS LIFE.pdf:application/pdf}, + booktitle = {2007 7th {IEEE}-{RAS} {International} {Conference} on {Humanoid} {Robots}}, + publisher = {IEEE}, + author = {Stephens, Benjamin}, + month = nov, + year = {2007}, + pages = {589--595}, + file = {Stephens - 2007 - Humanoid push recovery.pdf:/home/dferigo/Zotero/storage/W4D9H84Y/Stephens - 2007 - Humanoid push recovery.pdf:application/pdf}, } -@article{vukobratovic_contribution_1969, - title = {Contribution to the {Synthesis} of {Biped} {Gait}}, - volume = {BME-16}, - issn = {1558-2531}, - doi = {10.1109/TBME.1969.4502596}, - abstract = {The connection between the dynamics of an object and the algorithmic level has been modified in this paper, based on two-level control. The central modification consists in introducing feedbacks, that is, a system of regulators at the level of the formed typed of gait only. 
Such a modification originates from the assumption that a very narrow class of gait types needs to be taken into account when generating the gait. In the paper the gait has been formed on the basis of a fixed program having a kinematic-dynamic character. The kinematic part concerns the kinematic programnmed connections for activating the lower extremities, while the dynamic part exposes appropriate changes in the characteristic coordinates of the compensation system. Such a connection with a minimum number of coordinates extends the possibility of solving the problem of equilibrium in motion for one type of gait without any particular algorithm that would take into account the motion coordinates and form out of them a stable motion at a higher algebraic level.}, - number = {1}, - journal = {IEEE Transactions on Biomedical Engineering}, - author = {Vukobratovic, Miomir and Juricic, Davor}, - month = jan, - year = {1969}, - note = {Conference Name: IEEE Transactions on Biomedical Engineering}, - keywords = {Control systems, Motion control, Kinematics, Aerodynamics, Leg, Automatic control, Control system synthesis, Extremities, MIMO, Organisms}, - pages = {1--6}, - file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/VPRS9676/4502596.html:text/html}, +@article{shafiee-ashtiani_push_2016, + title = {Push {Recovery} of a {Humanoid} {Robot} {Based} on {Model} {Predictive} {Control} and {Capture} {Point}}, + url = {http://arxiv.org/abs/1612.08034}, + abstract = {The three bio-inspired strategies that have been used for balance recovery of biped robots are the ankle, hip and stepping Strategies. However, there are several cases for a biped robot where stepping is not possible, e. g. when the available contact surfaces are limited. In this situation, the balance recovery by modulating the angular momentum of the upper body (Hip-strategy) or the Zero Moment Point (ZMP) (Ankle strategy) is essential. In this paper, a single Model Predictive Control (MPC) scheme is employed for controlling the Capture Point (CP) to a desired position by modulating both the ZMP and the Centroidal Moment Pivot (CMP). The goal of the proposed controller is to control the CP, employing the CMP when the CP is out of the support polygon, and/or the ZMP when the CP is inside the support polygon. The proposed algorithm is implemented on an abstract model of the SURENA III humanoid robot. Obtained results show the effectiveness of the proposed approach in the presence of severe pushes, even when the support polygon is shrunken to a point or a line.}, + language = {en}, + urldate = {2022-05-21}, + journal = {arXiv:1612.08034 [cs]}, + author = {Shafiee-Ashtiani, Milad and Yousefi-Koma, Aghil and Shariat-Panahi, Masoud and Khadiv, Majid}, + month = dec, + year = {2016}, + note = {arXiv: 1612.08034}, + keywords = {Computer Science - Robotics}, + file = {Shafiee-Ashtiani et al. - 2016 - Push Recovery of a Humanoid Robot Based on Model P.pdf:/home/dferigo/Zotero/storage/V5B979ZC/Shafiee-Ashtiani et al. - 2016 - Push Recovery of a Humanoid Robot Based on Model P.pdf:application/pdf}, } -@inproceedings{feng_optimization_2014, - title = {Optimization based full body control for the atlas robot}, - doi = {10.1109/HUMANOIDS.2014.7041347}, - abstract = {One popular approach to controlling humanoid robots is through inverse kinematics (IK) with stiff joint position tracking. 
On the other hand, inverse dynamics (ID) based approaches have gained increasing acceptance by providing compliant motions and robustness to external perturbations. However, the performance of such methods is heavily dependent on high quality dynamic models, which are often very difficult to produce for a physical robot. IK approaches only require kinematic models, which are much easier to generate in practice. In this paper, we supplement our previous work with ID-based controllers by adding IK, which helps compensate for modeling errors. The proposed full body controller is applied to three tasks in the DARPA Robotics Challenge (DRC) Trials in Dec. 2013.}, - booktitle = {2014 {IEEE}-{RAS} {International} {Conference} on {Humanoid} {Robots}}, - author = {Feng, Siyuan and Whitman, Eric and Xinjilefu, X and Atkeson, Christopher G.}, - month = nov, - year = {2014}, - note = {ISSN: 2164-0580}, - keywords = {Robots, Joints, Kinematics, Dynamics, Torque, Foot, Jacobian matrices}, - pages = {120--127}, - file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/QSUVBIRV/7041347.html:text/html}, +@inproceedings{mcgreavy_unified_2020, + title = {Unified {Push} {Recovery} {Fundamentals}: {Inspiration} from {Human} {Study}}, + shorttitle = {Unified {Push} {Recovery} {Fundamentals}}, + doi = {10.1109/ICRA40945.2020.9196911}, + abstract = {Currently for balance recovery, humans outperform humanoid robots which use hand-designed controllers in terms of the diverse actions. This study aims to close this gap by finding core control principles that are shared across ankle, hip, toe and stepping strategies by formulating experiments to test human balance recoveries and define criteria to quantify the strategy in use. To reveal fundamental principles of balance strategies, our study shows that a minimum jerk controller can accurately replicate comparable human behaviour at the Centre of Mass level. Therefore, we formulate a general Model-Predictive Control (MPC) framework to produce recovery motions in any system, including legged machines, where the framework parameters are tuned for time-optimal performance in robotic systems.}, + booktitle = {2020 {IEEE} {International} {Conference} on {Robotics} and {Automation} ({ICRA})}, + author = {McGreavy, Christopher and Yuan, Kai and Gordon, Daniel and Tan, Kang and Wolfslag, Wouter J and Vijayakumar, Sethu and Li, Zhibin}, + month = may, + year = {2020}, + note = {ISSN: 2577-087X}, + keywords = {Robots, Hip, Foot, Force, Modulation, Stability criteria}, + pages = {10876--10882}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/2YFRVS3Q/9196911.html:text/html;McGreavy et al_2020_Unified Push Recovery Fundamentals.pdf:/home/dferigo/Zotero/storage/W2PNKYLM/McGreavy et al_2020_Unified Push Recovery Fundamentals.pdf:application/pdf}, } -@inproceedings{wieber_trajectory_2006, - title = {Trajectory {Free} {Linear} {Model} {Predictive} {Control} for {Stable} {Walking} in the {Presence} of {Strong} {Perturbations}}, - doi = {10.1109/ICHR.2006.321375}, - abstract = {A humanoid walking robot is a highly nonlinear dynamical system that relies strongly on contact forces between its feet and the ground in order to realize stable motions, but these contact forces are unfortunately severely limited. Model predictive control, also known as receding horizon control, is a general control scheme specifically designed to deal with such constrained dynamical systems, with the potential ability to react efficiently to a wide range of situations. 
Apart from the question of computation time which needs to be taken care of carefully (these schemes can be highly computation intensive), the initial question of which optimal control problems should be considered to be solved online in order to lead to the desired walking movements is still unanswered. A key idea for answering to this problem can be found in the ZMP preview control scheme. After presenting here this scheme with a point of view slightly different from the original one, we focus on the problem of compensating strong perturbations of the dynamics of the robot and propose a new linear model predictive control scheme which is an improvement of the original ZMP preview control scheme.}, - booktitle = {2006 6th {IEEE}-{RAS} {International} {Conference} on {Humanoid} {Robots}}, - author = {Wieber, Pierre-brice}, +@article{jeong_robust_2019, + title = {A {Robust} {Walking} {Controller} {Based} on {Online} {Optimization} of {Ankle}, {Hip}, and {Stepping} {Strategies}}, + volume = {35}, + issn = {1941-0468}, + doi = {10.1109/TRO.2019.2926487}, + abstract = {In this paper, we propose a biped walking controller that optimized three push recovery strategies: the ankle, hip, and stepping strategies. We suggested formulations that related the effects of each strategy to the stability of walking based on the linear inverted pendulum with flywheel model. With these relations, we could set up an optimization problem that integrates all the strategies, including step time change. These strategies are not applied hierarchically, but applied according to each weighting factor. Various combinations of weighting factors can be used to determine how the robot should respond to an external push. The optimization problem derived here includes many nonlinear components, but it has been linearized though some assumptions and it can be applied to a robot in real time. Our method is designed to be robust to modeling errors or weak perturbations, by exploiting the advantages of the foot. Hence, it is very practical to apply this algorithm to a real robot. The effectiveness of the walking controller has been verified through abstracted model simulation, full dynamics simulation, and a practical robot experiments.}, + number = {6}, + journal = {IEEE Transactions on Robotics}, + author = {Jeong, Hyobin and Lee, Inho and Oh, Jaesung and Lee, Kang Kyu and Oh, Jun-Ho}, month = dec, - year = {2006}, + year = {2019}, + note = {Conference Name: IEEE Transactions on Robotics}, + keywords = {Biped walking, Humanoid robots, Legged locomotion, humanoid robot, Torque, Hip, Foot, Optimization, divergent component of motion (DCM), walking control}, + pages = {1367--1386}, + file = {Jeong et al_2019_A Robust Walking Controller Based on Online Optimization of Ankle, Hip, and.pdf:/home/dferigo/Zotero/storage/J53BAQWP/Jeong et al_2019_A Robust Walking Controller Based on Online Optimization of Ankle, Hip, and.pdf:application/pdf}, +} + +@inproceedings{aftab_ankle_2012, + title = {Ankle, hip and stepping strategies for humanoid balance recovery with a single {Model} {Predictive} {Control} scheme}, + doi = {10.1109/HUMANOIDS.2012.6651514}, + abstract = {While humans are highly efficient in dealing with balance perturbations, current biped humanoid robots are far from showing similar skills. This is mainly due to the limited capacity of current robot controllers to deal with the inherently complex dynamics of biped robots. 
Though Model Predictive Control schemes have shown improved robustness to perturbations, they still suffer from a few shortcomings such as not considering the upper body inertial effects or non-optimal step durations. We propose here a Model Predictive Control scheme that specifically addresses these shortcomings and generates human-like responses to perturbations, involving appropriate combinations of ankle, hip and stepping strategies, with automatically adjusted step durations. The emphasis of this paper is on modeling and analyzing the effects of different cost functions and coefficients on the behavior of the controller while leaving real-time implementations and experiments for later work.}, + booktitle = {2012 12th {IEEE}-{RAS} {International} {Conference} on {Humanoid} {Robots} ({Humanoids} 2012)}, + author = {Aftab, Zohaib and Robert, Thomas and Wieber, Pierre-Brice}, + month = nov, + year = {2012}, note = {ISSN: 2164-0580}, - keywords = {Trajectory, Optimal control, Control systems, Motion control, Force control, Humanoid robots, Legged locomotion, Predictive models, Nonlinear dynamical systems, Predictive control}, - pages = {137--142}, - file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/6NQ6937N/4115592.html:text/html;Wieber_2006_Trajectory Free Linear Model Predictive Control for Stable Walking in the.pdf:/home/dferigo/Zotero/storage/PHYMXB8L/Wieber_2006_Trajectory Free Linear Model Predictive Control for Stable Walking in the.pdf:application/pdf}, + keywords = {Robots, Stability analysis, Acceleration, Hip, Foot, Predictive control, Linear programming}, + pages = {159--164}, + file = {Aftab et al_2012_Ankle, hip and stepping strategies for humanoid balance recovery with a single.pdf:/home/dferigo/Zotero/storage/FGKZFRTL/Aftab et al_2012_Ankle, hip and stepping strategies for humanoid balance recovery with a single.pdf:application/pdf;IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/LJPL5LQR/6651514.html:text/html}, } -@inproceedings{kim_push_2019, - title = {Push {Recovery} {Control} for {Humanoid} {Robot} {Using} {Reinforcement} {Learning}}, - doi = {10.1109/IRC.2019.00102}, - abstract = {A humanoid robot similar to a human is structurally unstable, so the push recovery control is essential. The proposed push recovery controller consists of a IMU sensor part, a highlevel push recovery controller and a low-level push recovery controller. The IMU sensor part measures the linear velocity and angular velocity and transmits it to the high-level push recovery controller. The high-level push recovery controller selects the strategy of the low-level push recovery controller based on the stability region. The stability region is improved using the DQN(Deep Q-Network) of the reinforcement learning method. The low-level push recovery controller consists of a ankle, hip and step strategies. Each strategy is analyzed using LIPM(Linear Inverted Pendulum Model). 
Based on the analysis, the actuators corresponding to each strategy are controlled.}, - booktitle = {2019 {Third} {IEEE} {International} {Conference} on {Robotic} {Computing} ({IRC})}, - author = {Kim, Harin and Seo, Donghyeon and Kim, Donghan}, - month = feb, - year = {2019}, - keywords = {Reinforcement learning, learning (artificial intelligence), linear systems, Mathematical model, Hip, humanoid robots, robot programming, motion control, humanoid robot, Humanoid robots, stability, legged locomotion, reinforcement learning, control engineering computing, push recovery, bipedal robot, Torso, linear inverted pendulum model, Actuators, angular velocity, balancing control, Deep Q-Network, DQN, IMU sensor part, linear velocity, nonlinear systems, pendulums, push recovery control, stability region}, - pages = {488--492}, - file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/9R2B6JH9/8675597.html:text/html;Kim et al_2019_Push Recovery Control for Humanoid Robot Using Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/I52S4RI7/Kim et al_2019_Push Recovery Control for Humanoid Robot Using Reinforcement Learning.pdf:application/pdf}, +@inproceedings{abdallah_biomechanically_2005, + title = {A {Biomechanically} {Motivated} {Two}-{Phase} {Strategy} for {Biped} {Upright} {Balance} {Control}}, + doi = {10.1109/ROBOT.2005.1570406}, + abstract = {Balance maintenance and upright posture recovery under unexpected environmental forces are key requirements for safe and successful co-existence of humanoid robots in normal human environments. In this paper we present a two-phase control strategy for robust balance maintenance under a force disturbance. The first phase, called the reflex phase, is designed to withstand the immediate effect of the force. The second phase is the recovery phase where the system is steered back to a statically stable “home” posture. The reflex control law employs angular momentum and is characterized by its counter-intuitive quality of “yielding” to the disturbance. The recovery control employs a general scheme of seeking to maximize the potential energy and is robust to local ground surface feature. Biomechanics literature indicates a similar strategy in play during human balance maintenance.}, + booktitle = {Proceedings of the 2005 {IEEE} {International} {Conference} on {Robotics} and {Automation}}, + author = {Abdallah, M. 
and Goswami, A.}, + month = apr, + year = {2005}, + note = {ISSN: 1050-4729}, + keywords = {Robust control, Biped robot, Humanoid robots, Force control, Hip, Foot, Biological system modeling, Humans, Biomechanics, balance, disturbance rejection, posture recovery, potential energy, Potential energy, Robotics and automation}, + pages = {1996--2001}, + file = {Abdallah_Goswami_2005_A Biomechanically Motivated Two-Phase Strategy for Biped Upright Balance Control.pdf:/home/dferigo/Zotero/storage/CQLPYTNX/Abdallah_Goswami_2005_A Biomechanically Motivated Two-Phase Strategy for Biped Upright Balance Control.pdf:application/pdf;IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/IGRBKH2M/1570406.html:text/html}, } -@techreport{duburcq_reactive_2022, - title = {Reactive {Stepping} for {Humanoid} {Robots} using {Reinforcement} {Learning}: {Application} to {Standing} {Push} {Recovery} on the {Exoskeleton} {Atalante}}, - shorttitle = {Reactive {Stepping} for {Humanoid} {Robots} using {Reinforcement} {Learning}}, - url = {http://arxiv.org/abs/2203.01148}, - abstract = {State-of-the-art reinforcement learning is now able to learn versatile locomotion, balancing and push-recovery capabilities for bipedal robots in simulation. Yet, the reality gap has mostly been overlooked and the simulated results hardly transfer to real hardware. Either it is unsuccessful in practice because the physics is over-simplified and hardware limitations are ignored, or regularity is not guaranteed and unexpected hazardous motions can occur. This paper presents a reinforcement learning framework capable of learning robust standing push recovery for bipedal robots with a smooth out-of-the-box transfer to reality, requiring only instantaneous proprioceptive observations. By combining original termination conditions and policy smoothness conditioning, we achieve stable learning, sim-to-real transfer and safety using a policy without memory nor observation history. Reward shaping is then used to give insights into how to keep balance. We demonstrate its performance in reality on the lower-limb medical exoskeleton Atalante.}, - number = {arXiv:2203.01148}, +@article{shafiee_online_2019, + title = {Online {DCM} {Trajectory} {Generation} for {Push} {Recovery} of {Torque}-{Controlled} {Humanoid} {Robots}}, + url = {http://arxiv.org/abs/1909.10403}, + abstract = {We present a computationally efficient method for online planning of bipedal walking trajectories with push recovery. In particular, the proposed methodology fits control architectures where the Divergent-Component-of-Motion (DCM) is planned beforehand, and adds a step adapter to adjust the planned trajectories and achieve push recovery. Assuming that the robot is in a single support state, the step adapter generates new positions and timings for the next step. The step adapter is active in single support phases only, but the proposed torque-control architecture considers double support phases too. The key idea for the design of the step adapter is to impose both initial and final DCM step values using an exponential interpolation of the time varying ZMP trajectory. This allows us to cast the push recovery problem as a Quadratic Programming (QP) one, and to solve it online with stateof-the-art optimisers. The overall approach is validated with simulations of the torque-controlled 33 kg humanoid robot iCub. 
Results show that the proposed strategy prevents the humanoid robot from falling while walking at 0.28 m/s and pushed with external forces up to 150 Newton for 0.05 seconds.}, + language = {en}, urldate = {2022-05-21}, - institution = {arXiv}, - author = {Duburcq, Alexis and Schramm, Fabian and Boéris, Guilhem and Bredeche, Nicolas and Chevaleyre, Yann}, - month = mar, - year = {2022}, - doi = {10.48550/arXiv.2203.01148}, - note = {arXiv:2203.01148 [cs] -version: 1 -type: article}, + journal = {arXiv:1909.10403 [cs]}, + author = {Shafiee, Milad and Romualdi, Giulio and Dafarra, Stefano and Chavez, Francisco Javier Andrade and Pucci, Daniele}, + month = oct, + year = {2019}, + note = {arXiv: 1909.10403}, keywords = {Computer Science - Robotics}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/UBANG96C/2203.html:text/html;Duburcq et al_2022_Reactive Stepping for Humanoid Robots using Reinforcement Learning.pdf:/home/dferigo/Zotero/storage/SQ73JWFF/Duburcq et al_2022_Reactive Stepping for Humanoid Robots using Reinforcement Learning.pdf:application/pdf}, + file = {Shafiee et al. - 2019 - Online DCM Trajectory Generation for Push Recovery.pdf:/home/dferigo/Zotero/storage/SC6TWNXN/Shafiee et al. - 2019 - Online DCM Trajectory Generation for Push Recovery.pdf:application/pdf}, } -@techreport{zamora_extending_2017, - title = {Extending the {OpenAI} {Gym} for robotics: a toolkit for reinforcement learning using {ROS} and {Gazebo}}, - shorttitle = {Extending the {OpenAI} {Gym} for robotics}, - url = {http://arxiv.org/abs/1608.05742}, - abstract = {This paper presents an extension of the OpenAI Gym for robotics using the Robot Operating System (ROS) and the Gazebo simulator. The content discusses the software architecture proposed and the results obtained by using two Reinforcement Learning techniques: Q-Learning and Sarsa. Ultimately, the output of this work presents a benchmarking system for robotics that allows different techniques and algorithms to be compared using the same virtual conditions.}, - number = {arXiv:1608.05742}, - urldate = {2022-05-31}, - institution = {arXiv}, - author = {Zamora, Iker and Lopez, Nestor Gonzalez and Vilches, Victor Mayoral and Cordero, Alejandro Hernandez}, - month = feb, +@article{shafiee-ashtiani_push_2017, + title = {Push {Recovery} of a {Position}-{Controlled} {Humanoid} {Robot} {Based} on {Capture} {Point} {Feedback} {Control}}, + url = {http://arxiv.org/abs/1710.10598}, + abstract = {In this paper, a combination of ankle and hip strategy is used for push recovery of a position-controlled humanoid robot. Ankle strategy and hip strategy are equivalent to Center of Pressure (CoP) and Centroidal Moment Pivot (CMP) regulation respectively. For controlling the CMP and CoP we need a torque-controlled robot, however most of the conventional humanoid robots are position controlled. In this regard, we present an efficient way for implementation of the hip and ankle strategies on a position controlled humanoid robot. We employ a feedback controller to compensate the capture point error. Using our scheme, a simple and practical push recovery controller is designed which can be implemented on the most of the conventional humanoid robots without the need for torque sensors. 
The effectiveness of the proposed approach is verified through push recovery experiments on SURENA-Mini humanoid robot under severe pushes.}, + language = {en}, + urldate = {2022-05-21}, + journal = {arXiv:1710.10598 [cs]}, + author = {Shafiee-Ashtiani, Milad and Yousefi-Koma, Aghil and Mirjalili, Reihaneh and Maleki, Hessam and Karimi, Mojtaba}, + month = oct, year = {2017}, - doi = {10.48550/arXiv.1608.05742}, - note = {arXiv:1608.05742 [cs] -type: article}, + note = {arXiv: 1710.10598}, keywords = {Computer Science - Robotics}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/HH56KUXC/1608.html:text/html;Zamora et al_2017_Extending the OpenAI Gym for robotics.pdf:/home/dferigo/Zotero/storage/563JXI4P/Zamora et al_2017_Extending the OpenAI Gym for robotics.pdf:application/pdf}, + file = {Shafiee-Ashtiani et al. - 2017 - Push Recovery of a Position-Controlled Humanoid Ro.pdf:/home/dferigo/Zotero/storage/BFPJNTZT/Shafiee-Ashtiani et al. - 2017 - Push Recovery of a Position-Controlled Humanoid Ro.pdf:application/pdf}, } -@misc{noauthor_icub_nodate, - title = {{iCub}: {The} not-yet-finished story of building a robot child}, - url = {https://www.science.org/doi/10.1126/scirobotics.aaq1026}, - urldate = {2022-05-31}, +@inproceedings{pratt_capture_2006, + title = {Capture {Point}: {A} {Step} toward {Humanoid} {Push} {Recovery}}, + shorttitle = {Capture {Point}}, + doi = {10.1109/ICHR.2006.321385}, + abstract = {It is known that for a large magnitude push a human or a humanoid robot must take a step to avoid a fall. Despite some scattered results, a principled approach towards "when and where to take a step" has not yet emerged. Towards this goal, we present methods for computing capture points and the capture region, the region on the ground where a humanoid must step to in order to come to a complete stop. The intersection between the capture region and the base of support determines which strategy the robot should adopt to successfully stop in a given situation. Computing the capture region for a humanoid, in general, is very difficult. However, with simple models of walking, computation of the capture region is simplified. We extend the well-known linear inverted pendulum model to include a flywheel body and show how to compute exact solutions of the capture region for this model. Adding rotational inertia enables the humanoid to control its centroidal angular momentum, much like the way human beings do, significantly enlarging the capture region. We present simulations of a simple planar biped that can recover balance after a push by stepping to the capture region and using internal angular momentum. 
Ongoing work involves applying the solution from the simple model as an approximate solution to more complex simulations of bipedal walking, including a 3D biped with distributed mass.}, + booktitle = {2006 6th {IEEE}-{RAS} {International} {Conference} on {Humanoid} {Robots}}, + author = {Pratt, Jerry and Carff, John and Drakunov, Sergey and Goswami, Ambarish}, + month = dec, + year = {2006}, + note = {ISSN: 2164-0580}, + keywords = {Computational modeling, Humanoid robots, Legged locomotion, humanoid robots, legged locomotion, humanoid robot, Acceleration, nonlinear control systems, Foot, torque control, position control, linear inverted pendulum model, Biological system modeling, biped robot, capture point, capture region, centroidal angular momentum, Closed-form solution, Cognition, flywheel body, Flywheels, humanoid push recovery, Humans, rotational inertia}, + pages = {200--207}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/JNWUQT8M/4115602.html:text/html;Pratt et al_2006_Capture Point.pdf:/home/dferigo/Zotero/storage/2M2Q2H7D/Pratt et al_2006_Capture Point.pdf:application/pdf}, } -@article{metta_open_2005, - title = {an open framework for research in embodied cognition}, - abstract = {This paper presents a research initiative on embodied cognition called RobotCub. RobotCub is an EUfunded project that aims at developing an open humanoid robotic platform and, simultaneously, following an original research path in synthetic cognition. We report on the motivations behind the realization of our humanoid robotic platform and the current status of the design just before the construction of the first full prototype.}, +@inproceedings{traversaro_unied_2017, + title = {A {Unified} {View} of the {Equations} of {Motion} used for {Control} {Design} of {Humanoid} {Robots}}, + abstract = {This paper contributes towards the development of a unified standpoint on the equations of motion used for the control of free-floating mechanical systems. In particular, the contribution of the manuscript is twofold. First, we show how to write the system equations of motion for any choice of the base frame, without the need of re-applying algorithms for evaluating the mass, coriolis, and gravity matrix. A particular attention is paid to the properties associated with the mechanical systems, which are shown to be invariant with respect to the base frame choice. Secondly, we show that the so-called centroidal dynamics can be obtained from any expression of the equations of motion via an appropriate system state transformation. In this case, we show that the mass matrix associated with the new state is block diagonal, and the new base velocity corresponds to the so-called average 6D velocity.}, language = {en}, - author = {Metta, Giorgio and Sandini, Giulio and Vernon, David and Caldwell, Darwin and Tsagarakis, Nikolaos and Beira, Ricardo and Santos-Victor, Jose' and Ijspeert, Auke and Righetti, Ludovic and Cappiello, Giovanni and Stellin, Giovanni and Becchi, Francesco}, - year = {2005}, - pages = {9}, - file = {Metta et al. - 2005 - an open framework for research in embodied cogniti.pdf:/home/dferigo/Zotero/storage/ZWCLYQYE/Metta et al. 
- 2005 - an open framework for research in embodied cogniti.pdf:application/pdf}, -} - -@article{natale_icub_2017, - title = {{iCub}: {The} not-yet-finished story of building a robot child}, - volume = {2}, - shorttitle = {{iCub}}, - url = {https://www.science.org/doi/full/10.1126/scirobotics.aaq1026}, - doi = {10.1126/scirobotics.aaq1026}, - number = {13}, - urldate = {2022-05-31}, - journal = {Science Robotics}, - author = {Natale, Lorenzo and Bartolozzi, Chiara and Pucci, Daniele and Wykowska, Agnieszka and Metta, Giorgio}, - month = dec, + author = {Traversaro, Silvio and Pucci, Daniele and Nori, Francesco}, year = {2017}, - note = {Publisher: American Association for the Advancement of Science}, - pages = {eaaq1026}, - file = {Natale et al_2017_iCub.pdf:/home/dferigo/Zotero/storage/FJ6ZI85X/Natale et al_2017_iCub.pdf:application/pdf}, + pages = {24}, } -@inproceedings{maclaurin_autograd_2015, - title = {Autograd: {Effortless} {Gradients} in {Numpy}}, - abstract = {Automatic differentiation can greatly speed up prototyping and implementation of machine learning models. However, most packages are implicitly domain-specific, requiring the use of a restricted mini-language for specifying functions. We introduce autograd, a package which differentiates standard Python and Numpy code, and can differentiate code containing while loops, branches, closures, classes and even its own gradient evaluations.}, +@article{baydin_automatic_2018, + title = {Automatic {Differentiation} in {Machine} {Learning}: a {Survey}}, + abstract = {Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply “autodiff”, is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences, and engineering design optimization. Until very recently, the fields of machine learning and AD have largely been unaware of each other and, in some cases, have independently discovered each other’s results. Despite its relevance, general-purpose AD has been missing from the machine learning toolbox, a situation slowly changing with its ongoing adoption under the names “dynamic computational graphs” and “differentiable programming”. We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. By precisely defining the main differentiation techniques and their interrelationships, we aim to bring clarity to the usage of the terms “autodiff”, “automatic differentiation”, and “symbolic differentiation” as these are encountered more and more in machine learning settings.}, language = {en}, - author = {Maclaurin, Dougal and Duvenaud, David and Adams, Ryan P}, - year = {2015}, - pages = {3}, - file = {Maclaurin et al. - Autograd Effortless Gradients in Numpy.pdf:/home/dferigo/Zotero/storage/HB32N7M8/Maclaurin et al. 
- Autograd Effortless Gradients in Numpy.pdf:application/pdf}, + author = {Baydin, Atılım Gunes and Pearlmutter, Barak A and Radul, Alexey Andreyevich and Siskind, Jeffrey Mark}, + year = {2018}, + pages = {43}, + file = {Baydin et al_Automatic Differentiation in Machine Learning.pdf:/home/dferigo/Zotero/storage/9SE98F69/Baydin et al_Automatic Differentiation in Machine Learning.pdf:application/pdf}, } -@article{van_der_walt_numpy_2011, - title = {The {NumPy} {Array}: {A} {Structure} for {Efficient} {Numerical} {Computation}}, - volume = {13}, - issn = {1558-366X}, - shorttitle = {The {NumPy} {Array}}, - doi = {10.1109/MCSE.2011.37}, - abstract = {In the Python world, NumPy arrays are the standard representation for numerical data and enable efficient implementation of numerical computations in a high-level language. As this effort shows, NumPy performance can be improved through three techniques: vectorizing calculations, avoiding copying data in memory, and minimizing operation counts.}, - number = {2}, - journal = {Computing in Science \& Engineering}, - author = {van der Walt, Stefan and Colbert, S. Chris and Varoquaux, Gael}, - month = mar, - year = {2011}, - note = {Conference Name: Computing in Science \& Engineering}, - keywords = {Arrays, Computational efficiency, Finite element methods, Numerical analysis, numerical computations, NumPy, Performance evaluation, programming libraries, Python, Resource management, scientific programming, Vector quantization}, - pages = {22--30}, - file = {van der Walt et al_2011_The NumPy Array.pdf:/home/dferigo/Zotero/storage/SMLVNJJN/van der Walt et al_2011_The NumPy Array.pdf:application/pdf}, +@article{frostig_compiling_2018, + title = {Compiling machine learning programs via high-level tracing}, + abstract = {We describe JAX, a domain-specific tracing JIT compiler for generating high-performance accelerator code from pure Python and Numpy machine learning programs. JAX uses the XLA compiler infrastructure to generate optimized code for the program subroutines that are most favorable for acceleration, and these optimized subroutines can be called and orchestrated by arbitrary Python. Because the system is fully compatible with Autograd, it allows forward- and reverse-mode automatic differentiation of Python functions to arbitrary order. Because JAX supports structured control flow, it can generate code for sophisticated machine learning algorithms while maintaining high performance. We show that by combining JAX with Autograd and Numpy we get an easily programmable and highly performant ML system that targets CPUs, GPUs, and TPUs, capable of scaling to multi-core Cloud TPUs.}, + language = {en}, + author = {Frostig, Roy and Johnson, Matthew James and Leary, Chris}, + year = {2018}, + pages = {3}, + file = {Frostig et al. - Compiling machine learning programs via high-level.pdf:/home/dferigo/Zotero/storage/R9ZJ9L5K/Frostig et al. - Compiling machine learning programs via high-level.pdf:application/pdf}, } -@article{harris_array_2020, - title = {Array programming with {NumPy}}, - volume = {585}, - copyright = {2020 The Author(s)}, - issn = {1476-4687}, - url = {https://www.nature.com/articles/s41586-020-2649-2}, - doi = {10.1038/s41586-020-2649-2}, - abstract = {Array programming provides a powerful, compact and expressive syntax for accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. 
We show that the Beta policy is bias-free and provides significantly faster convergence and higher scores over the Gaussian policy when both are used with trust region policy optimization (TRPO) and actor critic with experience replay (ACER), the state-of-the-art on- and offpolicy stochastic methods respectively, on OpenAI Gym’s and MuJoCo’s continuous control environments.}, +@book{puterman_markov_2005, + address = {Hoboken, NJ}, + series = {Wiley series in probability and statistics}, + title = {Markov decision processes: discrete stochastic dynamic programming}, + isbn = {978-0-471-72782-8}, + shorttitle = {Markov decision processes}, language = {en}, - author = {Chou, Po-Wei and Maturana, Daniel and Scherer, Sebastian}, - year = {2017}, - pages = {10}, - file = {Chou et al_Improving Stochastic Policy Gradients in Continuous Control with Deep.pdf:/home/dferigo/Zotero/storage/AL92UVVR/Chou et al_Improving Stochastic Policy Gradients in Continuous Control with Deep.pdf:application/pdf}, + publisher = {Wiley-Interscience}, + author = {Puterman, Martin L.}, + year = {2005}, + file = {Puterman - 2005 - Markov decision processes discrete stochastic dyn.pdf:/home/dferigo/Zotero/storage/4NCDPP3U/Puterman - 2005 - Markov decision processes discrete stochastic dyn.pdf:application/pdf}, } -@article{peng_ase_2022, - title = {{ASE}: {Large}-{Scale} {Reusable} {Adversarial} {Skill} {Embeddings} for {Physically} {Simulated} {Characters}}, - volume = {41}, - issn = {0730-0301, 1557-7368}, - shorttitle = {{ASE}}, - url = {http://arxiv.org/abs/2205.01906}, - doi = {10.1145/3528223.3530110}, - abstract = {The incredible feats of athleticism demonstrated by humans are made possible in part by a vast repertoire of general-purpose motor skills, acquired through years of practice and experience. These skills not only enable humans to perform complex tasks, but also provide powerful priors for guiding their behaviors when learning new tasks. This is in stark contrast to what is common practice in physics-based character animation, where control policies are most typically trained from scratch for each task. In this work, we present a large-scale data-driven framework for learning versatile and reusable skill embeddings for physically simulated characters. Our approach combines techniques from adversarial imitation learning and unsupervised reinforcement learning to develop skill embeddings that produce life-like behaviors, while also providing an easy to control representation for use on new downstream tasks. Our models can be trained using large datasets of unstructured motion clips, without requiring any task-specific annotation or segmentation of the motion data. By leveraging a massively parallel GPU-based simulator, we are able to train skill embeddings using over a decade of simulated experiences, enabling our model to learn a rich and versatile repertoire of skills. We show that a single pre-trained model can be effectively applied to perform a diverse set of new tasks. 
Our system also allows users to specify tasks through simple reward functions, and the skill embedding then enables the character to automatically synthesize complex and naturalistic strategies in order to achieve the task objectives.}, - number = {4}, - urldate = {2022-07-30}, - journal = {ACM Transactions on Graphics}, - author = {Peng, Xue Bin and Guo, Yunrong and Halper, Lina and Levine, Sergey and Fidler, Sanja}, - month = jul, - year = {2022}, - note = {arXiv:2205.01906 [cs]}, - keywords = {Computer Science - Artificial Intelligence, Computer Science - Graphics, Computer Science - Machine Learning}, - pages = {1--17}, +@article{jaakkola_reinforcement_1994, + title = {Reinforcement {Learning} {Algorithm} for {Partially} {Observable} {Markov} {Decision} {Problems}}, + abstract = {Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due to successes in the theoretical analysis of their behavior in Markov environments. If the Markov assumption is removed, however, neither generally the algorithms nor the analyses continue to be usable. We propose and analyze a new learning algorithm to solve a certain class of non-Markov decision problems. Our algorithm applies to problems in which the environment is Markov, but the learner has restricted access to state information. The algorithm involves a Monte-Carlo policy evaluation combined with a policy improvement method that is similar to that of Markov decision problems and is guaranteed to converge to a local maximum. The algorithm operates in the space of stochastic policies, a space which can yield a policy that performs considerably better than any deterministic policy. Although the space of stochastic policies is continuous-even for a discrete action space-our algorithm is computationally tractable.}, + language = {en}, + journal = {Advances in neural information processing systems}, + author = {Jaakkola, Tommi and Singh, Satinder P and Jordan, Michael I}, + year = {1994}, + file = {Jaakkola et al. - Reinforcement Learning Algorithm for Partially Obs.pdf:/home/dferigo/Zotero/storage/D5UK3Q5E/Jaakkola et al. - Reinforcement Learning Algorithm for Partially Obs.pdf:application/pdf}, } -@misc{xu_accelerated_2022, - title = {Accelerated {Policy} {Learning} with {Parallel} {Differentiable} {Simulation}}, - url = {http://arxiv.org/abs/2204.07137}, - doi = {10.48550/arXiv.2204.07137}, - abstract = {Deep reinforcement learning can generate complex control policies, but requires large amounts of training data to work effectively. Recent work has attempted to address this issue by leveraging differentiable simulators. However, inherent problems such as local minima and exploding/vanishing numerical gradients prevent these methods from being generally applied to control tasks with complex contact-rich dynamics, such as humanoid locomotion in classical RL benchmarks. In this work we present a high-performance differentiable simulator and a new policy learning algorithm (SHAC) that can effectively leverage simulation gradients, even in the presence of non-smoothness. Our learning algorithm alleviates problems with local minima through a smooth critic function, avoids vanishing/exploding gradients through a truncated learning window, and allows many physical environments to be run in parallel. We evaluate our method on classical RL control tasks, and show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms. 
In addition, we demonstrate the scalability of our method by applying it to the challenging high-dimensional problem of muscle-actuated locomotion with a large action space, achieving a greater than 17x reduction in training time over the best-performing established RL algorithm.}, - urldate = {2022-07-28}, - publisher = {arXiv}, - author = {Xu, Jie and Makoviychuk, Viktor and Narang, Yashraj and Ramos, Fabio and Matusik, Wojciech and Garg, Animesh and Macklin, Miles}, - month = apr, - year = {2022}, - note = {arXiv:2204.07137 [cs]}, - keywords = {Computer Science - Artificial Intelligence, Computer Science - Graphics, Computer Science - Machine Learning, Computer Science - Robotics}, - file = {arXiv.org Snapshot:/home/dferigo/Zotero/storage/RZKKUGFX/2204.html:text/html;Xu et al_2022_Accelerated Policy Learning with Parallel Differentiable Simulation.pdf:/home/dferigo/Zotero/storage/AT2J2BBX/Xu et al_2022_Accelerated Policy Learning with Parallel Differentiable Simulation.pdf:application/pdf}, +@article{lovejoy_survey_1991, + title = {A survey of algorithmic methods for partially observed {Markov} decision processes}, + volume = {28}, + issn = {0254-5330, 1572-9338}, + url = {http://link.springer.com/10.1007/BF02055574}, + doi = {10.1007/BF02055574}, + language = {en}, + number = {1}, + urldate = {2023-01-07}, + journal = {Annals of Operations Research}, + author = {Lovejoy, William S.}, + month = dec, + year = {1991}, + pages = {47--65}, + file = {Lovejoy - 1991 - A survey of algorithmic methods for partially obse.pdf:/home/dferigo/Zotero/storage/E6ISAHRJ/Lovejoy - 1991 - A survey of algorithmic methods for partially obse.pdf:application/pdf}, +} + +@book{marsden_jerrold_e_introduction_2013, + title = {Introduction to {Mechanics} and {Symmetry}: a {Basic} {Exposition} of {Classical} {Mechanical} {Systems}.}, + publisher = {Springer Science \& Business Media}, + author = {{Marsden, Jerrold E.} and {Ratiu, Tudor S.}}, + year = {2013}, + file = {(Texts in Applied Mathematics) Jerrold E. Marsden, Tudor S. Ratiu - Introduction To Mechanics And Symmetry A Basic Exposition of Classical Mechanical Systems-Springer (2010).pdf:/home/dferigo/Zotero/storage/2QH6GPAH/(Texts in Applied Mathematics) Jerrold E. Marsden, Tudor S. 
Ratiu - Introduction To Mechanics And Symmetry A Basic Exposition of Classical Mechanical Systems-Springer (2010).pdf:application/pdf}, } From e3cb259f21ad8f22cfcfd040f9dffeb790e1cbfb Mon Sep 17 00:00:00 2001 From: Diego Ferigo Date: Thu, 25 May 2023 12:34:08 +0200 Subject: [PATCH 3/5] Update soft-contacts formulation and add Mujoco comparison --- Chapters/Part_2/chapter_7.tex | 199 ++++++++--- Chapters/Part_2/chapter_8.tex | 80 +++-- .../box_falling_inclined_mu_0.0.tikz | 322 ++++++++++++++++++ .../box_falling_inclined_mu_0.5.tikz | 272 +++++++++++++++ .../box_falling_inclined_mu_2.0.tikz | 216 ++++++++++++ .../chapter_7/soft_contact_model.tikz | 11 +- zotero.bib | 47 +++ 7 files changed, 1062 insertions(+), 85 deletions(-) create mode 100644 images/contributions/chapter_7/box_falling_inclined_mu_0.0.tikz create mode 100644 images/contributions/chapter_7/box_falling_inclined_mu_0.5.tikz create mode 100644 images/contributions/chapter_7/box_falling_inclined_mu_2.0.tikz diff --git a/Chapters/Part_2/chapter_7.tex b/Chapters/Part_2/chapter_7.tex index cafe098..0623985 100644 --- a/Chapters/Part_2/chapter_7.tex +++ b/Chapters/Part_2/chapter_7.tex @@ -261,6 +261,7 @@ \section{State-space Multibody Dynamics} In the next section, we describe a methodology to compute the unknown forces exchanged with the environment by assuming that the floating-base model only interacts with the terrain $\forcesix_\mathcal{L}^{contact} := \forcesix_\mathcal{L}^{terrain}$, assumption compatible with our locomotion setting. In a more general setting, $\forcesix_\mathcal{L}^{contact}$ should also include the forces exchanged between bodies, for example to simulate either self-collisions or interaction with other bodies part of the scene, particularly useful for robot manipulation. +\newpage \section{Contact model} \label{section:contact_model} @@ -282,6 +283,7 @@ \section{Contact model} In the following sections, we first provide a description of the point-surface setting, introducing all the necessary elements for the contact model. Then, we provide a more detailed analysis of continuous methods for \emph{collisions handling}, specifying how they can model both the \emph{normal} and the \emph{tangential} forces. Finally, we describe how their effects can be included in our dynamical system defined in Equation~\eqref{equation:floatig_base_dynamics_state_space}. +The algorithm implementing the proposed soft-contact model is reported in the next chapter in Section~\ref{sec:soft_contacts_algorithm}. \subsection{Point-surface collisions} \label{section:point-surface_collisions} @@ -290,11 +292,13 @@ \subsection{Point-surface collisions} \centering \resizebox{.63\textwidth}{!}{ \includegraphics{images/contributions/chapter_7/soft_contact_model.tikz}} - \caption{Illustration of the point-surface soft-contact model for non-planar terrains. The collidable point follow a trajectory $\pos_{cp}(t)$, penetrating the ground in $\pos_{cp}^0 := \left( x^0, y^0, \mathcal{H}(x^0, y^0) \right)$. While penetrating the material, the point reaches a generic point $\pos_{cp}$, over which a local contact frame $C = (\pos_{cp}, [W])$ is positioned, with a linear velocity $\vellin_{W,C} \in \realn^3$. 
The figure reports also the \emph{penetration depth} $h \in \realn$, the \emph{normal deformation} $\delta \in \realn$, and the tangential deformation $\mathbf{u} \in \realn^3$, used for the calculation of the 3D reaction force $\forcelin[C]_{cp}$ with the proposed soft-contact model.}
+    \caption{Illustration of the point-surface soft-contact model for non-planar terrains. The collidable point follows a trajectory $\pos_{cp}(t)$, penetrating the ground in $\pos_{cp}^0 := \left( x^0, y^0, \mathcal{H}(x^0, y^0) \right)$. While penetrating the material, the point reaches a generic point $\pos_{cp}$, over which a local contact frame $C = (\pos_{cp}, [W])$ is positioned, with a linear velocity $\vellin[{C[W]}]_{W,C} = \posd[W]_C \in \realn^3$. The figure also reports the \emph{penetration depth} $h \in \realn$, the \emph{normal deformation} $\delta \in \realn$, and the compounded tangential deformation $\mathbf{m} \in \realn^3$ of the material, used for the calculation of the 3D reaction force $\forcelin[C]_{cp}$ with the proposed soft-contact model.}
 	\label{fig:soft_contact_model}
 \end{figure}
 
 For each time-varying collidable point belonging to the simulated model, we introduce a new local frame $C = (\ori_C, [W])$, having its origin located over the point's time-varying position $\pos[W]_{cp}(t)$ and orientation of $W$, illustrated in Figure~\ref{fig:soft_contact_model}.
+Beyond the position of the collidable point, the contact model also accounts for its linear velocity $\vellin_{W,C} \in \realn^3$.
+In the following formulation, we use the mixed representation of the point velocity, \ie $\vellin_{W,C} = \posd[W]_C$.
 
 In this setup, \emph{collision detection} is as easy as assessing if the $z$ coordinate of the collidable point is lower than the terrain height.
 We can assume having a \emph{heightmap} function $\mathcal{H}: (x, y) \mapsto z$ providing the terrain height at any location.
@@ -312,22 +316,25 @@ \subsection{Point-surface collisions}
 %
 where $h_z = z_T - z_C$ is the \emph{penetration depth}, positive only for collidable points below the ground surface.
 
-In the following sections, we need to project the penetration vector $\mathbf{h}$ and the linear velocity $\vellin[W]_{W, C}$ of the collidable point into the parallel and normal directions \wrt the ground surface.
-We denote the magnitude of the normal deformation as $\delta \in \mathbb{R}$, and the normal and tangential components of the velocity as $\vellin[W]^{\perp}_{W, C}, \vellin[W]^{\parallel}_{W, C} \in \mathbb{R}^3$:
+In the following sections, we need to project the penetration vector $\mathbf{h}$ and the linear velocity $\posd[W]_C$ of the collidable point into the parallel and normal directions \wrt the ground surface.
+We denote the magnitude of the normal deformation as $\delta \in \mathbb{R}^+$, and the normal and tangential components of the velocity as $\vellin_C^\perp,\, \vellin_C^\parallel \in \mathbb{R}^3$:
 %
 \begin{align*}
 	\begin{cases}
 	\delta = {}^W\mathbf{h} \cdot \hat{\mathbf{n}}, \\
-	\vellin[W]^{\perp}_{W, C} = \left(\vellin[W]_{W, C} \cdot \hat{\mathbf{n}}\right) \hat{\mathbf{n}}, \\
-	\vellin[W]^{\parallel}_{W, C} = \vellin[W]_{W, C} - \vellin[W]^{\perp}_{W, C}
+	\vellin_C^\perp = \left(\posd[W]_C \cdot \hat{\mathbf{n}}\right) \hat{\mathbf{n}}, \\
+	\vellin_C^\parallel = \posd[W]_C - \vellin_C^\perp
 	.
\end{cases} \end{align*} % -We do not yet provide a geometrical equation of the tangential deformation $\mathbf{u} \in \mathbb{R}^3$, as it would require tracking over time the position of the initial penetration point $\pos[W]^0_{cp}$, introducing an additional state component in the system not part of its state-space representation that can be difficult to handle. -We will show in the next sections how to estimate $\mathbf{u}$. +We do not yet provide a geometrical equation to compute the \emph{compounded tangential deformation} $\mathbf{m} \in \mathbb{R}^3$ of the terrain material, as it would require tracking over time the position of the initial penetration point $\pos[W]^0_{cp}$. +Introducing in the system, for this purpose, an additional state component not part of its state-space representation would be difficult to handle. +We will show in the next sections how to compute $\mathbf{m}$ such that both sticking and slipping contacts are supported. +We also note that, assuming the knowledge of $\mathbf{m}$, the data required by the proposed contact model can be entirely computed from the floating-base configuration (generalized position $\Qu$ and velocity $\Nu$). +Therefore, the contact force $\forcelin[C] \in \realn^3$ becomes an instantaneous function of the kinematics. -Assuming that the effects of the normal and tangential deformations can be decomposed, in the next sections we first compute the normal force $\forcelin[C]_\perp = f_\perp \hat{\mathbf{n}} \in \mathbb{R}^3$, and then the tangential force $\forcelin[C]_\parallel \in \mathbb{R}^3$, both applied to the origin of frame $C$. +Assuming that the effects of the normal and tangential deformations of the material can be decomposed, in the next sections we first compute the normal force $\forcelin[C]_\perp = f_\perp \hat{\mathbf{n}} \in \mathbb{R}^3$, and then the tangential force $\forcelin[C]_\parallel \in \mathbb{R}^3$, both applied to the origin of frame $C$. \subsection{Normal forces} \label{section:normal_forces} @@ -340,14 +347,12 @@ \subsection{Normal forces} \begin{equation*} \begin{cases} \delta = {}^W \mathbf{h} \cdot \hat{\mathbf{n}}, \\ - \dot{\delta} = {}^W \dot{\mathbf{h}} \, \cdot \hat{\mathbf{n}} = -\posd[W]_{cp} \cdot \hat{\mathbf{n}} = -\vellin[W]_{W,C}^\perp + \dot{\delta} = {}^W \dot{\mathbf{h}} \, \cdot \hat{\mathbf{n}} = -\posd[W]_C \cdot \hat{\mathbf{n}} = -\vellin_C^\perp \end{cases} . \end{equation*} -% -Considering the usage of inertial representation, we also have the relation $\posd[W]_{cp} = \vellin[W]_{W,C}$, that helps noticing the correspondence between $\dot{\delta}$ and the normal component of the collidable point's velocity $\vellin[W]^\perp_{W,C}$. -The possible non-linear form of the relationship between the normal force $f_\perp$ and the deformation properties can be described by the following equation: +A possible non-linear form of the relationship between the normal force $f_\perp$ and the deformation properties can be described by the following equation: % \begin{equation*} f_\perp = \forcelin[C]_\perp \cdot \hat{\mathbf{n}} = @@ -357,8 +362,8 @@ \subsection{Normal forces} \end{cases} \end{equation*} % -where $k, \lambda \in \mathbb{R}$ are respectively the stiffness and damping coefficients of the material, and $a, b, c \in \mathbb{R}$ the parameters of the contact model. -Note that this contact model does not present any discontinuity, in fact for what regards the term proportional to deformation rate we also have $\delta^b$. 
+where $k, \lambda \in \mathbb{R}$ are respectively the \emph{stiffness} and \emph{damping} coefficients of the material, and $a, b, c \in \mathbb{R}$ the parameters of the contact model. +Note that this contact model does not present any discontinuity, in fact for what regards the term proportional to deformation rate we also have $\delta^b$ that zeros this term when $\delta = 0$. This model has appeared with different coefficients proposed by various studies. In our implementation, we use the parameters $a = \frac{3}{2}$, $b=\frac{1}{2}$, and $c=1$, as proposed by \textcite{azad_modeling_2010}. @@ -366,31 +371,33 @@ \subsection{Normal forces} assuming that the two collision types produce a comparable material deformation\footnote{These parameters, being quite difficult to identify, often belong to the domain randomization set in \ac{RL} settings.}. We can implement this model using the following logic: % -\begin{equation*} +\begin{equation} + \label{eq:soft_contact_normal_force} f_\perp = \begin{cases} \max\left\{ 0, \sqrt\delta (k \delta + \lambda \dot\delta) \right\} & \text{if $\delta\geq0$,} \\ 0 & \text{if $\delta<0$.} \end{cases} -\end{equation*} +\end{equation} % As remarked by the same authors, this model has the advantage of not exposing any additional state variable. -However, the implementation ignores the relaxation dynamics of the material after the contact is broken. +However, the implementation ignores the relaxation dynamics of the material in the normal direction after the contact is broken. This could cause incorrect dynamics if a new contact is made immediately following the deactivation of the previous one and before the spring-damper model could reach the steady state, but in practice the effect only occurs in the few instants before the contact becomes steady, not affecting the simulated dynamics significantly. \subsection{Tangential forces} The continuous contact dynamics introduced in the previous section allows for including frictional effects described with any friction model. -We approximate frictional effects with Coulomb's law of friction that, albeit being relatively simple, is widely adopted thanks to its versatility. +We consider only effects due to \emph{dry friction}. +We approximate these effects with Coulomb's law of friction that, albeit being relatively simple, is widely adopted thanks to its versatility. The physical interaction between two materials is assumed to be independent of the contact area, accounting consistently for our point-surface modelling. -Coulomb friction for an object at rest is governed by the following model: +The Coulomb friction for an object at rest is governed by the following model: % \begin{equation*} - \norm{\forcelin[C]_\parallel} \leq \mu f_\perp + \norm{\forcelin[C]_\parallel} \leq \mu_c f_\perp , \end{equation*} % -where $\forcelin_\parallel \in \mathbb{R}^3$ is the tangential force that the material deformation exerts on the point in the direction opposite to the tangential deformation, and $\mu \in \mathbb{R}$ is the \emph{static friction coefficient}. +where $\forcelin_\parallel \in \mathbb{R}^3$ is the tangential force that the material deformation exerts on the point in the direction opposite to the compounded tangential deformation $\mathbf{m}$, and $\mu_c \in \mathbb{R}^+$ is the \emph{static friction coefficient}. This model depends on the unilateral force $f_\perp \geq 0$, and can be visualised as a cone considering a space having the three force components as axes. 
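As a minimal illustration of the normal-force law of Equation~\eqref{eq:soft_contact_normal_force} and of the friction-cone bound just introduced, the following Python sketch evaluates both quantities for a single collidable point; the function names, default parameters, and the worked numbers are illustrative only and are not part of the thesis software.

import numpy as np

def normal_force(delta: float, delta_dot: float, k: float = 1e6, lam: float = 2000.0) -> float:
    # f_perp = max{0, sqrt(delta) * (k * delta + lam * delta_dot)}, zero without penetration.
    if delta < 0.0:
        return 0.0
    return max(0.0, np.sqrt(delta) * (k * delta + lam * delta_dot))

def friction_cone_bound(f_perp: float, mu_c: float = 0.5) -> float:
    # Maximum tangential force magnitude that a sticking contact can exert.
    return mu_c * f_perp

# Example: 1 mm penetration while the point is still compressing the material.
f_n = normal_force(delta=1e-3, delta_dot=1e-2)
print(f_n, friction_cone_bound(f_n))  # approximately 32.3 N and 16.1 N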
For this reason, it is often referred to as \emph{friction cone}. @@ -399,26 +406,30 @@ \subsection{Tangential forces} \begin{equation*} \forcelin[C]_\parallel = \begin{cases} - \boldsymbol{f}_{stick} & \text{if $\norm{\boldsymbol{f}_{stick}} \leq \mu f_\perp$,} \\ + \boldsymbol{f}_{stick} & \text{if $\norm{\boldsymbol{f}_{stick}} \leq \mu_c f_\perp$,} \\ \boldsymbol{f}_{slip} & \text{otherwise.} \end{cases} \end{equation*} % -The same study we considered for the model of normal forces~\parencite{azad_modeling_2010} proposes a spring-damper-clutch system for the tangential forces, where the additional clutch component controls the sticking-slipping condition. +In practice, the two regimes are characterised respectively by the static friction coefficient $\mu_c \in \realn^+$ and the sliding friction coefficient $\mu_k \in \realn^+$, also called either kinetic or dynamic friction. +As soon as the regime transitions from sticking to slipping, the considered coefficient of friction should be changed from the static to the dynamic. +In the proposed model, in order to reduce the number of parameters to tune, we consider a unique coefficient $\mu = \mu_c = \mu_k$. +The implementation can be changed trivially to use a different parameter in the slipping regime. +The same study we considered for the model of normal forces~\parencite{azad_modeling_2010} proposes a spring-damper-clutch system for the tangential forces, where the additional clutch component controls the sticking-slipping condition. Extending their 2D formulation to our 3D point-surface setting, we can introduce the following relation between the tangential forces and the tangential material deformation: % \begin{equation} \label{equation:tangential_contact_model} \forcelin[C]_\parallel - = \alpha \mathbf{u} + \beta \dot{\mathbf{u}} - = \alpha \mathbf{u} + \beta (\vellin[W]^{\parallel}_{W,C} - \vellin[W]_{W,clutch}) + = \alpha \mathbf{m} + \beta \dot{\mathbf{m}} + = \alpha \mathbf{m} + \beta (\vellin_C^\parallel - \vellin[{C[W]}]_{W,clutch}) , \end{equation} % -where $\alpha, \beta \in \mathbb{R}$ are model parameters, $\mathbf{u} \in \mathbb{R}^3$ is the 3D tangential deformation of the material as illustrated in Figure~\ref{fig:soft_contact_model}, and $\vellin[W]_{W,clutch}$ is the unknown clutch velocity. +where $\alpha, \beta \in \mathbb{R}^+$ are model parameters, $\mathbf{m} \in \mathbb{R}^3$ is the compounded 3D tangential deformation of the material as illustrated in Figure~\ref{fig:soft_contact_model}, and $\vellin[{C[W]}]_{W,clutch}$ is the unknown clutch velocity. -When sticking, the clutch velocity is zero and, assuming the knowledge of $\mathbf{u}$, the tangential force can be computed with Equation~\eqref{equation:tangential_contact_model}. +When sticking, the clutch velocity is zero and, assuming the knowledge of $\mathbf{m}$, the tangential force can be computed with Equation~\eqref{equation:tangential_contact_model}. Instead, when the magnitude of the sticking force exceeds the friction cone bounds, the clutch is unlocked and the collidable point starts sliding. 
In slipping state, the tangential force maintains the sticking direction, but enforces its magnitude to lay on the friction cone boundary: % @@ -426,29 +437,31 @@ \subsection{Tangential forces} \label{equation:sticking_slipping_forces} \forcelin[C]_\parallel = \begin{cases} - \forcelin_{stick} = \alpha \mathbf{u} + \beta \vellin[W]^{\parallel}_{W,C} &\text{if sticking,} \\ + \forcelin_{stick} = \alpha \mathbf{m} + \beta \vellin_C^\parallel &\text{if sticking,} \\ \forcelin_{slip} = \mu f_\perp \frac{\boldsymbol{f}_{stick}}{\norm{\boldsymbol{f}_{stick}}} &\text{if slipping.} \end{cases} \end{equation} % We use $\alpha = -k_t\sqrt{\delta}$ and $\beta = -\lambda_t \sqrt{\delta}$ as presented by \textcite{azad_modeling_2010}, where also in this case we assume that collidable points produce a material deformation comparable to the sphere-plane setting. -The last missing point to discuss is how to calculate the tangential deformation $\mathbf{u}$ of the material. +The last missing point to discuss is how to calculate the compounded tangential deformation $\mathbf{m}$ of the material. Combining Equations~\eqref{equation:tangential_contact_model}~and~\eqref{equation:sticking_slipping_forces}, we can obtain the dynamics of the tangential deformation: % \begin{equation} \label{equation:tangential_deformation_dynamics} - \dot{\mathbf{u}} = + \dot{\mathbf{m}} = \begin{cases} - \vellin[W]^{\parallel}_{W,C} &\text{if sticking,} \\ - \beta^{-1} (\forcelin_{slip} - \alpha \mathbf{u}) &\text{if slipping,} \\ - -\alpha \beta^{-1} \mathbf{u} &\text{if contact is broken,} + \vellin_C^\parallel &\text{if sticking,} \\ + \beta^{-1} (\forcelin_{slip} - \alpha \mathbf{m}) &\text{if slipping,} \\ + -\alpha \beta^{-1} \mathbf{m} &\text{if contact is broken,} \end{cases} \end{equation} % -that can be numerically integrated to obtain $\mathbf{u}$. +that can be numerically integrated to obtain $\mathbf{m}$. It is worth noting that this formulation does not need to either know or keep track of the clutch velocity. +In fact, as soon as a sticking contact transitions to slipping, we calculate $\dot{\mathbf{m}}$ such that we obtain exactly the desired $\forcelin_{slip}$ (that is the projection of $\boldsymbol{f}_{stick}$ on the friction cone surface) given the current $(\mathbf{m}, \delta)$. +\newpage \subsection{Augmented system dynamics} The effects of the contact model introduced in the previous sections can be included in the system's dynamics~\eqref{equation:floatig_base_dynamics_state_space} by extending its state as follows: @@ -456,20 +469,20 @@ \subsection{Augmented system dynamics} \begin{equation*} \mathbf{x}(t) = \begin{bmatrix} - \mathbf{q} \\ \boldsymbol{\nu} \\ \operatorname{vec}(\mathbf{U}) + \mathbf{q} \\ \boldsymbol{\nu} \\ \operatorname{vec}(\mathbf{M}) \end{bmatrix} \in \mathbb{R}^{2n+3n_c+13} . \end{equation*} % -We introduced the matrix $\mathbf{U} \in \mathbb{R}^{3 \times n_c}$ that stacks the tangential deformations corresponding to all the $n_c$ collidable points of the model. -Its dynamics can be obtained from Equations~\eqref{equation:tangential_deformation_dynamics} and plugged in the following contacts-aware dynamics system: +We introduced the matrix $\mathbf{M} \in \mathbb{R}^{3 \times n_c}$ stacking the tangential deformations of all the $n_c$ collidable points. 
+The model's dynamics can be obtained from Equations~\eqref{equation:tangential_deformation_dynamics} and plugged in the following contact-aware dynamic system: % \begin{equation} \label{equation:floatig_base_dynamics_with_contacts_state_space} \dot{\mathbf{x}}(t) = \begin{bmatrix} - \Qud \\ \Nud \\ \operatorname{vec}(\dot{\mathbf{U}}) + \Qud \\ \Nud \\ \operatorname{vec}(\dot{\mathbf{M}}) \end{bmatrix} = \begin{bmatrix} \begin{pmatrix} @@ -478,7 +491,7 @@ \subsection{Augmented system dynamics} \dot{\mathbf{s}} \end{pmatrix} \\ \operatorname{FD}(\mathcal{M}, \Qu, \Nu, \Torques, \forcesix_\mathcal{L}^{ext}) \\ - \operatorname{vec}(\dot{\mathbf{U}}) + \operatorname{vec}(\dot{\mathbf{M}}) \end{bmatrix} = f\left(\mathbf{x}(t), \mathbf{u}(t)\right) . @@ -486,7 +499,7 @@ \subsection{Augmented system dynamics} % This final non-linear system, albeit being quite stiff when new contacts are made or broken, does not present any discontinuity. -In the considered locomotion setting, considering the decomposition of the external forces of Equation~\eqref{eq:decomposition_external_forces} and the assumption of knowing the terrain profile\footnote{The terrain profile could also be estimated by introducing perception algorithms.}, the input vector of Equation~\eqref{eq:input_and_state_vectors} can be simplified as follows: +The decomposition of the external forces of Equation~\eqref{eq:decomposition_external_forces} with the assumption of knowing the terrain profile (\eg estimated by introducing perception algorithms) allows to simplify the input vector of Equation~\eqref{eq:input_and_state_vectors} as follows: % \begin{equation*} \mathbf{u}(t) = \left( \Torques,\, \forcesix_\mathcal{L}^{user} \right) @@ -502,6 +515,7 @@ \subsection{Augmented system dynamics} % where $\mathcal{H}: (x, y) \mapsto z$ returns the terrain height and $\mathcal{S}: (x, y) \mapsto \hat{\mathbf{n}}$ returns the terrain normal. +\newpage \section{Integrators} \label{sec:integrators} @@ -566,11 +580,20 @@ \section{Integrators} It can be seen that $(\mathbf{k}_1, \mathbf{k}_4)$ are the slopes evaluated at the extremes of the integration interval, and $(\mathbf{k}_2, \mathbf{k}_3)$ at the midpoint. In the average, a higher weight is given to the slopes at the midpoint. +\newpage \section{Validation} In this section, we validate the properties of the system through simple simulations of a rigid body interacting with the terrain. A single rigid body is equivalent to the multibody system defined Equation~\eqref{equation:floatig_base_dynamics_with_contacts_state_space} having only variables related to the base and contacts. -In all experiments, the soft-contact model is configured with $k = k_t = 10^6$, $\lambda = \lambda_t = 2000$, and $\mu=0.5$. +In all experiments, the soft-contact model is configured with $k = k_t = 10^6$, $\lambda = \lambda_t = 2000$, and $\mu=0.5$, and the simulation is configured with a standard gravity of $g = 9.8 \, m/s^2$. + +We start the validation of the contact model on flat terrain by performing two experiments. +In the first one, we calculate the mechanical energy of a bouncing ball, validating that it changes only upon impact due to the damping of the contact model. +Given the continuous nature of the contact model, all the quantities should vary continuously. +In the second experiment, we apply a step-wise force with increasing magnitude to the \ac{CoM} of a box resting on a flat surface. 
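As a rough sanity check of the expected outcome, anticipating the $1 \, Kg$ box used in the corresponding experiment below, the total normal force at steady state equals the weight of the box, and the friction-cone threshold separating the sticking and slipping regimes can therefore be estimated as:
%
\begin{equation*}
    \mu \, m \, g = 0.5 \cdot 1 \, Kg \cdot 9.8 \, m/s^2 = 4.9 \, N
    .
\end{equation*}
%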
+The box should start accelerating only when the applied force is able to overcome the opposing effects due to friction. +We conclude this section by validating the contact model on non-flat terrain. +We simulate a falling box over an inclined plane characterised by different coefficients of friction, and compare its trajectory with the Mujoco simulator~\parencite{todorov_mujoco_2012}. \subsection{Bouncing Ball} @@ -624,7 +647,7 @@ \subsection{Bouncing Ball} \label{fig:bouncing_ball} \end{figure} -\subsection{Sliding Box} +\subsection{Sliding Box on Flat Terrain} We consider a model composed of a single box-shaped link. The box has a mass of $1 \, Kg$ and $(x, y, z)$ dimensions equal to $(1.5, 1, 0.5) \, m$. @@ -633,7 +656,6 @@ \subsection{Sliding Box} The box is positioned on a flat ground surface at rest. We simulate this setting for $4 \, s$ using the \ac{RK4} integration scheme with a step size of $1 \, ms$. In this window, considering the frame of \ac{CoM} $G = (\pos[W]_{CoM}, [B])$, we apply to the \ac{CoM} of the box an external linear force $\forcelin[G]_{CoM} = (f_{CoM}, 0, 0) \in \realn^3$ with a profile reported in Figure~\ref{fig:sliding_box}. -The simulation is configured with a standard gravity of $g = 9.8 \, m/s$. In this setting, the threshold of the friction cone separating the sticking and the slipping regimes is $\mu f_\perp = 4.9 \, N$, averaged over the four contacts points of the bottom box surface. Figure~\ref{fig:sliding_box} reports the plots of the $x$ components of the \ac{CoM}'s position and linear velocity. @@ -651,6 +673,92 @@ \subsection{Sliding Box} \label{fig:sliding_box} \end{figure} +\subsection{Sliding Box on Inclined Plane} + +\begin{figure} + \centering + \subfloat[]{ + \includegraphics{images/contributions/chapter_7/box_falling_inclined_mu_2.0} + \label{fig:box_inclined_mu_2.0} + } + \\ + \subfloat[]{ + \includegraphics{images/contributions/chapter_7/box_falling_inclined_mu_0.0} + \label{fig:box_inclined_mu_0.0} + } + \\ + \subfloat[]{ + \includegraphics{images/contributions/chapter_7/box_falling_inclined_mu_0.5} + \label{fig:box_inclined_mu_0.5} + } + \caption{Comparison of the box's \ac{CoM} trajectory simulated with the proposed soft-contact model and the Mujoco simulator, considering a coefficient of friction (a) $\mu=2$, (b) $\mu=0$, (c) $\mu=0.5$.} + \label{fig:box_inclined_mu} +\end{figure} + +We consider a model composed of a single box-shaped link. +The box has a mass of $1 \, Kg$ and $(x, y, z)$ dimensions equal to $(0.15, 0.1, 0.05) \, m$. +Its collision shape is approximated considering the 8 points corresponding to its corners. + +\begin{table} + \small + \centering + \begin{tblr}{ + colspec={Q[c, m]Q[c, m]}, + row{even} = {bg=gray9}, + row{1} = {font=\bfseries\footnotesize}, + } + \toprule + Property & Value \\ + \midrule + \texttt{timestep} & $0.001$ \\ + \texttt{integrator} & \texttt{RK4} \\ + \texttt{solver} & \texttt{Newton} \\ + \texttt{iterations} & $50$ \\ + \texttt{cone} & \texttt{elliptic} \\ + \texttt{friction} & $(\mu,\, 0,\, 0)$ \\ + \texttt{solref} & $(-1e6,\, -2000)$ \\ + \texttt{condim} & $3$ \\ + \bottomrule + \end{tblr} + \caption{Mujoco configuration considered in the experiments of the sliding box on inclined surface matching as close as possible the setting and properties of our soft-contact model. 
Refer to the official documentation at \url{https://mujoco.readthedocs.io} for a detailed explanation of the options.}
+    \label{tab:mujoco_parameters}
+\end{table}
+
+The box starts floating in the air having its \ac{CoM} positioned in $\pos[W]_G = (0, 0, 1.0) \, m$.
+Then, we let the box fall due to gravity over a plane inclined by $20\degree$ so that the box slides down in the direction of the $x$ axis.
+In this setting, we perform three experiments, each characterised by a different coefficient of friction $\mu$.
+We simulate this setting for $2.5 \, s$ using the \ac{RK4} integration scheme with a step size of $1 \, ms$.
+For validation purposes, we simulate the same setting with the Mujoco simulator configured with comparable physics parameters, reported in Table~\ref{tab:mujoco_parameters}.
+In particular, Mujoco provides a similar spring-damper soft-contact model that we can consider as ground truth.
+
+In the first experiment, we consider an extremely large $\mu = 2.0$ so that we can assess how the selected stiffness $k$ and damping $\lambda$ affect the landing over the inclined plane without accounting for major sliding effects.
+From Figure~\ref{fig:box_inclined_mu_2.0}, it can be noticed that the falling trajectories due to gravity overlap perfectly until the impact.
+After the impact, the $y$ position of both boxes always remains constant, showing that in both cases the generated tangential forces do not have any $y$ component.
+In our soft-contact model this means that no material deformation occurs in this direction, as expected.
+The $x$ and $z$ components, although not matching perfectly due to the different formulation of the soft contacts --mainly due to Equation~\eqref{eq:soft_contact_normal_force}-- are sufficiently close and show the same qualitative behaviour.
+
+In the second experiment, we consider a friction-less simulation with $\mu = 0$.
+In this setting, our soft-contact model only produces normal forces proportional to the penetration depth.
+The friction cone cannot be evaluated and no tangential forces are produced, therefore the only possible regime is sliding.
+Figure~\ref{fig:box_inclined_mu_0.0} reports the trajectories of the two falling boxes, showing in this case almost a perfect match between our soft-contacts model and Mujoco.
+
+In the third experiment, we consider a realistic coefficient of friction $\mu = 0.5$.
+The combination of this coefficient of friction with the inclination of the terrain has been selected to reach a steady sticking regime after an initial sliding regime right after the impact.
+Figure~\ref{fig:box_inclined_mu_0.5} shows the trajectories of the two boxes.
+Also in this case, after the impact, both simulations correctly do not produce tangential forces in the $y$ direction.
+The $x$ and $z$ components, similarly to the $\mu = 2.0$ case, do not match perfectly but also in this case show the same qualitative behaviour where the box transitions from the sliding to a steady sticking regime approximately at $t=1.2~s$.
+
+All three experiments show that the proposed contact model behaves similarly to the ground truth represented by the soft-contact model implemented in the Mujoco simulator.
+The qualitative behaviours always look alike, although the trajectories do not necessarily match quantitatively.
+The reasons for the numerical mismatch are twofold.
+First, the formulation used by Mujoco to compute the normal force differs from Equation~\eqref{eq:soft_contact_normal_force}.
+This difference propagates also to the tangential component since the sticking-slipping boundary is a function of the normal force. +Furthermore, the methodology to compute the contact force is completely different. +In fact, we use a continuous contact model that introduces an additional state to the \ac{ODE} describing the dynamics of the multi-body system. +Mujoco, instead, at each time step solves a constrained quadratic optimisation problem. +Further details on the internal details of Mujoco contacts can be found in \parencite{todorov_convex_2014, vousten_simulating_2022, yoon_comparative_2023}. + +\newpage \section{Conclusions} In this chapter, we described how to model a floating-base multibody system interacting with a non-flat surface. @@ -659,9 +767,10 @@ \section{Conclusions} This assumption simplifies the collision detection process, requiring just trivial geometrical assessments. The main benefit of our collisions and contacts modelling is the possibility of obtaining an extended state-space representation that includes both the dynamics of the multibody system and the dynamics of the contacts. The contact-aware evolution of the system can be derived by any numerical integration scheme. -We validated the dynamical system in a simplified setting in which single bodies interact with a flat terrain, showing the continuity of the soft-contact model and the switching between the sticking and slipping regimes. +We validated the dynamical system in two simplified settings. +In the first one, we considered single bodies interacting with a flat terrain, showing the continuity of the soft-contacts model and the switching between the sticking and slipping regimes. +In the second one, we considered non-flat terrain, showing the trajectory of a box falling over an inclined plane characterised by different coefficients of friction. While sphere and box collisions might seem trivial examples, they often represent the typical collision shapes adopted to simulate robot's feet. -The shown results remain valid also in case of a smooth uneven terrain supported by the contact-model, where a comparable behaviour occurs in a different plane. The final contact-aware dynamical system combined with the point-surface collision detection logic presents different limitations, especially when compared to implementations provided by general-purpose simulators. Our solution does not support detecting collisions between different bodies and does not consider joint limits and other types of constraints. diff --git a/Chapters/Part_2/chapter_8.tex b/Chapters/Part_2/chapter_8.tex index 75d3b51..14cc0a2 100644 --- a/Chapters/Part_2/chapter_8.tex +++ b/Chapters/Part_2/chapter_8.tex @@ -587,20 +587,21 @@ \subsection{ABA: Articulated-Body Algorithm} \end{algorithm} \subsection{Soft-contacts Algorithm} +\label{sec:soft_contacts_algorithm} Most of the algorithms listed in this chapter accept a matrix $\forcesix[W]^{ext}_\mathcal{L} \in \realn^{N_B^\mathcal{M}\times6}$ containing the 6D forces applied to each link of the model. These external forces are the sum of two components: the forces that could be applied to the links by the user, and the forces corresponding to active contacts with the terrain surface. Algorithm~\ref{algo:soft_contacts} provides the implementation of the soft-contacts model introduced in Section~\ref{section:contact_model} to compute the contact force corresponding to each collidable point of $\mathcal{M}$. 
-Assuming the knowledge of the location of each collidable point \wrt the frame of their associated link, we can compute their position $\pos[W]_{cp}$ and inertial-fixed linear velocity $\vellin[W]_{W,C}$ with forward kinematics from the pair $(\Qu, \Nu)$. +Assuming the knowledge of the location of each collidable point \wrt the frame of their associated link, we can compute their position $\pos[W]_{cp}$ and mixed linear velocity $\vellin[{C[W]}]_{W,C}$ with forward kinematics from the pair $(\Qu, \Nu)$. The algorithm takes the following inputs: % \begin{itemize} \item $\pos[W]_{cp} \in \realn^3$: the position of the contact point in world coordinates; - \item $\vellin[W]_{W,C} \in \realn^3$: the linear velocity expressed in world coordinates of the frame $C = (\pos_{cp}, [W])$ associated to the contact point; - \item ${}^W \mathbf{u} \in \realn^3$: the tangential deformation of the terrain associated to the contact point; + \item $\posd[W]_C \in \realn^3$: the linear velocity of the contact point, that matches the linear component of the mixed velocity $\velsix[{C[W]}]_{W,C} \in \realn^6$ of frame $C$; + \item ${}^W \mathbf{m} \in \realn^3$: the compounded tangential deformation of the terrain associated to the contact point; \item $K, D \in \realn^+$: the parameters of the spring-damper model used for both the normal and tangential force calculation; - \item $\mu \in \realn^+$: the static friction coefficient of the contact point; + \item $\mu \in \realn^+$: the coefficient of friction of the contact point; \item $\mathcal{H}: \realn^2 \to \realn$: a function returning the terrain height $z_T$ at given $(x, y)$ coordinates; \item $\mathcal{S}: \realn^2 \to \realn^3$: a function returning the normal of the surface $\hat{\mathbf{n}}$ at given $(x, y)$ coordinates\footnote{Under the assumption of smooth terrain, an approximation of $\mathcal{S}$ could be calculated from $\mathcal{H}$, \ie $\mathcal{S}(x, y) = f\left(\mathcal{H}(x, y)\right)$}; \end{itemize} @@ -609,7 +610,7 @@ \subsection{Soft-contacts Algorithm} % \begin{itemize} \item $\forcesix[W]_{cp} \in \realn^6$: the 6D force computed by the soft-contacts model; - \item ${}^W \dot{\mathbf{u}} \in \realn^3$: the derivative of the tangential deformation of the terrain associated to the contact point. + \item ${}^W \dot{\mathbf{m}} \in \realn^3$: the derivative of the tangential deformation of the terrain material associated to the contact point. \end{itemize} % In order to optimise performance, the algorithm can be vectorised to process all the collidable points belonging to the model. 
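A minimal Python/NumPy sketch of this per-point computation is given below; it only mirrors the logic summarised above, with illustrative names, and omits both the transformation of the resulting 3D force into the 6D force $\forcesix[W]_{cp}$ and the vectorisation over all collidable points (which could be obtained, for instance, with array broadcasting or jax.vmap). Contact forces are produced only while the point penetrates the terrain; otherwise, only the relaxation dynamics of the material deformation is returned.

import numpy as np

def soft_contact_point(p_W, pd_W, m, K, D, mu, H, S):
    # p_W:  (3,) world position of the collidable point.
    # pd_W: (3,) linear velocity of the point (mixed representation).
    # m:    (3,) compounded tangential deformation of the terrain material.
    # K, D: stiffness and damping of the spring-damper model.
    # mu:   coefficient of friction.
    # H, S: callables returning terrain height and unit normal at (x, y).
    x, y, z = p_W
    z_T = H(x, y)
    n_hat = S(x, y)

    # Material relaxation dynamics, used when the contact is not active.
    m_dot = -(K / D) * m

    # Normal force, zero when the point does not penetrate the terrain.
    h = np.array([0.0, 0.0, z_T - z])
    delta = max(0.0, h @ n_hat)
    delta_dot = -(pd_W @ n_hat)
    f_perp = max(0.0, np.sqrt(delta) * (K * delta + D * delta_dot)) * n_hat

    # Tangential force, computed only while the point is below the terrain.
    f_par = np.zeros(3)
    if mu != 0.0 and z < z_T:
        v_perp = (pd_W @ n_hat) * n_hat
        v_par = pd_W - v_perp

        # Sticking force and corresponding material deformation dynamics.
        f_par = -np.sqrt(delta) * (K * m + D * v_par)
        m_dot = v_par

        # If the sticking force leaves the friction cone, project it back
        # onto the cone boundary and switch to the slipping deformation dynamics.
        f_cone = mu * np.linalg.norm(f_perp)
        if np.linalg.norm(f_par) > f_cone:
            f_par = f_par * (f_cone / np.linalg.norm(f_par))
            m_dot = -(f_par + K * np.sqrt(delta) * m) / (D * np.sqrt(delta))

    # 3D contact force applied to the point, expressed with world orientation.
    return f_perp + f_par, m_dot

# Usage on a flat terrain: a point 1 mm below ground, sliding along x.
H = lambda x, y: 0.0
S = lambda x, y: np.array([0.0, 0.0, 1.0])
force, m_dot = soft_contact_point(
    np.array([0.0, 0.0, -1e-3]), np.array([0.1, 0.0, 0.0]),
    np.zeros(3), K=1e6, D=2000.0, mu=0.5, H=H, S=S)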
@@ -620,48 +621,55 @@ \subsection{Soft-contacts Algorithm} \begin{algorithmic}[1] \small -\State \textbf{inputs} $\left( \pos[W]_{cp}, \vellin[W]_{W,C}, {}^W \mathbf{u}, K, D, \mu, \mathcal{H}, \mathcal{S} \right)$ +\State \textbf{inputs} $\left( \pos[W]_{cp}, \posd[W]_C, {}^W \mathbf{m}, K, D, \mu, \mathcal{H}, \mathcal{S} \right)$ \State $x_{cp}, y_{cp}, z_{cp} = \pos[W]_{cp}$ \State $z_T = \mathcal{H}(x_{cp}, y_{cp})$ -\State ${}^W \dot{\mathbf{u}} = -(K / D) {}^W \mathbf{u}$ +\State ${}^W \dot{\mathbf{m}} = -(K / D) {}^W \mathbf{m}$ \Comment{Material relaxation dynamics} \State $\forcesix[W]_{cp} = \zeros_6$ -\If{$z_{cp} < z_T$} - \LComment{Compute normal force} - \State $\mathbf{h} = \left[0, 0, z_T - z_{cp}\right]^\top$ - \State $\hat{\mathbf{n}} = \mathcal{S}(x_{cp}, y_{cp})$ +% \If{$z_{cp} < z_T$} +\LComment{Compute normal force} +\State $\hat{\mathbf{n}} = \mathcal{S}(x_{cp}, y_{cp})$ +\State $\mathbf{h} = \left[0, 0, z_T - z_{cp}\right]^\top$ - \State $\delta = \max(0, -\mathbf{h} \cdot \hat{\mathbf{n}})$ - \State $\dot{\delta} = -\vellin[W]_{W,C} \cdot \hat{\mathbf{n}}$ - - \State $\forcelin_\perp = \sqrt{\delta} \left( K \delta + D \dot{\delta} \right) \hat{\mathbf{n}}$ - - \LComment{Compute tangential force} - \State $\forcelin_\parallel = \zeros_3$ - \If{$\mu \neq 0$} - \State $\vellin^\perp = (\vellin[W]_{W,C} \cdot \hat{\mathbf{n}}) \hat{\mathbf{n}}$ - \State $\vellin^\parallel = \vellin[W]_{W,C} - \vellin^\perp$ - - \State $\forcelin_\parallel = -\sqrt{\delta}(K {}^W \mathbf{u} + D \vellin^\parallel)$ - \Comment{Compute sticking force} - \State ${}^W \dot{\mathbf{u}} = \vellin_\parallel$ - - \State $f_{cone} = \mu \norm{\forcelin_\perp}$ +\State $\delta = \max(0, \mathbf{h} \cdot \hat{\mathbf{n}})$ +\State $\dot{\delta} = -\posd[W]_C \cdot \hat{\mathbf{n}}$ + +\State $\forcelin_\perp = \sqrt{\delta} \left( K \delta + D \dot{\delta} \right) \hat{\mathbf{n}}$ + +\LComment{Compute tangential force} +\State $\forcelin_\parallel = \zeros_3$ +% \State ${}^W \dot{\mathbf{m}} = \vellin_\parallel$ +\If{$\mu \neq 0$ and $z_{cp} > z_T$} + % \If{$z_{cp} > z_T$} + % \Comment{Contact not active or deactivated} - \If{$\norm{\forcelin_\parallel} > f_{cone}$} - \Comment{Compute slipping force} - \State $\forcelin_\parallel = \left(f_{cone} / \norm{\forcelin_\parallel}\right) \forcelin_\parallel$ - \State ${}^W \dot{\mathbf{u}} = -(\forcelin_\parallel + K \sqrt{\delta} {}^W \mathbf{u}) / (D \sqrt{\delta})$ - \EndIf - \EndIf + % \Else + % \Comment{Contact active} + \State $\vellin^\perp = (\posd[W]_C \cdot \hat{\mathbf{n}}) \hat{\mathbf{n}}$ + \State $\vellin^\parallel = \posd[W]_C - \vellin^\perp$ + + \State $\forcelin_\parallel = -\sqrt{\delta}(K {}^W \mathbf{m} + D \vellin^\parallel)$ + \Comment{Compute sticking force} + \State ${}^W \dot{\mathbf{m}} = \vellin_\parallel$ + + \State $f_{cone} = \mu \norm{\forcelin_\perp}$ - \LComment{Compute 6D contact force in the world frame} - \State $\forcesix[W]_{cp} = \transfor[W]^C [(\forcelin_\perp + \forcelin_\parallel)^\top, \zeros^\top_3]^\top$ + \If{$\norm{\forcelin_\parallel} > f_{cone}$} + \Comment{Compute slipping force} + \State $\forcelin_\parallel = \left(f_{cone} / \norm{\forcelin_\parallel}\right) \forcelin_\parallel$ + \State ${}^W \dot{\mathbf{m}} = -(\forcelin_\parallel + K \sqrt{\delta} {}^W \mathbf{m}) / (D \sqrt{\delta})$ + \EndIf + % \EndIf \EndIf -\State \textbf{outputs} $\left(\forcesix[W]_{cp}, {}^W \dot{\mathbf{u}}\right)$ +\LComment{Compute 6D contact force in the world frame} +\State $\forcesix[W]_{cp} = \transfor[W]^C 
[(\forcelin_\perp + \forcelin_\parallel)^\top, \zeros^\top_3]^\top$ +% \EndIf + +\State \textbf{outputs} $\left(\forcesix[W]_{cp}, {}^W \dot{\mathbf{m}}\right)$ \end{algorithmic} \end{algorithm} diff --git a/images/contributions/chapter_7/box_falling_inclined_mu_0.0.tikz b/images/contributions/chapter_7/box_falling_inclined_mu_0.0.tikz new file mode 100644 index 0000000..86a5472 --- /dev/null +++ b/images/contributions/chapter_7/box_falling_inclined_mu_0.0.tikz @@ -0,0 +1,322 @@ +% This file was created with tikzplotlib v0.10.1. +\begin{tikzpicture} +% [scale=0.6, every node/.style={scale=0.6}] + +\definecolor{darkgray176}{RGB}{176,176,176} +\definecolor{darkorange2251245}{RGB}{225,124,5} +\definecolor{indianred2048062}{RGB}{204,80,62} +\definecolor{lightseagreen56166165}{RGB}{56,166,165} +\definecolor{olivedrab11517572}{RGB}{115,175,72} +\definecolor{seagreen1513384}{RGB}{15,133,84} +\definecolor{teal29105150}{RGB}{29,105,150} + +\begin{axis}[ +% height=3cm, +width=0.95*\textwidth, +height=0.8*\axisdefaultheight, +tick align=outside, +tick pos=left, +title={\textbf{Box falling over inclined plane with $\mu=0$}}, +x grid style={darkgray176!30}, +xlabel={Time [s]}, +xmajorgrids, +xmin=-0.12395, xmax=2.58, +xtick style={color=black}, +xtick={0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5}, +y grid style={darkgray176!30}, +ylabel={Base height [m]}, +ymajorgrids, +ymin=-4.17728665087181, ymax=10.3927832598347, +ytick style={color=black}, +legend cell align={left}, +legend pos=north west, +legend style={ + font=\small, +}, +] +\addplot [teal29105150] +table {% +0.0010000467300415 0 +0.442000031471252 0.000573396682739258 +0.449000000953674 0.00492429733276367 +0.453000068664551 0.0100739002227783 +0.460999965667725 0.0227117538452148 +0.565999984741211 0.193681240081787 +0.58899998664856 0.233665704727173 +0.639999985694885 0.33257532119751 +0.674000024795532 0.402942299842834 +0.705999970436096 0.47255551815033 +0.739000082015991 0.547716379165649 +0.773000001907349 0.628741979598999 +0.808000087738037 0.715954065322876 +0.842999935150146 0.807024478912354 +0.878999948501587 0.904722213745117 +0.916000008583069 1.00938737392426 +0.953000068664551 1.11836445331573 +0.990999937057495 1.23477506637573 +1.02999997138977 1.35897839069366 +1.06900000572205 1.48797225952148 +1.10899996757507 1.62525022029877 +1.14999997615814 1.7711900472641 +1.19099998474121 1.92242455482483 +1.23300004005432 2.08283758163452 +1.27600002288818 2.25282573699951 +1.32000005245209 2.43279576301575 +1.3639999628067 2.61886358261108 +1.40900003910065 2.81546711921692 +1.45500004291534 3.02303218841553 +1.50100004673004 3.23726177215576 +1.54799997806549 3.46303200721741 +1.59599995613098 3.70078730583191 +1.64400005340576 3.94579935073853 +1.692999958992 4.2034010887146 +1.74300003051758 4.47405481338501 +1.7940000295639 4.75823402404785 +1.84500002861023 5.05060577392578 +1.89699995517731 5.35714483261108 +1.94900000095367 5.67220067977905 +2.00200009346008 6.00207901000977 +2.05599999427795 6.34728097915649 +2.11100006103516 6.70831632614136 +2.16599988937378 7.07887983322144 +2.22199988365173 7.46596956253052 +2.27800011634827 7.86293697357178 +2.33599996566772 8.28449440002441 +2.39400005340576 8.71664714813232 +2.45300006866455 9.16712284088135 +2.5 9.53382110595703 +}; +\addplot [seagreen1513384] +table {% +0.0010000467300415 0 +2.5 0 +}; +\addplot [indianred2048062] +table {% +0.0010000467300415 0.999995112419128 +0.0130000114440918 0.99917197227478 +0.0249999761581421 0.99693751335144 +0.0369999408721924 
0.993291854858398 +0.0490000247955322 0.988235116004944 +0.062000036239624 0.981164455413818 +0.0750000476837158 0.972437500953674 +0.0880000591278076 0.962054371833801 +0.101999998092651 0.949020385742188 +0.115999937057495 0.934065580368042 +0.13100004196167 0.915911078453064 +0.146000027656555 0.895551681518555 +0.161999940872192 0.871404409408569 +0.179000020027161 0.84299910068512 +0.195999979972839 0.811761617660522 +0.21399998664856 0.775599598884583 +0.23199999332428 0.736262321472168 +0.250999927520752 0.691295146942139 +0.271000027656555 0.640139102935791 +0.291000008583069 0.585063099861145 +0.312000036239624 0.523014426231384 +0.333999991416931 0.453375577926636 +0.355999946594238 0.378993630409241 +0.378999948501587 0.296159029006958 +0.402999997138977 0.204195857048035 +0.427000045776367 0.106587886810303 +0.442999958992004 0.0415151119232178 +0.450000047683716 0.0242810249328613 +0.453999996185303 0.0216199159622192 +0.45799994468689 0.0211853981018066 +0.470000028610229 0.0198094844818115 +0.48199999332428 0.0170224905014038 +0.49399995803833 0.0128241777420044 +0.506999969482422 0.0066835880279541 +0.519999980926514 -0.00111329555511475 +0.532999992370605 -0.010566234588623 +0.559999942779541 -0.0328836441040039 +0.574000000953674 -0.0461361408233643 +0.585000038146973 -0.0565563440322876 +0.611999988555908 -0.0735692977905273 +0.674000024795532 -0.12025785446167 +0.718999981880188 -0.156206130981445 +0.759999990463257 -0.190991878509521 +0.800999999046326 -0.22770619392395 +0.842000007629395 -0.266347408294678 +0.884000062942505 -0.307929277420044 +0.925999999046326 -0.35153341293335 +0.968999981880188 -0.398270726203918 +1.01199996471405 -0.44712769985199 +1.05599999427795 -0.499315023422241 +1.10000002384186 -0.55372166633606 +1.14499998092651 -0.611660599708557 +1.19000005722046 -0.671920895576477 +1.23599994182587 -0.735919713973999 +1.28299999237061 -0.803815245628357 +1.33000004291534 -0.874243021011353 +1.37800002098083 -0.948783159255981 +1.42599999904633 -1.0259644985199 +1.47500002384186 -1.10747814178467 +1.52499997615814 -1.19349265098572 +1.57500004768372 -1.28237307071686 +1.62600004673004 -1.37598371505737 +1.67700004577637 -1.47257602214813 +1.72899997234344 -1.57413220405579 +1.78199994564056 -1.68083131313324 +1.83500003814697 -1.79075062274933 +1.88900005817413 -1.90605580806732 +1.94400000572205 -2.02693247795105 +1.99899995326996 -2.1512770652771 +2.0550000667572 -2.28144526481628 +2.11199998855591 -2.4176299571991 +2.16899991035461 -2.55753898620605 +2.22699999809265 -2.70372605323792 +2.28600001335144 -2.85639023780823 +2.34500002861023 -3.01304483413696 +2.40499997138977 -3.17644739151001 +2.46600008010864 -3.34680390357971 +2.5 -3.44360828399658 +}; +\addplot [thick, lightseagreen56166165, densely dashed] +table {% +0.0010000467300415 0 +0.437000036239624 0.000566005706787109 +0.445999979972839 0.00671303272247314 +0.483999967575073 0.0645122528076172 +0.513000011444092 0.110060095787048 +0.54200005531311 0.159225106239319 +0.572000026702881 0.212929368019104 +0.602999925613403 0.271462559700012 +0.634000062942505 0.333084225654602 +0.666000008583069 0.399933218955994 +0.697999954223633 0.470073223114014 +0.730999946594238 0.545851945877075 +0.764999985694885 0.627587676048279 +0.799000024795532 0.713038444519043 +0.833999991416931 0.804883360862732 +0.870000004768372 0.903459548950195 +0.906000018119812 1.00620102882385 +0.943000078201294 1.11613667011261 +0.980999946594238 1.23362326622009 +1.01900005340576 1.35575079917908 +1.05799996852875 
1.48591768741608 +1.09800004959106 1.62450003623962 +1.13900005817413 1.77188360691071 +1.17999994754791 1.92466962337494 +1.22200000286102 2.08678388595581 +1.26400005817413 2.25456738471985 +1.307000041008 2.4322190284729 +1.35099995136261 2.62015342712402 +1.39499998092651 2.81430983543396 +1.44000005722046 3.0193145275116 +1.48599994182587 3.23560166358948 +1.53299999237061 3.46361446380615 +1.58000004291534 3.69872665405273 +1.62800002098083 3.94616889953613 +1.67700004577637 4.2064037322998 +1.72699999809265 4.47990369796753 +1.77699995040894 4.76143836975098 +1.82799994945526 5.05688142776489 +1.87899994850159 5.36068296432495 +1.93099999427795 5.67904853820801 +1.98399996757507 6.01247930526733 +2.03699994087219 6.3549370765686 +2.09100008010864 6.71314191818237 +2.14599990844727 7.08761310577393 +2.20099997520447 7.47180652618408 +2.25799989700317 7.88022899627686 +2.31500005722046 8.29909324645996 +2.37299990653992 8.73602390289307 +2.43099999427795 9.18376636505127 +2.4909999370575 9.6583251953125 +2.5 9.73050689697266 +}; +\addplot [thick, olivedrab11517572, densely dashed] +table {% +0.0010000467300415 0 +2.5 0 +}; +\addplot [thick, darkorange2251245, densely dashed] +table {% +0.0010000467300415 0.999994993209839 +0.0130000114440918 0.999155044555664 +0.0249999761581421 0.996875047683716 +0.0369999408721924 0.993155002593994 +0.0490000247955322 0.987995028495789 +0.062000036239624 0.980780005455017 +0.0750000476837158 0.971874952316284 +0.0880000591278076 0.961279988288879 +0.101999998092651 0.947979927062988 +0.115999937057495 0.932719945907593 +0.13100004196167 0.91419506072998 +0.146000027656555 0.893419981002808 +0.161999940872192 0.868780016899109 +0.179000020027161 0.839794993400574 +0.195999979972839 0.807919979095459 +0.21399998664856 0.77101993560791 +0.23199999332428 0.73088002204895 +0.250999927520752 0.68499493598938 +0.271000027656555 0.632794976234436 +0.291000008583069 0.576595067977905 +0.312000036239624 0.513279914855957 +0.333999991416931 0.442219972610474 +0.355999946594238 0.366320013999939 +0.378999948501587 0.281795024871826 +0.402999997138977 0.187955021858215 +0.427000045776367 0.0883549451828003 +0.437999963760376 0.0441064834594727 +0.444999933242798 0.0244526863098145 +0.450000047683716 0.0221047401428223 +0.462000012397766 0.0177290439605713 +0.473999977111816 0.0119132995605469 +0.486999988555908 0.00398802757263184 +0.521000027656555 -0.0182926654815674 +0.559000015258789 -0.0422961711883545 +0.597000002861023 -0.0679888725280762 +0.635999917984009 -0.0961141586303711 +0.674999952316284 -0.126018524169922 +0.714999914169312 -0.158537864685059 +0.754999995231628 -0.192928791046143 +0.795000076293945 -0.229191422462463 +0.835999965667725 -0.268302917480469 +0.876999974250793 -0.309380650520325 +0.919000029563904 -0.353499174118042 +0.960999965667725 -0.399681329727173 +1.00399994850159 -0.449100494384766 +1.04700005054474 -0.500682711601257 +1.09099996089935 -0.555703401565552 +1.1360000371933 -0.614316940307617 +1.18099999427795 -0.675299167633057 +1.22699999809265 -0.740084767341614 +1.27300000190735 -0.807345628738403 +1.32000005245209 -0.878625154495239 +1.36800003051758 -0.954088449478149 +1.41600000858307 -1.03224658966064 +1.4650000333786 -1.11481308937073 +1.51400005817413 -1.20018804073334 +1.56400001049042 -1.29020047187805 +1.6139999628067 -1.38313734531403 +1.66499996185303 -1.4809455871582 +1.71700000762939 -1.58380424976349 +1.76900005340576 -1.68982589244843 +1.82200002670288 -1.80114114284515 +1.87600004673004 -1.91793620586395 
+1.92999994754791 -2.03814220428467 +1.98500001430511 -2.16408061981201 +2.04099988937378 -2.29594421386719 +2.09699988365173 -2.431476354599 +2.15300011634827 -2.57067656517029 +2.21099996566772 -2.71871542930603 +2.26999998092651 -2.87334418296814 +2.3289999961853 -3.03204488754272 +2.38899993896484 -3.1976113319397 +2.44899988174438 -3.36738896369934 +2.5 -3.51501083374023 +}; + +\addlegendentry{$x$} +\addlegendentry{$y$} +\addlegendentry{$z$} + +\addlegendentry{Mujoco $x$} +\addlegendentry{Mujoco $y$} +\addlegendentry{Mujoco $z$} + +\end{axis} + +\end{tikzpicture} \ No newline at end of file diff --git a/images/contributions/chapter_7/box_falling_inclined_mu_0.5.tikz b/images/contributions/chapter_7/box_falling_inclined_mu_0.5.tikz new file mode 100644 index 0000000..168a6c1 --- /dev/null +++ b/images/contributions/chapter_7/box_falling_inclined_mu_0.5.tikz @@ -0,0 +1,272 @@ +% This file was created with tikzplotlib v0.10.1. +\begin{tikzpicture} + +\definecolor{darkgray176}{RGB}{176,176,176} +\definecolor{darkorange2251245}{RGB}{225,124,5} +\definecolor{indianred2048062}{RGB}{204,80,62} +\definecolor{lightseagreen56166165}{RGB}{56,166,165} +\definecolor{olivedrab11517572}{RGB}{115,175,72} +\definecolor{seagreen1513384}{RGB}{15,133,84} +\definecolor{teal29105150}{RGB}{29,105,150} + +\begin{axis}[ +% height=6cm, +width=0.95*\textwidth, +height=0.8*\axisdefaultheight, +tick align=outside, +tick pos=left, +title={\textbf{Box falling over inclined plane with $\mu = 0.5$}}, +x grid style={darkgray176!30}, +xlabel={Time [s]}, +xmajorgrids, +xmin=-0.12395, xmax=2.58, +xtick style={color=black}, +xtick={0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5}, +y grid style={darkgray176!30}, +ylabel={Base height [m]}, +ymajorgrids, +ymin=-0.132449020285239, ymax=1.05392101048977, +ytick style={color=black}, +ytick={0, 0.25, 0.5, 0.75, 1}, +legend cell align={left}, +legend pos=north east, +legend style={ + font=\small, +}, +] +\addplot [teal29105150] +table {% +0.0010000467300415 0 +0.442000031471252 0.000253796577453613 +0.445999979972839 0.00247931480407715 +0.450000047683716 0.0051875114440918 +0.45799994468689 0.0108218193054199 +0.568000078201294 0.0835368633270264 +0.582000017166138 0.0918009281158447 +0.618000030517578 0.111893534660339 +0.644999980926514 0.125964283943176 +0.672999978065491 0.139641165733337 +0.700999975204468 0.152393937110901 +0.728999972343445 0.16422438621521 +0.756999969482422 0.175131916999817 +0.783999919891357 0.184775829315186 +0.810999989509583 0.193561673164368 +0.838000059127808 0.201489329338074 +0.865000009536743 0.208558917045593 +0.891999959945679 0.214770317077637 +0.919000029563904 0.220123529434204 +0.945999979972839 0.224618673324585 +0.973000049591064 0.2282555103302 +1 0.231034398078918 +1.02699995040894 0.232955098152161 +1.05299997329712 0.233993530273438 +1.08000004291534 0.234236001968384 +1.18799996376038 0.234227418899536 +2.5 0.234227418899536 +}; +\addplot [seagreen1513384] +table {% +0.0010000467300415 0 +2.5 0 +}; +\addplot [indianred2048062] +table {% +0.0010000467300415 0.999995112419128 +0.00999999046325684 0.999510049819946 +0.0190000534057617 0.998231172561646 +0.0279999971389771 0.996158361434937 +0.0369999408721924 0.993291854858398 +0.0460000038146973 0.989631652832031 +0.0559999942779541 0.984633684158325 +0.0659999847412109 0.978655576705933 +0.0759999752044678 0.971697568893433 +0.0859999656677246 0.963759660720825 +0.0970000028610229 0.953895926475525 +0.108000040054321 0.942846417427063 +0.11899995803833 0.930611133575439 
+0.129999995231628 0.917190074920654 +0.141999959945679 0.901196479797363 +0.154000043869019 0.88379168510437 +0.166000008583069 0.864975571632385 +0.179000020027161 0.84299910068512 +0.192000031471252 0.819366455078125 +0.205000042915344 0.794077515602112 +0.218999981880188 0.764991044998169 +0.233000040054321 0.733983874320984 +0.246999979019165 0.701055884361267 +0.26199996471405 0.663644313812256 +0.276999950408936 0.624027967453003 +0.292999982833862 0.579339981079102 +0.309000015258789 0.532143115997314 +0.325000047683716 0.482437491416931 +0.342000007629395 0.426876425743103 +0.358999967575073 0.368483066558838 +0.376999974250793 0.303567886352539 +0.394999980926514 0.235477447509766 +0.412999987602234 0.164211869239807 +0.432000041007996 0.0855424404144287 +0.442000031471252 0.0443036556243896 +0.449000000953674 0.025312066078186 +0.450000047683716 0.0240069627761841 +0.453000068664551 0.0228824615478516 +0.455000042915344 0.0228866338729858 +0.468999981880188 0.0258573293685913 +0.477999925613403 0.0267777442932129 +0.486999988555908 0.0269043445587158 +0.496000051498413 0.0262372493743896 +0.504999995231628 0.0247762203216553 +0.513999938964844 0.0225213766098022 +0.523000001907349 0.0194728374481201 +0.532999992370605 0.0151544809341431 +0.542999982833862 0.00985622406005859 +0.555999994277954 0.00265753269195557 +0.565999984741211 -0.00239086151123047 +0.569999933242798 -0.00435543060302734 +0.575000047683716 -0.00585639476776123 +0.598000049591064 -0.0101779699325562 +0.60699999332428 -0.0122526884078979 +0.625999927520752 -0.0159016847610474 +0.658999919891357 -0.021952748298645 +0.707000017166138 -0.0299975872039795 +0.750999927520752 -0.0365008115768433 +0.795000076293945 -0.0421739816665649 +0.83899998664856 -0.0470175743103027 +0.883000016212463 -0.0510315895080566 +0.927000045776367 -0.0542162656784058 +0.970999956130981 -0.056571364402771 +1.01499998569489 -0.0580971240997314 +1.05900001525879 -0.058793306350708 +1.13199996948242 -0.058833122253418 +2.5 -0.0588325262069702 +}; +\addplot [thick, lightseagreen56166165, densely dashed] +table {% +0.0010000467300415 0 +0.434999942779541 0 +0.435999989509583 0.00112450122833252 +0.437000036239624 0.00119030475616455 +0.440999984741211 0.00584769248962402 +0.447000026702881 0.0120009183883667 +0.455999970436096 0.0187956094741821 +0.575000047683716 0.106510758399963 +0.601999998092651 0.123914003372192 +0.633000016212463 0.142856121063232 +0.660000085830688 0.158423900604248 +0.683000087738037 0.170990586280823 +0.703000068664551 0.181349158287048 +0.733999967575073 0.196545720100403 +0.754999995231628 0.206146717071533 +0.777999997138977 0.21612286567688 +0.804000020027161 0.226582407951355 +0.832000017166138 0.236918807029724 +0.858999967575073 0.246038675308228 +0.891999959945679 0.255948781967163 +0.921999931335449 0.263829946517944 +0.950000047683716 0.270217657089233 +0.973999977111816 0.274942517280579 +1.00199997425079 0.279575824737549 +1.02900004386902 0.283153653144836 +1.05299997329712 0.285598278045654 +1.07799994945526 0.287408590316772 +1.10500001907349 0.288519501686096 +1.13199996948242 0.288776874542236 +2.5 0.288834810256958 +}; +\addplot [thick, olivedrab11517572, densely dashed] +table {% +0.0010000467300415 0 +2.5 0 +}; +\addplot [thick, darkorange2251245, densely dashed] +table {% +0.0010000467300415 0.999994993209839 +0.00999999046325684 0.999500036239624 +0.0190000534057617 0.998194932937622 +0.0279999971389771 0.996079921722412 +0.0369999408721924 0.993155002593994 +0.0460000038146973 
0.989419937133789 +0.0559999942779541 0.984319925308228 +0.0659999847412109 0.978219985961914 +0.0759999752044678 0.971119999885559 +0.0859999656677246 0.963020086288452 +0.0959999561309814 0.953920006752014 +0.10699999332428 0.942754983901978 +0.118000030517578 0.930379986763 +0.128999948501587 0.916795015335083 +0.141000032424927 0.90059494972229 +0.152999997138977 0.882955074310303 +0.164999961853027 0.863874912261963 +0.177999973297119 0.841579914093018 +0.190999984741211 0.8175950050354 +0.203999996185303 0.791919946670532 +0.217999935150146 0.762380003929138 +0.23199999332428 0.73088002204895 +0.246000051498413 0.697420001029968 +0.261000037193298 0.659394979476929 +0.276000022888184 0.619120001792908 +0.29200005531311 0.57367992401123 +0.307999968528748 0.525680065155029 +0.324000000953674 0.475120067596436 +0.340999960899353 0.41859495639801 +0.358000040054321 0.359179973602295 +0.375999927520752 0.29312002658844 +0.394000053405762 0.223819971084595 +0.411999940872192 0.151280045509338 +0.430999994277954 0.0711950063705444 +0.437000036239624 0.0467886924743652 +0.440999984741211 0.0357747077941895 +0.445999979972839 0.0225692987442017 +0.447000026702881 0.0232104063034058 +0.447999954223633 0.0229274034500122 +0.449000000953674 0.0232925415039062 +0.450000047683716 0.0229843854904175 +0.453999996185303 0.024310827255249 +0.463000059127808 0.0264202356338501 +0.472000002861023 0.0277197360992432 +0.480999946594238 0.0282092094421387 +0.490000009536743 0.0278886556625366 +0.499000072479248 0.026758074760437 +0.508000016212463 0.0248175859451294 +0.516999959945679 0.0220670700073242 +0.526000022888184 0.0185065269470215 +0.53600001335144 0.0136003494262695 +0.546000003814697 0.00769424438476562 +0.555999994277954 0.000788092613220215 +0.565999984741211 -0.00619137287139893 +0.574000000953674 -0.0109801292419434 +0.589999914169312 -0.0144456624984741 +0.608999967575073 -0.018815279006958 +0.618000030517578 -0.0206425189971924 +0.638999938964844 -0.0255417823791504 +0.648000001907349 -0.027535080909729 +0.656000018119812 -0.0293054580688477 +0.681999921798706 -0.0348525047302246 +0.687999963760376 -0.0361528396606445 +0.694000005722046 -0.037238597869873 +0.730000019073486 -0.0433946847915649 +0.745000004768372 -0.0461028814315796 +0.754999995231628 -0.0476474761962891 +0.76200008392334 -0.0488287210464478 +0.782999992370605 -0.0525106191635132 +0.805000066757202 -0.0557047128677368 +0.810999989509583 -0.0563809871673584 +0.851000070571899 -0.0616813898086548 +0.88100004196167 -0.065147876739502 +0.943000078201294 -0.0711517333984375 +0.983000040054321 -0.0739614963531494 +1.08299994468689 -0.0781017541885376 +1.26499998569489 -0.0785040855407715 +2.5 -0.0785231590270996 +}; + +\addlegendentry{$x$} +\addlegendentry{$y$} +\addlegendentry{$z$} + +\addlegendentry{Mujoco $x$} +\addlegendentry{Mujoco $y$} +\addlegendentry{Mujoco $z$} + +\end{axis} + +\end{tikzpicture} diff --git a/images/contributions/chapter_7/box_falling_inclined_mu_2.0.tikz b/images/contributions/chapter_7/box_falling_inclined_mu_2.0.tikz new file mode 100644 index 0000000..2c4c277 --- /dev/null +++ b/images/contributions/chapter_7/box_falling_inclined_mu_2.0.tikz @@ -0,0 +1,216 @@ +% This file was created with tikzplotlib v0.10.1. 
+\begin{tikzpicture} + +\definecolor{darkgray176}{RGB}{176,176,176} +\definecolor{darkorange2251245}{RGB}{225,124,5} +\definecolor{indianred2048062}{RGB}{204,80,62} +\definecolor{lightseagreen56166165}{RGB}{56,166,165} +\definecolor{olivedrab11517572}{RGB}{115,175,72} +\definecolor{seagreen1513384}{RGB}{15,133,84} +\definecolor{teal29105150}{RGB}{29,105,150} + +\begin{axis}[ +% height=6cm, +width=0.95*\textwidth, +height=0.8*\axisdefaultheight, +tick align=outside, +tick pos=left, +title={\textbf{Box falling over inclined plane with $\mu = 2$}}, +x grid style={darkgray176!30}, +xlabel={Time [s]}, +xmajorgrids, +xmin=-0.12395, xmax=2.58, +xtick style={color=black}, +xtick={0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5}, +y grid style={darkgray176!30}, +ylabel={Base height [m]}, +ymajorgrids, +% ymin=-0.049999755, ymax=1.049994855, +ymin=-0.02, ymax=0.23, +ytick style={color=black}, +legend cell align={left}, +legend pos=north east, +legend style={ + font=\small, +}, +] +\addplot [teal29105150] +table {% +0.0010000467300415 0 +0.442000031471252 0.000253796577453613 +0.444000005722046 0.00143909454345703 +0.452000021934509 0.00707650184631348 +0.45799994468689 0.00766670703887939 +0.564000010490417 0.0109648704528809 +0.598000049591064 0.0101429224014282 +0.621999979019165 0.0101568698883057 +1.31900000572205 0.0101816654205322 +2.5 0.0101816654205322 +}; +\addplot [seagreen1513384] +table {% +0.0010000467300415 0 +2.5 0 +}; +\addplot [indianred2048062] +table {% +0.0010000467300415 0.999995112419128 +0.00999999046325684 0.999510049819946 +0.0190000534057617 0.998231172561646 +0.0279999971389771 0.996158361434937 +0.0369999408721924 0.993291854858398 +0.0460000038146973 0.989631652832031 +0.0549999475479126 0.985177516937256 +0.0640000104904175 0.979929685592651 +0.0740000009536743 0.973167657852173 +0.0839999914169312 0.965425610542297 +0.093999981880188 0.956703662872314 +0.103999972343445 0.947001576423645 +0.115000009536743 0.935197472572327 +0.126000046730042 0.922207593917847 +0.13699996471405 0.908031940460205 +0.148000001907349 0.892670392990112 +0.159999966621399 0.874559998512268 +0.172000050544739 0.855038404464722 +0.184000015258789 0.834105610847473 +0.197000026702881 0.809835910797119 +0.210000038146973 0.783910036087036 +0.223000049591064 0.756327867507935 +0.236999988555908 0.724771976470947 +0.250999927520752 0.691295146942139 +0.264999985694885 0.655897498130798 +0.279999971389771 0.615839958190918 +0.294999957084656 0.573577523231506 +0.309999942779541 0.529109954833984 +0.325999975204468 0.479247570037842 +0.342000007629395 0.426876425743103 +0.358999967575073 0.368483066558838 +0.375999927520752 0.307257652282715 +0.39300000667572 0.243199944496155 +0.41100001335144 0.172287106513977 +0.429000020027161 0.0981991291046143 +0.442000031471252 0.0443036556243896 +0.449000000953674 0.0250624418258667 +0.450000047683716 0.02353835105896 +0.450999975204468 0.0229588747024536 +0.453999996185303 0.0230329036712646 +0.457000017166138 0.0241019725799561 +0.465999960899353 0.0278085470199585 +0.475000023841858 0.0307213068008423 +0.483999967575073 0.032840371131897 +0.493000030517578 0.0341655015945435 +0.501999974250793 0.0346969366073608 +0.510999917984009 0.0344345569610596 +0.519999980926514 0.0333783626556396 +0.529000043869019 0.0315283536911011 +0.537999987602234 0.0288845300674438 +0.559000015258789 0.0219417810440063 +0.562999963760376 0.0220979452133179 +0.572999954223633 0.0230638980865479 +0.582000017166138 0.0231125354766846 +0.598999977111816 0.0227097272872925 
+0.611000061035156 0.0227071046829224 +0.628999948501587 0.0227463245391846 +0.708999991416931 0.0227131843566895 +2.5 0.0227134227752686 +}; +\addplot [thick, lightseagreen56166165, densely dashed] +table {% +0.0010000467300415 0 +0.434999942779541 0 +0.435999989509583 0.00112450122833252 +0.437000036239624 0.00120925903320312 +0.440999984741211 0.00599515438079834 +0.445999979972839 0.0111814737319946 +0.447000026702881 0.0103875398635864 +0.453999996185303 0.00866603851318359 +0.458999991416931 0.00904178619384766 +0.500999927520752 0.0110310316085815 +0.539999961853027 0.0114284753799438 +0.588000059127808 0.0127019882202148 +0.733000040054321 0.0127121210098267 +2.5 0.0127869844436646 +}; +\addplot [thick, olivedrab11517572, densely dashed] +table {% +0.0010000467300415 0 +2.5 -0 +}; +\addplot [thick, darkorange2251245, densely dashed] +table {% +0.0010000467300415 0.999994993209839 +0.00999999046325684 0.999500036239624 +0.0190000534057617 0.998194932937622 +0.0279999971389771 0.996079921722412 +0.0369999408721924 0.993155002593994 +0.0460000038146973 0.989419937133789 +0.0549999475479126 0.984874963760376 +0.0640000104904175 0.979520082473755 +0.0740000009536743 0.972620010375977 +0.0839999914169312 0.964720010757446 +0.093999981880188 0.955820083618164 +0.103999972343445 0.945919990539551 +0.115000009536743 0.93387508392334 +0.126000046730042 0.920619964599609 +0.13699996471405 0.906154990196228 +0.148000001907349 0.890480041503906 +0.159999966621399 0.871999979019165 +0.172000050544739 0.85207998752594 +0.184000015258789 0.830719947814941 +0.197000026702881 0.805954933166504 +0.210000038146973 0.779500007629395 +0.223000049591064 0.751354932785034 +0.236999988555908 0.719155073165894 +0.250999927520752 0.68499493598938 +0.264999985694885 0.648874998092651 +0.279999971389771 0.608000040054321 +0.294999957084656 0.56487500667572 +0.309999942779541 0.519500017166138 +0.325999975204468 0.46862006187439 +0.342000007629395 0.415179967880249 +0.358999967575073 0.355594992637634 +0.375999927520752 0.29312002658844 +0.39300000667572 0.227754950523376 +0.41100001335144 0.155395030975342 +0.429000020027161 0.0797950029373169 +0.437000036239624 0.0467896461486816 +0.440999984741211 0.0358445644378662 +0.444999933242798 0.0244522094726562 +0.445999979972839 0.0229599475860596 +0.447000026702881 0.0243866443634033 +0.45799994468689 0.0285851955413818 +0.467000007629395 0.0323610305786133 +0.476000070571899 0.0353269577026367 +0.485000014305115 0.0374828577041626 +0.49399995803833 0.0388287305831909 +0.501999974250793 0.03978431224823 +0.513000011444092 0.040326714515686 +0.523000001907349 0.0400729179382324 +0.532999992370605 0.0390980243682861 +0.542999982833862 0.0373951196670532 +0.552999973297119 0.0349531173706055 +0.562999963760376 0.0317627191543579 +0.572999954223633 0.027813196182251 +0.583999991416931 0.0225932598114014 +0.585000038146973 0.0220732688903809 +0.585999965667725 0.0221720933914185 +0.587000012397766 0.0216654539108276 +0.589999914169312 0.0220365524291992 +0.600000023841858 0.0223872661590576 +0.611000061035156 0.0220082998275757 +0.615999937057495 0.0219769477844238 +0.947000026702881 0.0219740867614746 +2.5 0.0219501256942749 +}; + +\addlegendentry{$x$} +\addlegendentry{$y$} +\addlegendentry{$z$} + +\addlegendentry{Mujoco $x$} +\addlegendentry{Mujoco $y$} +\addlegendentry{Mujoco $z$} + +\end{axis} + +\end{tikzpicture} \ No newline at end of file diff --git a/images/contributions/chapter_7/soft_contact_model.tikz b/images/contributions/chapter_7/soft_contact_model.tikz 
index 64fa96c..db967e2 100644 --- a/images/contributions/chapter_7/soft_contact_model.tikz +++ b/images/contributions/chapter_7/soft_contact_model.tikz @@ -105,15 +105,18 @@ color(100bp)=(transparent!10) } % Text Node \draw (288,46.6) node [anchor=south east] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 65; green, 70; blue, 203 } ,opacity=1 ] {$\mathbf{p}_{cp}( t)$}; % Text Node -\draw (190,213.4) node [anchor=north] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 189; green, 16; blue, 224 } ,opacity=1 ] {$\boldsymbol{v}_{W,C}$}; +\draw (190,213.4) node [anchor=north] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 189; green, 16; blue, 224 } ,opacity=1 ] {$\posd[W]_C$}; +% {$\boldsymbol{v}_{W,C}$} % Text Node -\draw (221,111.6) node [anchor=south west] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 234; green, 81; blue, 100 } ,opacity=1 ] {$\mathbf{u}$}; +\draw (221,111.6) node [anchor=south west] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 234; green, 81; blue, 100 } ,opacity=1 ] {$\mathbf{m}$}; % Text Node \draw (242,156.6) node [anchor=south west] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 67; green, 207; blue, 175 } ,opacity=1 ] {$\delta $}; % Text Node -\draw (178,190) node [anchor=east] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 99; green, 159; blue, 30 } ,opacity=1 ] {$\boldsymbol{v}_{W,C}^{\parallel }$}; +\draw (178,190) node [anchor=east] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 99; green, 159; blue, 30 } ,opacity=1 ] {$\vellin_C^\parallel$}; +% {$\boldsymbol{v}_{W,C}^{\parallel }$} % Text Node -\draw (232,190) node [anchor=west] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 74; green, 144; blue, 226 } ,opacity=1 ] {$\boldsymbol{v}_{W,C}^{\perp }$}; +\draw (232,190) node [anchor=west] [inner sep=0.75pt] [font=\scriptsize,color={rgb, 255:red, 74; green, 144; blue, 226 } ,opacity=1 ] {$\vellin_C^\perp$}; +% {$\boldsymbol{v}_{W,C}^{\perp }$}; % Text Node \draw (298,146.6) node [anchor=south east] [inner sep=0.75pt] [font=\tiny,color={rgb, 255:red, 0; green, 0; blue, 0 } ,opacity=1 ] {$W$}; % Text Node diff --git a/zotero.bib b/zotero.bib index 6f61393..78eb035 100644 --- a/zotero.bib +++ b/zotero.bib @@ -3043,3 +3043,50 @@ @book{marsden_jerrold_e_introduction_2013 year = {2013}, file = {(Texts in Applied Mathematics) Jerrold E. Marsden, Tudor S. Ratiu - Introduction To Mechanics And Symmetry A Basic Exposition of Classical Mechanical Systems-Springer (2010).pdf:/home/dferigo/Zotero/storage/2QH6GPAH/(Texts in Applied Mathematics) Jerrold E. Marsden, Tudor S. 
Ratiu - Introduction To Mechanics And Symmetry A Basic Exposition of Classical Mechanical Systems-Springer (2010).pdf:application/pdf}, } + +@phdthesis{vousten_simulating_2022, + title = {Simulating {Box} {Impact} {Dynamics} in {MuJoCo}}, + url = {https://pure.tue.nl/ws/portalfiles/portal/212923254/1462253_Impacts_in_MuJoCo.pdf}, + urldate = {2023-05-25}, + school = {Eindhoven University of Technology}, + author = {Vousten, Laurens C.J.M.}, + year = {2022}, + file = {1462253_Impacts_in_MuJoCo.pdf:/home/dferigo/Zotero/storage/WT5I7HRD/1462253_Impacts_in_MuJoCo.pdf:application/pdf}, +} + +@article{yoon_comparative_2023, + title = {Comparative {Study} of {Physics} {Engines} for {Robot} {Simulation} with {Mechanical} {Interaction}}, + volume = {13}, + issn = {2076-3417}, + url = {https://www.mdpi.com/2076-3417/13/2/680}, + doi = {10.3390/app13020680}, + abstract = {Simulation with a reasonable physical model is important to develop control algorithms for robots quickly, accurately, and safely without damaging the associated physical systems in various environments. However, it is difficult to choose the suitable tool for simulating a specific project. To help users in selecting the best tool when simulating a given project, we compare the performance of the four widely used physics engines, namely, ODE, Bullet, Vortex, and MoJoco, for various simple and complex industrial scenarios. We first summarize the technical algorithms implemented in each physics engine. We also designed four simulation scenarios ranging from simple scenarios for which analytic solution exists to complex industrial scenarios to compare the performance of each physics engine. We then present the simulation results in the default settings of all the physics engines, and analyze the behavior and contact force of the simulated objects.}, + language = {en}, + number = {2}, + urldate = {2023-05-25}, + journal = {Applied Sciences}, + author = {Yoon, Jaemin and Son, Bukun and Lee, Dongjun}, + month = jan, + year = {2023}, + pages = {680}, + file = {Yoon et al. - 2023 - Comparative Study of Physics Engines for Robot Sim.pdf:/home/dferigo/Zotero/storage/8TQL87NM/Yoon et al. - 2023 - Comparative Study of Physics Engines for Robot Sim.pdf:application/pdf}, +} + +@inproceedings{todorov_convex_2014, + address = {Hong Kong, China}, + title = {Convex and analytically-invertible dynamics with contacts and constraints: {Theory} and implementation in {MuJoCo}}, + isbn = {978-1-4799-3685-4}, + shorttitle = {Convex and analytically-invertible dynamics with contacts and constraints}, + url = {http://ieeexplore.ieee.org/document/6907751/}, + doi = {10.1109/ICRA.2014.6907751}, + abstract = {We describe a full-featured simulation pipeline implemented in the MuJoCo physics engine. It includes multi-joint dynamics in generalized coordinates, holonomic constraints, dry joint friction, joint and tendon limits, frictionless and frictional contacts that can have sliding, torsional and rolling friction. The forward dynamics of a 27-dof humanoid with 10 contacts are evaluated in 0.1 msec. Since the simulation is stable at 10 msec timesteps, it can run 100 times faster than real-time on a single core of a desktop processor. Furthermore the entire simulation pipeline can be inverted analytically, an order-ofmagnitude faster than the corresponding forward dynamics. We soften all constraints, in a way that avoids instabilities and unrealistic penetrations associated with earlier spring-damper methods and yet is sufficient to allow inversion. 
Constraints are imposed via impulses, using an extended version of the velocitystepping approach. For holomonic constraints the extension involves a soft version of the Gauss principle. For all other constraints we extend our earlier work on complementarity-free contact dynamics – which were already known to be invertible via an iterative solver – and develop a new formulation allowing analytical inversion.}, + language = {en}, + urldate = {2023-05-25}, + booktitle = {2014 {IEEE} {International} {Conference} on {Robotics} and {Automation} ({ICRA})}, + publisher = {IEEE}, + author = {Todorov, Emanuel}, + month = may, + year = {2014}, + pages = {6054--6061}, + file = {Todorov - 2014 - Convex and analytically-invertible dynamics with c.pdf:/home/dferigo/Zotero/storage/P5DDZM5I/Todorov - 2014 - Convex and analytically-invertible dynamics with c.pdf:application/pdf}, +} From 233041d6445ec61f1de8af425547840a8f3d855b Mon Sep 17 00:00:00 2001 From: Diego Ferigo Date: Sat, 3 Jun 2023 10:21:24 +0200 Subject: [PATCH 4/5] Add sim-to-sim validation --- Chapters/Part_1/chapter_3.tex | 1 + Chapters/Part_2/chapter_7.tex | 2 + Chapters/Part_2/chapter_8.tex | 294 +- Chapters/epilogue.tex | 7 +- FrontBackmatter/contents.tex | 2 +- classicthesis-config.tex | 1 + images/contributions/chapter_8/cartpole.png | Bin 0 -> 18045 bytes .../chapter_8/cartpole_benchmark_mujoco.tikz | 955 ++++++ .../chapter_8/cartpole_jaxsim_vs_mujoco.tikz | 682 ++++ .../chapter_8/learning_curves.tikz | 2874 +++++++++++++++++ thesis.tex | 37 +- zotero.bib | 96 + 12 files changed, 4910 insertions(+), 41 deletions(-) create mode 100644 images/contributions/chapter_8/cartpole.png create mode 100644 images/contributions/chapter_8/cartpole_benchmark_mujoco.tikz create mode 100644 images/contributions/chapter_8/cartpole_jaxsim_vs_mujoco.tikz create mode 100644 images/contributions/chapter_8/learning_curves.tikz diff --git a/Chapters/Part_1/chapter_3.tex b/Chapters/Part_1/chapter_3.tex index f81a1ad..a0d8a99 100644 --- a/Chapters/Part_1/chapter_3.tex +++ b/Chapters/Part_1/chapter_3.tex @@ -593,6 +593,7 @@ \subsection{Policy Gradient} \end{remark*} \subsection{Generalized Advantage Estimation} +\label{sec:gae} Policy gradient methods are not uniquely defined by the final forms of Equation~\eqref{equation:policy_gradient_final} and Equation~\eqref{equation:policy_gradient_reward_to_go}. They are just two specific cases of a more general formulation expressed in the following form: diff --git a/Chapters/Part_2/chapter_7.tex b/Chapters/Part_2/chapter_7.tex index 0623985..8e158aa 100644 --- a/Chapters/Part_2/chapter_7.tex +++ b/Chapters/Part_2/chapter_7.tex @@ -594,6 +594,7 @@ \section{Validation} The box should start accelerating only when the applied force is able to overcome the opposing effects due to friction. We conclude this section by validating the contact model on non-flat terrain. We simulate a falling box over an inclined plane characterised by different coefficients of friction, and compare its trajectory with the Mujoco simulator~\parencite{todorov_mujoco_2012}. +The specifications of the machine used to execute the validation experiments are reported in Table~\ref{tab:laptop_specifications_validation}. 
\subsection{Bouncing Ball} @@ -674,6 +675,7 @@ \subsection{Sliding Box on Flat Terrain} \end{figure} \subsection{Sliding Box on Inclined Plane} +\label{sec:sliding_box_inclined_plane} \begin{figure} \centering diff --git a/Chapters/Part_2/chapter_8.tex b/Chapters/Part_2/chapter_8.tex index 14cc0a2..3092dbe 100644 --- a/Chapters/Part_2/chapter_8.tex +++ b/Chapters/Part_2/chapter_8.tex @@ -625,11 +625,10 @@ \subsection{Soft-contacts Algorithm} \State $x_{cp}, y_{cp}, z_{cp} = \pos[W]_{cp}$ \State $z_T = \mathcal{H}(x_{cp}, y_{cp})$ -\State ${}^W \dot{\mathbf{m}} = -(K / D) {}^W \mathbf{m}$ +\State ${}^W \dot{\mathbf{m}} = -(K / D) \, {}^W \mathbf{m}$ \Comment{Material relaxation dynamics} \State $\forcesix[W]_{cp} = \zeros_6$ -% \If{$z_{cp} < z_T$} \LComment{Compute normal force} \State $\hat{\mathbf{n}} = \mathcal{S}(x_{cp}, y_{cp})$ \State $\mathbf{h} = \left[0, 0, z_T - z_{cp}\right]^\top$ @@ -641,17 +640,12 @@ \subsection{Soft-contacts Algorithm} \LComment{Compute tangential force} \State $\forcelin_\parallel = \zeros_3$ -% \State ${}^W \dot{\mathbf{m}} = \vellin_\parallel$ -\If{$\mu \neq 0$ and $z_{cp} > z_T$} - % \If{$z_{cp} > z_T$} - % \Comment{Contact not active or deactivated} - - % \Else - % \Comment{Contact active} - \State $\vellin^\perp = (\posd[W]_C \cdot \hat{\mathbf{n}}) \hat{\mathbf{n}}$ + +\If{$\mu \neq 0$ and $z_{cp} < z_T$} + \State $\vellin^\perp = \left( \posd[W]_C \cdot \hat{\mathbf{n}} \right) \hat{\mathbf{n}}$ \State $\vellin^\parallel = \posd[W]_C - \vellin^\perp$ - \State $\forcelin_\parallel = -\sqrt{\delta}(K {}^W \mathbf{m} + D \vellin^\parallel)$ + \State $\forcelin_\parallel = -\sqrt{\delta}(K \, {}^W \mathbf{m} + D \vellin^\parallel)$ \Comment{Compute sticking force} \State ${}^W \dot{\mathbf{m}} = \vellin_\parallel$ @@ -660,14 +654,12 @@ \subsection{Soft-contacts Algorithm} \If{$\norm{\forcelin_\parallel} > f_{cone}$} \Comment{Compute slipping force} \State $\forcelin_\parallel = \left(f_{cone} / \norm{\forcelin_\parallel}\right) \forcelin_\parallel$ - \State ${}^W \dot{\mathbf{m}} = -(\forcelin_\parallel + K \sqrt{\delta} {}^W \mathbf{m}) / (D \sqrt{\delta})$ + \State ${}^W \dot{\mathbf{m}} = -(\forcelin_\parallel + K \sqrt{\delta} \, {}^W \mathbf{m}) / (D \sqrt{\delta})$ \EndIf - % \EndIf \EndIf \LComment{Compute 6D contact force in the world frame} \State $\forcesix[W]_{cp} = \transfor[W]^C [(\forcelin_\perp + \forcelin_\parallel)^\top, \zeros^\top_3]^\top$ -% \EndIf \State \textbf{outputs} $\left(\forcesix[W]_{cp}, {}^W \dot{\mathbf{m}}\right)$ @@ -718,7 +710,9 @@ \subsection{Features} % \end{remark*} -\begin{sidewaystable} +\begin{landscape} +% \begin{sidewaystable} +\begin{table} \centering \small \caption{Comparison of modern physics engines similar to \jaxsim. ${[*]}$ \jaxsim is developed with a differentiable framework, but this functionality has to be finalised.} @@ -741,7 +735,9 @@ \subsection{Features} \textbf{\jaxsim} & Python & Reduced & \ck & \ck & \ck & $[*]$ & \ck & & \ck & \ck \\ \bottomrule \end{tblr} -\end{sidewaystable} +\end{table} +% \end{sidewaystable} +\end{landscape} \subsection{Benchmarks} @@ -909,7 +905,260 @@ \subsubsection{Scalability} \label{fig:jaxsim_benchmark_parallel} \end{figure} -\pagebreak +\newpage +\section{Validation} +\label{sec:jaxsim_validation} + +In this section, we perform a validation of \jaxsim for generating synthetic data for robot learning. 
+We develop an environment exposing the \texttt{gym.Env} interface with \jaxsim, and show the sampling performance that can be reached by stepping a large number of parallel environments on a laptop \ac{GPU}.
+We then plug the vectorized environment into a \ac{RL} pipeline to train a policy with the \ac{PPO} algorithm.
+Finally, to provide evidence that the data generated through the methods proposed in this thesis can be used in an out-of-distribution setting, we execute and evaluate the policy on comparable dynamics simulated, this time, with Mujoco.
+
+The out-of-distribution validation is also known in the literature as sim-to-sim~\parencite{salvato_crossing_2021, muratore_robot_2022, bellegarda_robust_2021, du_auto-tuned_2021}.
+Given that one of the assumptions for an effective transfer is the availability of a robust policy trained in the original setting, we consider as target task learning the swing-up of an underactuated cartpole.
+This task is similar to the canonical benchmark of cartpole balancing~\parencite{brockman_openai_2016}, in which the pole starts from an almost balanced configuration and the action space is discrete (selecting either a positive or negative constant force to apply to the cart).
+This balancing task, however, is excessively simple, and it is not representative of typical robotics problems, which are usually characterised by continuous action spaces.
+The swing-up task makes learning the policy much more difficult by starting the episodes with an arbitrary pole position (possibly pointing down).
+This diversity makes a big difference, since it requires the policy to be considerably long-sighted, to the extent of learning to perform an initial swing that builds up momentum before attempting a proper balancing.
+
+All experiments presented in this section have been executed on a laptop whose specifications are reported in Table~\ref{tab:laptop_specifications_validation}.
+
+\begin{table}
+    \small
+    \centering
+    \begin{tblr}{
+        colspec={Q[c, m]Q[c, m]},
+        row{even} = {bg=gray9},
+        row{1} = {font=\bfseries\footnotesize},
+    }
+    \toprule
+    Specification & Value \\
+    \midrule
+    Intel CPU & i7-10750H \\
+    Nvidia GPU & GeForce GTX 1650 Ti Mobile \\
+    CUDA cores & 1024 \\
+    Operating system & Ubuntu 22.04 \\
+    Nvidia driver & 530.41.03 \\
+    CUDA & 11.2.2 \\
+    cuDNN & 8.8.0.121 \\
+    \jax & 0.3.15 \\
+    Mujoco & 2.3.5 \\
+    \bottomrule
+    \end{tblr}
+    \caption{Specifications of the machine used to execute the validation experiments.}
+    \label{tab:laptop_specifications_validation}
+\end{table}
+
+\subsection{Sampling Experience for Robot Learning}
+
+The cartpole is a fixed-base model composed of a pole (shaped as a long and thin cylinder) connected through an un-actuated revolute joint to a cart (shaped as a box).
+The cart can move along a track, whose displacement is simulated as a prismatic joint.
+The linear force corresponding to this prismatic joint is the control input of the system.
+Figure~\ref{fig:cartpole_swingup} reports a visualisation of the cartpole model.
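+
+The observation, action, reward, and termination logic detailed in the remainder of this section can be summarised by the following minimal \texttt{gym.Env} sketch.
+It is only illustrative: the class and helper names are hypothetical, the $200$-step truncation is omitted, and a trivial placeholder stands in for the \jaxsim physics step, which in the actual environment advances the simulation by $100$ integration sub-steps per control action.
+%
+\begin{verbatim}
+import numpy as np
+import gym
+from gym import spaces
+
+class CartpoleSwingUpEnv(gym.Env):
+    """Illustrative cartpole swing-up environment (hypothetical names)."""
+
+    def __init__(self):
+        # Observation o = (d, d_dot, theta, omega); termination outside bounds.
+        high = np.array([2.5, np.inf, np.pi, np.inf], dtype=np.float32)
+        self.observation_space = spaces.Box(-high, high)
+        # Continuous linear force applied to the cart.
+        self.action_space = spaces.Box(low=-50.0, high=50.0, shape=(1,))
+        self.state = None
+
+    def reset(self):
+        # Cart at rest in the middle of the track, arbitrary pole angle.
+        theta = np.random.uniform(-np.pi, np.pi)
+        self.state = np.array([0.0, 0.0, theta, 0.0], dtype=np.float32)
+        return self.state
+
+    def step(self, action):
+        force = float(np.clip(action, -50.0, 50.0).item())
+        # Placeholder: the actual environment advances the JAXsim physics
+        # by 100 semi-implicit Euler sub-steps of 0.5 ms (50 ms in total).
+        self.state = self._physics_step(self.state, force)
+        d, d_dot, theta, omega = self.state
+        terminal = not self.observation_space.contains(self.state)
+        reward = (
+            (0.0 if terminal else 1.0)              # r_alive
+            + np.cos(theta)                         # r_balance
+            - 0.001 * abs(force)                    # action penalty
+            - 0.1 * np.linalg.norm([d_dot, omega])  # joint velocity penalty
+            - 0.5 * abs(d)                          # cart displacement penalty
+        )
+        return self.state, reward, terminal, {}
+
+    def _physics_step(self, state, force):
+        # Stand-in for the vectorized JAXsim step (dynamics omitted here).
+        return state
+\end{verbatim}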
+
+\begin{figure}
+    \centering
+    \includegraphics[width=0.85\textwidth]{images/contributions/chapter_8/cartpole.png}
+    \caption{Illustration of the cartpole model in the $\theta = d = 0$ configuration.}
+    \label{fig:cartpole_swingup}
+\end{figure}
+
+\begin{table}
+    \small
+    \centering
+    \begin{tblr}{
+        colspec={Q[c, m]Q[c, m]},
+        row{even} = {bg=gray9},
+        row{1} = {font=\bfseries\footnotesize},
+    }
+    \toprule
+    Property & Value \\
+    \midrule
+    Integrator & Semi-implicit Euler \\
+    Integrator step & $0.0005\,$s \\
+    Environment step & $0.050\,$s \\
+    Control frequency & $20\,$Hz \\
+    Action dimension & 1 \\
+    Observation dimension & 4 \\
+    Action space & $[-50,\, 50]\,$N \\
+    {Maximum episode steps} & $200$ \\
+    Parallel environments & $512$ \\
+    Equivalent \ac{RTF} & $24.38$ \\
+    \bottomrule
+    \end{tblr}
+    \caption{Properties of the environment implementing the cartpole swing-up task.}
+    \label{tab:environment_properties_cartpole}
+\end{table}
+
+\begin{table}
+    \small
+    \centering
+    \begin{tblr}{
+        colspec={Q[c, m]Q[c, m]},
+        row{even} = {bg=gray9},
+        row{1} = {font=\bfseries\footnotesize},
+    }
+    \toprule
+    Parameter & Value \\
+    \midrule
+    Discount rate $\gamma$ & $0.95$ \\
+    Clip parameter $\epsilon$ & $0.1$ \\
+    Target \acs{KL} divergence & $0.025$ \\
+    Learning rate $\alpha$ & $0.0003$ \\
+    \acs{GAE} parameter $\lambda$ & $0.9$ \\
+    Batch size & $2560$ \\
+    Minibatch size & $256$ \\
+    Number of {\small SGD} epochs & $10$ \\
+    \bottomrule
+    \end{tblr}
+    \caption{PPO parameters for the cartpole swing-up environment.}
+    \label{tab:ppo_parameters_cartpole}
+\end{table}
+
+\begin{description}
+%
+\item[Observation]
+The observation of the system is composed of the position and velocity of both joints.
+If $\theta \in [-\pi,\, \pi]\,$rad is the angle of the un-actuated revolute joint (where $\theta = 0$ corresponds to a balanced pole), $d \in [-2.5,\, 2.5]\,$m the displacement of the prismatic joint, and $\omega = \dot{\theta}$ the pole velocity, we can define the observation as $\mathbf{o} = (d, \dot{d}, \theta, \omega) \in \realn^4$.
+%
+\item[Action]
+The action of the system is the linear force $a = f \in [-50,\, 50]\,\text{N}$ corresponding to the joint moving the cart.
+It is worth noting that the action is continuous, and its magnitude is not large enough to accelerate the cart and bring the pole to the vertical position without swinging first.
+%
+\item[Environment details]
+The environment simulates the physics with \jaxsim.
+We use a semi-implicit Euler integrator with a step of $500\,\mu$s.
+After the agent sets an action through the exposed \texttt{gym.Env} interface, we step the environment for $0.050\,$s, therefore performing $100$ physics steps each time.
+The resulting control frequency, that is the rate at which the policy applies its actions, is $20\,$Hz.
+The environment is implemented as a continuous control task, with termination only occurring if the observation falls outside its space.
+In the absence of termination, episodes are truncated after $200$ steps.
+All environment properties are reported in Table~\ref{tab:environment_properties_cartpole}.
+%
+\item[Reward]
+The reward function used to learn the swing-up task is the following:
+%
+\begin{align*}
+    &r_t(s_t, a_t, s_{t+1}) = \\
+    &\quad r_{\text{alive}} + r_{\text{balance}} - 0.001 \, c_{\text{action}} - 0.1 \, c_{\text{vel}} - 0.5 \, c_{\text{displacement}}
+    ,
+\end{align*}
+%
+where $r_{\text{alive}}$ is set to $1.0$ when the environment is not terminal and $0$ otherwise, $r_{\text{balance}} = \cos{\theta}$ rewards keeping the pole in a balanced state (characterised by $\theta=0$), $c_{\text{action}} = \lVert \tau \rVert$ penalises large actuation forces, $c_{\text{vel}} = \lVert \Shaped \rVert$ penalises large joint velocities, and $c_{\text{displacement}} = \abs{d}$ penalises the cart for being far from the middle of the track.
+%
+\item[Agent Networks]
+We want to train a policy with \ac{RL} capable of bringing the pole to a balanced position from any state belonging to the observation space, and of maintaining the balance as long as possible.
+The agent is composed of two neural networks, the actor (the policy) and the critic (estimating the return), each having two hidden layers of $512$ neurons with {\small ReLU} activation functions.
+The input layer of both networks has a size of $4$ (the observation dimension).
+The critic network has an output layer with a single dimension corresponding to the return, which is correctly bootstrapped from the value function in case of episode truncation.
+The actor network encodes a univariate Gaussian distribution (the environment action is a scalar): it has one output corresponding to the distribution mean, and a free parameter, included among the optimisation parameters, corresponding to the logarithm of the distribution's standard deviation.
+The standard deviation is initialised to $\sigma = 0.05$, \ie $\log \sigma = \log(0.05)$.
+The neural networks are optimised with Adam~\parencite{kingma_adam_2017} using a learning rate $\alpha = 0.0003$.
+%
+\item[Algorithm]
+For the same reasons explained in Chapter~\ref{ch:learning_from_scratch}, we train the policy with the \ac{PPO} algorithm introduced in Section~\ref{sec:ppo}, configured with the clip parameter $\epsilon=0.1$.
+We estimate the return from the advantage computed with \ac{GAE}, introduced in Section~\ref{sec:gae}, as $R_t = \hat{A}^\text{GAE}_t + V_t$, configured with $\lambda=0.9$ and a discount rate $\gamma = 0.95$.
+We use the \ac{PPO} implementation of \texttt{stable-baselines3}~\parencite{raffin_stable-baselines3_2021} which, instead of using a \ac{KL} penalty, stops the training epochs when the approximated \ac{KL} divergence exceeds a given threshold.
+All the \ac{PPO} parameters are reported in Table~\ref{tab:ppo_parameters_cartpole}.
+%
+\item[Sampling]
+We optimise the policy by acquiring five samples from each of the $512$ parallelised environments running on \ac{GPU}, resulting in an equivalent \ac{RTF} of about $25$ on the machine used to run the experiments.
+The number of environments has been selected by choosing the best equivalent \ac{RTF} of the \ac{JIT}-compiled vectorized \texttt{gym.Env.step} through a grid search over $N_\text{envs} = 2^p,\, p \in \{1,\, 2,\, 3,\, \dots,\, 12\}$.
+The collected batch of trajectories, containing a total of $2560$ samples and equivalent to about 2 minutes of experience, is then split into $10$ minibatches of $256$ samples.
+We perform $10$ optimisation epochs on the same batch of trajectories, which can be interrupted early if the approximated \ac{KL} divergence \wrt the distribution corresponding to the previous policy exceeds $0.025$.
+Before optimising the \acp{NN} of the agent, the collected observations and rewards are normalized by computing a running mean and variance.
+Finally, in order to obtain a more robust policy, we inject Gaussian noise with zero mean and $\sigma=0.05$ into the actions before they are applied to the environment.
+%
+\end{description}
+
+\begin{figure}
+    \centering
+    \resizebox{0.75\textwidth}{!}{
+    \includegraphics{images/contributions/chapter_8/learning_curves.tikz}}
+    \caption{Learning curves of the cartpole swing-up task. The plot reports the mean and standard deviation of the average rewards $\hat{r}_t^{(k)}$ computed over $k=10$ different training executions. For each individual training, the average reward $\hat{r}_t$ in the considered parallel setting is computed by averaging at each time step the $512$ rewards received from the vectorized \texttt{gym.Env.step}.}
+    \label{fig:learning_curve_cartpole_swingup}
+\end{figure}
+
+\newpage
+Figure~\ref{fig:learning_curve_cartpole_swingup} shows the learning curves of $10$ trainings initialised with different seeds.
+The policy is able to learn effective swing-up and balancing behaviours in $500000$ steps, corresponding to about $7$ hours of equivalent experience.
+On the machine used to run this experiment, each training runs for approximately $15$ minutes.
+The reward grows mostly monotonically with a limited variance, showing that the chosen parameters ensure stable policy updates, preventing excessively large optimisation steps that would destroy the previously obtained performance.
+
+\subsection{Sim-to-sim Policy Transfer}
+
+\begin{table}
+    \small
+    \centering
+    \begin{tblr}{
+        colspec={Q[c, m]Q[c, m]},
+        row{even} = {bg=gray9},
+        row{1} = {font=\bfseries\footnotesize},
+    }
+    \toprule
+    Property & Value \\
+    \midrule
+    \texttt{timestep} & $0.001$ \\
+    \texttt{integrator} & \texttt{RK4} \\
+    \texttt{solver} & \texttt{Newton} \\
+    \texttt{iterations} & $100$ \\
+    \texttt{contype} & $0$ \\
+    \bottomrule
+    \end{tblr}
+    \caption{Mujoco properties used for the sim-to-sim evaluation of the trained cartpole swing-up policy. Refer to the official documentation at \url{https://mujoco.readthedocs.io} for a detailed explanation of the options.}
+    \label{tab:mujoco_parameters}
+\end{table}
+
+In this section, we evaluate the trained policy in an out-of-distribution setting.
+This setting could represent any environment that differs from the one where the policy has been trained.
+It serves as evidence that it is possible to deploy a policy, obtained by training over synthetic data generated efficiently by a parallel simulator, into an equivalent environment running on a different runtime.
+In particular, this experiment can be seen as a sim-to-sim policy transfer.
+
+Similarly to Section~\ref{sec:sliding_box_inclined_plane}, we use the Mujoco simulator as the out-of-distribution setting.
+We translated the \ac{URDF} of the cartpole model loaded by \jaxsim in the training environment into an equivalent \ac{MJCF} description that can be imported in Mujoco.
+Then, we included in the same file the configuration of the physics engine, whose parameters are reported in Table~\ref{tab:mujoco_parameters}.
+The chosen parameters yield a simulation characterised by the same control rate ($20\,$Hz), but in this case the physics is computed by a different simulator, using an integrator of a different family and a different constraint solver.
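+
+As an illustrative reference for how this out-of-distribution rollout could be obtained, the following sketch loads the \ac{MJCF} description with the official Mujoco Python bindings and replays the policy at the $20\,$Hz control rate.
+It is only a sketch under stated assumptions: the file name, the joint ordering of \texttt{qpos}, and the \texttt{policy\_mean} stand-in for the trained actor are hypothetical placeholders.
+%
+\begin{verbatim}
+import numpy as np
+import mujoco
+
+# Hypothetical stand-in for the trained actor: returns the mean action
+# of the Gaussian policy for a given observation.
+def policy_mean(obs: np.ndarray) -> np.ndarray:
+    return np.zeros(1)
+
+# Hypothetical MJCF file embedding the options of the table above
+# (timestep=0.001, RK4 integrator, Newton solver, contacts disabled).
+model = mujoco.MjModel.from_xml_path("cartpole.xml")
+data = mujoco.MjData(model)
+
+# Initial observation o_0 = (d, d_dot, theta, omega) = (0, 0, pi, 0),
+# assuming qpos = (d, theta) and qvel = (d_dot, omega).
+data.qpos[:] = np.array([0.0, np.pi])
+data.qvel[:] = np.zeros(2)
+
+steps_per_action = 50  # 0.050 s control period / 0.001 s physics timestep
+for _ in range(200):   # one episode of 200 control steps
+    obs = np.array([data.qpos[0], data.qvel[0], data.qpos[1], data.qvel[1]])
+    data.ctrl[:] = policy_mean(obs)  # deterministic action a_t = mu(o_t)
+    for _ in range(steps_per_action):
+        mujoco.mj_step(model, data)
+\end{verbatim}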
+
+The Mujoco environment is only used for producing the state-action trajectory $\tau$ from a given initial observation $\mathbf{o}_0$, where the action is obtained by performing inference with the trained policy.
+In order to properly exploit the policy, we sample deterministically by taking the inferred mean of the Gaussian distribution described by the policy, \ie $a_t = \mu_{\boldsymbol{\theta}}(\mathbf{o}_t)$.
+
+The first evaluation we perform in this setting is a comparison between the swing-up trajectories obtained by running the policy in the training environment simulated with \jaxsim and in the out-of-distribution environment simulated with Mujoco.
+In order to obtain a meaningful comparison, we consider as initial observation the cart resting in the center of its track and the pole pointing down, both with zero velocity, \ie $\mathbf{o}_0 = (0, 0, \pi, 0)$.
+The two trajectories are shown in Figure~\ref{fig:cartpole_trajectory_jaxsim_vs_mujoco}, where it can be noticed that the policy performance is comparable in both simulators.
+The policy succeeds in the swing-up task, and the resulting trajectories in both simulators are almost identical.
+
+As a second evaluation, starting from the same initial observation $\mathbf{o}_0$ considered in the first evaluation, we assess the policy swing-up performance on different variations of the cartpole environment simulated with the out-of-distribution Mujoco.
+In particular, we modify some physical parameters and evaluate whether the learned policy is robust enough to still succeed in the task.
+In the first variation, we double the mass of both the cart and the pole, taking care to compute the new $3\times 3$ inertia matrices corresponding to the primitive shapes of the bodies.
+In the second variation, in addition to the increased masses, we include joint damping, \ie a frictional force proportional to the joint velocity that opposes the motion direction.
+For a revolute joint, its contribution is $\tau_\text{damping} = - k_v \dot{\theta}$.
+Figure~\ref{fig:cartpole_mujoco_change_parameters} reports the curves simulated in the out-of-distribution Mujoco simulator using out-of-distribution model parameters.
+Despite the out-of-distribution environment, it can be seen that the policy learned in a highly parallel setting simulated with \jaxsim on \ac{GPU} succeeds in the swing-up task.
+As can be expected, the performance is affected by the change of parameters.
+For example, the policy is able to reach the balancing state using only one swing in the setting with nominal parameters.
+Instead, in the two variations, the policy needs two swings.
+
+It is worth noting that the policy has been trained using only the nominal parameters.
+Often, in order to obtain more robust policies, the inertial parameters of the simulated model become part of domain randomization.
+In our case, we did not randomize any parameter.
+
+\begin{figure}
+    \centering
+    \resizebox{0.99\textwidth}{!}{
+    \includegraphics{images/contributions/chapter_8/cartpole_jaxsim_vs_mujoco}
+    }
+    \caption{Sim-to-sim comparison of the trajectories obtained by exploiting the swing-up policy learned in a \jaxsim environment. The \jaxsim curves correspond to an in-distribution setting, where the policy is evaluated in the same simulator that generated the training data. Instead, the Mujoco curves correspond to an out-of-distribution setting, where the policy is evaluated in a simulator different from the one that generated the training data.
Note that $\theta$ is wrapped into the $[-\pi,\, \pi]$ range.}
+    \label{fig:cartpole_trajectory_jaxsim_vs_mujoco}
+\end{figure}
+
+\begin{figure}
+    \centering
+    \resizebox{0.99\textwidth}{!}{
+    \includegraphics{images/contributions/chapter_8/cartpole_benchmark_mujoco}
+    }
+    \caption{Trajectories of the cartpole swing-up policy acting on the out-of-distribution environment simulated in Mujoco. The \emph{nominal} curves are obtained by running the policy on a cartpole model having the nominal masses of both the cart and the pole, and no joint damping. The \emph{mass} curves show the trajectories obtained with the model having the masses of both bodies multiplied by a factor of $2$. The \emph{mass+damping} curves are generated in a setting that extends the \emph{mass} one by also considering a damping coefficient $k_v=0.015$ for both joints. Note that in this case, we removed the bounds of the pole angle, showing more clearly the number of swings used by the policy to reach the balancing state.}
+    \label{fig:cartpole_mujoco_change_parameters}
+\end{figure}
+
 \section{Conclusions}
 
 In this chapter, we proposed \jaxsim, a new physics engine capable of executing multibody simulations on modern hardware accelerators.
@@ -923,11 +1172,16 @@ \section{Conclusions}
 Applications requiring sampling experience with high throughput such as those characterising \ac{RL} research, would benefit the most from these performances.
 Furthermore, generating physics data directly in the same device where function approximators are optimised would remove any overhead related to the data transfer from the \ac{CPU}.
 We intend to investigate these directions in future activities.
+Finally, we trained a policy to solve a continuous control problem simulated with \jaxsim, providing details on the parallelization level that can be reached for sampling synthetic data in a characteristic robot learning application.
+Then, to provide evidence that it is possible to deploy policies trained by sampling from highly parallel simulators, we performed a sim-to-sim transfer and evaluated the policy performance on an out-of-distribution simulator with modified physical parameters.
+The obtained \ac{RL} setting is much simpler compared to the one adopted in Chapter~\ref{ch:learning_from_scratch}.
+The possibility to parallelize sampling on hardware accelerators removes the need to rely on distributed settings running on a cluster of machines, which is difficult to create, maintain, and handle.
+A single-process application can deploy both sampling and \acp{NN} on the same \ac{GPU}, with no overhead related to network transport.
+Beyond being easier to deploy in a {\small HPC} setting, this approach may lead to faster development and prototyping.
 
 \jaxsim still presents multiple limitations in this first version.
-Firstly, the shown performances were obtained in a setting where no exteroception was necessary.
-Integrating basic rendering functionality would surely affect performance.
+Firstly, the reported performance was obtained in a setting where no exteroception was necessary.
+Integrating basic rendering functionality would surely affect it negatively.
 Furthermore, collisions between links are not yet supported, limiting the adoption for robot manipulation.
 Finally, it does not yet allow enforcing additional dynamic constraints like joint limits, and closed kinematic chains.
-
 To conclude, although the automatic differentiation capability provided by the \jax framework has not yet been thoroughly tested with the \jaxsim algorithms, its combination with the smooth dynamics given by the contact-aware state-space representation opens up exciting research directions that will be explored in the future.
diff --git a/Chapters/epilogue.tex b/Chapters/epilogue.tex
index 33dcf8f..ee76698 100644
--- a/Chapters/epilogue.tex
+++ b/Chapters/epilogue.tex
@@ -125,10 +125,11 @@ \subsection*{\autoref{ch:scaling_rigid_body_simulations}: Scaling Rigid Body Sim
 To conclude the discussion on the features related to \jax, our algorithms are not yet compatible with its \ac{AD} capability.
 The activities to assess the support and implement \ac{AD} support are ongoing, and we expect they will enable us to start investigating all the new emerging methodologies involving differentiable simulations.
-Other activities planned for the near future involve the \ac{RL} stack built over \jaxsim.
-The combination of an environment interfacing with \jaxsim and \ac{RL} algorithms implemented in \jax would result in a single application whose data never leaves the hardware accelerator.
+Other activities planned for the near future involve enhancing the \ac{RL} stack built over \jaxsim.
+The combination of an environment interfacing with \jaxsim and \ac{RL} algorithms implemented in \jax results in a single application whose data never leaves the hardware accelerator.
 Therefore, beyond the sampling performance of parallel simulations, the complete pipeline would also prevent the data transfer overhead that is always present when some computation has to happen on \acp{CPU}.
-We already implemented a \jax version of \ac{PPO} and tested on the canonical examples of inverted pendulum and cartpole swing-up, but the results are too preliminary and have not been included in this thesis.
+In Section~\ref{sec:jaxsim_validation}, we provided a continuous control validation by sampling from a cartpole environment simulated entirely on \ac{GPU}.
+However, we used an existing \ac{PPO} implementation not developed in \jax; therefore, it was only possible to \ac{JIT}-compile the individual parallelized sampling step rather than the entire batch collection.
 Future work will continue this activity, extending the investigation to contact-rich locomotion problems.
 Finally, we would like to embed these environments in Gym-Ignition, creating a new \jaxsim \scenario component, so that all the benefits of future real-time backends could be applicable on \jaxsim experiments.
 Towards this goal, Gym-Ignition should switch to the upcoming functional version of \verb|gym.Env| that has been recently proposed upstream.
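Editor's note: to make the point about \ac{JIT}-compiling the entire batch collection concrete, the following minimal Python/\jax sketch shows one way a whole batch of deterministic rollouts could be generated inside a single jitted call, combining jax.vmap over parallel environments with jax.lax.scan over time. This is not the implementation used in the thesis: env_step is a toy placeholder for the \jaxsim cartpole dynamics, and policy_mean is a toy linear-tanh stand-in for the trained \ac{PPO} network; the action is the policy mean, matching the deterministic exploitation described in the validation section.

import jax
import jax.numpy as jnp

OBS_DIM, HORIZON, NUM_ENVS = 4, 100, 512

def policy_mean(params, obs):
    # Deterministic exploitation: take the mean of the Gaussian policy head.
    return jnp.tanh(obs @ params["w"] + params["b"])

def env_step(obs, action):
    # Toy placeholder dynamics standing in for the simulated cartpole environment.
    return obs + 0.01 * jnp.concatenate([obs[2:], action, -action])

def rollout(params, obs0):
    def step_fn(obs, _):
        action = policy_mean(params, obs)
        return env_step(obs, action), (obs, action)
    _, trajectory = jax.lax.scan(step_fn, obs0, None, length=HORIZON)
    return trajectory  # (observations, actions) stacked over time

# vmap over parallel environments, then jit the whole batch collection:
# the rollouts are generated and stored entirely on the accelerator.
collect_batch = jax.jit(jax.vmap(rollout, in_axes=(None, 0)))

params = {"w": jnp.zeros((OBS_DIM, 1)), "b": jnp.zeros(1)}
obs0 = jnp.tile(jnp.array([0.0, 0.0, jnp.pi, 0.0]), (NUM_ENVS, 1))
observations, actions = collect_batch(params, obs0)  # shapes: (512, 100, 4) and (512, 100, 1)

With a \jax-native \ac{PPO} implementation, the policy update could be jitted in the same way, so that neither the trajectories nor the network parameters ever leave the \ac{GPU}.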
diff --git a/FrontBackmatter/contents.tex b/FrontBackmatter/contents.tex index 06ca8dd..2a3449a 100644 --- a/FrontBackmatter/contents.tex +++ b/FrontBackmatter/contents.tex @@ -210,7 +210,7 @@ \nomenclature[L, 17]{$\langle \mathcal{S}, \mathcal{A}, \mathcal{R}, \mathcal{P}, \mathcal{\rho}_0 \rangle$}{Tuple defining a Markov Decision Process} \nomenclature[L, 18]{$V^\pi(s)$}{State-value function for policy $\pi$ at state $s$} \nomenclature[L, 18]{$Q^\pi(s, a)$}{Action-value function for policy $\pi$ at state-action pair $(s, a)$} - \nomenclature[L, 19]{$A^\pi(s, a)$}{Advantage function for policy $pi$ at state-action pair $(s, a)$} + \nomenclature[L, 19]{$A^\pi(s, a)$}{Advantage function for policy $\pi$ at state-action pair $(s, a)$} \nomenclature[L, 20]{$\mathbb{E}[\cdot]$}{Expected value of a random variable} \nomenclature[L, 21]{$\hat{\mathbb{E}}[\cdot]$}{Empirical average estimating the expected value of a random variable from samples} diff --git a/classicthesis-config.tex b/classicthesis-config.tex index 646fc92..cda2f9b 100644 --- a/classicthesis-config.tex +++ b/classicthesis-config.tex @@ -128,6 +128,7 @@ % 4. Setup floats: tables, (sub)figures, and captions % **************************************************************************************************** +\usepackage{pdflscape} \usepackage{rotating} \usepackage{tabularx} % better tables \setlength{\extrarowheight}{3pt} % increase table row height diff --git a/images/contributions/chapter_8/cartpole.png b/images/contributions/chapter_8/cartpole.png new file mode 100644 index 0000000000000000000000000000000000000000..f19570a31aff5bf19357a61f161c90b63cf96e28 GIT binary patch literal 18045 zcmd43g;&&3+cru`cMC`i-616sf(+f=4N6G}(jm>z-J(MdF-UiVl!(9p0@6~0bO{KY zA3oo+-uHa#`~k^=(sV zMtN)xyhP;)V&|#_U>)qqj{?ANhHp6oll*^)Zh(dX)z|8V5t~W=eP0Z*O@jXSR}WMC z+X)8*n+Y!azXUx%51>uDbm)QG0}G8(9xu4#v{IT zrJisS+3&VdX=I7Y46M{_M8*WgBb5?*NutZ|9T*J^SUIE(l35p3$L{kH6(t@Jt$K;YSc-IjU4$hJ9?XDo z()>R^Z}-Z^5U*3xr2zsmYW%0#f1<$6+9Q)Q3wuye238YM=zoRSLX=zcUvXjLNs`K$ zg#%;$$C#*jfvF$V0x0@@XjzyGDYHU>%z z6kC=!f4*sMufUJTvAx)Jx~Z~_K`u9}^FN}p3l&ckEYLtj!NS5q2@UQ}vZ#sP-*-K` zxkb#e3g(S}a}mo?+qx$TL&?j_i>)afcX%t9=_7+Y^#dC(H2F1Ga@{FCH$r{T3Ls-# zsA4z^X+vvkY(fHoi<{eaa7~r6WyGwVP)@f@sN7*;WXv-XZO6kx!;am>FCLQj!J)RF zIM8b2eRYAEmQ$PT&k_I)ys&N{&sp7E0geG|C%C)tF#%Uwk}%WR-_r@h;QOi|o*R7d zqwtt&V8HnJ_|>HyuKxECX;D#8(!o)o zx9;v*H_~irI8psfnBngQ1wTP5M(o0Ny&%0-$$w0Lk+{8RI?4k?<1xE;?nLIMyhuqh zeSiM6uy@6xBZfb4-&xgC6Pro>yQ6h+*`VEtQm*gxUrFUd7#Vm{5)G9{feF&l?UJS# z1B1d>4ks*axgrt8)Kko6r6fd#ZF+eq*!}K&jB$aPP?B0; z9~L#uj0Yz%V&3idr+s+YW*sphf7au2=F)Z}@f;Vg>4KOevOnqeac~x}usNoYwdV$t zJ#BJKN^WBiX=4SkIxgBLZN|Pp*`2JZNQ1O3_4f8kLIvpvpIVx}n<7O;{b(RdJ|qY% z?@<33Sl-U;1@ZFB2J39GB7~Ikr83LD&KfB%8{lO*1shkwI zT|KwQoBteqY5V+I;gFWe;dQmZsE{)HId*zmWb$uS_48QRyxoH0N)}D+NP8hD4HD>v zgtSZ6jNRP5Xv;?b%&0etThlO?Fk~yQ9kJ$aly`)d$n3ju+;VY?vL#P~^0%zu$=~j3 z-oL?N0;$-GGnIzihp!$71OzC6opU+Tygmo|=uee8))*S*jrz5^Y;q3LGLIY4G#~$YPE4nP2K+$$Uqp49k!~r(DeI;-CBnnb)HwY zPe8b-(0;%p|4(N+>;4TLKBl<1c(!G`H<;+XEs1D>hHgoQqi>8(Hsb7qbt2iBBRlX{j3wT*(_qq0p zZAgfW_jAUOblybUuNnj0*KL4N&$}2@x)W!X^r?XgjiH_1Le%&I1sWt#*6Xd=VpD0` z;$S0{wjboiDPpakNClg|@_qgt#THv^;)bSMwm%rBm)m7$QS(IaU8r&%&)F@^g5~iK z#QtfQ+mxs=C%x6Ac(o2&XPVsygJ}e>YCn$+1)2?^OZR-S6hcC(f&LGux(uiHg)}4@gv-vlYH0dUmi_1&r zxIaiB{7t@&7|Y@p6LK8&o0q@Vl{PK7d*N;AG~k!BO~64PJ8Q{(O61PtIi~Q3I*2B?hcLkGM-rm^9Lepg$4IQqD4x?RgG%%*3SLAU0L>#)#ACxzp4TMz}q zV%>gb2||r|y&Oj{lhtUUPR3Y}Ta3-VXkMA6VA`__WjKDA!uUw5*;jR;N#$(3O~C2F zK6R<&ym9&G%RXFJI_I=^X^QBiy$S(?EUs-G2T8^{+(=SNm(6)i2OU5l#<230!@0II ziM^6qkw+KbT@-hWsb+4Ir8`~b0dRP}Lgy)j8 
zibf=3ceNQ-?B4J)N2wi7>ou@(f90XlPMxcCYJ0kusF2XX3Le$VROj(4LMfVj&DWRv z5eK-HLK^DBolQ;uP_g$7pCLa>{@&jg$#g|X5b>@?~0cD(8E+*H{GW67oLDN{1B zxEh{+li8Kf9{R|L?W7r3K~hkPk;-L+Y+@aj&MP;~YMfw;$id)U7l#{Cvx39*^bQm6 zd}mkuDpbUHBka1>6(bS16;WHInm^etU!;;`U4 zatagUMRwj2EMx-iNbtG+xw@}Ys=SfzC+G8TRk0IXHyP{k-qAH_7kDgG&Q+-gYK!zr zAGA&srXoO~{k_6lo-Utb(A~UYmrzYV%VL)DEjbwTXX&63brcGcfn6^bg4XtVr2+=$ z3R6$A%TtRjEjaZjU$iq3-ftvrB_(ns>#m^}B(sF?-Wrf9D3hd?f}R?aFx@Jey%VA$ zv8afBEN&iLspIuxg8M5W5fPQ7q^F$l;}S<5eAIFMvROw7c{r%BRhbDhU&9K_{McST z_wVibL@=ey&xO8~n{!K<4R#LymwId>MF#N`{tVNcIUEq;DC?q0+EwAOX*;CW1-Avf zbns*GC;iE#f#AKCPbO?X>Uc1{1Ac^_h&{EZWB}qNO$1DWeT!0IOl``Wzq=`gCM}44 z=1O_dEkjvu#Xy7{z03z4%pSv4Ia>TC&BVO@U^AqqGVjfwneKejN*l<=xOHH*SC^E{<~JM*+AolMcelxk-a z5SMo|`jn%@cWuTxC7n%3@9ZY?3IP>T40Q;JshrY9p`KOkh+an5VF%S) zeQUNsV$@$2rH&@qFf`$$_fzR3?Yk!OT7EnFgYD2(iNI`k7qzXk%wQ(m&pVy(X=)+zq6;W{~X6Q_e zJ2@DSqTMz7FwbPX)Mh$S`%6dX39_<=9(Hb-O9~Q=BzAYQn8lM2y|oo-t&(=U7u7%% z-_GSkYtojpr;SLeXFG`UodnB~SeN3~HO$SieuZ-M-Q9JGXp@6Ux=wz+Z8?qAT(vYI zR%i;8hW}o1P$3X^y%C}#}RbOewU_4{nQ9rP(o=9)F z&u|OepjAZ1ML@$nX5ZRq8Ax%HrZ^!LYoAc)Bz+pOKUcnS^Mu(i2nrh)m?&PX0LYB< zTZ;?IoU`dJX7F?$zEYR;4CL1sd`0rmF#Yy6DDX6r>Zqvjrq%CB5nF8|NIe;$kv7P4 ze!FN?QhH!{zVc(?-cG33N+5eBzlCSgGk3p4*ooQwSTTrN;fQ^+$+Z;fwf)G+d6 z*p5hC-PZ(fh#<-->wXXHRp|)GFBR(IZ`=t3S3QtkH5QwG_tt(C)wTm zyEZj|^6kgR0G0h(DveP}^T7F>CaVojO_5M2)v?td$X@m$G9tHAj}r_s?o4oR9M{Y7 z$8S1NLTr52j#Dbz&7@b35vV++J~11z5e_+T4xwG?^d;%OD&e<(Q=?nbq%#>~JY`!d z>EL(WHer~ARw>8?>VEdUweGY7rX>HszaowIch#nT3{%YaURE4^Al_Yh8 zqb^-~UrKM3r+V&@@l>o{Gc_5DFwIbHqh3a^hsca7&#t(OBJ7uV?wj2dc`_&m(Z;dw z!S+qk^XI(?(mujF%_A`gICvq;@KCoi$~qZLMqWY|`Lt-F30;NFLWy z&M&Dt**)`ItLm|WGwinYlqry(k?7-8&Ik`MIqEAZk%z!UfY&$NbZlbp<1@E2+9L$R zeSB(vQh5?=ACYs{5raAp=d>mtUn9BQ{ zGK)V|V$f(I0A&zaq(@<^+(OEV_LCH>tghyY!z@zpigppxrB_d-x=LWWIRW~5`I6h2 zN09d`Vq330#-z6u$GUVQRix)#uDVpFgn`M#9(5~mtcPqmY-l5ux~_FVde_&5uh|f% z-+j%<>x6I^Y+z$gFmRIZ4;Nlt;PGOZ*B0E=#9#z zcmL*(aDPFzk!hF)oj+tFRu+Zg$fNhL9JS=B*V=TT!%fm#hw5PbPb2c5G}$u&$G>xGFYD^ji;Hre*$!f z|5LsBY?y|lWJX@6ro9c2`)S&Uoi#MW_5PIq{^YSWV!-EhJw=R$iSYQ=al8UEJDmUA z*Ci<$d0cLryuYIMBs{9$dfm;SOhDD2p2N*i$-k>=sH%#=ar0t-5wDNK+3M@tvPwQ{ zO`UmQ*g?q>&7-9J(~T#u&Zgk|@`rW)yLb}IgLhQPbi*hAgu_m&;$AcRZdV(4>e=H5n*Ow|x!g)}G z?u+w^scX^e6*fRNEg$*Gr@9%ue+%fWYD={=l$=R8hCAt_znOgjo28+_$85y;Y`?RT zhG&B&|-X54%s&hlC zfaka{?@+PmF1m&x*)I|+KdkkR@DFz_F+?bM)z6T`R(?tlqgH{P7GFiomdAkElY{+q z>UP z-%v3!5?~y9t+dRjTN~0XN_I2s{30{BA`nOB3m=Uy4n+2ktLF9y$jw_8Jv@?0%uJ$c z)}b6|iPJB~u21HP75E<4Jkpm(8;_~&XzaBkRShUm7R;ntyq!sP7xXVqENj{mw11N2 z+~*1t@D$`?iBWoM&zfU3DyFyf&3pa@tdvauWVxD>!U}wA$w&SQ;W7RfwqC&jfnr-z z^GU(8G;|!*ahbE?=eX2w;!hI!g)ZX6D@@}z{rIrMBJ~QZW7-od#okF{v-BJ|iDo?Q zQ#v3vGghf#Fx7Dr&KgKY(t65D;u??btK)s!6;+yn%(^ zW(ITClx^rkOfaK-8M??Wr4_;p?LaqLVB{E6wI{AH4c+wXgp5IzI)wam_4y0r<8w(B z`1CtwQ_`#B*3o) z4EwQj_4?-KS5lt2k^)tgRsBU@P@7L9=zt65nM55ot+-p(G9(oF_?YF zrH)s$v}FE@ON~8cR$`FC4B4F(5ibG{j)wauMS#%08RU|dcNx8&Z{m)Qj#gBU8qKvZ zu*96xweOkNH-m#~P<#wumzP7wBLxC!z|4YRLF=AEjj~vuTw}D~At86Ce|l#rrEk{b z8f!v%a<}T=k4ZCIOVU5rhO!bc45SckK%FDK-`AH0;Ft}&xvIohcU&VY6_wGAmV4QE zJ4W{j7-@BoJZ|C>IDAV zm~z*Ia#M$4u6w!zS{aL4eqbWccHHa6_=g3^RP-$j;bxpcRX`lIssm`f#ZUpyMv9ok zaT^E_5hcEv95PI(4**eZV9$M2~`Wsh?3|KK0k%8Rz@cuiR=P)L0+1^!U zF&NEy!R6mp1*dRl`)jry@ST&pE!SwjXcq6-QRoazo&#~}=%^3xc2%NoUBG3*$-EfkLJ99nHH$0|MmcotMlsHeNNgMjDZR_YYh{?7o<<3dSj$c z9{>bEp?9pE^eBXcgj;N`y-}z#ag5kocq3I8DyC6MFd-|C!A$-k{6D&JO5k(Sc^)b) zFM$*&^fq_}rb99&@Py*eM!}PgTEM;xZ3l*jcJNlSCX|jwe(?=cubVcEQ5~jye8q(5 zQOKcp_J)UgX;5EfbD^guEoQ}iEd%oEE^=o?N~}bU@eHd>mT&W!>!?Dj#lpPo0}pLk zByckB%ll{RDQTc83^f?S+*6q0%%#2{JbK-oPpohPZe7l0#*sSz6&^!o7j|&&Z-E0k 
ze>2jUAuz_l1woc7Ical!e%0CZ+UdAP=JHttfTs$BH@WeK?58F+^Nvl9K7Dc_?eg=% zC`1;(%JJ+1ufMro8)aGRQ(xtBp-&k)UxjK8KK2ewAVochg@Tk^KZCThB%yoVL0eCO z3!f;@MwbJ&C0>Y=7iVB|pPG;!l7nSO-GrQR7b2_FKI-XACW+xNKK>P9krRw4{M%ZvZR<;t#6< z-a%h!W6)`S^Hz6%fKdjr)_Uon>KCJ#5T$^Q(F!oisSp>^=kTNgOkt_()U%s?_!7uS z0~6>`-~8;Uc|h?cy!sjeF7tzMOAJqsOaIPCvH*Jx$HOfgpj&-MD8?_X7>ja`yRwTY zAiXLAP7nl~KtsK6n03Q@Qir6^OTt|IQoZR8*1j28!zt2hHx4g7e&&zygF?wUHGn@Z zxHc|3xsB<%P7+6Z^a2_1tFvW~&~k5AXoXFL%86uxL(%BJwE+5HJ_sXrhh%V1h&Bcf zT&^-a8s%s8{!*HA#Sf6{&rzC?k{z?z0kcxQh-6{<%EeqI-fs?1Bdo;>u>x~HHa9nu z0OV-KZ!+L9$t@}4YF~Z zctLYh!TW^0sL1Y;_1XqHWXp(MzlR8Us^T^YQl%-%KfI`htm zZYcZVL9SQ>4S2rGq4LcAUgv#W5jGr+yQo|~NE1X!-c?2vI<0c*9k+MW*mZNfojkhU zHMMu{?(6#+yqW=%D~$L`f%G^np-(yhd27eG>#wc6&qxT^v>>-|OZnyVH+tLwI6aVQJdv@jJ zxEAXGkAr2|Oyaw`h!O-^raW`O{lgCAJBa>(D{g*IA3d8#At zRnl0m6tMJKIQM?Vf#O=|54}`s(-TkW*cU=e<)U+*K|%VnSEt+C?3a0NU9Cs_D1VQl zWz7DVt|$a)7_sH5#qeZR=x(dimgcP1kAB!DF;Q2juhS}$m!`;3vyr4{PP9D)Yp0Ra z$SlwIMo&mZPs~|$y?ObI@TmPm3S^=Zf$`Yq^5TN>3pcI)PNf5TphYWN`P(+-_vu{m zxkfgHtRF&c-#5CngFvbv@$Nry-7$r?^M&SZopCJB&QJLSzXha-+J{{NSG9(zJc)Zp z$L(kacS7--F8?~SZa(W@qmYJ*-v?7$#K0?ByeRZV?FI%by*s*>5}^Ev{Y6^ z_Z#5_(bu@YD@^QVsadb00v*RwSUy+;>^HVj`|o^;ru<;aHX6YOGWL62N2#6hI#aUi zH#FT<&(bjUyXU!_o?{wEDy5)0&PtD{?$t;1qXXP7j*A$-=R@m{A=?Qf! z<#O$MBc0W=AOaaC!mP;#iNK%08ii^bkhy_ys($k@kamax4s9jSQzm^512GKZAmWsG znLkE*Pu60 z$B}jIn}Gm@`n}g^w9{Iywc5-uw0L%6j*l8A&C#mrwU)?|G_10~mT22?wUvh**0ZZ* zK%&OeKk;@u?IQoW1&6Fd&A^oy99mF5Fb42L_pRGUas++lN`uaG$DRId8m2lFa$XH_oyOH}e3`(u;;Z)fr@Ugk5 zD~F$7P=KZSQB(Y=H%?I#POP}}f#DqGCw3bs!LID8*H6p^vYFFXrzQ;@h`JJ~UV>PP zNaV!0*|b(x)g13p-BhT(`r(VZ*CyuX1wNuaf~oosY$ArN`_`K|Ltx`=qF*~LrIbr< z$@=@s)xS7@;+1Cpq|Pa_>UH9(%7X9C?r#U_urRNQNI8f+sm^hx0i6K;;-fk{JuvMe z+V3SXczbg>QH)+0!?U>CC#{= z99I<>>OQ%mYa9^kE8EjBQmOisH2JSS@R70nBtlrOcPNkg$0CI~(H#Kj<2;vt49{Rv zeywb0FHT1()cb04=YF044$ZCO81sYQR?gMw$^EOspm%GB_jk_SQJ2zJ${k0M{M5G= zmzS5Fch#u?{ZQMNr#{(~p37)!WeT-^aRXI+;dw>d<6+HWgZ)~w4d3Xm-~vkw!X}{! z%WV#v$-OJbsb_db_mxz=`-HnYbx#T~KKHBhxnr4cg^GtG>>hW2{i>>}@=5d*6%}96 z-&e0(SAR{n$ulU%aaEPVaXk&k0VOVY(KiYDW8^P;#k}y;pl*bDg?_6$Dwt>}DT&sg z8~QWW{deq2Wk{&h=T?v+4^){b{Dr%wBW&fSb7idacJnZV;r?_j`gnL+)7D$WGCcWm zIe@69<3zogFuf?W$sUR&i%iDj0^A1$~^WL8h; zlPih`LOEp%c8vodMF2x4xID9(Cw_zau7%_{Z@47qrWrcVriX)Yld}UEZ^77}~KM5;;5(c)XO!CqN zJ1ZCMadrhYkXyQUeve^oy(XU=Qw=0+q-Odab`BISn8N&K8X33yZA z8_*`k__4hF!5M|&(dt6o^P#GC)ZND=Zdna37ptpQU9P!5D%wZk$_pj{U%#&n`bZ1K z7Qa|-=xsk9(F_(ky(;!#R*p$$RkbU*s`dJmF(@?Yb-WI)g2TT#ceIj@OevFb_`z$7 z)88$>&iwlH#Wg{9u$p=q0hmsmEzJPJylT{Qc@v~oHoY))>23YhHofVm^q6>dlijQx zVpP~MLufH7@LTDEpT-ETHFMx!P_uWFCd$0Q}`Bv);p4}}S z24AXG7O(`WQ=y&L=!`KpAP<~5PD}Ia`LNI{81bY7W9juN6}-jX@_L?>#(u795AD#d z6wRXezAFcyr6<0Nch69*04^NzNB7Yb5d46O9U!~-TvsP_-`$n)Q3i_+uc*JjA934Q zyY5Zu|7l2qyAw}TKGTGG++Hnv)jZZ=CQ@^`NT+YO*#XY>z^!CE`k)#6V8b_bRFm3v zQ#nrTJxAm)rs_N#e0->YqPTNp8~4isHvcU7OyQ!t63gq7s|dH}srpn-A&nKRPtJx$ zvsdezYQl&ni{$#s$APOPRhrxVqIk^o?SMR@LgN?IB|sQ|CzB&p@6%z_o}|}){k0Z; ztxL&e!+zS)GxKs?#gIl2+n=hrY~kJshx#kzORAwYv%YdwN0yHtOyt%5t0D5$ZRD(I zZNMAh#l@_OjYI(m0;*xQ^-Fn#ZchBZ#{+S?t4VaH2m6iN#kJKw5rQk5vb zui^Gh=89jiD3HL(awf4ah<9W+r!V2>7LV%xWO#)XisA6|?h#Zci*nAQP8tHDQZ++a z1YfL0TBKnXxctPx-%^1hrLnCI0+fDO-p*LM-Wrv>dIc0y(TMf#)jVU4KSLtmI1v)b zIQ4UTjn}6YVaiZ>Pb`P0PBR!?#q?@6WA3hag$BhrAFB!Mr<46gN}O9pNBc_UGo69! zO3gP(O+Ty)MQpK9UK&nB~+~bCZ zhzjMlUs5D(b{z)0NVlcNyeE0LGZk>2ugsrA>!F;9wq`BvsGSx>w6wGd3U%@NUGE-? 
zNX}t;Xhpz*Y;{D4oomcm+> zsqc>!>a!9gHWyr>P-ns$@73I1!>;>yUHl7xj%jz=<(DdJkw+#uF(xCV2Ylq>h~3;{ zlO5;;;6*EK@A}z|lbWuproMM7G6Wv={%w^|JFKQq@J4`6BH;?Lp&K|tiMHG95WxFD zk6vzy0PGXqVoQ_UXTmmz4oZ1OFV1QfFF~N<0fO-{AVu!kin>E^>VWQEL<%;v*UG(A zRC96Hq*mdHrF;sGS1*gcU3CAaneq(YcRoJPb$@G>#Q`WM4nR42bl!qNKLo5T0MxhNt4Uik=<|%Am=G5gGna2aG0ZO>C%r%|=f$ubj6c)2@2y6sJQ8?t3Au(drU@ndVqcBedLg+#$VRg#>iKQzrGFLpZQ^aVFDv~+O9;5K}{G9 zcw#{UJrO24P-*!MS00utMUBs;n3SJH?cFL~1mTvlGYf_8WWyD4LZ|hh;}njLVagwz zK~6I>eIK|;l@C2JD8z5++uC$U%B3wOfFqmbYnQ3>N>u=r67N6qiSG!x3RrCKVt0LY zepDP+3HPl=DMdtx>WzrX8Xu;l^#NI~I5oaRP?*m{sxo&g99FE`Zqlq+GQKn$$C&ao zmU`xKqi3hp>p4Rh4!Pn`dMy33O+g)JlB>4SFXT?DN$_i!9tT8Y*9I?!r%h4=s51SE z^*6ZNa^cmZ;^tP8nhf7XyTuNR%2M2@3!r#T9C_l)^e%kzHDFBob&*2L%lhhaK*glkdD%Zf?Mj=dCsPJ zrZ6`em5p|et6EYy7OVt@<4raNYsYN6e5<|Mep>RzR_$mP^~I<6)M_Ubh;?{ za{b;y!WM;tK84y({%q-2F*{|PUD^qux|`rt<5^VN4<}qL08TGvO^)CfQZCriE(Iut zTxLyc)g(I(NNS2~f%Rei&IaW(`n;JK3}dsVgKz;AmByLLErn%W37qo949B<36IX%i zn6jZjXQ^gBrgugly%9W9ra`7XvaCb z9c5Lvk?0vcgo?Xs_3lfGB34O+h9 z2dxl*CyjxM4;RrQ4wcYJs0&a?#AG+AeJa1T?y60V z7usN-YCOp~4eKgh75eLCS3=R9XXm?JNSH9UE0CK>s7h13VZtQkbS zpPH`HP%FJY>)um3hw(~Lw@KlFwMFr7UIC9WbHgng%=v%o*BUbto!85`{0y;?pepy8 z2B7dcM~xV)U8H(gS$eHk%VHUGojsz*P$WOEMQ+bLN6p^kwJyHZnidwXsP>q(Kix#a zb+5)iK@8=Qj-z=6SdUGjByaYY#_YMC<+K(#Z#2Q%dXjo+!}iktw;Z&TYU8D@DMe5o z3WX|gI#Db7Kc;Q! zUrrXh{8}m1sN(s|D|84nP)RtU7^oNi`;&bHkRU)7{gB2ldyJ1souHrwD*?D!%<$fh zmoKxvl*n7NJk$YhAZIEy)T;Pw0A$Q%HyXov?&ghC%n?LoaIQ_h$&CN<>?VkC>e*^` zL%w*T@hl_y@wB^(3OhC$P`J&E-|uIf!S^{?B@G83MRotxs;z30y@HoFZQ3NKa=+*5 zNK{wO-_I0E(nbeN3~I2=BGvI%bD>~ad0iLLE}+_JPXv*IOxxD`L=K(|q=*@VNiTZw z>X|=MkUg0^yfoes!~%ZlX#vT2X}3vU`A^c~{m(|W55>&u;%|7aQ)Vp`2GU*9(vj%D70VFh zqQ-LfUkX>H2>4sW^9$#@PFwP!lykzr( z>u@X3URQeaWo$XC5*d~=6Cf!11k5-B=UFRCuYdU|7qw<1-{y37N?o(70@C#h+^?C< ziAh=bBQk!bQa8V{GGrxS<I4Ct13;`+yYZ(f zbgx*JV``p|Vf6lVL(I_TaRJ^DFeg`h-lrtz7f+h;666Y;5jN(&#OhFQe6ZK7hN{W< zEQ1{;eJ;o_hycj5-rn~aLCvUw$E?!meq}YO0K)I8J2_>ghJsHl~2u>ARzqc zew~`zouZ`AO-+I2^{(RB8}ov;+ntOH;0wennc?wK+f{*b-x0&KQ9ognGn zjMf>CPL+4g5hO+O4#^|=AF7bwKfl3jLgP&0D`lxRE$ze~L zvesxrp~F0BBM~HoFVjUVy3c1#|HKyliKR&#)*R65QhS~IhM&1o=)J9MMkexufP5Ej~tz+;lhOl;|?0}dCLiOJhQJeIUrk1)(|KXXgb*SGVujrC1OPr$^0eO zlsv{TFfecum$(O5FGTT6?eutO0{bA7nJlm0YOtK%#;yA0czTyzYkCr}a8(` zh%q`&5mp;U^$njsRqg5(@|8nMuYq=oK7o8HL6%LeyopgK3sYaFcxqN5B5;b|FTRYG zvS*{zH8qyqiDiYsZKcd7w1U36`<<_Xgc@_lb$dArdL?g^4)UW&w3z} ze(MuQ1yEvqO=`UhfF|mCiq|UfLoEbf4eA=kC4CDv#q>lkT(*LIuWkSCn zAhN&V*#+P3Sq=d#k(Wc>%*;&dXY|mw+uznk*b4_Xl4KH}`y8y!zbYIA$RppbLg2bB zN#}nS7yBXAK7OYQjvN~UQhYlhM>D!s57Y8qn}l;lXw3PCucTlSD98F`Za5YZ{ptf% zHY9x^__NI$Pt0w3K=T%%X)T&JewglU_ag9YQqg1MOTyEMN~qF>$L+PZS$FUgl%sHl zzgBlY%VcgHbr}Xt7gnEo5*WXerb+mIc&Vzst{=a{=6}8$(#}|p8IB47^o`q@dFQ4iv<*d^Q@>6Bm z37O+9Rg?b(SydZ|(S}|pyR!gFeBA;k@lGN89(Fr`xeP`wJNd;blhk)v`B)34hQhcDGxnLq5Pv%K_$4dpYRtEJg~ z7g19I3qukx%FzIsCQl5ssHo`MY+P!t-~bPsjuYGl@Y%zg)m4<9T>0@!iUK<-Wfl^7lv5Ep_OI?9I>CwYk-9YLnI;76*h&3?@1F*{VBS56*l_jZdlbY=CFV66lx$ zCYktgGs_nb1c+u8D`b^42P9XvB1-q?9A2b$CvXu+o(wa$_#D#D&dmXM>iGLhBI}?a zDUX|g%Y(I@9qc){d7dFdS&T_y+t+-=(8Yx)AkSLhH8Yu>Th+Bs2`b-H+nXD176jb= zy5u6EKhG!CjAi@@$ZhBao3>&ds%mO%?d=~;NS=P}NV3R{@jqHcJy_{Z+MBD{Sndq) zJslT2o>XL5>%3fk>i@Tku@b?Nhy)Kzz80?FNpvDnAZ2ay+9fTL7pd>=PGZF7fY4Y; z(w_#tK`DKb`;ulf_ZMJ5M&{;&)0#2~fKFXpU9DYRU15Hbi5smksl$3R3;OV$bd#sR zWos)=XF#b~S$K^o3zmUE{oq$y<9VlAjtfs@X*}pQOjoWcfq?A3IZCm7JQE)SqPan` z!S~dW>1)8STT3^?Y`Svd|1v9eE>c%4Y@)Q31F(VIudrS0RglgbcM!y zHCbl{bAMD#)Nb>5rE(*o<>B+wcGTe&&xPKu`Y$F8gd+IuaB0786f__%^oBysVa6ay9LIyEFgwYOAz$FuQCSGmT~=;Kxdjo%^Xhs4RBCG zl-9`aKrlU8aIOIm5V~9EIi>X7ani;^Y`wU6xBL15s=O9P4qUp@&it|P>Z;Nk0?Y)6 
z`lnhhsi9N%nQ_{;GLQ76fi^4JJpr{g^Vi=gfi}A;?^#A`YBUZ@mZW^ojV}aM)zv^~ zUIUEe;f)xu?r@75F*hnQFi8N9XbeJ&2b8GNhymm`JLuUaea{_FH1^(;g0-DqEjI{T z%wdp&wNF2@e}*bul09;jx33I+BoX7?&V#I5^Q^jo1!62t9|;zLvl z{yP*xH@!;^WGB7=JK2#V_$&_j47cAM_HLv=LJ&YXuF8M>S zu$dU1DaViZLDRWqu=It1!_H{8;M2T61MJf8X~o{G0v0E8`cy`DPuiT-Eu;}3hItb# zPjNX`U5ZDdh*acBu&TD!aiv$YTFUf!1rZ0 z8lONp;ZNloCT@*@#tDS;9zA_e5{ zy5{B~!4*=z5=2|n&J%q)Uio+y9U$y=>ej&1*l{U_jsx9E-(xWPfIVl!D-_7V0C*<{ zQ?6Y(5gE`n54npE_)0pgDdIgOw%BO#3Pr8rxrzYRs*8((0)4E%uJ5uQkIZ!Moyay4 z078-5Xf*E=kS7Pn|1bt%v5ioU0EaaW4!(I;;A3Z}T7^5tcggx8eqfShy7ycrTJ_H7 zAM29Qe8|BUK(}@O+V=}`>V(nil*FrTh zCbLFA#2C``Jez~)%oQ|mQnym}f<^O=eLl+r%pyLf1zg7&59n>)%TTD;$mQUY%MkuR zonLZY?#e(;F!zRdNU#%32!J3DU*DajYy?}jPCZd%%A-I^F!-n!i?tiQ7HEMrqu4}o z;n>6MUn4G>#0IGB>JQ9jk$vUXxyYmqvZg>S&sSgH4-sT{)e_-JwaF4=yzM60hHQ;2 z$!F?C1Hmsdtn2unYf~kvYB1x_zH(HfmN8!WqjEVIgh*vM>HOm;(5LR$@eUgp^`2AY zxfbjg?F}VqW#z4H?oB(ra;QCR;CLGf*7}m##&eHGP#M&JF0}W z)1WNHOmpLk&H-p-w_pfaB-Z{03O``vZ0xoA9-GIy2uzA`L)aQ!-1#Bo&Tlaq?W4b} z`mro5>1HR~hMV{jJ1kV1Ia*8CDPI>E-B*tJk4V_U`Auz2?SGZm0Q=}bDs0gX< zM%NuR9_N2%d<)$d9D7{&=)pUnveocWrAZ9ISznzJLm0c1w!dt8cq{#i5VO^~p9^+% zxP07^b-@={&je?g0vCk|0TpyUo}p=SymQfs6Af|O`Mv(uU#|)HTPF{kfDxX5)jZ`N zaIr10-2T(cdE?`a+8@$8&sW&Et-oIU&is^6K}Gk@tGtgKBL7VY?NJ zn7sb(=1e(hS17*h${FXxDa;>N1B2aPqHU*|R?>XEnN_u49Z!DTbYbyrV5NBWtS_)- gdQU{^!GC78`Fyt)ael7@o+rcL>FVdQ&MBb@00wr2ZU6uP literal 0 HcmV?d00001 diff --git a/images/contributions/chapter_8/cartpole_benchmark_mujoco.tikz b/images/contributions/chapter_8/cartpole_benchmark_mujoco.tikz new file mode 100644 index 0000000..2e65bf0 --- /dev/null +++ b/images/contributions/chapter_8/cartpole_benchmark_mujoco.tikz @@ -0,0 +1,955 @@ +% This file was created with tikzplotlib v0.10.1. +\begin{tikzpicture} + +\definecolor{darkgray176}{RGB}{176,176,176} +\definecolor{indianred2048062}{RGB}{204,80,62} +\definecolor{seagreen1513384}{RGB}{15,133,84} +\definecolor{teal29105150}{RGB}{29,105,150} + +\begin{groupplot}[group style={group size=1 by 3}] +\nextgroupplot[ +width=0.98*\textwidth, +height=0.8*\axisdefaultheight, +scaled x ticks=manual:{}{\pgfmathparse{#1}}, +tick align=outside, +tick pos=left, +title={\textbf{Cartpole swing-up trajectories with out-of-distribution parameters and simulator}}, +x grid style={darkgray176!30}, +xmajorgrids, +xmin=-0.5, xmax=10.5, +xtick style={color=black}, +xticklabels={}, +y grid style={darkgray176!30}, +ylabel={Pole angle $\theta$ [rad]}, +ymajorgrids, +ymin=-0.428858034986458, ymax=6.61225360864816, +ytick style={color=black}, +legend cell align={left}, +legend style={ + fill opacity=0.8, + draw opacity=1, + text opacity=1, + anchor=north east, + draw=white!80!black, +}, +] +\addlegendentry{nominal} +\addlegendentry{mass} +\addlegendentry{mass+damping} +\addplot [teal29105150] +table {% +0 3.14159274101257 +0.0499999523162842 3.11316442489624 +0.100000023841858 3.029217004776 +0.149999976158142 2.89824342727661 +0.200000047683716 2.73735928535461 +0.25 2.5680468082428 +0.299999952316284 2.41170239448547 +0.350000023841858 2.28390908241272 +0.399999976158142 2.19294714927673 +0.450000047683716 2.14294624328613 +0.5 2.13666033744812 +0.549999952316284 2.17623066902161 +0.600000023841858 2.26421642303467 +0.649999976158142 2.40401554107666 +0.700000047683716 2.59872937202454 +0.75 2.84934735298157 +0.799999952316284 3.15100049972534 +0.850000023841858 3.50543355941772 +0.899999976158142 3.88264894485474 +0.950000047683716 4.23039865493774 +1 4.53415298461914 +1.04999995231628 4.79751300811768 +1.10000002384186 5.02463150024414 +1.14999997615814 5.21915435791016 
+1.20000004768372 5.38505220413208 +1.25 5.52796697616577 +1.29999995231628 5.65329122543335 +1.35000002384186 5.76173830032349 +1.39999997615814 5.85205841064453 +1.45000004768372 5.92473888397217 +1.5 5.9816780090332 +1.54999995231628 6.02631378173828 +1.60000002384186 6.06209897994995 +1.64999997615814 6.09090852737427 +1.70000004768372 6.11374855041504 +1.75 6.13175058364868 +1.79999995231628 6.1459379196167 +1.89999997615814 6.16585254669189 +2 6.17880010604858 +2.20000004768372 6.19638776779175 +2.45000004768372 6.21899080276489 +3 6.2739782333374 +3.20000004768372 6.28569269180298 +3.40000009536743 6.29125213623047 +3.65000009536743 6.29172992706299 +4.94999980926514 6.28203392028809 +7.34999990463257 6.28315734863281 +10 6.28318500518799 +}; +\addplot [seagreen1513384] +table {% +0 3.14159274101257 +0.0499999523162842 3.12737798690796 +0.100000023841858 3.08535408973694 +0.149999976158142 3.01776647567749 +0.200000047683716 2.92952942848206 +0.350000023841858 2.62188720703125 +0.399999976158142 2.53420686721802 +0.450000047683716 2.46666502952576 +0.5 2.42466759681702 +0.549999952316284 2.41208434104919 +0.600000023841858 2.43159580230713 +0.649999976158142 2.48502612113953 +0.700000047683716 2.57373929023743 +0.75 2.69829940795898 +0.799999952316284 2.85773730278015 +0.850000023841858 3.04890275001526 +0.899999976158142 3.26606345176697 +0.950000047683716 3.50950789451599 +1.04999995231628 4.0357985496521 +1.10000002384186 4.2825493812561 +1.14999997615814 4.50202655792236 +1.20000004768372 4.68788623809814 +1.25 4.83674240112305 +1.29999995231628 4.94769191741943 +1.35000002384186 5.02182006835938 +1.39999997615814 5.06094121932983 +1.45000004768372 5.06656932830811 +1.5 5.03937196731567 +1.54999995231628 4.97903060913086 +1.60000002384186 4.8844165802002 +1.64999997615814 4.7540111541748 +1.70000004768372 4.5864405632019 +1.75 4.38077020645142 +1.79999995231628 4.13698577880859 +1.85000002384186 3.85720920562744 +1.89999997615814 3.54757308959961 +2 2.88925981521606 +2.15000009536743 1.85401403903961 +2.20000004768372 1.55851316452026 +2.25 1.30029225349426 +2.29999995231628 1.07738780975342 +2.34999990463257 0.886365175247192 +2.40000009536743 0.72206974029541 +2.45000004768372 0.578912734985352 +2.5 0.454472303390503 +2.54999995231628 0.350138902664185 +2.59999990463257 0.266948223114014 +2.65000009536743 0.203443765640259 +2.70000004768372 0.156808614730835 +2.75 0.123730421066284 +2.79999995231628 0.101237535476685 +2.84999990463257 0.0870523452758789 +2.90000009536743 0.0792902708053589 +2.95000004768372 0.0762373208999634 +3.04999995231628 0.0787324905395508 +3.25 0.0909607410430908 +3.34999990463257 0.0901103019714355 +3.45000004768372 0.081243634223938 +3.54999995231628 0.0655739307403564 +3.84999990463257 0.0103480815887451 +3.95000004768372 -0.00236403942108154 +4.05000019073486 -0.0107022523880005 +4.15000009536743 -0.0149328708648682 +4.30000019073486 -0.0151280164718628 +4.5 -0.00835633277893066 +4.84999990463257 0.00531435012817383 +5.05000019073486 0.00726318359375 +5.34999990463257 0.00327968597412109 +5.75 -0.00186848640441895 +6.15000009536743 -0.00116312503814697 +6.84999990463257 0.000665068626403809 +8.85000038146973 1.00135803222656e-05 +10 8.82148742675781e-06 +}; +\addplot [indianred2048062] +table {% +0 3.14159274101257 +0.0499999523162842 3.127436876297 +0.100000023841858 3.08582830429077 +0.149999976158142 3.01936626434326 +0.200000047683716 2.93325781822205 +0.350000023841858 2.63873529434204 +0.399999976158142 2.55710244178772 +0.450000047683716 
2.49586009979248 +0.5 2.46001195907593 +0.549999952316284 2.45308256149292 +0.600000023841858 2.47734642028809 +0.649999976158142 2.53414821624756 +0.700000047683716 2.62429928779602 +0.75 2.74772834777832 +0.799999952316284 2.90267753601074 +0.850000023841858 3.08518075942993 +0.899999976158142 3.28895664215088 +0.950000047683716 3.51513457298279 +1.04999995231628 4.00232458114624 +1.10000002384186 4.22979307174683 +1.14999997615814 4.43061113357544 +1.20000004768372 4.59798669815063 +1.25 4.72808170318604 +1.29999995231628 4.81948947906494 +1.35000002384186 4.87262201309204 +1.39999997615814 4.88866710662842 +1.45000004768372 4.86869955062866 +1.5 4.81318521499634 +1.54999995231628 4.72196531295776 +1.60000002384186 4.59456825256348 +1.64999997615814 4.43059015274048 +1.70000004768372 4.23008728027344 +1.75 3.99427461624146 +1.79999995231628 3.72676396369934 +1.89999997615814 3.13162565231323 +1.95000004768372 2.79632949829102 +2.04999995231628 2.07625985145569 +2.09999990463257 1.76077783107758 +2.15000009536743 1.48621475696564 +2.20000004768372 1.25158298015594 +2.25 1.05484640598297 +2.29999995231628 0.892646074295044 +2.34999990463257 0.759428739547729 +2.40000009536743 0.648389101028442 +2.45000004768372 0.55292022228241 +2.5 0.467167019844055 +2.59999990463257 0.309610962867737 +2.65000009536743 0.236144542694092 +2.70000004768372 0.168265700340271 +2.75 0.107647061347961 +2.79999995231628 0.0551607608795166 +2.84999990463257 0.0108542442321777 +2.90000009536743 -0.0256514549255371 +2.95000004768372 -0.0548015832901001 +3 -0.0771408081054688 +3.04999995231628 -0.0932321548461914 +3.09999990463257 -0.10357141494751 +3.15000009536743 -0.10861074924469 +3.20000004768372 -0.108807563781738 +3.25 -0.104635000228882 +3.29999995231628 -0.0965436697006226 +3.34999990463257 -0.0849307775497437 +3.45000004768372 -0.0529919862747192 +3.75 0.0587311983108521 +3.84999990463257 0.08613121509552 +3.90000009536743 0.095963716506958 +3.95000004768372 0.102633357048035 +4 0.105760931968689 +4.05000019073486 0.105126738548279 +4.09999990463257 0.100800395011902 +4.15000009536743 0.0932033061981201 +4.25 0.0710544586181641 +4.5 0.00765824317932129 +4.59999990463257 -0.0109573602676392 +4.69999980926514 -0.0233756303787231 +4.80000019073486 -0.0296239852905273 +4.90000009536743 -0.0304961204528809 +5 -0.0270965099334717 +5.15000009536743 -0.0166442394256592 +5.44999980926514 0.00780320167541504 +5.59999990463257 0.0145895481109619 +5.75 0.0158199071884155 +5.94999980926514 0.0106453895568848 +6.40000009536743 -0.00470054149627686 +6.59999990463257 -0.00606143474578857 +6.90000009536743 -0.00262999534606934 +7.34999990463257 0.00261390209197998 +7.75 0.00113165378570557 +8.39999961853027 -0.000965595245361328 +10 -0.000184416770935059 +}; + +\nextgroupplot[ +width=0.98*\textwidth, +height=0.8*\axisdefaultheight, +scaled x ticks=manual:{}{\pgfmathparse{#1}}, +tick align=outside, +tick pos=left, +x grid style={darkgray176!30}, +xmajorgrids, +xmin=-0.5, xmax=10.5, +xtick style={color=black}, +xticklabels={}, +y grid style={darkgray176!30}, +ylabel={Cart position $d$ [m]}, +ymajorgrids, +ymin=-0.533162417307961, ymax=1.32599500023271, +ytick style={color=black}, +legend cell align={left}, +legend style={ + fill opacity=0.8, + draw opacity=1, + text opacity=1, + anchor=north east, + draw=white!80!black, +}, +] +\addlegendentry{nominal} +\addlegendentry{mass} +\addlegendentry{mass+damping} +\addplot [teal29105150] +table {% +0 0 +0.0499999523162842 0.019012451171875 +0.100000023841858 0.0758955478668213 
+0.149999976158142 0.167281746864319 +0.200000047683716 0.285481095314026 +0.25 0.42038881778717 +0.299999952316284 0.560904979705811 +0.350000023841858 0.697999477386475 +0.399999976158142 0.826302289962769 +0.450000047683716 0.94251012802124 +0.5 1.04367899894714 +0.549999952316284 1.12720656394958 +0.600000023841858 1.19019103050232 +0.649999976158142 1.22924363613129 +0.700000047683716 1.24148786067963 +0.75 1.22509467601776 +0.799999952316284 1.18036949634552 +0.850000023841858 1.09977996349335 +0.899999976158142 0.994075059890747 +0.950000047683716 0.893744945526123 +1 0.80928373336792 +1.04999995231628 0.737652540206909 +1.10000002384186 0.675179719924927 +1.14999997615814 0.619478702545166 +1.20000004768372 0.569486379623413 +1.25 0.52622652053833 +1.29999995231628 0.490436434745789 +1.45000004768372 0.395806670188904 +1.5 0.361668467521667 +1.60000002384186 0.292282462120056 +1.64999997615814 0.258697271347046 +1.70000004768372 0.225936889648438 +1.75 0.194191098213196 +1.79999995231628 0.163695812225342 +1.85000002384186 0.134603142738342 +1.89999997615814 0.107106566429138 +1.95000004768372 0.0814223289489746 +2 0.0576845407485962 +2.04999995231628 0.0359920263290405 +2.09999990463257 0.0164226293563843 +2.15000009536743 -0.000980019569396973 +2.20000004768372 -0.0161834955215454 +2.25 -0.029173731803894 +2.29999995231628 -0.0399761199951172 +2.34999990463257 -0.0486328601837158 +2.40000009536743 -0.0552521944046021 +2.45000004768372 -0.0599976778030396 +2.5 -0.063002347946167 +2.54999995231628 -0.0644195079803467 +2.59999990463257 -0.0644369125366211 +2.65000009536743 -0.0632408857345581 +2.70000004768372 -0.0610123872756958 +2.75 -0.0579200983047485 +2.84999990463257 -0.0498518943786621 +3 -0.0354187488555908 +3.15000009536743 -0.0210957527160645 +3.25 -0.0127717256546021 +3.34999990463257 -0.00587987899780273 +3.45000004768372 -0.000560283660888672 +3.54999995231628 0.00317895412445068 +3.65000009536743 0.00546109676361084 +3.75 0.00652170181274414 +3.90000009536743 0.00642657279968262 +4.09999990463257 0.00445318222045898 +4.59999990463257 -0.00164914131164551 +4.84999990463257 -0.00309062004089355 +5.15000009536743 -0.00336909294128418 +7 -0.00200200080871582 +9.94999980926514 -0.00203800201416016 +10 -0.00203800201416016 +}; +\addplot [seagreen1513384] +table {% +0 0 +0.0499999523162842 0.00950634479522705 +0.100000023841858 0.0379698276519775 +0.149999976158142 0.0849701166152954 +0.200000047683716 0.149004936218262 +0.25 0.227373123168945 +0.299999952316284 0.316799402236938 +0.350000023841858 0.413825392723083 +0.5 0.717469215393066 +0.549999952316284 0.8134605884552 +0.600000023841858 0.903050541877747 +0.649999976158142 0.984267234802246 +0.700000047683716 1.05500972270966 +0.75 1.11329555511475 +0.799999952316284 1.15762758255005 +0.850000023841858 1.18711233139038 +0.899999976158142 1.20140409469604 +0.950000047683716 1.19471395015717 +1 1.16426491737366 +1.04999995231628 1.11574673652649 +1.10000002384186 1.05390322208405 +1.14999997615814 0.981622338294983 +1.20000004768372 0.900310039520264 +1.25 0.811433076858521 +1.39999997615814 0.529704809188843 +1.45000004768372 0.43991494178772 +1.5 0.354740381240845 +1.54999995231628 0.274582147598267 +1.60000002384186 0.199295878410339 +1.64999997615814 0.128455877304077 +1.70000004768372 0.0621521472930908 +1.75 0.00101149082183838 +1.79999995231628 -0.0543900728225708 +1.85000002384186 -0.103915572166443 +1.89999997615814 -0.148486256599426 +2 -0.23242199420929 +2.04999995231628 -0.256365895271301 +2.09999990463257 
-0.25565767288208 +2.15000009536743 -0.249871969223022 +2.20000004768372 -0.246229887008667 +2.25 -0.245240926742554 +2.29999995231628 -0.246075868606567 +2.34999990463257 -0.248424768447876 +2.40000009536743 -0.253158330917358 +2.45000004768372 -0.261140108108521 +2.5 -0.27062714099884 +2.54999995231628 -0.277476072311401 +2.59999990463257 -0.278488159179688 +2.65000009536743 -0.272762656211853 +2.70000004768372 -0.260740041732788 +2.75 -0.243544340133667 +2.79999995231628 -0.222366809844971 +2.84999990463257 -0.198178648948669 +2.90000009536743 -0.171886801719666 +3.04999995231628 -0.089552640914917 +3.09999990463257 -0.0636260509490967 +3.15000009536743 -0.0395869016647339 +3.20000004768372 -0.0180047750473022 +3.25 0.000634312629699707 +3.29999995231628 0.0159546136856079 +3.34999990463257 0.0277074575424194 +3.40000009536743 0.0358268022537231 +3.45000004768372 0.0404481887817383 +3.5 0.0418832302093506 +3.54999995231628 0.0405533313751221 +3.59999990463257 0.0369718074798584 +3.65000009536743 0.0317045450210571 +3.75 0.0182329416275024 +3.90000009536743 -0.00298762321472168 +3.95000004768372 -0.009116530418396 +4 -0.014452338218689 +4.05000019073486 -0.0188910961151123 +4.09999990463257 -0.0223748683929443 +4.15000009536743 -0.0248913764953613 +4.19999980926514 -0.0264656543731689 +4.25 -0.027147650718689 +4.30000019073486 -0.0270053148269653 +4.40000009536743 -0.0246149301528931 +4.5 -0.0201339721679688 +4.84999990463257 -0.00107824802398682 +4.94999980926514 0.00228273868560791 +5.05000019073486 0.00410079956054688 +5.15000009536743 0.00448489189147949 +5.30000019073486 0.00308430194854736 +5.80000019073486 -0.00424456596374512 +5.94999980926514 -0.00468134880065918 +6.15000009536743 -0.00400805473327637 +6.69999980926514 -0.00128722190856934 +7.05000019073486 -0.00153505802154541 +7.69999980926514 -0.00231456756591797 +10 -0.0020296573638916 +}; +\addplot [indianred2048062] +table {% +0 0 +0.0499999523162842 0.00950253009796143 +0.100000023841858 0.0379323959350586 +0.149999976158142 0.0848188400268555 +0.200000047683716 0.148600697517395 +0.25 0.226526618003845 +0.299999952316284 0.315297722816467 +0.350000023841858 0.411461234092712 +0.5 0.711578369140625 +0.549999952316284 0.806179285049438 +0.600000023841858 0.894288063049316 +0.649999976158142 0.973990917205811 +0.700000047683716 1.04327929019928 +0.75 1.10028576850891 +0.799999952316284 1.14371228218079 +0.850000023841858 1.17291986942291 +0.899999976158142 1.18778693675995 +0.950000047683716 1.18203568458557 +1 1.15188884735107 +1.04999995231628 1.10227918624878 +1.10000002384186 1.03738701343536 +1.14999997615814 0.959969401359558 +1.20000004768372 0.871989369392395 +1.25 0.775573492050171 +1.29999995231628 0.673943638801575 +1.39999997615814 0.468926548957825 +1.45000004768372 0.370574712753296 +1.5 0.276982545852661 +1.54999995231628 0.188712120056152 +1.60000002384186 0.105926632881165 +1.64999997615814 0.0288997888565063 +1.70000004768372 -0.0417933464050293 +1.75 -0.10572075843811 +1.79999995231628 -0.163009643554688 +1.85000002384186 -0.214813351631165 +1.89999997615814 -0.263411164283752 +1.95000004768372 -0.287912011146545 +2 -0.278084754943848 +2.04999995231628 -0.2571120262146 +2.09999990463257 -0.238086342811584 +2.15000009536743 -0.22255527973175 +2.29999995231628 -0.182314157485962 +2.34999990463257 -0.170916438102722 +2.40000009536743 -0.164104580879211 +2.45000004768372 -0.164179086685181 +2.5 -0.172871947288513 +2.54999995231628 -0.1908860206604 +2.59999990463257 -0.217397332191467 +2.65000009536743 
-0.249939441680908 +2.75 -0.320370316505432 +2.79999995231628 -0.353200793266296 +2.84999990463257 -0.382329702377319 +2.90000009536743 -0.406847357749939 +2.95000004768372 -0.426079154014587 +3 -0.439598917961121 +3.04999995231628 -0.447165966033936 +3.09999990463257 -0.448655247688293 +3.15000009536743 -0.44406259059906 +3.20000004768372 -0.433526158332825 +3.25 -0.417322516441345 +3.29999995231628 -0.395828127861023 +3.34999990463257 -0.369486927986145 +3.40000009536743 -0.338848829269409 +3.45000004768372 -0.304659962654114 +3.5 -0.267884969711304 +3.70000004768372 -0.115323305130005 +3.75 -0.0799684524536133 +3.79999995231628 -0.0471334457397461 +3.84999990463257 -0.0175420045852661 +3.90000009536743 0.0081484317779541 +3.95000004768372 0.0293830633163452 +4 0.0457135438919067 +4.05000019073486 0.0568856000900269 +4.09999990463257 0.0629335641860962 +4.15000009536743 0.0642228126525879 +4.19999980926514 0.0613921880722046 +4.25 0.0551869869232178 +4.30000019073486 0.0464013814926147 +4.34999990463257 0.0358304977416992 +4.44999980926514 0.0121642351150513 +4.55000019073486 -0.01100754737854 +4.59999990463257 -0.0213189125061035 +4.65000009536743 -0.0303747653961182 +4.69999980926514 -0.037971019744873 +4.75 -0.0439836978912354 +4.80000019073486 -0.0483617782592773 +4.84999990463257 -0.0511152744293213 +4.90000009536743 -0.0523005723953247 +4.94999980926514 -0.0520071983337402 +5 -0.0503549575805664 +5.05000019073486 -0.0474947690963745 +5.09999990463257 -0.0436025857925415 +5.19999980926514 -0.0334981679916382 +5.44999980926514 -0.00446927547454834 +5.55000019073486 0.00468134880065918 +5.59999990463257 0.00821065902709961 +5.65000009536743 0.0109634399414062 +5.69999980926514 0.0129225254058838 +5.75 0.0140993595123291 +5.80000019073486 0.014528751373291 +5.90000009536743 0.0134128332138062 +6 0.0102936029434204 +6.15000009536743 0.00369417667388916 +6.30000019073486 -0.00292909145355225 +6.40000009536743 -0.00639927387237549 +6.5 -0.00873517990112305 +6.59999990463257 -0.0098426342010498 +6.69999980926514 -0.00979709625244141 +6.84999990463257 -0.00804257392883301 +7.34999990463257 0.000424623489379883 +7.5 0.00100338459014893 +7.69999980926514 0.000244498252868652 +8.25 -0.0032728910446167 +8.5 -0.00321042537689209 +9.35000038146973 -0.00156271457672119 +10 -0.00225186347961426 +}; + +\nextgroupplot[ +width=0.98*\textwidth, +height=0.8*\axisdefaultheight, +tick align=outside, +tick pos=left, +x grid style={darkgray176!30}, +xmajorgrids, +xmin=-0.5, xmax=10.5, +xtick style={color=black}, +y grid style={darkgray176!30}, +ylabel={Force applied to cart $f$ [N]}, +ymajorgrids, +ymin=-26.2798247337341, ymax=42.3187670707703, +ytick style={color=black}, +legend cell align={left}, +legend style={ + fill opacity=0.8, + draw opacity=1, + text opacity=1, + anchor=north east, + draw=white!80!black, +}, +] +\addlegendentry{nominal} +\addlegendentry{mass} +\addlegendentry{mass+damping} +\addplot [teal29105150] +table {% +0 15.5939607620239 +0.0499999523162842 15.520676612854 +0.100000023841858 13.037878036499 +0.149999976158142 9.61607646942139 +0.25 0.730765581130981 +0.299999952316284 -2.35638570785522 +0.350000023841858 -4.05070400238037 +0.399999976158142 -5.46542835235596 +0.450000047683716 -6.77976369857788 +0.5 -7.73431205749512 +0.549999952316284 -9.09086608886719 +0.600000023841858 -10.2340679168701 +0.649999976158142 -10.9837198257446 +0.700000047683716 -11.3140077590942 +0.75 -10.8572015762329 +0.799999952316284 -18.7454528808594 +0.850000023841858 -4.44649028778076 +0.899999976158142 
4.84866809844971 +0.950000047683716 4.59893894195557 +1 3.25690722465515 +1.04999995231628 2.52445650100708 +1.10000002384186 2.13095307350159 +1.14999997615814 2.29954981803894 +1.20000004768372 3.44255042076111 +1.25 3.1438524723053 +1.29999995231628 0.945952892303467 +1.35000002384186 -0.0995728969573975 +1.39999997615814 -0.613557577133179 +1.45000004768372 -0.454593181610107 +1.5 0.192463159561157 +1.54999995231628 0.557344675064087 +1.60000002384186 0.468252539634705 +1.64999997615814 0.486636519432068 +1.70000004768372 0.59409499168396 +1.75 0.655533909797668 +1.79999995231628 0.699015140533447 +1.85000002384186 0.798288702964783 +1.89999997615814 0.863532900810242 +2 0.935891389846802 +2.04999995231628 0.954321384429932 +2.15000009536743 0.974729061126709 +2.20000004768372 0.970592379570007 +2.25 0.947738885879517 +2.29999995231628 0.92967414855957 +2.34999990463257 0.852154016494751 +2.40000009536743 0.788204550743103 +2.45000004768372 0.735793113708496 +2.59999990463257 0.493512868881226 +2.65000009536743 0.417134046554565 +2.70000004768372 0.34711766242981 +2.75 0.265109539031982 +2.79999995231628 0.19122838973999 +2.84999990463257 0.128156065940857 +2.95000004768372 0.021661639213562 +3 -0.0252951383590698 +3.04999995231628 -0.0653047561645508 +3.09999990463257 -0.0983569622039795 +3.15000009536743 -0.125717759132385 +3.20000004768372 -0.145013809204102 +3.25 -0.157893180847168 +3.29999995231628 -0.165747761726379 +3.34999990463257 -0.16906201839447 +3.45000004768372 -0.169271469116211 +3.5 -0.162108540534973 +3.59999990463257 -0.138792514801025 +3.75 -0.0954494476318359 +3.79999995231628 -0.0793169736862183 +3.95000004768372 -0.0442237854003906 +4.09999990463257 -0.0155829191207886 +4.25 0.00502192974090576 +4.40000009536743 0.017408013343811 +4.55000019073486 0.0226800441741943 +4.75 0.0217399597167969 +5.05000019073486 0.0122110843658447 +5.44999980926514 0.00010216236114502 +5.80000019073486 -0.00347757339477539 +6.34999990463257 -0.00157356262207031 +7.19999980926514 0.000505566596984863 +10 6.91413879394531e-06 +}; +\addplot [seagreen1513384] +table {% +0 15.5939607620239 +0.0499999523162842 15.5483503341675 +0.100000023841858 15.0517749786377 +0.149999976158142 13.3556661605835 +0.200000047683716 11.004150390625 +0.299999952316284 5.66796636581421 +0.399999976158142 0.307034492492676 +0.450000047683716 -1.86212491989136 +0.5 -3.85764694213867 +0.600000023841858 -7.11573123931885 +0.649999976158142 -8.82696914672852 +0.700000047683716 -10.1905851364136 +0.75 -11.1719121932983 +0.850000023841858 -12.5356960296631 +0.899999976158142 -22.8873767852783 +1 -15.7105150222778 +1.04999995231628 -12.5722913742065 +1.10000002384186 -10.6292123794556 +1.14999997615814 -9.04035377502441 +1.20000004768372 -6.60694217681885 +1.25 -2.70202302932739 +1.29999995231628 0.513923168182373 +1.35000002384186 2.80659747123718 +1.39999997615814 4.20211315155029 +1.45000004768372 4.80246877670288 +1.5 4.71114158630371 +1.54999995231628 4.09843158721924 +1.60000002384186 3.19204926490784 +1.64999997615814 3.1159200668335 +1.75 2.49207544326782 +1.79999995231628 1.7422126531601 +1.89999997615814 -0.33929717540741 +1.95000004768372 -1.03285264968872 +2 34.1125831604004 +2.04999995231628 14.1368865966797 +2.09999990463257 3.57156133651733 +2.15000009536743 0.952624797821045 +2.20000004768372 0.462041020393372 +2.25 0.0576977729797363 +2.29999995231628 -0.879440665245056 +2.34999990463257 -2.68613219261169 +2.40000009536743 -3.03205919265747 +2.45000004768372 -0.0728875398635864 +2.5 3.74255299568176 
+2.54999995231628 5.19275569915771 +2.59999990463257 5.26579141616821 +2.65000009536743 4.54771709442139 +2.70000004768372 3.50678253173828 +2.75 2.66911578178406 +2.79999995231628 1.97149884700775 +2.90000009536743 0.434381365776062 +3 -0.866186857223511 +3.04999995231628 -1.4260630607605 +3.09999990463257 -1.91591954231262 +3.15000009536743 -2.3717782497406 +3.20000004768372 -2.72233486175537 +3.25 -2.99458789825439 +3.29999995231628 -3.13226318359375 +3.34999990463257 -3.09842991828918 +3.40000009536743 -2.89960026741028 +3.45000004768372 -2.57039570808411 +3.5 -2.18652868270874 +3.59999990463257 -1.22946858406067 +3.65000009536743 -0.784151434898376 +3.70000004768372 -0.391867876052856 +3.75 -0.0643924474716187 +3.79999995231628 0.207567691802979 +3.84999990463257 0.429673314094543 +3.90000009536743 0.597144246101379 +3.95000004768372 0.710273504257202 +4 0.781934499740601 +4.05000019073486 0.816150426864624 +4.09999990463257 0.809849977493286 +4.15000009536743 0.779927730560303 +4.19999980926514 0.730541944503784 +4.25 0.668222665786743 +4.30000019073486 0.586024045944214 +4.34999990463257 0.491402864456177 +4.40000009536743 0.402989864349365 +4.44999980926514 0.306830644607544 +4.59999990463257 0.00141704082489014 +4.65000009536743 -0.0885822772979736 +4.69999980926514 -0.166700959205627 +4.75 -0.230480790138245 +4.80000019073486 -0.278677463531494 +4.84999990463257 -0.311882972717285 +4.90000009536743 -0.329241871833801 +4.94999980926514 -0.331223011016846 +5 -0.319196581840515 +5.05000019073486 -0.295417189598083 +5.09999990463257 -0.262594938278198 +5.34999990463257 -0.0509960651397705 +5.40000009536743 -0.0153040885925293 +5.44999980926514 0.0161787271499634 +5.5 0.04292893409729 +5.55000019073486 0.0646121501922607 +5.59999990463257 0.0811101198196411 +5.65000009536743 0.0925499200820923 +5.69999980926514 0.0991678237915039 +5.75 0.101333618164062 +5.80000019073486 0.0995268821716309 +5.90000009536743 0.0865063667297363 +6 0.0647627115249634 +6.25 0.00144100189208984 +6.34999990463257 -0.017413854598999 +6.44999980926514 -0.0296354293823242 +6.55000019073486 -0.0350520610809326 +6.65000009536743 -0.0344771146774292 +6.80000019073486 -0.0255258083343506 +7.19999980926514 0.00574004650115967 +7.34999990463257 0.0107396841049194 +7.55000019073486 0.0105091333389282 +7.90000009536743 0.00120937824249268 +8.19999980926514 -0.00371861457824707 +8.55000019073486 -0.00226199626922607 +9.14999961853027 0.00127720832824707 +10 -0.000420212745666504 +}; +\addplot [indianred2048062] +table {% +0 15.5939607620239 +0.0499999523162842 15.5354099273682 +0.100000023841858 15.0025386810303 +0.149999976158142 13.2687253952026 +0.200000047683716 10.8791847229004 +0.299999952316284 5.52836990356445 +0.399999976158142 0.259945273399353 +0.450000047683716 -1.92690682411194 +0.5 -3.92053914070129 +0.649999976158142 -8.74668216705322 +0.700000047683716 -10.0694169998169 +0.75 -10.8939580917358 +0.799999952316284 -11.4569244384766 +0.850000023841858 -11.9334268569946 +0.899999976158142 -23.1617069244385 +1.04999995231628 -14.154839515686 +1.10000002384186 -12.1492357254028 +1.14999997615814 -9.83982944488525 +1.20000004768372 -7.18163919448853 +1.25 -3.01608276367188 +1.29999995231628 0.120339155197144 +1.35000002384186 2.50882983207703 +1.39999997615814 4.05655097961426 +1.45000004768372 4.56769418716431 +1.5 4.57700729370117 +1.54999995231628 4.07200336456299 +1.60000002384186 3.96518659591675 +1.64999997615814 3.76872539520264 +1.70000004768372 3.43790769577026 +1.75 2.78378081321716 +1.79999995231628 
1.85426473617554 +1.85000002384186 0.852705955505371 +1.89999997615814 39.2006492614746 +2 4.88767290115356 +2.04999995231628 0.829306125640869 +2.09999990463257 0.441462874412537 +2.15000009536743 1.028315782547 +2.20000004768372 1.20706105232239 +2.25 -0.341105103492737 +2.29999995231628 -3.10087656974792 +2.34999990463257 -5.4143328666687 +2.40000009536743 -7.17361497879028 +2.45000004768372 -8.31033992767334 +2.5 -8.23518180847168 +2.54999995231628 -6.78589200973511 +2.65000009536743 -1.21341621875763 +2.70000004768372 0.948808908462524 +2.75 2.47381520271301 +2.79999995231628 3.38009715080261 +2.84999990463257 4.09732341766357 +2.90000009536743 4.60260534286499 +2.95000004768372 4.89380741119385 +3 5.07613372802734 +3.04999995231628 5.15852165222168 +3.09999990463257 5.1243748664856 +3.15000009536743 4.95596361160278 +3.20000004768372 4.67889404296875 +3.25 4.33179903030396 +3.29999995231628 3.93202042579651 +3.34999990463257 3.39901065826416 +3.40000009536743 2.66835427284241 +3.54999995231628 0.00767278671264648 +3.59999990463257 -0.672966718673706 +3.75 -2.45000839233398 +3.79999995231628 -3.05866050720215 +3.84999990463257 -3.57131433486938 +3.90000009536743 -4.00436401367188 +3.95000004768372 -4.33360004425049 +4 -4.43768310546875 +4.05000019073486 -4.28171396255493 +4.09999990463257 -3.83089971542358 +4.19999980926514 -2.58125615119934 +4.25 -1.87957751750946 +4.30000019073486 -1.23932349681854 +4.34999990463257 -0.648844003677368 +4.40000009536743 -0.132075309753418 +4.44999980926514 0.300155520439148 +4.5 0.652803659439087 +4.55000019073486 0.932989597320557 +4.59999990463257 1.14568328857422 +4.65000009536743 1.29083466529846 +4.69999980926514 1.36776268482208 +4.75 1.38858199119568 +4.80000019073486 1.36066484451294 +4.84999990463257 1.30141019821167 +4.90000009536743 1.21532607078552 +4.94999980926514 1.10311472415924 +5 0.962586402893066 +5.05000019073486 0.807267785072327 +5.09999990463257 0.640040874481201 +5.19999980926514 0.280190467834473 +5.30000019073486 -0.0827970504760742 +5.34999990463257 -0.244121670722961 +5.40000009536743 -0.383161544799805 +5.44999980926514 -0.496577024459839 +5.5 -0.582364320755005 +5.55000019073486 -0.641577005386353 +5.59999990463257 -0.672548770904541 +5.65000009536743 -0.673568725585938 +5.69999980926514 -0.65637469291687 +5.75 -0.616556763648987 +5.80000019073486 -0.554877638816833 +5.84999990463257 -0.476263284683228 +6.05000019073486 -0.120394110679626 +6.09999990463257 -0.0392366647720337 +6.15000009536743 0.034543514251709 +6.19999980926514 0.0993950366973877 +6.25 0.154694795608521 +6.30000019073486 0.199155330657959 +6.34999990463257 0.231175899505615 +6.40000009536743 0.251533389091492 +6.44999980926514 0.261793375015259 +6.5 0.26303505897522 +6.55000019073486 0.254929423332214 +6.59999990463257 0.239159107208252 +6.65000009536743 0.217308282852173 +6.69999980926514 0.18996000289917 +6.80000019073486 0.124739408493042 +7.05000019073486 -0.0385557413101196 +7.09999990463257 -0.064452052116394 +7.15000009536743 -0.0855569839477539 +7.19999980926514 -0.101575493812561 +7.25 -0.112316846847534 +7.30000019073486 -0.117862224578857 +7.34999990463257 -0.118492007255554 +7.40000009536743 -0.113857626914978 +7.5 -0.093222975730896 +7.65000009536743 -0.0510944128036499 +7.80000019073486 -0.00778985023498535 +7.90000009536743 0.0161279439926147 +8 0.0334899425506592 +8.10000038146973 0.0433659553527832 +8.19999980926514 0.0459102392196655 +8.30000019073486 0.0422691106796265 +8.44999980926514 0.0285452604293823 +8.80000019073486 
-0.0100536346435547 +8.94999980926514 -0.0180414915084839 +9.10000038146973 -0.0190005302429199 +9.30000019073486 -0.0122296810150146 +9.69999980926514 0.00508999824523926 +9.89999961853027 0.00783455371856689 +10 0.00748765468597412 +}; +\end{groupplot} + +\draw ({$(current bounding box.south west)!0.51!(current bounding box.south east)$}|-{$(current bounding box.south west)!-0.04!(current bounding box.north west)$}) node[ + anchor=south, + text=black, + rotate=0.0 +]{Time [s]}; +\end{tikzpicture} diff --git a/images/contributions/chapter_8/cartpole_jaxsim_vs_mujoco.tikz b/images/contributions/chapter_8/cartpole_jaxsim_vs_mujoco.tikz new file mode 100644 index 0000000..4894f3b --- /dev/null +++ b/images/contributions/chapter_8/cartpole_jaxsim_vs_mujoco.tikz @@ -0,0 +1,682 @@ +% This file was created with tikzplotlib v0.10.1. +\begin{tikzpicture} + +\definecolor{darkgray176}{RGB}{176,176,176} +\definecolor{indianred2048062}{RGB}{204,80,62} +\definecolor{teal29105150}{RGB}{29,105,150} + +\begin{groupplot}[group style={group size=1 by 3}] +\nextgroupplot[ +width=0.98*\textwidth, +height=0.8*\axisdefaultheight, +scaled x ticks=manual:{}{\pgfmathparse{#1}}, +tick align=outside, +tick pos=left, +title={\textbf{Sim-to-sim comparison of cartpole swing-up trajectories}}, +x grid style={darkgray176!30}, +xmajorgrids, +xmin=-0.5, xmax=10.5, +xtick style={color=black}, +xticklabels={}, +y grid style={darkgray176!30}, +ylabel={Pole angle $\theta$ [rad]}, +ymajorgrids, +ymin=-3.3464425162473, ymax=3.45054670929632, +ytick style={color=black}, +legend cell align={left}, +legend style={ + fill opacity=0.8, + draw opacity=1, + text opacity=1, + anchor=north east, + draw=white!80!black +}, +] +\addlegendentry{\jaxsim} +\addlegendentry{Mujoco} +\addplot [teal29105150] +table {% +0 3.12828063964844 +0.0499999523162842 3.08973956108093 +0.100000023841858 3.02917838096619 +0.149999976158142 2.95181894302368 +0.25 2.77702474594116 +0.299999952316284 2.6979808807373 +0.350000023841858 2.63771986961365 +0.399999976158142 2.60506439208984 +0.450000047683716 2.6073534488678 +0.5 2.65015053749084 +0.549999952316284 2.73693060874939 +0.600000023841858 2.86878252029419 +0.649999976158142 3.04197525978088 +0.700000047683716 -3.03748846054077 +0.75 -2.78491187095642 +0.850000023841858 -2.16993522644043 +0.899999976158142 -1.89504849910736 +0.950000047683716 -1.66008424758911 +1 -1.46315765380859 +1.04999995231628 -1.30036675930023 +1.10000002384186 -1.16649794578552 +1.14999997615814 -1.05624032020569 +1.20000004768372 -0.965004682540894 +1.25 -0.888532161712646 +1.29999995231628 -0.823412895202637 +1.35000002384186 -0.767791152000427 +1.39999997615814 -0.72126030921936 +1.45000004768372 -0.685214519500732 +1.5 -0.662487506866455 +1.54999995231628 -0.657304525375366 +1.60000002384186 -0.675146698951721 +1.64999997615814 -0.720012187957764 +1.70000004768372 -0.792727708816528 +1.75 -0.893112063407898 +1.79999995231628 -1.02188420295715 +1.85000002384186 -1.18090629577637 +1.89999997615814 -1.37279200553894 +1.95000004768372 -1.59993529319763 +2 -1.8638322353363 +2.04999995231628 -2.16430830955505 +2.09999990463257 -2.49803972244263 +2.15000009536743 -2.85696887969971 +2.20000004768372 3.0549693107605 +2.29999995231628 2.26932859420776 +2.34999990463257 1.92361342906952 +2.40000009536743 1.62233877182007 +2.45000004768372 1.35905873775482 +2.5 1.12626707553864 +2.54999995231628 0.918895244598389 +2.59999990463257 0.73847508430481 +2.65000009536743 0.590020418167114 +2.70000004768372 0.476671934127808 +2.75 
+% … (remaining samples of this trajectory omitted) …
+};
+\addplot [indianred2048062]
+table {%
+% … (Mujoco trajectory samples omitted) …
+};
+
+\nextgroupplot[
+width=0.99*\textwidth,
+height=0.8*\axisdefaultheight,
+xmin=-0.5, xmax=10.5,
+xticklabels={},
+ylabel={Cart position $d$ [m]},
+xmajorgrids, ymajorgrids,
+legend cell align={left},
+legend style={anchor=north east, fill opacity=0.8, draw=white!80!black},
+]
+\addlegendentry{\jaxsim}
+\addlegendentry{Mujoco}
+% … (\jaxsim and Mujoco cart-position trajectories omitted) …
+
+\nextgroupplot[
+width=0.99*\textwidth,
+height=0.8*\axisdefaultheight,
+xmin=-0.5, xmax=10.5,
+ylabel={Force applied to cart $f$ [N]},
+xmajorgrids, ymajorgrids,
+legend cell align={left},
+legend style={anchor=north east, fill opacity=0.8, draw=white!80!black},
+]
+\addlegendentry{\jaxsim}
+\addlegendentry{Mujoco}
+% … (\jaxsim and Mujoco applied-force trajectories omitted) …
+\end{groupplot}
+
+\draw ({$(current bounding box.south west)!0.58!(current bounding box.south east)$}|-{$(current bounding box.south west)!-0.04!(current bounding box.north west)$}) node[
+anchor=south,
+text=black
+]{Time [s]};
+\end{tikzpicture}
diff --git a/images/contributions/chapter_8/learning_curves.tikz b/images/contributions/chapter_8/learning_curves.tikz
new file mode 100644
index 0000000..786fe87
--- /dev/null
+++ b/images/contributions/chapter_8/learning_curves.tikz
@@ -0,0 +1,2874 @@
+% This file was created with tikzplotlib v0.10.1.
+\begin{tikzpicture}
+
+\definecolor{darkgray176}{RGB}{176,176,176}
+\definecolor{silver180200168}{RGB}{180,200,168}
+\definecolor{teal}{RGB}{0,128,128}
+
+\begin{axis}[
+width=0.8*\textwidth,
+height=\axisdefaultheight,
+title={\textbf{Average reward of the cartpole swing-up task}},
+xlabel={Samples},
+xmin=-24601.6, xmax=527897.6,
+ymin=-0.485883057615086, ymax=2.3,
+xmajorgrids, ymajorgrids,
+legend cell align={left},
+legend style={at={(0.97,0.03)}, anchor=south east, fill opacity=0.8, draw=white!80!black},
+]
+\addlegendimage{area legend, draw=silver180200168, fill=silver180200168}
+\addlegendentry{$\pm \sigma$}
+\path [draw=silver180200168, fill=silver180200168]
+% … (coordinates of the $\pm \sigma$ reward band omitted) …
cs:255488,1.6088988385337) +--(axis cs:254976,1.60542298233464) +--(axis cs:254464,1.5977066730031) +--(axis cs:253952,1.59896679962535) +--(axis cs:253440,1.60078054740959) +--(axis cs:252928,1.59806847263212) +--(axis cs:252416,1.59482647337715) +--(axis cs:251904,1.59654584727799) +--(axis cs:251392,1.60487407641865) +--(axis cs:250880,1.6100781710146) +--(axis cs:250368,1.61405819423402) +--(axis cs:249856,1.61890898926518) +--(axis cs:249344,1.62192252169124) +--(axis cs:248832,1.62475470786555) +--(axis cs:248320,1.6252432827687) +--(axis cs:247808,1.62669924065002) +--(axis cs:247296,1.62836289621303) +--(axis cs:246784,1.6231558606631) +--(axis cs:246272,1.62766313918632) +--(axis cs:245760,1.6352953618375) +--(axis cs:245248,1.64017022095135) +--(axis cs:244736,1.64068575381475) +--(axis cs:244224,1.64339087682888) +--(axis cs:243712,1.6478978001041) +--(axis cs:243200,1.64489286784394) +--(axis cs:242688,1.64981762623729) +--(axis cs:242176,1.64800388246427) +--(axis cs:241664,1.64776816015691) +--(axis cs:241152,1.64643362130086) +--(axis cs:240640,1.64482410140494) +--(axis cs:240128,1.64663708442091) +--(axis cs:239616,1.63920988348322) +--(axis cs:239104,1.64309192027106) +--(axis cs:238592,1.64080033891532) +--(axis cs:238080,1.63812301584252) +--(axis cs:237568,1.63960130256745) +--(axis cs:237056,1.63910958069497) +--(axis cs:236544,1.6373451044813) +--(axis cs:236032,1.64069063570711) +--(axis cs:235520,1.64100540812263) +--(axis cs:235008,1.64418855921781) +--(axis cs:234496,1.643384253975) +--(axis cs:233984,1.64465557797007) +--(axis cs:233472,1.64440714842122) +--(axis cs:232960,1.6448743330735) +--(axis cs:232448,1.65229832779017) +--(axis cs:231936,1.6537319050645) +--(axis cs:231424,1.65439169675495) +--(axis cs:230912,1.65378755279065) +--(axis cs:230400,1.65222353033014) +--(axis cs:229888,1.65614008834813) +--(axis cs:229376,1.65338063340368) +--(axis cs:228864,1.65588281624502) +--(axis cs:228352,1.65649892588619) +--(axis cs:227840,1.65627151347529) +--(axis cs:227328,1.66140526306182) +--(axis cs:226816,1.66273905384199) +--(axis cs:226304,1.66859849136354) +--(axis cs:225792,1.66239623293666) +--(axis cs:225280,1.66033744483136) +--(axis cs:224768,1.66305164711269) +--(axis cs:224256,1.66752317740843) +--(axis cs:223744,1.66191242986118) +--(axis cs:223232,1.65949104321451) +--(axis cs:222720,1.66351151564848) +--(axis cs:222208,1.66087688879195) +--(axis cs:221696,1.65936469379268) +--(axis cs:221184,1.66361556009439) +--(axis cs:220672,1.66737249290873) +--(axis cs:220160,1.67090415561494) +--(axis cs:219648,1.67083984456779) +--(axis cs:219136,1.67183122175623) +--(axis cs:218624,1.67611303642687) +--(axis cs:218112,1.6830987381877) +--(axis cs:217600,1.68314621621962) +--(axis cs:217088,1.68266720537548) +--(axis cs:216576,1.68911624124682) +--(axis cs:216064,1.68639741074976) +--(axis cs:215552,1.69441002639341) +--(axis cs:215040,1.6953279460468) +--(axis cs:214528,1.70240492666084) +--(axis cs:214016,1.70593595374566) +--(axis cs:213504,1.70722504196962) +--(axis cs:212992,1.70827825166454) +--(axis cs:212480,1.71102142760637) +--(axis cs:211968,1.70973865049983) +--(axis cs:211456,1.71105671128341) +--(axis cs:210944,1.71326026260319) +--(axis cs:210432,1.71470116914918) +--(axis cs:209920,1.71647498699168) +--(axis cs:209408,1.71737639671126) +--(axis cs:208896,1.72116644203084) +--(axis cs:208384,1.72213737347251) +--(axis cs:207872,1.72870045730129) +--(axis cs:207360,1.72915249890655) +--(axis cs:206848,1.73363105320599) +--(axis 
cs:206336,1.72789168977695) +--(axis cs:205824,1.76265319046395) +--(axis cs:205312,1.75611879232336) +--(axis cs:204800,1.75117301541265) +--(axis cs:204288,1.74869284659295) +--(axis cs:203776,1.74319394400475) +--(axis cs:203264,1.73885354870788) +--(axis cs:202752,1.73359814977122) +--(axis cs:202240,1.73059482894408) +--(axis cs:201728,1.73217076836033) +--(axis cs:201216,1.72848069501231) +--(axis cs:200704,1.72552791674644) +--(axis cs:200192,1.72248906555011) +--(axis cs:199680,1.71824444359794) +--(axis cs:199168,1.71410426012202) +--(axis cs:198656,1.70872828145784) +--(axis cs:198144,1.69826224453566) +--(axis cs:197632,1.69484152124469) +--(axis cs:197120,1.69065958361184) +--(axis cs:196608,1.686262325208) +--(axis cs:196096,1.68353169351079) +--(axis cs:195584,1.68079538928624) +--(axis cs:195072,1.67839783552303) +--(axis cs:194560,1.67591287902014) +--(axis cs:194048,1.67113761524364) +--(axis cs:193536,1.66640686835352) +--(axis cs:193024,1.66802582471591) +--(axis cs:192512,1.66357631173751) +--(axis cs:192000,1.66282457473804) +--(axis cs:191488,1.65787119308714) +--(axis cs:190976,1.65090321908258) +--(axis cs:190464,1.64407000052468) +--(axis cs:189952,1.64340403086568) +--(axis cs:189440,1.63359954253034) +--(axis cs:188928,1.62534902188243) +--(axis cs:188416,1.61904764324606) +--(axis cs:187904,1.61245971486892) +--(axis cs:187392,1.60671709106611) +--(axis cs:186880,1.59934768849264) +--(axis cs:186368,1.59062407234009) +--(axis cs:185856,1.58899687808502) +--(axis cs:185344,1.58543218446624) +--(axis cs:184832,1.58553034715015) +--(axis cs:184320,1.58494768970469) +--(axis cs:183808,1.57978863842291) +--(axis cs:183296,1.57158611159086) +--(axis cs:182784,1.56803733345175) +--(axis cs:182272,1.56211163124789) +--(axis cs:181760,1.55576014956689) +--(axis cs:181248,1.54341826833172) +--(axis cs:180736,1.53554001859258) +--(axis cs:180224,1.5278535225484) +--(axis cs:179712,1.52488328725265) +--(axis cs:179200,1.51930370644423) +--(axis cs:178688,1.52236112929236) +--(axis cs:178176,1.51272690692135) +--(axis cs:177664,1.50992526690024) +--(axis cs:177152,1.50295427412409) +--(axis cs:176640,1.49962661459542) +--(axis cs:176128,1.49118686503901) +--(axis cs:175616,1.48861171513182) +--(axis cs:175104,1.48565443888543) +--(axis cs:174592,1.47942229620719) +--(axis cs:174080,1.47526952822498) +--(axis cs:173568,1.4790352655383) +--(axis cs:173056,1.46757090423764) +--(axis cs:172544,1.46980157862506) +--(axis cs:172032,1.46905785375129) +--(axis cs:171520,1.47853749771029) +--(axis cs:171008,1.48040107983882) +--(axis cs:170496,1.47519785521098) +--(axis cs:169984,1.47175464434428) +--(axis cs:169472,1.47415132155296) +--(axis cs:168960,1.46152867379534) +--(axis cs:168448,1.46151674485323) +--(axis cs:167936,1.46282108353194) +--(axis cs:167424,1.46960700145613) +--(axis cs:166912,1.47219531405883) +--(axis cs:166400,1.46778100893155) +--(axis cs:165888,1.47328216538971) +--(axis cs:165376,1.47288284451926) +--(axis cs:164864,1.47185413560446) +--(axis cs:164352,1.46761852843003) +--(axis cs:163840,1.46772617915477) +--(axis cs:163328,1.457036698177) +--(axis cs:162816,1.45575511839881) +--(axis cs:162304,1.45267667737955) +--(axis cs:161792,1.45659623308332) +--(axis cs:161280,1.45727691453031) +--(axis cs:160768,1.46042010518757) +--(axis cs:160256,1.46475250593877) +--(axis cs:159744,1.45928587630063) +--(axis cs:159232,1.45944921808544) +--(axis cs:158720,1.45940831734702) +--(axis cs:158208,1.45749361849654) +--(axis cs:157696,1.46111264608438) +--(axis 
cs:157184,1.45479405724987) +--(axis cs:156672,1.45356486375131) +--(axis cs:156160,1.44619439685813) +--(axis cs:155648,1.44888904611045) +--(axis cs:155136,1.45662472386242) +--(axis cs:154624,1.4610934084775) +--(axis cs:154112,1.4603240674227) +--(axis cs:153600,1.46043172816233) +--(axis cs:153088,1.45612800720453) +--(axis cs:152576,1.46822186291504) +--(axis cs:152064,1.46857547246921) +--(axis cs:151552,1.4713198867239) +--(axis cs:151040,1.46996176701269) +--(axis cs:150528,1.47221097490417) +--(axis cs:150016,1.47807039986621) +--(axis cs:149504,1.4828656628865) +--(axis cs:148992,1.48943954501856) +--(axis cs:148480,1.49160577885567) +--(axis cs:147968,1.49329956140629) +--(axis cs:147456,1.49941000744817) +--(axis cs:146944,1.50171603525587) +--(axis cs:146432,1.49879454384514) +--(axis cs:145920,1.49627990854086) +--(axis cs:145408,1.49135604803086) +--(axis cs:144896,1.49355845605362) +--(axis cs:144384,1.49337760133369) +--(axis cs:143872,1.49265214453048) +--(axis cs:143360,1.48629486943029) +--(axis cs:142848,1.49783194133175) +--(axis cs:142336,1.50305492653316) +--(axis cs:141824,1.50086082679084) +--(axis cs:141312,1.49480169048014) +--(axis cs:140800,1.49697381104717) +--(axis cs:140288,1.49550284540052) +--(axis cs:139776,1.4944151213012) +--(axis cs:139264,1.48489709070348) +--(axis cs:138752,1.4877203274769) +--(axis cs:138240,1.48554332122727) +--(axis cs:137728,1.48433670023143) +--(axis cs:137216,1.49157741825213) +--(axis cs:136704,1.48935536779349) +--(axis cs:136192,1.49349368174982) +--(axis cs:135680,1.49364024756547) +--(axis cs:135168,1.48890157289624) +--(axis cs:134656,1.48811459999698) +--(axis cs:134144,1.48840096978474) +--(axis cs:133632,1.48534340587157) +--(axis cs:133120,1.48771502533318) +--(axis cs:132608,1.48220098626379) +--(axis cs:132096,1.48114524848401) +--(axis cs:131584,1.48415498877847) +--(axis cs:131072,1.48575278279239) +--(axis cs:130560,1.48584030296933) +--(axis cs:130048,1.47723796644531) +--(axis cs:129536,1.47305209660153) +--(axis cs:129024,1.46966625068831) +--(axis cs:128512,1.47016223994143) +--(axis cs:128000,1.46806258140534) +--(axis cs:127488,1.46536392693662) +--(axis cs:126976,1.46462878253577) +--(axis cs:126464,1.46467372606655) +--(axis cs:125952,1.4624158453512) +--(axis cs:125440,1.45498523149781) +--(axis cs:124928,1.44912119770752) +--(axis cs:124416,1.44485733827079) +--(axis cs:123904,1.44601759350251) +--(axis cs:123392,1.44396557923284) +--(axis cs:122880,1.44288506411969) +--(axis cs:122368,1.44139204255682) +--(axis cs:121856,1.4351212216064) +--(axis cs:121344,1.43105540633722) +--(axis cs:120832,1.43080665173125) +--(axis cs:120320,1.42753059190553) +--(axis cs:119808,1.42131729628136) +--(axis cs:119296,1.42167576346327) +--(axis cs:118784,1.41237906486309) +--(axis cs:118272,1.39973730422678) +--(axis cs:117760,1.39866745779776) +--(axis cs:117248,1.39373652712269) +--(axis cs:116736,1.38752190886221) +--(axis cs:116224,1.37517466670499) +--(axis cs:115712,1.3745104572363) +--(axis cs:115200,1.3695122709068) +--(axis cs:114688,1.36107873235935) +--(axis cs:114176,1.34877898619041) +--(axis cs:113664,1.33824562476606) +--(axis cs:113152,1.33274094463469) +--(axis cs:112640,1.32239512890398) +--(axis cs:112128,1.31290308217212) +--(axis cs:111616,1.30320077699026) +--(axis cs:111104,1.29416012428871) +--(axis cs:110592,1.28429847469835) +--(axis cs:110080,1.27665326303359) +--(axis cs:109568,1.27121527964346) +--(axis cs:109056,1.26094788057501) +--(axis cs:108544,1.25947292758595) +--(axis 
cs:108032,1.2488385557327) +--(axis cs:107520,1.24274727999949) +--(axis cs:107008,1.24348169267319) +--(axis cs:106496,1.2333061261103) +--(axis cs:105984,1.22669786940647) +--(axis cs:105472,1.22112479001687) +--(axis cs:104960,1.21347594272129) +--(axis cs:104448,1.20540519037062) +--(axis cs:103936,1.19484465295809) +--(axis cs:103424,1.25016261150131) +--(axis cs:102912,1.23418517713416) +--(axis cs:102400,1.21582872299089) +--(axis cs:101888,1.20047742064533) +--(axis cs:101376,1.18143544065418) +--(axis cs:100864,1.16445132832005) +--(axis cs:100352,1.14619456391428) +--(axis cs:99840,1.12315845415279) +--(axis cs:99328,1.10128270483468) +--(axis cs:98816,1.07801103885179) +--(axis cs:98304,1.05586664344394) +--(axis cs:97792,1.03439152503242) +--(axis cs:97280,1.01200835749207) +--(axis cs:96768,0.988287757747912) +--(axis cs:96256,0.971946379704838) +--(axis cs:95744,0.956143562487603) +--(axis cs:95232,0.939144844994619) +--(axis cs:94720,0.931116791596401) +--(axis cs:94208,0.920148856807875) +--(axis cs:93696,0.909633427281624) +--(axis cs:93184,0.896130581112245) +--(axis cs:92672,0.886214789272343) +--(axis cs:92160,0.87665337695478) +--(axis cs:91648,0.857858155572548) +--(axis cs:91136,0.841482449908431) +--(axis cs:90624,0.831451966638191) +--(axis cs:90112,0.819063914691151) +--(axis cs:89600,0.809242780920568) +--(axis cs:89088,0.800325124799954) +--(axis cs:88576,0.785824093962726) +--(axis cs:88064,0.767670625699619) +--(axis cs:87552,0.758650805986936) +--(axis cs:87040,0.738427649033456) +--(axis cs:86528,0.728422475873278) +--(axis cs:86016,0.711057296769741) +--(axis cs:85504,0.695455782515798) +--(axis cs:84992,0.680840931580281) +--(axis cs:84480,0.673633713134201) +--(axis cs:83968,0.660460604436716) +--(axis cs:83456,0.654305705663452) +--(axis cs:82944,0.641038281073655) +--(axis cs:82432,0.631685815438448) +--(axis cs:81920,0.613124752524586) +--(axis cs:81408,0.607798353407184) +--(axis cs:80896,0.600589428378497) +--(axis cs:80384,0.585557436655965) +--(axis cs:79872,0.570576307660476) +--(axis cs:79360,0.554335098634426) +--(axis cs:78848,0.544631632545236) +--(axis cs:78336,0.5299725720146) +--(axis cs:77824,0.515250377401938) +--(axis cs:77312,0.503719449257533) +--(axis cs:76800,0.490509411663967) +--(axis cs:76288,0.486390029104224) +--(axis cs:75776,0.479845630948841) +--(axis cs:75264,0.468500869205894) +--(axis cs:74752,0.463586060640116) +--(axis cs:74240,0.456818784208223) +--(axis cs:73728,0.446502154617983) +--(axis cs:73216,0.431967815626615) +--(axis cs:72704,0.431311173676665) +--(axis cs:72192,0.415607440484948) +--(axis cs:71680,0.405256916593983) +--(axis cs:71168,0.389023263468516) +--(axis cs:70656,0.375583807718429) +--(axis cs:70144,0.366670499528502) +--(axis cs:69632,0.355833150753976) +--(axis cs:69120,0.34415773348882) +--(axis cs:68608,0.327721528991555) +--(axis cs:68096,0.319042840737391) +--(axis cs:67584,0.313107137426667) +--(axis cs:67072,0.308896794460512) +--(axis cs:66560,0.297028097787964) +--(axis cs:66048,0.293372652324677) +--(axis cs:65536,0.2863516245382) +--(axis cs:65024,0.274704202894895) +--(axis cs:64512,0.256196223428875) +--(axis cs:64000,0.259368793605675) +--(axis cs:63488,0.26118071750197) +--(axis cs:62976,0.24264049054849) +--(axis cs:62464,0.233065366722915) +--(axis cs:61952,0.223490296989401) +--(axis cs:61440,0.215802601279893) +--(axis cs:60928,0.20910252124228) +--(axis cs:60416,0.194595322748604) +--(axis cs:59904,0.175798262827682) +--(axis cs:59392,0.153901196167387) +--(axis 
cs:58880,0.145083437027128) +--(axis cs:58368,0.133856131444717) +--(axis cs:57856,0.124462833845135) +--(axis cs:57344,0.108264984555929) +--(axis cs:56832,0.104400239689767) +--(axis cs:56320,0.0999635509624171) +--(axis cs:55808,0.100120638416843) +--(axis cs:55296,0.0902990643497027) +--(axis cs:54784,0.0798239311596155) +--(axis cs:54272,0.0805493382754014) +--(axis cs:53760,0.0592549867166342) +--(axis cs:53248,0.0645325233675247) +--(axis cs:52736,0.0517809091975652) +--(axis cs:52224,0.0236180171564355) +--(axis cs:51712,0.0199861629518601) +--(axis cs:51200,0.0145892452023847) +--(axis cs:50688,0.0109225517117524) +--(axis cs:50176,0.011907154301795) +--(axis cs:49664,0.0134655127777342) +--(axis cs:49152,0.00385860379428551) +--(axis cs:48640,0.00126824243825072) +--(axis cs:48128,-0.0100208386902547) +--(axis cs:47616,-0.0247992087349104) +--(axis cs:47104,-0.0360269280891002) +--(axis cs:46592,-0.0420900559312653) +--(axis cs:46080,-0.0566352703057523) +--(axis cs:45568,-0.0622770705489996) +--(axis cs:45056,-0.0755857650199356) +--(axis cs:44544,-0.0835432218269684) +--(axis cs:44032,-0.0713564886854318) +--(axis cs:43520,-0.0784775410985369) +--(axis cs:43008,-0.0630342018866565) +--(axis cs:42496,-0.0745402077072769) +--(axis cs:41984,-0.0784348817145472) +--(axis cs:41472,-0.0858482932469579) +--(axis cs:40960,-0.0903558927213386) +--(axis cs:40448,-0.0947851705484306) +--(axis cs:39936,-0.1124550993509) +--(axis cs:39424,-0.128224090962066) +--(axis cs:38912,-0.138616664566068) +--(axis cs:38400,-0.159091023154454) +--(axis cs:37888,-0.180273115928702) +--(axis cs:37376,-0.177314710648461) +--(axis cs:36864,-0.163917142366247) +--(axis cs:36352,-0.170140309731611) +--(axis cs:35840,-0.156478238751526) +--(axis cs:35328,-0.159024612272715) +--(axis cs:34816,-0.158736606519132) +--(axis cs:34304,-0.162287047358015) +--(axis cs:33792,-0.15520823422246) +--(axis cs:33280,-0.158380948188508) +--(axis cs:32768,-0.167720343824408) +--(axis cs:32256,-0.163376086904987) +--(axis cs:31744,-0.174385164785495) +--(axis cs:31232,-0.180131022131045) +--(axis cs:30720,-0.200047765526466) +--(axis cs:30208,-0.201326612980255) +--(axis cs:29696,-0.204208724144052) +--(axis cs:29184,-0.205955082920037) +--(axis cs:28672,-0.211670819053869) +--(axis cs:28160,-0.207178801086514) +--(axis cs:27648,-0.218144606715988) +--(axis cs:27136,-0.218374023798211) +--(axis cs:26624,-0.232584564874105) +--(axis cs:26112,-0.224540322157387) +--(axis cs:25600,-0.235199670703821) +--(axis cs:25088,-0.239504067620407) +--(axis cs:24576,-0.248989531640186) +--(axis cs:24064,-0.23557101660526) +--(axis cs:23552,-0.2366186023337) +--(axis cs:23040,-0.239165166462852) +--(axis cs:22528,-0.236741855481721) +--(axis cs:22016,-0.231557203726179) +--(axis cs:21504,-0.223113983241813) +--(axis cs:20992,-0.228960719111069) +--(axis cs:20480,-0.235142052900016) +--(axis cs:19968,-0.227629602930163) +--(axis cs:19456,-0.222892660521773) +--(axis cs:18944,-0.224016309987811) +--(axis cs:18432,-0.22933286624089) +--(axis cs:17920,-0.242318928205424) +--(axis cs:17408,-0.249628729246964) +--(axis cs:16896,-0.243096775311484) +--(axis cs:16384,-0.232701978686065) +--(axis cs:15872,-0.233642892672769) +--(axis cs:15360,-0.225504757883103) +--(axis cs:14848,-0.230306397744162) +--(axis cs:14336,-0.219455415717533) +--(axis cs:13824,-0.23396962851732) +--(axis cs:13312,-0.235955620683949) +--(axis cs:12800,-0.241062689315769) +--(axis cs:12288,-0.264371626707968) +--(axis cs:11776,-0.277331494045092) +--(axis 
cs:11264,-0.276772386363745) +--(axis cs:10752,-0.298155011744374) +--(axis cs:10240,-0.281248905690948) +--(axis cs:9728,-0.279085375243066) +--(axis cs:9216,-0.256843507098444) +--(axis cs:8704,-0.250247135952781) +--(axis cs:8192,-0.258139916468848) +--(axis cs:7680,-0.246082121759788) +--(axis cs:7168,-0.270189839894166) +--(axis cs:6656,-0.286996765224228) +--(axis cs:6144,-0.308269636584618) +--(axis cs:5632,-0.332495377700076) +--(axis cs:5120,-0.325037104340596) +--(axis cs:4608,-0.324030998743944) +--(axis cs:4096,-0.310737464878392) +--(axis cs:3584,-0.277455676933567) +--(axis cs:3072,-0.247456073939181) +--(axis cs:2560,-0.187238741401997) +--(axis cs:2048,-0.129340684999517) +--(axis cs:1536,-0.0256835370774362) +--(axis cs:1024,-0.0390980740860079) +--(axis cs:512,-0.0334432687843966) +--cycle; + +\addplot [semithick, teal] +table {% +512 -0.0547393560409546 +1024 -0.0823312997817993 +1536 -0.0431207418441772 +2048 -0.146269917488098 +2560 -0.20812201499939 +3072 -0.264338850975037 +3584 -0.292925834655762 +4096 -0.333457469940186 +4608 -0.341741681098938 +5120 -0.345189690589905 +5632 -0.354732871055603 +6144 -0.324785590171814 +6656 -0.303073287010193 +7168 -0.285414814949036 +7680 -0.264638781547546 +8192 -0.275526404380798 +8704 -0.268984913825989 +9216 -0.2791348695755 +9728 -0.302735447883606 +10240 -0.306705713272095 +10752 -0.320430159568787 +11264 -0.294166088104248 +11776 -0.291059970855713 +12288 -0.279739260673523 +12800 -0.259834408760071 +13312 -0.255795240402222 +13824 -0.248074293136597 +14336 -0.236163258552551 +15360 -0.240111112594604 +15872 -0.248334407806396 +16384 -0.250508666038513 +16896 -0.263281226158142 +17408 -0.27454149723053 +17920 -0.260948777198792 +18432 -0.255580425262451 +18944 -0.24797534942627 +19456 -0.242233037948608 +19968 -0.249489784240723 +20480 -0.258705854415894 +20992 -0.257127165794373 +21504 -0.2546706199646 +22016 -0.261195421218872 +22528 -0.258854269981384 +23040 -0.257617950439453 +23552 -0.253701329231262 +24064 -0.254349589347839 +24576 -0.264011383056641 +25088 -0.253310441970825 +25600 -0.255931377410889 +26112 -0.251568913459778 +26624 -0.258355140686035 +27136 -0.244389057159424 +27648 -0.239879131317139 +28160 -0.231515526771545 +28672 -0.23732578754425 +29184 -0.239759087562561 +29696 -0.235602140426636 +30208 -0.22663426399231 +30720 -0.221810579299927 +31232 -0.200907826423645 +31744 -0.193646669387817 +32256 -0.181507587432861 +32768 -0.186582922935486 +33280 -0.183706641197205 +34304 -0.184460878372192 +34816 -0.178372025489807 +35328 -0.173999071121216 +35840 -0.173424482345581 +36352 -0.183943271636963 +36864 -0.183749079704285 +37376 -0.197792053222656 +37888 -0.196535587310791 +38400 -0.184913873672485 +38912 -0.163633704185486 +39936 -0.136580467224121 +40448 -0.122846484184265 +40960 -0.116090297698975 +41472 -0.111932039260864 +41984 -0.0980607271194458 +42496 -0.0885612964630127 +43008 -0.0801141262054443 +43520 -0.0963784456253052 +44032 -0.0925323963165283 +44544 -0.0985479354858398 +45056 -0.0925321578979492 +45568 -0.0766127109527588 +46080 -0.0713281631469727 +46592 -0.0569355487823486 +47104 -0.051266074180603 +47616 -0.0396736860275269 +48128 -0.0326994657516479 +48640 -0.026705265045166 +49152 -0.0318373441696167 +49664 -0.0183017253875732 +50176 -0.0233391523361206 +50688 -0.0206723213195801 +51200 -0.014988899230957 +51712 -0.0137444734573364 +52224 -0.0108268260955811 +52736 0.0163931846618652 +53248 0.0284018516540527 +53760 0.0233001708984375 +54272 0.0380173921585083 +54784 0.038432240486145 
+55296 0.0497528314590454 +55808 0.0645583868026733 +56320 0.0601599216461182 +56832 0.0685900449752808 +57344 0.0719174146652222 +57856 0.0902788639068604 +58368 0.101104497909546 +58880 0.112989068031311 +59392 0.12042498588562 +59904 0.143846750259399 +60416 0.162208318710327 +60928 0.172208070755005 +61440 0.178309321403503 +61952 0.185569405555725 +62976 0.202249765396118 +63488 0.217846632003784 +64000 0.219654202461243 +64512 0.219193935394287 +65024 0.232264876365662 +65536 0.242718935012817 +66048 0.254954099655151 +66560 0.255724310874939 +67584 0.270429491996765 +68096 0.2747802734375 +68608 0.281273126602173 +69120 0.299422264099121 +70144 0.319514870643616 +70656 0.333646655082703 +71168 0.345042943954468 +71680 0.361032485961914 +72192 0.374743103981018 +72704 0.387178182601929 +73216 0.38981032371521 +73728 0.405115485191345 +74752 0.424403429031372 +75776 0.44278359413147 +76288 0.445055603981018 +76800 0.448732376098633 +77312 0.458953142166138 +77824 0.474031805992126 +78336 0.487698912620544 +78848 0.499677300453186 +79360 0.503809452056885 +79872 0.5169837474823 +80384 0.532287359237671 +81408 0.550852298736572 +81920 0.554415941238403 +82432 0.576215505599976 +82944 0.588966131210327 +83456 0.603253245353699 +83968 0.610467910766602 +84480 0.622863173484802 +84992 0.629390835762024 +85504 0.641058921813965 +86016 0.655249118804932 +86528 0.67233681678772 +87040 0.683757305145264 +87552 0.698271512985229 +88064 0.7055504322052 +89088 0.738287925720215 +89600 0.746217131614685 +90112 0.755300998687744 +91136 0.778040885925293 +91648 0.792309284210205 +92160 0.809940457344055 +92672 0.822168111801147 +93184 0.832845449447632 +93696 0.847905397415161 +94720 0.871586799621582 +95232 0.878954410552979 +96256 0.91041088104248 +96768 0.926054954528809 +97280 0.948409795761108 +97792 0.969474077224731 +98304 0.993341445922852 +98816 1.0140243768692 +99328 1.03715240955353 +99840 1.058309674263 +100352 1.08159971237183 +100864 1.09996855258942 +101888 1.133908867836 +103424 1.17851090431213 +103936 1.13246297836304 +104448 1.14280772209167 +104960 1.15117990970612 +105472 1.15704381465912 +106496 1.17148077487946 +107008 1.18129217624664 +107520 1.18198251724243 +108032 1.18750536441803 +108544 1.19953942298889 +109056 1.20456767082214 +109568 1.2196272611618 +110080 1.22759222984314 +110592 1.23742842674255 +111104 1.24459946155548 +111616 1.25754499435425 +112128 1.26954650878906 +113152 1.28689312934875 +113664 1.28941011428833 +114176 1.30314826965332 +114688 1.31584548950195 +115200 1.32158160209656 +116224 1.32551562786102 +116736 1.33681392669678 +117248 1.34403264522552 +117760 1.34993612766266 +118272 1.352578997612 +118784 1.363605260849 +119296 1.37375235557556 +119808 1.37162292003632 +120320 1.37697267532349 +121856 1.38628375530243 +122368 1.39249837398529 +122880 1.39557206630707 +123392 1.39976632595062 +123904 1.40227508544922 +124416 1.40612125396729 +124928 1.4080502986908 +125440 1.41208112239838 +126464 1.42387080192566 +126976 1.42381203174591 +127488 1.42619657516479 +128000 1.43014430999756 +128512 1.43243134021759 +129024 1.43081629276276 +130048 1.4401878118515 +130560 1.44738912582397 +131072 1.44913375377655 +131584 1.44661450386047 +132096 1.44237649440765 +132608 1.44354498386383 +133120 1.44852757453918 +133632 1.44317746162415 +134144 1.44975876808167 +134656 1.44949889183044 +135168 1.44798350334167 +136192 1.44877123832703 +136704 1.44198501110077 +137216 1.44747221469879 +137728 1.44063985347748 +138240 1.44152641296387 +138752 1.44406771659851 
+139264 1.44177329540253 +139776 1.45102214813232 +140288 1.45730042457581 +140800 1.45918655395508 +141312 1.45797729492188 +141824 1.46637010574341 +142336 1.46974730491638 +142848 1.46697819232941 +143360 1.45801770687103 +143872 1.46168434619904 +144384 1.46357131004333 +144896 1.45913541316986 +145408 1.45102679729462 +145920 1.45610272884369 +146432 1.45704412460327 +146944 1.45668697357178 +147456 1.4546834230423 +147968 1.44571697711945 +148480 1.44099831581116 +148992 1.44185423851013 +149504 1.43661034107208 +150016 1.42996644973755 +151040 1.42391967773438 +151552 1.42308557033539 +152576 1.418417096138 +153088 1.40559613704681 +153600 1.40375673770905 +154624 1.40938818454742 +155136 1.40884983539581 +155648 1.40566539764404 +156160 1.41096723079681 +156672 1.41952276229858 +157696 1.43140041828156 +159232 1.43093848228455 +159744 1.42925953865051 +160256 1.43287038803101 +161280 1.42538130283356 +161792 1.42534446716309 +162304 1.41855907440186 +162816 1.42317759990692 +163328 1.42463433742523 +163840 1.42947769165039 +164352 1.42987370491028 +164864 1.43464398384094 +165376 1.43658697605133 +165888 1.44100594520569 +166400 1.43696236610413 +166912 1.4412305355072 +167424 1.43906724452972 +167936 1.43225848674774 +168448 1.43155825138092 +168960 1.43335711956024 +169472 1.4429692029953 +169984 1.44089448451996 +170496 1.4463928937912 +171008 1.45340013504028 +171520 1.45020699501038 +172032 1.43999528884888 +172544 1.44233000278473 +173056 1.44120693206787 +173568 1.45299601554871 +174080 1.44965827465057 +174592 1.45390772819519 +175104 1.46045994758606 +175616 1.46301579475403 +176128 1.46346080303192 +176640 1.47286689281464 +177152 1.47781193256378 +177664 1.48144912719727 +178176 1.48239696025848 +178688 1.4896651506424 +179200 1.48698961734772 +179712 1.49393832683563 +180224 1.49908065795898 +180736 1.50596630573273 +181248 1.51459276676178 +181760 1.52633249759674 +182272 1.52996730804443 +182784 1.536789894104 +183296 1.53705883026123 +183808 1.54342889785767 +184832 1.55201852321625 +185344 1.55210304260254 +185856 1.55569779872894 +186368 1.55827271938324 +186880 1.56779968738556 +187904 1.58074867725372 +189440 1.59501814842224 +189952 1.60558426380157 +190464 1.6058292388916 +191488 1.61814737319946 +192000 1.62192487716675 +192512 1.62215316295624 +193024 1.62688112258911 +193536 1.6226361989975 +194560 1.635333776474 +195072 1.63769257068634 +195584 1.64171409606934 +196608 1.64605677127838 +197632 1.65278458595276 +198144 1.65897130966187 +198656 1.67077398300171 +199168 1.67845797538757 +199680 1.68210935592651 +200192 1.68468487262726 +201216 1.69426274299622 +201728 1.69815909862518 +202240 1.69679725170135 +202752 1.69795215129852 +203264 1.70412993431091 +203776 1.7090265750885 +204288 1.7149144411087 +204800 1.71774005889893 +205824 1.72671806812286 +206336 1.69752156734467 +206848 1.70265197753906 +207360 1.70098578929901 +207872 1.70360553264618 +208384 1.69578504562378 +208896 1.69608473777771 +209920 1.69130575656891 +210944 1.69091784954071 +211456 1.69067692756653 +211968 1.69212579727173 +212480 1.69164776802063 +212992 1.68785631656647 +213504 1.68699491024017 +214528 1.68049168586731 +215040 1.67504620552063 +215552 1.67296183109283 +216064 1.66647565364838 +216576 1.66765511035919 +217088 1.66225302219391 +218112 1.66302466392517 +218624 1.65652573108673 +219136 1.65326917171478 +219648 1.65100085735321 +220160 1.6518874168396 +221184 1.64444243907928 +221696 1.64168381690979 +222720 1.64587306976318 +223232 1.64368140697479 +223744 
1.64430296421051 +224256 1.64790534973145 +224768 1.64227116107941 +225792 1.64190948009491 +226304 1.64509809017181 +226816 1.64079773426056 +227840 1.63465321063995 +228352 1.63590919971466 +229376 1.63148736953735 +229888 1.63419759273529 +230400 1.63198935985565 +230912 1.63327193260193 +231424 1.63317131996155 +231936 1.63033664226532 +232448 1.62962806224823 +232960 1.62241423130035 +233472 1.62268424034119 +233984 1.62471163272858 +234496 1.62303340435028 +235008 1.62227582931519 +236032 1.61446297168732 +236544 1.60881388187408 +238592 1.61218321323395 +239104 1.61740303039551 +239616 1.61432325839996 +240128 1.62078714370728 +240640 1.62097918987274 +241664 1.62475609779358 +242176 1.62437772750854 +242688 1.62640345096588 +243200 1.62196946144104 +243712 1.6245950460434 +244736 1.61619627475739 +245248 1.61542057991028 +245760 1.61242341995239 +246272 1.60359156131744 +246784 1.59895265102386 +247296 1.60434937477112 +247808 1.60096323490143 +248320 1.59903919696808 +248832 1.59431302547455 +249344 1.58820188045502 +249856 1.58443939685822 +250880 1.57468616962433 +251392 1.57466304302216 +251904 1.56802332401276 +252416 1.56649589538574 +252928 1.56814360618591 +253440 1.5685168504715 +253952 1.56756138801575 +254464 1.56377136707306 +254976 1.5699714422226 +255488 1.57232809066772 +256000 1.56588041782379 +256512 1.55859208106995 +257024 1.55357909202576 +257536 1.55148899555206 +258048 1.54445588588715 +258560 1.54147577285767 +259072 1.53421854972839 +259584 1.53186583518982 +260096 1.53225994110107 +260608 1.53037595748901 +261120 1.53133654594421 +261632 1.52688014507294 +262144 1.52870917320251 +262656 1.52285182476044 +263168 1.52695262432098 +263680 1.52295160293579 +264192 1.51991152763367 +264704 1.52066719532013 +265216 1.52051973342896 +265728 1.51901459693909 +266240 1.51408708095551 +266752 1.51991665363312 +267264 1.5236474275589 +267776 1.52415525913239 +268800 1.52837347984314 +269312 1.52426648139954 +269824 1.52920114994049 +270336 1.53030431270599 +270848 1.52859568595886 +271360 1.52397155761719 +271872 1.52109587192535 +272384 1.51938951015472 +272896 1.51999258995056 +273408 1.52879655361176 +273920 1.53058648109436 +274432 1.53324055671692 +274944 1.53099989891052 +275968 1.54264450073242 +276480 1.54423475265503 +276992 1.54862797260284 +277504 1.54980707168579 +278016 1.552170753479 +278528 1.55002284049988 +279040 1.55560684204102 +279552 1.55624139308929 +280064 1.55511736869812 +280576 1.55526089668274 +281088 1.54989588260651 +281600 1.55319631099701 +282624 1.55131459236145 +283136 1.55838167667389 +283648 1.56377255916595 +284160 1.57069420814514 +284672 1.57624971866608 +285184 1.58650934696198 +285696 1.59304165840149 +286208 1.59763848781586 +286720 1.59981858730316 +287232 1.59966540336609 +287744 1.60360455513 +288768 1.61387264728546 +289280 1.610844373703 +289792 1.6131364107132 +290304 1.61279273033142 +290816 1.61483025550842 +291328 1.6182736158371 +291840 1.62346363067627 +292352 1.62227094173431 +292864 1.62563800811768 +293376 1.62805473804474 +293888 1.6284042596817 +294400 1.63561415672302 +294912 1.64138388633728 +295936 1.65105438232422 +296448 1.65967679023743 +296960 1.66714406013489 +297472 1.67205429077148 +297984 1.67093503475189 +298496 1.6732724905014 +299008 1.67325484752655 +300032 1.67794001102448 +300544 1.68027138710022 +301056 1.68515586853027 +301568 1.68846833705902 +302592 1.70039856433868 +303104 1.70629620552063 +303616 1.71464788913727 +304640 1.72262358665466 +305152 1.72352170944214 +305664 1.72539591789246 
+306176 1.72591960430145 +306688 1.73025989532471 +307200 1.72965598106384 +307712 1.73105871677399 +308224 1.73785793781281 +308736 1.72693467140198 +309248 1.73033154010773 +309760 1.7289971113205 +310272 1.73252391815186 +310784 1.73146629333496 +311296 1.72850978374481 +311808 1.72937262058258 +312320 1.73260402679443 +312832 1.7310357093811 +313344 1.73438942432404 +313856 1.73286736011505 +314368 1.72811615467072 +314880 1.72727239131927 +315904 1.72250092029572 +316416 1.7227098941803 +316928 1.71917366981506 +317440 1.71931636333466 +317952 1.71538364887238 +318464 1.7148814201355 +319488 1.70213460922241 +320512 1.70330095291138 +321024 1.70470833778381 +321536 1.69795429706573 +322048 1.69959759712219 +322560 1.69628608226776 +323072 1.69179582595825 +323584 1.6889466047287 +324608 1.69444417953491 +325120 1.69991183280945 +325632 1.70279490947723 +326144 1.70369696617126 +326656 1.70160925388336 +327168 1.70765340328217 +327680 1.70749616622925 +328192 1.70493125915527 +328704 1.70522224903107 +329216 1.70441257953644 +329728 1.70447158813477 +330240 1.70992004871368 +330752 1.70767724514008 +331776 1.69965815544128 +333824 1.69859158992767 +334336 1.69623732566833 +334848 1.69206690788269 +335360 1.68997204303741 +335872 1.69068384170532 +336384 1.68915772438049 +336896 1.68975555896759 +338432 1.68513989448547 +338944 1.6860466003418 +339456 1.68983638286591 +339968 1.69179821014404 +340480 1.6924991607666 +340992 1.68891644477844 +341504 1.68802177906036 +342016 1.68505501747131 +342528 1.68015515804291 +343040 1.67810344696045 +343552 1.67723846435547 +344576 1.69357323646545 +345088 1.69135272502899 +346112 1.69103717803955 +346624 1.69196128845215 +347136 1.69529783725739 +347648 1.69978845119476 +348160 1.69162130355835 +348672 1.68678426742554 +349184 1.68409860134125 +349696 1.67720758914948 +350208 1.67634809017181 +350720 1.67089545726776 +351232 1.66861057281494 +351744 1.66347289085388 +352256 1.66252613067627 +352768 1.65848660469055 +353280 1.65691745281219 +353792 1.65309083461761 +354304 1.65689301490784 +354816 1.65455257892609 +355328 1.65121161937714 +356352 1.64958786964417 +356864 1.64489948749542 +357376 1.64241278171539 +357888 1.64458072185516 +358400 1.63988387584686 +359424 1.63458096981049 +359936 1.62703418731689 +360448 1.62245309352875 +361472 1.61838853359222 +361984 1.61668980121613 +362496 1.6119989156723 +363008 1.60851776599884 +363520 1.60934710502625 +364032 1.60387766361237 +364544 1.60909867286682 +365056 1.60556745529175 +365568 1.60399270057678 +366080 1.60918533802032 +366592 1.61117136478424 +367104 1.60677647590637 +367616 1.61057078838348 +368128 1.61122822761536 +368640 1.61312329769135 +369152 1.61118280887604 +369664 1.61366999149323 +370176 1.61429727077484 +370688 1.61343657970428 +371200 1.60862290859222 +372224 1.61265790462494 +372736 1.60865938663483 +373248 1.61191248893738 +373760 1.60463726520538 +374272 1.6066552400589 +374784 1.60605406761169 +375296 1.60684049129486 +375808 1.60609078407288 +376320 1.60661947727203 +376832 1.60530424118042 +377344 1.59990441799164 +377856 1.5989785194397 +378368 1.59610557556152 +378880 1.60259425640106 +379392 1.60595321655273 +379904 1.61246681213379 +380416 1.61545050144196 +380928 1.61335217952728 +381440 1.6092221736908 +381952 1.6145191192627 +382464 1.61057960987091 +382976 1.61355865001678 +383488 1.61298418045044 +384000 1.60987281799316 +384512 1.61097192764282 +385024 1.60813307762146 +385536 1.61016237735748 +386048 1.61558246612549 +386560 1.61723625659943 +387072 
1.62102377414703 +387584 1.62276840209961 +388096 1.6217702627182 +389120 1.62800395488739 +389632 1.62256503105164 +390144 1.62206614017487 +390656 1.62476229667664 +391168 1.63139247894287 +391680 1.63537776470184 +392192 1.64194786548615 +392704 1.64713215827942 +393216 1.65087735652924 +393728 1.65175938606262 +394240 1.65718185901642 +394752 1.66052281856537 +395264 1.66885817050934 +395776 1.67140996456146 +396288 1.67231869697571 +397312 1.67676043510437 +397824 1.68194723129272 +398336 1.68360006809235 +398848 1.6818790435791 +399360 1.68687856197357 +399872 1.69301867485046 +400384 1.69578528404236 +400896 1.70083689689636 +401408 1.7017560005188 +401920 1.70456790924072 +402432 1.7113801240921 +403968 1.72855961322784 +404480 1.73493385314941 +404992 1.73734772205353 +405504 1.73874592781067 +406016 1.73916530609131 +406528 1.74418711662292 +407040 1.74545359611511 +407552 1.74996888637543 +408064 1.75310277938843 +408576 1.75724351406097 +409600 1.76913368701935 +410112 1.76881265640259 +410624 1.77169251441956 +411136 1.76702678203583 +411648 1.76570224761963 +412160 1.7673362493515 +412672 1.76372563838959 +413184 1.76319897174835 +413696 1.76098418235779 +414208 1.76751518249512 +414720 1.76591360569 +415232 1.76724851131439 +415744 1.76190364360809 +416256 1.76036942005157 +417280 1.75899887084961 +417792 1.75500774383545 +418304 1.75671219825745 +418816 1.75298595428467 +419328 1.74812686443329 +420352 1.7464005947113 +421888 1.73711216449738 +422400 1.73661422729492 +423424 1.73086988925934 +423936 1.72662043571472 +424448 1.72884476184845 +424960 1.73020613193512 +425472 1.72883498668671 +425984 1.73284983634949 +426496 1.73002457618713 +427008 1.73115241527557 +427520 1.73607432842255 +428032 1.73156630992889 +428544 1.73315465450287 +429056 1.73079013824463 +429568 1.73394310474396 +430080 1.73489165306091 +430592 1.73217272758484 +431104 1.73200416564941 +431616 1.72581553459167 +432128 1.72305154800415 +432640 1.72546565532684 +433152 1.72673857212067 +433664 1.72044324874878 +434176 1.71970617771149 +434688 1.72185122966766 +436224 1.70883631706238 +437248 1.70449113845825 +437760 1.70444512367249 +438272 1.70272552967072 +439296 1.7039270401001 +439808 1.70698213577271 +440320 1.70728707313538 +440832 1.7020525932312 +441344 1.70293986797333 +441856 1.70680260658264 +442368 1.70882773399353 +442880 1.70928263664246 +443392 1.71084976196289 +443904 1.7055332660675 +446464 1.69189786911011 +446976 1.68970966339111 +447488 1.69246125221252 +448000 1.68451535701752 +448512 1.68629968166351 +449024 1.68601512908936 +449536 1.68708455562592 +450048 1.68323075771332 +451072 1.6713193655014 +451584 1.66952764987946 +452096 1.67084658145905 +452608 1.66943073272705 +453120 1.66436052322388 +453632 1.66644859313965 +454144 1.66689133644104 +454656 1.66644954681396 +455168 1.66417992115021 +455680 1.66278660297394 +456192 1.65949368476868 +456704 1.65947866439819 +457216 1.65730822086334 +458240 1.66219019889832 +459776 1.65299391746521 +460288 1.65310299396515 +460800 1.65033042430878 +461312 1.65161979198456 +461824 1.64889252185822 +462336 1.64873254299164 +462848 1.64513838291168 +463360 1.64680528640747 +463872 1.64544308185577 +464384 1.64562022686005 +464896 1.6405702829361 +465920 1.64045596122742 +466432 1.6427937746048 +466944 1.63797724246979 +467456 1.63852381706238 +468480 1.63621032238007 +468992 1.6408280134201 +470528 1.63692224025726 +471040 1.63597357273102 +471552 1.63337290287018 +472064 1.62850403785706 +472576 1.62830901145935 +473088 1.62455749511719 
+473600 1.62300276756287 +474112 1.61949670314789 +474624 1.6109231710434 +475136 1.60734784603119 +475648 1.60910451412201 +476160 1.61177182197571 +476672 1.61016809940338 +477184 1.6124621629715 +477696 1.61094582080841 +478208 1.61220026016235 +479232 1.6174488067627 +480256 1.6144015789032 +480768 1.61433696746826 +481280 1.61659610271454 +481792 1.61667084693909 +482816 1.62561130523682 +483328 1.62570858001709 +484352 1.63262975215912 +484864 1.63896882534027 +485376 1.64040219783783 +485888 1.639115691185 +486400 1.6426477432251 +487424 1.64767563343048 +487936 1.64903700351715 +488448 1.64740097522736 +488960 1.65098524093628 +490496 1.65447282791138 +491008 1.65104615688324 +491520 1.65155875682831 +492032 1.64907491207123 +492544 1.65338385105133 +493056 1.65565299987793 +493568 1.65199959278107 +494080 1.65746891498566 +494592 1.66454041004181 +495104 1.67247462272644 +497152 1.68463349342346 +497664 1.68928301334381 +498176 1.69080317020416 +499200 1.68329787254333 +499712 1.68219709396362 +500224 1.68816137313843 +500736 1.68833565711975 +501248 1.68758881092072 +501760 1.69501161575317 +502272 1.69785583019257 +502784 1.70500338077545 +}; +\end{axis} + +\end{tikzpicture} diff --git a/thesis.tex b/thesis.tex index 961f134..8c8dda4 100644 --- a/thesis.tex +++ b/thesis.tex @@ -34,27 +34,31 @@ \pgfplotsset{compat=newest} \usepgfplotslibrary{groupplots} -%******************************************************************** +% ======================================= % Note: Make all your adjustments in here -%******************************************************* +% ======================================= + \input{classicthesis-config} -%******************************************************************** +% ============== % Bibliographies -%******************************************************* +% ============== + \addbibresource{zotero.bib} -%******************************************************************** +% =========== % Hyphenation -%******************************************************* +% =========== + %\hyphenation{put special hyphenation here} % Sequential numbering document-wise of footnotes \counterwithout{footnote}{chapter} -% ******************************************************************** +% ================== % GO!GO!GO! MOVE IT! 
-%******************************************************* +% ================== + \begin{document} \frenchspacing \raggedbottom @@ -65,9 +69,9 @@ \pagenumbering{arabic} \pagestyle{plain} -%******************************************************************** +% =========== % Frontmatter -%******************************************************* +% =========== %\include{FrontBackmatter/dirty_titlepage} \include{FrontBackmatter/titlepage} @@ -81,9 +85,9 @@ \clearpage\include{FrontBackmatter/copyright} % \cleardoublepage\include{FrontBackmatter/acknowledgments} -%******************************************************************** +% ========== % Mainmatter -%******************************************************* +% ========== \cleardoublepage \pagestyle{scrheadings} @@ -105,9 +109,9 @@ \part{Contribution}\label{part:contribution} \include{Chapters/Part_2/chapter_7} \include{Chapters/Part_2/chapter_8} -% ******************************************************************** +% ========== % Backmatter -%******************************************************* +% ========== \cleardoublepage\include{Chapters/epilogue} \cleardoublepage\include{FrontBackmatter/bibliography} @@ -119,9 +123,8 @@ \part{Contribution}\label{part:contribution} \part{Appendix} \include{Chapters/appendix} -% ******************************************************************** +% ===================================== % Game Over: Restore, Restart, or Quit? -%******************************************************* +% ===================================== \end{document} -% ******************************************************************** diff --git a/zotero.bib b/zotero.bib index 78eb035..79b287f 100644 --- a/zotero.bib +++ b/zotero.bib @@ -3090,3 +3090,99 @@ @inproceedings{todorov_convex_2014 pages = {6054--6061}, file = {Todorov - 2014 - Convex and analytically-invertible dynamics with c.pdf:/home/dferigo/Zotero/storage/P5DDZM5I/Todorov - 2014 - Convex and analytically-invertible dynamics with c.pdf:application/pdf}, } + +@article{raffin_stable-baselines3_2021, + title = {Stable-{Baselines3}: {Reliable} {Reinforcement} {Learning} {Implementations}}, + volume = {22}, + issn = {1533-7928}, + shorttitle = {Stable-{Baselines3}}, + url = {http://jmlr.org/papers/v22/20-1364.html}, + abstract = {Stable-Baselines3 provides open-source implementations of deep reinforcement learning (RL) algorithms in Python. The implementations have been benchmarked against reference codebases, and automated unit tests cover 95\% of the code. The algorithms follow a consistent interface and are accompanied by extensive documentation, making it simple to train and compare different RL algorithms. Our documentation, examples, and source-code are available at https://github.com/DLR-RM/stable-baselines3.}, + number = {268}, + urldate = {2023-05-26}, + journal = {Journal of Machine Learning Research}, + author = {Raffin, Antonin and Hill, Ashley and Gleave, Adam and Kanervisto, Anssi and Ernestus, Maximilian and Dormann, Noah}, + year = {2021}, + pages = {1--8}, + file = {Full Text PDF:/home/dferigo/Zotero/storage/VYS4Q5C5/Raffin et al. 
- 2021 - Stable-Baselines3 Reliable Reinforcement Learning.pdf:application/pdf;Source Code:/home/dferigo/Zotero/storage/5FJR88C5/stable-baselines3.html:text/html}, +} + +@misc{kingma_adam_2017, + title = {Adam: {A} {Method} for {Stochastic} {Optimization}}, + shorttitle = {Adam}, + url = {http://arxiv.org/abs/1412.6980}, + doi = {10.48550/arXiv.1412.6980}, + abstract = {We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.}, + urldate = {2023-05-26}, + publisher = {arXiv}, + author = {Kingma, Diederik P. and Ba, Jimmy}, + month = jan, + year = {2017}, + note = {arXiv:1412.6980 [cs]}, + keywords = {Computer Science - Machine Learning}, + file = {arXiv Fulltext PDF:/home/dferigo/Zotero/storage/S4R6CY99/Kingma and Ba - 2017 - Adam A Method for Stochastic Optimization.pdf:application/pdf;arXiv.org Snapshot:/home/dferigo/Zotero/storage/SPL9QAXB/1412.html:text/html}, +} + +@article{salvato_crossing_2021, + title = {Crossing the {Reality} {Gap}: {A} {Survey} on {Sim}-to-{Real} {Transferability} of {Robot} {Controllers} in {Reinforcement} {Learning}}, + volume = {9}, + issn = {2169-3536}, + shorttitle = {Crossing the {Reality} {Gap}}, + doi = {10.1109/ACCESS.2021.3126658}, + abstract = {The growing demand for robots able to act autonomously in complex scenarios has widely accelerated the introduction of Reinforcement Learning (RL) in robots control applications. However, the trial and error intrinsic nature of RL may result in long training time on real robots and, moreover, it may lead to dangerous outcomes. While simulators are useful tools to accelerate RL training and to ensure safety, they often are provided only with an approximated model of robot dynamics and of its interaction with the surrounding environment, thus resulting in what is called the reality gap (RG): a mismatch of simulated and real control-law performances caused by the inaccurate representation of the real environment in simulation. The most undesirable result occurs when the controller learnt in simulation fails the task on the real robot, thus resulting in an unsuccessful sim-to-real transfer. 
The goal of the present survey is threefold: (1) to identify the main approaches to face the RG problem in the context of robot control with RL, (2) to point out their shortcomings, and (3) to outline new potential research areas.}, + journal = {IEEE Access}, + author = {Salvato, Erica and Fenu, Gianfranco and Medvet, Eric and Pellegrino, Felice Andrea}, + year = {2021}, + note = {Conference Name: IEEE Access}, + keywords = {Faces, Process control, Reality gap, reinforcement learning, Reinforcement learning, Robot control, robotics, sim-to-real, Task analysis, Training}, + pages = {153171--153187}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/HC4H6JDQ/stamp.html:text/html;IEEE Xplore Full Text PDF:/home/dferigo/Zotero/storage/W7KM2P45/Salvato et al. - 2021 - Crossing the Reality Gap A Survey on Sim-to-Real .pdf:application/pdf}, +} + +@misc{bellegarda_robust_2021-1, + title = {Robust {Quadruped} {Jumping} via {Deep} {Reinforcement} {Learning}}, + url = {http://arxiv.org/abs/2011.07089}, + doi = {10.48550/arXiv.2011.07089}, + abstract = {In this paper we consider a general task of jumping varying distances and heights for a quadrupedal robot in noisy environments, such as off of uneven terrain and with variable robot dynamics parameters. To accurately jump in such conditions, we propose a framework using deep reinforcement learning to leverage the complex solution of nonlinear trajectory optimization for quadrupedal jumping. While the standalone optimization limits jumping to take-off from flat ground and requires accurate assumption of robot dynamics, our proposed approach improves the robustness to allow jumping off of significantly uneven terrain with variable robot dynamical parameters. Through our method, the quadruped is able to jump distances of up to 1 m and heights of up to 0.4 m, while being robust to environment noise of foot disturbances of up to 0.1 m in height as well as with 5\% variability of its body mass and inertia. This behavior is learned through just a few thousand simulated jumps in PyBullet, and we perform a sim-to-sim transfer to Gazebo. Video results can be found at https://youtu.be/jkzvL2o3g-s.}, + urldate = {2023-05-30}, + publisher = {arXiv}, + author = {Bellegarda, Guillaume and Nguyen, Quan}, + month = mar, + year = {2021}, + note = {arXiv:2011.07089 [cs, eess]}, + keywords = {Computer Science - Machine Learning, Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control}, + file = {arXiv Fulltext PDF:/home/dferigo/Zotero/storage/NZQEVQ5U/Bellegarda and Nguyen - 2021 - Robust Quadruped Jumping via Deep Reinforcement Le.pdf:application/pdf;arXiv.org Snapshot:/home/dferigo/Zotero/storage/ZKZEFEYY/2011.html:text/html}, +} + +@inproceedings{du_auto-tuned_2021, + title = {Auto-{Tuned} {Sim}-to-{Real} {Transfer}}, + doi = {10.1109/ICRA48506.2021.9562091}, + abstract = {Policies trained in simulation often fail when transferred to the real world due to the ‘reality gap’ where the simulator is unable to accurately capture the dynamics and visual properties of the real world. Current approaches to tackle this problem, such as domain randomization, require prior knowledge and engineering to determine how much to randomize system parameters in order to learn a policy that is robust to sim-to-real transfer while also not being too conservative. 
We propose a method for automatically tuning simulator system parameters to match the real world using only raw RGB images of the real world without the need to define rewards or estimate state. Our key insight is to reframe the auto-tuning of parameters as a search problem where we iteratively shift the simulation system parameters to approach the real world system parameters. We propose a Search Param Model (SPM) that, given a sequence of observations and actions and a set of system parameters, predicts whether the given parameters are higher or lower than the true parameters used to generate the observations. We evaluate our method on multiple robotic control tasks in both sim-to-sim and sim-to-real transfer, demonstrating significant improvement over naive domain randomization. Project videos at https://yuqingd.github.io/autotuned-sim2real/.}, + booktitle = {2021 {IEEE} {International} {Conference} on {Robotics} and {Automation} ({ICRA})}, + author = {Du, Yuqing and Watkins, Olivia and Darrell, Trevor and Abbeel, Pieter and Pathak, Deepak}, + month = may, + year = {2021}, + note = {ISSN: 2577-087X}, + keywords = {Automation, Conferences, Knowledge engineering, Predictive models, Search problems, Task analysis, Visualization}, + pages = {1290--1296}, + file = {IEEE Xplore Abstract Record:/home/dferigo/Zotero/storage/SHA7TE7Y/stamp.html:text/html;Submitted Version:/home/dferigo/Zotero/storage/NNM4JVB8/Du et al. - 2021 - Auto-Tuned Sim-to-Real Transfer.pdf:application/pdf}, +} + +@article{muratore_robot_2022-1, + title = {Robot {Learning} {From} {Randomized} {Simulations}: {A} {Review}}, + volume = {9}, + issn = {2296-9144}, + shorttitle = {Robot {Learning} {From} {Randomized} {Simulations}}, + url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9038844/}, + doi = {10.3389/frobt.2022.799893}, + abstract = {The rise of deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data. Unfortunately, it is prohibitively expensive to generate such data sets on a physical platform. Therefore, state-of-the-art approaches learn in simulation where data generation is fast as well as inexpensive and subsequently transfer the knowledge to the real robot (sim-to-real). Despite becoming increasingly realistic, all simulators are by construction based on models, hence inevitably imperfect. This raises the question of how simulators can be modified to facilitate learning robot control policies and overcome the mismatch between simulation and reality, often called the “reality gap.” We provide a comprehensive review of sim-to-real research for robotics, focusing on a technique named “domain randomization” which is a method for learning from randomized simulations.}, + urldate = {2023-05-30}, + journal = {Frontiers in Robotics and AI}, + author = {Muratore, Fabio and Ramos, Fabio and Turk, Greg and Yu, Wenhao and Gienger, Michael and Peters, Jan}, + month = apr, + year = {2022}, + pmid = {35494543}, + pmcid = {PMC9038844}, + pages = {799893}, + file = {PubMed Central Full Text PDF:/home/dferigo/Zotero/storage/74G2AAQN/Muratore et al. 
- 2022 - Robot Learning From Randomized Simulations A Revi.pdf:application/pdf}, +} From a54cfa0114d51823b410bfcd7438531398521e45 Mon Sep 17 00:00:00 2001 From: Diego Ferigo Date: Sun, 4 Jun 2023 11:58:59 +0200 Subject: [PATCH 5/5] Update style and formatting --- Chapters/Part_1/chapter_2.tex | 25 +-- Chapters/Part_1/chapter_3.tex | 3 + Chapters/Part_2/chapter_5.tex | 40 ++-- Chapters/Part_2/chapter_6.tex | 119 ++++++------ Chapters/Part_2/chapter_7.tex | 44 ++--- Chapters/Part_2/chapter_8.tex | 178 +++++++++--------- Chapters/prologue.tex | 4 +- FrontBackmatter/contents.tex | 7 +- .../jaxsim_benchmark_algo_anymal.tikz | 10 +- .../chapter_8/jaxsim_benchmark_algo_icub.tikz | 10 +- .../jaxsim_benchmark_algo_panda.tikz | 10 +- latexmkrc | 9 + thesis.tex | 4 + tikzexternalize/dummy.txt | 1 + 14 files changed, 254 insertions(+), 210 deletions(-) create mode 100644 latexmkrc create mode 100644 tikzexternalize/dummy.txt diff --git a/Chapters/Part_1/chapter_2.tex b/Chapters/Part_1/chapter_2.tex index 3613734..a903faf 100644 --- a/Chapters/Part_1/chapter_2.tex +++ b/Chapters/Part_1/chapter_2.tex @@ -942,17 +942,20 @@ \section{Joint Model} where $\ddot{s} \in \realn$ is the joint acceleration. \begin{table} - \centering - \begin{tabular}{lc} - Joint type & Motion subspace \\ - \hline\hline - \addlinespace[1mm] - Revolute & $\subspace = \begin{bmatrix}\zeros_3 \\ \mathbf{a}\end{bmatrix}$ \\ - \addlinespace[1mm] - Prismatic & $\subspace = \begin{bmatrix}\mathbf{a} \\ \zeros_3 \end{bmatrix}$ - \end{tabular} - \caption{List of motion subspaces for the supported 1 \ac{DoF} joints.} - \label{tab:motion_subspaces} +\centering +\caption{List of motion subspaces for the supported 1 \ac{DoF} joints.} +\label{tab:motion_subspaces} +\begin{tblr}{ + colspec={Q[l, m]Q[c, m]}, + row{1} = {font=\bfseries}, +} + \toprule + Joint type & Motion subspace \\ + \midrule + Revolute & $\subspace = \begin{bmatrix}\zeros_3 \\ \mathbf{a}\end{bmatrix}$ \\ + Prismatic & $\subspace = \begin{bmatrix}\mathbf{a} \\ \zeros_3 \end{bmatrix}$ \\ + \bottomrule +\end{tblr} \end{table} \subsubsection{Kinematics and dynamics propagation} diff --git a/Chapters/Part_1/chapter_3.tex b/Chapters/Part_1/chapter_3.tex index a0d8a99..8a2babf 100644 --- a/Chapters/Part_1/chapter_3.tex +++ b/Chapters/Part_1/chapter_3.tex @@ -244,6 +244,7 @@ \subsection{Return} % \end{remark*} +\newpage \subsection{The Reinforcement Learning Problem} \label{section:reinforcement_learning_problem} @@ -278,6 +279,7 @@ \subsection{The Reinforcement Learning Problem} % where $\pi^*$ is the \emph{optimal policy} yielding the maximum return from each visited state along the trajectory. +\newpage \section{Reinforcement Learning Formalism} The previous section provided an informal introduction to the \acl{RL} setting. @@ -650,6 +652,7 @@ \subsection{Generalized Advantage Estimation} We can recognise $\text{GAE}(\gamma, 0) = \hat{A}_t^{(1)}$ and $\text{GAE}(\gamma, 1) = \hat{A}_t^{(\infty)}$. The hyperparameter $0 < \lambda < 1$ balances between these two extremes. +\newpage \subsection{Proximal Policy Optimization} \label{sec:ppo} diff --git a/Chapters/Part_2/chapter_5.tex b/Chapters/Part_2/chapter_5.tex index 363d565..281ff02 100644 --- a/Chapters/Part_2/chapter_5.tex +++ b/Chapters/Part_2/chapter_5.tex @@ -12,6 +12,7 @@ \chapter{Reinforcement Learning Environments for Robotics} Finally, we propose a new framework for developing robotic environments, starting with a description of the design goals, and continuing with the implemented software architecture. 
This new framework will be validated in the next chapter, where it is used to learn a push-recovery policy for balancing a simulated humanoid robot. +\newpage \section{Frameworks providing robotic environments} \subsection{Properties} @@ -136,33 +137,35 @@ \subsection{Existing frameworks} \end{description} -\begin{sidewaystable} +\begin{landscape} +\begin{table} \centering \caption{Comparison of frameworks that provide robotic environments compatible with OpenAI Gym.} \label{tab:comparison} \newcommand{\x}{\ensuremath{\times}} \newcommand{\ck}{\checkmark} - \newcolumntype{Y}{>{\centering\arraybackslash}X} - \centerline{ - \small - \begin{tabularx}{\textwidth}{lYYYYYYYYY} + \begin{tblr}{ + colspec={Q[l,m]Q[c,m]Q[c,m]Q[c,m]Q[c,m]Q[c,m]Q[c,m]Q[c,m]Q[c,m]}, + row{even} = {bg=gray9}, + row{1} = {font=\bfseries}, + } \toprule - Software & Reproducible & Multiple Physics Engines & Photorealistic Rendering & Accelerated & Parallel & Real-Time Compatible & Modular & Open Source \\ - \midrule \rowcolor{black!20} + Software & Reproducible & {Multiple \\Physics Engines} & {Photorealistic\\ Rendering} & Accelerated & Parallel & {Real-Time\\ Compatible} & Modular & Open Source \\ + \midrule OpenAI Robotic Envs & & & & \ck & \ck & & \ck & \ck \\ - Bullet3 Environments & \ck & \ck & & \ck & \ck & & $\sim$ & \ck \\ \rowcolor{black!20} + Bullet3 Environments & \ck & \ck & & \ck & \ck & & $\sim$ & \ck \\ Unity ML-Agents & \ck & \ck & \ck & \ck & \ck & & & \ck \\ - RaiSim & \ck & & \ck & \ck & \ck & & & \\ \rowcolor{black!20} - Jiminy & \ck & & & \ck & \ck & & & \ck \\ - PyRep & \ck & \ck & \ck & \ck & \ck & & & \ck \\ \rowcolor{black!20} - robo-gym & & \ck & & \ck & & \ck & & \ck \\ - Nvidia Isaac Gym & \ck & \ck & \ck & \ck & \ck & \ck & & \\ \rowcolor{black!20} - Brax & \ck & & & \ck & \ck & & & \ck \\ - Gym-Ignition & \ck & \ck & $\sim$ & \ck & \ck & \ck & \ck & \ck \\ + RaiSim & \ck & & \ck & \ck & \ck & & & \\ + Jiminy & \ck & & & \ck & \ck & & & \ck \\ + PyRep & \ck & \ck & \ck & \ck & \ck & & & \ck \\ + robo-gym & & \ck & & \ck & & \ck & & \ck \\ + Nvidia Isaac Gym & \ck & \ck & \ck & \ck & \ck & \ck & & \\ + Brax & \ck & & & \ck & \ck & & & \ck \\ + Gym-Ignition & \ck & \ck & $\sim$ & \ck & \ck & \ck & \ck & \ck \\ \bottomrule - \end{tabularx} - } -\end{sidewaystable} + \end{tblr} +\end{table} +\end{landscape} \section{Design Goals} @@ -339,6 +342,7 @@ \section{Gym-Ignition} Our framework also helps enforce reproducible results in presence of randomness by storing a single generator in the \verb|Runtime|, shared with all \verb|Task|s and \verb|Randomizer|s. Under the assumption that the environment logic's execution order does not change, multiple experiments sharing the same seed number produce the same sampled trajectories. +\newpage \section{Conclusions} This chapter presented a novel framework to create reproducible robotic environments for \acl{RL} applications. diff --git a/Chapters/Part_2/chapter_6.tex b/Chapters/Part_2/chapter_6.tex index 41f97fa..b6022c8 100644 --- a/Chapters/Part_2/chapter_6.tex +++ b/Chapters/Part_2/chapter_6.tex @@ -13,6 +13,7 @@ \chapter{Learning from scratch exploiting Robot Models} The priors act only as exploration hints, therefore, the model description does not need to be excessively accurate. We also apply domain randomization over the model description used in simulation, resulting in differences between the simulated robot's dynamics and the model's dynamic from which the reward-shaping priors are computed. 
+\newpage \section{Training Environment} \begin{figure} @@ -54,24 +55,27 @@ \subsection{Action} \subsection{State} \begin{table} - \small - \center + \centering \caption{Observation components.} \label{tab:observation} - \begin{tabular}{llcc} + \begin{tblr}{ + colspec={Q[l, m]Q[l, m]Q[c, m]Q[c, m]}, + row{even} = {bg=gray9}, + row{1} = {font=\bfseries}, + } \toprule Name & Value & Set & Range \\ - \midrule \rowcolor{black!10} + \midrule Joint positions & $\mathbf{o}_s = \boldsymbol{s}$ & $\mathbb{R}^n$ & $[\boldsymbol{s}_{lb}, \boldsymbol{s}_{ub}]$ \\ - Joint velocities & $\mathbf{o}_{\dot{s}} = \dot{\boldsymbol{s}}$ & $\mathbb{R}^n $ & $[-\pi, \pi]$ \\ \rowcolor{black!10} + Joint velocities & $\mathbf{o}_{\dot{s}} = \dot{\boldsymbol{s}}$ & $\mathbb{R}^n $ & $[-\pi, \pi]$ \\ Base height & $\mathbf{o}_h = \boldsymbol{p}_B^z$ & $\mathbb{R}$ & $[0, 0.78]$ \\ - Base orientation & $\mathbf{o}_R = (\rho, \phi)_B$ & $\mathbb{R}^2$ & $[-2\pi, 2\pi]$ \\ \rowcolor{black!10} + Base orientation & $\mathbf{o}_R = (\rho, \phi)_B$ & $\mathbb{R}^2$ & $[-2\pi, 2\pi]$ \\ Contact configuration & $\mathbf{o}_c = (c_L, c_R)$ & $\{0, 1\}^2$ & - \\ - \ac{CoP} forces & $ \mathbf{o}_f = (f^{CoP}_L, f^{CoP}_R)$ & $\mathbb{R}^2$ & $[0, mg]$ \\ \rowcolor{black!10} + \ac{CoP} forces & $ \mathbf{o}_f = (f^{CoP}_L, f^{CoP}_R)$ & $\mathbb{R}^2$ & $[0, mg]$ \\ Feet positions & $\mathbf{o}_F = ({}^B \boldsymbol{p}_L, {}^B \boldsymbol{p}_R)$ & $\mathbb{R}^6$ & $[0, 0.78]$ \\ \ac{CoM} velocity & $\mathbf{o}_v = {}^{G}\boldsymbol{v}_{CoM}$ & $\mathbb{R}^3$ & $[0, 3]$ \\ \bottomrule - \end{tabular} + \end{tblr} \end{table} Since no perception is involved, the state of the \ac{MDP} contains information about the robot's kinematics and dynamics. @@ -151,26 +155,28 @@ \subsection{Other specifications}\label{sec:env-other} \section{Agent} \begin{table} - \small - \center - \caption{PPO, policy, and training parameters.} - \label{tab:training_parameters} - \newcommand{\ck}{\checkmark} - \begin{tabular}{cc} - \toprule - Parameter & \multicolumn{1}{c}{Value} \\ - \midrule \rowcolor{black!10} - Discount rate $\gamma$ & 0.95 \\ - Clip parameter $\epsilon$ & 0.3 \\ \rowcolor{black!10} - Learning rate $\alpha$ & 0.0001 \\ - GAE parameter $\lambda$ & 1.0 \\ \rowcolor{black!10} - Batch size & 10000 \\ - Minibatch size & 512 \\ \rowcolor{black!10} - Number of SGD epochs & 32 \\ - Number or parallel workers & 32 \\ \rowcolor{black!10} - Value function clip parameter & 1000 \\ - \bottomrule - \end{tabular} +\centering +\caption{PPO, policy, and training parameters.} +\label{tab:training_parameters} +\begin{tblr}{ + colspec={Q[c, m]Q[c, m]}, + row{even} = {bg=gray9}, + row{1} = {font=\bfseries}, +} + \toprule + Parameter & \multicolumn{1}{c}{Value} \\ + \midrule + Discount rate $\gamma$ & 0.95 \\ + Clip parameter $\epsilon$ & 0.3 \\ + Learning rate $\alpha$ & 0.0001 \\ + GAE parameter $\lambda$ & 1.0 \\ + Batch size & 10000 \\ + Minibatch size & 512 \\ + Number of SGD epochs & 32 \\ + Number or parallel workers & 32 \\ + Value function clip parameter & 1000 \\ + \bottomrule +\end{tblr} \end{table} The agent receives the observation $\boldsymbol{o}$ from the environment and returns the action $\mathbf{a}$ defining the reference velocities of the controlled joints. @@ -201,35 +207,10 @@ \section{Agent} Worker nodes run only on \ac{CPU} resources, while the trainer has access to the \ac{GPU} for accelerating the optimisation process. We use the RLlib~\parencite{liang_rllib_2018} framework, OpenAI Gym, and distributed training. 
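For reference, the clip parameter $\epsilon$ and the \acs{GAE} parameter $\lambda$ of Table~\ref{tab:training_parameters} enter the clipped surrogate objective of \ac{PPO}; the following is only a sketch in the standard notation, where $\rho_t$ and $\hat{A}_t$ are assumed from the usual formulation rather than defined in this hunk:

% Sketch of the standard PPO clipped surrogate objective (assumed notation).
\begin{equation*}
    L^{CLIP}(\theta) =
    \hat{\mathbb{E}}_t \left[
        \min\left(
            \rho_t(\theta)\, \hat{A}_t,\;
            \operatorname{clip}\big(\rho_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\, \hat{A}_t
        \right)
    \right],
    \qquad
    \rho_t(\theta) = \frac{\pi_\theta(\mathbf{a}_t \mid \mathbf{o}_t)}{\pi_{\theta_{old}}(\mathbf{a}_t \mid \mathbf{o}_t)},
\end{equation*}

where $\hat{A}_t$ is the advantage estimate computed with \acs{GAE} and weighted by $\lambda$.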
+\newpage \section{Reward Shaping} \label{sec:reward_shaping} -\begin{sidewaystable} - \center - \caption{Reward function details. Terms with a defined cutoff are processed by the RBF kernel.} - \label{tab:reward} - \newcommand{\ck}{\checkmark} - \setlength\extrarowheight{2pt} - \begin{tabular}{lcrccrlrr} - \toprule - Name & Symbol(s) & Weight & Value $\mathbf{x}$ & Target $\mathbf{x}^*$ & \multicolumn{2}{c}{Cutoff $x_c$} & SS & DS \\ - \midrule \rowcolor{black!10} - Joint torques & $r_\tau$ & 5 & $\Vert \boldsymbol{\tau}_{step} \Vert $ & $\boldsymbol{0}_n$ & 10.0 & Nm & \ck & \ck \\ - Joint velocities & $r_{\dot{s}}$ & 2 & $\boldsymbol{a}$ & $\boldsymbol{0}_n$ & 1.0 & rad/s & \ck & \ck \\ \rowcolor{black!10} - Postural & $r_{s}$ & 10 & $\boldsymbol{s}$ & $\boldsymbol{s}_0$ & 7.5 & deg & & \ck \\ - CoM $z$ velocity & $r_{v}^z$ & 2 & $\boldsymbol{v}^{xy}_{G}$ & $0$ & 1.0 & m/s & \ck & \ck \\ \rowcolor{black!10} - CoM $xy$ velocity & $r_{v}^{xy}$ & 2 & $\boldsymbol{v}^{z}_{G}$ & $\omega_0 (\boldsymbol{p}^{xy}_{G} - \boldsymbol{\bar{p}}^{xy}_{hull})$ & 0.5 & m/s & & \ck \\ - Feet contact forces & $\{r_{f}^L, r_{f}^R\}$ & 4 & $\{f^{CoP}_L, f^{CoP}_R\}$ & $m g / 2$ & $m g / 2$ & N & \ck & \ck \\ \rowcolor{black!10} - Centroidal momentum & $r_h$ & 1 & $\Vert {}_{G} \mathbf{h}_l \Vert^2 + \Vert {}_{G} \mathbf{h}_\omega \Vert^2$ & 0 & 50.0 & kg m$^2$/s & \ck & \ck \\ - Feet CoPs & $\{r_{p}^L, r_{p}^R\}$ & 20 & $\{\boldsymbol{p}_{L, CoP}, \boldsymbol{p}_{R, CoP}\}$ & $\{\bar{\boldsymbol{p}}^{xy}_{L, hull}, \bar{\boldsymbol{p}}^{xy}_{R, hull}\}$ & 0.3 & m & \ck & \ck \\ \rowcolor{black!10} - Feet orientation & $\{r_{o}^L, r_{o}^R\}$ & 3 & $\{\mathbf{r}^{(z)}_L \cdot \mathbf{e}_z, \mathbf{r}^{(z)}_R \cdot \mathbf{e}_z\}$ & $1$ & 0.01 & - & \ck & \ck \\ - CoM projection & $r_{G}$ & 10 & $\boldsymbol{p}^{xy}_{G}$ & in the CH of support polygon & \multicolumn{1}{r}{-} & - & & \ck \\ \rowcolor{black!10} - Feet in contact & $r_{c}$ & 2 & $c_L \land c_R$ & 1 & \multicolumn{1}{r}{-} & - & \ck & \ck \\ - Links in contact & $r_l$ & -10 & $c_{l}$ & $0$ & \multicolumn{1}{r}{-} & - & \ck & \ck \\ - \bottomrule - \end{tabular} -\end{sidewaystable} - \subsection{RBF Kernel} \ac{RBF} kernels are widely employed functions in machine learning, defined as @@ -365,6 +346,36 @@ \subsubsection{Transient} \end{description} +\begin{landscape} +\begin{table} + \centering + \caption{Reward function details. 
Terms with a defined cutoff are processed by the RBF kernel.} + \label{tab:reward} + \newcommand{\ck}{\checkmark} + \begin{tblr}{ + colspec={Q[l,m]Q[c,m]Q[c,m]Q[c,m]Q[c,m]Q[c,m]Q[c,m]Q[c,m]Q[c,m]}, + row{even} = {bg=gray9}, + row{1} = {font=\bfseries}, + } + \toprule + Name & Symbol(s) & Weight & Value $\mathbf{x}$ & Target $\mathbf{x}^*$ & \multicolumn{2}{c}{Cutoff $x_c$} & SS & DS \\ + \midrule + Joint torques & $r_\tau$ & 5 & $\Vert \boldsymbol{\tau}_{step} \Vert $ & $\boldsymbol{0}_n$ & 10.0 & Nm & \ck & \ck \\ + Joint velocities & $r_{\dot{s}}$ & 2 & $\boldsymbol{a}$ & $\boldsymbol{0}_n$ & 1.0 & rad/s & \ck & \ck \\ + Postural & $r_{s}$ & 10 & $\boldsymbol{s}$ & $\boldsymbol{s}_0$ & 7.5 & deg & & \ck \\ + CoM $z$ velocity & $r_{v}^z$ & 2 & $\boldsymbol{v}^{xy}_{G}$ & $0$ & 1.0 & m/s & \ck & \ck \\ + CoM $xy$ velocity & $r_{v}^{xy}$ & 2 & $\boldsymbol{v}^{z}_{G}$ & $\omega_0 (\boldsymbol{p}^{xy}_{G} - \boldsymbol{\bar{p}}^{xy}_{hull})$ & 0.5 & m/s & & \ck \\ + Feet contact forces & $\{r_{f}^L, r_{f}^R\}$ & 4 & $\{f^{CoP}_L, f^{CoP}_R\}$ & $m g / 2$ & $m g / 2$ & N & \ck & \ck \\ + Centroidal momentum & $r_h$ & 1 & $\Vert {}_{G} \mathbf{h}_l \Vert^2 + \Vert {}_{G} \mathbf{h}_\omega \Vert^2$ & 0 & 50.0 & kg m$^2$/s & \ck & \ck \\ + Feet CoPs & $\{r_{p}^L, r_{p}^R\}$ & 20 & $\{\boldsymbol{p}_{L, CoP}, \boldsymbol{p}_{R, CoP}\}$ & $\{\bar{\boldsymbol{p}}^{xy}_{L, hull}, \bar{\boldsymbol{p}}^{xy}_{R, hull}\}$ & 0.3 & m & \ck & \ck \\ + Feet orientation & $\{r_{o}^L, r_{o}^R\}$ & 3 & $\{\mathbf{r}^{(z)}_L \cdot \mathbf{e}_z, \mathbf{r}^{(z)}_R \cdot \mathbf{e}_z\}$ & $1$ & 0.01 & - & \ck & \ck \\ + CoM projection & $r_{G}$ & 10 & $\boldsymbol{p}^{xy}_{G}$ & in the CH of support polygon & \multicolumn{1}{r}{-} & - & & \ck \\ + Feet in contact & $r_{c}$ & 2 & $c_L \land c_R$ & 1 & \multicolumn{1}{r}{-} & - & \ck & \ck \\ + Links in contact & $r_l$ & -10 & $c_{l}$ & $0$ & \multicolumn{1}{r}{-} & - & \ck & \ck \\ + \bottomrule + \end{tblr} +\end{table} +\end{landscape} \section{Results} diff --git a/Chapters/Part_2/chapter_7.tex b/Chapters/Part_2/chapter_7.tex index 8e158aa..292ea68 100644 --- a/Chapters/Part_2/chapter_7.tex +++ b/Chapters/Part_2/chapter_7.tex @@ -702,28 +702,28 @@ \subsection{Sliding Box on Inclined Plane} Its collision shape is approximated considering the 8 points corresponding to its corners. \begin{table} - \small - \centering - \begin{tblr}{ - colspec={Q[c, m]Q[c, m]}, - row{even} = {bg=gray9}, - row{1} = {font=\bfseries\footnotesize}, - } - \toprule - Property & Value \\ - \midrule - \texttt{timestep} & $0.001$ \\ - \texttt{integrator} & \texttt{RK4} \\ - \texttt{solver} & \texttt{Newton} \\ - \texttt{iterations} & $50$ \\ - \texttt{cone} & \texttt{elliptic} \\ - \texttt{friction} & $(\mu,\, 0,\, 0)$ \\ - \texttt{solref} & $(-1e6,\, -2000)$ \\ - \texttt{condim} & $3$ \\ - \bottomrule - \end{tblr} - \caption{Mujoco configuration considered in the experiments of the sliding box on inclined surface matching as close as possible the setting and properties of our soft-contact model. Refer to the official documentation at \url{https://mujoco.readthedocs.io} for a detailed explanation of the options.} - \label{tab:mujoco_parameters} +\small +\centering +\caption{Mujoco configuration considered in the experiments of the sliding box on inclined surface matching as close as possible the setting and properties of our soft-contact model. 
Refer to the official documentation at \url{https://mujoco.readthedocs.io} for a detailed explanation of the options.} +\label{tab:mujoco_parameters} +\begin{tblr}{ + colspec={Q[c, m]Q[c, m]}, + row{even} = {bg=gray9}, + row{1} = {font=\bfseries\footnotesize}, +} + \toprule + Property & Value \\ + \midrule + \texttt{timestep} & $0.001$ \\ + \texttt{integrator} & \texttt{RK4} \\ + \texttt{solver} & \texttt{Newton} \\ + \texttt{iterations} & $50$ \\ + \texttt{cone} & \texttt{elliptic} \\ + \texttt{friction} & $(\mu,\, 0,\, 0)$ \\ + \texttt{solref} & $(-1e6,\, -2000)$ \\ + \texttt{condim} & $3$ \\ + \bottomrule +\end{tblr} \end{table} The box starts floating in the air having its \ac{CoM} positioned in $\pos[W]_G = (0, 0, 1.0) \, m$. diff --git a/Chapters/Part_2/chapter_8.tex b/Chapters/Part_2/chapter_8.tex index 3092dbe..f772de6 100644 --- a/Chapters/Part_2/chapter_8.tex +++ b/Chapters/Part_2/chapter_8.tex @@ -711,7 +711,6 @@ \subsection{Features} \end{remark*} \begin{landscape} -% \begin{sidewaystable} \begin{table} \centering \small @@ -736,7 +735,6 @@ \subsection{Features} \bottomrule \end{tblr} \end{table} -% \end{sidewaystable} \end{landscape} \subsection{Benchmarks} @@ -924,29 +922,29 @@ \section{Validation} All experiments presented in this section have been executed on a laptop whose specifications are reported in Table~\ref{tab:laptop_specifications_validation}. \begin{table} - \small - \centering - \begin{tblr}{ - colspec={Q[c, m]Q[c, m]}, - row{even} = {bg=gray9}, - row{1} = {font=\bfseries\footnotesize}, - } - \toprule - Specification & Value \\ - \midrule - Intel CPU & i7-10750H \\ - Nvidia GPU & GeForce GTX 1650 Ti Mobile \\ - CUDA cores & 1024 \\ - Operating system & Ubuntu 22.04 \\ - Nvidia driver & 530.41.03 \\ - CUDA & 11.2.2 \\ - cuDNN & 8.8.0.121 \\ - \jax & 0.3.15 \\ - Mujoco & 2.3.5 \\ - \bottomrule - \end{tblr} - \caption{Specifications of the machine used to execute the validation experiments.} - \label{tab:laptop_specifications_validation} +\small +\centering +\caption{Specifications of the machine used to execute the validation experiments.} +\label{tab:laptop_specifications_validation} +\begin{tblr}{ + colspec={Q[c, m]Q[c, m]}, + row{even} = {bg=gray9}, + row{1} = {font=\bfseries\footnotesize}, +} + \toprule + Specification & Value \\ + \midrule + Intel CPU & i7-10750H \\ + Nvidia GPU & GeForce GTX 1650 Ti Mobile \\ + CUDA cores & 1024 \\ + Operating system & Ubuntu 22.04 \\ + Nvidia driver & 530.41.03 \\ + CUDA & 11.2.2 \\ + cuDNN & 8.8.0.121 \\ + \jax & 0.3.15 \\ + Mujoco & 2.3.5 \\ + \bottomrule +\end{tblr} \end{table} \subsection{Sampling Experience for Robot Learning} @@ -964,55 +962,53 @@ \subsection{Sampling Experience for Robot Learning} \end{figure} \begin{table} - \small - \centering - \begin{tblr}{ - colspec={Q[c, m]Q[c, m]}, - row{even} = {bg=gray9}, - row{1} = {font=\bfseries\footnotesize}, - } - \toprule - Property & Value \\ - \midrule - Integrator & Semi-implicit Euler \\ - Integrator step & $0.0005\,$~s \\ - Environment step & $0.050\,$~s \\ - Control frequency & $20\,$Hz \\ - Action dimension & 1 \\ - Observation dimension & 4 \\ - Action space & $[-50,\, 50]\,$~N \\ - {Maximum episode steps} & $200$ \\ - Parallel environments & $512$ \\ - Equivalent \ac{RTF} & $24.38$ \\ - \bottomrule - \end{tblr} - \caption{Properties of the environment implementing the cartpole swing-up task.} - \label{tab:environment_properties_cartpole} +\centering +\caption{Properties of the environment implementing the cartpole swing-up task.} 
+\label{tab:environment_properties_cartpole} +\begin{tblr}{ + colspec={Q[c, m]Q[c, m]}, + row{even} = {bg=gray9}, + row{1} = {font=\bfseries}, +} + \toprule + Property & Value \\ + \midrule + Integrator & Semi-implicit Euler \\ + Integrator step & $0.0005\,$~s \\ + Environment step & $0.050\,$~s \\ + Control frequency & $20\,$Hz \\ + Action dimension & 1 \\ + Observation dimension & 4 \\ + Action space & $[-50,\, 50]\,$~N \\ + {Maximum episode steps} & $200$ \\ + Parallel environments & $512$ \\ + Equivalent \ac{RTF} & $24.38$ \\ + \bottomrule +\end{tblr} \end{table} \begin{table} - \small - \centering - \begin{tblr}{ - colspec={Q[c, m]Q[c, m]}, - row{even} = {bg=gray9}, - row{1} = {font=\bfseries\footnotesize}, - } - \toprule - Parameter & Value \\ - \midrule - Discount rate $\gamma$ & $0.95$ \\ - Clip parameter $\epsilon$ & $0.1$ \\ - Target \acs{KL} divergence & $0.025$ \\ - Learning rate $\alpha$ & $0.0003$ \\ - \acs{GAE} parameter $\lambda$ & $0.9$ \\ - Batch size & $2560$ \\ - Minibatch size & $256$ \\ - Number of \small{SGD} epochs & $10$ \\ - \bottomrule - \end{tblr} - \caption{PPO parameters for the the cartpole swing-up environment.} - \label{tab:ppo_parameters_cartpole} +\centering +\caption{PPO parameters for the the cartpole swing-up environment.} +\label{tab:ppo_parameters_cartpole} +\begin{tblr}{ + colspec={Q[c, m]Q[c, m]}, + row{even} = {bg=gray9}, + row{1} = {font=\bfseries}, +} + \toprule + Parameter & Value \\ + \midrule + Discount rate $\gamma$ & $0.95$ \\ + Clip parameter $\epsilon$ & $0.1$ \\ + Target \acs{KL} divergence & $0.025$ \\ + Learning rate $\alpha$ & $0.0003$ \\ + \acs{GAE} parameter $\lambda$ & $0.9$ \\ + Batch size & $2560$ \\ + Minibatch size & $256$ \\ + Number of \small{SGD} epochs & $10$ \\ + \bottomrule +\end{tblr} \end{table} \begin{description} @@ -1087,25 +1083,25 @@ \subsection{Sampling Experience for Robot Learning} \subsection{Sim-to-sim Policy Transfer} \begin{table} - \small - \centering - \begin{tblr}{ - colspec={Q[c, m]Q[c, m]}, - row{even} = {bg=gray9}, - row{1} = {font=\bfseries\footnotesize}, - } - \toprule - Property & Value \\ - \midrule - \texttt{timestep} & $0.001$ \\ - \texttt{integrator} & \texttt{RK4} \\ - \texttt{solver} & \texttt{Newton} \\ - \texttt{iterations} & $100$ \\ - \texttt{contype} & $0$ \\ - \bottomrule - \end{tblr} - \caption{Mujoco properties used for the sim-to-sim evaluation of the trained cartpole swing-up policy. Refer to the official documentation at \url{https://mujoco.readthedocs.io} for a detailed explanation of the options.} - \label{tab:mujoco_parameters} +\small +\centering +\caption{Mujoco properties used for the sim-to-sim evaluation of the trained cartpole swing-up policy. Refer to the official documentation at \url{https://mujoco.readthedocs.io} for a detailed explanation of the options.} +\label{tab:mujoco_parameters_cartpole} +\begin{tblr}{ + colspec={Q[c, m]Q[c, m]}, + row{even} = {bg=gray9}, + row{1} = {font=\bfseries\footnotesize}, +} + \toprule + Property & Value \\ + \midrule + \texttt{timestep} & $0.001$ \\ + \texttt{integrator} & \texttt{RK4} \\ + \texttt{solver} & \texttt{Newton} \\ + \texttt{iterations} & $100$ \\ + \texttt{contype} & $0$ \\ + \bottomrule +\end{tblr} \end{table} In this section, we attempt to evaluate the trained policy in an out-of-distribution setting. @@ -1115,7 +1111,7 @@ \subsection{Sim-to-sim Policy Transfer} Similarly to Section~\ref{sec:sliding_box_inclined_plane}, we use the Mujoco simulator as out-of-distribution setting. 
We translated the \ac{URDF} of the cartpole model loaded in \jaxsim in the training environment to an equivalent \ac{MJCF} that can be imported in Mujoco. -Then, we included in the same file the configuration of the physics engine, whose parameters are reported in Table~\ref{tab:mujoco_parameters}. +Then, we included in the same file the configuration of the physics engine, whose parameters are reported in Table~\ref{tab:mujoco_parameters_cartpole}. The chosen parameters expose a simulation characterised by the same control rate ($20\,$Hz), but in this case the physics is simulated in a different simulator using an integrator of a different family and different constraint solver. The Mujoco environment is only used for producing the state-action trajectory $\tau$ from a given initial observation $\mathbf{o}_0$, where the action is obtained by performing inference of the trained policy. diff --git a/Chapters/prologue.tex b/Chapters/prologue.tex index 202dac7..47e24f2 100644 --- a/Chapters/prologue.tex +++ b/Chapters/prologue.tex @@ -94,7 +94,7 @@ \subsubsection{Part II: Contributions} % \item Chapter~\ref{ch:contact_aware_dynamics} starts addressing the problem of optimising the generation of synthetic experience for robot locomotion, whose performance was a bottleneck in the experiment presented in the previous chapter. This chapter provides a state-space representation that models the dynamics of a floating-base robot that can be integrated numerically to simulate its evolution. It formulates a soft-contacts model to compute the interaction forces between the robot and the terrain surface, supporting both static (sticking) and dynamic (slipping) regimes, having a friction cone boundary without approximations. The dynamics of the contact model are then included in an extended state-space representation, obtaining a system of differential equations that describe the contact-aware dynamics of a floating-base robot. % - \item Chapter~\ref{ch:scaling_rigid_body_simulations}, by exploiting the contact-aware state-space representation formulated in the previous chapter, presents a new physics engine in reduced coordinates that can be executed on hardware accelerators like \acsp{GPU} and \acsp{TPU} for maximising sampling throughput. To this end, with the notation introduced in Chapter~\ref{ch:robot_modelling}, this chapter also formulates canonical \aclp{RBDA} that can be executed in this accelerated context. The physics engine performance is finally benchmarked, assessing the accuracy and speed of its algorithms and the scaling properties when executed in a highly parallel setting integrating hundreds or thousands of robot models concurrently. + \item Chapter~\ref{ch:scaling_rigid_body_simulations}, by exploiting the contact-aware state-space representation formulated in the previous chapter, presents a new physics engine in reduced coordinates that can be executed on hardware accelerators like \acsp{GPU} and \acsp{TPU} for maximising sampling throughput. To this end, with the notation introduced in Chapter~\ref{ch:robot_modelling}, this chapter also formulates canonical \aclp{RBDA} that can be executed in this accelerated context. The physics engine performance is then benchmarked, assessing the accuracy and speed of its algorithms and the scaling properties when executed in a highly parallel setting integrating hundreds or thousands of robot models concurrently. 
Finally, we validate the performance of the proposed physics engine by training a policy by sampling experience from hundreds of parallel environments running on \ac{GPU} and evaluate its performance in a sim-to-sim setting representing an out-of-distribution environment. \end{itemize} \section*{Research Publications} @@ -163,7 +163,6 @@ \section*{Research Publications} \textit{International Journal of Advanced Robotic Systems}, 2020 \end{quote} -\newpage \begin{quote} \textbf{Towards Partner-Aware Humanoid Robot Control Under Physical Interactions} \\ Yeshasvi Tirupachuri, Gabriele Nava, Claudia Latella, Diego Ferigo, Lorenzo Rapetti, Luca Tagliapietra, Francesco Nori, Daniele Pucci \\ @@ -188,6 +187,7 @@ \section*{Research Publications} \textit{Cognitive Robotics, Chapter 7}, 2022 \end{quote} +\newpage \section*{Software Projects} The results of the research conducted for this thesis produced the following software projects that I developed and I am maintaining: diff --git a/FrontBackmatter/contents.tex b/FrontBackmatter/contents.tex index 2a3449a..31e21e9 100644 --- a/FrontBackmatter/contents.tex +++ b/FrontBackmatter/contents.tex @@ -14,7 +14,7 @@ \vfill \begin{center} - \textbf{Word Count: 53500} + \textbf{Word Count: 57000} \end{center} \automark[section]{chapter} @@ -187,8 +187,9 @@ \nomenclature[A, 27]{$\quat = (w, \mathbf{r}) \in \realn^4$}{Quaternion coefficients} \nomenclature[A, 28]{$\mathcal{H}$}{Function providing terrain height} \nomenclature[A, 29]{$\mathcal{S}$}{Function providing terrain normal} - \nomenclature[A, 30]{$(\cdot)^\parallel, \, (\cdot)_\parallel$}{Component parallel to the terrain} - \nomenclature[A, 30]{$(\cdot)^\perp, \, (\cdot)_\perp$}{Component normal to the terrain} + \nomenclature[A, 30]{$\mathbf{m}$}{3D tangential deformation of the terrain's material} + \nomenclature[A, 31]{$(\cdot)^\parallel, \, (\cdot)_\parallel$}{Component parallel to the terrain} + \nomenclature[A, 31]{$(\cdot)^\perp, \, (\cdot)_\perp$}{Component normal to the terrain} \nomenclature[L, 01]{$\mathcal{S}$}{The state space} \nomenclature[L, 02]{$\mathcal{A}$}{The action space} diff --git a/images/contributions/chapter_8/jaxsim_benchmark_algo_anymal.tikz b/images/contributions/chapter_8/jaxsim_benchmark_algo_anymal.tikz index c963741..f359c33 100644 --- a/images/contributions/chapter_8/jaxsim_benchmark_algo_anymal.tikz +++ b/images/contributions/chapter_8/jaxsim_benchmark_algo_anymal.tikz @@ -1,5 +1,9 @@ \begin{tikzpicture} +\definecolor{indianred2048062}{RGB}{204,80,62} +\definecolor{seagreen1513384}{RGB}{15,133,84} +\definecolor{teal29105150}{RGB}{29,105,150} + \begin{semilogyaxis}[ title=\textbf{Anymal C - 12 DoFs}, axis lines*=left, @@ -29,9 +33,9 @@ RNEA 3.08 87 1050 ABA 11.2 111 1640 }\dataicub -\addplot table [x=Algo, y=Mean_Pinocchio] {\dataicub}; \addlegendentry{Pinocchio} -\addplot table [x=Algo, y=Mean_Jaxsim_CPU] {\dataicub}; \addlegendentry{JaxSim CPU} -\addplot table [x=Algo, y=Mean_Jaxsim_GPU] {\dataicub}; \addlegendentry{JaxSim GPU} +\addplot[indianred2048062, fill=indianred2048062!60] table [x=Algo, y=Mean_Pinocchio] {\dataicub}; \addlegendentry{Pinocchio} +\addplot[seagreen1513384, fill=seagreen1513384!60] table [x=Algo, y=Mean_Jaxsim_CPU] {\dataicub}; \addlegendentry{JaxSim CPU} +\addplot[teal29105150, fill=teal29105150!60] table [x=Algo, y=Mean_Jaxsim_GPU] {\dataicub}; \addlegendentry{JaxSim GPU} \end{semilogyaxis} diff --git a/images/contributions/chapter_8/jaxsim_benchmark_algo_icub.tikz b/images/contributions/chapter_8/jaxsim_benchmark_algo_icub.tikz index 
2b8f580..af638e4 100644 --- a/images/contributions/chapter_8/jaxsim_benchmark_algo_icub.tikz +++ b/images/contributions/chapter_8/jaxsim_benchmark_algo_icub.tikz @@ -1,5 +1,9 @@ \begin{tikzpicture} +\definecolor{indianred2048062}{RGB}{204,80,62} +\definecolor{seagreen1513384}{RGB}{15,133,84} +\definecolor{teal29105150}{RGB}{29,105,150} + \begin{semilogyaxis}[ title=\textbf{iCub - 32 DoFs}, axis lines*=left, @@ -29,9 +33,9 @@ RNEA 6.33 198 2850 ABA 20.5 256 4000 }\dataicub -\addplot table [x=Algo, y=Mean_Pinocchio] {\dataicub}; \addlegendentry{Pinocchio} -\addplot table [x=Algo, y=Mean_Jaxsim_CPU] {\dataicub}; \addlegendentry{JaxSim CPU} -\addplot table [x=Algo, y=Mean_Jaxsim_GPU] {\dataicub}; \addlegendentry{JaxSim GPU} +\addplot[indianred2048062, fill=indianred2048062!60] table [x=Algo, y=Mean_Pinocchio] {\dataicub}; \addlegendentry{Pinocchio} +\addplot[seagreen1513384, fill=seagreen1513384!60] table [x=Algo, y=Mean_Jaxsim_CPU] {\dataicub}; \addlegendentry{JaxSim CPU} +\addplot[teal29105150, fill=teal29105150!60] table [x=Algo, y=Mean_Jaxsim_GPU] {\dataicub}; \addlegendentry{JaxSim GPU} \end{semilogyaxis} diff --git a/images/contributions/chapter_8/jaxsim_benchmark_algo_panda.tikz b/images/contributions/chapter_8/jaxsim_benchmark_algo_panda.tikz index 40caa7b..0ba0692 100644 --- a/images/contributions/chapter_8/jaxsim_benchmark_algo_panda.tikz +++ b/images/contributions/chapter_8/jaxsim_benchmark_algo_panda.tikz @@ -1,5 +1,9 @@ \begin{tikzpicture} +\definecolor{indianred2048062}{RGB}{204,80,62} +\definecolor{seagreen1513384}{RGB}{15,133,84} +\definecolor{teal29105150}{RGB}{29,105,150} + \begin{semilogyaxis}[ title=\textbf{Panda - 9 DoFs}, axis lines*=left, @@ -29,9 +33,9 @@ RNEA 2.98 61.3 825 ABA 9.2 77.6 1280 }\dataicub -\addplot table [x=Algo, y=Mean_Pinocchio] {\dataicub}; \addlegendentry{Pinocchio} -\addplot table [x=Algo, y=Mean_Jaxsim_CPU] {\dataicub}; \addlegendentry{JaxSim CPU} -\addplot table [x=Algo, y=Mean_Jaxsim_GPU] {\dataicub}; \addlegendentry{JaxSim GPU} +\addplot[indianred2048062, fill=indianred2048062!60] table [x=Algo, y=Mean_Pinocchio] {\dataicub}; \addlegendentry{Pinocchio} +\addplot[seagreen1513384, fill=seagreen1513384!60] table [x=Algo, y=Mean_Jaxsim_CPU] {\dataicub}; \addlegendentry{JaxSim CPU} +\addplot[teal29105150, fill=teal29105150!60] table [x=Algo, y=Mean_Jaxsim_GPU] {\dataicub}; \addlegendentry{JaxSim GPU} \end{semilogyaxis} diff --git a/latexmkrc b/latexmkrc new file mode 100644 index 0000000..cae9b09 --- /dev/null +++ b/latexmkrc @@ -0,0 +1,9 @@ +# https://sdoerner.de/2022/05/31/tikz-image-externalization-with-overleaf/ +$pdflatex = 'pdflatex -shell-escape'; + +# Custom dependency and function for nomencl package +# https://tex.stackexchange.com/a/105978 +add_cus_dep( 'nlo', 'nls', 0, 'makenlo2nls' ); +sub makenlo2nls { +system( "makeindex -s nomencl.ist -o \"$_[0].nls\" \"$_[0].nlo\"" ); +} diff --git a/thesis.tex b/thesis.tex index 8c8dda4..90c7466 100644 --- a/thesis.tex +++ b/thesis.tex @@ -34,6 +34,10 @@ \pgfplotsset{compat=newest} \usepgfplotslibrary{groupplots} +% Cache TIKZ images. Requires custom latexmkrc. 
+% \usetikzlibrary{external} +% \tikzexternalize[prefix=tikzexternalize/] + % ======================================= % Note: Make all your adjustments in here % ======================================= diff --git a/tikzexternalize/dummy.txt b/tikzexternalize/dummy.txt new file mode 100644 index 0000000..7272eb9 --- /dev/null +++ b/tikzexternalize/dummy.txt @@ -0,0 +1 @@ +https://www.overleaf.com/learn/latex/Questions/I_have_a_lot_of_tikz%2C_matlab2tikz_or_pgfplots_figures%2C_so_I%27m_getting_a_compilation_timeout._Can_I_externalise_my_figures%3F
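The tblr and landscape conversions in the hunks above rely on packages whose loading is not shown in this patch; a minimal preamble sketch, assuming the tabularray package with its booktabs library, the ninecolors palette for the gray9 row shade, and pdflscape for the rotated table pages (all package choices are assumptions, not taken from the diff), could look as follows:

% Preamble sketch (assumed packages, not part of this patch).
\usepackage{tabularray}      % tblr environment, colspec and row keys
\UseTblrLibrary{booktabs}    % \toprule, \midrule, \bottomrule inside tblr
\usepackage{ninecolors}      % provides the gray9 shade used in row{even} = {bg=gray9}
\usepackage{pdflscape}       % landscape environment wrapping the wide tables

With pdflscape, the landscape pages are also rotated in the generated PDF, which is usually preferable to plain lscape for tables meant to be read on screen.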