
Commit

Merge pull request #37 from aimalz/issue/34/derivations
Issue/34/derivations
aimalz authored Dec 12, 2016
2 parents 35523c6 + 4c042bb commit 28a32a8
Showing 1 changed file with 78 additions and 34 deletions.
112 changes: 78 additions & 34 deletions docs/notebooks/kld.ipynb
@@ -6,7 +6,7 @@
"source": [
"# The Kullback-Leibler Divergence\n",
"\n",
"In this notebook, we try and gain some intuition about the magnitude of the KL divergence by computing its value between two Gaussian PDFs as a function of the \"tension\" between them.\n",
"The KL divergence is used as a measure of how close an approximation to a probability distribution is to the true probability distribution it approximates. In this notebook, we try to gain some intuition about the magnitude of the KL divergence by computing its value between two Gaussian PDFs as a function of the \"precision\" and \"tension\" between them.\n",
"\n",
"### Requirements\n",
"\n",
@@ -23,7 +23,7 @@
"\n",
"$D(P||Q) = \\int_{-\\infty}^{\\infty} \\log \\left( \\frac{P(x)}{Q(x)} \\right) P(x) dx$\n",
"\n",
"The wikipedia page for the KL divergence gives the following useful interpretation of the KLD:\n",
"The Wikipedia page for the KL divergence gives the following useful interpretation of the KLD:\n",
"\n",
"> KL divergence is a measure of the difference between two probability distributions $P$ and $Q$. It is not symmetric in $P$ and $Q$. In applications, $P$ typically represents ... a precisely calculated theoretical distribution, while $Q$ typically represents ... [an] approximation of $P$.\n",
">\n",
@@ -36,7 +36,7 @@
"source": [
"## 1D Gaussian Illustration\n",
"\n",
"\"Information\" is not a terribly familiar quantity to most of us, so lets compute the KLD between two Gaussians:\n",
"\"Information\" is not a terribly familiar quantity to most of us, so let's compute the KLD between two Gaussians:\n",
"\n",
"* The \"True\" 1D Gaussian PDF, $P(x)$, of unit width and central value 0\n",
"\n",
@@ -137,6 +137,70 @@
"i.e. Two concentric 1D Gaussian PDFs differing in width by a factor of 4.37 have a KLD of 1 nat."
]
},
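{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick numerical check of this value, here is a minimal sketch (assuming `numpy` and `scipy` are available; the variable names are illustrative only) that integrates the definition of the KLD on a grid for these two PDFs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.stats import norm\n",
"\n",
"# 'True' PDF P: unit-width Gaussian centered at 0.\n",
"# Approximating PDF Q: a concentric Gaussian 4.37 times wider.\n",
"sigma_0, sigma = 1.0, 4.37\n",
"x = np.linspace(-20., 20., 100001)\n",
"P = norm.pdf(x, loc=0., scale=sigma_0)\n",
"log_ratio = norm.logpdf(x, loc=0., scale=sigma_0) - norm.logpdf(x, loc=0., scale=sigma)\n",
"\n",
"# D(P||Q) = integral of P(x) log[P(x)/Q(x)] dx, estimated on the grid, in nats\n",
"D = np.trapz(P * log_ratio, x)\n",
"print(D)  # expect a value close to 1 nat"
]
},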
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analytic Formulae"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Precision\n",
"\n",
"The KL divergence, in nats, between an approximating, lower precision, correctly aligned Gaussian of variance $\\sigma^{2}$ and a true Gaussian of variance $\\sigma_{0}^{2}$ is related to the ratio between the two distribution widths:\n",
"\n",
"\\begin{align*}\n",
"D &= \\int_{-\\infty}^{\\infty}P(x)\\log\\left[\\frac{P(x)}{Q(x)}\\right]dx\\\\\n",
"&= \\int_{-\\infty}^{\\infty}P(x)\\log[P(x)]dx-\\int_{-\\infty}^{\\infty}P(x)\\log[Q(x)]dx\\\\\n",
"&= \\int_{-\\infty}^{\\infty}\\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\log[\\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]]dx-\\int_{-\\infty}^{\\infty}\\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\log[\\frac{1}{\\sqrt{2\\pi}\\sigma}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma^{2}}]]dx\\\\\n",
"&= \\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\left(\\int_{-\\infty}^{\\infty}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\right)dx-\\int_{-\\infty}^{\\infty}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\left(-\\log[\\sqrt{2\\pi}\\sigma]-\\frac{(x-x_{0})^{2}}{2\\sigma^{2}}\\right)]dx\\right)\n",
"\\end{align*}\n",
"\n",
"We substitute $u=\\frac{(x-x_{0})^{2}}{2\\sigma_{0}}$:\n",
"\n",
"\\begin{align*}\n",
"D &= \\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\left(\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]-u]\\right)\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}-\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u\\right)]\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}\\right)\\\\\n",
"&= \\frac{1}{2\\sqrt{\\pi}}\\left(\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]u^{-\\frac{1}{2}}\\exp[-u]-u^{\\frac{1}{2}}\\exp[-u]\\right)du-\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma]u^{-\\frac{1}{2}}\\exp[-u]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u^{\\frac{1}{2}}\\exp[-u]\\right)du\\right)\\\\\n",
"&= -\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)\\right)\n",
"\\end{align*}\n",
"\n",
"We transform back and evaluate this at the limits:\n",
"\n",
"\\begin{align*}\n",
"D &= -\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[2\\sqrt{\\pi}]\\right]+\\left[\\sqrt{\\pi}]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[2\\sqrt{\\pi}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\sqrt{\\pi}]\\right]\\right)\\right)\\\\\n",
"&= -\\frac{1}{2}\\left(\\log\\left[\\frac{\\sigma_{0}}{\\sigma}\\right]+1-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\right)\n",
"\\end{align*}\n",
"\n",
"where $\\sigma_0$ is the width of the true distribution.\n",
"\n",
"We can perhaps take the KL divergence to provide a generalized quantification of increase of precision, as in: the increase in precision $r^{-1}$ going from approximation to truth, which in the 1D Gaussian case is just \n",
"### $r^{-1}\\equiv\\frac{\\sigma_{0}}{\\sigma}\\approx\\exp[-2D]$."
]
},
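{
"cell_type": "markdown",
"metadata": {},
"source": [
"This analytic result can be spot-checked numerically. The sketch below (again assuming `numpy` and `scipy`; the function names are illustrative only) compares the expression above with a direct grid integration of the KLD for a few width ratios $r=\\sigma/\\sigma_{0}$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.stats import norm\n",
"\n",
"def kld_numerical(sigma_0, sigma):\n",
"    # D(P||Q) in nats for two aligned Gaussians, by direct grid integration\n",
"    x = np.linspace(-20., 20., 100001)\n",
"    P = norm.pdf(x, scale=sigma_0)\n",
"    log_ratio = norm.logpdf(x, scale=sigma_0) - norm.logpdf(x, scale=sigma)\n",
"    return np.trapz(P * log_ratio, x)\n",
"\n",
"def kld_analytic(r):\n",
"    # D = log(r) - (1 - r^-2) / 2 for a width ratio r = sigma / sigma_0\n",
"    return np.log(r) - 0.5 * (1. - r ** -2)\n",
"\n",
"for r in [1.5, 2., 4.37, 10.]:\n",
"    print(r, kld_analytic(r), kld_numerical(1., r))"
]
},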
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tension\n",
"\n",
"The KL divergence, in nats, between an approximating Gaussian and a true Gaussian is _approximately_ equal to the square of the tension between the two distributions. By a similar derivation to the above, we obtain\n",
"\n",
"## $D = \\log[r]-\\frac{1}{2}(1-r^{-2})+\\frac{1}{2}(1+r^{-2})t^{2} \\approx t^2$ \n",
"\n",
"where tension $t$ is defined as\n",
"\n",
"## $t = \\frac{\\Delta x}{\\sqrt{\\left(\\sigma_0^2 + \\sigma^2\\right)}}$\n",
"\n",
"and has, in some sense, \"units\" of \"sigma\". The KLD is the information lost when using the approximation: the information loss rises in proprtion to the tension squared. The above formula is most accurate in the limit where the two distributions have the same width.\n",
"\n",
"Still, we can see that the KL divergence might provide a route to a generalized quantification of tension. The square root of the KLD between a PDF and its approximation, in nats, gives an approximate sense of the tension between the two distributions, in \"units\" of \"sigma\":\n",
"\n",
"## $t \\approx \\sqrt{D}$"
]
},
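{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, a minimal numerical sketch (assuming `numpy` and `scipy`; the function names are illustrative only) compares the full expression, a direct grid integration, and the $t^{2}$ approximation, which is closest when $r=1$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.stats import norm\n",
"\n",
"def kld_numerical(delta_x, sigma_0, sigma):\n",
"    # D(P||Q) in nats for two offset Gaussians, by direct grid integration\n",
"    x = np.linspace(-20., 20. + delta_x, 200001)\n",
"    P = norm.pdf(x, loc=0., scale=sigma_0)\n",
"    log_ratio = norm.logpdf(x, loc=0., scale=sigma_0) - norm.logpdf(x, loc=delta_x, scale=sigma)\n",
"    return np.trapz(P * log_ratio, x)\n",
"\n",
"def kld_tension(r, t):\n",
"    # D = log(r) - (1 - r^-2) / 2 + (1 + r^-2) * t^2 / 2\n",
"    return np.log(r) - 0.5 * (1. - r ** -2) + 0.5 * (1. + r ** -2) * t ** 2\n",
"\n",
"sigma_0 = 1.\n",
"for r in [1., 2.]:\n",
"    sigma = r * sigma_0\n",
"    for t in [0.5, 1., 2.]:\n",
"        delta_x = t * np.sqrt(sigma_0 ** 2 + sigma ** 2)\n",
"        print(r, t, kld_tension(r, t), kld_numerical(delta_x, sigma_0, sigma), t ** 2)"
]
},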
{
"cell_type": "markdown",
"metadata": {},
@@ -329,39 +393,19 @@
"source": [
"## Conclusions\n",
"\n",
"The simple numerical experiments in this notebook suggest the following approximate extrapolations and hypotheses.\n",
"\n",
"\n",
"### Precision\n",
"\n",
"The KL divergence, in nats, between an approximating, lower precision, correctly aligned Gaussian and a true Gaussian is _approximately_ equal to the logarithm of $2/\\pi$ times the ratio between the two distribution widths:\n",
"\n",
"## $D \\approx \\log{\\left( \\frac{2}{\\pi}\\frac{\\sigma}{\\sigma_0} \\right)}$\n",
"\n",
"where $\\sigma_0$ is the width of the true distribution. An analytic derivation of this result would be welcome!\n",
"\n",
"We can perhaps take the KL divergence to provide a generalized quantification of increase of precision, as in: the increase in precision $\\alpha$ going from approximation to truth (which in the 1D Gaussian case is just $\\sigma / \\sigma_0$) is given by: \n",
"\n",
"## $\\alpha \\approx \\frac{\\pi}{2} e^{D}$\n",
"\n",
"\n",
"\n",
"### Tension\n",
"To summarize, the KL divergence $D$ is an appropriate metric of an approximation to a probability distribution, expressing the loss of information of the approximation from the true distribution. The simple numerical experiments in this notebook suggest the following approximate extrapolations and hypotheses. \n",
"\n",
"The KL divergence, in nats, between an approximating Gaussian and a true Gaussian is _approximately_ equal to the square of the tension between the two distributions:\n",
"\n",
"## $D \\approx t^2$ \n",
"\n",
"where tension $t$ is defined as\n",
"\n",
"## $t = \\frac{\\Delta x}{\\sqrt{\\left(\\sigma_0^2 + \\sigma^2\\right)}}$\n",
"\n",
"and has, in some sense, \"units\" of \"sigma\". The KLD is the information lost when using the approximation: the information loss rises in proprtion to the tension squared. An analytic derivation of this result would be welcome! The above formula is most accurate when the two distributions hwave the same width: it would be good to have a more general formula.\n",
"\n",
"Still: we can see that the KL divergence might provide a route to a generalized quantification of tension. The square root of the KLD between a PDF and its approximation, in nats, gives an approximate sense of the tension between the two distributions, in \"units\" of \"sigma\":\n",
"\n",
"## $t \\approx \\sqrt{D}$"
"Using a Gaussian example enables exploration of two quantities characterizing the approximate distribution: the \"precision\" $r^{-1}$ is a measure of the width of the approximating distribution relative to the truth, and the \"tension\" $t$ is a measure of the difference in centroids weighted by the root-mean-square width of the two distributions. We have found that the KLD can be interpreted in terms of these quantities; the KLD is proportional to the log of the precision and the square of the tension."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
