From f74bb7c35d9603a7568c5392a5af05ee840d6da8 Mon Sep 17 00:00:00 2001 From: aimalz Date: Thu, 8 Dec 2016 13:24:08 -0500 Subject: [PATCH 1/7] added derivation of precision --- docs/notebooks/kld.ipynb | 44 +++++++++++++++++++++++++++++++--------- 1 file changed, 34 insertions(+), 10 deletions(-) diff --git a/docs/notebooks/kld.ipynb b/docs/notebooks/kld.ipynb index 8bf15494..543cbd6f 100644 --- a/docs/notebooks/kld.ipynb +++ b/docs/notebooks/kld.ipynb @@ -36,7 +36,7 @@ "source": [ "## 1D Gaussian Illustration\n", "\n", - "\"Information\" is not a terribly familiar quantity to most of us, so lets compute the KLD between two Gaussians:\n", + "\"Information\" is not a terribly familiar quantity to most of us, so let's compute the KLD between two Gaussians:\n", "\n", "* The \"True\" 1D Gaussian PDF, $P(x)$, of unit width and central value 0\n", "\n", @@ -329,23 +329,47 @@ "source": [ "## Conclusions\n", "\n", - "The simple numerical experiments in this notebook suggest the following approximate extrapolations and hypotheses.\n", - "\n", - "\n", + "The simple numerical experiments in this notebook suggest the following approximate extrapolations and hypotheses." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "### Precision\n", "\n", - "The KL divergence, in nats, between an approximating, lower precision, correctly aligned Gaussian and a true Gaussian is _approximately_ equal to the logarithm of $2/\\pi$ times the ratio between the two distribution widths:\n", + "The KL divergence, in nats, between an approximating, lower precision, correctly aligned Gaussian of variance $\\sigma^{2}$ and a true Gaussian of variance $\\sigma_{0}^{2}$ is related to the ratio between the two distribution widths:\n", "\n", - "## $D \\approx \\log{\\left( \\frac{2}{\\pi}\\frac{\\sigma}{\\sigma_0} \\right)}$\n", + "\\begin{align*}\n", + "D &= \\int_{-\\infty}^{\\infty}P(x)\\log\\left[\\frac{P(x)}{Q(x)}\\right]dx\\\\\n", + "&= \\int_{-\\infty}^{\\infty}P(x)\\log[P(x)]dx-\\int_{-\\infty}^{\\infty}P(x)\\log[Q(x)]dx\\\\\n", + "&= \\int_{-\\infty}^{\\infty}\\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\log[\\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]]dx-\\int_{-\\infty}^{\\infty}\\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\log[\\frac{1}{\\sqrt{2\\pi}\\sigma}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma^{2}}]]dx\\\\\n", + "&= \\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\left(\\int_{-\\infty}^{\\infty}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\right)dx-\\int_{-\\infty}^{\\infty}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\left(-\\log[\\sqrt{2\\pi}\\sigma]-\\frac{(x-x_{0})^{2}}{2\\sigma^{2}}\\right)]dx\\right)\n", + "\\end{align*}\n", "\n", - "where $\\sigma_0$ is the width of the true distribution. 
An analytic derivation of this result would be welcome!\n", + "We substitute $u=\\frac{(x-x_{0})^{2}}{2\\sigma_{0}}$:\n", + "\\begin{align*}\n", + "D &= \\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\left(\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]-u]\\right)\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}-\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u\\right)]\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}\\right)\\\\\n", + "&= \\frac{1}{2\\sqrt{\\pi}}\\left(\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]u^{-\\frac{1}{2}}\\exp[-u]-u^{\\frac{1}{2}}\\exp[-u]\\right)du-\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma]u^{-\\frac{1}{2}}\\exp[-u]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u^{\\frac{1}{2}}\\exp[-u]\\right)du\\right)\\\\\n", + "&= -\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[\\sqrt{\\pi}\\ erf[u^{\\frac{1}{2}}]\\right]+\\left[\\frac{\\sqrt{\\pi}}{2}\\ erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[\\sqrt{\\pi}\\ erf[u^{\\frac{1}{2}}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\frac{\\sqrt{\\pi}}{2}\\ erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)\\right)\\\\\n", + "\\end{align*}\n", "\n", - "We can perhaps take the KL divergence to provide a generalized quantification of increase of precision, as in: the increase in precision $\\alpha$ going from approximation to truth (which in the 1D Gaussian case is just $\\sigma / \\sigma_0$) is given by: \n", - "\n", - "## $\\alpha \\approx \\frac{\\pi}{2} e^{D}$\n", + "We transform back and evaluate this at the limits:\n", "\n", + "\\begin{align*}\n", + "D &= -\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[2\\sqrt{\\pi}]\\right]+\\left[\\sqrt{\\pi}]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[2\\sqrt{\\pi}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\sqrt{\\pi}]\\right]\\right)\\right)\\\\\n", + "&= -\\frac{1}{2}\\left(\\log[\\frac{\\sigma_{0}}{\\sigma}]+1-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\right)\n", + "\\end{align*}\n", "\n", + "where $\\sigma_0$ is the width of the true distribution.\n", "\n", + "We can perhaps take the KL divergence to provide a generalized quantification of increase of precision, as in: the increase in precision $\\alpha$ going from approximation to truth (which in the 1D Gaussian case is just $\\frac{\\sigma_{0}}{\\sigma}$) is given by $\\alpha\\approx\\exp[-D]$." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "### Tension\n", "\n", "The KL divergence, in nats, between an approximating Gaussian and a true Gaussian is _approximately_ equal to the square of the tension between the two distributions:\n", From 79f3fe31c65c390a8bf4ca12ae1e3c60537a3625 Mon Sep 17 00:00:00 2001 From: aimalz Date: Thu, 8 Dec 2016 13:37:33 -0500 Subject: [PATCH 2/7] fixed bug --- docs/notebooks/kld.ipynb | 15 ++++++++++++--- qp/pdf.py | 4 +--- 2 files changed, 13 insertions(+), 6 deletions(-) diff --git a/docs/notebooks/kld.ipynb b/docs/notebooks/kld.ipynb index 543cbd6f..6a18a995 100644 --- a/docs/notebooks/kld.ipynb +++ b/docs/notebooks/kld.ipynb @@ -363,7 +363,7 @@ "\n", "where $\\sigma_0$ is the width of the true distribution.\n", "\n", - "We can perhaps take the KL divergence to provide a generalized quantification of increase of precision, as in: the increase in precision $\\alpha$ going from approximation to truth (which in the 1D Gaussian case is just $\\frac{\\sigma_{0}}{\\sigma}$) is given by $\\alpha\\approx\\exp[-D]$." 
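As an editorial aside on the derivation this patch series is adding: the closed form for the KLD between two concentric Gaussians is straightforward to check numerically. Below is a minimal sketch (assuming `numpy` and `scipy` are available; `kld_numerical` is an illustrative helper, not part of the `qp` API) comparing direct quadrature of the definition against the standard result $D=\log\left(\frac{\sigma}{\sigma_{0}}\right)+\frac{\sigma_{0}^{2}}{2\sigma^{2}}-\frac{1}{2}$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kld_numerical(p, q):
    # D(P||Q) = int P(x) log[P(x)/Q(x)] dx, evaluated by direct quadrature
    integrand = lambda x: p.pdf(x) * (p.logpdf(x) - q.logpdf(x))
    return quad(integrand, -np.inf, np.inf)[0]

sigma_0, sigma = 1.0, 2.0  # widths of the true and approximating Gaussians
P, Q = norm(0.0, sigma_0), norm(0.0, sigma)

D_quad = kld_numerical(P, Q)
D_form = np.log(sigma / sigma_0) + 0.5 * (sigma_0 / sigma) ** 2 - 0.5
print('quadrature: %.6f   closed form: %.6f' % (D_quad, D_form))
```

Both routes give $D\approx0.3181$ nats for $\sigma=2\sigma_{0}$; note that the $\log$ term enters with coefficient one, which is worth keeping in mind when reading the derivation above.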
+ "We can perhaps take the KL divergence to provide a generalized quantification of increase of precision, as in: the increase in precision $\\alpha$ going from approximation to truth (which in the 1D Gaussian case is just $r^{-1}\\equiv\\frac{\\sigma_{0}}{\\sigma}$) is given by $\\alpha\\approx\\exp[-2D]$." ] }, { @@ -372,9 +372,9 @@ "source": [ "### Tension\n", "\n", - "The KL divergence, in nats, between an approximating Gaussian and a true Gaussian is _approximately_ equal to the square of the tension between the two distributions:\n", + "The KL divergence, in nats, between an approximating Gaussian and a true Gaussian is _approximately_ equal to the square of the tension between the two distributions. By a similar derivation, we obtain:\n", "\n", - "## $D \\approx t^2$ \n", + "## $D = \\log[r]-\\frac{1}{2}(1-r^{-2})+\\frac{1}{2}(1+r^{-2})t^{2} \\approx t^2$ \n", "\n", "where tension $t$ is defined as\n", "\n", @@ -386,6 +386,15 @@ "\n", "## $t \\approx \\sqrt{D}$" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] } ], "metadata": { diff --git a/qp/pdf.py b/qp/pdf.py index 58dfe876..96af22f7 100644 --- a/qp/pdf.py +++ b/qp/pdf.py @@ -4,7 +4,6 @@ import qp -<<<<<<< HEAD class PDF(object): def __init__(self, truth=None, quantiles=None, histogram=None, @@ -37,8 +36,7 @@ def __init__(self, truth=None, quantiles=None, histogram=None, self.initialized = 'histogram' self.last = self.initialized - if vb and self.truth is None and self.quantiles is None - and self.histogram is None: + if vb and self.truth is None and self.quantiles is None and self.histogram is None: print 'Warning: initializing a PDF object without inputs' self.difs = None self.mids = None From 9d3cb96afb7c08672981c7e0b32e9cf41d3ded93 Mon Sep 17 00:00:00 2001 From: aimalz Date: Thu, 8 Dec 2016 18:08:15 -0500 Subject: [PATCH 3/7] fixing tex that works locally but not on github preview --- docs/notebooks/kld.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/notebooks/kld.ipynb b/docs/notebooks/kld.ipynb index 6a18a995..5d97547d 100644 --- a/docs/notebooks/kld.ipynb +++ b/docs/notebooks/kld.ipynb @@ -351,7 +351,7 @@ "\\begin{align*}\n", "D &= \\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\left(\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]-u]\\right)\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}-\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u\\right)]\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}\\right)\\\\\n", "&= \\frac{1}{2\\sqrt{\\pi}}\\left(\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]u^{-\\frac{1}{2}}\\exp[-u]-u^{\\frac{1}{2}}\\exp[-u]\\right)du-\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma]u^{-\\frac{1}{2}}\\exp[-u]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u^{\\frac{1}{2}}\\exp[-u]\\right)du\\right)\\\\\n", - "&= -\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[\\sqrt{\\pi}\\ erf[u^{\\frac{1}{2}}]\\right]+\\left[\\frac{\\sqrt{\\pi}}{2}\\ erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[\\sqrt{\\pi}\\ erf[u^{\\frac{1}{2}}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\frac{\\sqrt{\\pi}}{2}\\ erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)\\right)\\\\\n", + "&= 
-\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)\\right)\\\\\n", "\\end{align*}\n", "\n", "We transform back and evaluate this at the limits:\n", From 42dd7bdb918f974ae2be5c65c56a8edf153315d4 Mon Sep 17 00:00:00 2001 From: aimalz Date: Thu, 8 Dec 2016 18:09:46 -0500 Subject: [PATCH 4/7] fixing tex that works locally but not on github preview --- docs/notebooks/kld.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/notebooks/kld.ipynb b/docs/notebooks/kld.ipynb index 5d97547d..0bfa62b7 100644 --- a/docs/notebooks/kld.ipynb +++ b/docs/notebooks/kld.ipynb @@ -351,7 +351,7 @@ "\\begin{align*}\n", "D &= \\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\left(\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]-u]\\right)\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}-\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u\\right)]\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}\\right)\\\\\n", "&= \\frac{1}{2\\sqrt{\\pi}}\\left(\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]u^{-\\frac{1}{2}}\\exp[-u]-u^{\\frac{1}{2}}\\exp[-u]\\right)du-\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma]u^{-\\frac{1}{2}}\\exp[-u]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u^{\\frac{1}{2}}\\exp[-u]\\right)du\\right)\\\\\n", - "&= -\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)\\right)\\\\\n", + "&= -\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)\\right)\n", "\\end{align*}\n", "\n", "We transform back and evaluate this at the limits:\n", From 36d416f3366b101db526972899d72564f67e7140 Mon Sep 17 00:00:00 2001 From: aimalz Date: Thu, 8 Dec 2016 18:10:35 -0500 Subject: [PATCH 5/7] fixing tex that works locally but not on github preview --- docs/notebooks/kld.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/notebooks/kld.ipynb b/docs/notebooks/kld.ipynb index 0bfa62b7..684518d6 100644 --- a/docs/notebooks/kld.ipynb +++ b/docs/notebooks/kld.ipynb @@ -350,8 +350,8 @@ "We substitute $u=\\frac{(x-x_{0})^{2}}{2\\sigma_{0}}$:\n", "\\begin{align*}\n", "D &= \\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\left(\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]-u]\\right)\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}-\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u\\right)]\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}\\right)\\\\\n", - "&= 
\\frac{1}{2\\sqrt{\\pi}}\\left(\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]u^{-\\frac{1}{2}}\\exp[-u]-u^{\\frac{1}{2}}\\exp[-u]\\right)du-\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma]u^{-\\frac{1}{2}}\\exp[-u]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u^{\\frac{1}{2}}\\exp[-u]\\right)du\\right)\\\\\n", - "&= -\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)\\right)\n", + "%&= \\frac{1}{2\\sqrt{\\pi}}\\left(\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]u^{-\\frac{1}{2}}\\exp[-u]-u^{\\frac{1}{2}}\\exp[-u]\\right)du-\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma]u^{-\\frac{1}{2}}\\exp[-u]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u^{\\frac{1}{2}}\\exp[-u]\\right)du\\right)\\\\\n", + "%&= -\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)\\right)\n", "\\end{align*}\n", "\n", "We transform back and evaluate this at the limits:\n", From 945e136c07958be9291d604b4aedeea8d4157672 Mon Sep 17 00:00:00 2001 From: aimalz Date: Thu, 8 Dec 2016 18:12:46 -0500 Subject: [PATCH 6/7] fixing tex that works locally but not on github preview --- docs/notebooks/kld.ipynb | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/notebooks/kld.ipynb b/docs/notebooks/kld.ipynb index 684518d6..be7287cd 100644 --- a/docs/notebooks/kld.ipynb +++ b/docs/notebooks/kld.ipynb @@ -348,10 +348,11 @@ "\\end{align*}\n", "\n", "We substitute $u=\\frac{(x-x_{0})^{2}}{2\\sigma_{0}}$:\n", + "\n", "\\begin{align*}\n", "D &= \\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\left(\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]-u]\\right)\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}-\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u\\right)]\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}\\right)\\\\\n", - "%&= \\frac{1}{2\\sqrt{\\pi}}\\left(\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]u^{-\\frac{1}{2}}\\exp[-u]-u^{\\frac{1}{2}}\\exp[-u]\\right)du-\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma]u^{-\\frac{1}{2}}\\exp[-u]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u^{\\frac{1}{2}}\\exp[-u]\\right)du\\right)\\\\\n", - "%&= -\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)\\right)\n", + "&= \\frac{1}{2\\sqrt{\\pi}}\\left(\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]u^{-\\frac{1}{2}}\\exp[-u]-u^{\\frac{1}{2}}\\exp[-u]\\right)du-\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma]u^{-\\frac{1}{2}}\\exp[-u]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u^{\\frac{1}{2}}\\exp[-u]\\right)du\\right)\\\\\n", + "&= 
-\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)\\right)\n", "\\end{align*}\n", "\n", "We transform back and evaluate this at the limits:\n", From 4c042bbaec57435c9d6c7b97c41ccf241fafdfd3 Mon Sep 17 00:00:00 2001 From: aimalz Date: Fri, 9 Dec 2016 20:33:32 -0500 Subject: [PATCH 7/7] updated text in notebook --- docs/notebooks/kld.ipynb | 126 +++++++++++++++++++++------------------ 1 file changed, 68 insertions(+), 58 deletions(-) diff --git a/docs/notebooks/kld.ipynb b/docs/notebooks/kld.ipynb index be7287cd..28794466 100644 --- a/docs/notebooks/kld.ipynb +++ b/docs/notebooks/kld.ipynb @@ -6,7 +6,7 @@ "source": [ "# The Kullback-Leibler Divergence\n", "\n", - "In this notebook, we try and gain some intuition about the magnitude of the KL divergence by computing its value between two Gaussian PDFs as a function of the \"tension\" between them.\n", + "The KL divergence is used as a measure of how close an approximation to a probability distribution is to the true probability distribution it approximates. In this notebook, we try to gain some intuition about the magnitude of the KL divergence by computing its value between two Gaussian PDFs as a function of the \"precision\" and \"tension\" between them.\n", "\n", "### Requirements\n", "\n", @@ -23,7 +23,7 @@ "\n", "$D(P||Q) = \\int_{-\\infty}^{\\infty} \\log \\left( \\frac{P(x)}{Q(x)} \\right) P(x) dx$\n", "\n", - "The wikipedia page for the KL divergence gives the following useful interpretation of the KLD:\n", + "The Wikipedia page for the KL divergence gives the following useful interpretation of the KLD:\n", "\n", "> KL divergence is a measure of the difference between two probability distributions $P$ and $Q$. It is not symmetric in $P$ and $Q$. In applications, $P$ typically represents ... a precisely calculated theoretical distribution, while $Q$ typically represents ... [an] approximation of $P$.\n", ">\n", @@ -137,6 +137,70 @@ "i.e. Two concentric 1D Gaussian PDFs differing in width by a factor of 4.37 have a KLD of 1 nat." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analytic Formulae" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Precision\n", "\n", "The KL divergence, in nats, between an approximating, lower precision, correctly aligned Gaussian of variance $\sigma^{2}$ and a true Gaussian of variance $\sigma_{0}^{2}$ is related to the ratio between the two distribution widths:\n", "\n", "\begin{align*}\n", "D &= \int_{-\infty}^{\infty}P(x)\log\left[\frac{P(x)}{Q(x)}\right]dx\\\n", "&= \int_{-\infty}^{\infty}P(x)\log[P(x)]dx-\int_{-\infty}^{\infty}P(x)\log[Q(x)]dx\\\n", "&= \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sigma_{0}}\exp[-\frac{(x-x_{0})^{2}}{2\sigma_{0}^{2}}]\log[\frac{1}{\sqrt{2\pi}\sigma_{0}}\exp[-\frac{(x-x_{0})^{2}}{2\sigma_{0}^{2}}]]dx-\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sigma_{0}}\exp[-\frac{(x-x_{0})^{2}}{2\sigma_{0}^{2}}]\log[\frac{1}{\sqrt{2\pi}\sigma}\exp[-\frac{(x-x_{0})^{2}}{2\sigma^{2}}]]dx\\\n", "&= \frac{1}{\sqrt{2\pi}\sigma_{0}}\left(\int_{-\infty}^{\infty}\exp[-\frac{(x-x_{0})^{2}}{2\sigma_{0}^{2}}]\left(-\log[\sqrt{2\pi}\sigma_{0}]-\frac{(x-x_{0})^{2}}{2\sigma_{0}^{2}}\right)dx-\int_{-\infty}^{\infty}\exp[-\frac{(x-x_{0})^{2}}{2\sigma_{0}^{2}}]\left(-\log[\sqrt{2\pi}\sigma]-\frac{(x-x_{0})^{2}}{2\sigma^{2}}\right)dx\right)\n", "\end{align*}\n", "\n", "We substitute $u=\frac{(x-x_{0})^{2}}{2\sigma_{0}^{2}}$, noting that the mapping is two-to-one, so each half-line $x<x_{0}$ and $x>x_{0}$ contributes an identical integral over $0<u<\infty$:\n", "\n", "\begin{align*}\n", "D &= \frac{1}{\sqrt{2\pi}\sigma_{0}}\left(\int\exp[-u]\left(-\log[\sqrt{2\pi}\sigma_{0}]-u\right)\frac{du}{\frac{\sqrt{2u}}{\sigma_{0}}}-\int\exp[-u]\left(-\log[\sqrt{2\pi}\sigma]-\frac{\sigma_{0}^{2}}{\sigma^{2}}u\right)\frac{du}{\frac{\sqrt{2u}}{\sigma_{0}}}\right)\\\n", "&= \frac{1}{2\sqrt{\pi}}\left(\int\left(-\log[\sqrt{2\pi}\sigma_{0}]u^{-\frac{1}{2}}\exp[-u]-u^{\frac{1}{2}}\exp[-u]\right)du-\int\left(-\log[\sqrt{2\pi}\sigma]u^{-\frac{1}{2}}\exp[-u]-\frac{\sigma_{0}^{2}}{\sigma^{2}}u^{\frac{1}{2}}\exp[-u]\right)du\right)\\\n", "&= -\frac{1}{2\sqrt{\pi}}\left(\left(\log[\sqrt{2\pi}\sigma_{0}]\left[\sqrt{\pi}erf[u^{\frac{1}{2}}]\right]+\left[\frac{\sqrt{\pi}}{2}erf[u^{\frac{1}{2}}]-u^{\frac{1}{2}}\exp[-u]\right]\right)-\left(\log[\sqrt{2\pi}\sigma]\left[\sqrt{\pi}erf[u^{\frac{1}{2}}]\right]+\frac{\sigma_{0}^{2}}{\sigma^{2}}\left[\frac{\sqrt{\pi}}{2}erf[u^{\frac{1}{2}}]-u^{\frac{1}{2}}\exp[-u]\right]\right)\right)\n", "\end{align*}\n", "\n", "We transform back and evaluate this at the limits, summing the two identical half-line contributions:\n", "\n", "\begin{align*}\n", "D &= -\frac{1}{2\sqrt{\pi}}\left(\left(\log[\sqrt{2\pi}\sigma_{0}]\left[2\sqrt{\pi}\right]+\left[\sqrt{\pi}\right]\right)-\left(\log[\sqrt{2\pi}\sigma]\left[2\sqrt{\pi}\right]+\frac{\sigma_{0}^{2}}{\sigma^{2}}\left[\sqrt{\pi}\right]\right)\right)\\\n", "&= \log\left[\frac{\sigma}{\sigma_{0}}\right]-\frac{1}{2}\left(1-\frac{\sigma_{0}^{2}}{\sigma^{2}}\right)\n", "\end{align*}\n", "\n", "where $\sigma_0$ is the width of the true distribution.\n", "\n", "We can perhaps take the KL divergence to provide a generalized quantification of increase of precision: the increase in precision $r^{-1}$ going from approximation to truth, which in the 1D Gaussian case is just\n", "### $r^{-1}\equiv\frac{\sigma_{0}}{\sigma}\approx\exp\left[-\left(D+\frac{1}{2}\right)\right]$\n", "\n", "in the limit $\sigma\gg\sigma_{0}$, since there $D\approx\log\left[\frac{\sigma}{\sigma_{0}}\right]-\frac{1}{2}$."
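How sharp is the $\exp\left[-\left(D+\frac{1}{2}\right)\right]$ shorthand? A small editorial sketch (assuming only `numpy`; `D_of_r` is an illustrative helper, not part of the `qp` API) tabulates the exact same-mean result against the proxy:

```python
import numpy as np

def D_of_r(r):
    # exact same-mean KLD from the derivation above: D = log(r) - (1 - r^-2)/2
    return np.log(r) - 0.5 * (1.0 - r ** -2)

for r in [1.5, 2.0, 5.0, 10.0, 50.0]:
    proxy = np.exp(-(D_of_r(r) + 0.5))  # candidate stand-in for the precision 1/r
    print('r = %5.1f   1/r = %.4f   exp[-(D+1/2)] = %.4f' % (r, 1.0 / r, proxy))
```

The two columns agree to better than a percent once $r \gtrsim 10$ and drift apart as $r \to 1$, where $D \to 0$; the proxy is a large-$r$ shorthand rather than an exact inversion.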
$r^{-1}\\equiv\\frac{\\sigma_{0}}{\\sigma}\\approx\\exp[-2D]$." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Tension\n", + "\n", + "The KL divergence, in nats, between an approximating Gaussian and a true Gaussian is _approximately_ equal to the square of the tension between the two distributions. By a similar derivation to the above, we obtain\n", + "\n", + "## $D = \\log[r]-\\frac{1}{2}(1-r^{-2})+\\frac{1}{2}(1+r^{-2})t^{2} \\approx t^2$ \n", + "\n", + "where tension $t$ is defined as\n", + "\n", + "## $t = \\frac{\\Delta x}{\\sqrt{\\left(\\sigma_0^2 + \\sigma^2\\right)}}$\n", + "\n", + "and has, in some sense, \"units\" of \"sigma\". The KLD is the information lost when using the approximation: the information loss rises in proprtion to the tension squared. The above formula is most accurate in the limit where the two distributions have the same width.\n", + "\n", + "Still, we can see that the KL divergence might provide a route to a generalized quantification of tension. The square root of the KLD between a PDF and its approximation, in nats, gives an approximate sense of the tension between the two distributions, in \"units\" of \"sigma\":\n", + "\n", + "## $t \\approx \\sqrt{D}$" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -329,63 +393,9 @@ "source": [ "## Conclusions\n", "\n", - "The simple numerical experiments in this notebook suggest the following approximate extrapolations and hypotheses." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Precision\n", - "\n", - "The KL divergence, in nats, between an approximating, lower precision, correctly aligned Gaussian of variance $\\sigma^{2}$ and a true Gaussian of variance $\\sigma_{0}^{2}$ is related to the ratio between the two distribution widths:\n", - "\n", - "\\begin{align*}\n", - "D &= \\int_{-\\infty}^{\\infty}P(x)\\log\\left[\\frac{P(x)}{Q(x)}\\right]dx\\\\\n", - "&= \\int_{-\\infty}^{\\infty}P(x)\\log[P(x)]dx-\\int_{-\\infty}^{\\infty}P(x)\\log[Q(x)]dx\\\\\n", - "&= \\int_{-\\infty}^{\\infty}\\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\log[\\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]]dx-\\int_{-\\infty}^{\\infty}\\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\log[\\frac{1}{\\sqrt{2\\pi}\\sigma}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma^{2}}]]dx\\\\\n", - "&= \\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\left(\\int_{-\\infty}^{\\infty}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\right)dx-\\int_{-\\infty}^{\\infty}\\exp[-\\frac{(x-x_{0})^{2}}{2\\sigma_{0}^{2}}]\\left(-\\log[\\sqrt{2\\pi}\\sigma]-\\frac{(x-x_{0})^{2}}{2\\sigma^{2}}\\right)]dx\\right)\n", - "\\end{align*}\n", - "\n", - "We substitute $u=\\frac{(x-x_{0})^{2}}{2\\sigma_{0}}$:\n", - "\n", - "\\begin{align*}\n", - "D &= \\frac{1}{\\sqrt{2\\pi}\\sigma_{0}}\\left(\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]-u]\\right)\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}-\\int\\exp[-u]\\left(-\\log[\\sqrt{2\\pi}\\sigma]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u\\right)]\\frac{du}{\\frac{\\sqrt{2u}}{\\sigma_{0}}}\\right)\\\\\n", - "&= \\frac{1}{2\\sqrt{\\pi}}\\left(\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma_{0}]u^{-\\frac{1}{2}}\\exp[-u]-u^{\\frac{1}{2}}\\exp[-u]\\right)du-\\int\\left(-\\log[\\sqrt{2\\pi}\\sigma]u^{-\\frac{1}{2}}\\exp[-u]-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}u^{\\frac{1}{2}}\\exp[-u]\\right)du\\right)\\\\\n", - 
"&= -\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[\\sqrt{\\pi}erf[u^{\\frac{1}{2}}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\frac{\\sqrt{\\pi}}{2}erf[u^{\\frac{1}{2}}]-u^{-\\frac{1}{2}}\\exp[-u]\\right]\\right)\\right)\n", - "\\end{align*}\n", + "To summarize, the KL divergence $D$ is an appropriate metric of an approximation to a probability distribution, expressing the loss of information of the approximation from the true distribution. The simple numerical experiments in this notebook suggest the following approximate extrapolations and hypotheses. \n", "\n", - "We transform back and evaluate this at the limits:\n", - "\n", - "\\begin{align*}\n", - "D &= -\\frac{1}{2\\sqrt{\\pi}}\\left(\\left(\\log[\\sqrt{2\\pi}\\sigma_{0}]\\left[2\\sqrt{\\pi}]\\right]+\\left[\\sqrt{\\pi}]\\right]\\right)-\\left(\\log[\\sqrt{2\\pi}\\sigma]\\left[2\\sqrt{\\pi}]\\right]+\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\left[\\sqrt{\\pi}]\\right]\\right)\\right)\\\\\n", - "&= -\\frac{1}{2}\\left(\\log[\\frac{\\sigma_{0}}{\\sigma}]+1-\\frac{\\sigma_{0}^{2}}{\\sigma^{2}}\\right)\n", - "\\end{align*}\n", - "\n", - "where $\\sigma_0$ is the width of the true distribution.\n", - "\n", - "We can perhaps take the KL divergence to provide a generalized quantification of increase of precision, as in: the increase in precision $\\alpha$ going from approximation to truth (which in the 1D Gaussian case is just $r^{-1}\\equiv\\frac{\\sigma_{0}}{\\sigma}$) is given by $\\alpha\\approx\\exp[-2D]$." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Tension\n", - "\n", - "The KL divergence, in nats, between an approximating Gaussian and a true Gaussian is _approximately_ equal to the square of the tension between the two distributions. By a similar derivation, we obtain:\n", - "\n", - "## $D = \\log[r]-\\frac{1}{2}(1-r^{-2})+\\frac{1}{2}(1+r^{-2})t^{2} \\approx t^2$ \n", - "\n", - "where tension $t$ is defined as\n", - "\n", - "## $t = \\frac{\\Delta x}{\\sqrt{\\left(\\sigma_0^2 + \\sigma^2\\right)}}$\n", - "\n", - "and has, in some sense, \"units\" of \"sigma\". The KLD is the information lost when using the approximation: the information loss rises in proprtion to the tension squared. An analytic derivation of this result would be welcome! The above formula is most accurate when the two distributions hwave the same width: it would be good to have a more general formula.\n", - "\n", - "Still: we can see that the KL divergence might provide a route to a generalized quantification of tension. The square root of the KLD between a PDF and its approximation, in nats, gives an approximate sense of the tension between the two distributions, in \"units\" of \"sigma\":\n", - "\n", - "## $t \\approx \\sqrt{D}$" + "Using a Gaussian example enables exploration of two quantities characterizing the approximate distribution: the \"precision\" $r^{-1}$ is a measure of the width of the approximating distribution relative to the truth, and the \"tension\" $t$ is a measure of the difference in centroids weighted by the root-mean-square width of the two distributions. We have found that the KLD can be interpreted in terms of these quantities; the KLD is proportional to the log of the precision and the square of the tension." ] }, {