[WIP] FastICA rotation #9

Open

wants to merge 40 commits into base: master

Commits (40)
2ee1e45
Added changes. Started Univariate Tests.
jejjohnson Apr 8, 2019
cc4420c
Made small changes. started inverseCDF.
jejjohnson Jun 19, 2019
bf8a3f4
Working marginal uniformization.
jejjohnson Jun 19, 2019
a32c350
Some changes.
jejjohnson Jul 2, 2019
485232e
Made progress with refactoring. Working.
jejjohnson Jul 2, 2019
f06638c
Working functions demo notebook. TODO: LogDet
jejjohnson Jul 2, 2019
3857f67
Working Implementation of Linear Trans.
jejjohnson Oct 19, 2019
35a6f68
Same changes as before.
jejjohnson Oct 19, 2019
2843679
Minor changes.
jejjohnson Oct 19, 2019
885b37a
Updated notebooks.
jejjohnson Oct 20, 2019
caadd17
Added density mixin object with sampling method.
jejjohnson Oct 20, 2019
013ca0c
Added new naive quantile transformer.
jejjohnson Oct 20, 2019
e9e43f6
Added RBIG block.
jejjohnson Oct 20, 2019
9096a1d
Started Marginal Transform Base Class.
jejjohnson Oct 20, 2019
9c2fb5a
Added RBIG Flow model.
jejjohnson Oct 20, 2019
fd509ae
Working notebooks.
jejjohnson Oct 20, 2019
a676011
Got the quantile transform working.
jejjohnson Oct 20, 2019
79f4016
Still leakage. Will fix later.
jejjohnson Oct 20, 2019
484754e
Fixed Leak. Updated demos.
jejjohnson Oct 20, 2019
6cca69b
Made a few minor changes to custom quantiler.
jejjohnson Oct 21, 2019
0c1e0a5
Major changes.
jejjohnson Oct 21, 2019
9745cba
naive implementation of mi.
jejjohnson Oct 22, 2019
0015b6f
small change.
jejjohnson Oct 22, 2019
f3671af
Working exponential family entropy.
jejjohnson Oct 23, 2019
551423a
Final Exponential family approximation.
jejjohnson Oct 23, 2019
013560a
Converted MI to Base Estimator.
jejjohnson Nov 8, 2019
0a0a839
Added docs.
jejjohnson Mar 11, 2020
dea0a6e
Added uniformization notes.
jejjohnson Mar 11, 2020
8945183
Big changes
jejjohnson Mar 12, 2020
9d7f55b
Added ignore for pytest cache
jejjohnson Mar 12, 2020
ba9c2b3
Removed doc files.
jejjohnson Mar 12, 2020
8b66a39
Revert "Removed doc files."
jejjohnson Mar 12, 2020
f888866
Added pics.
jejjohnson Mar 12, 2020
3628fa3
Added no jekyll file.
jejjohnson Mar 12, 2020
b30d715
Updates.
jejjohnson Mar 19, 2020
617cded
a lot of refactoring.
jejjohnson Apr 1, 2020
4804aa4
Merge branch 'refactoring' of https://mutenroshi.uv.es/gitlab/emmanue…
jejjohnson Apr 2, 2020
b682fdc
Working RBIG Flow Models
jejjohnson Apr 2, 2020
e50abeb
deep rearranging.
jejjohnson Apr 3, 2020
e052455
Merge branch 'master' of github.com:IPL-UV/rbig into fastica
jejjohnson Nov 20, 2020
1 change: 1 addition & 0 deletions .env
@@ -0,0 +1 @@
PYTHONPATH="${workspaceFolder}/."
3 changes: 1 addition & 2 deletions .gitignore
@@ -4,7 +4,6 @@ code_through/
\.idea/
\__pycache__/
*.log

*.tex
.idea/misc.xml
.idea/rbig.iml
@@ -18,4 +17,4 @@ code_through/
*.csv

\.eggs/
\site/
\site/
Empty file added docs/.nojekyll
Empty file.
16 changes: 16 additions & 0 deletions docs/README.md
@@ -0,0 +1,16 @@
# Rotation-Based Iterative Gaussianization


A method that provides a transformation scheme from any distribution to a Gaussian distribution. This repository facilitates translating the original MATLAB code into a Python implementation compatible with the scikit-learn framework.


### Resources

* Original Webpage - [ISP](http://isp.uv.es/rbig.html)
* Original MATLAB Code - [webpage](http://isp.uv.es/code/featureextraction/RBIG_toolbox.zip)
* Original Python Code - [github](https://github.com/spencerkent/pyRBIG)
* [Paper](https://arxiv.org/abs/1602.00229) - Iterative Gaussianization: from ICA to Random Rotations

Abstract From Paper

> Most signal processing problems involve the challenging task of multidimensional probability density function (PDF) estimation. In this work, we propose a solution to this problem by using a family of Rotation-based Iterative Gaussianization (RBIG) transforms. The general framework consists of the sequential application of a univariate marginal Gaussianization transform followed by an orthonormal transform. The proposed procedure looks for differentiable transforms to a known PDF so that the unknown PDF can be estimated at any point of the original domain. In particular, we aim at a zero mean unit covariance Gaussian for convenience. RBIG is formally similar to classical iterative Projection Pursuit (PP) algorithms. However, we show that, unlike in PP methods, the particular class of rotations used has no special qualitative relevance in this context, since looking for interestingness is not a critical issue for PDF estimation. The key difference is that our approach focuses on the univariate part (marginal Gaussianization) of the problem rather than on the multivariate part (rotation). This difference implies that one may select the most convenient rotation suited to each practical application. The differentiability, invertibility and convergence of RBIG are theoretically and experimentally analyzed. Relation to other methods, such as Radial Gaussianization (RG), one-class support vector domain description (SVDD), and deep neural networks (DNN) is also pointed out. The practical performance of RBIG is successfully illustrated in a number of multidimensional problems such as image synthesis, classification, denoising, and multi-information estimation.
24 changes: 24 additions & 0 deletions docs/_sidebar.md
@@ -0,0 +1,24 @@
<!-- docs/_sidebar.md -->

* [pyRBIG](README.md)

**Theory**
* [Literature](literature.md)
* [What is Gaussianization?](gaussianization.md)
* [What is RBIG?](rbig.md)
* [Normalizing Flows](nfs.md)

**Demos**
* [Gaussianization](/)
* [Information Theory](/)

**Walk-Throughs**
* [Uniformization](mu.md)
* [Marginal Gaussianization](mg.md)
* [Rotation](rotation.md)

**Supplementary**
* [Information Theory](itm.md)
* [Gaussian Distribution](gaussian.md)
* [Uniform Distribution](uniform.md)
* [Exponential Family](exponential.md)
27 changes: 27 additions & 0 deletions docs/dds.md
@@ -0,0 +1,27 @@
# Density Destructors


## Main Idea


## Forward Approach

We can view modeling from two perspectives: constructive or destructive. A constructive process learns how to build an exact sequence of transformations from $z$ to $x$. The destructive process does the opposite: it builds a sequence of transforms from $x$ to $z$ while remembering each transform exactly, so that the sequence can be reversed.

We can write some equations to illustrate exactly what we mean by these two terms. Let's define two spaces: one is our data space $\mathcal X$ and the other is the base space $\mathcal Z$. We want to learn a transformation $f_\theta$ that maps us from $\mathcal X$ to $\mathcal Z$, $f_\theta : \mathcal X \rightarrow \mathcal Z$. We also want a function $\mathcal G_\theta$ that maps us from $\mathcal Z$ to $\mathcal X$, $\mathcal G_\theta : \mathcal Z \rightarrow \mathcal X$.

**TODO: Plot**

More concretely, let's define the following pair of equations:

$$z \sim \mathcal{P}_\mathcal{Z}$$
$$\hat x = \mathcal G_\theta (z)$$

This is called the generative step: how well can we fit our parameters such that $x \approx \hat x$? We can define the alternative step below:

$$x \sim \mathcal{P}_\mathcal{X}$$
$$\hat z = f_\theta (x)$$

This is called the inference step: how well can we fit the parameters of our transformation $f_\theta$ such that $z \approx \hat z$? There are immediately some things to notice. Depending on the method used in the deep learning community, the functions $\mathcal G_\theta$ and $f_\theta$ can be defined differently. Typically we are interested in the class of algorithms where $f_\theta = \mathcal G_\theta^{-1}$. In this ideal scenario we only need to learn one transformation instead of two, and with this requirement we can compute likelihood values exactly. The likelihood of a value $x$ under the transformation $f_\theta = \mathcal G_\theta^{-1}$ is given by:

$$\mathcal P_{x}(x)=\mathcal P_{z} \left( f_\theta (x) \right)\left| \det \mathbf J_{f_\theta}(x) \right|$$
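
As a sanity check of this change-of-variables formula, here is a minimal sketch (illustrative only, not the repository API) where the destructive transform $f$ is a marginal standardization and the base density $\mathcal P_z$ is a standard Gaussian:

```python
import numpy as np
from scipy import stats

def f(x, mu, sigma):
    """Destructor: data space -> base space, z = (x - mu) / sigma."""
    return (x - mu) / sigma

def log_likelihood(x, mu, sigma):
    """log p_x(x) = log p_z(f(x)) + log|det J_f(x)| for a diagonal Jacobian."""
    z = f(x, mu, sigma)
    log_pz = stats.norm.logpdf(z).sum(axis=-1)    # standard Gaussian base density
    log_det_jac = -np.log(sigma).sum()            # d f_i / d x_i = 1 / sigma_i
    return log_pz + log_det_jac

rng = np.random.RandomState(0)
mu, sigma = np.array([1.0, -2.0]), np.array([0.5, 3.0])
x = mu + sigma * rng.randn(1000, 2)               # samples from the data density

# matches the exact Gaussian log-density at those points
exact = stats.norm(loc=mu, scale=sigma).logpdf(x).sum(axis=-1)
print(np.allclose(log_likelihood(x, mu, sigma), exact))   # True
```

Because the Jacobian here is diagonal, its log-determinant is simply the sum of the per-dimension log-scales.
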
57 changes: 57 additions & 0 deletions docs/demo_innf.md
@@ -0,0 +1,57 @@
# Demo: Gaussianization



## Data

```python

```

## RBIG Model

### Initialize Model

```python
# NOTE: the import path is an assumption; adjust to wherever RBIG lives in this package
from rbig import RBIG

# rbig parameters
n_layers = 1
rotation_type = 'PCA'
random_state = 123
zero_tolerance = 100
base = 'gauss'

# initialize RBIG Class
rbig_clf = RBIG(
    n_layers=n_layers,
    rotation_type=rotation_type,
    random_state=random_state,
    zero_tolerance=zero_tolerance,
    base=base,
)
```

### Fit Model to Data

```python
# run RBIG model
rbig_clf.fit(X);
```

### Visualization


#### 1. Marginal Gaussianization

```python
# first-layer rotation matrix V (features x features)
V = rbig_clf.rotation_matrix[0]

# perform rotation
data_marg_gauss = X @ V
```

#### 2. Rotation

```python

```
51 changes: 51 additions & 0 deletions docs/exponential.md
@@ -0,0 +1,51 @@
# Exponential Family of Distributions



This is the closed-form expression for the Sharma-Mittal entropy of exponential families. The Sharma-Mittal entropy is a generalization of the Shannon, Rényi, and Tsallis entropy measures. The distribution is fit to the data $Y$ via maximum likelihood, and the entropy is then computed with the analytical formula for the exponential family.



**Source Parameters, $\theta$**

$$\theta = (\mu, \Sigma)$$

where $\mu \in \mathbb{R}^{d}$ and $\Sigma \succ 0$ (positive definite)

**Natural Parameters, $\eta$**

$$\eta = \left( \theta_2^{-1}\theta_1, \frac{1}{2}\theta_2^{-1} \right)$$

**Expectation Parameters, $\nabla F(\eta)$**

These are the expected sufficient statistics; for the Gaussian, with sufficient statistics $(x, -xx^\top)$, they are $\left(\mu,\; -\left(\Sigma + \mu\mu^\top\right)\right)$.


**Log Normalizer, $F(\eta)$**

Also known as the log partition function.

$$F(\eta) = \frac{1}{4} \text{tr}\left( \eta_1^\top \eta_2^{-1} \eta_1 \right) - \frac{1}{2} \log|\eta_2| + \frac{d}{2}\log \pi$$


**Gradient Log Normalizer, $\nabla F(\eta)$**

$$\nabla F(\eta) = \left( \frac{1}{2} \eta_2^{-1}\eta_1,\; -\frac{1}{2} \eta_2^{-1}- \frac{1}{4}(\eta_2^{-1}\eta_1)(\eta_2^{-1}\eta_1)^\top \right)$$

**Log Normalizer, $F(\theta)$**

Also known as the log partition function.

$$F(\theta) = \frac{1}{2} \theta_1^\top \theta_2^{-1} \theta_1 + \frac{1}{2} \log|\theta_2| + \frac{d}{2}\log(2\pi) $$

**Final Entropy Calculation**

$$H = F(\eta) - \langle \eta, \nabla F(\eta) \rangle$$

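As a quick numerical check of the formulas above, here is a minimal sketch (illustrative only, not part of the repository code) that evaluates $H = F(\eta) - \langle \eta, \nabla F(\eta) \rangle$ for a multivariate Gaussian and compares it against the familiar closed form $\frac{1}{2}\log|2\pi e\Sigma|$:

```python
import numpy as np

# Entropy of a multivariate Gaussian via its natural parameters
# (a sketch of the formulas above, not repository code).
rng = np.random.RandomState(0)
d = 3
mu = rng.randn(d)
A = rng.randn(d, d)
Sigma = A @ A.T + d * np.eye(d)           # positive-definite covariance

# natural parameters: eta1 = Sigma^{-1} mu, eta2 = 0.5 * Sigma^{-1}
eta1 = np.linalg.solve(Sigma, mu)
eta2 = 0.5 * np.linalg.inv(Sigma)
eta2_inv = np.linalg.inv(eta2)

# log normalizer F(eta) and its gradient
F = (0.25 * eta1 @ eta2_inv @ eta1
     - 0.5 * np.linalg.slogdet(eta2)[1]
     + 0.5 * d * np.log(np.pi))
grad1 = 0.5 * eta2_inv @ eta1
grad2 = -0.5 * eta2_inv - 0.25 * np.outer(eta2_inv @ eta1, eta2_inv @ eta1)

# H = F(eta) - <eta, grad F(eta)>, summing the inner product over both blocks
H = F - (eta1 @ grad1 + np.sum(eta2 * grad2))

# closed form: 0.5 * log det(2 * pi * e * Sigma)
H_closed = 0.5 * np.linalg.slogdet(2 * np.pi * np.e * Sigma)[1]
print(H, H_closed)                        # the two values agree
```
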

## Resources

* A closed-form expression for the Sharma-Mittal entropy of exponential families - Nielsen & Nock (2012) - [Paper]()
* Statistical exponential families: A digest with flash cards - [Paper](https://arxiv.org/pdf/0911.4863.pdf)
* The Exponential Family: Getting Weird Expectations! - [Blog](https://zhiyzuo.github.io/Exponential-Family-Distributions/)
* Deep Exponential Family - [Code](https://github.com/tensorflow/probability/blob/master/tensorflow_probability/examples/deep_exponential_family.py)
* PyMEF: A Framework for Exponential Families in Python - [Code](https://github.com/pbrod/pymef) | [Paper](http://www-connex.lip6.fr/~schwander/articles/ssp2011.pdf)
98 changes: 98 additions & 0 deletions docs/gaussian.md
@@ -0,0 +1,98 @@
# Gaussian Distribution



### **PDF**

$$f(X)=
\frac{1}{\sqrt{(2\pi)^D|\Sigma|}}
\text{exp}\left( -\frac{1}{2} (x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$$

### **Likelihood**

$$- \ln L = \frac{1}{2}\ln|\Sigma| + \frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x - \mu) + \frac{D}{2}\ln 2\pi $$

### Alternative Representation

$$X \sim \mathcal{N}(\mu, \Sigma)$$

where $\mu$ is the mean and $\Sigma$ is the covariance. Let's decompose $\Sigma$ with an eigendecomposition like so

$$\Sigma = U\Lambda U^\top = U \Lambda^{1/2}(U\Lambda^{1/2})^\top$$

Now we can represent our Normal distribution as:

$$X \sim \mu + U\Lambda^{1/2}Z$$



where:

* $U$ is a rotation matrix
* $\Lambda^{1/2}$ is a scale matrix
* $\mu$ is a translation vector
* $Z \sim \mathcal{N}(0,I)$

or also

$$X \sim \mu + UZ$$

where:

* $U$ is a rotation matrix
* $\Lambda$ is a (diagonal) scale matrix, absorbed here into the covariance of $Z$
* $\mu$ is a translation vector
* $Z \sim \mathcal{N}(0,\Lambda)$


#### Reparameterization

In deep learning we often learn this distribution through a reparameterization like so:

$$X = \mu + AZ $$

where:

* $\mu \in \mathbb{R}^{d}$
* $A \in \mathbb{R}^{d\times l}$
* $Z \sim \mathcal{N}(0, I)$
* $\Sigma=AA^\top$, e.g. with $A$ the Cholesky factor of $\Sigma$

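A small sampling sketch (illustrative only, not repository code) drawing from $\mathcal{N}(\mu, \Sigma)$ via $X = \mu + AZ$, with $A$ taken either from the Cholesky factorization or as $U\Lambda^{1/2}$ from the eigendecomposition:

```python
import numpy as np

rng = np.random.RandomState(42)

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Option 1: A from the Cholesky decomposition, Sigma = A A^T
A = np.linalg.cholesky(Sigma)

# Option 2: A = U Lambda^{1/2} from the eigendecomposition Sigma = U Lambda U^T
lam, U = np.linalg.eigh(Sigma)
A_eig = U @ np.diag(np.sqrt(lam))

# X = mu + A Z with Z ~ N(0, I), applied row-wise
Z = rng.randn(100_000, 2)
X = mu + Z @ A.T

print(X.mean(axis=0))                 # close to mu
print(np.cov(X, rowvar=False))        # close to Sigma
```

Either choice of $A$ yields the same distribution, since only $AA^\top = \Sigma$ matters.
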


---
### **Entropy**

**1 dimensional**

$$H(X) = \frac{1}{2} \log(2\pi e \sigma^2)$$

**D dimensional**
$$H(X) = \frac{D}{2} + \frac{D}{2} \ln(2\pi) + \frac{1}{2}\ln|\Sigma|$$
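
A quick numerical check of the $D$-dimensional formula (a sketch assuming `scipy` is available):

```python
import numpy as np
from scipy import stats

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
D = Sigma.shape[0]

# H(X) = D/2 + (D/2) ln(2 pi) + 0.5 ln |Sigma|
H_formula = 0.5 * D + 0.5 * D * np.log(2 * np.pi) + 0.5 * np.linalg.slogdet(Sigma)[1]
H_scipy = stats.multivariate_normal(np.zeros(D), Sigma).entropy()
print(H_formula, H_scipy)   # the two values agree
```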


### **KL-Divergence (Relative Entropy)**

$$
KLD(\mathcal{N}_0||\mathcal{N}_1) = \frac{1}{2}
\left[
\text{tr}(\Sigma_1^{-1}\Sigma_0) +
(\mu_1 - \mu_0)^\top \Sigma_1^{-1} (\mu_1 - \mu_0) -
D + \ln \frac{|\Sigma_1|}{|\Sigma_0|}
\right]
$$

if $\mu_1=\mu_0$ then:

$$
KLD(\Sigma_0||\Sigma_1) = \frac{1}{2} \left[
\text{tr}(\Sigma_1^{-1} \Sigma_0) - D + \ln \frac{|\Sigma_1|}{|\Sigma_0|} \right]
$$
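
A numerical sanity check of this expression (a sketch, not repository code), comparing the closed form against a Monte Carlo estimate of $\mathbb{E}_{p_0}[\log p_0(x) - \log p_1(x)]$:

```python
import numpy as np
from scipy import stats

def kl_gauss(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) using the closed-form expression above."""
    D = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - D
                  + np.linalg.slogdet(S1)[1] - np.linalg.slogdet(S0)[1])

mu0, S0 = np.zeros(2), np.array([[1.0, 0.3], [0.3, 0.5]])
mu1, S1 = np.array([0.5, -0.2]), np.array([[1.5, -0.2], [-0.2, 1.0]])

p0 = stats.multivariate_normal(mu0, S0)
p1 = stats.multivariate_normal(mu1, S1)
x = p0.rvs(200_000, random_state=0)

print(kl_gauss(mu0, S0, mu1, S1))              # closed form
print(np.mean(p0.logpdf(x) - p1.logpdf(x)))    # Monte Carlo estimate (close)
```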

**Mutual Information**

$$I(X)= - \frac{1}{2} \ln | \rho_0 |$$

where $\rho_0$ is the correlation matrix from $\Sigma_0$.
