Commit 970ddff
typo correction ch 4
ElektroDuck committed Mar 8, 2024
1 parent aaf6883 commit 970ddff
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions notebook.ipynb
@@ -587,25 +587,25 @@
"id": "f3449fcd",
"metadata": {},
"source": [
"Once the structure of the network is built, the next step is to let it learn the parameters (_CPDs_) from the dataset. The choice is [between two algorithms](https://towardsdatascience.com/maximum-likelihood-vs-bayesian-estimation-dd2eb4dfda8a):\n",
"Once the structure of the network is built, the next step is letting it learn the parameters (_CPDs_) from the dataset. The choice is [between two algorithms](https://towardsdatascience.com/maximum-likelihood-vs-bayesian-estimation-dd2eb4dfda8a):\n",
"\n",
"* **Maximum Likelihood Estimation**: Given the likelihood function $\\mathcal{L}(\\theta | D) = f(D | \\theta) = \\prod{}_{i=1}^{N}f(x_i | \\theta)$, the MLE algorithm tries to fit the parameter $\\theta$ that maximize the likelihood function:\n",
"$$ \\hat{\\theta} = \\argmax_\\theta \\prod_{i=1}^N f(x_i | \\theta) = \\frac{\\partial}{\\partial \\theta} \\prod_{i=1}^N f(x_i|\\theta)$$ \n",
"Since computing the derivatives of many products can get really complex, in the real world it is used the _log likelihood function_. This is done because the logs of productus is the sum of the logs, and that's simplify the computation, and also the $\\argmax$ of a function doens't change if we are applying the log, since it is strictly monotonic and scaled. So the real formula that is computed is\n",
"Since computing the derivatives of many products can get really complex, in the real world scenarios the _log likelihood function_ is used. This is done because the logs of products is the sum of the logs, and that simplifies the computation, and also the $\\argmax$ of a function doens't change if we are applying the log, since it is strictly monotonic and scaled. So the real computed formula is:\n",
"$$\\text{log likelihood} : \\int(\\theta) = \\ln \\prod_{i=1}^Nf(x_i | \\theta) = \\sum_{i=1}^N \\ln f(x_i | \\theta)$$\n",
"Therefore,\n",
"Therefore:\n",
"$$ \\hat{\\theta} = \\argmax_\\theta \\int (\\theta)$$\n",
"\n",
"This algorithm works the best if the dataset is big, since its outcome solely depends on the _observed_ data. It is also adviced when there is uncertainty about the prior.\n",
"\n",
"* **Bayesian Estimation**: The equation used for Bayesian estimation is the following:\n",
"$$ \\overbrace{\\mathbb{P}(\\theta | D)}^\\text{posterior distribution} = \\frac{\\overbrace{\\mathbb{P}(D|\\theta)}^{\\text{likelihood function}} \\overbrace{\\mathbb{P}(\\theta)}^{\\text{prior distribution}}}{\\int \\mathbb{P}(D|\\theta) \\mathbb{P}(\\theta) d \\theta}$$\n",
"the formula is quite similar to the *Bayes' theorem*, but instead of working with numerical value it uses models and pdfs. The Bayesian estimator tries to compute a distribution over the parameter space, called *posteriod pdf*, and denoted as $\\mathbb{P}(\\theta | D)$. This distribution represents how strongly we believe each parameter values is the one that generated our data, after taking into account both the observed data and the prior knowledge.\n",
"The formula is quite similar to the *Bayes' theorem*, but instead of working with numerical values it uses models and pdfs. The Bayesian estimator tries to compute a distribution over the parameter space, called *posteriod pdf*, and denoted as $\\mathbb{P}(\\theta | D)$. This distribution represents how strongly we believe each parameter values is the one that generated our data, after taking into account both the observed data and the prior knowledge.\n",
"\n",
"The bayesian estimations works the best if the priors of the networks *makes sense*.\n",
"The bayesian estimation works better if the priors of the networks *makes sense*.\n",
"\n",
"\n",
"Since we are working on a naïve network, the priors are probably not describing well how each feature affects each other, and we have proceeded using the **Maximum Likelihood Estimation**"
"Since we are working on a naïve network, the priors are probably not describing well how the features affect each other, and we have proceeded using the **Maximum Likelihood Estimation**"
]
},
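The cell above compares the two estimators in prose only. As a concrete illustration, here is a minimal sketch of how each might be invoked, assuming the notebook uses pgmpy for this workflow; the toy DataFrame and variable names below are placeholders, not the notebook's actual data.

```python
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator, BayesianEstimator

# Toy stand-in for the real dataset (illustrative columns, not the notebook's).
df = pd.DataFrame({
    "smoker": [0, 1, 1, 0, 1, 0],
    "cancer": [0, 1, 0, 0, 1, 0],
})

# MLE: CPD entries are the observed relative frequencies.
mle_model = BayesianNetwork([("smoker", "cancer")])
mle_model.fit(df, estimator=MaximumLikelihoodEstimator)
print(mle_model.get_cpds("cancer"))

# Bayesian estimation: frequencies are smoothed toward a prior
# (here a BDeu prior with an equivalent sample size of 10).
bayes_model = BayesianNetwork([("smoker", "cancer")])
bayes_model.fit(df, estimator=BayesianEstimator,
                prior_type="BDeu", equivalent_sample_size=10)
print(bayes_model.get_cpds("cancer"))
```

On a dataset this small the two CPD tables differ visibly; as the text notes, the gap shrinks as the data grows, because the likelihood term comes to dominate the prior.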
{
@@ -674,7 +674,7 @@
"source": [
"## 4.1 Considerations\n",
"<a class=\"anchor\" id=\"ch41\"></a>\n",
"With the Naïve Bayesian classifier we get an overall good `roc_auc` score and we mantain a good computational performace. Is it possibile to improve the network? Does it make sense to add others **causal** link in between features?\n",
"With the Naïve Bayesian classifier we get an overall good `roc_auc` score and we mantain a good computational performace. Is it possibile to improve the network? Does it make sense to add others **causal** link between features?\n",
"\n",
"---"
]
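One way to probe the question raised above is to augment the naïve structure with a single feature-to-feature edge and re-score it. The sketch below assumes pgmpy and scikit-learn; every column and variable name is illustrative, not taken from the notebook, and the binary-target indexing is an assumption.

```python
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
from sklearn.metrics import roc_auc_score

df = pd.DataFrame({  # toy stand-in for the real dataset
    "feat_a": [0, 1, 1, 0, 1, 0, 1, 0],
    "feat_b": [1, 1, 0, 0, 1, 0, 1, 1],
    "target": [0, 1, 1, 0, 1, 0, 1, 0],
})
train, test = df.iloc[:6], df.iloc[6:]

# Naïve structure (target -> every feature) plus one causal feature link.
model = BayesianNetwork([("target", "feat_a"), ("target", "feat_b"),
                         ("feat_a", "feat_b")])
model.fit(train, estimator=MaximumLikelihoodEstimator)

infer = VariableElimination(model)
probs = [infer.query(["target"], evidence=row.to_dict(),
                     show_progress=False).values[1]  # assumes P(target = 1)
         for _, row in test.drop(columns="target").iterrows()]
print(roc_auc_score(test["target"], probs))
```

Comparing this score against the plain naïve baseline (same split, no extra edge) gives a quick signal of whether a causal link earns its added CPD complexity.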