Commit 970ddff
typo correction ch 4
ElektroDuck committed Mar 8, 2024
1 parent aaf6883 commit 970ddff
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions notebook.ipynb
@@ -587,25 +587,25 @@
"id": "f3449fcd",
"metadata": {},
"source": [
"Once the structure of the network is built, the next step is to let it learn the parameters (_CPDs_) from the dataset. The choice is [between two algorithms](https://towardsdatascience.com/maximum-likelihood-vs-bayesian-estimation-dd2eb4dfda8a):\n",
"Once the structure of the network is built, the next step is letting it learn the parameters (_CPDs_) from the dataset. The choice is [between two algorithms](https://towardsdatascience.com/maximum-likelihood-vs-bayesian-estimation-dd2eb4dfda8a):\n",
"\n",
"* **Maximum Likelihood Estimation**: Given the likelihood function $\\mathcal{L}(\\theta | D) = f(D | \\theta) = \\prod{}_{i=1}^{N}f(x_i | \\theta)$, the MLE algorithm tries to fit the parameter $\\theta$ that maximize the likelihood function:\n",
"$$ \\hat{\\theta} = \\argmax_\\theta \\prod_{i=1}^N f(x_i | \\theta) = \\frac{\\partial}{\\partial \\theta} \\prod_{i=1}^N f(x_i|\\theta)$$ \n",
"Since computing the derivatives of many products can get really complex, in the real world it is used the _log likelihood function_. This is done because the logs of productus is the sum of the logs, and that's simplify the computation, and also the $\\argmax$ of a function doens't change if we are applying the log, since it is strictly monotonic and scaled. So the real formula that is computed is\n",
"Since computing the derivatives of many products can get really complex, in the real world scenarios the _log likelihood function_ is used. This is done because the logs of products is the sum of the logs, and that simplifies the computation, and also the $\\argmax$ of a function doens't change if we are applying the log, since it is strictly monotonic and scaled. So the real computed formula is:\n",
"$$\\text{log likelihood} : \\int(\\theta) = \\ln \\prod_{i=1}^Nf(x_i | \\theta) = \\sum_{i=1}^N \\ln f(x_i | \\theta)$$\n",
"Therefore,\n",
"Therefore:\n",
"$$ \\hat{\\theta} = \\argmax_\\theta \\int (\\theta)$$\n",
"\n",
"This algorithm works the best if the dataset is big, since its outcome solely depends on the _observed_ data. It is also adviced when there is uncertainty about the prior.\n",
"\n",
"* **Bayesian Estimation**: The equation used for Bayesian estimation is the following:\n",
"$$ \\overbrace{\\mathbb{P}(\\theta | D)}^\\text{posterior distribution} = \\frac{\\overbrace{\\mathbb{P}(D|\\theta)}^{\\text{likelihood function}} \\overbrace{\\mathbb{P}(\\theta)}^{\\text{prior distribution}}}{\\int \\mathbb{P}(D|\\theta) \\mathbb{P}(\\theta) d \\theta}$$\n",
"the formula is quite similar to the *Bayes' theorem*, but instead of working with numerical value it uses models and pdfs. The Bayesian estimator tries to compute a distribution over the parameter space, called *posteriod pdf*, and denoted as $\\mathbb{P}(\\theta | D)$. This distribution represents how strongly we believe each parameter values is the one that generated our data, after taking into account both the observed data and the prior knowledge.\n",
"The formula is quite similar to the *Bayes' theorem*, but instead of working with numerical values it uses models and pdfs. The Bayesian estimator tries to compute a distribution over the parameter space, called *posteriod pdf*, and denoted as $\\mathbb{P}(\\theta | D)$. This distribution represents how strongly we believe each parameter values is the one that generated our data, after taking into account both the observed data and the prior knowledge.\n",
"\n",
"The bayesian estimations works the best if the priors of the networks *makes sense*.\n",
"The bayesian estimation works better if the priors of the networks *makes sense*.\n",
"\n",
"\n",
"Since we are working on a naïve network, the priors are probably not describing well how each feature affects each other, and we have proceeded using the **Maximum Likelihood Estimation**"
"Since we are working on a naïve network, the priors are probably not describing well how the features affect each other, and we have proceeded using the **Maximum Likelihood Estimation**"
]
},
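The cell above compares the two estimators in prose only. As a concrete illustration, here is a minimal sketch of how each might be invoked, assuming the notebook uses pgmpy for this workflow; the toy DataFrame and variable names below are placeholders, not the notebook's actual data.

```python
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator, BayesianEstimator

# Toy stand-in for the real dataset (illustrative columns, not the notebook's).
df = pd.DataFrame({
    "smoker": [0, 1, 1, 0, 1, 0],
    "cancer": [0, 1, 0, 0, 1, 0],
})

# MLE: CPD entries are the observed relative frequencies.
mle_model = BayesianNetwork([("smoker", "cancer")])
mle_model.fit(df, estimator=MaximumLikelihoodEstimator)
print(mle_model.get_cpds("cancer"))

# Bayesian estimation: frequencies are smoothed toward a prior
# (here a BDeu prior with an equivalent sample size of 10).
bayes_model = BayesianNetwork([("smoker", "cancer")])
bayes_model.fit(df, estimator=BayesianEstimator,
                prior_type="BDeu", equivalent_sample_size=10)
print(bayes_model.get_cpds("cancer"))
```

On a dataset this small the two CPD tables differ visibly; as the text notes, the gap shrinks as the data grows, because the likelihood term comes to dominate the prior.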
{
@@ -674,7 +674,7 @@
"source": [
"## 4.1 Considerations\n",
"<a class=\"anchor\" id=\"ch41\"></a>\n",
"With the Naïve Bayesian classifier we get an overall good `roc_auc` score and we mantain a good computational performace. Is it possibile to improve the network? Does it make sense to add others **causal** link in between features?\n",
"With the Naïve Bayesian classifier we get an overall good `roc_auc` score and we mantain a good computational performace. Is it possibile to improve the network? Does it make sense to add others **causal** link between features?\n",
"\n",
"---"
]
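One way to probe the question raised above is to augment the naïve structure with a single feature-to-feature edge and re-score it. The sketch below assumes pgmpy and scikit-learn; every column and variable name is illustrative, not taken from the notebook, and the binary-target indexing is an assumption.

```python
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
from sklearn.metrics import roc_auc_score

df = pd.DataFrame({  # toy stand-in for the real dataset
    "feat_a": [0, 1, 1, 0, 1, 0, 1, 0],
    "feat_b": [1, 1, 0, 0, 1, 0, 1, 1],
    "target": [0, 1, 1, 0, 1, 0, 1, 0],
})
train, test = df.iloc[:6], df.iloc[6:]

# Naïve structure (target -> every feature) plus one causal feature link.
model = BayesianNetwork([("target", "feat_a"), ("target", "feat_b"),
                         ("feat_a", "feat_b")])
model.fit(train, estimator=MaximumLikelihoodEstimator)

infer = VariableElimination(model)
probs = [infer.query(["target"], evidence=row.to_dict(),
                     show_progress=False).values[1]  # assumes P(target = 1)
         for _, row in test.drop(columns="target").iterrows()]
print(roc_auc_score(test["target"], probs))
```

Comparing this score against the plain naïve baseline (same split, no extra edge) gives a quick signal of whether a causal link earns its added CPD complexity.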