Multinomial Naive Bayes Spam Message Identifier

Identifying and distinguishing spam messages using the multinomial Naive Bayes model.

What is a Naive Bayes classifier?

In statistics, naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence assumptions between the features. At the time of writing, scikit-learn provides five types of Naive Bayes classifiers, which are as follows:

1- Bernoulli Naive Bayes classifier

2- Categorical Naive Bayes classifier

3- Complement Naive Bayes classifier

4- Gaussian Naive Bayes classifier

5- Multinomial Naive Bayes classifier

In this repository, we use the multinomial Naive Bayes classifier to detect spam messages. We chose it because it is simple to implement, achieves high accuracy, and works directly on vectorized word counts. Other approaches can also be used to detect spam messages, such as the Complement Naive Bayes classifier or tf-idf features in place of raw word counts.

Let's learn more about the multinomial Naive Bayes classifier

MultinomialNB implements the naive Bayes algorithm for multinomially distributed data and is one of the two classic naive Bayes variants used in text classification (where the data are typically represented as word count vectors, although tf-idf vectors are also known to work well in practice). The distribution is parametrized by vectors $\theta_y = (\theta_{y1}, \ldots, \theta_{yn})$ for each class $y$, where $n$ is the number of features (in text classification, the size of the vocabulary) and $\theta_{yi}$ is the probability $P(x_i \mid y)$ of feature $i$ appearing in a sample belonging to class $y$.

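The parameters $\theta_y$ are estimated by a smoothed version of maximum likelihood, i.e. relative frequency counting:

$$\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n}$$

where $N_{yi} = \sum_{x \in T} x_i$ is the number of times feature $i$ appears in samples of class $y$ in the training set $T$, $N_y = \sum_{i=1}^{n} N_{yi}$ is the total count of all features for class $y$, and $\alpha \ge 0$ is the smoothing parameter that prevents zero probabilities for features that never occur in the training samples of a class ($\alpha = 1$ is called Laplace smoothing).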
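To make the formula concrete, here is a minimal pure-Python sketch of the smoothed estimate on a tiny made-up corpus. The function name `estimate_theta` and the toy messages are illustrative assumptions, not code from the notebook; in practice scikit-learn's MultinomialNB performs this estimation internally.

```python
from collections import Counter


def estimate_theta(docs, labels, vocabulary, alpha=1.0):
    """Return {class: {word: smoothed estimate of P(word | class)}}."""
    vocab = sorted(set(vocabulary))
    n = len(vocab)  # n: number of features (vocabulary size)
    counts = {y: Counter() for y in set(labels)}
    for tokens, y in zip(docs, labels):
        # Accumulate N_yi: how often each word occurs in class y.
        counts[y].update(t for t in tokens if t in vocab)
    theta = {}
    for y, per_word in counts.items():
        n_y = sum(per_word.values())  # N_y: total word count for class y
        theta[y] = {w: (per_word[w] + alpha) / (n_y + alpha * n) for w in vocab}
    return theta


# Toy, hand-written data (hypothetical, for illustration only).
docs = [
    ["win", "cash", "now"],
    ["win", "prize"],
    ["see", "you", "now"],
    ["call", "you", "later"],
]
labels = ["spam", "spam", "ham", "ham"]
vocabulary = {t for tokens in docs for t in tokens}  # 8 distinct words

theta = estimate_theta(docs, labels, vocabulary, alpha=1.0)
print(theta["spam"]["win"])  # (2 + 1) / (5 + 8)
print(theta["ham"]["win"])   # (0 + 1) / (6 + 8)
```

Note that "win" never appears in a ham message, yet its estimated probability under ham is non-zero. That is the effect of the smoothing term.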

Dataset used

We used the SMS Spam Collection dataset to train the model. It can be accessed via the link below: https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
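As a rough sketch of how the raw data can be read, the snippet below parses lines of the form `label<TAB>message`. It assumes the downloaded file is plain text with one message per line and the label (`ham` or `spam`) before the first tab; the helper name `parse_sms_lines` and the sample lines are made up for illustration.

```python
def parse_sms_lines(lines):
    """Split 'label<TAB>message' lines into two parallel lists."""
    labels, messages = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside a message are kept.
        label, _, text = line.partition("\t")
        labels.append(label)
        messages.append(text)
    return labels, messages


# Hypothetical sample lines standing in for the real file contents.
sample = [
    "ham\tSee you at 5 then\n",
    "spam\tWIN a free prize now, reply YES\n",
]
labels, messages = parse_sms_lines(sample)
print(labels)
print(messages)
```

With the real file, the same function can be given an open file handle, for example `with open("SMSSpamCollection", encoding="utf-8") as f: labels, messages = parse_sms_lines(f)`, where the file name is an assumption about how the download was saved.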

Results of the trained model

The trained multinomial Naive Bayes model achieved the following scores:

Accuracy: 99.01 %
Precision: 97.89 %
Recall: 94.56 %
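For reference, the three metrics are simple ratios of the true/false positive and negative counts, with spam treated as the positive class. The sketch below uses hypothetical counts, chosen only because they reproduce the percentages reported above; they are not taken from the notebook.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0, otherwise precision or recall
    is undefined.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of messages flagged as spam, how many were spam
    recall = tp / (tp + fn)     # of actual spam messages, how many were flagged
    return accuracy, precision, recall


# Hypothetical counts (assumption), consistent with the reported scores.
accuracy, precision, recall = classification_metrics(tp=139, fp=3, fn=8, tn=965)
print(f"accuracy={accuracy:.2%} precision={precision:.2%} recall={recall:.2%}")
```

High precision with somewhat lower recall means the model rarely flags a legitimate message as spam but lets some spam through.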

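We can use the confusion matrix to observe the performance of our model in more detail, since it shows how many messages of each class were classified correctly and incorrectly.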
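As an illustration of what a confusion matrix contains, the following sketch counts the four possible outcomes from lists of true and predicted labels. The function name `confusion_counts` and the example labels are assumptions made for this sketch; scikit-learn offers `sklearn.metrics.confusion_matrix` for the same purpose.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives for a binary problem."""
    pairs = Counter(
        (truth == positive, pred == positive) for truth, pred in zip(y_true, y_pred)
    )
    return {
        "tp": pairs[(True, True)],    # spam predicted as spam
        "fn": pairs[(True, False)],   # spam predicted as ham
        "fp": pairs[(False, True)],   # ham predicted as spam
        "tn": pairs[(False, False)],  # ham predicted as ham
    }


# Made-up labels, for illustration only.
y_true = ["spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "ham", "ham"]
counts = confusion_counts(y_true, y_pred)
print(counts)
```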

Steps

Import libraries
Upload dataset
Create the data frame
Split the data
Vectorize the data
Train & predict
Calculate accuracy, precision, and recall
Calculate the confusion matrix
Test the model with a new SMS/email message
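The steps above can be sketched end to end with scikit-learn as follows. This is a minimal outline rather than the notebook's actual code: the eight toy messages stand in for the real dataset, and the split ratio, `random_state`, and variable names are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in data (hypothetical); the real project loads the SMS dataset.
messages = [
    "win a free prize now",
    "free cash reply now to win",
    "claim your free prize today",
    "urgent win cash call now",
    "see you at lunch",
    "can you call me later",
    "are we still meeting today",
    "thanks see you tomorrow",
]
labels = ["spam"] * 4 + ["ham"] * 4

# Split the data, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, random_state=0, stratify=labels
)

# Vectorize: learn the vocabulary on the training set only.
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
X_test_counts = vectorizer.transform(X_test)

# Train and predict.
model = MultinomialNB(alpha=1.0)
model.fit(X_train_counts, y_train)
y_pred = model.predict(X_test_counts)

# Evaluate, treating "spam" as the positive class.
print("accuracy:", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, pos_label="spam", zero_division=0))
print("recall:", recall_score(y_test, y_pred, pos_label="spam", zero_division=0))
matrix = confusion_matrix(y_test, y_pred, labels=["ham", "spam"])
print(matrix)

# Test the model with a new message.
new_message = ["Win a free prize now"]
prediction = model.predict(vectorizer.transform(new_message))[0]
print("prediction:", prediction)
```

With so few messages the scores are not meaningful; the point is only the order of operations, in particular fitting the vectorizer on the training split and reusing it for the test split and for new messages.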

More information is available in the Jupyter Notebook file.
