Identification of Cyber Bullying using Hybrid Approach in Sentiment Analysis

This project was part of my undergraduate thesis. The cause was important to all of us because of the rising number of death threats and rape threats received by people voicing their opinions on social media. Many young people are also active online, and they may be more prone to depression and anxiety. Because free speech is also a democratic right, we aimed to separate out only the tweets that are extremely negative and targeted at a specific person, so that those tweets can be flagged for removal.

The research paper was presented and published in IEEE Xplore.

Description of all functions

data_collection.py uses the Tweepy API to extract tweets matching the filter entered by the user (e.g., timeline or geography).
data_preprocessing.py cleans the data and segments it into text and emoticons.
knowledge_algo.py uses SentiWordNet, a lexicon for opinion mining, to assign a polarity to each tweet. A separate database assigns polarity to the emoticons.
ml_algo.py uses a Bagging Classifier combining Naive Bayes and a Linear Support Vector Machine to assign a polarity of either 0 or 1 to each tweet, thereby reaffirming the value given by SentiWordNet. The trained model is stored as a pickle file.
main.py calls all of these functions to iterate over a series of tweets.
accuracy.py uses a confusion matrix to calculate the accuracy of the "With Nouns" version of the algorithm. The classification algorithm is 70.3% accurate.
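The preprocessing step in data_preprocessing.py can be pictured with a minimal sketch like the one below. It is not the project's code: the regular expressions, the function name `preprocess`, and the small set of ASCII emoticons it recognises are all assumptions chosen for illustration. URLs are stripped first so that the ":/" in "https://" is not mistaken for an emoticon.

```python
import re

# Hypothetical patterns; the project's actual cleaning rules may differ.
URL_RE = re.compile(r"https?://\S+")
# A few common ASCII emoticons only, e.g. :) :( :-D ;P =/
EMOTICON_RE = re.compile(r"[:;=]-?[)(DP/]")


def preprocess(tweet):
    """Split one raw tweet into (clean_text, emoticons). Illustrative sketch."""
    # Remove URLs before looking for emoticons ("https://" contains ":/").
    text = URL_RE.sub(" ", tweet)
    # Pull the emoticons out so they can be scored separately.
    emoticons = EMOTICON_RE.findall(text)
    text = EMOTICON_RE.sub(" ", text)
    # Keep hashtag words but drop the '#' symbol; mentions are kept because
    # they can indicate that a tweet targets a specific person.
    text = text.replace("#", "")
    # Collapse whitespace and lowercase for lexicon lookup.
    text = re.sub(r"\s+", " ", text).strip().lower()
    return text, emoticons
```

For example, `preprocess("@bob you are awful :( https://t.co/x1 #sad")` yields the text `"@bob you are awful sad"` and the emoticon list `[":("]`.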
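The lexicon-based scoring in knowledge_algo.py follows the usual SentiWordNet idea: each term carries a positive and a negative score, and a tweet's polarity comes from those scores. The sketch below assumes a tiny hypothetical in-memory lexicon (the numbers are made up, not real SentiWordNet values) and a hypothetical emoticon table standing in for the project's separate emoticon database. In practice, SentiWordNet can be loaded through NLTK's `sentiwordnet` corpus reader, whose `senti_synsets` entries expose `pos_score()` and `neg_score()`.

```python
# Hypothetical stand-in for SentiWordNet: word -> (positive score, negative score).
# Real SentiWordNet scores are per synset; one entry per word keeps this short.
LEXICON = {
    "awful": (0.0, 0.875),
    "hate": (0.0, 0.75),
    "good": (0.75, 0.0),
    "love": (0.625, 0.0),
}

# Hypothetical emoticon polarity table (the project uses a separate database).
EMOTICON_POLARITY = {":)": 1.0, ":D": 1.0, ":(": -1.0, ":/": -0.5}


def lexicon_polarity(text):
    """Sum (pos - neg) over the words of a cleaned tweet; unknown words score 0."""
    return sum(pos - neg for pos, neg in
               (LEXICON.get(word, (0.0, 0.0)) for word in text.split()))


def emoticon_polarity(emoticons):
    """Sum the polarities of the extracted emoticons; unknown ones score 0."""
    return sum(EMOTICON_POLARITY.get(e, 0.0) for e in emoticons)
```

With these assumed values, `lexicon_polarity("you are awful")` is -0.875, and the emoticons `[":(", ":)"]` cancel out to 0.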
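For a binary classifier, accuracy read off a confusion matrix is (TP + TN) / (TP + TN + FP + FN). The sketch below shows that calculation in plain Python; the function names and the convention that label 1 is the positive class are assumptions, and the toy labels do not reproduce the project's 70.3% figure.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels, treating 1 as the positive class."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "tp": counts[(1, 1)],  # predicted 1, actually 1
        "tn": counts[(0, 0)],  # predicted 0, actually 0
        "fp": counts[(0, 1)],  # predicted 1, actually 0
        "fn": counts[(1, 0)],  # predicted 0, actually 1
    }


def accuracy(cm):
    """Fraction of predictions on the matrix diagonal (correct predictions)."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0
```

With `y_true = [1, 0, 1, 1, 0]` and `y_pred = [1, 0, 0, 1, 1]`, the matrix has TP = 2, TN = 1, FP = 1, FN = 1, so the accuracy is 3 / 5 = 0.6.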

Architecture Diagram

Our framework consists of three main steps. First, knowledge-based sentiment analysis assigns a polarity to each tweet. Second, machine-learning-based analysis reinforces that result. Third, the resulting polarity is aggregated with the polarity obtained from the emoticons, which sorts the tweets into varying levels of positive and negative text.
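One way to picture how the three stages could be combined is the hypothetical rule below. It is an illustrative assumption, not the rule from the paper: the lexicon score is kept only when the classifier's 0/1 label agrees with its sign (label 1 is assumed to mean positive), disagreement is treated as neutral, and the emoticon polarity is added with an assumed weight of 0.5.

```python
def combine(lexicon_score, ml_label, emoticon_score, emoticon_weight=0.5):
    """Hypothetical aggregation of the three pipeline stages.

    lexicon_score:  polarity from the knowledge-based step (negative = negative text)
    ml_label:       classifier output, assumed 1 = positive, 0 = negative
    emoticon_score: summed emoticon polarity
    emoticon_weight: assumed weight for the emoticon contribution
    """
    ml_sign = 1 if ml_label == 1 else -1
    lex_sign = (lexicon_score > 0) - (lexicon_score < 0)  # -1, 0 or 1
    # Reinforcement: keep the lexicon score only if the classifier agrees;
    # otherwise treat the text polarity as uncertain (neutral).
    text_score = lexicon_score if lex_sign == ml_sign else 0.0
    return text_score + emoticon_weight * emoticon_score
```

Under these assumptions, a tweet with lexicon score -0.875, classifier label 0 and emoticon score -1.0 combines to -1.375, while the same lexicon score with classifier label 1 is neutralised to 0.0. Other designs (for example, a weighted vote) would fit the same three-step outline equally well.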

Comparing Sentiments from the Knowledge-Based and Machine Learning Approaches

The Resulting Confusion Matrix

Final Classification

Using the knowledge-based approach allowed us to obtain the "Degree of Negativity" of each tweet. Negative tweets were graded from Level 1 to Level 4. Cyber Bullying tweets were more negative than the rest and were labelled "Level 1 Neg" tweets. The resulting classification shows that 0.1% of tweets are Cyber Bullying tweets.
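Grading negative tweets into Levels 1 to 4 amounts to binning a negativity score. The sketch below assumes hypothetical thresholds (the paper's actual cut-offs are not given here) and treats Level 1 as the most negative level, which is the one flagged as Cyber Bullying.

```python
# Hypothetical upper bounds on the score for each level, most negative first.
NEG_THRESHOLDS = [(-1.5, 1), (-1.0, 2), (-0.5, 3)]


def negativity_level(score):
    """Return 1-4 for a negative score (1 = most negative), or None if not negative."""
    if score >= 0:
        return None
    for upper, level in NEG_THRESHOLDS:
        if score <= upper:
            return level
    return 4  # mildly negative: between -0.5 and 0


def is_cyber_bullying(score):
    """Flag only the most negative level, mirroring the 'Level 1 Neg' label."""
    return negativity_level(score) == 1
```

With these assumed thresholds, a score of -2.0 falls in Level 1 and would be flagged, -1.2 falls in Level 2, -0.2 falls in Level 4, and a positive score returns `None`.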

reeya26/Identification-of-Cyber-Bullying-using-Hybrid-Approach

Folders and files

Latest commit

History

Repository files navigation

Identification of Cyber Bullying using Hybrid Approach in Sentiment Analysis

Description of all functions

Architecture Diagram

Final Classfication

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages