
Identifying Potential Cyber Bullying Tweets using a Hybrid Approach of Sentiment Analysis


reeya26/Identification-of-Cyber-Bullying-using-Hybrid-Approach


Identification of Cyber Bullying using Hybrid Approach in Sentiment Analysis

This project was part of my undergraduate thesis. The cause mattered to all of us because of the rising number of death threats and rape threats received by people voicing their opinions on social media. Many young people are also active online, and they may be more prone to depression and anxiety. Because free speech is also a democratic right, we did not try to flag every negative tweet. Instead, we tried to separate out tweets that are extremely negative and targeted at a specific person, so that they can be flagged for removal.

The research paper was presented and published in IEEE Xplore.

Description of all files

  • data_collection.py uses Tweepy, a Python client for the Twitter API, to extract tweets that match the filter entered by the user (e.g., timeline, geography).
  • data_preprocessing.py cleans the data and segments each tweet into text and emoticons.
  • knowledge_algo.py uses SentiWordNet, a lexical resource for opinion mining, to assign a polarity to each tweet. A separate database assigns polarity to the emoticons.
  • ml_algo.py uses a Bagging Classifier including Naive Bayes and a Linear Support Vector Machine to assign a polarity of either 0 or 1 to each tweet, thereby reaffirming the value given by SentiWordNet. The trained dataset is stored as a pickle file.
  • main.py calls all of the above modules to iterate over a series of tweets.
  • accuracy.py uses a confusion matrix to calculate the accuracy of the "With Nouns" version of the algorithm. The classification algorithm is 70.3% accurate.
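As an illustration of the preprocessing step, splitting a tweet into text and emoticons could look roughly like the following. This is a minimal sketch, not the code in data_preprocessing.py: the emoticon pattern, the cleaning rules (dropping URLs, stripping the `#` from hashtags, lowercasing), and the function name `segment_tweet` are all hypothetical.

```python
import re

# Hypothetical emoticon pattern covering a few common Western emoticons,
# e.g. :) :-( ;D :P :'( and <3. The real module may use a different list.
EMOTICON_RE = re.compile(r"(?:[:;=][-']?[()DPp\[\]/\\|]|<3)")
URL_RE = re.compile(r"https?://\S+")


def segment_tweet(tweet: str) -> tuple[str, list[str]]:
    """Split a raw tweet into cleaned text and a list of emoticons (sketch)."""
    # Remove URLs first so that "://" is not mistaken for the emoticon ":/".
    no_urls = URL_RE.sub(" ", tweet)
    # Collect emoticons before lowercasing, since ":D" and ":d" differ.
    emoticons = EMOTICON_RE.findall(no_urls)
    text = EMOTICON_RE.sub(" ", no_urls)
    # Keep hashtag words but drop the '#', then normalise whitespace and case.
    text = text.replace("#", "")
    text = re.sub(r"\s+", " ", text).strip().lower()
    return text, emoticons


if __name__ == "__main__":
    text, emoticons = segment_tweet("You are SO dumb :( #loser https://t.co/x")
    print(text, emoticons)
```

Extracting emoticons before cleaning keeps them available for the separate emoticon-polarity lookup, while the remaining text goes to the lexicon and the classifier.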

Architecture Diagram

Our framework consists of three main steps: (1) knowledge-based sentiment analysis; (2) machine-learning-based analysis, which reinforces the knowledge-based result; and (3) aggregation of the resulting polarity with the polarity obtained from the emoticons, which sorts the tweets into varying levels of positive and negative text.

[Figure: architecture diagram of the framework]
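The three-step flow can be sketched as a toy example. Several parts are assumptions rather than the repository's method. The tiny `LEXICON` dictionary stands in for SentiWordNet. The `EMOTICON_POLARITY` dictionary stands in for the separate emoticon database. The `ml_label` argument stands in for the 0/1 output of the classifier in ml_algo.py and is assumed here to mean 1 = positive and 0 = negative. The reinforcement and aggregation rules are also hypothetical.

```python
# Hypothetical word scores in [-1, 1], standing in for SentiWordNet
# (e.g. a word's positive score minus its negative score).
LEXICON = {"dumb": -0.6, "hate": -0.8, "ugly": -0.7, "great": 0.8, "love": 0.7}

# Hypothetical emoticon polarities, standing in for the emoticon database.
EMOTICON_POLARITY = {":(": -0.5, ":'(": -0.6, ":)": 0.5, ":-)": 0.5, "<3": 0.7}


def lexicon_score(text: str) -> float:
    """Step 1: average score of the words found in the lexicon (0.0 if none)."""
    hits = [LEXICON[w] for w in text.split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def hybrid_polarity(text: str, emoticons: list[str], ml_label: int) -> float:
    """Combine lexicon, classifier and emoticon evidence into one score (sketch).

    ml_label is assumed to be 1 for positive and 0 for negative.
    """
    score = lexicon_score(text)
    # Step 2 (assumed rule): keep the lexicon score when the classifier agrees
    # with its sign, otherwise halve it to reflect the disagreement.
    ml_sign = 1 if ml_label == 1 else -1
    if score * ml_sign < 0:
        score /= 2
    # Step 3 (assumed rule): equal-weight mean of text score and emoticon score.
    emo = [EMOTICON_POLARITY[e] for e in emoticons if e in EMOTICON_POLARITY]
    if emo:
        score = (score + sum(emo) / len(emo)) / 2
    return score


if __name__ == "__main__":
    print(hybrid_polarity("you are so dumb loser", [":("], ml_label=0))
```

Any monotone combination rule would serve the same purpose. The point of the sketch is only that the classifier acts as a check on the lexicon score, and the emoticons then shift the final polarity.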

Comparing Sentiments of the Knowledge-Based and Machine Learning Approaches

[Figure: comparison of sentiments from the knowledge-based and machine learning approaches]

The Resulting Confusion Matrix

[Figure: confusion matrix]
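For reference, accuracy is read off a confusion matrix as the sum of its diagonal (correct predictions) divided by the total number of samples. For binary labels this is (TP + TN) / (TP + TN + FP + FN). Below is a minimal standard-library sketch of that calculation. It is not the repository's accuracy.py, and the labels and sample data are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Return counts where counts[i][j] is the number of samples whose true
    label is labels[i] and whose predicted label is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return counts


def accuracy(matrix):
    """Accuracy = correctly classified samples (diagonal) / all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical gold labels and predictions for ten tweets.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
    m = confusion_matrix(y_true, y_pred)
    print(m, accuracy(m))
```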

Final Classification

The knowledge-based approach allowed us to obtain the "Degree of Negativity" of each tweet. Negative tweets were graded from Level 1 to Level 4. Cyber Bullying tweets, being more negative than the rest, were labelled "Level 1 Neg". The resulting classification, presented below, shows that 0.1% of tweets are Cyber Bullying tweets.

[Figure: final classification of tweets]
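The README does not give the score ranges for each level, so the following is only a sketch of how a degree-of-negativity score might be bucketed into the four negative levels, with "Level 1 Neg" as the most negative band. The equal-width thresholds, the names `negativity_level` and `bullying_share`, and the assumption that scores lie in [-1, 1] are all hypothetical.

```python
def negativity_level(score: float) -> str:
    """Map a polarity score in [-1, 1] to a label (thresholds are hypothetical).

    "Level 1 Neg" is the most negative band and is treated as potential
    cyber bullying.
    """
    if score >= 0:
        return "Non-negative"
    # Hypothetical equal-width bands over [-1, 0), most negative first.
    bands = [(-0.75, "Level 1 Neg"), (-0.5, "Level 2 Neg"), (-0.25, "Level 3 Neg")]
    for upper, label in bands:
        if score <= upper:
            return label
    return "Level 4 Neg"


def bullying_share(scores: list[float]) -> float:
    """Fraction of tweets that fall into the "Level 1 Neg" band."""
    labels = [negativity_level(s) for s in scores]
    return labels.count("Level 1 Neg") / len(labels) if labels else 0.0


if __name__ == "__main__":
    sample = [-0.9, -0.6, -0.3, -0.1, 0.2]
    print([negativity_level(s) for s in sample], bullying_share(sample))
```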
