all-caps: the number of words with all characters in upper case;
clusters: presence/absence of tokens from each of the 1000 clusters (provided by Carnegie Mellon University's Twitter NLP tool);
elongated words: the number of words with one character repeated more than 2 times, e.g. 'soooo';
emoticons:
presence/absence of positive and negative emoticons at any position in the tweet;
whether the last token is a positive or negative emoticon;
hashtags: the number of hashtags;
negation: the number of negated contexts. A negated context also affects the ngram and lexicon features: each word in a negated context, and the polarity associated with it, becomes negated (e.g., 'not perfect' becomes 'not perfect_NEG', 'POLARITY_positive' becomes 'POLARITY_positive_NEG');
POS: the number of occurrences for each part-of-speech tag;
punctuation:
the number of contiguous sequences of exclamation marks, question marks, and both exclamation and question marks;
whether the last token contains an exclamation or question mark;
sentiment lexicons: automatically created lexicons (NRC Hashtag Sentiment Lexicon, Sentiment140 Lexicon) and manually created lexicons (NRC Emotion Lexicon, MPQA, Bing Liu Lexicon). For each lexicon and each polarity we calculated:
the total count of tokens in the tweet with a score greater than 0;
the sum of the scores for all tokens in the tweet;
the maximal score;
the non-zero score of the last token in the tweet;
The lexicon features were created for all tokens in the tweet, for each part-of-speech tag, for hashtags, and for all-caps tokens.
word ngrams;
character ngrams.
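
The simpler count-based features in the list (all-caps, elongated words, hashtags, punctuation) could be computed roughly as follows. This is only a minimal sketch: the function name `surface_features`, the regular expressions, the rule that an all-caps word has at least two letters, and the choice to treat a single `!` or `?` as a sequence of length one are all assumptions, since the list above does not pin down tokenization or exact patterns.

```python
import re

# Assumed patterns; the feature list does not specify exact regexes.
ELONGATED = re.compile(r"([A-Za-z])\1{2,}")  # same letter 3+ times in a row
PUNCT_RUN = re.compile(r"[!?]+")             # contiguous run of ! and/or ?


def surface_features(tokens):
    """Count simple surface-form features over an already tokenized tweet."""
    runs = [m.group() for tok in tokens for m in PUNCT_RUN.finditer(tok)]
    last = tokens[-1] if tokens else ""
    return {
        # Assumption: one-letter words such as "I" are not counted as all-caps.
        "all_caps": sum(1 for t in tokens
                        if len(t) > 1 and t.isalpha() and t.isupper()),
        "elongated": sum(1 for t in tokens if ELONGATED.search(t)),
        "hashtags": sum(1 for t in tokens
                        if t.startswith("#") and len(t) > 1),
        "excl_runs": sum(1 for r in runs if set(r) == {"!"}),
        "quest_runs": sum(1 for r in runs if set(r) == {"?"}),
        "mixed_runs": sum(1 for r in runs if set(r) == {"!", "?"}),
        "last_has_excl_or_quest": int("!" in last or "?" in last),
    }


features = surface_features(
    ["I", "am", "SOOOO", "happy", "!!!", "#win", "really", "?!"]
)
```

The input is assumed to be tokenized already, for example by a Twitter-aware tokenizer; splitting on whitespace alone would leave punctuation attached to words and change the counts.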
See: NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets
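
To make the negation and lexicon features from the list concrete, here is a hedged sketch. The cue list, the rule that a negated context runs from a negation word to the next punctuation token, the helper names `mark_negation` and `lexicon_features`, and the toy lexicon are all assumptions for illustration; none of them is taken from the list itself. The last-token feature is read here as the last token's score, which is 0 when the token is not in the lexicon.

```python
import re

# Assumed cue words and context-ending punctuation (illustrative only).
NEGATION_CUES = {"no", "not", "never", "cannot"}
CLAUSE_END = re.compile(r"^[,.:;!?]+$")


def mark_negation(tokens):
    """Append _NEG to tokens inside a negated context.

    Returns the marked tokens and the number of negated contexts.
    """
    marked, in_context, n_contexts = [], False, 0
    for tok in tokens:
        low = tok.lower()
        if CLAUSE_END.match(tok):
            in_context = False          # punctuation closes the context
            marked.append(tok)
        elif low in NEGATION_CUES or low.endswith("n't"):
            if not in_context:
                n_contexts += 1         # a new negated context starts here
            in_context = True
            marked.append(tok)          # the cue itself stays unmarked
        else:
            marked.append(tok + "_NEG" if in_context else tok)
    return marked, n_contexts


def lexicon_features(tokens, lexicon):
    """Four summary features for one lexicon and one polarity."""
    scores = [lexicon.get(t.lower(), 0.0) for t in tokens]
    return {
        "count_positive": sum(1 for s in scores if s > 0),
        "total": sum(scores),
        "max": max(scores, default=0.0),
        "last": scores[-1] if scores else 0.0,
    }


marked, n_contexts = mark_negation(
    ["this", "is", "not", "perfect", ",", "but", "fine"]
)
toy_lexicon = {"good": 1.5, "great": 0.5}  # hypothetical scores
feats = lexicon_features(["good", "great", "day"], toy_lexicon)
```

In a full pipeline, `lexicon_features` would be called once per lexicon and polarity, and again on the token subsets named in the list (per part-of-speech tag, hashtags, all-caps tokens).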