Webpage: emoji.enricmor.eu
The tweets have been collected using the GetOldTweets-python fork that includes emoji support. The Python script bypasses some limitations of the official Twitter API, such as restricted access to old tweets and request limits.
The script used to download the tweets takes the following parameters:
python3 Exporter.py --lang "en" --querysearch "🍎" --since 2014-02-03 --until 2014-02-04 &
- lang "en": Filters the tweets by language. English is the language selected for this project.
- querysearch "🍎": The text or emoji to search for.
- since/until: The date range of the tweets. A one-day range has been used for this project.
The scraping speed is around 3.7 million tweets per hour when running the script in parallel. Specifically, one instance of the script has been used for each day and for each emoji.
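The "one instance per day and per emoji" scheme can be sketched as a small command generator. This is only an illustration: the helper name build_commands is hypothetical, and the flags are the ones from the command shown above.

```python
from datetime import date, timedelta


def build_commands(emojis, start, end, lang="en"):
    """Return one Exporter.py command per emoji and per day in [start, end).

    Hypothetical helper: it only builds argument lists, it does not run them.
    """
    commands = []
    day = start
    while day < end:
        next_day = day + timedelta(days=1)
        for emoji in emojis:
            commands.append([
                "python3", "Exporter.py",
                "--lang", lang,
                "--querysearch", emoji,
                "--since", day.isoformat(),
                "--until", next_day.isoformat(),
            ])
        day = next_day
    return commands


# Two emojis over two days give four independent commands.
commands = build_commands(["🍎", "🍊"], date(2014, 2, 3), date(2014, 2, 5))
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could then be handed to subprocess.Popen, which starts the process without waiting for it to finish, much like the trailing & in the shell command.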
In terms of accuracy, the scraper misses some tweets and misclassifies some tweets written in other languages as English. However, the extracted data still gives good insight into emoji frequency.
The data obtained has the following structure:
"username","date","retweets","favorites","text","geo","mentions","hashtags","id","permalink","emoji"
However, only the date and emoji columns are used for this project.
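As a rough sketch of how the date and emoji columns could be turned into daily counts, the snippet below reads rows with the header shown above. The sample rows are made up for illustration, and two things are assumptions: that the date column starts with a YYYY-MM-DD prefix, and that the emoji column holds a single emoji per row.

```python
import csv
import io
from collections import Counter


def daily_emoji_counts(csv_file):
    """Count rows per (day, emoji) pair, using only the date and emoji columns."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        day = row["date"][:10]  # assumption: the date starts with YYYY-MM-DD
        counts[(day, row["emoji"])] += 1
    return counts


# Made-up sample rows that follow the column layout of the scraped data.
sample = io.StringIO(
    '"username","date","retweets","favorites","text","geo","mentions","hashtags","id","permalink","emoji"\n'
    '"alice","2014-02-03 10:15","0","1","An apple a day 🍎","","","","1","https://example.invalid/1","🍎"\n'
    '"bob","2014-02-03 18:40","2","0","🍎 pie","","","","2","https://example.invalid/2","🍎"\n'
    '"carol","2014-02-04 09:05","0","0","Lunch 🍎","","","","3","https://example.invalid/3","🍎"\n'
)

for (day, emoji), count in sorted(daily_emoji_counts(sample).items()):
    print(day, emoji, count)
```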
The processed data consists of 2405 values with the daily usage of each emoji over the years. To render the data smoothly in the browser, the Largest-Triangle-Three-Buckets (LTTB) downsampling algorithm is applied to reduce each series to 50 data points. The downsampled data keeps the maxima and minima while keeping the points reasonably evenly spaced.
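For reference, a minimal Python sketch of LTTB follows. It is not the code used by the website, just one straightforward version of the published algorithm: the first and last points are always kept, the remaining points are split into equal-sized buckets, and from each bucket the point forming the largest triangle with the previously kept point and the average of the next bucket is selected. The function name lttb and the synthetic series are illustrative.

```python
def lttb(points, threshold):
    """Downsample a list of (x, y) points to `threshold` points with LTTB."""
    n = len(points)
    if threshold >= n:
        return list(points)
    if threshold < 3:
        raise ValueError("threshold must be at least 3")

    bucket_size = (n - 2) / (threshold - 2)
    sampled = [points[0]]  # the first point is always kept
    a = 0                  # index of the most recently selected point

    for i in range(threshold - 2):
        # Average of the next bucket (for the final bucket this is just the last point).
        avg_start = int((i + 1) * bucket_size) + 1
        avg_end = min(int((i + 2) * bucket_size) + 1, n)
        span = avg_end - avg_start
        avg_x = sum(p[0] for p in points[avg_start:avg_end]) / span
        avg_y = sum(p[1] for p in points[avg_start:avg_end]) / span

        # Pick the point in the current bucket with the largest triangle area.
        start = int(i * bucket_size) + 1
        end = int((i + 1) * bucket_size) + 1
        ax, ay = points[a]
        best, best_area = start, -1.0
        for j in range(start, end):
            px, py = points[j]
            area = abs((ax - avg_x) * (py - ay) - (ax - px) * (avg_y - ay)) / 2
            if area > best_area:
                best, best_area = j, area

        sampled.append(points[best])
        a = best

    sampled.append(points[-1])  # the last point is always kept
    return sampled


# Synthetic series: 2405 flat daily values with a single spike.
series = [(day, 0.0) for day in range(2405)]
series[1000] = (1000, 100.0)
reduced = lttb(series, 50)
print(len(reduced))
```

Because the spike forms a much larger triangle than its flat neighbours, it is the point chosen from its bucket, which is how LTTB preserves peaks that plain every-n-th sampling would likely drop.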
Chart.js has been used to visualize the data. Chart.js is a JavaScript library for creating highly customizable, interactive charts in the browser.
The following plugins have been used to customize the charts:
The following examples are used on the website:
- https://codepen.io/fielding/pen/wYPRjj
- https://codepen.io/chrisgannon/pen/yjzPEO
- https://codepen.io/knyttneve/pen/EBNqPN
- Total tweets: 3,015,922,953
- Dataset size: 798GB
- Tweets scraped per hour: 3.7 million (approx.)
There are some similar projects involving tweets and emojis that I used as a source of inspiration, especially Ribbonline by my friend Dani Balcells.
Other related projects are Emojitracker and Twitter Emoji Race.