Hey guys. So I was playing around with the Twitter API to stream-listen to random tweets, and I thought of taking it further by stream-listening to tweets about the 2019 presidential election in Nigeria, with a focus on ‘Buhari’ and ‘Atiku’. Thereafter, I carried out sentiment analysis and basic exploratory data analysis.
I will keep things really simple by just explaining the whole workflow and my result (Hey! It is basic!!!).
So before I start with what I did, I would like to mention that this is a flawed analysis, as I stream-listened just 3 times within 2 weeks, for a total of 4 hours. Also, the majority of Nigerian voters are not on Twitter, so an analysis using tweets is not representative of the Nigerian population. Lastly, some tweets contained both names, ‘Buhari’ and ‘Atiku’, making them difficult to classify.
The first thing that most likely comes to mind is ‘How is this possible?’. Anyway, thanks to Twitter for approving my developer account after much scrutiny, and giving me limited access to stream tweets from Twitter’s API (Application Programming Interface) using the Python programming language.
Below are the Python modules I used in achieving everything I did:
- Tweepy: Has the StreamListener class for real-time stream-listening.
- TextBlob: Has the TextBlob class to carry out sentiment analysis based on subjectivity and polarity (i.e. how bad or good the tweet is: 1 means totally positive, -1 means totally negative and 0 means neutral).
- Dataset: Used for saving the tweets to an SQLite database, as SQLite is supported in Python without third-party installation(s).
- Datafreeze: Has the freeze function for converting the database contents into comma-separated values saved in a CSV file.
- Pandas: Let us just call it the Microsoft Excel of Python, for data wrangling and basic analysis.
- Matplotlib: Has the pyplot module used for data visualization.
- Seaborn: A powerful easy-to-use data visualization module built on Matplotlib.
The stream-listening was done with the help of tweepy’s StreamListener class. This allowed the real-time streaming of tweets containing words such as ‘buhari’, ‘pmb’, ‘atiku’, and ‘abubakar’. You can use other terms such as 2019presidentialdebate, etc.
Then, the dataset module (hey, it is more than this, but…) inserted some basic information about each tweet into the SQLite database in real time, together with the subjectivity and polarity of the tweet as computed by the TextBlob class from the textblob module. After that, the datafreeze module converted the contents of the database into comma-separated values with the help of its freeze function.
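The save-then-export step can be sketched as follows. Note that this sketch deliberately swaps in Python's built-in `sqlite3` and `csv` modules instead of dataset and datafreeze, so it runs without extra installs. The table name `tweets`, the three columns, the helper names and the two rows are all assumptions made up for illustration.

```python
import csv
import os
import sqlite3
import tempfile


def save_tweet(conn, text, polarity, subjectivity):
    """Insert one scored tweet into the (hypothetical) tweets table."""
    conn.execute(
        "INSERT INTO tweets (text, polarity, subjectivity) VALUES (?, ?, ?)",
        (text, polarity, subjectivity),
    )
    conn.commit()


def export_csv(conn, path):
    """Dump every row of the tweets table to a CSV file with a header row."""
    cursor = conn.execute("SELECT text, polarity, subjectivity FROM tweets")
    header = [column[0] for column in cursor.description]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(cursor)


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (text TEXT, polarity REAL, subjectivity REAL)")

# Made-up rows standing in for tweets arriving from the stream.
save_tweet(conn, "made-up tweet about buhari", 0.1, 0.4)
save_tweet(conn, "made-up tweet about atiku", -0.2, 0.6)

csv_path = os.path.join(tempfile.gettempdir(), "tweets_sketch.csv")
export_csv(conn, csv_path)
print("wrote", csv_path)
```

In a real run, `save_tweet` would be called once per incoming tweet, and the export would happen after the stream is stopped.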
BASIC EXPLORATORY DATA ANALYSIS:
Oh boy! Sorry, the article is this long but this is the fun part!!!
So the CSV file, containing information such as the text, the user age, user location, subjectivity, polarity, etc., was read into a pandas DataFrame, and a new column containing the name of the candidate mentioned in each tweet was created from the text column, as shown below.
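For anyone who wants to try this, here is a minimal sketch of one way such a candidate column could be derived. The function name `label_candidate`, the ‘Both’ and ‘Neither’ labels and the sample tweets are assumptions for illustration; the search terms are the ones used for streaming.

```python
import pandas as pd

# Search terms taken from the streaming step.
BUHARI_TERMS = ("buhari", "pmb")
ATIKU_TERMS = ("atiku", "abubakar")


def label_candidate(text):
    """Label a tweet by the candidate(s) it mentions (case-insensitive)."""
    lowered = text.lower()
    has_buhari = any(term in lowered for term in BUHARI_TERMS)
    has_atiku = any(term in lowered for term in ATIKU_TERMS)
    if has_buhari and has_atiku:
        return "Both"  # assumed label for the hard-to-classify tweets
    if has_buhari:
        return "Buhari"
    if has_atiku:
        return "Atiku"
    return "Neither"


# Made-up tweets, used only to exercise each branch.
df = pd.DataFrame(
    {
        "text": [
            "PMB addressed the nation today",
            "Atiku Abubakar campaigns in Kano",
            "Buhari vs Atiku: who wins?",
            "Traffic in Lagos is terrible",
        ]
    }
)

df["candidate"] = df["text"].apply(label_candidate)
print(df)
```

Because this is plain substring matching, a short term like ‘pmb’ can also match inside unrelated words, so a stricter word-boundary match may be safer on real data.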
I was interested in knowing who had the highest number of mentions in tweets between Buhari and Atiku, so I checked this using value_counts() on the candidate column and plotted the result on a bar chart, as shown in the result below.
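A minimal sketch of the counting step, using a made-up candidate column (the labels and their frequencies below are illustrative only, not my real counts):

```python
import pandas as pd

# Made-up labels standing in for the real candidate column.
candidates = pd.Series(
    ["Buhari", "Atiku", "Buhari", "Both", "Buhari", "Atiku"],
    name="candidate",
)

# value_counts() tallies each label and sorts from most to least frequent.
counts = candidates.value_counts()
print(counts)
```

With Matplotlib installed, calling `counts.plot(kind="bar")` on the result draws the bar chart.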
President Buhari had the highest number of tweets mentioning his name. This could be bad or good, as we will later discover.
I also checked the locations of the users who tweeted the most about President Buhari and Atiku Abubakar, and most resided in Lagos, Nigeria, as expected.
I was prompted to check how long ago the accounts that tweeted about either Buhari or Atiku had joined Twitter. This was to check for fraudulent accounts created just for the election, to make or mar a particular candidate. I got the result below.
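Here is a minimal sketch of how account age could be computed with pandas. The column name `user_created`, the reference date, the one-year threshold and the three rows are all assumptions made up for illustration.

```python
import pandas as pd

# Made-up rows; `user_created` is a hypothetical column holding the
# date each account joined Twitter.
df = pd.DataFrame(
    {
        "candidate": ["Buhari", "Atiku", "Buhari"],
        "user_created": ["2010-05-01", "2018-12-20", "2015-01-15"],
    }
)

# Assumed reference date, standing in for the day the tweets were collected.
reference_date = pd.Timestamp("2019-02-01")

created = pd.to_datetime(df["user_created"])
df["account_age_years"] = (reference_date - created).dt.days / 365.25

# Flag accounts younger than one year (an arbitrary threshold).
df["created_recently"] = df["account_age_years"] < 1.0
print(df)
```

A large share of flagged accounts tweeting about one candidate would be a hint worth looking into, though not proof of anything on its own.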
The crux of my analysis was to check the average sentiment about each candidate. Though the averages seem similar, re-scaling them by a factor of 100 (because sentiments are scored between -1 and 1) makes the difference easier to see. The higher the average sentiment of a candidate, the better the tweets about the candidate. This is shown below.
As shown above, Atiku had a clearly higher average sentiment polarity with lower average subjectivity (average polarity of 0.047073 and average subjectivity of 0.255).
Meanwhile, Buhari had a lower average polarity (average polarity of 0.029322), though with a higher number of tweets, suggesting that tweets about him were, on average, less positive and more subjective (average subjectivity of 0.30).
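The averaging and re-scaling can be sketched as below. The four rows are made-up numbers chosen to keep the arithmetic easy to follow; they are not my real data, and the column names are assumptions.

```python
import pandas as pd

# Made-up scores, two tweets per candidate.
df = pd.DataFrame(
    {
        "candidate": ["Buhari", "Buhari", "Atiku", "Atiku"],
        "polarity": [0.10, -0.04, 0.20, 0.00],
        "subjectivity": [0.40, 0.20, 0.30, 0.10],
    }
)

# Mean polarity and subjectivity per candidate.
averages = df.groupby("candidate")[["polarity", "subjectivity"]].mean()

# Multiply by 100 so small gaps between averages are easier to read.
rescaled = averages * 100

print(averages)
print(rescaled)
```

Applied to the averages reported above, the same scaling gives 4.7073 for Atiku and 2.9322 for Buhari, a gap of about 1.78 points. Re-scaling only changes the units, so it makes the gap easier to read without making it any larger in relative terms.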
Thanks for reading.
Feel free to comment and criticize.
Analysis done and prepared by Ighodaro Emwinghare, Data Analyst