The Covid-19 pandemic has wreaked havoc on the lives of almost everyone on this planet. In this report, we examine the impact of Covid-19 vaccines and therapeutics on the population by analyzing tweets. Because Covid-19 is so prevalent and is associated with major news stories almost weekly, we provide these reports on a weekly basis.
Data collection
Tweets are collected daily. For each topic, a list of related search terms was generated. For example, the search for the topic “Pfizer vaccine” also includes the term “Comirnaty”. At present, tweets do not have to mention Covid-19 to be included, given the outsized presence Covid-19 has in discussions. Not all vaccines and therapeutics are included in this search. Likewise, the search terms do not capture every tweet discussing the selected Covid-19 topics. The purpose of this data is to demonstrate the quality and quantity of information available in unstructured (text) data and to visualize changes over time for the selected topics. Data limitations: the search primarily covers English-language sources.
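The topic-to-search-term matching described above can be sketched as follows. This is a minimal illustration, not the collection code used for the report: the `TOPIC_TERMS` mapping and the function names are hypothetical, and only the Pfizer/Comirnaty pairing is taken from the text. Note that a tweet matches on the search term alone and does not need to mention Covid-19.

```python
import re

# Hypothetical topic-to-search-term mapping. Only the Pfizer/Comirnaty
# pairing comes from the report; the second entry is illustrative.
TOPIC_TERMS = {
    "Pfizer vaccine": ["pfizer vaccine", "comirnaty"],
    "Moderna vaccine": ["moderna vaccine", "spikevax"],
}


def compile_topic_patterns(topic_terms):
    """Build one case-insensitive, whole-word regex per topic."""
    patterns = {}
    for topic, terms in topic_terms.items():
        alternatives = "|".join(re.escape(term) for term in terms)
        patterns[topic] = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return patterns


def match_topics(text, patterns):
    """Return the topics whose search terms appear in the tweet text."""
    return [topic for topic, pattern in patterns.items() if pattern.search(text)]


PATTERNS = compile_topic_patterns(TOPIC_TERMS)
```

Calling `match_topics(tweet_text, PATTERNS)` returns a list of topic names, so a single tweet can be assigned to more than one topic.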
Sentiment analysis
At the time of data collection, the text of each tweet is cleaned and analyzed using Python’s NLTK library. Sentiment is determined using the VADER algorithm, a lexicon- and rule-based method designed for social media text such as tweets.
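As a rough sketch of this step, the snippet below scores a tweet with NLTK's VADER implementation and maps the compound score to a label. The report does not state its cut-off values, so the ±0.05 thresholds (the conventional ones suggested for VADER) are an assumption, as are the function names. NLTK is imported lazily, and the lexicon must be downloaded once with `nltk.download("vader_lexicon")`.

```python
from functools import lru_cache


def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    The default threshold is an assumed, conventional cut-off.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


@lru_cache(maxsize=1)
def get_analyzer():
    """Create the VADER analyzer once and reuse it for every tweet."""
    # Imported here so the labelling helper works without NLTK installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def score_tweet(text):
    """Return 'positive', 'negative', or 'neutral' for one cleaned tweet."""
    scores = get_analyzer().polarity_scores(text)  # keys: neg, neu, pos, compound
    return label_from_compound(scores["compound"])
```

Keeping the thresholding separate from the scoring makes it easy to adjust the cut-offs without touching the NLTK call.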
Word-level analysis
An implementation of TF-IDF using SQL is applied to identify prominent words in tweets that were scored negative, positive, or neutral within each Covid-19 vaccine or therapeutic topic.
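To illustrate the idea, the sketch below computes TF-IDF in Python rather than SQL, so it is not the implementation used for the report. It assumes that, within one topic, the tweets of each sentiment label are pooled into a single document, that term frequency is the word count divided by the total words in the group, and that inverse document frequency is the natural log of the number of groups divided by the number of groups containing the word. These choices and all function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_by_group(tweets):
    """Compute TF-IDF per sentiment group for one topic.

    tweets: iterable of (sentiment_label, text) pairs.
    Returns {label: {word: tf-idf score}}.
    """
    # Pool word counts per sentiment label (one "document" per label).
    counts = {}
    for label, text in tweets:
        counts.setdefault(label, Counter()).update(tokenize(text))

    # Document frequency: in how many sentiment groups each word occurs.
    n_groups = len(counts)
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())

    scores = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        scores[label] = {
            word: (count / total) * math.log(n_groups / doc_freq[word])
            for word, count in word_counts.items()
        }
    return scores


def top_words(scores, label, k=5):
    """Return the k highest-scoring words for a label, ties broken by word."""
    ranked = sorted(scores[label].items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]
```

Under these assumptions, a word that appears in all sentiment groups scores zero, so only words that distinguish one sentiment group from the others rise to the top.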
Side effect classifier
A total of 10,000 tweets were reviewed for the classifier. A tweet was tagged if it was determined that the author was describing a side effect or adverse reaction in the first person.