Tweet Analyzer

Examine public sentiment on Twitter following a major event.


Hurricane Irene


  1. Collect all public, geotagged tweets posted within the 5 days following the event, and keep only those that contain the query term "Irene".

    Note: Twitter’s privacy settings have changed to prevent people from readily downloading public tweets, so we wrote custom code to scrape all tweets within a given time frame and then filter them by whether they contain our chosen keyword.

  2. Assign a sentiment to each tweet (1 for positive, 0 for negative) based on all of the words it contains.

  3. Assign each tweet to the state whose geographic center is closest to the tweet's latitude and longitude, then aggregate tweets by state.
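Steps 1 and 3 above can be sketched as follows, assuming each scraped tweet has already been parsed into a record with its text, timestamp, and coordinates. The `Tweet` record, the helper names, and the three-state `STATE_CENTERS` table (approximate coordinates only) are hypothetical. The haversine great-circle distance is one reasonable reading of "closest" and may not be the metric the project actually used.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Tweet:
    text: str
    created_at: datetime
    lat: Optional[float]  # None when the tweet is not geotagged
    lon: Optional[float]


# Hypothetical table of approximate state geographic centers (lat, lon).
# A real run would cover every state.
STATE_CENTERS = {
    "NC": (35.56, -79.39),
    "NY": (42.95, -75.53),
    "VT": (44.07, -72.67),
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def collect(tweets: List[Tweet], event_start: datetime,
            term: str = "irene", days: int = 5) -> List[Tweet]:
    """Step 1: keep geotagged tweets from the window that mention the term."""
    end = event_start + timedelta(days=days)
    return [
        t for t in tweets
        if t.lat is not None and t.lon is not None
        and event_start <= t.created_at < end
        and term.lower() in t.text.lower()  # case-insensitive substring match
    ]


def nearest_state(lat: float, lon: float) -> str:
    """Step 3: the state whose center is closest to the given point."""
    return min(STATE_CENTERS,
               key=lambda s: haversine_km(lat, lon, *STATE_CENTERS[s]))


def group_by_state(tweets: List[Tweet]) -> Dict[str, List[Tweet]]:
    """Aggregate tweets under their nearest state."""
    groups: Dict[str, List[Tweet]] = {}
    for t in tweets:
        groups.setdefault(nearest_state(t.lat, t.lon), []).append(t)
    return groups
```

Chaining the two, `group_by_state(collect(tweets, event_start))` yields a dict mapping each state code to the tweets assigned to it.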

Find overall talkativeness:
Return the percentage of all tweets in a given time span that contain a given term.
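Talkativeness reduces to a ratio of two counts over the same time span. This sketch assumes tweets are available as hypothetical `(text, timestamp)` pairs and treats "containing a term" as a case-insensitive substring match.

```python
from datetime import datetime
from typing import List, Tuple


def talkativeness(tweets: List[Tuple[str, datetime]], term: str,
                  start: datetime, end: datetime) -> float:
    """Percentage of tweets in [start, end) whose text contains `term`."""
    in_span = [text for text, ts in tweets if start <= ts < end]
    if not in_span:
        return 0.0  # no tweets in the span, so nothing to measure
    hits = sum(1 for text in in_span if term.lower() in text.lower())
    return 100.0 * hits / len(in_span)
```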

Find most talkative state:
Return the state with the most tweets containing a given term.
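Finding the most talkative state is a count followed by an argmax. In this sketch the input is a hypothetical list of `(state, text)` pairs, such as the output of the nearest-center assignment in step 3; ties go to the state encountered first, because `Counter.most_common` orders equal counts by first occurrence.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def most_talkative_state(tweets: Iterable[Tuple[str, str]],
                         term: str) -> Optional[str]:
    """Return the state with the most tweets containing `term`, or None."""
    counts = Counter(state for state, text in tweets
                     if term.lower() in text.lower())
    if not counts:
        return None  # no tweet mentions the term
    return counts.most_common(1)[0][0]
```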

Find public sentiment:
Compute the cosine similarity between the combined body of tweets and each of two documents, one of positive words and one of negative words, and pick the document with the higher cosine value.
Return 1 (positive) or 0 (negative) depending on whether the overall public sentiment about the event was more positive or more negative.
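The cosine-similarity comparison can be sketched with bag-of-words count vectors. The tokenizer regex, the helper names, and the rule that ties (including text that shares no words with either list) count as negative are assumptions; the project's actual word lists and tie handling are not given here. The same `sentiment` function can score a single tweet for step 2, while `overall_sentiment` scores the combined body of tweets.

```python
import math
import re
from collections import Counter
from typing import Iterable, List


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; the regex tokenizer is a simplifying assumption."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sentiment(text: str, positive_words: Iterable[str],
              negative_words: Iterable[str]) -> int:
    """Return 1 if the text is closer to the positive document, else 0."""
    body = bag_of_words(text)
    pos = cosine(body, Counter(positive_words))
    neg = cosine(body, Counter(negative_words))
    return 1 if pos > neg else 0  # ties are treated as negative (assumption)


def overall_sentiment(tweets: List[str], positive_words: Iterable[str],
                      negative_words: Iterable[str]) -> int:
    """Score the combined body of all tweets at once."""
    return sentiment(" ".join(tweets), positive_words, negative_words)
```

Concatenating the tweets before scoring means frequent words weigh more in the overall result, which matches comparing "the body of tweets" as a single document.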

Download the full project write-up here

GitHub + code

Twitter database scraper & parser: