How to Scrape the Web and Analyze Twitter Data like a Pro

This two-day course provides an overview and guided exploration of cutting-edge methods for collecting and analyzing web data. The course is divided into two parts. Part I is a crash course, taught through guided examples, in collecting, wrangling, and analyzing unstructured web data (via scraping) and structured web data (via APIs). Hands-on examples draw on digital media sources (e.g., the New York Times API) and public opinion sources (e.g., Reddit) and include analytical examples of text mining and machine learning. Part II provides a deep dive into Twitter data, including an extensive overview of the popular {rtweet} package (taught by the package author) and numerous guided examples and walk-throughs featuring cutting-edge techniques in sentiment analysis, neural networks, bot detection, and network analysis.
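To give a flavor of the two data-collection styles covered in Part I, here is a minimal sketch, not taken from the course materials: scraping an unstructured page with {rvest} and querying a structured API with {httr}. The URL, CSS selector, and the NYT_KEY environment variable are illustrative placeholders; a real New York Times API key would be required for the second example.

```r
library(rvest)
library(httr)

## Unstructured data: scrape headline text from a page (placeholder URL
## and "h2" selector; adjust both for a real target site)
page <- read_html("https://example.com")
headlines <- page |>
  html_elements("h2") |>
  html_text2()

## Structured data: query the New York Times Article Search API
## (NYT_KEY is a hypothetical environment variable holding an API key)
resp <- GET(
  "https://api.nytimes.com/svc/search/v2/articlesearch.json",
  query = list(q = "public opinion", `api-key` = Sys.getenv("NYT_KEY"))
)
articles <- content(resp, as = "parsed")
```

The pattern is the same in both cases: request, parse, then wrangle the result into a tidy structure for analysis.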

Programming language: R
Software libraries: httr, rvest, healthforum*, rtweet*, tweetbotornot2*, congress116, xgboost, igraph, quanteda, text2vec, textfeatures*, wactor*, tidyverse, data.table
*Authored or co-authored by course instructor

SurvMeth 988.217 (1 credit hour)
Instructor: Mike Kearney