A Virtually Syntax Free Practical Introduction to Web Scraping for Survey and Social Science Researchers

This short course will offer a very practical introduction to data gathering geared toward social scientists and survey researchers. The course begins with an overview of web scraping that covers some basic technical jargon, types of web data and various methods for scraping. It also includes a discussion and illustration of how Application Programming Interfaces (APIs) can be used to gather web data when they are available. Some websites are designed to be easily accessible by web crawlers or scraping algorithms, while others require much more advanced, custom programming, and some web data can be accessed through an API that the website provides. In this course we will illustrate how participants can discern these differences, and we will present several motivating examples of the various ways web scraped data can be used throughout a study’s lifecycle, from design to calibration to analysis. We provide an extensive introduction to a suite of freeware programs that allow virtually syntax free, but customizable, web scraping capabilities. We contrast this type of data gathering with API access for some websites like Zillow or Twitter and discuss the pros and cons of using web scraping or APIs to gather this type of web data. The course concludes with a specific focus on the import.io tool: we demonstrate its capabilities and provide several hands-on, practical examples so that participants can begin scraping websites of increasing complexity. We will also illustrate API calls in R for Zillow, the Census and others as time permits.

 

SurvMeth 988.204-A (.5 credit hour)
Instructor: Trent Buskirk
Prerequisite: To take this class for UM credit, you must take both SurvMeth 988.204-A and 988.204-B, An Introduction to Big Data and Machine Learning for Survey Researchers and Social Scientists, for a total of 1.0 credit hour. Participants should also have a trial import.io account set up (this is a 7-day trial, so please plan to have the license active during our course). Details can be found here: https://www.import.io/signup/.