A Virtually Syntax Free Practical Introduction to Web Scraping for Survey and Social Science Researchers

This short course offers a practical introduction to web scraping geared toward social scientists and survey researchers.  It begins with an overview of web scraping that covers basic technical jargon, types of web data, and various methods for scraping.  Some websites are designed to be easily accessible to web crawlers or scraping algorithms, while others require much more advanced, custom programming.  We will illustrate how participants can discern these differences and present several motivating examples of how web-scraped data can be used throughout a study’s lifecycle, from design to calibration to analysis.  We provide an extensive introduction to a suite of freeware programs that offer virtually syntax-free, but customizable, web scraping capabilities.  The course concludes with a specific focus on the import.io tool: we demonstrate its capabilities and provide hands-on practical examples in which participants scrape several websites of increasing complexity.


SurvMeth 988.204-A (.5 credit hour)
Instructor: Trent Buskirk
Prerequisite: To take this class for UM credit you must take SurvMeth 988.204-A and 988.204-B, An Introduction to Big Data and Machine Learning for Survey Researchers and Social Scientists, for a total of 1.0 credit hour.  Participants should also have a trial import.io account set up (this is a 7-day trial, so please plan to have the license active during our course).  Details can be found here: https://www.import.io/signup/.