A Virtually Syntax Free Practical Introduction to Web Scraping for Survey and Social Science Researchers

Course Date: July 18-19

Days: M-T (1:00pm-5:00pm)

This short course offers a practical introduction to web data gathering geared toward social scientists and survey researchers. It begins with an overview of web scraping that covers basic technical jargon, types of web data, and various scraping methods. The course also discusses and illustrates the use of Application Programming Interfaces (APIs) for gathering web data when they are available. Some websites are designed to be easily accessible to web crawlers or scraping algorithms, others require much more advanced, custom programming, and some offer their data through an API provided by the website. We will illustrate how participants can discern these differences and present several motivating examples of the ways web-scraped data can be used throughout a study’s lifecycle, from design to calibration to analysis. We provide an extensive introduction to a suite of freeware programs that offer virtually syntax-free, but customizable, web scraping capabilities. We contrast data gathered this way with data accessed through APIs for websites such as Zillow or Twitter, and discuss the pros and cons of using web scraping or APIs to gather this type of web data. The course concludes with a specific focus on the import.io tool: we demonstrate its capabilities and provide several hands-on, practical examples so participants can begin scraping websites of increasing complexity. We will also illustrate API calls in R for Zillow, the Census, and others as time permits.


0.5 course hour
Instructor: Trent Buskirk
Prerequisite: A trial Octoparse account set up before the course (the free trial lasts 14 days, so please plan to have the license active during our course). Details can be found here: https://www.octoparse.com/ - click on "Try free for 14 days".