wanzhuz/web-scraping

Web Scraping

Overview: Scrape questions, answers, comments, and metadata from StackOverflow, specifically questions about R, starting at the URL.

  • start at the first page of the search results on StackOverflow
  • get the links to the questions on this page
  • get the next page of results and the links to the questions on that page
  • process the questions on the first 3 pages and the last page of the search results, fetching 50 questions per page
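
The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the `page` and `pagesize` query parameter names, the `https://example.com` URLs, the `/questions/<id>/...` link pattern, and all function names are assumptions made for the example. It uses only the Python standard library and parses a small inline HTML sample instead of fetching live pages.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


def pages_to_process(last_page, first_n=3):
    """Return the page numbers to visit: the first `first_n` pages plus the last one."""
    if last_page < 1:
        return []
    pages = list(range(1, min(first_n, last_page) + 1))
    if last_page not in pages:
        pages.append(last_page)
    return pages


def page_url(base_url, page, page_size=50):
    """Build the URL of one results page.

    Assumes `base_url` has no query string yet and that the site accepts
    `page` and `pagesize` parameters (both names are assumptions).
    """
    return f"{base_url}?{urlencode({'page': page, 'pagesize': page_size})}"


class QuestionLinkParser(HTMLParser):
    """Collect links that look like question pages (assumed pattern: /questions/<id>/...)."""

    QUESTION_HREF = re.compile(r"^/questions/\d+/")

    def __init__(self, site_root):
        super().__init__()
        self.site_root = site_root
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if self.QUESTION_HREF.match(href):
            url = urljoin(self.site_root, href)
            if url not in self.links:  # skip duplicates, keep page order
                self.links.append(url)


def question_links(html, site_root):
    """Return the absolute question URLs found in one results page."""
    parser = QuestionLinkParser(site_root)
    parser.feed(html)
    return parser.links


# Hypothetical results-page fragment standing in for a downloaded page.
SAMPLE_RESULTS_HTML = """
<div id="questions">
  <a href="/questions/101/how-to-merge-data-frames">How to merge data frames?</a>
  <a href="/questions/tagged/r">r</a>
  <a href="/questions/102/apply-vs-lapply">apply vs lapply</a>
  <a href="/users/7/someone">someone</a>
</div>
"""

if __name__ == "__main__":
    for page in pages_to_process(last_page=120):
        print(page_url("https://example.com/search", page))
    print(question_links(SAMPLE_RESULTS_HTML, "https://example.com"))
```

In a real run the last page number would have to be read from the pagination controls of the first results page, and each URL would be fetched before being passed to `question_links`.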

Task

The goal is to explore the HTML source of the search results pages and find the HTML structures that identify the elements of interest listed below:

  • the number of views of the question
  • the number of votes
  • the text of the question
  • the tags for the question
  • when the question was posted
  • the user/display name of the person posting the question, their reputation, and how many gold, silver, and bronze badges they have
  • who edited the question and when
  • for each answer/comment, find: the text, the person who posted, when they posted, and their reputation and badge information
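
One way to organize the fields listed above is a small set of record types, plus a parser for the block that holds a user's name, reputation, and badge counts. The sketch below is only an illustration under stated assumptions: the class names `user-name`, `reputation-score`, `badge1`/`badge2`/`badge3`, and `badgecount` in the sample fragment are hypothetical and must be checked against the real page source, and the record and function names are invented for the example.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class User:
    name: str
    reputation: Optional[int] = None
    gold: int = 0
    silver: int = 0
    bronze: int = 0


@dataclass
class Post:
    """An answer or a comment attached to a question."""
    kind: str          # "answer" or "comment"
    text: str
    author: User
    posted: str        # timestamp kept as the string found in the page


@dataclass
class Question:
    views: int
    votes: int
    text: str
    tags: List[str]
    posted: str
    author: User
    edited_by: Optional[str] = None
    edited_at: Optional[str] = None
    posts: List[Post] = field(default_factory=list)


class UserCardParser(HTMLParser):
    """Read name, reputation, and badge counts from a user block.

    All class names matched here are assumptions for this sketch.
    """

    BADGE_CLASSES = {"badge1": "gold", "badge2": "silver", "badge3": "bronze"}

    def __init__(self):
        super().__init__()
        self.fields = {"name": "", "reputation": None,
                       "gold": 0, "silver": 0, "bronze": 0}
        self._target = None  # field that the next text node should fill
        self._badge = None   # badge colour announced by the latest marker

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "user-name" in classes:
            self._target = "name"
        elif "reputation-score" in classes:
            self._target = "reputation"
        elif "badgecount" in classes and self._badge:
            self._target = self._badge
        else:
            for cls in classes:
                if cls in self.BADGE_CLASSES:
                    self._badge = self.BADGE_CLASSES[cls]

    def handle_data(self, data):
        text = data.strip()
        if not text or self._target is None:
            return
        if self._target == "name":
            self.fields["name"] = text
        else:
            # Assumes plain integers with optional thousands separators.
            self.fields[self._target] = int(text.replace(",", ""))
        self._target = None


def parse_user(html):
    parser = UserCardParser()
    parser.feed(html)
    return User(**parser.fields)


# Hypothetical user block standing in for part of a question page.
SAMPLE_USER_CARD = """
<div class="user-info">
  <a class="user-name" href="/users/1/example">example_user</a>
  <span class="reputation-score">1,234</span>
  <span class="badge1"></span><span class="badgecount">2</span>
  <span class="badge2"></span><span class="badgecount">15</span>
  <span class="badge3"></span><span class="badgecount">40</span>
</div>
"""

if __name__ == "__main__":
    print(parse_user(SAMPLE_USER_CARD))
```

The same `User` record can be reused for the question author, editors, and the authors of answers and comments, so the badge and reputation parsing only has to be written once.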
