GitHub - iifeoluwa/hn-scraper: Scrape HackerNews

HackerNews Scraper

Description

This project crawls the HackerNews website and scrapes data about the current top stories. The scraped stories are then written to STDOUT in JSON format.

HackerNews provides an API that enables clients to consume information about the top posts. For our use case, though, consuming the API would have proved inefficient because, in the worst-case scenario, we would need to make 100+ network requests to fetch the top 100 stories.

This solution makes a maximum of 4 network requests, as opposed to the 100+ API calls it would have taken to fetch the top 100 posts with the HackerNews API.
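The arithmetic behind those numbers can be sketched as follows. This is only an illustration, not code from the project: it assumes a HackerNews listing page shows 30 stories, and that the API route needs one call for the list of top story IDs plus one call per story. The names `STORIES_PER_PAGE`, `pageRequests` and `apiRequests` are made up for this example.

```javascript
// Assumption: each HackerNews listing page shows 30 stories.
const STORIES_PER_PAGE = 30;

// Number of listing pages that must be fetched to scrape `posts` stories.
function pageRequests(posts) {
  return Math.ceil(posts / STORIES_PER_PAGE);
}

// Assumption: the API route needs one request for the list of top story IDs,
// plus one request per story to fetch its details.
function apiRequests(posts) {
  return 1 + posts;
}

console.log(pageRequests(100)); // 4
console.log(apiRequests(100)); // 101
```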

How To Run.

Download and install Node.js. Skip this step if you already have Node installed.
Download and install Git. Skip this step as well if you already have Git installed on your computer.
Open a command line window from a newly created folder and run the following command:

git clone https://github.com/iifeoluwa/hn-scraper.git .

From the same command line window, run npm install -g

After completing the steps above, you can run the tool from any command line window using hackernews. It also accepts a --posts argument that specifies the number of stories it should return.
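As a rough idea of how a `--posts` value could be read and checked, here is a dependency-free sketch. It is not the tool's actual code (the tool uses Minimist for parsing). The helper name `parsePosts`, the fallback of 30 and the 1 to 100 range are all assumptions made for this example.

```javascript
// Hypothetical helper: reads the value that follows `--posts` in an
// argv-style array. The default (30) and the allowed range (1 to 100)
// are assumptions, not documented behaviour of the tool.
function parsePosts(argv, fallback = 30) {
  const index = argv.indexOf("--posts");
  if (index === -1) return fallback; // flag omitted: use the assumed default

  const value = Number(argv[index + 1]);
  if (!Number.isInteger(value) || value < 1 || value > 100) {
    throw new RangeError("--posts must be a whole number between 1 and 100");
  }
  return value;
}

console.log(parsePosts(["--posts", "1"])); // 1
```

In a real script the helper would typically be called with `process.argv.slice(2)`.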

To run tests, run npm test from the project directory.

Sample Usage

hackernews --posts 1

// Writes to STDOUT
[ { title: 'Lambda School Announces $14M Series A Led by GV',
    uri: 'https://lambdaschool.com/blog/lambda-school-announces-14-million-series-a-led-by-gv/',
    author: 'tosh',
    points: '31',
    comments: '17',
    rank: '1' } ]

Libraries Used

The following libraries were used to create this tool:

Got: A lightweight HTTP request library. It was chosen because the project only needs to make simple GET requests, and it is one of the lightest actively maintained libraries for making HTTP requests.
Cheerio: Cheerio was used to parse the HTML document and extract the needed data from the file. It provides an expressive API that makes it easy to find specific information in documents.
Minimist: Parses the arguments passed to the hackernews tool. Makes it easier to handle and validate inputs.
joi: Tool used to enforce validation rules and ensure only validated stories are retrieved.
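To give a feel for what validating a scraped story might involve, here is a plain JavaScript sketch. It does not use joi and is not the project's schema: the field names are taken from the sample output above, while the function name `isValidStory` and the rules themselves are assumptions.

```javascript
// Hypothetical validator. Field names come from the sample output;
// the rules below are assumptions, not the project's actual schema.
function isValidStory(story) {
  // Non-empty text, ignoring surrounding whitespace.
  const isText = (v) => typeof v === "string" && v.trim().length > 0;
  // A non-negative whole number, given as a number or a numeric string.
  const isCount = (v) => /^\d+$/.test(String(v));
  // Anything the built-in URL parser accepts as an absolute URL.
  const isUri = (v) => {
    try {
      new URL(v);
      return true;
    } catch {
      return false;
    }
  };

  return (
    isText(story.title) &&
    isText(story.author) &&
    isUri(story.uri) &&
    isCount(story.points) &&
    isCount(story.comments) &&
    isCount(story.rank)
  );
}
```

A scraper could filter its results with something like `stories.filter(isValidStory)` before printing them.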

