Web scraping is an automated method used to extract large amounts of data from websites. ... Web scraping helps collect this unstructured data and store it in a structured form. There are different ways to scrape websites, such as online services, APIs, or writing your own code. Here we are using the coding method to scrape the data from various websi…

sangwanamit621/NewsScrapping

NewsScrapper

What is Web Scraping?

Web scraping refers to the extraction of data from a website. The information is collected and then exported into a format that is more useful to the user.

Libraries Used

  1. Beautiful Soup : Beautiful Soup provides simple methods for navigating, searching, and modifying a parse tree built from HTML or XML files. It transforms a complex HTML document into a tree of Python objects and automatically converts the document to Unicode, so you don’t have to think about encodings. Besides scraping, it also helps you clean the data.

    To install Beautiful Soup, run the following command in your conda environment: pip install beautifulsoup4

  2. Flask : Flask is a Python micro-framework for web app development. It is built on the Werkzeug WSGI toolkit and the Jinja2 template engine. Web Server Gateway Interface (WSGI) is the standard interface between Python web applications and web servers. Jinja2 renders the web pages by filling HTML templates with whatever custom content the application passes to it.

    To install Flask, run the following command in your conda environment: pip install Flask
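
As a rough illustration of the parse-tree navigation described for Beautiful Soup above, here is a minimal sketch. The HTML snippet, the `story` class name, and the `extract_headlines` helper are all hypothetical and are not taken from this project; real news pages use their own markup, so the tag and class lookups would need to be adapted per site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded news page.
SAMPLE_HTML = """
<html><body>
  <div class="story"><h2><a href="/news/1">First headline</a></h2></div>
  <div class="story"><h2><a href="/news/2">  Second   headline </a></h2></div>
  <div class="ad"><h2><a href="/promo">Sponsored</a></h2></div>
</body></html>
"""


def extract_headlines(html):
    """Return a list of {'title', 'link'} dicts, one per story block."""
    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    # Only <div class="story"> blocks are treated as news items; the ad is skipped.
    for story in soup.find_all("div", class_="story"):
        anchor = story.find("a")
        if anchor is None:
            continue
        # Collapse runs of whitespace so the scraped title is clean.
        title = " ".join(anchor.get_text().split())
        headlines.append({"title": title, "link": anchor.get("href")})
    return headlines


headlines = extract_headlines(SAMPLE_HTML)
for item in headlines:
    print(item["title"], "->", item["link"])
```

In a real scraper the HTML string would come from an HTTP response rather than a constant, and the same function could then be reused unchanged.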

About the Project

In this project, news is scraped from three different websites and displayed on a custom-designed HTML page.

The three websites used for scraping news are:

  1. Hindi News website - AmarUjala
  2. Cricket website - Cricbuzz
  3. Technology News website - Gadget360

After the news is scraped from these websites, it is displayed on the HTML page.
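
The scrape-then-display flow described above could be wired together roughly as follows. This is only a sketch under stated assumptions: `fetch_all_news`, the section names, the placeholder headlines, and the inline template are hypothetical and do not reflect the project's actual code or page design. The placeholder data stands in for whatever the three site scrapers return.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Hypothetical inline Jinja2 template; a real project would more likely keep
# this in a templates/ folder and call render_template instead.
PAGE_TEMPLATE = """
<!doctype html>
<title>News</title>
{% for section, entries in news.items() %}
  <h2>{{ section }}</h2>
  <ul>
  {% for entry in entries %}
    <li><a href="{{ entry['link'] }}">{{ entry['title'] }}</a></li>
  {% endfor %}
  </ul>
{% endfor %}
"""


def fetch_all_news():
    """Hypothetical stand-in for the scraping step: returns placeholder data."""
    return {
        "Hindi News": [{"title": "Sample Hindi headline", "link": "https://example.com/hindi/1"}],
        "Cricket": [{"title": "Sample cricket headline", "link": "https://example.com/cricket/1"}],
        "Technology": [{"title": "Sample tech headline", "link": "https://example.com/tech/1"}],
    }


@app.route("/")
def index():
    # Jinja2 fills the template with the scraped headlines on every request.
    return render_template_string(PAGE_TEMPLATE, news=fetch_all_news())


# Exercise the route with Flask's built-in test client, without starting a server.
response = app.test_client().get("/")
html = response.get_data(as_text=True)
print(response.status_code)
```

Using the test client keeps the example self-checking. To view the page in a browser instead, the module could be saved as, say, `app.py` and started with `flask --app app run`.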

This project can be deployed on cloud platforms such as Heroku, AWS, GCP, or Azure. In our case, the app was deployed on the Heroku cloud platform and runs at this URL: https://news-scapper-site.herokuapp.com/
