
wake-ua/OpenDataCrawler


A tool to crawl data from open data portals into your projects
Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact

About The Project

Open Data Crawler is a tool to extract data from open data portals and statistics portals. The community can contribute by adding support for other data portals or by adding new features.

Features:

  • Download datasets from open data portals or statistics portals
  • Download metadata from resources
  • Filter by data type
  • Filter by topic


Getting Started

To get a local copy up and running, follow these steps.

Requirements

  • Python 3.9 must be installed

  • Clone the repo

    git clone https://github.com/aberenguerpas/opendatacrawler.git
  • Move to root directory

    cd opendatacrawler
  • Install the requirements from requirements.txt

    pip3 install -r requirements.txt
  • Socrata portals require an app token to avoid throttling limits. You can obtain an API key here and set it in config.ini
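
As a rough illustration of the last step, reading a token from config.ini could look like the sketch below. The section name `socrata` and the key `app_token` are assumptions made for this example; check the config.ini shipped with the repository for the names the tool actually expects.

```python
import configparser


def read_app_token(config_text, section="socrata", key="app_token"):
    """Return the app token found in config.ini-style text, or None if absent.

    The default section and key names are hypothetical, not taken from the project.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback=None avoids an exception when the section or key is missing
    token = parser.get(section, key, fallback=None)
    return token or None


example_config = """
[socrata]
app_token = YOUR_APP_TOKEN
"""

print(read_app_token(example_config))
```

Returning None for a missing or empty value lets the caller decide whether to continue without a token and accept the throttling limits.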

Installation

  1. Run from the project root
    python3 setup.py install 


Usage

Using this tool is simple: you only need to specify the data source, and the tool automatically detects the portal type and starts downloading the data.

Examples

Download all data from a portal:

python opendatacrawler -d https://data.smartdublin.ie/

Download all data with their metadata:

python opendatacrawler -d https://data.smartdublin.ie/ -m

Download partial datasets (first 50 lines for csv files):

python opendatacrawler -d https://data.smartdublin.ie/ -pd
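
To make the idea of a partial download concrete, the sketch below keeps only the first 50 lines of a text stream. It is an illustration, not the tool's own code: the helper name `first_lines` is hypothetical, and whether the tool counts the header row among the 50 lines is not stated here.

```python
import io
from itertools import islice


def first_lines(stream, limit=50):
    """Return at most `limit` lines from a text stream without reading the rest."""
    return list(islice(stream, limit))


# A fake 201-line CSV (header plus 200 rows) standing in for a downloaded file
sample = io.StringIO("id,name\n" + "".join(f"{i},row{i}\n" for i in range(200)))
preview = first_lines(sample)
print(len(preview))  # prints 50
```

Because `islice` stops after the requested number of lines, the same approach works on a streamed HTTP response without fetching the whole file.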

Download data in specific formats. For example xls and csv:

python opendatacrawler -d https://data.smartdublin.ie/ -t xls csv

Download specific categories. For example tourism and transport:

python opendatacrawler -d https://data.smartdublin.ie/ -c tourism transport

Help with all possible commands:

python opendatacrawler -h
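
If you want to launch these commands from a script, a small helper can assemble the argument list. The sketch below uses only the flags shown in the examples above (-d, -m, -pd, -t, -c). The function name `build_command` is hypothetical, and combining several flags in one call is an assumption, since the examples only show them one at a time.

```python
def build_command(portal_url, metadata=False, partial=False, types=(), categories=()):
    """Build the argument list for one opendatacrawler run from the documented flags."""
    cmd = ["python", "opendatacrawler", "-d", portal_url]
    if metadata:
        cmd.append("-m")          # also download metadata
    if partial:
        cmd.append("-pd")         # partial datasets only
    if types:
        cmd += ["-t", *types]     # restrict to these file formats
    if categories:
        cmd += ["-c", *categories]  # restrict to these categories
    return cmd


print(" ".join(build_command("https://data.smartdublin.ie/", types=["xls", "csv"])))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the project root to start the crawl.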

For more examples, please refer to the Documentation


Currently supported portals and sites

* Works with restrictions or download limitations

See the open issues for a full list of proposed features (and known issues).


Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, or want to add support for a new site or portal, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Adding support for another portal

  1. Create a file named after the portal plus "crawler" (for example, examplecrawler.py) inside the opendatacrawler folder.
  2. Create a class ExampleCrawler that inherits from OpenDataCrawlerInterface.
  3. The class must implement at least the functions get_package_list() and get_package(). Check the descriptions of these functions in opendatacrawlerInterface.py.
  4. You can also use or add helper functions in utils.py.
  5. In the function detect_dms() in odcrawler.py, add a way to detect the site you want to support.
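
The steps above can be sketched as follows. This is a minimal outline under stated assumptions, not the project's actual code. The interface is replaced by a stand-in so the example runs on its own. The method signatures, the return shapes, and the helper `is_example_portal` are hypothetical; consult opendatacrawlerInterface.py and detect_dms() for the real contracts.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class OpenDataCrawlerInterface(ABC):
    """Stand-in for the project's interface; import the real class in the repository."""

    @abstractmethod
    def get_package_list(self):
        ...

    @abstractmethod
    def get_package(self, package_id):
        ...


class ExampleCrawler(OpenDataCrawlerInterface):
    """Hypothetical crawler, as it might live in examplecrawler.py."""

    def __init__(self, domain):
        self.domain = domain.rstrip("/")
        # Placeholder catalogue; a real crawler would query the portal's API here
        self._catalogue = {"ds-1": {"title": "Bus stops", "resources": []}}

    def get_package_list(self):
        """Return the identifiers of all datasets the portal exposes."""
        return list(self._catalogue)

    def get_package(self, package_id):
        """Return the metadata for one dataset, or None if it is unknown."""
        meta = self._catalogue.get(package_id)
        if meta is None:
            return None
        return {"id": package_id, "url": f"{self.domain}/dataset/{package_id}", **meta}


def is_example_portal(url):
    """Toy rule that detect_dms() could call: match on the portal's hostname."""
    host = urlparse(url).hostname or ""
    return host == "example.org" or host.endswith(".example.org")


crawler = ExampleCrawler("https://example.org/")
for package_id in crawler.get_package_list():
    print(crawler.get_package(package_id))
```

Matching on the hostname is only one option. Many portals are easier to recognise by probing a well-known API endpoint, which is a choice left to the contributor.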


License

Distributed under the MIT License. See LICENSE for more information.


Collaborators

Contact

🙋‍♂️Alberto Berenguer Pastor
📱@aberenguerpas
✉️ [email protected]
