maganaluis/data-cleaning-project

Data Cleaning Project

Usage:

To replicate the data cleaning workflow, open the Jupyter notebook and run all cells. The notebook uses only standard Anaconda packages to clean the data, so it should run on any Python Anaconda distribution. It creates farmers.db, which you can load into sqlite3; this database should contain the clean Farmers dataset.
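As a quick check after running the notebook, the generated database can be inspected from Python. The sketch below is only an illustration: it assumes farmers.db sits in the current directory, and because the table names are not listed in this README, it reads them from SQLite's `sqlite_master` catalog rather than guessing. The helper name `list_tables` is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Assumption: the notebook wrote farmers.db to the current directory.
    db_path = Path("farmers.db")
    if db_path.exists():
        print(list_tables(db_path))
    else:
        print("farmers.db not found; run the notebook first")
```

The existence check matters because `sqlite3.connect` silently creates an empty database file when the path does not exist.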

For more information on the data cleaning workflow, go through the notebook preview, which explains every step. For background on this assessment, read InitialAssesment.pdf, provided in the repository.

The Python file workflow.py is also provided to create CleanFarmers.csv directly. It can be run from the command line, but it will not generate the YesWorkFlow graph; to create the graph, follow the YesWorkFlow instructions: https://github.com/yesworkflow-org/yw-prototypes
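Once workflow.py has produced CleanFarmers.csv, a short preview can confirm that the file was written as expected. This is a minimal sketch using only the standard library; the helper name `preview_csv`, the UTF-8 encoding, and the presence of a header row are assumptions, since the README does not describe the file layout.

```python
import csv
from itertools import islice


def preview_csv(path, n=5):
    """Return the first row (assumed header) and up to n following rows."""
    # Assumption: the file is UTF-8 encoded and starts with a header row.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = list(islice(reader, n))
    return header, rows
```

For example, `header, rows = preview_csv("CleanFarmers.csv")` returns the column names and the first five records without loading the whole file.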

Requirements:

  • Python 3.7 (Anaconda Distribution)
  • YesWorkFlow Binaries
