Visualize the RMS Titanic dataset in a Neo4j graph database.

chrisammon3000/neo4j-titanic

neo4j-titanic

A simple data pipeline for the RMS Titanic dataset that uses Python and the Pandas library for preprocessing. The processed data is loaded into a Neo4j graph database for exploration and analysis.

-- Project Status: [Active]

Project Description

The pipeline fetches the RMS Titanic dataset, cleans and preprocesses it, then loads it into a Neo4j instance running in Docker. There, relationships between passengers and other entities (other passengers, lifeboats, cabins, and so on) can be visualized, analyzed, and explored.
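To make the graph model concrete, here is a minimal sketch of how one passenger record could map to relationships with lifeboats and cabins. It is an illustration only, not the logic in create_db.cyp: the column names (name, boat, cabin), the relationship labels BOARDED and STAYED_IN, and the example passenger are all assumptions.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    source: str    # passenger name
    rel_type: str  # hypothetical relationship label
    target: str    # identifier of the lifeboat or cabin node


def row_to_edges(row: dict) -> List[Edge]:
    """Turn one passenger record into a list of graph relationships.

    The keys "name", "boat" and "cabin" are assumed column names.
    """
    edges = []
    name = row["name"]

    # A passenger with a recorded lifeboat gets one BOARDED edge.
    boat = (row.get("boat") or "").strip()
    if boat:
        edges.append(Edge(name, "BOARDED", f"Lifeboat:{boat}"))

    # Assume a cabin field may list several cabins separated by spaces.
    for cabin in (row.get("cabin") or "").split():
        edges.append(Edge(name, "STAYED_IN", f"Cabin:{cabin}"))

    return edges


# Hypothetical passenger record, not taken from the dataset.
example = {"name": "Doe, Miss. Jane", "boat": "4", "cabin": "C23 C25"}
edges = row_to_edges(example)
for edge in edges:
    print(edge)
```

Two passengers who share a cabin or lifeboat end up connected through the same target node, which is what makes these relationships easy to explore in a graph.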

Neo4j Browser

Run Pipeline

Ensure that the Pandas library and Docker are installed. To run the pipeline, clone the repo and run:

make graph

This will perform the following steps:

  • Fetch the data from a URL
  • Process it and save it to the ./data folder
  • Pull a Neo4j Docker image and run it
  • Load the processed data using the create_db.cyp file
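The processing step might look roughly like the following Pandas sketch. It is not the repository's actual script: the function name preprocess, the cleaning rules, and the file and URL names in the trailing comments are assumptions chosen for illustration.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a raw passenger table (the cleaning rules are assumptions)."""
    out = df.copy()

    # Normalize headers, e.g. "Home.Dest " becomes "home_dest".
    out.columns = [c.strip().lower().replace(".", "_") for c in out.columns]

    # Drop rows in which every field is missing.
    out = out.dropna(how="all")

    # Strip surrounding whitespace from string values; leave other values as-is.
    for col in out.columns:
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)

    return out.reset_index(drop=True)


# Small in-memory stand-in for the fetched data (hypothetical values).
raw = pd.DataFrame(
    {"Name ": [" Doe, Miss. Jane ", None], "Home.Dest": ["London", None]}
)
clean = preprocess(raw)
print(clean)

# In a real run the frame would come from the source URL and be written
# to the data folder, for example (both names are hypothetical):
# raw = pd.read_csv(DATA_URL)
# clean.to_csv("data/titanic_clean.csv", index=False)
```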

To explore the database, open the Neo4j Browser and run any Cypher query. For an introduction to Cypher, see Cypher Basics on the Neo4j site.

When finished, run make clean_up to stop Neo4j, remove the container, and clean up cache files.

Sources

A complete Titanic dataset is available from https://data.world/nrippner/titanic-disaster-dataset.

Authors

Gregory Lindsey

License

This project is licensed under the MIT License. See the LICENSE file for details.