Valentin-Golyonko/Coursera_Capstone

Coursera_Capstone

Coursera 'Applied Data Science Capstone' project

1. Introduction and Business Problem:

The task:

Is it worth it for city guests to spend money on rental housing in the city center, or can they save money by choosing a different area of the city?

To do this, I need to:

  • find the right area to live in, with the most public places where you can spend time;
  • cluster the districts of the city of Minsk by the uniqueness of their places, using the Foursquare API.

This data should help city guests compare areas and find the right place to stay.

2. Data section:

  • Foursquare venue data: 4049 venue points;
  • the search distance from the center of each neighborhood is 858 meters; this is half of the average distance between neighborhood centers, calculated from their coordinates;
  • number of neighborhoods: 122, one per post office of the city;
  • coordinates of each of the 122 post offices, collected using the Google GEO API and the geopy library.
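
The radius rule above (half of the average distance between neighborhood centers) can be sketched in a few lines. This is only an illustration, not the notebook's code: it assumes that "average distance" means the mean distance from each center to its nearest neighbor, and the coordinates below are made up rather than the real post office data.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def search_radius_m(centers):
    """Half of the mean nearest-neighbor distance between the centers.

    Assumption: the source does not say which pairs were averaged;
    nearest neighbors are used here as one plausible reading.
    """
    nearest = [
        min(haversine_m(c, other) for j, other in enumerate(centers) if j != i)
        for i, c in enumerate(centers)
    ]
    return sum(nearest) / len(nearest) / 2


# Hypothetical coordinates, NOT the real post office data.
sample_centers = [(53.90, 27.56), (53.91, 27.56), (53.90, 27.58), (53.92, 27.55)]
radius = search_radius_m(sample_centers)
print(f"search radius: {radius:.0f} m")
```

With the real 122 coordinates, the resulting value would be passed as the search radius of each Foursquare venue request.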

3.1 Work with data:

  • 122 post offices were parsed from an HTML page using BeautifulSoup.
  • See the attached find_post_offices notebook;
  • zip_codes_minsk_list
  • Using the Foursquare API, 4049 venue points were collected; after filtering out unnecessary data, 2187 points remained, which give 310 unique points (places).
  • See the attached minsk_venues notebook;
  • Duplicate data and noise categories like [Trail, Bus Stop, Bus Station, Moving Target, Bus Line, Platform] were excluded;
  • minsk_venues
  • I used k-means clustering with 5 clusters;
  • minsk_venues
  • Each cluster contains the top 10 most common locations in the area;
  • The number of elements in each cluster varied slightly from run to run; these are the observed ranges:
    • cluster 1: 80 - 110;
    • cluster 2: 7 - 20;
    • cluster 3: 2 - 8;
    • cluster 4: up to 2;
    • cluster 5: up to 2.
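
The filtering and "top 10 most common" steps above can be sketched as follows. This is a minimal, dependency-free illustration, not the code from the minsk_venues notebook: the row layout `(neighborhood, venue name, category)`, the function names, and the sample rows are all assumptions; only the noise category list comes from the text.

```python
from collections import Counter, defaultdict

# Noise categories listed in the text above.
NOISE_CATEGORIES = {
    "Trail", "Bus Stop", "Bus Station", "Moving Target", "Bus Line", "Platform",
}


def clean_venues(venues):
    """Drop noise categories and exact duplicate rows, keeping the original order."""
    seen = set()
    cleaned = []
    for row in venues:
        if row[2] in NOISE_CATEGORIES or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


def top_categories(venues, n=10):
    """Return the n most common venue categories for every neighborhood."""
    counts = defaultdict(Counter)
    for neighborhood, _name, category in venues:
        counts[neighborhood][category] += 1
    return {hood: [cat for cat, _ in c.most_common(n)] for hood, c in counts.items()}


# Hypothetical rows: (neighborhood label, venue name, venue category).
sample_venues = [
    ("220004", "Cafe A", "Coffee Shop"),
    ("220004", "Cafe A", "Coffee Shop"),   # exact duplicate
    ("220004", "Cafe B", "Coffee Shop"),
    ("220004", "Stop 1", "Bus Stop"),      # noise category
    ("220004", "Gym X", "Gym"),
    ("220030", "Shop Y", "Department Store"),
    ("220030", "Route 1", "Bus Line"),     # noise category
]

cleaned = clean_venues(sample_venues)
top = top_categories(cleaned)
print(len(cleaned), top)
```

Per-neighborhood category counts like these are also the kind of feature vector that a k-means step could take as input.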

4.1 Intermediate conclusions:

  • The most common places are:
    • Coffee Shop, Gym, Department Store, Food & Drink Shop;
  • Unexpected result: before filtering, 'Bus Stop' was in the top 3 of every cluster! It is nice to confirm that public transport is well developed in the city!
  • After many iterations, it turned out that the fluctuations in cluster sizes are due to the small amount of data.
  • A free Foursquare API account (50-request limit) does not allow collecting more data about venues: likes, ratings, price tier. And this is critical for this research!
  • It can be concluded that k-means clustering alone is not enough to solve the business problem.
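
To see how k-means cluster sizes can depend on the starting point, here is a small dependency-free sketch of Lloyd's algorithm run with several random seeds. It is an illustration only and not the notebook's code: the `kmeans` helper is hypothetical, and the points are random 2-D values, not the Minsk venue data.

```python
import random


def kmeans(points, k, seed, iterations=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


# Hypothetical data: 120 random 2-D points, NOT the real venue features.
data_rng = random.Random(0)
points = [(data_rng.random(), data_rng.random()) for _ in range(120)]

for seed in range(3):
    labels = kmeans(points, k=5, seed=seed)
    sizes = sorted((labels.count(c) for c in range(5)), reverse=True)
    print(f"seed {seed}: cluster sizes {sizes}")
```

Comparing the printed sizes across seeds gives a quick check of how stable a clustering is for a given amount of data.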

3.2 Work with additional data:

  • I found a way to increase the amount of data I need.
  • See the Minsk_flats_data notebook.
  • The additional data is the price of apartments in different parts of the city.
  • I parsed 21 pages of a local site and found around 900 rental offers (everything was done purely for scientific purposes).
  • The data included the price of each apartment, the number of rooms, and the address.
  • Using the Google GEO API, I found their coordinates.
  • I also created GeoJSON for each district of the city with its name, and combined these names with the apartment data.
  • minsk_flats_dataframe
  • I normalized the price of each apartment to the equivalent rental price of a one-room apartment and averaged the results.
  • This is how it looks on the map:
  • minsk_avg_flat_price
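
The price averaging above can be sketched like this. It is a hedged illustration, not the Minsk_flats_data notebook's code: it assumes the one-room equivalent is the monthly price divided by the number of rooms (the text does not state the exact rule), and the district names and prices are made up.

```python
from collections import defaultdict
from statistics import mean


def avg_one_room_price(offers):
    """Average one-room-equivalent monthly price per district.

    Assumption: one-room equivalent = price / number of rooms.
    """
    per_district = defaultdict(list)
    for district, price, rooms in offers:
        per_district[district].append(price / rooms)
    return {d: round(mean(prices), 2) for d, prices in per_district.items()}


# Hypothetical offers: (district name, monthly price, number of rooms).
sample_offers = [
    ("District A", 300, 1),
    ("District A", 500, 2),
    ("District B", 450, 3),
]

avg_prices = avg_one_room_price(sample_offers)
print(avg_prices)
```

A per-district dictionary like this is the sort of value that can then be joined to the district GeoJSON by name to color a map.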

4.2 Final results:

  • Final result notebook
  • final_result
  • Round colored dots - clusters, each with the top 10 common spots within a radius of about 860 meters of the dot.
  • Blue lines - two metro (subway) lines.

The cluster with the largest number of venue points is distributed across the city center and along the metro lines. So most activities, shops, cafes and restaurants are in these locations.

As you can see, city guests can find places to stay not far from the city center at good apartment prices.

The price shown on the graphs (and in some tables) is the average monthly price for a one-room apartment!
