Exploratory Data Analysis on traffic dataset to find out different trends in order to reduce traffic violations.

shubamsumbria/traffic-violation-eda

Traffic and Drugs Related Violations Exploratory Data Analysis

This is a Python-based exploratory data analysis of a traffic dataset, aimed at finding trends that could help reduce traffic violations.

Language and Libraries

Seaborn, numpy, cplusplus

About Dataset:

This dataset contains more than 65,000 traffic-related violation records.

Attribute Information:

  1. stop_date - Date of violation
  2. stop_time - Time of violation
  3. driver_gender - Gender of violators (Male-M, Female-F)
  4. driver_age - Age of violators
  5. driver_race - Race of violators
  • White
  • Black
  • Hispanic
  • Asian
  • Other
  6. violation - Category of violation
  • Speeding
  • Moving Violation (reckless driving, hit and run, assaulting another driver or a pedestrian, improper turns and lane changes, etc.)
  • Equipment (window tint violations, headlights/taillights out, loud exhaust, cracked windshield, etc.)
  • Registration/Plates
  • Seat Belt
  • Other (call for service, violation of city/town ordinance, suspicious person, motorist assist/courtesy, etc.)
  7. search_conducted - Whether a search was conducted (True, False)
  8. stop_outcome - Result of the stop
  9. is_arrested - Whether the person was arrested (True, False)
  10. stop_duration - Approximate time the violator was detained (in minutes)
  11. drugs_related_stop - Whether the person was involved in a drug-related offence (True, False)

Data Cleaning:

Checking Missing Values in Dataset:

  • As the missing-values plot shows, the country_name and search_type columns contain almost only NaN values, so we drop these two columns.
  • All other columns show a similar pattern of missing values, so we drop the rows in which those values are missing.
  • Some missing values still remain in the driver_age column. We fill these with the column median.
  • After cleaning, we check again for any remaining missing values.

  • The dataset is now ready for further analysis.
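
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration using pandas, not the repository's actual notebook code: the helper name `clean_stops` and the toy DataFrame are hypothetical, and only the column names are taken from this README.

```python
import numpy as np
import pandas as pd


def clean_stops(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that mirrors the cleaning steps listed above."""
    # 1. Drop the two columns that are almost entirely NaN.
    out = df.drop(columns=["country_name", "search_type"], errors="ignore")
    # 2. Drop rows with missing values in any column other than driver_age.
    other_cols = [c for c in out.columns if c != "driver_age"]
    out = out.dropna(subset=other_cols)
    # 3. Fill the remaining gaps in driver_age with the median age.
    median_age = out["driver_age"].median()
    out = out.assign(driver_age=out["driver_age"].fillna(median_age))
    return out.reset_index(drop=True)


# Toy data, made up purely for illustration.
toy = pd.DataFrame({
    "country_name": [np.nan, np.nan, np.nan, np.nan],
    "search_type": [np.nan, np.nan, "Frisk", np.nan],
    "driver_gender": ["M", "F", None, "M"],
    "driver_age": [25.0, np.nan, 40.0, 35.0],
    "violation": ["Speeding", "Equipment", None, "Speeding"],
})

cleaned = clean_stops(toy)
# 4. Re-check the remaining missing values per column.
print(cleaned.isna().sum())
```

Filling driver_age after the row drop means the median is computed only over the rows that are kept; computing it beforehand would be an equally plausible reading of the steps.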

Data Analysis:

  1. Age Distribution

  • We can observe that both male and female drivers aged between 20 and 40 commit the most violations, while those above 16 commit the fewest. The plot also shows that the relationship between violations and age group follows the same trend for both genders. This suggests that violations are independent of a person's gender, assuming all other parameters are held constant.
  2. Distribution in Violation Type

  • In this dataset, speeding is by far the most common reason for a traffic violation, accounting for 60.5% of all violations.
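
A percentage breakdown like this one can be computed with `value_counts(normalize=True)`. The sketch below is an assumption about how such a figure might be derived, not the repository's code; the function name `violation_share` and the toy records are hypothetical.

```python
import pandas as pd


def violation_share(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: share of each violation type, in percent."""
    return df["violation"].value_counts(normalize=True).mul(100).round(1)


# Toy data, made up purely for illustration.
toy = pd.DataFrame({
    "violation": ["Speeding", "Speeding", "Speeding", "Equipment", "Seat Belt"],
})

share = violation_share(toy)
print(share)
```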
  3. Hours in Which Speed Violated

  • The plot shows that for the most frequent type of violation, i.e. over-speeding, most violations occur between 8:00 am and 4:00 pm, while fewer occur after midnight and in the late evening.
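
An hourly count of speeding stops might be obtained as sketched below. The sketch assumes that stop_time holds strings in `HH:MM` format, which this README does not state; the function name `speeding_by_hour` and the toy rows are hypothetical.

```python
import pandas as pd


def speeding_by_hour(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: number of speeding stops for each hour 0..23."""
    # Assumption: stop_time is a string such as "08:15".
    hours = pd.to_datetime(df["stop_time"], format="%H:%M").dt.hour
    is_speeding = df["violation"].eq("Speeding")
    # Reindex so that hours without any speeding stop show up as 0.
    return hours[is_speeding].value_counts().reindex(range(24), fill_value=0)


# Toy data, made up purely for illustration.
toy = pd.DataFrame({
    "stop_time": ["08:15", "08:40", "16:05", "23:30"],
    "violation": ["Speeding", "Speeding", "Equipment", "Speeding"],
})

counts = speeding_by_hour(toy)
print(counts[counts > 0])
```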
  4. Hours in Which Vehicle Stopped

  • People violate traffic rules for many reasons, such as hurry or urgency, inadequate driving skills, etc. For a loose analysis that considers only 'hurry' as the reason for violations, we can deduce the following from this plot. This assumption is not entirely unreasonable, because 60.5% of the violations are due to over-speeding (see the second plot). Comparing this plot with the previous bar plot also indicates that most of the violations at almost every hour of the day are due to over-speeding. Hence, statistically, we are covering most of the violators.
  • From 10:00 pm until 2:00 am, a high number of violations occur. A possible explanation is people returning home or travelling to parties, celebrations, etc.
  • On average, the highest number of violations occurs between 8:00 am and 3:00 pm. This may be because these are conventional working hours, during which several situations that lead to over-speeding can arise. There is also a considerable dip in violations within this interval, particularly at 12:00 noon.
  • At every hour of the day, females have far fewer violation cases than males. There are many possible explanations: there may be fewer female drivers than male drivers, the explanations suggested above may not apply equally to both genders, or females may simply be better drivers.
  5. Traffic Violation Distribution Based on Race

  • Breaking the total violations down by the race of the violators shows that White, Black and Hispanic drivers together account for almost 97% of the total. Among these, White drivers account for the largest share, at 74.4% of the total. An obvious possible reason is the population distribution across races: since White people are the largest group, any dataset is likely to contain more violation records from this category. However, there may be several other reasons for this trend as well, and this is only an empirical judgement.
  6. Age Group Involved in Drugs

  • People in their 20s, i.e. in the 20–30 age group, are observed to be involved in drug-related stops far more than any other age group. This may also help explain the high number of violation records for this age group seen in the first plot. It also suggests that drugs are an important factor that should be considered when predicting traffic violations.
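
Grouping drug-related stops into ten-year age bands can be done with `pd.cut`. The following is a sketch under assumptions: the bin edges (10 to 100 in steps of 10), the labels and the function name `drug_stops_by_age_group` are illustrative choices, not values taken from the repository.

```python
import pandas as pd


def drug_stops_by_age_group(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: count of drug-related stops per ten-year age band."""
    edges = list(range(10, 101, 10))           # assumed bin edges
    labels = [f"{lo}-{lo + 10}" for lo in edges[:-1]]
    # right=False makes each band include its lower edge, e.g. [20, 30).
    age_group = pd.cut(df["driver_age"], bins=edges, right=False, labels=labels)
    is_drug_stop = df["drugs_related_stop"].astype(bool)
    return age_group[is_drug_stop].value_counts().sort_index()


# Toy data, made up purely for illustration.
toy = pd.DataFrame({
    "driver_age": [22, 27, 35, 24, 61],
    "drugs_related_stop": [True, True, False, True, False],
})

by_age = drug_stops_by_age_group(toy)
print(by_age)
```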
  7. Stop Duration Based on Race

  • This plot suggests that people of Hispanic and/or Black background are stopped the most for a potential violation, compared with any other race. It is also surprising that, although White people are recorded as violating traffic rules the most, they are stopped comparatively less.
  8. Correlation Heatmap

  • This figure shows how drug-related cases depend on the searches conducted, for each type of violation. For every violation type, most of the searches conducted do not turn out to involve drugs, although a small proportion of them do. It is interesting to note that this relationship is independent of the type of traffic violation committed.
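
One way to tabulate the proportion of searches that turn out to be drug-related, per violation type, is `pd.crosstab` with row-wise normalisation. This is a sketch of that idea rather than the code behind the heatmap; the function name `drug_rate_among_searches`, the column labels `drug_related` / `not_drug_related` and the toy rows are hypothetical.

```python
import pandas as pd


def drug_rate_among_searches(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per violation type, share of searches by drug outcome."""
    searched = df[df["search_conducted"].astype(bool)]
    outcome = searched["drugs_related_stop"].map(
        {True: "drug_related", False: "not_drug_related"}
    )
    # normalize="index" makes each violation row sum to 1.
    return pd.crosstab(searched["violation"], outcome, normalize="index")


# Toy data, made up purely for illustration.
toy = pd.DataFrame({
    "violation": ["Speeding", "Speeding", "Speeding", "Equipment", "Equipment", "Speeding"],
    "search_conducted": [True, True, True, True, True, False],
    "drugs_related_stop": [True, False, False, False, True, False],
})

table = drug_rate_among_searches(toy)
print(table)
```

The resulting table could then be passed to a plotting function such as `seaborn.heatmap` if a visual similar to the figure is wanted.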
  9. Total Search Conduct vs. Drug Related Stop

  • This bar plot follows a trend similar to that of the previous figure. The only difference is that the total searches conducted are broken down by year in one and by violation type in the other.
  10. Arrested vs. Not Arrested (Before and After Search Conduct)
