Skip to content

weekysui/School-district-and-funding-report

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 

Repository files navigation

PySchool Analysis:

  • The school size and spending per student seems correlate. The larger the school size the lower passing rates and mean scores.
  • Charter school has better performing than the district school. The top 5 perfroming schools are charter schools, and the bottom 5 are the district schools.
  • Accoring to the math scores by grade and the reading scores by grade, you can see their mean scores stay consistent across grades.

District summary

# import files, create dataframe
import pandas as pd
csv_read = pd.read_csv("raw_data/schools_complete.csv")
school_df=pd.DataFrame(csv_read)
csvRead = pd.read_csv("raw_data/students_complete.csv")
student_df=pd.DataFrame(csvRead)
#rename school_df name to school:
school_df = school_df.rename(columns={"name":"school"})
school_df.head
<bound method NDFrame.head of     School ID                 school      type  size   budget
0           0      Huang High School  District  2917  1910635
1           1   Figueroa High School  District  2949  1884411
2           2    Shelton High School   Charter  1761  1056600
3           3  Hernandez High School  District  4635  3022020
4           4    Griffin High School   Charter  1468   917500
5           5     Wilson High School   Charter  2283  1319574
6           6    Cabrera High School   Charter  1858  1081356
7           7     Bailey High School  District  4976  3124928
8           8     Holden High School   Charter   427   248087
9           9       Pena High School   Charter   962   585858
10         10     Wright High School   Charter  1800  1049400
11         11  Rodriguez High School  District  3999  2547363
12         12    Johnson High School  District  4761  3094650
13         13       Ford High School  District  2739  1763916
14         14     Thomas High School   Charter  1635  1043130>
# District Summary
total_school = school_df["school"].count()
#total students: merge 2 dataframe:
merge_df = pd.merge(school_df,student_df,on = "school")
total_students = merge_df["name"].count()
#total budget:
total_budget = school_df["budget"].sum()
#average math score, average reading score:
avg_math=merge_df["math_score"].mean()
avg_reading = merge_df["reading_score"].mean()
#%pass math, %pass reading
student_math= merge_df.loc[merge_df["math_score"]>=70]
pass_math = len(student_math)/total_students *100
student_reading = merge_df.loc[merge_df["reading_score"]>=70]
pass_reading = len(student_reading)/total_students *100
#overall passing rate
student_overall = merge_df.loc[(merge_df["math_score"]>=70)&(merge_df["reading_score"]>=70)]
overall= len(student_overall)/total_students *100

district_df = pd.DataFrame({"Total Schools":[total_school],
                           "Total Students":[total_students],
                           "Total Budget":[total_budget],
                           "Average Math Score":[avg_math],
                           "Average Reading Score":[avg_reading],
                           "% Passing Math":[pass_math],
                           "% Passing Reading":[pass_reading],
                           "% Overall Passing Rate":[overall]})
district_df = district_df[["Total Schools",
                           "Total Students",
                           "Total Budget",
                           "Average Math Score",
                           "Average Reading Score",
                           "% Passing Math",
                           "% Passing Reading",
                           "% Overall Passing Rate"]]
#format
district_df["Total Students"]=district_df["Total Students"].map("{:,}".format)
district_df["Total Budget"]=district_df["Total Budget"].map("${:,}".format)
district_df["Average Math Score"]=district_df["Average Math Score"].map("{:.2f}".format)
district_df["Average Reading Score"]=district_df["Average Reading Score"].map("{:.2f}".format)
district_df["% Passing Math"]=district_df["% Passing Math"].map("{:.2f}%".format)
district_df["% Passing Reading"]=district_df["% Passing Reading"].map("{:.2f}%".format)
district_df["% Overall Passing Rate"]=district_df["% Overall Passing Rate"].map("{:.2f}%".format)

district_df
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Total Schools Total Students Total Budget Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
0 15 39,170 $24,649,428 78.99 81.88 74.98% 85.81% 65.17%

School Summary

# set school as index in school_df, get the total budget, and type of schools
school_index = school_df.set_index("school")
sch_budget = school_index["budget"]
sch_type = school_index["type"]
# budget per student
student_budget = sch_budget / school_index["size"]

# set school as index, and then group the school for merge_df
index = merge_df.set_index('school')
grouped = index.groupby(['school'])
# avg math score, avg reading score by school
mean_math = grouped["math_score"].mean()
mean_reading = grouped["reading_score"].mean()
total_stu = grouped["name"].count()
#pass math
grouped_math = student_math.groupby("school")
pass_math_percent = grouped_math["name"].count() / total_stu *100
#pass reading
grouped_reading = student_reading.groupby("school")
pass_reading_percent = grouped_reading["name"].count() / total_stu *100

#overall passing
grouped_overall = student_overall.groupby("school")
overall_passing = grouped_overall["name"].count() / total_stu *100

sch_summary=pd.DataFrame({"School Type":sch_type,
                         "Total Students":total_stu,
                         "Total School Budget":sch_budget,
                         "Per Student Budget":student_budget,
                         "Average Math Score":mean_math,
                         "Average Reading Score":mean_reading,
                         "% Passing Math":pass_math_percent,
                         "% Passing Reading":pass_reading_percent,
                         "% Overall Passing Rate":overall_passing})
sch_summary = sch_summary[["School Type",
                           "Total Students",
                           "Total School Budget",
                           "Per Student Budget",
                           "Average Math Score",
                           "Average Reading Score",
                           "% Passing Math",
                           "% Passing Reading",
                           "% Overall Passing Rate"]]

sch_summary.head()


<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
School Type Total Students Total School Budget Per Student Budget Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
Bailey High School District 4976 3124928 628.0 77.048432 81.033963 66.680064 81.933280 54.642283
Cabrera High School Charter 1858 1081356 582.0 83.061895 83.975780 94.133477 97.039828 91.334769
Figueroa High School District 2949 1884411 639.0 76.711767 81.158020 65.988471 80.739234 53.204476
Ford High School District 2739 1763916 644.0 77.102592 80.746258 68.309602 79.299014 54.289887
Griffin High School Charter 1468 917500 625.0 83.351499 83.816757 93.392371 97.138965 90.599455

Top Performing Schools (By Passing Rate)

#Top Performing Schools (By Passing Rate)
top_performing = sch_summary.sort_values(["% Overall Passing Rate"],ascending = False)
top_performing.head(5)
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
School Type Total Students Total School Budget Per Student Budget Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
Cabrera High School Charter 1858 1081356 582.0 83.061895 83.975780 94.133477 97.039828 91.334769
Thomas High School Charter 1635 1043130 638.0 83.418349 83.848930 93.272171 97.308869 90.948012
Griffin High School Charter 1468 917500 625.0 83.351499 83.816757 93.392371 97.138965 90.599455
Wilson High School Charter 2283 1319574 578.0 83.274201 83.989488 93.867718 96.539641 90.582567
Pena High School Charter 962 585858 609.0 83.839917 84.044699 94.594595 95.945946 90.540541

Bottom Performing Schools (By Passing Rate)

#Bottom Performing Schools (By passing Rate)
bottom_performing = sch_summary.sort_values(["% Overall Passing Rate"])
bottom_performing.head(5)
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
School Type Total Students Total School Budget Per Student Budget Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
Rodriguez High School District 3999 2547363 637.0 76.842711 80.744686 66.366592 80.220055 52.988247
Figueroa High School District 2949 1884411 639.0 76.711767 81.158020 65.988471 80.739234 53.204476
Huang High School District 2917 1910635 655.0 76.629414 81.182722 65.683922 81.316421 53.513884
Hernandez High School District 4635 3022020 652.0 77.289752 80.934412 66.752967 80.862999 53.527508
Johnson High School District 4761 3094650 650.0 77.072464 80.966394 66.057551 81.222432 53.539172

Math scores by grade

#Math Scores by Grade
ninth = student_df.loc[student_df["grade"]=='9th'].groupby("school")
ninth_math = ninth["math_score"].mean()
tenth = student_df.loc[student_df["grade"]=='10th'].groupby("school")
tenth_math = tenth["math_score"].mean()
eleventh=student_df.loc[student_df["grade"]=='11th'].groupby("school")
eleventh_math = eleventh["math_score"].mean()
twelfth = student_df.loc[student_df["grade"]=='12th'].groupby("school")
twelfth_math = twelfth["math_score"].mean()

math_grade = pd.DataFrame({"9th":ninth_math,
                          "10th":tenth_math,
                          "11th":eleventh_math,
                          "12th":twelfth_math})
math_grade=math_grade[["9th","10th","11th","12th"]]

math_grade.head()
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
9th 10th 11th 12th
school
Bailey High School 77.083676 76.996772 77.515588 76.492218
Cabrera High School 83.094697 83.154506 82.765560 83.277487
Figueroa High School 76.403037 76.539974 76.884344 77.151369
Ford High School 77.361345 77.672316 76.918058 76.179963
Griffin High School 82.044010 84.229064 83.842105 83.356164

Reading Scores by Grade

#Reading Score by Grade
nine = student_df.loc[student_df["grade"]=='9th'].groupby("school")
nine_reading = nine["reading_score"].mean()
ten  = student_df.loc[student_df["grade"]=='10th'].groupby("school")
ten_reading = ten["reading_score"].mean()
eleven=student_df.loc[student_df["grade"]=='11th'].groupby("school")
eleven_reading = eleven["reading_score"].mean()
twelve = student_df.loc[student_df["grade"]=='12th'].groupby("school")
twelve_reading = twelve["reading_score"].mean()

reading_grade = pd.DataFrame({"9th":nine_reading,
                          "10th":ten_reading,
                          "11th":eleven_reading,
                          "12th":twelve_reading})
reading_grade=reading_grade[["9th","10th","11th","12th"]]

reading_grade.head()
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
9th 10th 11th 12th
school
Bailey High School 81.303155 80.907183 80.945643 80.912451
Cabrera High School 83.676136 84.253219 83.788382 84.287958
Figueroa High School 81.198598 81.408912 80.640339 81.384863
Ford High School 80.632653 81.262712 80.403642 80.662338
Griffin High School 83.369193 83.706897 84.288089 84.013699

Scores by School Spending

#Scores by School Spending
score_summary = pd.DataFrame(sch_summary)
bins = [0,585,615,645,675]
group_names = ["< $585","$585-615","$615-645","$645-675"]
score_summary["Per Student Budget"]=pd.cut(score_summary["Per Student Budget"],bins,labels = group_names)

#rename per student budget to "Spending Ranges (Per Student)"
score = score_summary.rename(columns={"Per Student Budget":"Spending Ranges (Per Student)"})

#group the "Spending Ranges (Per Student)"
score = score.groupby(["Spending Ranges (Per Student)"])
score_spending = pd.DataFrame(score["Average Math Score"].mean())
score_spending["Average Reading Score"]=score["Average Reading Score"].mean()
score_spending["% Passing Math"]=score["% Passing Math"].mean()
score_spending["% Passing Reading"]=score["% Passing Reading"].mean()
score_spending["% Overall Passing Rate"]=score["% Overall Passing Rate"].mean()

score_spending
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
Spending Ranges (Per Student)
< $585 83.455399 83.933814 93.460096 96.610877 90.369459
$585-615 83.599686 83.885211 94.230858 95.900287 90.216324
$615-645 79.079225 81.891436 75.668212 86.106569 66.112060
$645-675 76.997210 81.027843 66.164813 81.133951 53.526855

Scores by School Size

sch_size=pd.DataFrame(sch_summary)
#rename total students to "School Size"
sch_size = sch_size.rename(columns={"Total Students":"School Size"})
#bin number to groups_size
bins_size = [0,1000,2000,5000]
groups_size = ["Small (<1000)","Medium (1000-2000)","Large (2000-5000)"]
sch_size["School Size"]=pd.cut(sch_size["School Size"],bins_size,labels = groups_size)
#groupby school size
size_group = sch_size.groupby(["School Size"])
size_score = pd.DataFrame(size_group["Average Math Score"].mean())
size_score["Average Reading Score"]=size_group["Average Reading Score"].mean()
size_score["% Passing Math"]=size_group["% Passing Math"].mean()
size_score["% Passing Reading"]=size_group["% Passing Reading"].mean()
size_score["% Overall Passing Rate"]=size_group["% Overall Passing Rate"].mean()

size_score
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
School Size
Small (<1000) 83.821598 83.929843 93.550225 96.099437 89.883853
Medium (1000-2000) 83.374684 83.864438 93.599695 96.790680 90.621535
Large (2000-5000) 77.746417 81.344493 69.963361 82.766634 58.286003

Scores by School Type

#groupby school type
group_type = sch_summary.groupby(["School Type"])
type_score = pd.DataFrame(group_type["Average Math Score"].mean())
type_score["Average Reading Score"]=group_type["Average Reading Score"].mean()
type_score["% Passing Math"]=group_type["% Passing Math"].mean()
type_score["% Passing Reading"]=group_type["% Passing Reading"].mean()
type_score["% Overall Passing Rate"]=group_type["% Overall Passing Rate"].mean()
type_score
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
School Type
Charter 83.473852 83.896421 93.620830 96.586489 90.432244
District 76.956733 80.966636 66.548453 80.799062 53.672208

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages