
Using R and statistics in RStudio to analyze different variables for review. Use the production data for insights that may help the manufacturing team. Perform multiple linear regression analysis to identify which variables in the dataset predict the mpg.


emaynard10/MechaCar_Statistical_Analysis


MechaCar_Statistical_Analysis

Perform multiple linear regression analysis to identify which variables in the dataset predict the mpg of MechaCar prototypes. Collect summary statistics on the pounds per square inch (PSI) of the suspension coils from the manufacturing lots. Run t-tests to determine if the manufacturing lots are statistically different from the population mean. Design a statistical study to compare the performance of MechaCar vehicles against vehicles from other manufacturers.

Tools: R, RStudio, and statistics

Linear Regression to Predict MPG

[Screenshot: summary() output of the multiple linear regression model]

  • Which variables/coefficients provided a non-random amount of variance to the mpg values in the dataset? Each Pr(>|t|) value is the p-value for the null hypothesis that the corresponding coefficient is zero, that is, that the variable contributes only a random amount of variance to the linear model. In the Pr(>|t|) column, the coefficients marked as significant with asterisks are ground clearance, vehicle length, and the intercept. These are statistically unlikely to contribute random amounts of variance to the model, so they have a significant effect on mpg. A statistically significant intercept suggests there may be factors not included in the dataset that also affect mpg.

  • Is the slope of the linear model considered to be zero? Why or why not? No. The p-value of the overall model is 5.35e-11, far below the 0.05 significance level, so the null hypothesis that the slope is zero can be rejected. The relationship between the predictors and mpg is unlikely to be the result of random chance.

  • Does this linear model predict mpg of MechaCar prototypes effectively? Why or why not? The r-squared value measures how much of the variation in the dependent variable, mpg, the model explains. Here it is 0.7149, or about 71%. So the answer is yes and no: the linear model predicts mpg fairly effectively, but about 29% of the variation remains unexplained, so there are likely other factors to be considered.
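The three answers above read values off a `summary()` of an `lm()` fit. The sketch below shows where each value lives, under loud assumptions: the data is synthetic (generated in the block, not the MechaCar dataset), and the column names `vehicle_length`, `vehicle_weight`, and `ground_clearance` are hypothetical stand-ins. The numbers it prints will not match the results reported above.

```r
# Synthetic stand-in for the MechaCar mpg data; column names are hypothetical.
set.seed(42)
n <- 50
mecha <- data.frame(
  vehicle_length   = rnorm(n, mean = 15, sd = 2),
  vehicle_weight   = rnorm(n, mean = 6000, sd = 1500),
  ground_clearance = rnorm(n, mean = 12, sd = 2)
)
# Simulated response: mpg depends on length and clearance, not on weight.
mecha$mpg <- -100 + 6 * mecha$vehicle_length +
  3.5 * mecha$ground_clearance + rnorm(n, sd = 8)

# Fit the multiple linear regression and summarize it.
fit <- lm(mpg ~ vehicle_length + vehicle_weight + ground_clearance,
          data = mecha)
fit_summary <- summary(fit)

# Per-coefficient p-values: the Pr(>|t|) column of the coefficient table.
coef_p <- fit_summary$coefficients[, "Pr(>|t|)"]

# Overall model p-value: summary() prints it but stores only the F-statistic,
# so it is recomputed here from the F distribution.
f <- fit_summary$fstatistic
model_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Share of the variance in mpg explained by the model.
r_squared <- fit_summary$r.squared

print(round(coef_p, 4))
print(model_p)
print(r_squared)
```

Coefficients with `coef_p` below 0.05 are the ones `summary()` flags with asterisks, `model_p` answers the slope question, and `r_squared` answers the predictive-power question.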

Summary Statistics on Suspension Coils

The design specifications for the MechaCar suspension coils dictate that the variance of the suspension coils must not exceed 100 pounds per square inch. Does the current manufacturing data meet this design specification for all manufacturing lots in total and each lot individually? Why or why not?

total_summary

The variance across all lots combined is 62.3, which falls under the 100 PSI limit, so the manufacturing data meets the specification in total. The statistics are more interesting when broken down by the three manufacturing lots: the first two lots are well below the limit, and it is Lot 3 that pushes the overall variance as high as it is. The Lot 3 variance is 170, far above the acceptable limit of 100. So Lots 1 and 2 meet the design specification individually, but Lot 3 does not.

lot_summary
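A minimal sketch of how the two summary tables could be produced in base R. The data here is synthetic, and the column names `Manufacturing_Lot` and `PSI` are assumptions rather than names confirmed by this README; the original analysis may well have used a different approach, such as dplyr.

```r
# Synthetic stand-in for the suspension coil data; column names are assumed.
set.seed(1)
coils <- data.frame(
  Manufacturing_Lot = rep(c("Lot1", "Lot2", "Lot3"), each = 50),
  PSI = c(rnorm(50, 1500, 1), rnorm(50, 1500, 3), rnorm(50, 1500, 13))
)

# Helper that computes the four summary statistics for a vector of PSI values.
summarize_psi <- function(psi) {
  data.frame(Mean = mean(psi), Median = median(psi),
             Variance = var(psi), SD = sd(psi))
}

# Summary across all lots combined.
total_summary <- summarize_psi(coils$PSI)

# Summary per manufacturing lot: split PSI by lot, summarize, stack the rows.
lot_summary <- do.call(
  rbind,
  lapply(split(coils$PSI, coils$Manufacturing_Lot), summarize_psi)
)

# Flag which lots satisfy the design limit of 100 on the variance.
lot_summary$Within_Spec <- lot_summary$Variance <= 100

print(total_summary)
print(lot_summary)
```

Checking the variance per lot as well as in total matters because a single high-variance lot can be masked when it is pooled with tighter lots.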

T-Tests on Suspension Coils

The t-tests compare the mean PSI of the lots against a population mean of 1500 PSI. All lots together give a p-value of 0.06028, slightly above the 5% significance level, so we cannot reject the null hypothesis. Looking at the individual lots, the Lot 1 p-value is 1, far above the significance level, and the Lot 2 p-value of 0.6072 is also too high to reject the null hypothesis. Lot 3, with a p-value of 0.04168, is the only lot below the significance level, so there we can reject the null hypothesis. In short, no test shows a mean that is statistically different from the population mean except Lot 3.

t test_alllots

t testLot1

t testLot2

t testLot3
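The four tests above are one-sample t-tests against a fixed mean, which `t.test()` supports through its `mu` argument. The sketch below runs them on synthetic data with assumed column names (`Manufacturing_Lot`, `PSI`), so its p-values will differ from the ones reported in this section.

```r
# Synthetic stand-in for the suspension coil data; column names are assumed.
set.seed(7)
coils <- data.frame(
  Manufacturing_Lot = rep(c("Lot1", "Lot2", "Lot3"), each = 50),
  PSI = c(rnorm(50, 1500, 1), rnorm(50, 1500, 3), rnorm(50, 1497, 13))
)

# One-sample t-test of all lots combined against the population mean.
all_lots <- t.test(coils$PSI, mu = 1500)

# The same test run separately for each manufacturing lot.
lot_tests <- lapply(split(coils$PSI, coils$Manufacturing_Lot),
                    t.test, mu = 1500)

# Collect the p-values and compare each to the 5% significance level.
p_values <- c(All = all_lots$p.value,
              sapply(lot_tests, function(tt) tt$p.value))
results <- data.frame(p_value = p_values, reject_null = p_values < 0.05)

print(results)
```

A `reject_null` value of `TRUE` means the sample mean differs from 1500 PSI at the 5% level; `FALSE` means the data does not provide enough evidence of a difference.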

Study Design: MechaCar vs Competition

* What metric or metrics are you going to test?

In order to compare MechaCar to one or more competitors, we will compare cost, fuel efficiency, and safety ratings.

* What is the null hypothesis or alternative hypothesis?

H0: The means of all groups are equal, or µ1 = µ2 = … = µn. Ha: At least one of the group means is different from the others.

* What statistical test would you use to test the hypothesis? And why?

The statistical test we will use is the analysis of variance, or ANOVA. This test answers the question: is there a statistical difference between the distribution means of multiple samples? It was chosen because the grouping variable is categorical (the different competitors) and each metric is a numeric value whose mean we want to compare across those groups. While a t-test can also compare means, ANOVA is designed to compare across more than two groups at once. In R we can use the aov() function, which takes a formula and a dataset. If needed, we can first filter the dataset down to the columns that match our metrics, then compare each metric across the competitor categories and take a summary() of the aov() result.

* What data is needed to run the statistical test?

The data needed for the test are the cost, mpg, and safety ratings for MechaCar vehicles and for a selection of competitors' vehicles.
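As a rough illustration of the study design, the sketch below runs a one-way ANOVA on a single metric (mpg) with `aov()` and `summary()`. The data is synthetic and the names `vehicles`, `manufacturer`, `CompetitorA`, and `CompetitorB` are hypothetical. The same call would be repeated for cost and safety rating.

```r
# Synthetic study data; manufacturer labels and values are hypothetical.
set.seed(3)
vehicles <- data.frame(
  manufacturer = factor(rep(c("MechaCar", "CompetitorA", "CompetitorB"),
                            each = 20)),
  mpg = c(rnorm(20, 30, 3), rnorm(20, 27, 3), rnorm(20, 31, 3))
)

# One-way ANOVA: does mean mpg differ across manufacturers?
mpg_aov <- aov(mpg ~ manufacturer, data = vehicles)

# summary() returns a list; its first element is the ANOVA table.
aov_table <- summary(mpg_aov)[[1]]

# p-value for the manufacturer term (first row of the table).
p_value <- aov_table[["Pr(>F)"]][1]

print(aov_table)
print(p_value)
```

A p-value below 0.05 would reject H0 and indicate that at least one manufacturer's mean differs. ANOVA alone does not say which one, so a follow-up comparison such as `TukeyHSD(mpg_aov)` would be needed to identify the differing pairs.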

Contact

Contact Emily
