Employee turnover is a major burden for companies. Its direct costs include hiring, training, lost productivity, and the opportunity cost of accounts left unmanaged. Its indirect costs include the loss of institutional knowledge and the impact on employee morale.
This is the second in a series of projects on workforce analytics. I first explored employee turnover in Python using a decision tree classifier. This time I use a different dataset, model, and language to dive deeper into how employee turnover can be analyzed and predicted.
- Calculate the turnover rate and explore it across different dimensions.
- Identify talent segments and combine relevant data from multiple HR data sources to derive better insights.
- Use feature engineering to create new variables and exemplify the concept of information value (IV).
- Build a logistic regression model to predict turnover while accounting for multicollinearity among variables.
- Evaluate the accuracy of the model and categorize employees into specific risk buckets.
- Formulate an intervention strategy and estimate its return on investment (ROI).
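
As a rough illustration of several of the objectives above (turnover rate, information value, a logistic regression, risk buckets, and ROI), here is a minimal base R sketch. The toy `employees` data frame, its column names, the risk cut-offs (0.3 and 0.6), and the `roi()` helper are all hypothetical and are not taken from the project's data. The notebook itself may compute these quantities differently, for example with `dplyr` or the `Information` package, and the sketch does not address multicollinearity.

```r
# Toy data (hypothetical): one row per employee; turnover = 1 means the employee left.
employees <- data.frame(
  level    = c("Analyst", "Analyst", "Analyst", "Analyst",
               "Manager", "Manager", "Manager", "Manager"),
  tenure   = c(1, 2, 1, 3, 6, 8, 2, 7),   # years of service
  turnover = c(1, 1, 0, 1, 0, 0, 1, 0)
)

# 1. Turnover rate: share of employees who left, overall and by job level.
overall_rate  <- mean(employees$turnover)
rate_by_level <- aggregate(turnover ~ level, data = employees, FUN = mean)

# 2. Information value (IV) of a categorical predictor against a 0/1 outcome.
#    WOE per bin = log(share of leavers in bin / share of stayers in bin)
#    IV          = sum over bins of (share of leavers - share of stayers) * WOE
#    A bin with no leavers or no stayers would give an infinite WOE.
information_value <- function(x, y) {
  counts    <- table(x, y)                          # rows: bins; columns "0" and "1"
  pct_stay  <- counts[, "0"] / sum(counts[, "0"])
  pct_leave <- counts[, "1"] / sum(counts[, "1"])
  woe       <- log(pct_leave / pct_stay)
  sum((pct_leave - pct_stay) * woe)
}
iv_level <- information_value(employees$level, employees$turnover)

# 3. Logistic regression of turnover on tenure, then predicted probability of leaving.
model <- glm(turnover ~ tenure, data = employees, family = binomial)
employees$risk <- predict(model, type = "response")

# 4. Risk buckets from the predicted probabilities (cut-offs are assumed, not the author's).
employees$risk_bucket <- cut(
  employees$risk,
  breaks = c(0, 0.3, 0.6, 1),
  labels = c("low", "medium", "high"),
  include.lowest = TRUE
)

# 5. ROI of an intervention: net benefit relative to its cost (hypothetical helper).
roi <- function(benefit, cost) (benefit - cost) / cost
```

On real data the same steps would run on the full employee table, with continuous predictors such as tenure binned before computing IV.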
For notebook-based projects, please refer directly to the Google Colab notebook I uploaded to this repository.
- R libraries:
  - readr
  - dplyr
  - ggplot2
  - lubridate
  - Information
  - caret
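
One possible way to check that these libraries are available before running the notebook is sketched below. The variable names `required_pkgs` and `missing_pkgs` are illustrative, and the install step is left commented out because it assumes the packages should come from CRAN.

```r
# Packages listed above.
required_pkgs <- c("readr", "dplyr", "ggplot2", "lubridate", "Information", "caret")

# requireNamespace() returns FALSE for packages that are not installed.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```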