Predict the current contraceptive method choice of a married Indonesian woman in 1987 based on her demographic and socio-economic characteristics. The choices are either no use, long-term methods, or short-term methods.

Mathurkarishma/indonesian-contraception-in-1987


Contraception in Indonesia

Predicting the contraceptive method choice from 1987 using a decision tree.

Table of Contents

  1. About The Project
  2. Getting Started
  3. Usage
  4. Conclusion
  5. Contact
  6. Acknowledgements

About The Project

We will be looking into the 1987 National Indonesia Contraceptive Prevalence Survey. The survey asked married women who were either not pregnant, or did not know whether they were at the time, to participate. We want to predict the current contraceptive method choice of a woman in this dataset based on her demographic and socio-economic characteristics. The choices are either no use, long-term methods, or short-term methods. We will be using a decision tree model, which predicts outcomes through classification rules. It distills data into knowledge by taking a set of unfamiliar data and extracting rules from it.
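To make "classification rules" concrete, here is a minimal sketch of what a small tree amounts to once it is written out as nested conditions. It is written in Python for illustration, whereas the project script is in R. The feature names (`wife_age`, `wife_education`, `num_children`) and every threshold are hypothetical and are not taken from the fitted model.

```python
# Hypothetical rules for illustration only.
# Feature names and thresholds are assumptions, NOT values from the fitted tree.
def classify(wife_age: int, wife_education: int, num_children: int) -> str:
    """Return one of 'no use', 'long-term', 'short-term'."""
    if num_children == 0:
        return "no use"
    if wife_education >= 4:
        # Higher-education branch, split again on age.
        return "long-term" if wife_age >= 35 else "short-term"
    # Lower-education branch, split again on age.
    return "no use" if wife_age >= 38 else "short-term"


print(classify(wife_age=40, wife_education=4, num_children=3))
```

Each path from the first condition to a `return` corresponds to one terminal node of a tree, so a tree with 8 terminal nodes can be read as 8 such rules.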

Here is a link to the Contraceptive Method Choice dataset information.

Built With

Getting Started

To get a local copy up and running, download decision-tree-model.R and the input file, cmc.csv. Then run the code in an IDE such as RStudio, with the working directory set to the location of the CSV file.

Usage

The code guides you through the following:

  1. Importing the CSV file
  2. Inspecting the formatting of the variables (datatypes, number of rows/columns, measures of central tendency, statistical descriptions, etc.)
  3. Exploring through histograms to find interesting variables
  4. Pre-processing, such as transformations, and installing the decision tree packages (we removed the factored categorical variables)
  5. Setting the seed for reproducibility and splitting the dataset into a training set and a test set
  6. Fitting the decision tree model and evaluating the confusion matrix
  7. Changing parameters to improve accuracy
  8. Comparing model evaluation methods such as sensitivity, specificity, positive predictive value, negative predictive value, and prevalence of the data
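The seeding and splitting step can be sketched as follows. This is a standard-library Python illustration, not the R code in decision-tree-model.R. The helper name `train_test_split`, the 70/30 ratio, and the seed value 123 are all assumptions made for this example.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=123):
    """Shuffle rows with a fixed seed and split them into (train, test).

    test_fraction and seed are assumed values, not the project's settings.
    """
    rng = random.Random(seed)  # private generator, so the split is repeatable
    shuffled = list(rows)      # copy, leaving the caller's data untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(100))  # stand-in for the rows of cmc.csv
train, test = train_test_split(rows)
train_again, test_again = train_test_split(rows)

print(len(train), len(test))
print(train == train_again and test == test_again)
```

Because the generator is seeded, running the split twice gives the same partition, which is what makes later accuracy comparisons repeatable.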

Conclusion

The code ultimately produces the decision tree below. It shows 8 terminal nodes, each with a vector giving the proportion of instances in the node that belong to each of the three classes. For example, terminal node 6 contains 91 instances. Of those 91 married women, 50.5% do not use contraception, 16.5% use long-term contraception, and 33.0% use short-term contraception, so the three proportions sum to 100%.
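As a small illustration of how such a proportion vector arises, the sketch below turns per-class counts in one node into proportions and picks the majority class. It is in Python rather than the project's R, and the counts are hypothetical rather than read from the tree.

```python
def class_proportions(counts):
    """Convert per-class instance counts in a node into proportions."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical counts for a single terminal node (not from the fitted tree).
node_counts = {"no use": 40, "long-term": 10, "short-term": 30}

props = class_proportions(node_counts)
predicted = max(props, key=props.get)  # the node predicts its majority class

print(props)
print(predicted)
```

The proportions always sum to 1, which is a quick sanity check when reading node labels off a plotted tree.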

decision_tree

Furthermore, the table below calculates each of the model evaluation methods from the training data confusion matrix and the test data confusion matrix. The positive class is the couples who are not using contraception, and the negative class combines those using long-term and short-term contraception. Every evaluation metric in the table decreases from the training data to the test data. If the model starts off overfitting on the training data, it will not generalize well to new unseen data, like the test data. Any future dataset scored with this model is likely to show a similar outcome: low classification accuracy and other low evaluation metrics.
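The calculation behind these metrics can be sketched as follows, collapsing a three-class confusion matrix so that "no use" is positive and the two contraception classes together are negative. This is a Python illustration, not the project's R code. The matrix values are hypothetical, and the layout (rows are actual classes, columns are predicted classes) is an assumption of this example.

```python
LABELS = ["no use", "long-term", "short-term"]


def binary_metrics(matrix, positive=0):
    """Metrics for one class versus the rest.

    Assumes matrix[i][j] is the count with actual class i and predicted class j.
    """
    total = sum(sum(row) for row in matrix)
    tp = matrix[positive][positive]
    fn = sum(matrix[positive]) - tp                  # actual positive, predicted other
    fp = sum(row[positive] for row in matrix) - tp   # actual other, predicted positive
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "prevalence": (tp + fn) / total,
    }


# Hypothetical confusion matrix, NOT the project's results.
example = [
    [50, 10, 20],  # actual: no use
    [15, 30, 15],  # actual: long-term
    [25, 10, 45],  # actual: short-term
]

metrics = binary_metrics(example, positive=LABELS.index("no use"))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on the training and test matrices side by side makes any drop between the two sets easy to spot.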

table

Education, the most significant variable and our first guess at what drives contraception use, may have played a part in the level of accuracy of the model. However, because the model was extremely overfit, reliable conclusions were difficult to come by. Additionally, it was interesting to note from the start that the survey gave no option for a husband without an occupation. For instance, if the wife was working, we do not know whether the husband was also working. This is a factor that we were not able to consider because the dataset does not record it.

Contact

Karishma Mathur - [email protected]

Project Link: https://github.com/Mathurkarishma/indonesian-contraception-in-1987

Acknowledgements
