
Optimizing an ML Pipeline in Azure

Overview

This project is part of the Udacity Azure ML Nanodegree. In this project, we build and optimize an Azure ML pipeline using the Python SDK and a provided Scikit-learn model. This model is then compared to an Azure AutoML run.

Summary

This dataset contains data about direct marketing campaigns (from May 2008 to November 2010) of a Portuguese banking institution. The goal is to predict whether the client will subscribe to a term deposit, indicated by the variable 'y' (yes = 1, no = 0).

The best performing model was a Voting Ensemble classifier (PreFittedSoftVotingClassifier) generated by AutoML, with an accuracy of 0.9177975287231737. It showed a 0.82% improvement over the model created by the HyperDrive method (accuracy: 0.9103692463328276, regularization strength: 0.9072469401405283, max iterations: 150).

Scikit-learn Pipeline

Pipeline Architecture

The pipeline consists of data preparation, training, and test stages.

Data Preparation: In this stage, the CSV file was downloaded as a dataset and converted to a Pandas dataframe. The data was then cleaned, one-hot encoded, and divided into two dataframes: the feature variables and the target variable.
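A minimal sketch of this stage, assuming the Azure ML TabularDatasetFactory and a placeholder CSV location (the project supplies the actual URL and a cleaning helper, so the names below are illustrative):

```python
import pandas as pd
from azureml.data.dataset_factory import TabularDatasetFactory

# Placeholder URL; the project provides the actual bank-marketing CSV location.
csv_url = "https://<storage-account>/bankmarketing_train.csv"

# Download the CSV as a TabularDataset and convert it to a Pandas dataframe.
ds = TabularDatasetFactory.from_delimited_files(path=csv_url)
df = ds.to_pandas_dataframe()

# Clean, then separate the target 'y' (yes = 1, no = 0) from the features.
df = df.dropna()
y = df.pop("y").apply(lambda v: 1 if v == "yes" else 0)

# One-hot encode the remaining categorical feature columns.
x = pd.get_dummies(df, drop_first=True)
```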

Classifier

Logistic Regression from the Scikit-Learn library was used to demonstrate the HyperDrive approach.

Training configuration using HyperDrive Package

Hyperparameters to optimise:

C - regularisation strength
max_iter - maximum number of iterations allowed for the classifier to converge

The parameter search space:

'C': uniform(0.1, 1)
'max_iter': choice(50, 100, 150, 200)

Sampling method: RandomParameterSampling - random search strategy to find the values

Primary metric to optimise: Accuracy

Early termination policy: BanditPolicy(slack_factor=0.1,evaluation_interval = 1,delay_evaluation=5)

Primary metric goal: PrimaryMetricGoal.MAXIMIZE

Max total runs: 100
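Put together, the configuration above maps onto the HyperDrive classes of the Azure ML Python SDK roughly as follows; the workspace, compute target, environment file and training script names are placeholders rather than values taken from this repository:

```python
from azureml.core import Workspace, Environment, ScriptRunConfig
from azureml.train.hyperdrive import (
    HyperDriveConfig, RandomParameterSampling, BanditPolicy,
    PrimaryMetricGoal, uniform, choice,
)

ws = Workspace.from_config()

# Random sampling over the search space listed above.
param_sampling = RandomParameterSampling({
    "--C": uniform(0.1, 1),
    "--max_iter": choice(50, 100, 150, 200),
})

# Bandit early termination: 10% slack, checked every interval, first 5 intervals exempt.
early_termination = BanditPolicy(slack_factor=0.1, evaluation_interval=1, delay_evaluation=5)

# Placeholder environment file, compute cluster and script name.
sklearn_env = Environment.from_conda_specification(name="sklearn-env",
                                                   file_path="conda_dependencies.yml")
src = ScriptRunConfig(source_directory=".", script="train.py",
                      compute_target="cpu-cluster", environment=sklearn_env)

hyperdrive_config = HyperDriveConfig(
    run_config=src,
    hyperparameter_sampling=param_sampling,
    policy=early_termination,
    primary_metric_name="Accuracy",
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=100,
)
```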

Training and Test

The data was split into train (70%) and test (30%) datasets. We optimised hyperparameters by fitting multiple models with different hyperparameters on the train set and validating the models using the test set. The best run was selected and saved.
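A condensed sketch of what a training script driven by HyperDrive typically does with these settings; argument handling and logging are simplified, and x and y are the dataframes produced in the data preparation stage:

```python
import argparse

from azureml.core.run import Run
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

parser = argparse.ArgumentParser()
parser.add_argument("--C", type=float, default=1.0, help="Inverse of regularisation strength")
parser.add_argument("--max_iter", type=int, default=100, help="Maximum number of iterations")
args = parser.parse_args()

# 70% train / 30% test split, as described above.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

model = LogisticRegression(C=args.C, max_iter=args.max_iter).fit(x_train, y_train)
accuracy = model.score(x_test, y_test)

# Log the primary metric under the name HyperDrive is configured to compare.
run = Run.get_context()
run.log("Accuracy", float(accuracy))
```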

What are the benefits of the parameter sampler you chose? RandomParameterSampling is faster than grid sampling because it randomly picks hyperparameter values from the defined search space. It also lets users refine the search later based on the initial results.

What are the benefits of the early stopping policy you chose? Bandit stops runs whose primary metric is not within the slack amount of the best performing run. For example, in this experiment, after interval 5 any run whose best metric is less than 1/(1+0.1), or roughly 91%, of the best performing run's metric will be terminated.

There are two other stopping policies: Median stopping policy and Truncation selection policy.

Median stopping is based on running averages of the primary metric reported by the runs. With delay_evaluation=5, after interval 5 any run whose best metric is worse than the median of the running averages over intervals 1:5 across all training runs is terminated. In my opinion, this policy is slower because it needs to compute running averages across all training runs, although it can be used for less aggressive savings without terminating promising jobs.

Truncation selection terminates runs based on their performance ranking among all runs. For example, with truncation_percentage=10, a run is terminated if it is in the lowest 10% of performers at the evaluation interval. This policy becomes more aggressive as the truncation_percentage is increased.
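For comparison, both alternatives are available in the same SDK module; the values below are illustrative:

```python
from azureml.train.hyperdrive import MedianStoppingPolicy, TruncationSelectionPolicy

# Stop runs whose best metric falls below the median of the running averages of all runs.
median_policy = MedianStoppingPolicy(evaluation_interval=1, delay_evaluation=5)

# Stop the lowest-performing 10% of runs at each evaluation interval.
truncation_policy = TruncationSelectionPolicy(truncation_percentage=10,
                                              evaluation_interval=1,
                                              delay_evaluation=5)
```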

Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters

AutoML

The best performing model generated by AutoML used the Voting Ensemble algorithm (PreFittedSoftVotingClassifier), which combines multiple models to produce a better result than a single model.
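A typical AutoML configuration for this binary classification task looks roughly like the following; the timeout, validation scheme, compute target and dataset variable name are assumptions rather than values taken from the repository:

```python
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task="classification",
    primary_metric="accuracy",
    training_data=train_data,       # TabularDataset with the features and the 'y' label column
    label_column_name="y",
    n_cross_validations=5,          # illustrative validation scheme
    experiment_timeout_minutes=30,  # illustrative time limit
    compute_target="cpu-cluster",   # placeholder compute cluster name
)
```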

The idea behind the VotingClassifier is to combine conceptually different machine learning classifiers and use a majority vote or the average predicted probabilities (soft vote) to predict the class labels. Reference: https://scikit-learn.org/stable/modules/ensemble.html#voting-classifier
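A small scikit-learn illustration of soft voting, using arbitrary toy data and classifiers rather than the models AutoML actually combined:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, random_state=0)

# Soft voting averages the predicted class probabilities of conceptually different classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=200)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="soft",
)
ensemble.fit(X, y)
print(ensemble.predict_proba(X[:3]))  # averaged probabilities for the first three samples
```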

Pipeline comparison

| Voting Ensemble (AutoML) | LogisticRegressionCV (HyperDrive) |
| --- | --- |
| Accuracy: 0.9177975287231737 | Accuracy: 0.9103692463328276 |
| boosting_type='gbdt' | Cs=0.9072469401405283 |
| class_weight=None | Max iterations=150 |
| colsample_bytree=1.0 | |
| importance_type='split' | |
| learning_rate=0.1 | |
| max_depth=-1 | |
| min_child_samples=20 | |
| min_child_weight=0.001 | |
| min_samples_leaf=0.01 | |
| min_samples_split=0.01 | |
| min_weight_fraction_leaf=0.0 | |
| n_estimators=25 | |

The HyperDrive and AutoML approaches produced similar results (0.9103692463328276 and 0.9177975287231737 respectively); the improvement from using AutoML was only 0.82%, but I would still recommend it. In the HyperDrive method the user needs to develop the data preparation, training and validation stages, including specifying the ranges of hyperparameters used in the experiment; this can delay delivery of the final model, because the user needs to test different ranges if the result is not satisfactory. AutoML selects estimators, performs feature engineering and chooses hyperparameters, saving development time. With AutoML the user can start by identifying the best models and then focus on the metrics that matter most for the use case.

Future work

For future work it will be necessary to test the methods using other metrics to get more reliable predictions, for example recall, F1 score or weighted AUC; depending on the use case and on how balanced (or imbalanced) the data is, accuracy may not be the best metric. It would also be worth trying other algorithms with HyperDrive, for example the Voting Ensemble classifier, to verify whether further improvements are possible after AutoML has identified the best performing models, since AutoML may not have found the best hyperparameters in the time it was given.
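As an illustration, these alternative metrics could be computed on the HyperDrive test split with scikit-learn; this sketch assumes the model, x_test and y_test from the training sketch above:

```python
from sklearn.metrics import f1_score, recall_score, roc_auc_score

y_pred = model.predict(x_test)
y_proba = model.predict_proba(x_test)[:, 1]  # probability of the positive class

print("Recall:", recall_score(y_test, y_pred))
print("F1 score:", f1_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_proba))
```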
