Multi-class classification model to predict outcomes of cirrhosis patients using machine learning

This repository contains the solution for the Cirrhosis Patient Outcome Prediction competition. The task is to develop a multi-class classification model to predict the outcomes of patients with cirrhosis. The model predicts the probabilities for each of the three possible outcomes: Status_C (censored), Status_CL (censored due to liver transplant), and Status_D (deceased). The performance is evaluated using the multi-class logarithmic loss metric.


The goal of this project is to predict, for each patient, the probability of each outcome from the features in the dataset. The submission file should contain the predicted probabilities for each row in the test set.

Evaluation Metric

Submissions are evaluated using the multi-class logarithmic loss, calculated as follows:

$$ \text{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij} \log(p_{ij}) $$


  • $N$ is the number of rows in the test set.
  • $M$ is the number of outcomes (i.e., 3).
  • $\log$ is the natural logarithm.
  • $y_{ij}$ is 1 if row $i$ has the ground truth label $j$ and 0 otherwise.
  • $p_{ij}$ is the predicted probability that observation $i$ belongs to class $j$.

The submitted probabilities for a given row do not need to sum to one, as they will be rescaled prior to scoring. To avoid extremes of the log function, predicted probabilities are replaced with $\max(\min(p, 1 - 10^{-15}), 10^{-15})$.
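The metric described above can be reproduced locally for validation. The following is a minimal sketch, not the competition's official scorer: the function name `multiclass_log_loss`, the use of integer class indices for the labels, and the order of operations (rescale each row first, then clip) are assumptions made for illustration. It also assumes every row has a positive sum.

```python
import math

EPS = 1e-15  # clipping bound stated in the metric description


def multiclass_log_loss(y_true, probs):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class indices (assumed encoding, e.g. 0, 1, 2).
    probs:  list of rows, each a list of M non-negative scores.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        row_sum = sum(row)
        # Rescale so the row sums to one (assumes row_sum > 0).
        scaled = [p / row_sum for p in row]
        # Clip to [EPS, 1 - EPS] to keep log() finite.
        clipped = [max(min(p, 1 - EPS), EPS) for p in scaled]
        # y_ij is 1 only for the true class, so only that term survives.
        total += math.log(clipped[label])
    return -total / len(y_true)


if __name__ == "__main__":
    # Uniform predictions over three classes score log(3).
    uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3
    print(multiclass_log_loss([0, 1, 2], uniform))
```

A useful baseline: predicting 1/3 for every class gives a loss of $\log 3 \approx 1.0986$, so a model should score below that on held-out data to be of any use.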

Submission Format

The submission file should contain one row for each id in the test set, giving the predicted probabilities for the three outcomes Status_C, Status_CL, and Status_D. See sample_submission.csv for the exact layout.



The dataset consists of the following files:

  • train.csv: Training dataset.
  • test.csv: Test dataset.
  • sample_submission.csv: Sample submission file in the correct format.
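A submission file could be written along the following lines. This is a sketch using only the standard library. The helper name `write_submission`, the `id` header, and the column order are assumptions based on the outcome names in this README; check them against sample_submission.csv before submitting.

```python
import csv

# Assumed column order, taken from the outcome names in this README.
CLASS_COLUMNS = ["Status_C", "Status_CL", "Status_D"]


def write_submission(path, ids, probs):
    """Write one row per test id with its three predicted probabilities.

    ids:   iterable of test-set ids.
    probs: iterable of rows, each holding three probabilities in the
           same order as CLASS_COLUMNS.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id"] + CLASS_COLUMNS)
        for row_id, row in zip(ids, probs):
            writer.writerow([row_id] + list(row))


if __name__ == "__main__":
    # Hypothetical ids and probabilities, for illustration only.
    write_submission("submission.csv", [1, 2], [[0.6, 0.1, 0.3], [0.2, 0.1, 0.7]])
```

Because the scorer rescales each row, the probabilities do not have to sum to exactly one, but writing normalized values makes the file easier to sanity-check.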


Thanks to the organizers of the competition for providing the dataset and evaluation framework.