This is the code repository for the paper "A data-driven approach to complement the A/T/(N) classification system using CSF biomarkers". The repository follows the methodology and results presented in the above-mentioned paper.
The Python scripts present in this repository are organized as follows:
- prepare_datasets_HCSC.py - prepares the data from the HCSC dataset
- prepare_datasets_ADNI.py - prepares the data from the ADNI dataset
- kmeans_clustering.py - KMeans clustering of the CSF biomarker data from the different data sources
- clusters_description.py - core functions for computing several metrics on the resulting clusters
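The clustering step can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the contents of kmeans_clustering.py: the helper name cluster_csf, the standardization step, the range of k, and the three biomarker columns (Aβ42, p-tau, t-tau) are all hypothetical, and the data are synthetic. It fits scikit-learn's KMeans for several values of k and scores each partition with the silhouette coefficient, one common way of choosing the number of clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_csf(X, k_values=range(2, 7), random_state=0):
    """Hypothetical helper: fit KMeans for each k on standardized data.

    Returns a dict mapping k -> (silhouette score, cluster labels).
    """
    # Put biomarkers on a common scale so no single one dominates the distances
    X_scaled = StandardScaler().fit_transform(X)
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X_scaled)
        results[k] = (silhouette_score(X_scaled, labels), labels)
    return results


# Synthetic stand-in for CSF biomarkers (columns: Abeta42, p-tau, t-tau); NOT study data
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([1000, 20, 250], [100, 3, 30], size=(50, 3)),
    rng.normal([500, 60, 600], [100, 8, 60], size=(50, 3)),
])

results = cluster_csf(X)
best_k = max(results, key=lambda k: results[k][0])  # k with highest silhouette
print(best_k, round(results[best_k][0], 3))
```

Standardizing first is a design choice worth noting: raw CSF concentrations differ by orders of magnitude across biomarkers, so unscaled Euclidean distances would be driven almost entirely by the largest-valued one.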
The repository also includes several Jupyter notebooks for specific tasks:
- datasets_description.ipynb - dataset description statistics (number of subjects, sociodemographics, MMSE, biomarker values)
- clustering_statistics.ipynb - cluster description statistics (number of subjects, sociodemographics, MMSE, biomarker values, statistical tests)
- survival_analysis.ipynb - survival analysis using Kaplan-Meier curves and Cox regression models
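To make the Kaplan-Meier part concrete, the sketch below computes the product-limit estimator by hand, which is the quantity lifelines' KaplanMeierFitter estimates. It is an illustration only, not the notebook's code; the function name kaplan_meier and the toy durations and event flags are hypothetical. At each observed event time t, survival is multiplied by 1 - d/n, where d is the number of events at t and n is the number of subjects still at risk (follow-up time of at least t); censored subjects leave the risk set without counting as events.

```python
import numpy as np


def kaplan_meier(durations, events):
    """Hypothetical helper: product-limit survival estimate.

    durations: follow-up time for each subject
    events: 1 if the event of interest was observed, 0 if censored
    Returns (event_times, survival_probabilities).
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    # Survival only changes at times where at least one event occurred
    times = np.unique(durations[events == 1])
    survival = []
    s = 1.0
    for t in times:
        at_risk = np.sum(durations >= t)                # still under observation at t
        d = np.sum((durations == t) & (events == 1))    # events exactly at t
        s *= 1.0 - d / at_risk
        survival.append(s)
    return times, np.array(survival)


# Toy data: six subjects, two of them censored (event flag 0)
times, surv = kaplan_meier([2, 3, 3, 5, 8, 10], [1, 1, 0, 1, 0, 1])
for t, s in zip(times, surv):
    print(f"t={t:g}: S(t)={s:.3f}")
```

Comparing such curves between clusters (and adjusting for covariates with a Cox model) is what the notebook description refers to; in practice the library implementations also provide confidence intervals, which this sketch omits.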
Other directories in this repository:
- data - data files used in this work. Please note that the data files are not included in this repository for privacy reasons.
- results - SI scores of the clustering results. Other results files are likewise not included for privacy reasons.
- figures - figures produced for the manuscript.
The code in this work was built using:
- scikit-learn for building clustering models.
- SciPy for statistical analyses.
- lifelines for survival analyses.
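As an example of the kind of SciPy-based comparison between clusters mentioned above, the sketch below tests whether a continuous variable (such as MMSE) differs across clusters with a Kruskal-Wallis test, and whether a categorical variable (such as sex) is independent of cluster membership with a chi-squared test. The choice of tests, the helper names compare_continuous and compare_categorical, and the synthetic data are assumptions, not necessarily what clustering_statistics.ipynb uses.

```python
import numpy as np
from scipy import stats


def compare_continuous(values, labels):
    """Hypothetical helper: Kruskal-Wallis H-test of a variable across clusters."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    groups = [values[labels == c] for c in np.unique(labels)]
    return stats.kruskal(*groups)


def compare_categorical(categories, labels):
    """Hypothetical helper: chi-squared test of independence vs. cluster labels."""
    cats, cat_idx = np.unique(categories, return_inverse=True)
    labs, lab_idx = np.unique(labels, return_inverse=True)
    # Build the cluster x category contingency table of counts
    table = np.zeros((len(labs), len(cats)), dtype=int)
    np.add.at(table, (lab_idx, cat_idx), 1)
    chi2, p, dof, _expected = stats.chi2_contingency(table)
    return chi2, p, dof


# Synthetic example: two clusters of 40 subjects each; NOT study data
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 40)
mmse = np.concatenate([rng.normal(28, 1, 40), rng.normal(22, 2, 40)])
sex = rng.choice(["F", "M"], size=80)

kw = compare_continuous(mmse, labels)
chi2, p, dof = compare_categorical(sex, labels)
print(f"Kruskal-Wallis p={kw.pvalue:.2g}; chi-squared p={p:.2g}, dof={dof}")
```

Kruskal-Wallis is used here because it does not assume normally distributed values and extends naturally to more than two clusters; with only two clusters it is closely related to the Mann-Whitney U test.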
Please direct any questions to Laura Hernández-Lorenzo (GitHub, email).