IPMI-ICNS-UKE/respiratory-signal-database

Respiratory signal database

This SQLite-driven database contains 2510 respiratory signals (univariate time series) of 419 patients (total acquisition time > 325,000 s). We hope that this data will be widely used for respiratory motion prediction as well as univariate time series forecasting in general. To this end, we provide a data split (train, val and test sets), preprocessing functionality and a PyTorch dataset/loader template, which may be helpful in your machine learning or deep learning project.

We recommend https://sqlitebrowser.org/ as an additional tool for browsing the database. Currently, only Python is supported. However, we hope to provide a REST API for non-Python users in the near future (approx. December 2023). If you run into any trouble, feel free to open an issue.

Author: Lukas Wimmert ([email protected])

Installation

Download the SQLite database from our Nextcloud server: https://icns-nas2.uke.uni-hamburg.de/drive/d/s/wFCsYhokwsJ9hLp9CHqBoHidu91yXfSk/lvQnG6BiEbfM2OqIOEXhaonW4vilyWKK-07sgZW6pCws

Clone this repository to your local machine:

git clone https://github.com/IPMI-ICNS-UKE/respiratory-signal-database.git

Change into the repository directory and install the required dependencies and the package itself:

pip install -r requirements.txt
pip install -e .

Tested with Python 3.9.11.

Usage

Always start by connecting to the database:

from pathlib import Path
from resp_db.client import RpmDatabaseClient

db_root = Path("example_database_path.db")  # change to the downloaded db-file
client = RpmDatabaseClient(db_filepath=db_root)
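Before connecting, it can help to confirm that the downloaded file really is a readable SQLite database. The following is a minimal sketch that uses only Python's standard `sqlite3` module, not the `resp_db` package; the helper name `list_tables` is hypothetical, and no particular table names are assumed.

```python
import sqlite3
from pathlib import Path


def list_tables(db_filepath: Path) -> list[str]:
    """Return the names of all tables in an SQLite database file."""
    if not db_filepath.is_file():
        raise FileNotFoundError(f"No database file at {db_filepath}")
    # Open read-only so that inspecting the file cannot modify it.
    uri = f"{db_filepath.resolve().as_uri()}?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


# Usage (replace the path with the downloaded db-file):
# print(list_tables(Path("example_database_path.db")))
```

If this raises `sqlite3.DatabaseError`, the download is probably incomplete or not an SQLite file.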

query and preprocess a specific signal

# query a signal
with client:
    df_signal, signal = client.get_specific_signal(
        research_number=5708019,
        modality="4DCT",  # or "CBCT" or "LINAC"
        fraction=0,  # "4DCT" = 0, "LINAC" fractions between 1 and 5
        origin="UKE",
    )

# preprocess
sampling_rate = 25  # sampling rate in Hz, i.e. samples per second
df = RpmDatabaseClient.preprocess_signal(
    df_signal=df_signal,
    sampling_rate=sampling_rate,
    only_beam_on=True,  # strongly recommended
    remove_offset=True,  # strongly recommended
)
# df.columns = [time, amplitude]

query all signals of the train (or val/test) set

with client:
    query = client.get_signals_of_dl_dataset(
        dl_dataset="train", project="short-term-prediction"
    )
signals = list(query)  # all signals of the train set

# preprocess all signals
dfs = [
    RpmDatabaseClient.preprocess_signal(
        df_signal=signal.df_signal,
        sampling_rate=25,
        only_beam_on=True,  # strongly recommended
        remove_offset=True,  # strongly recommended
    )
    for signal in signals
]
# dfs is a list of dataframes, one per signal
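For forecasting, each preprocessed signal usually has to be cut into pairs of an input window and a future target value. The sketch below shows one common way to do this in plain Python. It is not part of `resp_db`: the function `make_windows` and the window settings are hypothetical, and it assumes you pass the amplitude values of one preprocessed dataframe as a sequence (for example `df["amplitude"].tolist()`, if the column is named as in the comment above).

```python
from typing import List, Sequence, Tuple


def make_windows(
    amplitudes: Sequence[float],
    input_length: int,
    horizon: int,
    stride: int = 1,
) -> List[Tuple[List[float], float]]:
    """Cut one signal into (input window, future target) pairs.

    Each input window holds `input_length` consecutive samples. The target is
    the sample `horizon` steps after the last sample of the window.
    """
    if input_length < 1 or horizon < 1 or stride < 1:
        raise ValueError("input_length, horizon and stride must be >= 1")
    pairs = []
    last_start = len(amplitudes) - input_length - horizon
    # Signals that are too short yield an empty range and thus no pairs.
    for start in range(0, last_start + 1, stride):
        stop = start + input_length
        window = list(amplitudes[start:stop])
        target = amplitudes[stop + horizon - 1]
        pairs.append((window, target))
    return pairs


sampling_rate = 25  # Hz, matching the preprocessing call
# Hypothetical settings: 2 s of history, prediction 0.4 s ahead.
input_length = 2 * sampling_rate  # 50 samples
horizon = round(0.4 * sampling_rate)  # 10 samples
```

Windows are built per signal, so no window ever mixes samples from two different recordings.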

Feel free to write more query functions and add them to the RpmDatabaseClient.

data-set/loader template

Have a look at our PyTorch template at "scripts/dataloader.py" for machine learning and deep learning projects.

Further information

respiratory signal acquisition details

All respiratory signals were recorded during radiotherapy treatment (i.e. 4D CT, 4D CBCT and dose delivery) of patients with thoracic lesions between February 2013 and May 2022 at the clinic of Radiotherapy and Radiation Oncology of the University Medical Center Hamburg-Eppendorf. Data were acquired with the Varian RPM System, which uses an infrared camera system to monitor an external marker block placed on the patient's chest wall and yields a univariate time series. Check out our acquisition images. No patient received any visual guidance or coaching, so all patients can be considered free-breathing.

[Figure: acquisition images]

data overview

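| modality      | # signals | # patients | mean signal length (s) |
|---------------|-----------|------------|------------------------|
| 4D CT         | 481       | 419        | 98.1                   |
| 4D CBCT       | 251       | 52         | 59.6                   |
| dose delivery | 1778      | 357        | 145.6                  |
| sum           | 2510      |            | 129.3                  |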

data splitting

Data splitting was performed at the patient level, so all signals of a given patient belong to the same set.

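| set        | proportion of data | # signals | # patients |
|------------|--------------------|-----------|------------|
| training   | 50 %               | 1262      | 215        |
| validation | 20 %               | 514       | 84         |
| testing    | 30 %               | 726       | 117        |
| total      | 100 %              | 2502      | 416        |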

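Note that eight signals (corresponding to three patients) were retrospectively classified as corrupted and were therefore removed from the database. This accounts for the difference between the 2510 signals and 419 patients in the data overview and the 2502 signals and 416 patients in the split.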
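To illustrate what splitting at the patient level means, here is a minimal sketch in plain Python. It is not the procedure used to create the split shipped with the database; the function `split_by_patient`, the seed and the rounding rule are assumptions made for illustration. The proportions are applied to patients, so the share of signals per set only approximately matches them.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_by_patient(
    signal_patient_ids: Sequence[int],
    proportions: Tuple[float, float, float] = (0.5, 0.2, 0.3),
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Assign signal indices to train/val/test so no patient spans two sets.

    `signal_patient_ids[i]` is the patient that signal `i` belongs to.
    """
    # Unique patients in order of first appearance, then shuffled reproducibly.
    patients = list(dict.fromkeys(signal_patient_ids))
    random.Random(seed).shuffle(patients)

    n_train = round(proportions[0] * len(patients))
    n_val = round(proportions[1] * len(patients))

    set_of_patient = {}
    for position, patient in enumerate(patients):
        if position < n_train:
            set_of_patient[patient] = "train"
        elif position < n_train + n_val:
            set_of_patient[patient] = "val"
        else:
            set_of_patient[patient] = "test"

    # Every signal follows its patient into that patient's set.
    split: Dict[str, List[int]] = {"train": [], "val": [], "test": []}
    for index, patient in enumerate(signal_patient_ids):
        split[set_of_patient[patient]].append(index)
    return split
```

Splitting by patient rather than by signal avoids leakage: signals of the same patient tend to resemble each other, so a patient appearing in both the training and the test set would make test results look better than they are.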

Lastly, open access to the retrospectively collected and anonymized respiratory signals was approved by the local ethics board, and the need to obtain written informed consent was waived [2023-300334-WF].

License

MIT
