
Releases: neurosity/EEG-GPT

v0.1.0 - Initial release!

01 Jul 01:29
Pre-release

Release Notes: v0.1.0

Overview

We are excited to announce the initial release of our EEG foundation model project, based on the NeuroGPT model by Wenhui Cui et al. This release includes several models trained on EEG data, available at different checkpoints throughout the training process.

Features

  • Preprocessing Script: Added preprocess.py to convert CSV or EDF files to NumPy .npy files with various preprocessing steps, including notch filtering and bandpass filtering (see the filtering sketch after this list).
  • Parallel Processing: Implemented parallel processing for preprocessing using the --parallel flag.
  • TUH EEG Support: Added support for TUH EEG files in preprocessing.
  • Experiment Tracking: Integrated wandb for experiment tracking.
  • Training and Evaluation Logging: Added CSVLogCallback class for logging training and evaluation metrics to CSV files.
  • Distributed Training: Provided train_parallel.sh script for distributed training using PyTorch with multiple GPUs.
  • Model Checkpoints: Several models are available from different points within the training process, allowing for comparison and selection based on performance metrics.
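
These notes don't pin down the exact filter implementation inside preprocess.py, so the following is a minimal scipy sketch of the notch and bandpass steps that --notch_filter 50 60 and --bandpass_filter 1 48 refer to; the function names, sampling rate, and array shape are illustrative assumptions:

import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

def notch_filter(data, fs, freq, quality=30.0):
    # Suppress a narrow band around freq, e.g. 50/60 Hz mains interference.
    b, a = iirnotch(freq, quality, fs)
    return filtfilt(b, a, data, axis=-1)

def bandpass_filter(data, fs, low, high, order=4):
    # Keep only the band of interest, e.g. 1-48 Hz.
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, data, axis=-1)

# Example: a 10-second, 8-channel recording sampled at 250 Hz.
fs = 250
eeg = np.random.randn(8, fs * 10)
for mains_hz in (50, 60):                 # mirrors --notch_filter 50 60
    eeg = notch_filter(eeg, fs, mains_hz)
eeg = bandpass_filter(eeg, fs, 1, 48)     # mirrors --bandpass_filter 1 48
np.save("example.npy", eeg.astype(np.float32))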

Documentation

  • Comprehensive README: Detailed setup instructions, preprocessing details, and example usage.
  • External Resources: Links to guides for supporting tools such as tmux.

Models

  • Multiple Checkpoints: Models are available at various checkpoints (a loading sketch follows this list), including:
    • Checkpoint at 9,900 steps
    • Checkpoint at 25,000 steps
    • Checkpoint at 41,100 steps
    • Final model at 50,000 steps
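
Each checkpoint ships its weights as a model.safetensors file. A minimal loading sketch, mirroring the test script at the end of these notes (the log_dir layout is taken from that script):

import os
from safetensors.torch import load_file
from train_gpt import make_model, get_config

# Build the model from the same config used for training.
config = dict(get_config())
model = make_model(config)

# e.g. results/models/upstream/32clen2_embed1024/model_final/model.safetensors
checkpoint_dir = os.path.join(config["log_dir"], "model_final")
model.load_state_dict(load_file(os.path.join(checkpoint_dir, "model.safetensors")))
model.eval()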

How to Use

  1. Preprocessing: Use preprocess.py to convert your EEG data into the required format.
  2. Training: Use train_gpt.py with the provided scripts for training on your data.
  3. Evaluation: Evaluate the models using test_gpt.py to determine the best-performing checkpoint for your application (an example test script is included at the end of these notes).

Example Usage

For preprocessing TUH EEG files:

python3 src/eeg/preprocess.py --input_directory edf/ --output_directory data/npy_tuh_eeg --notch_filter 50 60 --bandpass_filter 1 48 --verbose --tuh_eeg --cutoff_samples 18
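
The preprocessed .npy files land in data/npy_tuh_eeg, which is the directory the training commands below consume via --train-data-path.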

For training the model:

python src/train_gpt.py \
    --training-steps=50000 \
    --eval_every_n_steps=100 \
    --log-every-n-steps=10 \
    --per-device-training-batch-size=32 \
    --per-device-validation-batch-size=32 \
    --num-workers=32 \
    --num_chunks=32 \
    --chunk_len=500 \
    --chunk_ovlp=50 \
    --num-hidden-layers=6 \
    --num-encoder-layers=6 \
    --run-name=32clen2_embed1024 \
    --training-style=CSM_causal \
    --embedding-dim=1024 \
    --train-data-path=data/npy_tuh_eeg \
    --verbose=True

For distributed training:

python -m torch.distributed.launch --nproc_per_node=8 \
    src/train_gpt.py \
    --training-steps=50000 \
    --eval_every_n_steps=100000 \
    --log-every-n-steps=100 \
    --per-device-training-batch-size=32 \
    --per-device-validation-batch-size=32 \
    --num-workers=32 \
    --num_chunks=32 \
    --chunk_len=500 \
    --chunk_ovlp=50 \
    --num-hidden-layers=6 \
    --num-encoder-layers=6 \
    --run-name=32clen2_embed1024_multi_gpu \
    --training-style=CSM_causal \
    --embedding-dim=1024 \
    --train-data-path=data/npy_tuh_eeg \
    --verbose=True \
    &> train_parallel.log
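
Note: recent PyTorch releases deprecate torch.distributed.launch in favor of torchrun. The invocation is otherwise the same (torchrun --nproc_per_node=8 src/train_gpt.py ...), but torchrun passes the local rank through the LOCAL_RANK environment variable rather than a --local_rank argument, so the training script may need a small adjustment.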

References

  • NeuroGPT: Based on the NeuroGPT model by Wenhui Cui et al.
  • Neurosity Foundational Model: Inspired by the Neurosity Foundational Model by Jeremy Nixon and AJ Keller.

Acknowledgments

We would like to thank the contributors and the community for their support and feedback. This project is under active development, and we welcome contributions and suggestions.

For more details, please refer to the README and the CHANGELOG.


"""
test.py

Testing of models based on given data. See get_args() for
details on command line arguments

Give it n chunks, n<32.. test the n+1 chunk...
"""

import os
import torch
from numpy import random
from safetensors.torch import load_file
from torch.utils.data import DataLoader

from batcher.downstream_dataset import EEGDataset
from train_gpt import make_model, get_config

if __name__ == '__main__':

    # Build the model from the same config used for training.
    config = dict(get_config())
    model = make_model(config)

    root_path = os.getcwd()

    # e.g. results/models/upstream/32clen2_embed1024/model_final/model.safetensors
    model_path = os.path.join(root_path, config["log_dir"], "model_final")
    state_dict = load_file(os.path.join(model_path, "model.safetensors"))
    model.load_state_dict(state_dict)

    # Collect the preprocessed .npy recordings, dropping files smaller than 0.2 MB.
    train_data_path = config["train_data_path"]
    files = [os.path.join(train_data_path, f)
             for f in os.listdir(train_data_path) if f.endswith('.npy')]
    files = [f for f in files if os.path.getsize(f) >= 0.2 * 1024 * 1024]

    # Reproduce the 90/10 train/validation split and evaluate on the held-out 10%.
    random.shuffle(files)
    split_index = int(len(files) * 0.9)
    validation_files = files[split_index:]

    test_dataset = EEGDataset(
        validation_files,
        sample_keys=['inputs', 'attention_mask'],
        chunk_len=config["chunk_len"],
        num_chunks=config["num_chunks"],
        ovlp=config["chunk_ovlp"],
        root_path=root_path,
        gpt_only=not config["use_encoder"],
        normalization=config["do_normalization"],
    )

    model.eval()

    # Run a single batch through the model and inspect the predictions.
    loader = DataLoader(test_dataset, batch_size=1)
    with torch.no_grad():
        output = model(next(iter(loader)), prep_batch=True)

    print("Predictions: ", output['outputs'])
    print("Shape: ", output['outputs'].shape)

Appendix: VS Code Launch Configurations (launch.json)

Example debug configurations for the EEG utilities, array validation, and TUH EEG preprocessing scripts:

{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: EEG Utils",
      "type": "python",
      "request": "launch",
      "program": "${workspaceFolder}/src/eeg/utils.py",
      "args": [
        "--input_directory",
        "data/crown/sessions",
        "--find_latest_timestamp",
      ]
    },
    {
      "name": "Python: Validate Numpy Arrays",
      "type": "python",
      "request": "launch",
      "program": "${workspaceFolder}/src/eeg/validate.py",
      "args": [
        "--path",
        "data/npy_tuh_eeg",
        "--parallel"
      ]
    },
    {
      "name": "Python: TUH EEG DEBUG",
      "type": "python",
      "request": "launch",
      "program": "${workspaceFolder}/src/eeg/preprocess.py",
      "args": [
        "--input_directory",
        "data/tuh_eeg",
        "--output_directory",
        "data/npy_tuh_eeg_test",
        "--notch_filter",
        "50",
        "60",
        "--bandpass_filter",
        "1",
        "48",