Fully release TSI-Bench code (#20)
WenjieDu committed Jun 19, 2024
1 parent 7010627 commit ac7849f
Showing 1,387 changed files with 231,237 additions and 2,333 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/greetings.yml
Original file line number Diff line number Diff line change
@@ -18,7 +18,7 @@ jobs:
steps:
- uses: actions/first-interaction@v1
with:
repo-token: ${{ secrets.ACCESS_TOKEN }}
repo-token: ${{ secrets.GITHUB_TOKEN }}
issue-message: |
Hi there 👋,
@@ -34,7 +34,7 @@ jobs:
pr-message: |
Hi there 👋,
We really really appreciate that you have taken the time to make this PR on PyPOTS' Awesome Imputation project!
We really appreciate that you have taken the time to make this PR on PyPOTS' Awesome Imputation project!
If you are trying to fix a bug, please reference the issue number in the description or give your details about the bug.
If you are implementing a feature request, please check with the maintainers that the feature will be accepted first.
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
benchmark_code/data/physionet_2012/test.h5
benchmark_code/data/physionet_2012/train.h5
benchmark_code/data/physionet_2012/val.h5
28 changes: 28 additions & 0 deletions LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
Copyright (c) 2024-present, Wenjie Du
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
92 changes: 61 additions & 31 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,42 +1,30 @@
<p align="center">
<a id="AwesomeImputation" href="#AwesomeImputation">
<img src="https://pypots.com/figs/pypots_logos/ImputationSurvey/banner.jpg"
alt="Time Series Imputation Survey" title="Time Series Imputation Survey" width="80%"
<img src="https://pypots.com/figs/pypots_logos/AwesomeImputation/banner.jpg"
alt="Time Series Imputation Survey and Benchmark"
title="Time Series Imputation Survey and Benchmark"
width="80%"
/>
</a>
</p>

The open-resource repository for the paper [**Deep Learning for Multivariate Time Series Imputation: A Survey**](https://arxiv.org/abs/2402.04059)
The repository for the paper [**TSI-Bench: Benchmarking Time Series Imputation**](https://arxiv.org/abs/2406.12747)
from <a href="https://pypots.com" target="_blank"><img src="https://pypots.com/figs/pypots_logos/PyPOTS/logo_FFBG.svg" width="30px" align="center"/> PyPOTS Research</a>.
The code and configurations for reproducing the experimental results in the paper are available under
the folder `time_series_imputation_survey_code`.

If you find this repository helpful to your work, please kindly star it and cite our survey paper (author profile links:
[Jun Wang](https://github.com/AugustJW), [Wenjie Du](https://github.com/WenjieDu),
[Wei Cao](https://weicao1990.github.io/), [Keli Zhang](https://github.com/kelizhang), [Wenjia Wang](https://www.wenjia-w.com/home),
[Yuxuan Liang](https://yuxuanliang.com/), [Qingsong Wen](https://sites.google.com/site/qingsongwen8/)) as follows:

```bibtex
@article{wang2024deep,
title={Deep Learning for Multivariate Time Series Imputation: A Survey},
author={Wang, Jun and Du, Wenjie and Cao, Wei and Zhang, Keli and Wang, Wenjia and Liang, Yuxuan and Wen, Qingsong},
journal={arXiv preprint arXiv:2402.04059},
year={2024}
}
```
The code and configurations for reproducing the experimental results in the paper are available under the folder `benchmark_code`.
The README file here maintains a list of must-read papers on time-series imputation, and a collection of time-series imputation toolkits and resources.

🤗 Contributions to update new resources and articles are very welcome!

## ❖ Time-Series Imputation Toolkits
### Datasets
### `Datasets`
[TSDB (Time Series Data Beans)](https://github.com/WenjieDu/TSDB): a Python toolkit can load 169 public time-series datasets with a single line of code.
<img src="https://img.shields.io/github/last-commit/WenjieDu/TSDB" align="center">

### Missingness
### `Missingness`
[PyGrinder](https://github.com/WenjieDu/PyGrinder): a Python library grinds data beans into the incomplete by introducing missing values with different missing patterns.
<img src="https://img.shields.io/github/last-commit/WenjieDu/PyGrinder" align="center">

### Algorithms
### `Algorithms`
[PyPOTS](https://github.com/WenjieDu/PyPOTS): a Python toolbox for data mining on Partially-Observed Time Series
<img src="https://img.shields.io/github/last-commit/WenjieDu/PyPOTS" align="center">

@@ -55,7 +43,21 @@ The papers listed here may be not from top publications, some of them even are n
but are all interesting papers related to time-series imputation that deserve reading to
researchers and practitioners who are interested in this field.

### Year 2023
### `Year 2024`

[ICML] **BayOTIDE: Bayesian Online Multivariate Time Series Imputation with Functional Decomposition**
[[paper](https://arxiv.org/abs/2308.14906)]

[ICLR] **Conditional Information Bottleneck Approach for Time Series Imputation**
[[paper](https://openreview.net/pdf?id=K1mcPiDdOJ)]
[[official code](https://github.com/Chemgyu/TimeCIB)]

[AISTATS] **SADI: Similarity-Aware Diffusion Model-Based Imputation for Incomplete Temporal EHR Data**
[[paper](https://proceedings.mlr.press/v238/dai24c/dai24c.pdf)]
[[official code](https://github.com/bestadcarry/SADI-Similarity-Aware-Diffusion-Model-Based-Imputation-for-Incomplete-Temporal-EHR-Data)]


### `Year 2023`

[ICLR] **Multivariate Time-series Imputation with Disentangled Temporal Representations**
[[paper](https://openreview.net/forum?id=rdjeCNUS6TG)]
@@ -111,7 +113,7 @@ researchers and practitioners who are interested in this field.
[[paper](https://dl.acm.org/doi/abs/10.1145/3583780.3614840)]


### Year 2022
### `Year 2022`

[ICLR] **Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks**
[[paper](https://arxiv.org/abs/2108.00298)]
@@ -128,7 +130,7 @@ researchers and practitioners who are interested in this field.
[[paper](https://ojs.aaai.org/index.php/AAAI/article/view/21189)]


### Year 2021
### `Year 2021`

[NeurIPS] **CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation**
[[paper](https://openreview.net/forum?id=VzuIzbRDrum)]
@@ -144,7 +146,7 @@ researchers and practitioners who are interested in this field.
[[paper](https://arxiv.org/abs/2209.10801)]


### Year 2020
### `Year 2020`

[AISTATS] **GP-VAE: Deep Probabilistic Time Series Imputation**
[[paper](https://arxiv.org/abs/1907.04155)]
@@ -160,7 +162,7 @@ researchers and practitioners who are interested in this field.
[[paper](https://drive.google.com/file/d/1AkWlqjYJ1PNgnu5apOx2dow_vgmqViQG/view)]


### Year 2019
### `Year 2019`

[NeurIPS] **NAOMI: Non-Autoregressive Multiresolution Sequence Imputation**
[[paper](https://arxiv.org/abs/1901.10946)]
@@ -175,7 +177,7 @@ researchers and practitioners who are interested in this field.
[[official code](https://github.com/tomstream/STI)]


### Year 2018
### `Year 2018`

[NeurIPS] **BRITS: Bidirectional Recurrent Imputation for Time Series**
[[paper](https://arxiv.org/abs/1805.10572)]
@@ -190,28 +192,56 @@ researchers and practitioners who are interested in this field.
[[official code](https://github.com/Luoyonghong/Multivariate-Time-Series-Imputation-with-Generative-Adversarial-Networks)]


### Year 2017
### `Year 2017`

[IEEE Transactions on Biomedical Engineering] **Estimating Missing Data in Temporal Data Streams Using Multi-Directional Recurrent Neural Networks**
[[paper](https://arxiv.org/abs/1711.08742)]
[[official code](https://github.com/jsyoon0823/MRNN)]


### Year 2016
### `Year 2016`

[IJCAI] **ST-MVL: Filling Missing Values in Geo-sensory Time Series Data**
[[paper](https://www.ijcai.org/Proceedings/16/Papers/384.pdf)]
[[official code](https://www.microsoft.com/en-us/research/uploads/prod/2016/06/STMVL-Release.zip)]


## ❖ Other Resources
### Repos about General Time Series
### `Articles about General Missingness and Imputation`
[blog] [**Data Imputation: An essential yet overlooked problem in machine learning**](https://www.vanderschaar-lab.com/data-imputation-an-essential-yet-overlooked-problem-in-machine-learning/)

[Journal of Big Data] **A survey on missing data in machine learning**
[[paper](https://journalofbigdata.springeropen.com/articles/10.1186/s40537-021-00516-9)]


### `Repos about General Time Series`
[Transformers in Time Series](https://github.com/qingsongedu/time-series-transformers-review)

[LLMs and Foundation Models for Time Series and Spatio-Temporal Data](https://github.com/qingsongedu/Awesome-TimeSeries-SpatioTemporal-LM-LLM)

[AI for Time Series (AI4TS) Papers, Tutorials, and Surveys](https://github.com/qingsongedu/awesome-AI-for-time-series-papers)

## ❖ Citing This Work
If you find this repository helpful to your work, please kindly star it and cite our benchmark paper and survey paper as follows:

```bibtex
@article{du2024tsibench,
title={TSI-Bench: Benchmarking Time Series Imputation},
author={Wenjie Du and Jun Wang and Linglong Qian and Yiyuan Yang and Fanxing Liu and Zepu Wang and Zina Ibrahim and Haoxin Liu and Zhiyuan Zhao and Yingjie Zhou and Wenjia Wang and Kaize Ding and Yuxuan Liang and B. Aditya Prakash and Qingsong Wen},
journal={arXiv preprint arXiv:2406.12747},
year={2024}
}
```

```bibtex
@article{wang2024deep,
title={Deep Learning for Multivariate Time Series Imputation: A Survey},
author={Jun Wang and Wenjie Du and Wei Cao and Keli Zhang and Wenjia Wang and Yuxuan Liang and Qingsong Wen},
journal={arXiv preprint arXiv:2402.04059},
year={2024}
}
```


<details>
<summary>🏠 Visits</summary>
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{
"n_steps": {"_type":"choice","_value":[24]},
"n_features": {"_type":"choice","_value":[862]},
"epochs": {"_type":"choice","_value":[100]},
"patience": {"_type":"choice","_value":[10]},
"n_layers": {"_type":"choice","_value":[1,2,3]},
"d_model": {"_type":"choice","_value":[64,128,256,512,1024]},
"d_ffn": {"_type":"choice","_value":[64,128,256,512,1024]},
"n_heads": {"_type":"choice","_value":[1,2,4,8]},
"factor": {"_type":"choice","_value":[3]},
"moving_avg_window_size": {"_type":"choice","_value":[5,13,25]},
"dropout": {"_type":"choice","_value":[0,0.1,0.2,0.3,0.4,0.5]},
"lr":{"_type":"loguniform","_value":[0.00005,0.01]}
}
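
As a rough illustration of how a random tuner could draw one trial from the search space above, here is a minimal Python sketch. It assumes the usual NNI semantics for the two `_type` values that appear in this file: `choice` picks one of the listed values, and `loguniform` draws `exp(uniform(log(low), log(high)))`. The helper name `sample_config` is hypothetical and is not part of NNI or PyPOTS.

```python
import json
import math
import random

# Search space copied from the tuning-space file above.
SEARCH_SPACE = json.loads("""
{
  "n_steps": {"_type": "choice", "_value": [24]},
  "n_features": {"_type": "choice", "_value": [862]},
  "epochs": {"_type": "choice", "_value": [100]},
  "patience": {"_type": "choice", "_value": [10]},
  "n_layers": {"_type": "choice", "_value": [1, 2, 3]},
  "d_model": {"_type": "choice", "_value": [64, 128, 256, 512, 1024]},
  "d_ffn": {"_type": "choice", "_value": [64, 128, 256, 512, 1024]},
  "n_heads": {"_type": "choice", "_value": [1, 2, 4, 8]},
  "factor": {"_type": "choice", "_value": [3]},
  "moving_avg_window_size": {"_type": "choice", "_value": [5, 13, 25]},
  "dropout": {"_type": "choice", "_value": [0, 0.1, 0.2, 0.3, 0.4, 0.5]},
  "lr": {"_type": "loguniform", "_value": [0.00005, 0.01]}
}
""")


def sample_config(space, rng):
    """Draw one hyperparameter configuration (hypothetical helper, not NNI's code)."""
    config = {}
    for name, spec in space.items():
        kind, value = spec["_type"], spec["_value"]
        if kind == "choice":
            # Pick one of the listed values uniformly at random.
            config[name] = rng.choice(value)
        elif kind == "loguniform":
            # Sample uniformly in log space, then map back with exp.
            low, high = value
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            raise ValueError(f"unsupported _type: {kind}")
    return config


rng = random.Random(0)  # fixed seed, only to make this sketch repeatable
config = sample_config(SEARCH_SPACE, rng)
print(config)
```

Entries with a single-element `choice` list (such as `n_steps` and `n_features`) are effectively constants: they pin the dataset shape while the remaining entries are searched.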
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{
"n_steps": {"_type":"choice","_value":[48]},
"n_features": {"_type":"choice","_value":[35]},
"epochs": {"_type":"choice","_value":[100]},
"patience": {"_type":"choice","_value":[10]},
"n_layers": {"_type":"choice","_value":[1,2,3]},
"d_model": {"_type":"choice","_value":[64,128,256,512,1024]},
"d_ffn": {"_type":"choice","_value":[64,128,256,512,1024]},
"n_heads": {"_type":"choice","_value":[1,2,4,8]},
"factor": {"_type":"choice","_value":[3]},
"moving_avg_window_size": {"_type":"choice","_value":[5,13,25]},
"dropout": {"_type":"choice","_value":[0,0.1,0.2,0.3,0.4,0.5]},
"lr":{"_type":"loguniform","_value":[0.00005,0.01]}
}
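
To get a feel for the size of the search space directly above (the one with `n_steps` 48 and `n_features` 35), the following sketch multiplies the number of options of every `choice` entry. The continuous `lr` entry is left out of the count, and the function name `count_discrete_combinations` is a hypothetical helper used only for this illustration.

```python
import json
import math

# Search space copied from the tuning-space file above.
SEARCH_SPACE = json.loads("""
{
  "n_steps": {"_type": "choice", "_value": [48]},
  "n_features": {"_type": "choice", "_value": [35]},
  "epochs": {"_type": "choice", "_value": [100]},
  "patience": {"_type": "choice", "_value": [10]},
  "n_layers": {"_type": "choice", "_value": [1, 2, 3]},
  "d_model": {"_type": "choice", "_value": [64, 128, 256, 512, 1024]},
  "d_ffn": {"_type": "choice", "_value": [64, 128, 256, 512, 1024]},
  "n_heads": {"_type": "choice", "_value": [1, 2, 4, 8]},
  "factor": {"_type": "choice", "_value": [3]},
  "moving_avg_window_size": {"_type": "choice", "_value": [5, 13, 25]},
  "dropout": {"_type": "choice", "_value": [0, 0.1, 0.2, 0.3, 0.4, 0.5]},
  "lr": {"_type": "loguniform", "_value": [0.00005, 0.01]}
}
""")


def count_discrete_combinations(space):
    """Multiply the option counts of all `choice` entries (continuous ones are skipped)."""
    sizes = [
        len(spec["_value"])
        for spec in space.values()
        if spec["_type"] == "choice"
    ]
    return math.prod(sizes)


n_combinations = count_discrete_combinations(SEARCH_SPACE)
print(n_combinations)  # 3 * 5 * 5 * 4 * 3 * 6 = 5400
```

With 5,400 discrete combinations before the learning rate is even considered, an exhaustive grid is costly, which is one plausible reason a random tuner is a practical fit for spaces of this shape.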
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
experimentName: Autoformer hyper-param searching
authorName: WenjieDu
trialConcurrency: 1
trainingServicePlatform: local
searchSpacePath: Autoformer_PhysioNet2012_tuning_space.json
multiThread: true
useAnnotation: false
tuner:
builtinTunerName: Random

trial:
command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.Autoformer --train_set ../../data/physionet_2012/train.h5 --val_set ../../data/physionet_2012/val.h5
codeDir: .
gpuNum: 1

localConfig:
useActiveGpu: true
maxTrialNumPerGpu: 100
gpuIndices: 0
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@
"n_steps": {"_type":"choice","_value":[24]},
"n_features": {"_type":"choice","_value":[132]},
"patience": {"_type":"choice","_value":[10]},
"epochs": {"_type":"choice","_value":[200]},
"epochs": {"_type":"choice","_value":[100]},
"rnn_hidden_size": {"_type":"choice","_value":[32,64,128,256,512,1024]},
"lr":{"_type":"loguniform","_value":[0.00005,0.01]}
}
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
{
"n_steps": {"_type":"choice","_value":[96]},
"n_steps": {"_type":"choice","_value":[48]},
"n_features": {"_type":"choice","_value":[7]},
"patience": {"_type":"choice","_value":[10]},
"epochs": {"_type":"choice","_value":[200]},
"epochs": {"_type":"choice","_value":[100]},
"rnn_hidden_size": {"_type":"choice","_value":[32,64,128,256,512,1024]},
"lr":{"_type":"loguniform","_value":[0.00005,0.01]}
}
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
{
"n_steps": {"_type":"choice","_value":[48]},
"n_features": {"_type":"choice","_value":[37]},
"n_features": {"_type":"choice","_value":[35]},
"patience": {"_type":"choice","_value":[10]},
"epochs": {"_type":"choice","_value":[200]},
"epochs": {"_type":"choice","_value":[100]},
"rnn_hidden_size": {"_type":"choice","_value":[32,64,128,256,512,1024]},
"lr":{"_type":"loguniform","_value":[0.00005,0.01]}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
experimentName: BRITS hyper-param searching
authorName: WenjieDu
trialConcurrency: 3
trainingServicePlatform: local
searchSpacePath: BRITS_PhysioNet2012_tuning_space.json
# searchSpacePath: BRITS_BeijingAir_tuning_space.json
# searchSpacePath: BRITS_ETTh1_tuning_space.json
multiThread: true
useAnnotation: false
tuner:
builtinTunerName: Random

trial:
command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.BRITS --train_set ../../data/physionet_2012/train.h5 --val_set ../../data/physionet_2012/val.h5
# command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.BRITS --train_set ../../data/air_quality/train.h5 --val_set ../../data/air_quality/val.h5
# command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.BRITS --train_set ../../data/ettm1/train.h5 --val_set ../../data/ettm1/val.h5
codeDir: .
gpuNum: 1

localConfig:
useActiveGpu: true
maxTrialNumPerGpu: 100
gpuIndices: 0
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
{
"n_steps": {"_type":"choice","_value":[24]},
"n_features": {"_type":"choice","_value":[132]},
"patience": {"_type":"choice","_value":[10]},
"epochs": {"_type":"choice","_value":[200]},
"epochs": {"_type":"choice","_value":[100]},
"n_layers": {"_type":"choice","_value":[1,2,3,4,5,6]},
"n_heads": {"_type":"choice","_value":[1,2,4,8,16]},
"n_channels": {"_type":"choice","_value":[16,32,64,128]},
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
{
"n_steps": {"_type":"choice","_value":[96]},
"n_features": {"_type":"choice","_value":[7]},
"patience": {"_type":"choice","_value":[10]},
"epochs": {"_type":"choice","_value":[200]},
"epochs": {"_type":"choice","_value":[100]},
"n_layers": {"_type":"choice","_value":[1,2,3,4,5,6]},
"n_heads": {"_type":"choice","_value":[1,2,4,8,16]},
"n_channels": {"_type":"choice","_value":[16,32,64,128]},
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
{
"n_features": {"_type":"choice","_value":[37]},
"n_steps": {"_type":"choice","_value":[48]},
"n_features": {"_type":"choice","_value":[35]},
"patience": {"_type":"choice","_value":[10]},
"epochs": {"_type":"choice","_value":[200]},
"epochs": {"_type":"choice","_value":[100]},
"n_layers": {"_type":"choice","_value":[1,2,3,4,5,6]},
"n_heads": {"_type":"choice","_value":[1,2,4,8,16]},
"n_channels": {"_type":"choice","_value":[16,32,64,128]},
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
experimentName: CSDI hyper-param searching
authorName: WenjieDu
trialConcurrency: 1
trainingServicePlatform: local
searchSpacePath: CSDI_PhysioNet2012_tuning_space.json
#searchSpacePath: CSDI_BeijingAir_tuning_space.json
# searchSpacePath: CSDI_ETTh1_tuning_space.json
multiThread: true
useAnnotation: false
tuner:
builtinTunerName: Random

trial:
command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.CSDI --train_set ../../data/physionet_2012/train.h5 --val_set ../../data/physionet_2012/val.h5
# command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.CSDI --train_set ../../data/air_quality/train.h5 --val_set ../../data/air_quality/val.h5
# command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.CSDI --train_set ../../data/ettm1/train.h5 --val_set ../../data/ettm1/val.h5
codeDir: .
gpuNum: 1

localConfig:
useActiveGpu: true
maxTrialNumPerGpu: 100
gpuIndices: 0