Fully release TSI-Bench code (#20)

WenjieDu · Jun 19, 2024 · ac7849f
1 parent 7010627
Showing 1,387 changed files with 231,237 additions and 2,333 deletions.
diff --git a/.github/workflows/greetings.yml b/.github/workflows/greetings.yml
@@ -18,7 +18,7 @@ jobs:
     steps:
     - uses: actions/first-interaction@v1
       with:
-        repo-token: ${{ secrets.ACCESS_TOKEN }}
+        repo-token: ${{ secrets.GITHUB_TOKEN }}
         issue-message: |
           Hi there 👋,
 
@@ -34,7 +34,7 @@ jobs:
         pr-message: |
           Hi there 👋,
 
-          We really really appreciate that you have taken the time to make this PR on PyPOTS' Awesome Imputation project!
+          We really appreciate that you have taken the time to make this PR on PyPOTS' Awesome Imputation project!
 
           If you are trying to fix a bug, please reference the issue number in the description or give your details about the bug.
           If you are implementing a feature request, please check with the maintainers that the feature will be accepted first.

diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,3 @@
+benchmark_code/data/physionet_2012/test.h5
+benchmark_code/data/physionet_2012/train.h5
+benchmark_code/data/physionet_2012/val.h5
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,28 @@
+Copyright (c) 2024-present, Wenjie Du
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright
+   notice, this list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+
+3. Neither the name of the copyright holder nor the names of its
+   contributors may be used to endorse or promote products derived from
+   this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
diff --git a/README.md b/README.md
@@ -1,42 +1,30 @@
 <p align="center">
     <a id="AwesomeImputation" href="#AwesomeImputation">
-        <img src="https://pypots.com/figs/pypots_logos/ImputationSurvey/banner.jpg" 
-            alt="Time Series Imputation Survey" title="Time Series Imputation Survey" width="80%"
+        <img src="https://pypots.com/figs/pypots_logos/AwesomeImputation/banner.jpg"
+            alt="Time Series Imputation Survey and Benchmark"
+            title="Time Series Imputation Survey and Benchmark"
+            width="80%"
         />
     </a>
 </p>
 
-The open-resource repository for the paper [**Deep Learning for Multivariate Time Series Imputation: A Survey**](https://arxiv.org/abs/2402.04059) 
+The repository for the paper [**TSI-Bench: Benchmarking Time Series Imputation**](https://arxiv.org/abs/2406.12747) 
 from <a href="https://pypots.com" target="_blank"><img src="https://pypots.com/figs/pypots_logos/PyPOTS/logo_FFBG.svg" width="30px" align="center"/> PyPOTS Research</a>.
-The code and configurations for reproducing the experimental results in the paper are available under 
-the folder `time_series_imputation_survey_code`.
-
-If you find this repository helpful to your work, please kindly star it and cite our survey paper (author profile links:
-[Jun Wang](https://github.com/AugustJW), [Wenjie Du](https://github.com/WenjieDu), 
-[Wei Cao](https://weicao1990.github.io/), [Keli Zhang](https://github.com/kelizhang), [Wenjia Wang](https://www.wenjia-w.com/home), 
-[Yuxuan Liang](https://yuxuanliang.com/), [Qingsong Wen](https://sites.google.com/site/qingsongwen8/)) as follows:
-
-```bibtex
-@article{wang2024deep,
-title={Deep Learning for Multivariate Time Series Imputation: A Survey},
-author={Wang, Jun and Du, Wenjie and Cao, Wei and Zhang, Keli and Wang, Wenjia and Liang, Yuxuan and Wen, Qingsong},
-journal={arXiv preprint arXiv:2402.04059},
-year={2024}
-}
-```
+The code and configurations for reproducing the experimental results in the paper are available under the folder `benchmark_code`.
+The README file here maintains a list of must-read papers on time-series imputation, and a collection of time-series imputation toolkits and resources.
 
 🤗 Contributions to update new resources and articles are very welcome!
 
 ## ❖ Time-Series Imputation Toolkits
-### Datasets
+### `Datasets`
 [TSDB (Time Series Data Beans)](https://github.com/WenjieDu/TSDB): a Python toolkit can load 169 public time-series datasets with a single line of code.
 <img src="https://img.shields.io/github/last-commit/WenjieDu/TSDB" align="center">
 
-### Missingness
+### `Missingness`
 [PyGrinder](https://github.com/WenjieDu/PyGrinder): a Python library grinds data beans into the incomplete by introducing missing values with different missing patterns.
 <img src="https://img.shields.io/github/last-commit/WenjieDu/PyGrinder" align="center">
 
-### Algorithms
+### `Algorithms`
 [PyPOTS](https://github.com/WenjieDu/PyPOTS): a Python toolbox for data mining on Partially-Observed Time Series
 <img src="https://img.shields.io/github/last-commit/WenjieDu/PyPOTS" align="center">
 
@@ -55,7 +43,21 @@ The papers listed here may be not from top publications, some of them even are n
 but are all interesting papers related to time-series imputation that deserve reading to 
 researchers and practitioners who are interested in this field.
 
-### Year 2023
+### `Year 2024`
+
+[ICML] **BayOTIDE: Bayesian Online Multivariate Time Series Imputation with Functional Decomposition**
+[[paper](https://arxiv.org/abs/2308.14906)]
+
+[ICLR] **Conditional Information Bottleneck Approach for Time Series Imputation**
+[[paper](https://openreview.net/pdf?id=K1mcPiDdOJ)]
+[[official code](https://github.com/Chemgyu/TimeCIB)]
+
+[AISTATS] **SADI: Similarity-Aware Diffusion Model-Based Imputation for Incomplete Temporal EHR Data**
+[[paper](https://proceedings.mlr.press/v238/dai24c/dai24c.pdf)]
+[[official code](https://github.com/bestadcarry/SADI-Similarity-Aware-Diffusion-Model-Based-Imputation-for-Incomplete-Temporal-EHR-Data)]
+
+
+### `Year 2023`
 
 [ICLR] **Multivariate Time-series Imputation with Disentangled Temporal Representations**
 [[paper](https://openreview.net/forum?id=rdjeCNUS6TG)]
@@ -111,7 +113,7 @@ researchers and practitioners who are interested in this field.
 [[paper](https://dl.acm.org/doi/abs/10.1145/3583780.3614840)]
 
 
-### Year 2022
+### `Year 2022`
 
 [ICLR] **Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks**
 [[paper](https://arxiv.org/abs/2108.00298)]
@@ -128,7 +130,7 @@ researchers and practitioners who are interested in this field.
 [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/21189)]
 
 
-### Year 2021
+### `Year 2021`
 
 [NeurIPS] **CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation**
 [[paper](https://openreview.net/forum?id=VzuIzbRDrum)]
@@ -144,7 +146,7 @@ researchers and practitioners who are interested in this field.
 [[paper](https://arxiv.org/abs/2209.10801)]
 
 
-### Year 2020
+### `Year 2020`
 
 [AISTATS] **GP-VAE: Deep Probabilistic Time Series Imputation**
 [[paper](https://arxiv.org/abs/1907.04155)]
@@ -160,7 +162,7 @@ researchers and practitioners who are interested in this field.
 [[paper](https://drive.google.com/file/d/1AkWlqjYJ1PNgnu5apOx2dow_vgmqViQG/view)]
 
 
-### Year 2019
+### `Year 2019`
 
 [NeurIPS] **NAOMI: Non-Autoregressive Multiresolution Sequence Imputation**
 [[paper](https://arxiv.org/abs/1901.10946)]
@@ -175,7 +177,7 @@ researchers and practitioners who are interested in this field.
 [[official code](https://github.com/tomstream/STI)]
 
 
-### Year 2018
+### `Year 2018`
 
 [NeurIPS] **BRITS: Bidirectional Recurrent Imputation for Time Series**
 [[paper](https://arxiv.org/abs/1805.10572)]
@@ -190,28 +192,56 @@ researchers and practitioners who are interested in this field.
 [[official code](https://github.com/Luoyonghong/Multivariate-Time-Series-Imputation-with-Generative-Adversarial-Networks)]
 
 
-### Year 2017
+### `Year 2017`
 
 [IEEE Transactions on Biomedical Engineering] **Estimating Missing Data in Temporal Data Streams Using Multi-Directional Recurrent Neural Networks**
 [[paper](https://arxiv.org/abs/1711.08742)]
 [[official code](https://github.com/jsyoon0823/MRNN)]
 
 
-### Year 2016
+### `Year 2016`
 
 [IJCAI] **ST-MVL: Filling Missing Values in Geo-sensory Time Series Data**
 [[paper](https://www.ijcai.org/Proceedings/16/Papers/384.pdf)]
 [[official code](https://www.microsoft.com/en-us/research/uploads/prod/2016/06/STMVL-Release.zip)]
 
 
 ## ❖ Other Resources
-### Repos about General Time Series
+### `Articles about General Missingness and Imputation`
+[blog] [**Data Imputation: An essential yet overlooked problem in machine learning**](https://www.vanderschaar-lab.com/data-imputation-an-essential-yet-overlooked-problem-in-machine-learning/)
+
+[Journal of Big Data] **A survey on missing data in machine learning** 
+[[paper](https://journalofbigdata.springeropen.com/articles/10.1186/s40537-021-00516-9)]
+
+
+### `Repos about General Time Series`
 [Transformers in Time Series](https://github.com/qingsongedu/time-series-transformers-review)
 
 [LLMs and Foundation Models for Time Series and Spatio-Temporal Data](https://github.com/qingsongedu/Awesome-TimeSeries-SpatioTemporal-LM-LLM)
 
 [AI for Time Series (AI4TS) Papers, Tutorials, and Surveys](https://github.com/qingsongedu/awesome-AI-for-time-series-papers)
 
+## ❖ Citing This Work
+If you find this repository helpful to your work, please kindly star it and cite our benchmark paper and survey paper as follows:
+
+```bibtex
+@article{du2024tsibench,
+title={TSI-Bench: Benchmarking Time Series Imputation},
+author={Wenjie Du and Jun Wang and Linglong Qian and Yiyuan Yang and Fanxing Liu and Zepu Wang and Zina Ibrahim and Haoxin Liu and Zhiyuan Zhao and Yingjie Zhou and Wenjia Wang and Kaize Ding and Yuxuan Liang and B. Aditya Prakash and Qingsong Wen},
+journal={arXiv preprint arXiv:2406.12747},
+year={2024}
+}
+```
+
+```bibtex
+@article{wang2024deep,
+title={Deep Learning for Multivariate Time Series Imputation: A Survey},
+author={Jun Wang and Wenjie Du and Wei Cao and Keli Zhang and Wenjia Wang and Yuxuan Liang and Qingsong Wen},
+journal={arXiv preprint arXiv:2402.04059},
+year={2024}
+}
+```
+
 
 <details>
 <summary>🏠 Visits</summary>

diff --git a/benchmark_code/PyPOTS_tuning_configs/Autoformer/Autoformer_PeMS_tuning_space.json b/benchmark_code/PyPOTS_tuning_configs/Autoformer/Autoformer_PeMS_tuning_space.json
@@ -0,0 +1,14 @@
+{
+  "n_steps": {"_type":"choice","_value":[24]},
+  "n_features":  {"_type":"choice","_value":[862]},
+  "epochs":  {"_type":"choice","_value":[100]},
+  "patience":  {"_type":"choice","_value":[10]},
+  "n_layers":  {"_type":"choice","_value":[1,2,3]},
+  "d_model":  {"_type":"choice","_value":[64,128,256,512,1024]},
+  "d_ffn":  {"_type":"choice","_value":[64,128,256,512,1024]},
+  "n_heads":  {"_type":"choice","_value":[1,2,4,8]},
+  "factor":  {"_type":"choice","_value":[3]},
+  "moving_avg_window_size":  {"_type":"choice","_value":[5,13,25]},
+  "dropout":  {"_type":"choice","_value":[0,0.1,0.2,0.3,0.4,0.5]},
+  "lr":{"_type":"loguniform","_value":[0.00005,0.01]}
+}
diff --git a/benchmark_code/PyPOTS_tuning_configs/Autoformer/Autoformer_PhysioNet2012_tuning_space.json b/benchmark_code/PyPOTS_tuning_configs/Autoformer/Autoformer_PhysioNet2012_tuning_space.json
@@ -0,0 +1,14 @@
+{
+  "n_steps": {"_type":"choice","_value":[48]},
+  "n_features":  {"_type":"choice","_value":[35]},
+  "epochs":  {"_type":"choice","_value":[100]},
+  "patience":  {"_type":"choice","_value":[10]},
+  "n_layers":  {"_type":"choice","_value":[1,2,3]},
+  "d_model":  {"_type":"choice","_value":[64,128,256,512,1024]},
+  "d_ffn":  {"_type":"choice","_value":[64,128,256,512,1024]},
+  "n_heads":  {"_type":"choice","_value":[1,2,4,8]},
+  "factor":  {"_type":"choice","_value":[3]},
+  "moving_avg_window_size":  {"_type":"choice","_value":[5,13,25]},
+  "dropout":  {"_type":"choice","_value":[0,0.1,0.2,0.3,0.4,0.5]},
+  "lr":{"_type":"loguniform","_value":[0.00005,0.01]}
+}
diff --git a/benchmark_code/PyPOTS_tuning_configs/Autoformer/Autoformer_searching_config.yml b/benchmark_code/PyPOTS_tuning_configs/Autoformer/Autoformer_searching_config.yml
@@ -0,0 +1,19 @@
+experimentName: Autoformer hyper-param searching
+authorName: WenjieDu
+trialConcurrency: 1
+trainingServicePlatform: local
+searchSpacePath: Autoformer_PhysioNet2012_tuning_space.json
+multiThread: true
+useAnnotation: false
+tuner:
+    builtinTunerName: Random
+
+trial:
+    command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.Autoformer --train_set ../../data/physionet_2012/train.h5 --val_set ../../data/physionet_2012/val.h5
+    codeDir: .
+    gpuNum: 1
+
+localConfig:
+    useActiveGpu: true
+    maxTrialNumPerGpu: 100
+    gpuIndices: 0
diff --git a/...configs/BRITS/BRITS_AQI_tuning_space.json → .../BRITS/BRITS_BeijingAir_tuning_space.json b/...configs/BRITS/BRITS_AQI_tuning_space.json → .../BRITS/BRITS_BeijingAir_tuning_space.json
@@ -2,7 +2,7 @@
   "n_steps":  {"_type":"choice","_value":[24]},
   "n_features":  {"_type":"choice","_value":[132]},
   "patience":  {"_type":"choice","_value":[10]},
-  "epochs":  {"_type":"choice","_value":[200]},
+  "epochs":  {"_type":"choice","_value":[100]},
   "rnn_hidden_size":  {"_type":"choice","_value":[32,64,128,256,512,1024]},
   "lr":{"_type":"loguniform","_value":[0.00005,0.01]}
 }
diff --git a/...nfigs/BRITS/BRITS_ETTm1_tuning_space.json → ...nfigs/BRITS/BRITS_ETTh1_tuning_space.json b/...nfigs/BRITS/BRITS_ETTm1_tuning_space.json → ...nfigs/BRITS/BRITS_ETTh1_tuning_space.json
@@ -1,8 +1,8 @@
 {
-  "n_steps":  {"_type":"choice","_value":[96]},
+  "n_steps":  {"_type":"choice","_value":[48]},
   "n_features":  {"_type":"choice","_value":[7]},
   "patience":  {"_type":"choice","_value":[10]},
-  "epochs":  {"_type":"choice","_value":[200]},
+  "epochs":  {"_type":"choice","_value":[100]},
   "rnn_hidden_size":  {"_type":"choice","_value":[32,64,128,256,512,1024]},
   "lr":{"_type":"loguniform","_value":[0.00005,0.01]}
 }
diff --git a/...ITS/BRITS_PhysioNet2012_tuning_space.json → ...ITS/BRITS_PhysioNet2012_tuning_space.json b/...ITS/BRITS_PhysioNet2012_tuning_space.json → ...ITS/BRITS_PhysioNet2012_tuning_space.json
@@ -1,8 +1,8 @@
 {
   "n_steps":  {"_type":"choice","_value":[48]},
-  "n_features":  {"_type":"choice","_value":[37]},
+  "n_features":  {"_type":"choice","_value":[35]},
   "patience":  {"_type":"choice","_value":[10]},
-  "epochs":  {"_type":"choice","_value":[200]},
+  "epochs":  {"_type":"choice","_value":[100]},
   "rnn_hidden_size":  {"_type":"choice","_value":[32,64,128,256,512,1024]},
   "lr":{"_type":"loguniform","_value":[0.00005,0.01]}
 }
diff --git a/benchmark_code/PyPOTS_tuning_configs/BRITS/BRITS_searching_config.yml b/benchmark_code/PyPOTS_tuning_configs/BRITS/BRITS_searching_config.yml
@@ -0,0 +1,23 @@
+experimentName: BRITS hyper-param searching
+authorName: WenjieDu
+trialConcurrency: 3
+trainingServicePlatform: local
+searchSpacePath: BRITS_PhysioNet2012_tuning_space.json
+# searchSpacePath: BRITS_BeijingAir_tuning_space.json
+# searchSpacePath: BRITS_ETTh1_tuning_space.json
+multiThread: true
+useAnnotation: false
+tuner:
+    builtinTunerName: Random
+
+trial:
+    command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.BRITS --train_set ../../data/physionet_2012/train.h5 --val_set ../../data/physionet_2012/val.h5
+#    command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.BRITS --train_set ../../data/air_quality/train.h5 --val_set ../../data/air_quality/val.h5
+    # command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.BRITS --train_set ../../data/ettm1/train.h5 --val_set ../../data/ettm1/val.h5
+    codeDir: .
+    gpuNum: 1
+
+localConfig:
+    useActiveGpu: true
+    maxTrialNumPerGpu: 100
+    gpuIndices: 0
diff --git a/...g_configs/CSDI/CSDI_AQI_tuning_space.json → ...gs/CSDI/CSDI_BeijingAir_tuning_space.json b/...g_configs/CSDI/CSDI_AQI_tuning_space.json → ...gs/CSDI/CSDI_BeijingAir_tuning_space.json
@@ -1,7 +1,8 @@
 {
+  "n_steps": {"_type":"choice","_value":[24]},
   "n_features":  {"_type":"choice","_value":[132]},
   "patience":  {"_type":"choice","_value":[10]},
-  "epochs":  {"_type":"choice","_value":[200]},
+  "epochs":  {"_type":"choice","_value":[100]},
   "n_layers":  {"_type":"choice","_value":[1,2,3,4,5,6]},
   "n_heads":  {"_type":"choice","_value":[1,2,4,8,16]},
   "n_channels":  {"_type":"choice","_value":[16,32,64,128]},

diff --git a/...configs/CSDI/CSDI_ETTm1_tuning_space.json → ...configs/CSDI/CSDI_ETTh1_tuning_space.json b/...configs/CSDI/CSDI_ETTm1_tuning_space.json → ...configs/CSDI/CSDI_ETTh1_tuning_space.json
@@ -1,7 +1,8 @@
 {
+  "n_steps": {"_type":"choice","_value":[96]},
   "n_features":  {"_type":"choice","_value":[7]},
   "patience":  {"_type":"choice","_value":[10]},
-  "epochs":  {"_type":"choice","_value":[200]},
+  "epochs":  {"_type":"choice","_value":[100]},
   "n_layers":  {"_type":"choice","_value":[1,2,3,4,5,6]},
   "n_heads":  {"_type":"choice","_value":[1,2,4,8,16]},
   "n_channels":  {"_type":"choice","_value":[16,32,64,128]},

diff --git a/...CSDI/CSDI_PhysioNet2012_tuning_space.json → ...CSDI/CSDI_PhysioNet2012_tuning_space.json b/...CSDI/CSDI_PhysioNet2012_tuning_space.json → ...CSDI/CSDI_PhysioNet2012_tuning_space.json
@@ -1,7 +1,8 @@
 {
-  "n_features":  {"_type":"choice","_value":[37]},
+  "n_steps": {"_type":"choice","_value":[48]},
+  "n_features":  {"_type":"choice","_value":[35]},
   "patience":  {"_type":"choice","_value":[10]},
-  "epochs":  {"_type":"choice","_value":[200]},
+  "epochs":  {"_type":"choice","_value":[100]},
   "n_layers":  {"_type":"choice","_value":[1,2,3,4,5,6]},
   "n_heads":  {"_type":"choice","_value":[1,2,4,8,16]},
   "n_channels":  {"_type":"choice","_value":[16,32,64,128]},

diff --git a/benchmark_code/PyPOTS_tuning_configs/CSDI/CSDI_searching_config.yml b/benchmark_code/PyPOTS_tuning_configs/CSDI/CSDI_searching_config.yml
@@ -0,0 +1,23 @@
+experimentName: CSDI hyper-param searching
+authorName: WenjieDu
+trialConcurrency: 1
+trainingServicePlatform: local
+searchSpacePath: CSDI_PhysioNet2012_tuning_space.json
+#searchSpacePath: CSDI_BeijingAir_tuning_space.json
+# searchSpacePath: CSDI_ETTh1_tuning_space.json
+multiThread: true
+useAnnotation: false
+tuner:
+    builtinTunerName: Random
+
+trial:
+    command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.CSDI --train_set ../../data/physionet_2012/train.h5 --val_set ../../data/physionet_2012/val.h5
+    # command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.CSDI --train_set ../../data/air_quality/train.h5 --val_set ../../data/air_quality/val.h5
+    # command: enable_tuning=1 pypots-cli tuning --model pypots.imputation.CSDI --train_set ../../data/ettm1/train.h5 --val_set ../../data/ettm1/val.h5
+    codeDir: .
+    gpuNum: 1
+
+localConfig:
+    useActiveGpu: true
+    maxTrialNumPerGpu: 100
+    gpuIndices: 0
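
Note on the tuning-space files: the `*_tuning_space.json` files in this commit appear to follow the NNI search-space format, where each hyperparameter has a `_type` and a `_value`. The sketch below illustrates how one trial configuration could be drawn from such a space under random search. It is not the code NNI or `pypots-cli` runs; `sample_config` is a hypothetical helper, and the dictionary is copied from the new side of `BRITS_PhysioNet2012_tuning_space.json` above. It assumes `choice` picks one listed value uniformly and `loguniform` draws uniformly in log space between the two bounds.

```python
import math
import random

# Search space copied from BRITS_PhysioNet2012_tuning_space.json in this commit.
SEARCH_SPACE = {
    "n_steps": {"_type": "choice", "_value": [48]},
    "n_features": {"_type": "choice", "_value": [35]},
    "patience": {"_type": "choice", "_value": [10]},
    "epochs": {"_type": "choice", "_value": [100]},
    "rnn_hidden_size": {"_type": "choice", "_value": [32, 64, 128, 256, 512, 1024]},
    "lr": {"_type": "loguniform", "_value": [0.00005, 0.01]},
}


def sample_config(space, rng):
    """Draw one hyperparameter configuration (hypothetical helper, not an NNI API)."""
    config = {}
    for name, spec in space.items():
        kind, value = spec["_type"], spec["_value"]
        if kind == "choice":
            # Pick one of the listed candidates uniformly at random.
            config[name] = rng.choice(value)
        elif kind == "loguniform":
            # Sample uniformly between log(low) and log(high), then exponentiate.
            low, high = value
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            raise ValueError(f"unsupported _type {kind!r} for {name!r}")
    return config


if __name__ == "__main__":
    # A seeded generator keeps the draw reproducible across runs.
    print(sample_config(SEARCH_SPACE, random.Random(0)))
```

Single-element `choice` lists such as `n_steps` and `n_features` act as fixed values, so only `rnn_hidden_size` and `lr` actually vary between trials for this model.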