Skip to content

Latest commit

 

History

History
115 lines (83 loc) · 4.17 KB

README.md

File metadata and controls

115 lines (83 loc) · 4.17 KB

Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization

This reporsitory provided implemented codes for the paper, Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization. The codes are implemented based on the original DRL methods for each task; see the references and original codes for details.

Installation

Clone project and create environment with conda:

conda create -n sym python==3.7
conda activate sym

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
conda install -c rdkit rdkit
pip install -r requirements.txt

Note

  • We highly recommend using Python 3.7, PyTorch 1.12.1, and Pytorch Geometric 1.7.2. Additionally, we use PyTDC 0.4.0 instead of 0.3.6, which is recommended in mol_opt.
  • If you use the different cuda version, please modify the url for torch-scatter and torch-sparse in requirements.txt before run it; see here.
  • If you have any problems to install torch-scatter and torch-sparse, try conda install pyg -c pyg.
  • We slightly modified the original codes of AM and Sym-NCO to make them runable in Python 3.7 according to here.

Usage

We have followed the original (base) source codes.

Euclidean CO

Symmetric Replay Training

TSP (base: AM)

cd attention-learn-to-route
python run.py --problem tsp --batch_size 100 --epoch_size 10000 --n_epochs 200 --graph_size 50 --val_dataset '../data/tsp/tsp50_val_seed1234.pkl' --baseline critic --distil_every 1 --il_coefficient 0.001

CVRP (base: Sym-NCO)

cd sym_nco
python run.py --problem cvrp --batch_size 100 --epoch_size 10000 --n_epochs 100 --graph_size 50 --val_dataset '../data/vrp/vrp50_val_seed1234.pkl' --N_aug 5 --il_coefficient 0.001 --distil_every 1 --run_name sym_rd

Baseline

AM

cd attention-learn-to-route
python run.py --problem tsp --batch_size 100 --epoch_size 10000 --n_epochs 200 --graph_size 50 --val_dataset '../data/tsp/tsp50_val_seed1234.pkl' --baseline rollout

POMO

cd pomo/TSP/POMO
python train_n50.py 
python train_n100.py 

Sym-NCO

cd sym_nco
python run.py --problem cvrp --batch_size 100 --n_epochs 50 --graph_size 50 --val_dataset '../data/vrp/vrp50_val_seed1234.pkl'

Non-Euclidean CO

cd non_euclidean_co/mat_net/ATSP/ATSP_MatNet
python train.py

Please change the configuration USE_POMO as True in train.py to run the original MatNet (base DRL method).

Note: validation data can be downloaded in here.


MolOpt

Symmetric Replay Training

cd mol_opt
python run.py reinvent_selfies --task simple --oracle scaffold_hop --config_default 'hparams_symrd.yaml'

(Base) REINVENT-SELFIES

python run.py reinvent_selfies --task simple --oracle scaffold_hop

Orther baselines are runable by changing method to gflownet (GFlowNet), gflownet_al (GFlowNet-AL), and moldqn (MolDQN).


Acknowledgements

This work is done based on the following papers.