
Bad performance on MPI-Sintel #11

Open · xylf opened this issue Jan 5, 2019 · 18 comments

@xylf commented Jan 5, 2019

Hi,
I have used your pretrained model to fine-tune on MPI-Sintel. The EPE on the test set was 6.2. Have you tried it?

@tsenst commented Jan 17, 2019

To fine-tune on the MPI-Sintel dataset you have to change the dataset options. You will find the respective settings in:

[1] Deqing Sun, Xiaodong Yang, Ming-Yu Liu, and Jan Kautz. "PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume." CVPR 2018, [arXiv:1709.02371](https://arxiv.org/abs/1709.02371)

and set them to:

ds_opts = deepcopy(_DEFAULT_DS_TUNE_OPTIONS)
ds_opts['in_memory'] = False
ds_opts['aug_type'] = 'heavy'
ds_opts['flipud'] = 0                 # Only apply horizontal flipping for data augmentation, see [1]
ds_opts['translate'] = (0, 0)         # Only apply horizontal flipping for data augmentation, see [1]
ds_opts['scale'] = (0, 0)             # Only apply horizontal flipping for data augmentation, see [1]
ds_opts['crop_preproc'] = (384, 768)  # Crop to the size described in [1]
ds_opts['batch_size'] = 4             # Note: overrides the usual batch_size * len(gpu_devices)

and

# The robust loss as described in [1] doesn't work here, so try the following:
nn_opts['loss_fn'] = 'loss_multiscale'
nn_opts['q'] = 0.4        # see [1]
nn_opts['epsilon'] = 0.01 # see [1]
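
For reference, [1] uses two losses: an L2 multiscale loss over the pyramid levels for pretraining, and the robust (|.|_1 + epsilon)^q form with q = 0.4 and epsilon = 0.01 for fine-tuning. Below is a minimal sketch of both, assuming coarse-to-fine lists of predicted and ground-truth flow pyramids; flow_pyr and gt_pyr are hypothetical names, and the level weights are the paper's values:

import tensorflow as tf

_ALPHAS = (0.32, 0.08, 0.02, 0.01, 0.005)  # weights for pyramid levels 6..2, per [1]

def multiscale_loss(flow_pyr, gt_pyr, alphas=_ALPHAS):
    """L2 multiscale loss used for pretraining in [1]."""
    total = 0.
    for w, w_gt, a in zip(flow_pyr, gt_pyr, alphas):
        # Per-pixel L2 norm of the flow error, summed over the level
        total += a * tf.reduce_sum(tf.norm(w - w_gt, ord=2, axis=-1))
    return total

def robust_loss(flow_pyr, gt_pyr, q=0.4, eps=0.01, alphas=_ALPHAS):
    """Robust (|.|_1 + eps)^q loss used for fine-tuning in [1]."""
    total = 0.
    for w, w_gt, a in zip(flow_pyr, gt_pyr, alphas):
        l1 = tf.reduce_sum(tf.abs(w - w_gt), axis=-1)  # per-pixel |.|_1
        total += a * tf.reduce_sum(tf.pow(l1 + eps, q))
    return total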

By fine-tuning on clean and final and evaluating on the training data I got:

  • clean 1.4 EPE
  • final 1.88 EPE

however, the results on the test data are considerably worse than the originally reported ones:

  • clean 5.13 (Place 83) in contrast to 4.37 of the original
  • final 6.50 (Place 77) in contrast to 5.04 of the original

I have used the lg-6-2 Net. Could this be an issue of over-fitting? I would appreciate any help to get better results on the test data.

@jsczzzk commented Jan 18, 2019

I think the difference can be explained as follows:

1. You should take care with the choice of validation set, see https://github.com/lmb-freiburg/flownet2/issues?utf8=%E2%9C%93&q=320 (a scene-level split is sketched below).
2. The data augmentations used in this code differ slightly from the original FlowNet paper, see #10. When training on Chairs, you should add that.
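
On point 1, the usual pitfall is validating on randomly held-out frames: neighboring Sintel frames are nearly identical, so a random frame split leaks training content into validation and makes validation EPE look far better than test EPE. A hedged sketch of a scene-level split (the directory layout is standard MPI-Sintel; the function name is mine):

import os
import random

def split_sintel_by_scene(sintel_training_dir, val_fraction=0.2, seed=0):
    # Each subfolder of training/final is one scene; hold out whole scenes
    # so near-duplicate neighboring frames can't leak into validation
    scenes = sorted(os.listdir(os.path.join(sintel_training_dir, 'final')))
    random.Random(seed).shuffle(scenes)
    n_val = max(1, int(len(scenes) * val_fraction))
    return scenes[n_val:], scenes[:n_val]  # (train_scenes, val_scenes)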

@tsenst commented Jan 18, 2019

Thanks, I will give it a try, but you mentioned FlowNet2; I want to replicate the PWC-Net results.

@jsczzzk commented Feb 13, 2019

Did you replicate the results successfully?

@tsenst commented Feb 13, 2019

Do you mean for FlowNet2 or PWC-Net?

@jsczzzk commented Feb 13, 2019

PWC-Net

@tsenst commented Feb 13, 2019

Other than the ones reported above, I haven't done any further experiments.

@jsczzzk commented Feb 16, 2019

Thank you so much!

@xianshunw

@tsenst Hi, I also have this problem. Did you find the reason and a corresponding solution?

@HeliosZhao

@tsenst Hi, when I fine-tune the model on MPI-Sintel with your options, the loss and EPE are all 'nan'. Did you meet this problem?

@Blcony commented Sep 16, 2019

> @tsenst Hi, I also have this problem. Did you find the reason and a corresponding solution?

Hi, have you solved the problem?

@xianshunw

> > @tsenst Hi, I also have this problem. Did you find the reason and a corresponding solution?
>
> Hi, have you solved the problem?

No solution, probably because of the data augmentation.

@HeliosZhao commented Sep 17, 2019

@xianshunw @Blcony Hi, I try to fine-tune or train on MPI-Sintel, but the loss and EPE are all 'nan', like this:

2019-09-17 00:36:04 Iter 1000 [Train]: loss=nan, epe=nan, lr=0.000100, samples/sec=6.4, sec/step=0.628, eta=17 days, 10:29:35
2019-09-17 00:36:14 Iter 1000 [Val]: loss=nan, epe=nan

The fine-tuning code is:


from __future__ import absolute_import, division, print_function
from copy import deepcopy

from dataset_base import _DEFAULT_DS_TUNE_OPTIONS
from dataset_mpisintel import MPISintelDataset
from model_pwcnet import ModelPWCNet, _DEFAULT_PWCNET_FINETUNE_OPTIONS

# TODO: You MUST set the dataset root to the correct path on your machine!
_DATASET_ROOT = '/home/zyy/opticalflow/data/'
_MPI_ROOT = _DATASET_ROOT + 'MPI-Sintel'

gpu_devices = ['/device:GPU:0']
controller = '/device:GPU:0'

# TODO: You MUST adjust the batch size based on the memory of your GPU(s)
batch_size = 8

# Load the training dataset
ds_opts = deepcopy(_DEFAULT_DS_TUNE_OPTIONS)
ds_opts['in_memory'] = False                           # Too many samples to keep in memory at once, so don't preload them
ds_opts['aug_type'] = 'heavy'                          # Apply all supported augmentations
ds_opts['batch_size'] = batch_size * len(gpu_devices)  # 8 here, since a single GPU is used
ds_opts['crop_preproc'] = (384, 768)                   # Crop to a smaller input size
ds_opts['train_mode'] = 'fine-tune'

ds_opts['type'] = 'final'
ds_opts['flipud'] = 0          # Only apply horizontal flipping for data augmentation, see [1]
ds_opts['translate'] = (0, 0)  # Only apply horizontal flipping for data augmentation, see [1]
ds_opts['scale'] = (0, 0)      # Only apply horizontal flipping for data augmentation, see [1]

ds = MPISintelDataset(mode='train_with_val', ds_root=_MPI_ROOT, options=ds_opts)

# Display dataset configuration
ds.print_config()

# Start from the default options
nn_opts = deepcopy(_DEFAULT_PWCNET_FINETUNE_OPTIONS)
nn_opts['verbose'] = True
nn_opts['ckpt_path'] = './models/pwcnet-sm-6-2-multisteps-chairsthingsmix/pwcnet.ckpt-592000'
nn_opts['ckpt_dir'] = './pwcnet-sm-6-2-cyclic-mpisintel_finetuned/MPI-Sintel_onlyfinal'
nn_opts['batch_size'] = ds_opts['batch_size']
nn_opts['x_shape'] = [2, ds_opts['crop_preproc'][0], ds_opts['crop_preproc'][1], 3]
nn_opts['y_shape'] = [ds_opts['crop_preproc'][0], ds_opts['crop_preproc'][1], 2]
nn_opts['use_tf_data'] = True  # Use the tf.data reader
nn_opts['gpu_devices'] = gpu_devices
nn_opts['controller'] = controller
nn_opts['train_mode'] = 'fine-tune'

# Use the PWC-Net-small model in quarter-resolution mode
nn_opts['use_dense_cx'] = False
nn_opts['use_res_cx'] = False
nn_opts['pyr_lvls'] = 6
nn_opts['flow_pred_lvl'] = 2

# The robust loss as described in [1] doesn't work, so use the multiscale loss instead
nn_opts['loss_fn'] = 'loss_multiscale'
nn_opts['q'] = 0.4
nn_opts['epsilon'] = 0.01

# Set the learning rate schedule. This schedule is for a single GPU using a batch size of 8.
nn_opts['lr_policy'] = 'multisteps'
nn_opts['init_lr'] = 1e-05
nn_opts['lr_boundaries'] = [80000, 120000, 160000, 200000]
nn_opts['lr_values'] = [1e-05, 5e-06, 2.5e-06, 1.25e-06, 6.25e-07]
nn_opts['max_steps'] = 200000

# Adjust the schedule to the effective batch size and number of GPUs
nn_opts['max_steps'] = int(nn_opts['max_steps'] * 8 / ds_opts['batch_size'])
nn_opts['cyclic_lr_stepsize'] = int(nn_opts['cyclic_lr_stepsize'] * 8 / ds_opts['batch_size'])

# Instantiate the model and display the model configuration
nn = ModelPWCNet(mode='train_with_val', options=nn_opts, dataset=ds)
nn.print_config()

# Train the model
nn.train()

Have you ever met this problem?
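
One hedged way to localize failures like this is to make the graph fail fast at the first non-finite loss and to clip gradients; a minimal TF 1.x sketch, where loss_op and the optimizer wiring are hypothetical stand-ins for what model_pwcnet actually builds:

import tensorflow as tf

def guarded_train_op(loss_op, learning_rate=1e-5, clip_norm=1.0):
    # Raise an informative error the moment the loss goes NaN/Inf,
    # instead of letting it silently poison all later steps
    loss_op = tf.debugging.check_numerics(loss_op, 'loss became NaN/Inf')
    opt = tf.train.AdamOptimizer(learning_rate)
    grads_and_vars = [(g, v) for g, v in opt.compute_gradients(loss_op)
                      if g is not None]
    grads, tvars = zip(*grads_and_vars)
    # Global-norm gradient clipping is a common guard against fine-tuning blow-ups
    grads, _ = tf.clip_by_global_norm(grads, clip_norm)
    return opt.apply_gradients(zip(grads, tvars))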

@Blcony commented Sep 17, 2019

> @xianshunw @Blcony Hi, I try to fine-tune or train on MPI-Sintel, but the loss and EPE are all 'nan' […] Have you ever met this problem?

Hi, I haven't tried to fine-tune on MPI-Sintel, but maybe this link (#7) is helpful for you. Maybe you can try it.

@jeffbaena

@Blcony By any chance, did you manage to implement this (#7) solution? Could you post the code here? I think it should be added between lines 549 and 553 of model_pwcnet.

Thanks,
Stefano

@HeliosZhao

> @xianshunw @Blcony Hi, I try to fine-tune or train on MPI-Sintel, but the loss and EPE are all 'nan' […]

> Hi, I haven't tried to fine-tune on MPI-Sintel, but maybe this link (#7) is helpful for you. Maybe you can try it.

Well, maybe that issue does not solve my problem. I encounter this problem as early as iteration 200, like this:

Start finetuning...
2019-09-17 22:40:18 Iter 100 [Train]: loss=3.39, epe=4.67, lr=0.000010, samples/sec=3.7, sec/step=1.081, eta=5 days, 0:05:35
2019-09-17 22:41:33 Iter 200 [Train]: loss=nan, epe=nan, lr=0.000010, samples/sec=5.6, sec/step=0.710, eta=3 days, 6:48:41
2019-09-17 22:42:37 Iter 300 [Train]: loss=nan, epe=nan, lr=0.000010, samples/sec=6.8, sec/step=0.590, eta=2 days, 17:29:28
2019-09-17 22:43:51 Iter 400 [Train]: loss=nan, epe=nan, lr=0.000010, samples/sec=5.7, sec/step=0.703, eta=3 days, 5:59:56

Thank you very much. Maybe I need to open a new issue.
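
Since the loss is still finite at iteration 100 and only blows up afterwards, one thing worth ruling out is a single bad training sample. A hedged sketch that scans the ground-truth .flo files for non-finite or extreme values (read_flo is a hypothetical reader function, not necessarily this repo's helper):

import numpy as np

def scan_flows(flo_paths, read_flo, limit=1000.0):
    # Flag samples whose ground truth is non-finite or implausibly large;
    # one such sample hit around iteration ~200 would match this failure mode
    for path in flo_paths:
        flow = read_flo(path)  # expected shape (H, W, 2), float32
        peak = np.abs(flow).max()
        if not np.isfinite(flow).all() or peak > limit:
            print(path, 'max |flow| =', peak)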

@lelelexxx

> @xianshunw @Blcony Hi, I try to fine-tune or train on MPI-Sintel, but the loss and EPE are all 'nan' […]
>
> Well, maybe that issue does not solve my problem. I encounter this problem as early as iteration 200 […] Maybe I need to open a new issue.

Hi, I have met the same situation. Moreover, this NaN issue appears not only during fine-tuning but also when pretraining on Chairs_Things_mix. Did you find a solution?

@yaanggny

When I trained the model with an RTX 3090 + TF 1.15, I got NaN in the very first steps (global step 1, 2, etc.). TF 1.x does not support the RTX 3090: TF 1.15.x uses CUDA 10.0, and that configuration reports no errors but produces NaN losses (and even NaN values in the feature maps from the feature_estimator layer). I fixed this by reinstalling TF 1.15 from NVIDIA's maintained build; see https://github.com/nvidia/tensorflow.
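
A quick way to confirm this failure mode before launching a long run is to check that trivial GPU math stays finite under the installed TF/CUDA combination; a minimal TF 1.x sketch:

import tensorflow as tf

# On an unsupported GPU/CUDA combination, even trivial computations can
# return NaN/Inf without raising any error
with tf.device('/device:GPU:0'):
    x = tf.random.normal([1024, 1024])
    y = tf.reduce_sum(tf.matmul(x, x))

with tf.Session() as sess:
    print('finite GPU result:', bool(sess.run(tf.math.is_finite(y))))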
