Hi, thanks for your great work! There's a strange problem whenever I try to run the code on my own dataset.
I have already converted my dataset to the VSPW dataset format, but I hit a bug I can't solve:
  File "./tools/train.py", line 188, in <module>
    main()
  File "./tools/train.py", line 177, in main
    train_segmentor(
  File "/root/fuzhouquan/VSS-CFFM-main/mmseg/apis/train.py", line 115, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/root/anaconda3/envs/cffm/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 131, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/root/anaconda3/envs/cffm/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/root/anaconda3/envs/cffm/lib/python3.8/site-packages/mmcv/parallel/distributed.py", line 51, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/root/fuzhouquan/VSS-CFFM-main/mmseg/models/segmentors/base.py", line 160, in train_step
    print('loss:', losses)
  File "/root/anaconda3/envs/cffm/lib/python3.8/site-packages/torch/tensor.py", line 193, in __repr__
    return torch._tensor_str._str(self)
  File "/root/anaconda3/envs/cffm/lib/python3.8/site-packages/torch/_tensor_str.py", line 383, in _str
    return _str_intern(self)
  File "/root/anaconda3/envs/cffm/lib/python3.8/site-packages/torch/_tensor_str.py", line 358, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/root/anaconda3/envs/cffm/lib/python3.8/site-packages/torch/_tensor_str.py", line 242, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/root/anaconda3/envs/cffm/lib/python3.8/site-packages/torch/_tensor_str.py", line 90, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: an illegal memory access was encountered
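(Side note: since CUDA reports errors asynchronously, the traceback above blames whichever op happened to synchronize first, here the `print` of the loss, rather than the kernel that actually faulted. Forcing synchronous launches with the standard PyTorch environment variable should make the trace land on the real failing op; the variable must be set before CUDA is initialized.)

```python
import os

# CUDA kernels launch asynchronously, so an illegal memory access is
# reported at the next synchronization point (printing the loss above),
# not where it happened. Setting this BEFORE any CUDA work makes every
# launch synchronous, so the traceback names the faulting kernel.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
print(os.environ["CUDA_LAUNCH_BLOCKING"])  # -> 1
```

In practice you would set it in the shell instead, e.g. `CUDA_LAUNCH_BLOCKING=1 python ./tools/train.py ...`.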
I'm sure the GPU memory is adequate, because the raw VSPW dataset runs fine on my 4 A800s. But on my dataset there always seems to be a problem when computing the loss.
When I use your debug lines in ./mmseg/models/segmentors/base.py (lines 155 to 157):
print(type(data_batch))
print(data_batch.keys())
print(data_batch['img'].shape, data_batch['gt_semantic_seg'].shape) # torch.Size([1, 3, 3, 480, 480]) torch.Size([1, 3, 1, 480, 480])
on my dataset, they print:
<class 'dict'>
dict_keys(['img_metas', 'img', 'gt_semantic_seg'])
torch.Size([4, 4, 3, 480, 480]) torch.Size([4, 4, 1, 480, 480])
<class 'dict'>
dict_keys(['img_metas', 'img', 'gt_semantic_seg'])
torch.Size([4, 4, 3, 480, 480]) torch.Size([4, 4, 1, 480, 480])
<class 'dict'>
dict_keys(['img_metas', 'img', 'gt_semantic_seg'])
torch.Size([4, 4, 3, 480, 480]) torch.Size([4, 4, 1, 480, 480])
<class 'dict'>
dict_keys(['img_metas', 'img', 'gt_semantic_seg'])
torch.Size([4, 4, 3, 480, 480]) torch.Size([4, 4, 1, 480, 480])
Thanks for your interest. I assume you have already run our code on the VSPW dataset and would now like to use it on your own dataset; if not, please try our dataset first.
I have not met this issue before, but I think you might have forgotten to adjust some dimensions specific to your own dataset, such as the number of classes. I would suggest checking the dimensions of the tensors. Please let me know if you have further questions.
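For example, a ground-truth label outside [0, num_classes) (other than the ignore index) is a classic cause of exactly this illegal-memory-access inside the loss. A quick sanity check along these lines might help (`check_labels` and the toy values are placeholders of mine, shown with numpy rather than torch, but `torch.unique` works the same way):

```python
import numpy as np

def check_labels(gt, num_classes, ignore_index=255):
    """Return label values that would index out of range in a
    cross-entropy loss -- a classic cause of CUDA illegal memory
    access during segmentation training."""
    vals = np.unique(np.asarray(gt))
    vals = vals[vals != ignore_index]          # ignore index is allowed
    return vals[(vals < 0) | (vals >= num_classes)]

# toy mask with 3 classes: label 7 is out of range and would crash the loss
mask = np.array([[0, 1, 2],
                 [7, 255, 1]])
print(check_labels(mask, num_classes=3))       # -> [7]
```

Running this over `data_batch['gt_semantic_seg']` for a few batches of your dataset would quickly confirm or rule out this cause.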
On the VSPW dataset, the same debug lines print:
<class 'dict'>
dict_keys(['img_metas', 'img', 'gt_semantic_seg'])
torch.Size([2, 4, 3, 480, 480]) torch.Size([2, 4, 1, 480, 480])
<class 'dict'>
dict_keys(['img_metas', 'img', 'gt_semantic_seg'])
torch.Size([2, 4, 3, 480, 480]) torch.Size([2, 4, 1, 480, 480])
<class 'dict'>
dict_keys(['img_metas', 'img', 'gt_semantic_seg'])
torch.Size([2, 4, 3, 480, 480]) torch.Size([2, 4, 1, 480, 480])
<class 'dict'>
dict_keys(['img_metas', 'img', 'gt_semantic_seg'])
torch.Size([2, 4, 3, 480, 480]) torch.Size([2, 4, 1, 480, 480])
So it's true that the first dimension is not equal. Why is that, and how can I solve it?
Thanks so much! I'd appreciate it if you could help me out.
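(For reference, in mmseg-style configs the leading dimension of `data_batch['img']` is the per-GPU batch size, so a 4-vs-2 difference like the one above usually traces back to `samples_per_gpu` in the dataset config. The values below are illustrative, not this repo's exact file:)

```python
# Illustrative mmseg-style dataloader settings: the leading dimension of
# data_batch['img'] (4 on the custom dataset vs. 2 on VSPW above) comes
# from samples_per_gpu, the per-GPU batch size.
data = dict(
    samples_per_gpu=2,  # -> torch.Size([2, 4, 3, 480, 480])
    workers_per_gpu=2,  # dataloader worker count, unrelated to shapes
)
print(data["samples_per_gpu"])  # -> 2
```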