request for cityscapes related configs and pretrain weight #4

Open
Cloveryww opened this issue Jun 27, 2022 · 3 comments

@Cloveryww

Hi,
First of all, thanks for your great work. Could you please release the configs and pretrained weights for the Cityscapes dataset?
I have seen some of the Cityscapes-related files in the repo, but some config files are still missing.

@GuoleiSun (Owner)

Thanks for your interest. Initially, we did not release the configs and pre-trained weights for cityscapes because video data of cityscapes is extremely large (>300 GB). Nevertheless, we will release the files for cityscapes later.

@Cloveryww (Author)

Hi,
I have some questions about running CFFM on the Cityscapes dataset.

  1. Training CFFM on Cityscapes is semi-supervised because of its sparse annotations, so I changed the dataset and loss-computation code as listed below. Does any other important code need adjustment? (A sketch of how I supervise only the annotated frame follows this list.)
    loss compute:

    if seg_logit.shape[1] == seg_label.shape[1] + 1:  # k+1 predicted frames vs. k labelled frames
        seg_logit_ori = seg_logit[:, :-1]
        batch_size, num_clips, _, h, w = seg_logit_ori.shape
        seg_logit_ori = seg_logit_ori.reshape(batch_size * num_clips, -1, h, w)
        seg_logit_lastframe = seg_logit[:, -1]
        batch_size, num_clips, _, h, w = seg_label.shape
        seg_label_ori = seg_label.reshape(batch_size * num_clips, -1, h, w)
        seg_label_lastframe = seg_label[:, -1]

    seg label in dataset:

    try:
        img_anns = []
        for ii in dilation_used:
            img_info_one = {}
            filename = img_info['filename']
            seg_map = img_info['ann']['seg_map']
            # shift the frame index embedded in the filename by ii to get a neighbouring frame
            value_i_splits = filename.split('_')
            im_name_new = "_".join(
                value_i_splits[:-2]
                + [(str(int(value_i_splits[-2]) + ii)).rjust(6, "0")]
                + value_i_splits[-1:])
            # value_i_splits = seg_map.split('_')
            # seg_map_new = "_".join(
            #     value_i_splits[:-2] + [(str(int(value_i_splits[-2]) - ii)).rjust(6, "0")] + value_i_splits[-1:])
            # each neighbouring frame keeps the seg_map of the annotated reference frame
            img_info_one['filename'] = im_name_new
            img_info_one['ann'] = dict(seg_map=seg_map)
            ann_info_one = img_info_one['ann']
            img_anns.append([img_info_one, ann_info_one])
            if not os.path.isfile(self.img_dir + '/' + im_name_new):
                assert False

  2. If the model runs on Cityscapes with a crop size of 512x1024, should the parameters of the core CFFM module (such as in the code below) stay the same as for VSPW? If not, how should they be adjusted, for example "input_resolution"? (My guess is sketched after this list.)

    self.decoder_focal = BasicLayer3d3(
        dim=embedding_dim,
        input_resolution=(60, 60),
        depth=depths,
        num_heads=8,
        window_size=7,
        mlp_ratio=4.,
        qkv_bias=True,
        qk_scale=None,
        drop=0.,
        attn_drop=0.,
        drop_path=0.,
        norm_layer=nn.LayerNorm,
        pool_method='fc',
        downsample=None,
        focal_level=2,
        focal_window=5,
        expand_size=3,
        expand_layer="all",
        use_conv_embed=False,
        use_shift=False,
        use_pre_norm=False,
        use_checkpoint=False,
        use_layerscale=False,
        layerscale_value=1e-4,
        focal_l_clips=[7, 4, 2],
        focal_kernel_clips=[7, 5, 3])

  3. Does the current version of the code not support efficient inference on video data (i.e., reaching the FPS reported in the paper), or did I simply not find the corresponding code?
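To make question 1 concrete, here is a minimal sketch of how I currently compute the loss on the single annotated frame; the assumption that only the last frame of each Cityscapes clip carries ground truth is mine, and `clip_loss` is just an illustrative helper, not code from this repo:

    import torch.nn.functional as F

    def clip_loss(seg_logit, seg_label, ignore_index=255):
        # seg_logit: [B, k+1, C, H, W] -- per-frame predictions plus the refined target-frame prediction
        # seg_label: [B, 1, H, W]      -- only the target (last) frame of the clip is annotated
        logit_last = seg_logit[:, -1]                 # prediction for the annotated frame, [B, C, H, W]
        label_last = seg_label.squeeze(1).long()      # [B, H, W]
        return F.cross_entropy(logit_last, label_last, ignore_index=ignore_index)

Is supervising only this frame sufficient, or should the predictions for the unlabelled frames also get a loss term?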
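For question 2, my working assumption (please correct me if it is wrong) is that input_resolution is the spatial size of the decoder features, i.e. the training crop size divided by the feature stride, so that (60, 60) would correspond to a 480x480 VSPW crop at 1/8 stride. Under that assumption, a 512x1024 Cityscapes crop would give:

    # hypothetical calculation; feat_stride = 8 is my assumption about the decoder feature stride
    crop_h, crop_w = 512, 1024
    feat_stride = 8
    input_resolution = (crop_h // feat_stride, crop_w // feat_stride)   # -> (64, 128)

Is that the right way to set it?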

I sincerely look forward to your reply. Thank you!

@ydhongHIT commented Jun 11, 2023

> Thanks for your interest. Initially, we did not release the configs and pre-trained weights for cityscapes because video data of cityscapes is extremely large (>300 GB). Nevertheless, we will release the files for cityscapes later.

Hi, when are you going to provide the config files for Cityscapes? It would also be very kind of you to provide brief instructions for dataset preparation. Thanks.
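In case it helps to pin down what instructions are needed, my guess at the expected data layout (standard Cityscapes gtFine plus the leftImg8bit_sequence package, which is what makes the video data >300 GB) would be something like:

    data/cityscapes/
    ├── leftImg8bit_sequence/   # 30-frame snippets; as far as I understand, the 20th frame of each snippet is the annotated one
    │   ├── train/
    │   └── val/
    └── gtFine/                 # standard Cityscapes fine annotations
        ├── train/
        └── val/

Is that the structure the Cityscapes config will assume? Please correct me if I am wrong.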
