
[bug] cannot load cambrian-34b #12

Open · CSEEduanyu opened this issue Jun 28, 2024 · 16 comments
Labels: bug (Something isn't working)
@CSEEduanyu

```
in load_pretrained_model
    model = CambrianLlamaForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3531, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3958, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 812, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py", line 348, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([1024, 1152]) in "weight" (which has shape torch.Size([1024, 1024])), this look incorrect.
```
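For anyone debugging a mismatch like this, here is a minimal diagnostic sketch (not from this thread; the local path is a placeholder, and it assumes the checkpoint is sharded safetensors) that prints the on-disk shape of each projector tensor so the offending parameter can be located:

```python
# Print the stored shape of every mm_projector_aux_* tensor to find the
# parameter behind the 1024x1152-vs-1024x1024 mismatch.
import json
import os
from safetensors import safe_open

model_dir = "/path/to/cambrian-34b"  # placeholder local checkpoint dir

with open(os.path.join(model_dir, "model.safetensors.index.json")) as f:
    weight_map = json.load(f)["weight_map"]

for name, shard in sorted(weight_map.items()):
    if "mm_projector_aux" in name:
        with safe_open(os.path.join(model_dir, shard), framework="pt") as sf:
            print(name, tuple(sf.get_slice(name).get_shape()))
```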

@ellisbrown added the bug label Jun 28, 2024
@penghao-wu (Contributor)

Hi, could you please provide more information about your case (e.g., the device_map used for loading and the number of GPUs)? Also, could you try loading the 8B/13B model to see whether the same problem happens?

@CSEEduanyu (Author)

transformers in my env is 4.39. Why must transformers==4.37.0 be pinned in the dependencies?

@CSEEduanyu (Author)

All the dependencies are pinned with "==". I wonder if ">" is OK?
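For reference, a sketch of what a relaxed requirement line could look like (illustrative only; the maintainers only verified the pinned versions, as the next reply notes):

```
transformers==4.37.0        # exact pin, as shipped in the repo
transformers>=4.37.0,<4.40  # bounded range: allows newer, caps untested releases
```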

@penghao-wu (Contributor)

Our training and evaluation were mainly conducted with the specified versions, and we haven't extensively tested higher versions to ensure correctness. That said, I have tested running the 34B model with transformers==4.39.0 and it works fine. Could you provide the device_map you use for loading and the number of GPUs? Also, which version of accelerate do you have?

@CSEEduanyu (Author)

> Our training and evaluation were mainly conducted with the specified versions, and we haven't extensively tested higher versions to ensure correctness. That said, I have tested running the 34B model with transformers==4.39.0 and it works fine. Could you provide the device_map you use for loading and the number of GPUs? Also, which version of accelerate do you have?

A100*8

@CSEEduanyu (Author)

```
Loading checkpoint shards:  94%|█████████▍| 30/32 [00:19<00:01, 1.57it/s]
Traceback (most recent call last):
    model = CambrianLlamaForCausalLM.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3852, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4286, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 807, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 285, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([1024, 1152]) in "weight" (which has shape torch.Size([1024, 1024])), this look incorrect.
```

@CSEEduanyu (Author)

When I add some logging, the failure occurs while loading "model.mm_projector_aux_0.0.weight".
@penghao-wu

@CSEEduanyu (Author)

Is it because I only kept the second one in mm_vision_tower_aux_list?

@penghao-wu (Contributor)

> Is it because I only kept the second one in mm_vision_tower_aux_list?

What do you mean by this? You don't need to modify the config if you want to load our trained model.

@CSEEduanyu (Author)

> > Is it because I only kept the second one in mm_vision_tower_aux_list?
>
> What do you mean by this? You don't need to modify the config if you want to load our trained model.

Because I can only load models from a local path. Can you list the Hugging Face download addresses for these four vision models?

"mm_vision_tower_aux_list": [
"siglip/CLIP-ViT-SO400M-14-384",
"openai/clip-vit-large-patch14-336",
"facebook/dinov2-giant-res378",
"clip-convnext-XXL-multi-stage"
],
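If the goal is purely local loading, one untested sketch is to rewrite these four entries in the checkpoint's config.json to local directories, keeping their order and count so each mm_projector_aux_{i} still matches its tower's feature width. All paths below are placeholders, and as the reply further down notes, each encoder's loading code may also need adjusting:

```python
# Point the aux vision towers at local copies; order and count must be preserved.
import json

cfg_path = "/path/to/cambrian-34b/config.json"  # placeholder
local_paths = [  # placeholder local copies of the four encoders
    "/models/siglip/CLIP-ViT-SO400M-14-384",
    "/models/openai/clip-vit-large-patch14-336",
    "/models/facebook/dinov2-giant-res378",
    "/models/clip-convnext-XXL-multi-stage",
]

with open(cfg_path) as f:
    cfg = json.load(f)
cfg["mm_vision_tower_aux_list"] = local_paths

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```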

@CSEEduanyu (Author)

For example, CLIP-ViT-SO400M-14-384 seems to have many versions, and I can't find clip-convnext-XXL-multi-stage on Hugging Face.

@penghao-wu (Contributor)

CLIP-ViT-SO400M-14-384 should be hf-hub:timm/ViT-SO400M-14-SigLIP-384, and clip-convnext-XXL-multi-stage should be hf-hub:laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup. If you use local paths, you might need to look into the loading code for each of the vision encoders in the cambrian/model/multimodal_encoder folder to ensure correctness.
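To fetch local copies of the two repos named above, a sketch using huggingface_hub (the hf-hub: prefix is an open_clip convention and is dropped here; the local_dir values are placeholders):

```python
# Download the two encoder repos to local directories for offline loading.
from huggingface_hub import snapshot_download

snapshot_download("timm/ViT-SO400M-14-SigLIP-384",
                  local_dir="/models/siglip/CLIP-ViT-SO400M-14-384")
snapshot_download("laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-soup",
                  local_dir="/models/clip-convnext-XXL-multi-stage")
```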

@dionren commented Jun 30, 2024

Hi, how can I set this up on two 48G GPUs?

```
2024-06-30 15:21:12 PID=57 __init__.py:49 setup_logging() INFO → 'standard' logger initialized.
2024-06-30 15:21:13 PID=57 model_worker.py:274 <module>() INFO → args: Namespace(host='0.0.0.0', port=40000, worker_address='http://localhost:40000', controller_address='http://localhost:10000', model_path='/mnt/cpn-pod/models/nyu-visionx/cambrian-34b', model_base=None, model_name=None, device='cuda', multi_modal=False, limit_model_concurrency=5, stream_interval=1, no_register=False, load_8bit=False, load_4bit=False)
2024-06-30 15:21:13 PID=57 model_worker.py:66 __init__() INFO → Loading the model cambrian-34b on worker b48646 ...
2024-06-30 15:21:13 PID=57 builder.py:119 load_pretrained_model() INFO → Loading Cambrian from /mnt/cpn-pod/models/nyu-visionx/cambrian-34b
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/cambrian/cambrian/serve/model_worker.py", line 279, in <module>
    worker = ModelWorker(args.controller_address,
  File "/root/cambrian/cambrian/serve/model_worker.py", line 67, in __init__
    self.tokenizer, self.model, self.image_processor, self.context_len = load_pretrained_model(
  File "/root/cambrian/cambrian/model/builder.py", line 120, in load_pretrained_model
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 814, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2029, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2261, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama.py", line 178, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama.py", line 203, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
```

@penghao-wu (Contributor)

> return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
> TypeError: not a string

This error does not seem related to multiple GPUs. Make sure that all model files were downloaded correctly (e.g., tokenizer.model).
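A quick way to check that (illustrative sketch; the path matches the log above): confirm tokenizer.model exists and parses, since a missing or partial file can surface as exactly this "not a string" TypeError when the tokenizer's vocab_file ends up as None:

```python
# Verify tokenizer.model is present and loadable with sentencepiece.
import os
import sentencepiece as spm

path = "/mnt/cpn-pod/models/nyu-visionx/cambrian-34b/tokenizer.model"
print("exists:", os.path.exists(path))

sp = spm.SentencePieceProcessor()
sp.Load(path)  # raises if the file is missing or corrupt
print("vocab size:", sp.GetPieceSize())
```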

@penghao-wu (Contributor)

@dionren Some of the vision encoders are not from transformers and do not support device_map, so there are problems when setting device_map="auto" across multiple GPUs. We are still working on converting the vision encoders to support this.

But I have a workaround for your case with two 48G GPUs. It involves the following modifications:

1. Modify the beginning of cambrian/model/builder.py:

   ```python
   from accelerate import infer_auto_device_map, dispatch_model

   def load_pretrained_model(model_path, model_base, model_name, load_8bit=False, load_4bit=False, device_map="auto", device="cuda", **kwargs):
       device_map = 'sequential'
       kwargs = {"device_map": device_map, "max_memory": {0: "30GIB", 1: "49GIB"}, **kwargs}
   ```

2. Change

   ```python
   cur_latent_query_with_newline = torch.cat([cur_latent_query, cur_newline_embd], 2).flatten(1,2)
   ```

   to

   ```python
   cur_latent_query_with_newline = torch.cat([cur_latent_query, cur_newline_embd.to(cur_latent_query.device)], 2).flatten(1,2)
   ```
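After loading, the resulting placement can be inspected through the model's hf_device_map (set by transformers whenever a device_map is used; `model` here is the second value returned by load_pretrained_model). The 30GIB cap on GPU 0 presumably leaves headroom there for the vision encoders that sit outside the device map. A rough sketch:

```python
# Inspect how 'sequential' + max_memory spread the modules over the two GPUs.
from collections import Counter

print(Counter(model.hf_device_map.values()))  # module count per device
for name, dev in model.hf_device_map.items():
    if "mm_projector" in name:
        print(name, "->", dev)
```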

@dionren commented Jun 30, 2024

> @dionren Some of the vision encoders are not from transformers and do not support device_map [...] But I have a workaround for your case with two 48G GPUs.

I'm gonna try it out. Thanks a ton for your help and the awesome work you've done. It's truly impressive.
