
MuseV still reports OOM during image-to-video generation even with multiple GPUs #141

Open

youtianhong opened this issue Jun 17, 2024 · 9 comments

Comments

youtianhong commented Jun 17, 2024

Background:

On startup the process uses about 12 GB of VRAM. When I then run image-to-video generation from the Gradio UI, it reports OOM:

return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 176.00 MiB (GPU 0; 14.57 GiB total capacity; 13.61 GiB already allocated; 118.75 MiB free; 14.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My question: my production machine has 4 GPUs with 16 GB each (64 GB total). How can MuseV be told to spread its VRAM usage across multiple cards automatically? Right now it takes 12 GB at startup and OOMs as soon as I run a job in Gradio. Pinning everything to a single card doesn't help either: one card can't handle it, since step 1 and step 2 both land on the same card and exhaust it.
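The OOM message itself suggests trying `max_split_size_mb`. As a minimal sketch (this only mitigates allocator fragmentation, it does not add capacity, and the value 128 here is just an illustrative choice), the variable must be set before PyTorch initializes its CUDA allocator:

```python
import os

# Set this before "import torch" (or at least before the first CUDA
# allocation); once the allocator is configured, the value is ignored.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Alternatively, export `PYTORCH_CUDA_ALLOC_CONF` in the shell before launching `python app.py`.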

xzqjack (Contributor) commented Jun 17, 2024

@youtianhong With the current Gradio script, image2video and image+middle2video do indeed both run on one card. You could move one of them (e.g. image+middle2video) to a different card: torch supports addressing specific cards with device strings like "cuda:0" and "cuda:1".

youtianhong (Author) commented

Brother, can you be more specific? What do you mean by "moving one of them"? I want startup to occupy GPU 1 and the Gradio image-to-video run to occupy GPU 2; how do I configure that?

youtianhong (Author) commented

It took enormous effort just to get this running (a pile of errors, I even had to patch the source), and then the actual image-to-video run OOMs.

xzqjack (Contributor) commented Jun 17, 2024


https://github.com/TMElyralab/MuseV/blob/main/scripts/gradio/gradio_video2video.py#L191
Right now both image2video and video2video are set to device="cuda", which torch resolves to "cuda:0" by default, so you can try changing the video2video one to device="cuda:1".
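A minimal sketch of that suggestion (the `pick_device` helper is hypothetical, purely to illustrate torch's `cuda:N` addressing; it falls back to CPU when the requested card doesn't exist):

```python
import torch

def pick_device(index: int) -> torch.device:
    """Hypothetical helper: return cuda:<index> if that card exists, else CPU."""
    if torch.cuda.is_available() and index < torch.cuda.device_count():
        return torch.device(f"cuda:{index}")
    return torch.device("cpu")

# e.g. keep text2video/image2video on card 0 and move video2video to card 1
t2v_device = pick_device(0)
v2v_device = pick_device(1)
```

Whatever device you choose, every model and tensor that participates in the same forward pass must be moved to it explicitly with `.to(device)`, otherwise parts of the graph stay on the default card.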

youtianhong (Author) commented

Thanks for the reply, but it doesn't seem to work. I set device = "cuda:1" in gradio_text2video.py and "cuda:2" in gradio_video2video.py, then launched directly with python app.py (Gradio). Below is the GPU memory usage after startup; three cards are occupied:

| Processes:                                                          |
|  GPU   GI    CI    PID     Type   Process name            GPU Memory|
|        ID    ID                                           Usage     |
|=====================================================================|
|   0    N/A   N/A   3261    C      python                   132MiB   |
|   1    N/A   N/A   3261    C      python                  5608MiB   |
|   2    N/A   N/A   3261    C      python                  6512MiB   |
|   3    N/A   N/A   19947   C      /usr/local/bin/ollama    100MiB   |

The error is as follows (does everything have to be on one card? What did I set wrong?):
File "/data/env/digital-human/musev/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/data/env/digital-human/musev/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution)
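That traceback is the classic symptom of feeding a tensor that still lives on one card into a module whose weights were moved to another. A hedged, generic pattern (not MuseV's actual code) is to derive the target device from the module itself instead of assuming a global default:

```python
import torch

def run_on_model_device(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Ask the model where its weights live and move the input there;
    # this avoids "Expected all tensors to be on the same device".
    device = next(model.parameters()).device
    return model(x.to(device))

conv = torch.nn.Conv2d(3, 8, kernel_size=3)  # on CPU here; cuda:1 in the issue
out = run_on_model_device(conv, torch.randn(1, 3, 16, 16))
```

In a script like gradio_video2video.py, every intermediate tensor handed between pipelines would need the same `.to(device)` treatment for the per-card device setting to hold.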

youtianhong (Author) commented

@xzqjack Following your advice, I now changed only video2video to device="cuda:2" and removed the setting from text2video (so it defaults to card 0), but it still fails (OOM again this time).
Is it hopeless even with multiple GPUs? Please advise.
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI    CI    PID     Type   Process name            GPU Memory                    |
|        ID    ID                                           Usage                         |
|=========================================================================================|
|   0    N/A   N/A   29536   C      python                 13614MiB                       |
|   2    N/A   N/A   29536   C      python                  6512MiB                       |
|   3    N/A   N/A   19947   C      /usr/local/bin/ollama    100MiB                       |
+-----------------------------------------------------------------------------------------+

xzqjack (Contributor) commented Jun 17, 2024


@youtianhong It looks like the multi-card mechanism itself works; the script just hasn't fully adapted to the device switch everywhere.

youtianhong (Author) commented

Could you fix this on your side and cut a release? Thanks!

