error in training #142

Open
apuomline opened this issue Nov 20, 2023 · 5 comments

Comments

@apuomline

Hello, I ran into an error during training. What is the specific cause of this error?

python scripts/segmentation_train.py --data_name ISIC --data_dir F:\liuxiao\project\dataset\isbi_3b_medsegdiff --out_dir F:\liuxiao\project\MedSegDiff\outdir --image_size 256 --num_channels 128 --class_cond False --num_res_blocks 2 --num_heads 1 --learn_sigma True --use_scale_shift_norm False --attention_resolutions 16 --diffusion_steps 1000 --noise_schedule linear --rescale_learned_sigmas False --rescale_timesteps False --lr 1e-4 --batch_size 8

error:
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "F:\miniconda\envs\medsegdiff\lib\site-packages\requests\adapters.py", line 486, in send
resp = conn.urlopen(
File "F:\miniconda\envs\medsegdiff\lib\site-packages\urllib3\connectionpool.py", line 844, in urlopen
retries = retries.increment(
File "F:\miniconda\envs\medsegdiff\lib\site-packages\urllib3\util\retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8850): Max retries exceeded with url: /env/main (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000024EDE7FCC40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "F:\miniconda\envs\medsegdiff\lib\site-packages\visdom\__init__.py", line 756, in _send
return self._handle_post(
File "F:\miniconda\envs\medsegdiff\lib\site-packages\visdom\__init__.py", line 720, in _handle_post
r = self.session.post(url, data=data)
File "F:\miniconda\envs\medsegdiff\lib\site-packages\requests\sessions.py", line 637, in post
return self.request("POST", url, data=data, json=json, **kwargs)
File "F:\miniconda\envs\medsegdiff\lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "F:\miniconda\envs\medsegdiff\lib\site-packages\requests\sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "F:\miniconda\envs\medsegdiff\lib\site-packages\requests\adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8850): Max retries exceeded with url: /env/main (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000024EDE7FCC40>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it.'))
[WinError 10061] No connection could be made because the target machine actively refused it.
on_close() takes 1 positional argument but 3 were given
Visdom python client failed to establish socket to get messages from the server. This feature is optional and can be disabled by initializing Visdom with use_incoming_socket=False, which will prevent waiting for this request to timeout.
[W socket.cpp:663] [c10d] The client socket has failed to connect to [::ffff:127.0.1.1]:59878 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
File "scripts/segmentation_train.py", line 118, in <module>
main()
File "scripts/segmentation_train.py", line 26, in main
dist_util.setup_dist(args)
File "F:\liuxiao\project\MedSegDiff\.\guided_diffusion\dist_util.py", line 46, in setup_dist
dist.init_process_group(backend=backend, init_method="env://")
File "F:\miniconda\envs\medsegdiff\lib\site-packages\torch\distributed\c10d_logger.py", line 74, in wrapper
func_return = func(*args, **kwargs)
File "F:\miniconda\envs\medsegdiff\lib\site-packages\torch\distributed\distributed_c10d.py", line 1148, in init_process_group
default_pg, _ = _new_process_group_helper(
File "F:\miniconda\envs\medsegdiff\lib\site-packages\torch\distributed\distributed_c10d.py", line 1268, in _new_process_group_helper
raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in

@Lxycherryup

Have you solved this problem? I ran into it as well.

@Mia01023

Mia01023 commented Jun 5, 2024

same problem

@fayeGou

fayeGou commented Jun 27, 2024

How can this problem be solved?

@Rusab

Rusab commented Jul 11, 2024

Seems like NCCL isn't available on Windows.
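
If that is the cause, one possible workaround is to fall back to the gloo backend when the installed PyTorch build has no NCCL. The sketch below is not the repository's code: the traceback only shows that `setup_dist` passes a `backend` variable to `dist.init_process_group`, so the helper names `choose_backend` and `init_single_process_group`, and the placeholder values for `MASTER_ADDR`, `MASTER_PORT`, `RANK` and `WORLD_SIZE`, are assumptions for a single local process.

```python
import os


def choose_backend(nccl_available: bool, cuda_available: bool) -> str:
    """Return "nccl" only when both NCCL and CUDA are usable, otherwise "gloo"."""
    return "nccl" if (nccl_available and cuda_available) else "gloo"


def init_single_process_group() -> str:
    """Initialise a one-process group with whichever backend this build supports.

    Hypothetical helper, not part of MedSegDiff.
    """
    import torch
    import torch.distributed as dist

    # is_nccl_available() reports whether this PyTorch build includes NCCL.
    nccl_ok = dist.is_available() and dist.is_nccl_available()
    backend = choose_backend(nccl_ok, torch.cuda.is_available())

    # init_method="env://" reads these variables; the defaults below are
    # placeholder values assumed for a single process on the local machine.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")

    dist.init_process_group(backend=backend, init_method="env://")
    return backend
```

The earlier connection errors on port 8850 look like a separate issue: they come from the Visdom client failing to reach a server on localhost, and the final traceback that stops the run is the NCCL one.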
