[Bug] #64

Closed
Artorije opened this issue Jul 5, 2024 · 2 comments
Artorije commented Jul 5, 2024

I am using OpenMixup to train AdAutoMix, but the program crashed unexpectedly during training. The command I ran was:

./tools/dist_train.sh configs/classification/cifar100/adautomix/basic/r18_l2_a1_bili_mlr5e_2.py 1 --auto_resume

Partway through the run, the following errors were printed:
[>>>>>>>>>>>>>> ] 29/100, 33.3 task/s, elapsed: 1s, ETA: 2s
Exception ignored in: <function Image.__del__ at 0x7f6323d3e430>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 4017, in __del__
self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f6323d82af0>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 363, in __del__
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f6323d82af0>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 363, in __del__
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f6323d82af0>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 363, in __del__
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f6323d82af0>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 363, in __del__
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7f6323d3e430>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 4017, in __del__
self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f6323d82af0>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 363, in __del__
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f6323d82af0>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 363, in __del__
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f6323d82af0>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 363, in __del__
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f6323d82af0>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 363, in __del__
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Image.__del__ at 0x7f6323d3e430>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 4017, in __del__
self.tk.call('image', 'delete', self.name)
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f6323d82af0>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 363, in __del__
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f6323d82af0>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 363, in __del__
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f6323d82af0>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 363, in __del__
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Exception ignored in: <function Variable.__del__ at 0x7f6323d82af0>
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/tkinter/__init__.py", line 363, in __del__
if self._tk.getboolean(self._tk.call("info", "exists", self._name)):
RuntimeError: main thread is not in main loop
Tcl_AsyncDelete: async handler deleted by the wrong thread
E0705 00:40:46.738942 139874288846656 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: -6) local_rank: 0 (pid: 352411) of binary: /home/zhou/anaconda3/envs/openmixup/bin/python
Traceback (most recent call last):
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/site-packages/torch/distributed/launch.py", line 198, in <module>
main()
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/site-packages/torch/distributed/launch.py", line 194, in main
launch(args)
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/site-packages/torch/distributed/launch.py", line 179, in launch
run(args)
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/site-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/zhou/anaconda3/envs/openmixup/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

tools/train.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2024-07-05_00:40:46
host : zhou
rank : 0 (local_rank: 0)
exitcode : -6 (pid: 352411)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 352411

Related information

Some references suggest that the cause may be the parallel program calling the tkinter package, which does not support being used from multiple threads.
joblib/joblib#807

Artorije added the bug label Jul 5, 2024

Lupin1998 (Member) commented

Hi, @Artorije, sorry for the late reply. OpenMixup itself does not require the tkinter package, so could you please provide more details about your conda environment? You might try uninstalling the package and running again.
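
For anyone asked for the same environment details, a small report like the one below may be enough to paste into the issue. It is only a sketch: the function name `environment_report` and the dictionary keys are made up for illustration and are not part of OpenMixup, and matplotlib is queried only if it happens to be installed.

```python
import importlib.util
import platform
import sys


def environment_report() -> dict:
    """Collect details relevant to a tkinter / matplotlib backend problem."""
    report = {
        "python": platform.python_version(),
        # Whether a tkinter package can be located at all in this environment.
        "tkinter_package_found": importlib.util.find_spec("tkinter") is not None,
        # Whether something has already imported tkinter in this process.
        # Recorded before touching matplotlib, which may import it itself.
        "tkinter_loaded": "tkinter" in sys.modules,
        "matplotlib_version": None,
        "matplotlib_backend": None,
    }
    if importlib.util.find_spec("matplotlib") is not None:
        import matplotlib

        report["matplotlib_version"] = matplotlib.__version__
        # get_backend() may resolve the default backend as a side effect.
        report["matplotlib_backend"] = matplotlib.get_backend()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

A GUI backend such as TkAgg in the last field would be consistent with the tkinter errors in the log above, although the report alone does not prove that matplotlib is the source.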

Lupin1998 self-assigned this Jul 10, 2024

Artorije (Author) commented

I fixed this issue by importing matplotlib and calling matplotlib.use('Agg').
I am closing the issue now. Thanks!
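
The fix described above can be sketched as follows. This is a minimal illustration, not the exact change made by the author: the helper name `use_headless_backend` is hypothetical, and where to call it (for example near the top of the training entry point, before anything imports `matplotlib.pyplot`) is an assumption. Setting the `MPLBACKEND` environment variable is an extra step not mentioned in the thread; matplotlib reads it at import time, so child processes that inherit the environment pick up the same backend.

```python
import importlib.util
import os


def use_headless_backend(name: str = "Agg") -> str:
    """Select a non-GUI matplotlib backend so that tkinter is never used.

    Call this before matplotlib.pyplot is imported anywhere in the process.
    """
    # Read by matplotlib when it is first imported; inherited by subprocesses.
    os.environ["MPLBACKEND"] = name
    if importlib.util.find_spec("matplotlib") is not None:
        import matplotlib

        # The call reported as the fix in this thread.
        matplotlib.use(name)
        return matplotlib.get_backend()
    # matplotlib is not installed; only the environment variable was set.
    return name


if __name__ == "__main__":
    print("matplotlib backend:", use_headless_backend())
```

Agg renders to in-memory buffers and files only, so figures can still be saved with `savefig`, but no window can be shown.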
