Code Freezes When Using QPyTorch #60

Open
hasnainnaeem opened this issue Sep 4, 2022 · 9 comments

@hasnainnaeem

My script does not execute; it freezes indefinitely. When I remove the following line from the code, the project runs as expected:
from qtorch.quant import float_quantize, fixed_point_quantize, block_quantize

Has anyone encountered a similar issue? @Tiiiger Any idea why this is happening?
We are unable to proceed with our research because of this issue. If you could spare some time to look into the code, I can send you the files (setup will take ~2 mins).

@Tiiiger
Owner

Tiiiger commented Sep 4, 2022

Hi @hasnainnaeem,

What's your environment? Which PyTorch and CUDA versions are you using?

@hasnainnaeem
Author

Environment Details:
Torch: 1.11.0
Cuda: 11.3
Ubuntu: 22.7
Python: 3.8
GCC: 9.3

@sbulfer

sbulfer commented Oct 12, 2022

I ran into this exact problem. It seems to hang during the just-in-time compilation. I am not sure yet how to fix it. It might require reinstalling PyTorch to clear out some cache or something.

@hasnainnaeem
Author

> I ran into this exact problem. it seems it is hanging during the just-in-time compilation. I am not sure yet how to fix it.. it might require reinstalling PyTorch to clear out some cache or something

Unfortunately, that doesn't fix the issue. I tried reinstalling PyTorch multiple times and also reinstalled the Linux subsystem. Then I tried again on dual-booted Ubuntu, but the issue persisted.

Right now I am working on Colab, where the issue does not occur.

I think it has something to do with the graphics card or its drivers.

@sbulfer

sbulfer commented Oct 12, 2022

I figured it out!
At some point a lock file was generated but never cleared. The steps to fix it are the following:
When the Python file hangs, use Ctrl-C to kill the process. A stack trace should be printed. Mine was the following:
File "", line 1, in
File "/home/lost/.pyenv/versions/3.9.13/lib/python3.9/site-packages/qtorch/quant/__init__.py", line 1, in <module>
from .quant_function import *
File "/home/lost/.pyenv/versions/3.9.13/lib/python3.9/site-packages/qtorch/quant/quant_function.py", line 20, in <module>
quant_cuda = load(
File "/home/lost/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1202, in load
return _jit_compile(
File "/home/lost/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1439, in _jit_compile
baton.wait()
File "/home/lost/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/utils/file_baton.py", line 42, in wait
time.sleep(self.wait_seconds)
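
To make the last two frames easier to read, here is a minimal sketch of how a file-based lock of this kind tends to behave. It is not PyTorch's actual code: `SketchBaton`, `demo_stale_lock`, and the `timeout` parameter are hypothetical names introduced only for illustration. The idea is that one process creates a lock file atomically while it builds the extension, and every other process polls until that file disappears. If the builder dies before removing the file, the waiters poll forever.

```python
import os
import tempfile
import time


class SketchBaton:
    """Simplified file-based lock (illustrative only, not PyTorch's class)."""

    def __init__(self, lock_file_path, wait_seconds=0.1):
        self.lock_file_path = lock_file_path
        self.wait_seconds = wait_seconds
        self.fd = None

    def try_acquire(self):
        # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
        try:
            self.fd = os.open(self.lock_file_path, os.O_CREAT | os.O_EXCL)
            return True
        except FileExistsError:
            return False

    def wait(self, timeout=None):
        # Poll until the lock file disappears. The timeout is an addition for
        # this sketch; without one, a leftover lock file means waiting forever.
        start = time.monotonic()
        while os.path.exists(self.lock_file_path):
            if timeout is not None and time.monotonic() - start > timeout:
                return False
            time.sleep(self.wait_seconds)
        return True

    def release(self):
        # Close the descriptor and delete the file so waiters can proceed.
        if self.fd is not None:
            os.close(self.fd)
            self.fd = None
        os.remove(self.lock_file_path)


def demo_stale_lock():
    with tempfile.TemporaryDirectory() as build_dir:
        lock = os.path.join(build_dir, "lock")
        crashed = SketchBaton(lock)
        acquired_first = crashed.try_acquire()  # first process takes the lock
        os.close(crashed.fd)                    # ...and exits without release()
        waiter = SketchBaton(lock)
        acquired_second = waiter.try_acquire()  # fails: the file still exists
        timed_out = not waiter.wait(timeout=0.3)
        os.remove(lock)                         # the manual fix from this thread
        unblocked = waiter.wait(timeout=0.3)
        return acquired_first, acquired_second, timed_out, unblocked
```

In the demo, the second process can neither acquire the lock nor finish waiting until the leftover file is deleted by hand, which matches the hang at `baton.wait()` in the trace.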

Analyzing this trace, we see that the import is stuck waiting on a file lock. I used pdb to debug the program like so:
python3 -m pdb my_file.py

Within pdb, I set a breakpoint at the line where it waits:
b /home/lost/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/utils/file_baton.py:42

Then press c to continue until the breakpoint is hit.

I then opened the file lock code and noticed an attribute called "self.lock_file_path".
I printed it by typing "self.lock_file_path" in pdb.
Navigate to the directory in that path (the path without the trailing "lock") and delete the lock file.
Your file should now run again :)
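
For anyone who would rather not step through pdb, the search for the lock file can be sketched in a few lines of Python. This is a rough helper under stated assumptions, not an official tool: the function names are hypothetical, it assumes the lock is a file literally named "lock" inside the extension's build directory (consistent with the steps above), and it assumes the build cache lives under `TORCH_EXTENSIONS_DIR` if that variable is set, or under `~/.cache/torch_extensions` otherwise, which is the usual location on Linux. If your path printed by pdb differs, pass that directory instead.

```python
import os
from pathlib import Path


def default_extensions_root():
    # Assumption: TORCH_EXTENSIONS_DIR overrides the cache location; otherwise
    # fall back to the usual Linux default, ~/.cache/torch_extensions.
    override = os.environ.get("TORCH_EXTENSIONS_DIR")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "torch_extensions"


def find_lock_files(root):
    # Return every regular file named "lock" below root, sorted for stable output.
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("lock") if p.is_file())


def remove_lock_files(root, dry_run=True):
    # With dry_run=True nothing is deleted; the candidates are only reported.
    found = find_lock_files(root)
    if not dry_run:
        for path in found:
            path.unlink()
    return found


if __name__ == "__main__":
    # List candidates first; rerun with dry_run=False once they look right.
    for path in remove_lock_files(default_extensions_root(), dry_run=True):
        print(path)
```

Only delete a lock file when no other process is actually compiling the extension, since a lock that belongs to a live build is doing its job.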

@hasnainnaeem
Author

Awesome! Thanks for letting me know.

I knew it had something to do with some lock file, but I couldn't find the lock file.

@sbulfer

sbulfer commented Oct 12, 2022

I'm glad I could help :)

@RuokaiYin

Thank you very much for the solution! I have no idea why I suddenly ran into the same situation, but the solution fixed the problem! (The code worked normally for weeks, then suddenly froze...)

@Tiiiger
Owner

Tiiiger commented Nov 16, 2022

Hi all on this thread,

Thank you all for sharing your knowledge here. I have become too busy to maintain this repo and have not tested it on more recent environments.

Sorry about this!

Bests,
Tianyi
