Nicer error message for undefined symbol #1339

Merged
merged 7 commits into from
Jul 4, 2024


dakinggg
Collaborator

@dakinggg dakinggg commented Jul 4, 2024

Adds a nicer error message for the most common case of the flash attention install getting messed up.

Before:

ImportError: /usr/lib/python3/dist-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
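
For readers wondering why this symbol points at PyTorch: `_ZN3c104cuda9SetDeviceEi` is an Itanium C++ ABI mangled name that reads as `c10::cuda::SetDevice(int)`, and `c10` is a PyTorch core library namespace. The sketch below is not part of this PR; `nested_name_parts` is a hypothetical helper that only handles the simple `_ZN<len><name>...E` form seen here, not general demangling.

```python
def nested_name_parts(mangled: str) -> list[str]:
    """Split a simple Itanium-ABI nested name (_ZN ... E) into its components.

    Hypothetical helper for illustration only: it does not handle templates,
    substitutions, or any other part of the full mangling grammar.
    """
    prefix = "_ZN"
    if not mangled.startswith(prefix):
        raise ValueError("only simple _ZN...E nested names are handled")

    i = len(prefix)
    parts = []
    while i < len(mangled) and mangled[i] != "E":
        # Each component is written as <decimal length><identifier>.
        j = i
        while j < len(mangled) and mangled[j].isdigit():
            j += 1
        if j == i:
            raise ValueError("expected a length prefix at position %d" % i)
        length = int(mangled[i:j])
        parts.append(mangled[j:j + length])
        i = j + length
    return parts


# The trailing "i" after "E" encodes a single int parameter.
qualified = "::".join(nested_name_parts("_ZN3c104cuda9SetDeviceEi"))
```

Here `qualified` is `c10::cuda::SetDevice`, which is consistent with the compiled flash_attn extension expecting a function that the PyTorch build present at runtime does not export.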

After:

ImportError: /usr/lib/python3/dist-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi

The above exception was the direct cause of the following exception:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /workspace/llm-foundry/scripts/train/train.py:25 in <module>                 │
│                                                                              │
│    22 from omegaconf import DictConfig                                       │
│    23 from omegaconf import OmegaConf as om                                  │
│    24                                                                        │
│ ❱  25 from llmfoundry.callbacks import AsyncEval, HuggingFaceCheckpointer    │
│    26 from llmfoundry.data.dataloader import build_dataloader                │
│    27 from llmfoundry.eval.metrics.nlp import InContextLearningMetric        │
│    28 from llmfoundry.layers_registry import ffns_with_megablocks            │
│                                                                              │
│ /workspace/llm-foundry/llmfoundry/__init__.py:17 in <module>                 │
│                                                                              │
│   14 │   del flash_attn_func                                                 │
│   15 except ImportError as e:                                                │
│   16 │   if "undefined symbol" in str(e):                                    │
│ ❱ 17 │   │   raise ImportError(                                              │
│   18 │   │   │   "The flash_attn package is not installed correctly. Usually │
│   19 │   │   │   " of PyTorch is different from the version that flash_attn  │
│   20 │   │   │   " workflow has resulted in PyTorch being reinstalled. This  │
╰──────────────────────────────────────────────────────────────────────────────╯
ImportError: The flash_attn package is not installed correctly. Usually this
means that your runtime version of PyTorch is different from the version that
flash_attn was installed with, which can occur when your workflow has resulted
in PyTorch being reinstalled. This probably happened because you are using an
old docker image with the latest version of LLM Foundry. Check that the PyTorch
version in your Docker image matches the PyTorch version in LLM Foundry setup.py
and update accordingly. The latest Docker image can be found in the README.
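
The traceback above shows the shape of the change: catch `ImportError`, check for "undefined symbol" in the message, and raise a more descriptive `ImportError` chained to the original. A minimal standalone sketch of that pattern follows. The names `import_or_explain` and `FLASH_ATTN_HINT` are hypothetical, the hint is an abbreviated version of the message above, and re-raising other import errors unchanged is an assumption (the traceback does not show what `llmfoundry/__init__.py` does in that case).

```python
from typing import Any, Callable

# Abbreviated form of the message in the PR output (hypothetical constant name).
FLASH_ATTN_HINT = (
    "The flash_attn package is not installed correctly. Usually this means "
    "that your runtime version of PyTorch is different from the version that "
    "flash_attn was installed with."
)


def import_or_explain(importer: Callable[[], Any], hint: str = FLASH_ATTN_HINT) -> Any:
    """Run `importer` and translate "undefined symbol" failures into `hint`.

    `importer` is any zero-argument callable that performs the import,
    which keeps the helper testable without a broken extension module.
    """
    try:
        return importer()
    except ImportError as e:
        if "undefined symbol" in str(e):
            # `from e` keeps the original error visible as the direct cause,
            # matching the chained traceback in the output above.
            raise ImportError(hint) from e
        # Assumption: any other import failure is passed through unchanged.
        raise
```

A call such as `import_or_explain(lambda: importlib.import_module("flash_attn"))` (with `import importlib`) would apply the check to the real package. Chaining with `from e` means the low-level linker message is still printed first, so no debugging information is lost.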

@dakinggg dakinggg requested a review from a team as a code owner July 4, 2024 01:32
@dakinggg dakinggg requested a review from mvpatel2000 July 4, 2024 01:33
@dakinggg dakinggg enabled auto-merge (squash) July 4, 2024 01:33
Contributor

@snarayan21 snarayan21 left a comment

hella useful, ty

@dakinggg dakinggg merged commit 22e243a into mosaicml:main Jul 4, 2024
9 checks passed