
model.py NaN values for sparse matrix multiplication #13

Open
cswpy opened this issue Dec 29, 2021 · 0 comments
cswpy commented Dec 29, 2021

When training the VGNN, the attention module multiplies two sparse matrices, edge_e and data, to produce h_prime. The code then asserts that h_prime contains no NaN values, and that assertion fails for me. I checked edge_e before the multiplication by converting it to a dense matrix, and I don't think it contains any NaN values; I used assert not torch.isnan(edge_e.to_dense()).any() for the check. Below is the full traceback of the error.

  File "/scratch/pw1287/GNN4EHR/utils.py", line 19, in train
    logits, kld = model(input)
  File "/home/pw1287/.conda/envs/GNN/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pw1287/.conda/envs/GNN/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/pw1287/.conda/envs/GNN/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/pw1287/.conda/envs/GNN/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/pw1287/.conda/envs/GNN/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/pw1287/.conda/envs/GNN/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/pw1287/GNN4EHR/model.py", line 217, in forward
    outputs = [self.encoder_decoder(data[i, :]) for i in range(batch_size)]
  File "/scratch/pw1287/GNN4EHR/model.py", line 217, in <listcomp>
    outputs = [self.encoder_decoder(data[i, :]) for i in range(batch_size)]
  File "/scratch/pw1287/GNN4EHR/model.py", line 206, in encoder_decoder
    h_prime = self.out_att(output_edges, h_prime)
  File "/home/pw1287/.conda/envs/GNN/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/scratch/pw1287/GNN4EHR/model.py", line 111, in forward
    h_prime = torch.stack([self.attention(l, a, N, data, edge) for l, a in zip(self.W, self.a)], dim=0).mean(
  File "/scratch/pw1287/GNN4EHR/model.py", line 111, in <listcomp>
    h_prime = torch.stack([self.attention(l, a, N, data, edge) for l, a in zip(self.W, self.a)], dim=0).mean(
  File "/scratch/pw1287/GNN4EHR/model.py", line 99, in attention
    assert not torch.isnan(h_prime).any()

Is this an error on the input side? How can I work around it? Thanks.
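For reference, here is a minimal, self-contained sketch of the check I ran (the indices and values below are made up for illustration; the real edge_e comes from the attention module). It also shows one way the multiplication itself can introduce NaN even when edge_e passes the isnan check: an inf entry, e.g. from an exp() overflow, multiplied by a zero in data gives NaN under IEEE 754 arithmetic.

```python
import torch

# Hypothetical stand-in for edge_e; in model.py it comes from the
# attention module's edge scores.
indices = torch.tensor([[0, 1, 2], [1, 0, 2]])
values = torch.tensor([0.5, float("inf"), 2.0])
edge_e = torch.sparse_coo_tensor(indices, values, (3, 3))

# The NaN check can run on the stored values directly, without
# densifying the whole matrix:
assert not torch.isnan(edge_e.coalesce().values()).any()

# But an inf entry slips through the NaN check and still poisons the
# product, because inf * 0.0 = nan.
data = torch.zeros(3, 1)
h_prime = torch.sparse.mm(edge_e, data)
print(torch.isnan(h_prime).any())  # NaN appears only after the matmul
```

So it may be worth asserting torch.isfinite on edge_e's values rather than only torch.isnan before the multiplication.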
