Division by zero #293

Open
olegs opened this issue Oct 14, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@olegs
Member

olegs commented Oct 14, 2021

To reproduce, run the predefined "brain computer interface" search from PubMed.

[2021-10-14 08:34:35,747: INFO/ForkPoolWorker-1] Generating evolution topics descriptions
[2021-10-14 08:34:35,833: WARNING/ForkPoolWorker-1] /home/user/pysrc/papers/analysis/topics.py:116: RuntimeWarning: invalid value encountered in true_divide
  tokens_freqs_per_comp = tokens_freqs_per_comp / tokens_freqs_norm
[2021-10-14 08:34:35,833: WARNING/ForkPoolWorker-1] /home/user/pysrc/papers/analysis/topics.py:123: RuntimeWarning: divide by zero encountered in log
  adjusted_distance = distance.T * np.log(tokens_freqs_total)
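
The two warnings above can be reproduced in isolation with plain NumPy. The following is a minimal sketch, not the code from topics.py: the array names are borrowed from the log, while their shapes and values are made up for illustration.

```python
import warnings

import numpy as np

# Hypothetical stand-ins for the arrays in topics.py; shapes and values are assumptions.
tokens_freqs_per_comp = np.array([[0.0, 0.0], [1.0, 3.0]])
tokens_freqs_norm = np.array([[0.0], [4.0]])
tokens_freqs_total = np.array([0.0, 3.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # 0 / 0 yields nan and an "invalid value encountered" RuntimeWarning.
    ratio = tokens_freqs_per_comp / tokens_freqs_norm
    # log(0) yields -inf and a "divide by zero encountered in log" RuntimeWarning.
    logs = np.log(tokens_freqs_total)

messages = [str(w.message) for w in caught]
```

Both results then propagate silently: nan from the division and -inf from the logarithm.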
olegs added the bug label Oct 14, 2021
@olegs
Member Author

olegs commented Oct 14, 2021

@ctrltz is it possible to use np.log1p to avoid this problem?
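
For reference, a small sketch of what the np.log1p suggestion changes. The values below are invented; the point is only that np.log1p(x) computes log(1 + x), so it is finite at zero but also shifts every other value, which may or may not be acceptable for the weighting in topics.py.

```python
import numpy as np

# Made-up frequencies, including the zero that triggers the warning.
tokens_freqs_total = np.array([0.0, 1.0, 9.0])

# Plain log: -inf at zero (warning suppressed here only to keep the demo quiet).
with np.errstate(divide="ignore"):
    plain = np.log(tokens_freqs_total)

# log1p: log(1 + x), finite everywhere for x >= 0, and exactly 0 at x == 0.
shifted = np.log1p(tokens_freqs_total)
```

Note that a token with frequency 1 gets weight 0 under np.log but log(2) under np.log1p, so the two are not interchangeable in general.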

@ctrltz
Contributor

ctrltz commented Oct 14, 2021

Sure, but if tokens_freqs_total equals 0, I think it means that the whole corpus_counts contains only zeros, and one could also handle this case explicitly, like:

if not corpus_counts.sum():
    return *empty descriptions here*

I did not have evolution in mind when I worked on the topic descriptions; thanks for pointing it out.
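
The early return proposed in this comment could look roughly like the sketch below. The function name describe_topics, its parameters, the dict-of-lists return shape, and the placeholder scoring are all assumptions made for illustration; only the `if not corpus_counts.sum()` guard comes from the thread.

```python
import numpy as np


def describe_topics(corpus_counts, comps, top_n=2):
    """Hypothetical wrapper: return token indices describing each component."""
    corpus_counts = np.asarray(corpus_counts)
    # Guard from the comment above: an all-zero corpus has nothing to describe.
    if not corpus_counts.sum():
        return {comp: [] for comp in comps}
    # Placeholder scoring (an assumption, not the real algorithm):
    # rank tokens by their total count across documents.
    top_tokens = np.argsort(-corpus_counts.sum(axis=0))[:top_n].tolist()
    return {comp: top_tokens for comp in comps}


empty = describe_topics(np.zeros((3, 4)), comps=[0, 1])
nonempty = describe_topics([[1, 0, 5], [0, 2, 5]], comps=[0, 1])
```

With the guard in place, the division and logarithm are never reached for an empty corpus.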

@olegs
Member Author

olegs commented Oct 14, 2021

Also, tokens_freqs_norm may be zero. What is the correct fix for this?

@ctrltz
Contributor

ctrltz commented Oct 14, 2021

As far as I understand, it means that some of the components have no corpus terms to be analyzed, so it would be correct to return an empty description for the respective components.

It might be simpler to plug in np.log1p for now to ensure stability, and I can think about it a bit more in the coming days.

NB: I have also fixed the previous comment in case you have used it already.
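
The per-component case discussed in this comment (a zero entry in tokens_freqs_norm) can be handled without warnings by dividing only where the norm is positive. This is a sketch under stated assumptions: the array values are invented, and computing the norm as a row sum is a guess at what topics.py does, not a confirmed detail.

```python
import numpy as np

# Made-up token frequencies: component 0 has no corpus terms at all.
tokens_freqs_per_comp = np.array([[0.0, 0.0, 0.0], [2.0, 1.0, 1.0]])
# Assumption: the norm is the per-component total.
tokens_freqs_norm = tokens_freqs_per_comp.sum(axis=1, keepdims=True)

# Divide only for components with a positive norm; the rest keep the zeros from `out`.
nonempty = tokens_freqs_norm > 0
normalized = np.divide(
    tokens_freqs_per_comp,
    tokens_freqs_norm,
    out=np.zeros_like(tokens_freqs_per_comp),
    where=nonempty,
)

# Indices of components that should receive an empty description.
empty_comps = np.flatnonzero(~nonempty.ravel())
```

The empty_comps indices can then be used to emit empty descriptions for exactly those components, while the others are processed as usual.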
