Fix, improve, complete 'training loss' computation for *2Vec models #2617

Open

gojomo opened this issue Oct 1, 2019 · 1 comment

gojomo commented Oct 1, 2019

Word2Vec's training-loss reporting isn't yet the per-epoch loss most users would expect – pending PR #2135 may address that – but in addition, Doc2Vec and FastText should offer functional, analogous reporting, and the docs should make clear what this loss is good for (monitoring training progress) and what it's not good for (assessing overall model fitness for downstream tasks).

(Loss reporting for Doc2Vec may look like it's available due to inherited interfaces. It was requested alongside Word2Vec in #1272, but that request was closed as a duplicate of #999, which wound up implementing it only for Word2Vec.)
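
For reference, a minimal sketch of how the current Word2Vec loss reporting is typically used for progress monitoring (the value returned by get_latest_training_loss() is a running tally, and its exact accumulation/reset behaviour has varied across versions – part of what this issue is about – so per-epoch loss has to be recovered by differencing in a callback):

```python
from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec

class EpochLossLogger(CallbackAny2Vec):
    """Report the loss accumulated during each epoch, not the running total."""
    def __init__(self):
        self.previous_total = 0.0

    def on_epoch_end(self, model):
        total = model.get_latest_training_loss()  # running tally, not per-epoch
        print('epoch loss: %.2f' % (total - self.previous_total))
        self.previous_total = total

sentences = [['human', 'interface', 'computer'],
             ['survey', 'user', 'computer', 'system', 'response', 'time']]
model = Word2Vec(sentences, min_count=1, compute_loss=True,
                 callbacks=[EpochLossLogger()])
```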


gojomo commented Jul 12, 2020

In addition to adding loss tallying where it's missing (FastText, Doc2Vec)...

To address the potential multithreading issues (#2743), each thread should keep its own loss tally, with the per-thread tallies combined safely only at the end of an epoch.
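
A hypothetical sketch (not gensim's actual worker code) of that per-thread tallying – each worker writes only to its own slot, and the slots are summed once the workers have finished the epoch, so no locking is needed on the hot path:

```python
class PerThreadLossTally:
    """Hypothetical per-worker loss tally, combined only at the end of an epoch."""
    def __init__(self, num_workers):
        self.partials = [0.0] * num_workers   # one slot per worker thread

    def add(self, worker_id, batch_loss):
        # Each slot is written by exactly one worker, so no lock is required.
        self.partials[worker_id] += batch_loss

    def epoch_total(self):
        # Called after all workers have finished the epoch, so the sum is race-free.
        return sum(self.partials)
```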

To address the precision issue of #2735, wider types should be used where appropriate. Tallying the loss from a single call/batch into a local variable first, before adding it to the much larger running total (which may sit in lower-precision ranges of the floating-point representation), would also help, as would splitting the tally per thread, as above.
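
An illustrative example of the precision concern (plain numpy, not gensim internals): once a float32 running total is large, small per-example losses added directly to it are simply lost, whereas tallying the batch locally first and keeping the running total in a wider type preserves them:

```python
import numpy as np

running_total = np.float32(2.0e7)              # large accumulated loss in float32
per_example_losses = [np.float32(0.5)] * 1000

# Naive: add each tiny loss straight onto the big float32 total.
naive = running_total
for loss in per_example_losses:
    naive += loss                              # 0.5 is below float32 resolution at 2e7
print(naive - running_total)                   # 0.0 -- all contributions vanished

# Better: tally the batch locally, then add once to a wider (float64) total.
batch_local = np.float32(0.0)
for loss in per_example_losses:
    batch_local += loss                        # small values, exactly representable
wide_total = np.float64(running_total) + batch_local
print(wide_total - running_total)              # 500.0 -- contribution preserved
```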

Ensuring there's an easy way to get a loss summary from a single training batch (or, in the case of Doc2Vec, from non-training inference) might offer new or improved ways of answering "does this text match this model's expectations?", which might enable new uses (and/or replace the old 'scoring' feature @mataddy added to some Word2Vec modes long ago).
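
Purely as an interface sketch – nothing like compute_batch_loss() exists in gensim today – such a single-batch loss summary could back a "how surprising is this text to the model?" helper:

```python
def text_surprise(model, tokens):
    """How poorly does `tokens` fit the trained model? Higher = more surprising.

    `compute_batch_loss` is a hypothetical, not-yet-existing API that would tally
    loss over a single batch without updating the model.
    """
    loss = model.compute_batch_loss([tokens])   # hypothetical single-batch loss tally
    return loss / max(len(tokens), 1)           # normalize per token for comparability

# e.g. flag documents that fit the model unusually poorly:
# outliers = [doc for doc in corpus if text_surprise(model, doc) > threshold]
```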

Potentially, even offering a low-overhead way to tally loss per word (or per other model aspect) could enable new insight into whether different parts of a model are relatively undertrained compared to others, warnings when parts of a model are updated heavily without any updates to others (in the case of incremental training), or even dynamic choice of learning-rate per epoch or per word (as in Adagrad etc.).
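
A rough sketch of what such per-word bookkeeping might look like (again hypothetical – no such structure exists in gensim yet):

```python
from collections import defaultdict

class PerWordLossTally:
    """Hypothetical per-word loss bookkeeping for spotting undertrained words."""
    def __init__(self):
        self.loss_sum = defaultdict(float)
        self.updates = defaultdict(int)

    def add(self, word, loss):
        self.loss_sum[word] += loss
        self.updates[word] += 1

    def mean_loss(self, word):
        # Words with a persistently high mean loss are candidates for being undertrained.
        n = self.updates[word]
        return self.loss_sum[word] / n if n else float('nan')

    def untouched_words(self, vocab):
        # Vocabulary words that received no updates this session -- worth a warning
        # during incremental training on new data.
        return [w for w in vocab if self.updates[w] == 0]
```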

Having loss-tracking that really works might also allow a mode that avoids any explicit/fixed choice of epochs: a "train 'til converged" option.
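
A sketch of what that could look like, assuming a hypothetical run_one_epoch callable that trains for one epoch and returns a reliable per-epoch loss (the quantity this issue asks to make trustworthy):

```python
def train_until_converged(run_one_epoch, rel_tol=1e-3, max_epochs=100):
    """Train epoch by epoch until the loss stops improving meaningfully.

    `run_one_epoch` is a hypothetical callable: train one epoch, return its loss.
    """
    previous = None
    for epoch in range(1, max_epochs + 1):
        loss = run_one_epoch()
        if previous is not None and abs(previous - loss) <= rel_tol * abs(previous):
            return epoch                        # relative change fell below tolerance
        previous = loss
    return max_epochs
```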
