Fix computation of Word2Vec loss & add loss value to logging string #2135

Open
wants to merge 20 commits into
base: develop

Commits on Jul 19, 2018

  1. Fixing the computation of the Word2Vec loss.

    This commit rewrites the computation of the loss for both CBOW and SG
    Word2Vec. The loss that is computed and reported is now the running average
    NCE loss within the epoch: for each new epoch, the counters are reset to 0
    and a new average is computed. This was not the case before: the loss was
    accumulated over the whole training run, which is not very informative and
    was also implemented incorrectly (see below).
    
    The computation of the word2vec loss was flawed in two ways:
    - a race condition on the running_training_loss parameter (updated concurrently in a
      GIL-free portion of the code)
    - an incorrect dividing factor for the average in the case of SG. The averaging
      factor in the case of SG should not be the effective words but the effective
      samples (a new variable I introduce), because the loss is incremented as many
      times as there are positive examples sampled for an effective word.
    
    Additionally, I add logging of the current loss value to the progress
    logger when compute_loss is set to True, and I add a parameter to the
    word2vec_standalone script to trigger the reporting of the loss.
    alreadytaikeune committed Jul 19, 2018
    e96798c
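
The two fixes described in the commit message above can be illustrated with a minimal sketch. This is not the code from this PR: `EpochLossTracker`, `update`, `reset`, and `format_progress` are hypothetical names, and the log format is an assumption made for illustration. The sketch assumes each worker returns its own `(loss, samples)` pair, so that the totals are summed in a single place rather than in a shared counter written concurrently, and that the average is taken over effective samples rather than effective words.

```python
from dataclasses import dataclass


@dataclass
class EpochLossTracker:
    """Hypothetical helper: running average loss within a single epoch."""

    total_loss: float = 0.0
    total_samples: int = 0

    def reset(self) -> None:
        # Called at the start of each epoch so the average never spans epochs.
        self.total_loss = 0.0
        self.total_samples = 0

    def update(self, batch_loss: float, batch_samples: int) -> None:
        # Workers hand back (loss, samples) pairs; the sums are updated here,
        # by one caller, instead of in a counter shared between threads.
        self.total_loss += batch_loss
        self.total_samples += batch_samples

    @property
    def average(self) -> float:
        # Divide by the number of samples that contributed to the loss.
        if self.total_samples == 0:
            return 0.0
        return self.total_loss / self.total_samples


def format_progress(epoch: int, percent: float, tracker: EpochLossTracker) -> str:
    # Assumed log format, for illustration only.
    return "epoch %i: %.1f%% done, average loss %.4f" % (epoch, percent, tracker.average)


if __name__ == "__main__":
    tracker = EpochLossTracker()
    # Skip-gram case: one effective word produced 3 positive pairs, total loss 6.0.
    # Dividing by words (1) would report 6.0; dividing by samples (3) reports 2.0.
    tracker.update(batch_loss=6.0, batch_samples=3)
    print(format_progress(1, 50.0, tracker))
    tracker.reset()  # a new epoch starts from zero
```

The point of the design is that the divisor always counts exactly the events that incremented the loss, which is why the skip-gram case needs a sample count rather than a word count.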
  2. f447df0
  3. Fixing docstring

    alreadytaikeune committed Jul 19, 2018
    a6548c4
  4. a2fd340
  5. 1bdd4a5
  6. Fixing pep8

    alreadytaikeune committed Jul 19, 2018
    7b457d6

Commits on Aug 6, 2018

  1. 18735e2

Commits on Aug 14, 2018

  1. 0bcae41

Commits on Oct 1, 2018

  1. Merging work done in PR piskvorky#2127

    In word2vec_inner.pyx, functions now use the new config object while still returning the number of samples.
    
    In base_any2vec, logging includes the new loss values (the addition of this branch).
    alreadytaikeune committed Oct 1, 2018
    eb4b14d
  2. 00e7b7d

Commits on Jan 9, 2019

  1. 995b5f8

Commits on Jan 18, 2019

  1. Fixing broken interface with doc2vec

    akhlif committed Jan 18, 2019
    6b46f64
  2. Merging with develop base

    akhlif committed Jan 18, 2019
    f6a5cc5
  3. Fixing docstring

    alreadytaikeune authored and akhlif committed Jan 18, 2019
    3a453a9
  4. Fixing flake8 error line too long

    alreadytaikeune authored and akhlif committed Jan 18, 2019
    a8e4a66
  5. Finishing proper rebasing

    akhlif committed Jan 18, 2019
    854c8fd

Commits on Jan 22, 2019

  1. 5e21a85

Commits on Feb 20, 2019

  1. Fixing docstrings and code redundancy

    akhlif committed Feb 20, 2019
    0f4d572
  2. 3eec299

Commits on Feb 21, 2019

  1. aaf9ed9