Performance enhancements of conditional logit #81

Open · wants to merge 4 commits into base: master
Conversation

mathijsvdv

In the past, I've used Pylogit (specifically the MNL) on a large dataset of 200 million rows. I noticed two bottlenecks:

  1. The sparse matrix structure for weights_per_obs is not always kept, causing a 200 million x 200 million dense numpy array to be created; see also issue #79 (Sparse to Dense).
  2. The derivatives dh_dv for a conditional logit represent an identity matrix but are coded as a csr_matrix. This makes the calculation dh_dv.dot(design) relatively slow even though its result is trivially design.

To remedy the first bottleneck, I used the same solution proposed in issue #79.
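
For context (not the PR's actual code), here is a minimal sketch of the distinction at stake: as long as weights_per_obs stays in a scipy sparse format, the memory cost is proportional to the number of non-zeros, whereas any step that falls back to a dense array allocates the full n x n matrix. The sizes and operations below are illustrative assumptions.

```python
# Illustrative sketch only: shows why keeping weights_per_obs sparse matters.
# The variable names mirror the discussion above; the exact pylogit code paths
# differ, and the small size stands in for the 200 million rows mentioned.
import numpy as np
from scipy import sparse

n_rows = 5
weights_per_obs = sparse.identity(n_rows, format="csr")

# Sparse-aware operations keep a sparse structure and stay cheap:
row_scaled = weights_per_obs.multiply(np.arange(1, n_rows + 1).reshape(-1, 1))
print(sparse.issparse(row_scaled))  # True: structure preserved

# Converting to dense (explicitly, or via an operation that falls back to
# numpy) materializes an n_rows x n_rows array; at 200 million rows that is
# far beyond available memory.
dense = weights_per_obs.toarray()
print(dense.nbytes)  # grows quadratically with n_rows
```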

For the second bottleneck, I made an efficient identity_matrix class (derived from scipy's spmatrix). When such an identity matrix I is multiplied with a matrix A via I.dot(A), the result is simply A again.
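
The class added in this PR derives from scipy's spmatrix and implements the full sparse-matrix interface; the standalone sketch below only illustrates the core idea (a no-op dot, plus the negative-size check mentioned in the commits) and is not the PR's implementation.

```python
# Minimal standalone sketch of the idea behind identity_matrix.  The real
# class subclasses scipy.sparse.spmatrix; only the behavior discussed above
# is shown here.
import numpy as np

class IdentitySketch:
    def __init__(self, n):
        # Mirrors the negative `arg1` check mentioned in the commit message.
        if n < 0:
            raise ValueError("The size of the identity matrix must be non-negative.")
        self.shape = (n, n)

    def dot(self, other):
        # I.dot(A) == A, so no sparse matrix product is ever computed.  This
        # is why dh_dv.dot(design) becomes essentially free for the MNL.
        return other

    def toarray(self):
        return np.eye(self.shape[0])

# Usage: with dh_dv represented this way, dh_dv.dot(design) returns design
# directly instead of running a csr_matrix product.
dh_dv = IdentitySketch(3)
design = np.arange(6.0).reshape(3, 2)
assert dh_dv.dot(design) is design
```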

I benchmarked the change with a script that estimates an MNL on the usual Swiss-Metro dataset and ran line_profiler on some of the critical functions, namely calc_gradient and calc_fisher_info_matrix. In summary, this change reduced the computation time of calc_gradient by 26% (from 0.080697 s to 0.059372 s), and that of calc_fisher_info_matrix by 99% (!) (from 0.906896 s to 0.0062323 s).

Profiling results are attached.
profile_before.txt
profile_after.txt
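
For reference, per-line timings like those in the attached files can be collected with line_profiler along the lines below; the module path pylogit.choice_calcs and the driver function are assumptions, not the exact benchmark script used here.

```python
# Hypothetical profiling sketch, not the attached benchmark script.
from line_profiler import LineProfiler

import pylogit.choice_calcs as cc  # assumed location of the two functions


def run_swissmetro_mnl():
    """Placeholder for the benchmark script that fits the MNL on Swiss-Metro."""
    ...


profiler = LineProfiler()
profiler.add_function(cc.calc_gradient)
profiler.add_function(cc.calc_fisher_info_matrix)

profiler.runcall(run_swissmetro_mnl)  # records every hit on the two functions
profiler.print_stats()                # per-line timings, as in the .txt files
```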

This is to be used as an efficient implementation of dh_dv.
This means that dh_dv.dot(design) will be an instant calculation.
In the process, add checking for a negative `arg1` argument.
timothyb0912 (Owner) commented Aug 17, 2021

Hi @mathijsvdv, this is great! Wow:

  • the speedups are fantastic
  • the profiling is much appreciated
  • the super well-documented class and tests are much appreciated.

Thanks also for your patience as I've been much delayed in responding to this PR and to the issue that spawned it.
I should be able to take a look at this within a few days and update the package.

Thanks again for your help!

@timothyb0912 timothyb0912 self-assigned this Aug 17, 2021
@timothyb0912 timothyb0912 linked an issue Aug 17, 2021 that may be closed by this pull request
@timothyb0912 timothyb0912 removed a link to an issue Aug 17, 2021
@timothyb0912 timothyb0912 linked an issue Aug 17, 2021 that may be closed by this pull request
mathijsvdv (Author) commented

Glad you like it, @timothyb0912! I really appreciate all the work you've done to make a flexible logit estimation suite, so I'm happy to help!


dima-quant left a comment

Great fixes; too bad they're still not merged into master.

Development

Successfully merging this pull request may close the following issue: Sparse to Dense (#79)
3 participants