
Fastfood is slow #116

Open
glevv opened this issue May 24, 2021 · 8 comments
Labels
bug Something isn't working

Comments


glevv commented May 24, 2021

Fastfood is somewhat slower than RBFSampler (while in theory it should be faster).
Google Colab timings (sklearn 0.24.2, sklearn_extra 0.2.0, numpy 1.19.5)
[plot: Fastfood vs RBFSampler fit_transform timings, Google Colab]
Laptop timings (Ubuntu 20.04, Intel 8300H, 32GB RAM) (sklearn 0.23.2, sklearn_extra 0.2.0, numpy 1.19.2)
[plot: Fastfood vs RBFSampler fit_transform timings, laptop]

Sample code

!pip install scikit-learn-extra
import time
import numpy as np
from matplotlib import pyplot as plt
from sklearn.kernel_approximation import RBFSampler
from sklearn_extra.kernel_approximation import Fastfood

X = np.random.normal(size=(100_000, 100))
n_components = [64, 128, 256, 512, 1024, 2048]
timings_ff = []
timings_rbfs = []

for n in n_components:
  ff = Fastfood(n_components=n, random_state=1)
  tic = time.perf_counter()
  X_t = ff.fit_transform(X)
  tac = time.perf_counter()
  timings_ff.append(tac - tic)

for n in n_components:
  rbfs = RBFSampler(n_components=n, random_state=1)
  tic = time.perf_counter()
  X_t = rbfs.fit_transform(X)
  tac = time.perf_counter()
  timings_rbfs.append(tac - tic)

plt.plot(n_components, timings_ff);
plt.plot(n_components, timings_rbfs);

Changing tradeoff_mem_accuracy to 'mem' did not affect speed.


glevv commented Jun 2, 2021

The bottleneck seems to be the _phi function:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       2    0.289    0.144    0.289    0.144 _fastfood.py:104(_approx_fourier_transformation_multi_dim)
       1    0.000    0.000    0.000    0.000 _fastfood.py:108(_l2norm_along_axis1)
       1    0.000    0.000    0.000    0.000 _fastfood.py:112(_uniform_vector)
       1    0.045    0.045    0.389    0.389 _fastfood.py:118(_apply_approximate_gaussian_matrix)
       1    0.036    0.036    0.036    0.036 _fastfood.py:136(_scale_transformed_data)
       1    0.441    0.441    0.441    0.441 _fastfood.py:144(_phi)
       1    0.000    0.000    0.007    0.007 _fastfood.py:153(fit)
       1    0.000    0.000    0.000    0.000 _fastfood.py:190(<listcomp>)
       1    0.000    0.000    0.897    0.897 _fastfood.py:207(transform)
       1    0.000    0.000    0.000    0.000 _fastfood.py:72(_is_number_power_of_two)
       1    0.000    0.000    0.000    0.000 _fastfood.py:76(_enforce_dimensionality_constraints)
       1    0.000    0.000    0.024    0.024 _fastfood.py:89(_pad_with_zeros)

numpy trigonometric functions (cos, sin) are slow (each takes about as much time as _approx_fourier_transformation_multi_dim on its own), and numba does not help. Maybe a pure Cython implementation of the _phi function would be faster.
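For reference, a rough micro-benchmark sketch (mine, not the library's code) of just the trig-plus-stack cost on a matrix of roughly the size transform handles here:

import time
import numpy as np

# Illustrative shape: 100k samples, 512 random features.
X = np.random.normal(size=(100_000, 512))

tic = time.perf_counter()
np.hstack([np.cos(X), np.sin(X)])  # roughly what the "accuracy" branch of _phi computes
print(f"cos + sin + hstack: {time.perf_counter() - tic:.3f} s")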


rth commented Jun 2, 2021

Thanks for the investigation @glevv !

There are two issues here:

  • first, _phi (and therefore transform) returns (n_samples, 2*n_components) instead of (n_samples, n_components) for tradeoff_mem_accuracy == "accuracy". That is a bug. Also, because it manipulates a 2X larger matrix than it should, it is 2X slower. We should really add a unit test to detect this.
  • Generally, the function _phi should be modified to use more inplace operations (https://stackoverflow.com/a/22952411/1791279), though computing cos and sin would likely be expensive in any case; a sketch of what that could look like follows below.

A PR to fix/improve that situation would be very welcome
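For illustration, a minimal sketch (names and structure are mine, not the current implementation) of an "accuracy"-mode _phi that fills one preallocated buffer instead of allocating cos(X) and sin(X) separately and hstack-ing them; it still returns 2*n_features columns, which is the separate dimensionality bug above:

import numpy as np

def phi_accuracy_inplace(X):
    # cos/sin written directly into the two halves of a single output buffer
    n_samples, n_features = X.shape
    out = np.empty((n_samples, 2 * n_features), dtype=X.dtype)
    np.cos(X, out=out[:, :n_features])
    np.sin(X, out=out[:, n_features:])
    out *= 1.0 / np.sqrt(n_features)  # same 1/sqrt(n) scaling as the hstack version
    return out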

rth added the bug label Jun 2, 2021

glevv commented Jun 2, 2021

Judging by this implementation, it should return an NxN matrix and be used with SVM(kernel='precomputed').

On the other hand, in the same repo they use the [cos(X), sin(X)] matrix as the feature matrix for the next model (which will obviously have 2*n_components columns). A straightforward solution would be to divide n_components by 2 (or decrease the power of 2 by 1) for the "accuracy" case, for example:
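A minimal sketch of that workaround (parameter names as used earlier in this thread; whether the internal padding keeps this exact shape is an assumption):

from sklearn_extra.kernel_approximation import Fastfood

n_components = 1024  # dimensionality the user actually wants
ff = Fastfood(n_components=n_components // 2,
              tradeoff_mem_accuracy='accuracy',
              random_state=1)
# after the [cos, sin] stacking, transform would then return
# (n_samples, n_components) instead of (n_samples, 2 * n_components)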


glevv commented Jun 17, 2021

The [cos, sin] matrix is a matrix of the real and imaginary components of a Fourier transform. In such cases sometimes only the real part of the transformation is kept and the imaginary part is dropped, but then there would be no difference between the 'accuracy' and 'mem' cases.

We can either warn users that in the 'accuracy' case the feature space will be 2 times larger, or we can drop the 'accuracy' case for now and just keep the random kitchen sinks method.
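As a quick numerical sanity check (my own sketch, not library code), both variants discussed here approximate the same Gaussian kernel value; the cosine-only ('mem'-style) map needs a random phase offset to do so:

import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 20_000
x = rng.normal(size=d)
y = x + 0.3 * rng.normal(size=d)
W = rng.normal(size=(n, d))            # random projections, sigma = 1
b = rng.uniform(0, 2 * np.pi, size=n)  # random phases for the cosine-only map

exact = np.exp(-np.sum((x - y) ** 2) / 2)                        # RBF kernel k(x, y)
phi_acc = lambda v: np.concatenate([np.cos(W @ v), np.sin(W @ v)]) / np.sqrt(n)
phi_mem = lambda v: np.sqrt(2 / n) * np.cos(W @ v + b)

print(exact, phi_acc(x) @ phi_acc(y), phi_mem(x) @ phi_mem(y))   # all three should be close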

TimotheeMathieu commented

I don't think "[cos, sin] matrix is a matrix of real and imaginary components of Fourier transform" is true. If that were the case we could have used the FFT and obtained a faster implementation, but in fact we don't use a Fourier transform here, or am I wrong?


glevv commented Jun 17, 2021

Well, I don't know the particular implementation, but in the paper "Fastfood: Approximate Kernel Expansions in Loglinear Time" [1] they state:
phi_j(x) = exp(i [Vx]_j) / sqrt(n)

which is the same as [cos(X), sin(X)] / sqrt(n), so I assumed that cos(X) is the real part and sin(X) is the imaginary part, but I could be wrong.

TimotheeMathieu commented

Yes, this is not a Fourier transform.
A Fourier transform involves a sum (or an integral) of exponentials, and here we just have an exponential but not a sum.
But you are right that cos(X) and sin(X) are the real and imaginary parts of the feature map phi_j(x). And I think the "mem" case is exactly as you described: we drop the imaginary part.


glevv commented Jun 18, 2021

Well, I guess the only thing left is to replace numpy math operations with plain Python ones where needed, since Python operations are faster on single numbers (a micro-benchmark sketch follows after the list).
Like these:

d = np.power(2, np.floor(np.log2(d)) + 1)

1 / (self.sigma * np.sqrt(self._d)) * np.multiply(np.ravel(S), VX)

return (1 / np.sqrt(X.shape[1])) * np.hstack(

return X * np.sqrt(2.0 / X.shape[1])
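A self-contained micro-benchmark sketch (mine, not library code) of the kind of scalar call being replaced, using the first of those lines as the example:

import math
import time
import numpy as np

d = 1000
tic = time.perf_counter()
for _ in range(100_000):
    np.power(2, np.floor(np.log2(d)) + 1)    # numpy ufuncs applied to a single number
t_np = time.perf_counter() - tic

tic = time.perf_counter()
for _ in range(100_000):
    2 ** (math.floor(math.log2(d)) + 1)      # same arithmetic with plain Python / math
t_py = time.perf_counter() - tic

print(f"numpy scalar: {t_np:.3f} s, python/math: {t_py:.3f} s")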

And maybe change the default tradeoff to 'mem', since it is more consistent and fast.
