
Testing suite is too slow #285

Open
cako opened this issue Dec 29, 2021 · 14 comments
Labels: good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@cako
Collaborator

cako commented Dec 29, 2021

On an Intel Core i7-8565U CPU @ 1.80GHz × 8, the full test suite takes 250 s on average. I have noticed that tests routinely fail in the GitHub CI, and I believe this may be the reason.

To profile the matter, I generated a list of all tests that take more than 2 seconds to run (see below). WavefieldDecomposition and SeismicInterpolation seem to be the biggest culprits, so one suggestion would be to shrink the dimensions used in those tests.

============================== slowest durations ===============================
26.72s call     pytests/test_wavedecomposition.py::test_WavefieldDecomposition3D[par1]
25.61s call     pytests/test_seismicinterpolation.py::test_SeismicInterpolation2d[par3]
17.35s call     pytests/test_seismicinterpolation.py::test_SeismicInterpolation3d[par1]
8.76s call     pytests/test_wavedecomposition.py::test_WavefieldDecomposition3D[par0]
8.37s call     pytests/test_marchenko.py::test_Marchenko_timemulti_ana[par0]
6.01s call     pytests/test_seismicinterpolation.py::test_SeismicInterpolation2d[par2]
5.30s call     pytests/test_marchenko.py::test_Marchenko_freq[par0]
5.06s call     pytests/test_marchenko.py::test_Marchenko_time_ana[par0]
5.02s call     pytests/test_marchenko.py::test_Marchenko_time[par0]
5.01s call     pytests/test_marchenko.py::test_Marchenko_freq[par1]
4.88s setup    pytests/test_wavedecomposition.py::test_WavefieldDecomposition3D[par0]
4.88s call     pytests/test_sparsity.py::test_ISTA_FISTA_multiplerhs[par3]
4.87s call     pytests/test_sparsity.py::test_ISTA_FISTA_multiplerhs[par1]
4.74s call     pytests/test_sparsity.py::test_ISTA_FISTA_multiplerhs[par5]
4.71s call     pytests/test_sparsity.py::test_ISTA_FISTA_multiplerhs[par2]
4.64s call     pytests/test_sparsity.py::test_ISTA_FISTA_multiplerhs[par0]
4.62s call     pytests/test_sparsity.py::test_ISTA_FISTA_multiplerhs[par4]
4.16s call     pytests/test_radon.py::test_Radon3D[par2]
3.41s call     pytests/test_radon.py::test_Radon3D[par5]
3.36s call     pytests/test_radon.py::test_Radon2D[par2]
3.34s call     pytests/test_radon.py::test_Radon3D[par7]
3.17s call     pytests/test_sparsity.py::test_ISTA_FISTA[par2]
3.11s call     pytests/test_sparsity.py::test_ISTA_FISTA[par0]
3.10s call     pytests/test_sparsity.py::test_ISTA_FISTA[par1]
3.07s call     pytests/test_sparsity.py::test_ISTA_FISTA[par5]
2.99s call     pytests/test_sparsity.py::test_ISTA_FISTA[par4]
2.96s call     pytests/test_sparsity.py::test_ISTA_FISTA[par3]
2.69s call     pytests/test_radon.py::test_Radon2D[par5]
2.30s call     pytests/test_radon.py::test_Radon3D[par3]
2.21s call     pytests/test_radon.py::test_Radon2D[par7]
cako added the help wanted (Extra attention is needed) label on Dec 29, 2021
@mrava87
Collaborator

mrava87 commented Dec 31, 2021

Regarding GitHub CI, this never fails (or at least not to my knowledge). The red stars you see on the latest commits are due to my impatience in merging without waiting for the tests to finish (since I only modified documentation files). The behavior of GitHub CI seems to be that the CI stops (quite understandably)...

As for the tests: indeed, anything that still does the job of checking an operator but takes less time is welcome.

I went ahead and made some small changes to test_WavefieldDecomposition3D, mostly reducing the size of the input 3D tensor and adapting the FK parameters accordingly; it now runs in about 1 s on my laptop.

Feel free to take a look at test_SeismicInterpolation2d. Here I worry it may be harder, because L1 solvers need a large number (100s) of iterations to converge, so we may need to either relax the acceptance thresholds or, again, try to make the example smaller (if possible).
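To illustrate the point about L1 solvers, here is a minimal NumPy-only ISTA sketch (not PyLops code; the problem sizes and the regularization weight are made up for illustration): even on a tiny sparse-recovery problem, the soft-thresholded gradient iteration needs hundreds of steps to converge, which is why shrinking the problem or relaxing tolerances is the main lever for these tests.

```python
# Minimal ISTA sketch (NumPy only, not PyLops) showing the slow
# convergence of L1 solvers; all sizes and lambda are illustrative.
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 80, 200, 5                        # observations, unknowns, nonzeros
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x_true

lam = 0.01                                  # L1 regularization weight
L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(1000):                       # hundreds of iterations to converge
    g = x + (A.T @ (y - A @ x)) / L         # gradient step on the data term
    x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold

relerr = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"relative error after 1000 ISTA iterations: {relerr:.3f}")
```

Halving the iteration count or the problem size cuts the runtime proportionally, but at the cost of a looser reconstruction, so any acceptance threshold in the test has to be relaxed in step.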

mrava87 added a commit to mrava87/pylops that referenced this issue Dec 31, 2021
In order to speed up our CI pipeline,
PyLops#285 suggested reducing
the size of a number of tests, test_wavedecomposition
being one of them. The tests now run on smaller
input data.
@cako
Collaborator Author

cako commented Dec 31, 2021

Awesome, thank you! Runtime has now dropped to 180s. If you don't mind, I think we can keep the issue open so we don't forget to improve the runtime of other tests.

@mrava87
Collaborator

mrava87 commented Dec 31, 2021

Yep, agreed. As 1 and 4 are now very fast, I think that if we can improve 2 and 3, the overall timing of the test suite should become more reasonable. I am a bit unsure how much effort we should put into any test taking <5 s, as the effort may outweigh the gain. What do you think?

@cako
Collaborator Author

cako commented Dec 31, 2021

I think it is not very high on the list of priorities, but it would definitely be nice to at least deal with the tests that take a long time (>5 s). I can mark this as a good first issue, and we could maybe help people who want to get their feet wet with PyLops to tackle it? It is an excellent way of learning about the library.

cako added the good first issue (Good for newcomers) label on Dec 31, 2021
@dikwickley
Contributor

dikwickley commented Jan 23, 2023

I have worked with pytest before and it can get very slow at times. In my case I solved this by running the tests in parallel using the pytest-xdist plugin. I can work on this if you think it could help.
@cako @mrava87

@mrava87
Collaborator

mrava87 commented Jan 23, 2023

Hello @dikwickley,
this could be an interesting avenue. May I ask if you have tried this on CI before (e.g. GitHub Actions, Azure Pipelines)?

@dikwickley
Contributor

@mrava87 it was used with Jenkins. We just need to add -n <worker_count> at the end of the pytest command (of course, pytest-xdist should be installed via requirements.txt).

@mrava87
Collaborator

mrava87 commented Jan 23, 2023

Mmmh, we need to be able to run our CI with either GitHub Actions or Azure Pipelines (ideally both, as we do now). Please check whether this is allowed by those CI platforms before implementing it. If yes, we will be happy for you to go ahead and make a PR with the suggested changes :)
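For what it's worth, pytest-xdist only changes the pytest invocation itself, so it works on any CI platform that can run pytest, including GitHub Actions and Azure Pipelines. A hypothetical GitHub Actions snippet (step names below are illustrative, not taken from the PyLops workflows) might look like:

```yaml
# Illustrative GitHub Actions steps, not taken from the PyLops repo
- name: Install test dependencies
  run: pip install pytest pytest-xdist
- name: Run tests in parallel
  run: pytest pytests -n auto
  env:
    # pin BLAS/OpenMP to one thread per worker so that NumPy-heavy
    # tests do not oversubscribe the CPU when run in parallel
    OMP_NUM_THREADS: "1"
```

The same `pytest pytests -n auto` command would work unchanged in an Azure Pipelines `script` step.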

@dikwickley
Contributor

I tested out parallelization locally; here are some interesting results:

Without parallelization: [screenshot: n-none]

With parallelization (4 workers): [screenshot: n-auto(4)]

With parallelization (6 workers): [screenshot: n-6]

As you can see, there isn't much of a speed improvement. I found that the last 20% of the test cases took the most time; maybe we can try reducing test cases or removing trivial ones.

@mrava87
Collaborator

mrava87 commented Jan 24, 2023

Alright, good to see. I was a bit unsure this would bring much improvement, as we heavily use NumPy (most methods are multi-threaded), so we would end up trading the speed-up of multi-threaded functions for that of having multiple processes working in parallel.

I am not in favour of removing any tests. If you want, you can run the test suite locally like @cako did in the past (see above), identify the tests that take longer, and try to reduce the problem size for those tests… we already did this for the very heavy ones, but I remember writing this issue because we thought there were at least a couple more that could be sped up further.

@dikwickley
Contributor

We can use the --durations=N flag to find the N slowest tests. Let me try it to find which ones are the heaviest.
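A self-contained demo of the --durations flag (the test file and its contents below are made up for illustration): write a tiny test module to a temporary directory, run pytest on it, and the slowest-durations table like the one at the top of this issue is printed after the test results.

```python
# Hypothetical, self-contained demo of pytest's --durations report.
import pathlib
import subprocess
import sys
import tempfile
import textwrap

code = textwrap.dedent("""
    import time

    def test_fast():
        assert True

    def test_slow():
        time.sleep(0.3)
        assert True
""")

with tempfile.TemporaryDirectory() as d:
    path = pathlib.Path(d) / "test_timing.py"
    path.write_text(code)
    # --durations=0 reports every test's duration; --durations=N reports
    # only the N slowest (setup/call/teardown phases are listed separately)
    result = subprocess.run(
        [sys.executable, "-m", "pytest", str(path), "--durations=0"],
        capture_output=True,
        text=True,
    )
out = result.stdout
print(out)
```

Note that by default pytest hides durations below 5 ms in this report unless run with -vv.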

@mrava87
Collaborator

mrava87 commented Feb 16, 2023

Sure. Even if you just run the tests in an IDE like PyCharm/VSCode, you can get duration stats directly. I suggest you look at the top 5 and we can discuss which ones make sense to tackle first. There may be problems that we can't make smaller than they are, as the inversion test would simply no longer be representative.

@dikwickley
Contributor

dikwickley commented Feb 16, 2023

I think we can start with these:
test_SeismicInterpolation2d
test_SeismicInterpolation3d
test_Marchenko_freq
test_lsm2d

[screenshot: test durations]

How do we decide when a test case can't be reduced any further?

@mrava87
Collaborator

mrava87 commented Feb 16, 2023

Sounds good. Well, based on knowledge of the problem. You can start by taking the top one and reducing its size, but I would not just run it as a test blindly; I would also make some plots to see what the input is, what the expected output is, and what is actually obtained… and use the tutorials that use the same operators to understand what the problem entails, if you are not familiar :)
