Point access speedup #10

richardt94 · 2022-04-07T01:34:54Z

This addresses the issue raised in #8. The problem there is that netCDF handles "fancy indexing" poorly (see e.g. here): a single request for many separate small areas of the file, such as a list of scattered points, is slow. This is even more of a problem when the file is served by a remote THREDDS server, as reported in the issue. If max_bytes is set large enough in the call to get_value_at_coords, the server times out attempting the fancy indexing, and the only alternative is to set a small maximum request size that fetches a few points at a time, which is also quite slow.

This PR instead indexes the dataset with contiguous slices where max_bytes allows, which is processed much faster. I tested with the notebook in examples/2_geophys_netcdf_grid_utils_demo.ipynb (which uses the same dataset referenced in #8) and was able to retrieve 466 points at 10 km spacing in less than 2 seconds over a fast connection to NCI with max_bytes=50000000 (50 MB), compared with a minimum of 6.9 seconds for the current implementation using the minimum request size of max_bytes=1.
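To illustrate the idea of replacing fancy indexing with one contiguous slice, here is a minimal sketch. It is not the code in this PR: the function name `read_points_via_bounding_slice` is hypothetical, and a NumPy array stands in for the netCDF variable so the example runs without a file or server.

```python
import numpy as np


def read_points_via_bounding_slice(variable, rows, cols):
    """Read the points' bounding box as one contiguous slice, then pick the
    individual points out of the in-memory block.

    `variable` is assumed to be a 2D array-like that supports slicing
    (a NumPy array here, standing in for a netCDF variable).
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    # One contiguous request instead of many scattered single-cell reads.
    block = np.asarray(variable[r0:r1, c0:c1])
    # Fancy indexing is cheap once the block is a local NumPy array.
    return block[rows - r0, cols - c0]


# Stand-in grid and a handful of scattered points (illustrative values only).
grid = np.arange(100 * 120, dtype=np.float32).reshape(100, 120)
rows = [5, 7, 12, 40]
cols = [3, 90, 17, 60]
values = read_points_via_bounding_slice(grid, rows, cols)
```

The trade-off is that the whole bounding box is transferred, so this only pays off when the box fits within the request size limit.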

The changes aren't quite ready yet because the computation of the slice indices assumes that the list of points is "sorted" such that the rectangle bounded by the ith and jth points is entirely contained in the rectangle bounded by the ith and (j+1)th points. This will probably hold for lists of points that lie along an almost straight line, but not otherwise.
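The ordering assumption can be made concrete with a small check. This is only a sketch of the condition as described above, with a hypothetical helper name `rectangles_are_nested`, and it fixes the ith point to the first point in the list for simplicity.

```python
def rectangles_are_nested(rows, cols, anchor=0):
    """Return True if, for every j from the anchor onwards, the rectangle
    spanned by the anchor point and point j lies inside the rectangle
    spanned by the anchor point and point j + 1."""
    r_a, c_a = rows[anchor], cols[anchor]
    for j in range(anchor, len(rows) - 1):
        # Both rectangles share the anchor corner, so containment reduces to
        # point j lying between the anchor and point j + 1 on each axis.
        row_ok = min(r_a, rows[j + 1]) <= rows[j] <= max(r_a, rows[j + 1])
        col_ok = min(c_a, cols[j + 1]) <= cols[j] <= max(c_a, cols[j + 1])
        if not (row_ok and col_ok):
            return False
    return True


# Points along a nearly straight line satisfy the assumption ...
line_rows, line_cols = [0, 2, 4, 6], [0, 1, 2, 3]
# ... but a path that doubles back on itself does not.
zigzag_rows, zigzag_cols = [0, 5, 2], [0, 5, 2]
```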


richardt94 · 2022-04-07T07:07:12Z

This should now work for any list of points, though it will be faster for lists where successive points are close to each other. I also improved the handling of passing a single point to get_value_at_coords (this is now explicitly checked instead of waiting for an error to be thrown by the logic that handles lists of points) and added a couple more tests for the function using different max_bytes.
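One way such a scheme could work for arbitrary point lists is sketched below. It is not the implementation merged here: the names `batch_points_by_bounding_box` and `read_points_in_batches` are hypothetical, the greedy grouping of consecutive points is an assumption, and a NumPy array again stands in for the netCDF variable.

```python
import numpy as np


def batch_points_by_bounding_box(rows, cols, max_bytes, itemsize):
    """Greedily group consecutive points so that each group's bounding box
    needs at most max_bytes. Returns (start, stop) ranges into the point
    list. A single point always forms a valid group, even if max_bytes is
    smaller than one item."""
    batches = []
    start = 0
    r_min = r_max = rows[0]
    c_min = c_max = cols[0]
    for k in range(1, len(rows)):
        new_r_min, new_r_max = min(r_min, rows[k]), max(r_max, rows[k])
        new_c_min, new_c_max = min(c_min, cols[k]), max(c_max, cols[k])
        box_bytes = ((new_r_max - new_r_min + 1)
                     * (new_c_max - new_c_min + 1) * itemsize)
        if box_bytes > max_bytes:
            # Close the current group and start a new one at point k.
            batches.append((start, k))
            start = k
            r_min = r_max = rows[k]
            c_min = c_max = cols[k]
        else:
            r_min, r_max = new_r_min, new_r_max
            c_min, c_max = new_c_min, new_c_max
    batches.append((start, len(rows)))
    return batches


def read_points_in_batches(variable, rows, cols, max_bytes):
    """Read point values using one contiguous slice per batch."""
    rows = np.atleast_1d(rows)
    cols = np.atleast_1d(cols)
    if rows.size == 1:
        # Handle a single point explicitly rather than via the list logic.
        return np.asarray([variable[rows[0], cols[0]]])
    out = np.empty(rows.size, dtype=variable.dtype)
    itemsize = variable.dtype.itemsize
    for start, stop in batch_points_by_bounding_box(rows, cols, max_bytes, itemsize):
        r, c = rows[start:stop], cols[start:stop]
        r0, c0 = r.min(), c.min()
        block = np.asarray(variable[r0:r.max() + 1, c0:c.max() + 1])
        out[start:stop] = block[r - r0, c - c0]
    return out


# Stand-in grid and points (illustrative values only).
grid = np.arange(50 * 60, dtype=np.float32).reshape(50, 60)
rows = [3, 4, 40, 41, 10]
cols = [5, 6, 50, 51, 2]
batches = batch_points_by_bounding_box(rows, cols, max_bytes=64, itemsize=4)
values = read_points_in_batches(grid, rows, cols, max_bytes=64)
```

Under this greedy grouping, lists whose successive points are close together produce fewer, larger batches, which is consistent with the observation that such lists are faster.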

andrew-j-turner-000 and others added 7 commits January 18, 2022 17:02

Removed the basemap layer as it was causing issues. Added script argvs for path and variable_to_map in main().

1dc32bb

attempt to make netCDF access for list of points faster

600e2c3

faster point access for sorted points

2f787d5

restore original try-except block

e3d79d0

generalise to any list of points

722f2e4

add tests for get_value_at_coords with different request sizes

e1deecb

get rid of unsafe try-except and clean up comments

c1a4e26

richardt94 changed the title ~~[WIP] Point access speedup~~ Point access speedup Apr 7, 2022

richardt94 changed the base branch from master to develop April 12, 2022 01:36

andrew-j-turner-000 merged commit 14cb960 into GeoscienceAustralia:develop Apr 12, 2022
