Round-tripping with hsload and hsget #38

Open
rsignell-usgs opened this issue Oct 14, 2017 · 8 comments

@rsignell-usgs (Contributor)

@jreadey, you used hsload to put our Hurricane Sandy netCDF-4 file on HSDS:

(IOOS) rsignell@0e6be50c3dc2:~$ hsls /home/john/sandy.nc/

john                            domain   2017-09-07 22:11:07 /home/john/sandy.nc
1 items

If I try to use hsget to get that dataset back, I get errors:

(IOOS) rsignell@0e6be50c3dc2:~$ hsget /home/john/sandy.nc sandy.nc
2017-10-14 14:00:39,424 ERROR: failed to create dataset: Scalar datasets don't support chunk/filter options
ERROR: failed to create dataset: Scalar datasets don't support chunk/filter options
2017-10-14 14:01:50,324 ERROR: failed to create dataset: Scalar datasets don't support chunk/filter options
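
For context, that message is the one h5py itself raises when chunk or filter options are passed for a scalar (shape ()) dataset, so hsget is presumably forwarding chunk/compression settings even for scalar datasets when it recreates them locally. A minimal sketch of the same failure with plain h5py (the file name is just an example):

import h5py

with h5py.File("scratch.h5", "w") as f:
    # A scalar dataset with no filter options is fine.
    f.create_dataset("ok_scalar", data=42)
    # Passing any chunk/filter option for a scalar dataset raises
    # TypeError: Scalar datasets don't support chunk/filter options
    f.create_dataset("bad_scalar", data=42, compression="gzip")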

Although I do end up with a sandy.nc file, ncdump fails on it (see below). I guess that's not too surprising in light of #32, right?

But do you think one day we will be able to round-trip a dataset using hsload and hsget?


(IOOS) rsignell@0e6be50c3dc2:~$ ncdump -h sandy.nc
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 140414440146688:
  #000: H5L.c line 1183 in H5Literate(): link iteration failed
    major: Symbol table
    minor: Iteration failed
  #001: H5Gint.c line 844 in H5G_iterate(): error iterating over links
    major: Symbol table
    minor: Iteration failed
  #002: H5Gobj.c line 708 in H5G__obj_iterate(): can't iterate over symbol table
    major: Symbol table
    minor: Iteration failed
  #003: H5Gstab.c line 566 in H5G__stab_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #004: H5B.c line 1221 in H5B_iterate(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #005: H5B.c line 1177 in H5B_iterate_helper(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #006: H5Gnode.c line 1039 in H5G__node_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
[the identical HDF5-DIAG error stack is printed a second time]
ncdump: sandy.nc: NetCDF: HDF error
(IOOS) rsignell@0e6be50c3dc2:~$
jreadey (Member) commented Dec 3, 2017

There are some updates in v0.2.7 that enable files with dimension scales to be uploaded correctly to the HSDS service. There is still a problem with downloading the files, which will require some HSDS updates to resolve.

Also, I noticed that there are some attributes in the sandy.nc file that can't be read with h5py. These appear to be related to this issue: h5py/h5py#719.

rsignell-usgs (Contributor, Author) commented Dec 3, 2017

Looks like this was fixed in NetCDF on September 1: Unidata/netcdf-c@4dd8e38

and released in version 4.5.0 on Oct 20 https://github.com/Unidata/netcdf-c/releases/tag/v4.5.0

I will try converting those files to netcdf4 again and see if that fixes the problem.

jreadey (Member) commented Dec 3, 2017

OK, thanks. For netCDF files affected by that bug, I've added a check so that hsload just prints a warning message and continues on with the other attributes.
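
The guard presumably follows a pattern like the one below (a sketch, not the actual hsload code; copy_attrs, src_obj, dst_obj, and logger are illustrative names):

# Sketch: copy attributes but skip any that h5py cannot read,
# logging a warning instead of aborting the whole upload.
def copy_attrs(src_obj, dst_obj, logger):
    for name in src_obj.attrs:
        try:
            value = src_obj.attrs[name]
        except (OSError, TypeError) as err:  # e.g. the h5py/h5py#719 conversion failure
            logger.warning("skipping attribute %s: %s", name, err)
            continue
        dst_obj.attrs[name] = value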

@rsignell-usgs (Contributor, Author)

I used nccopy from netCDF 4.5.0 to recreate my Sandy netCDF-4 files from the original netCDF-3 files:

nccopy -7 -d 7 Sandy_ocean_his.nc Sandy_ocean_his_nc4c.nc

and then used hsload to write to HSDS. The only error I got was:

$ hsload Sandy_ocean_his_nc4c.nc /home/rsignell/sandy2.nc
2017-12-03 19:42:48,871 utillib.py:266 ERROR: failed to create attribute script_file of object / -- unknown object type
ERROR: failed to create attribute script_file of object / -- unknown object type
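
If it helps narrow down the "unknown object type", a quick way to see how that root attribute is typed in the source file is something like this (a sketch using plain h5py; the file and attribute names come from the command and error above):

import h5py

with h5py.File("Sandy_ocean_his_nc4c.nc", "r") as f:
    # See what h5py makes of the root attribute hsload could not convert.
    value = f.attrs["script_file"]
    print(type(value), getattr(value, "dtype", None), repr(value))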

@rsignell-usgs (Contributor, Author)

When I try to load the HSDS dataset using xarray with the h5netcdf engine:

import xarray as xr
ds = xr.open_dataset('Sandy_ocean_his.nc')
ds = xr.open_dataset('/home/rsignell/sandy2.nc', engine='h5netcdf')

I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-8b828d9bcc43> in <module>()
----> 1 ds = xr.open_dataset('/home/rsignell/sandy2.nc', engine='h5netcdf')

~/.conda/envs/hsds/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables)
    292         elif engine == 'h5netcdf':
    293             store = backends.H5NetCDFStore(filename_or_obj, group=group,
--> 294                                            autoclose=autoclose)
    295         elif engine == 'pynio':
    296             store = backends.NioDataStore(filename_or_obj,

~/.conda/envs/hsds/lib/python3.6/site-packages/xarray/backends/h5netcdf_.py in __init__(self, filename, mode, format, group, writer, autoclose)
     62         opener = functools.partial(_open_h5netcdf_group, filename, mode=mode,
     63                                    group=group)
---> 64         self.ds = opener()
     65         if autoclose:
     66             raise NotImplementedError('autoclose=True is not implemented '

~/.conda/envs/hsds/lib/python3.6/site-packages/xarray/backends/h5netcdf_.py in _open_h5netcdf_group(filename, mode, group)
     48 def _open_h5netcdf_group(filename, mode, group):
     49     import h5netcdf.legacyapi
---> 50     ds = h5netcdf.legacyapi.Dataset(filename, mode=mode)
     51     with close_on_error(ds):
     52         return _nc4_group(ds, group, mode)

/notebooks/rsignell/github/h5netcdf/h5netcdf/core.py in __init__(self, path, mode, invalid_netcdf, **kwargs)
    584         # if we actually use invalid NetCDF features.
    585         self._write_ncproperties = (invalid_netcdf is not True)
--> 586         super(File, self).__init__(self, self._h5path)
    587 
    588     def _check_valid_netcdf_dtype(self, dtype, stacklevel=3):

/notebooks/rsignell/github/h5netcdf/h5netcdf/core.py in __init__(self, parent, name)
    241                     # variables.
    242                     self._current_dim_sizes[k] = \
--> 243                         self._determine_current_dimension_size(k, current_size)
    244 
    245                     if dim_id is None:

/notebooks/rsignell/github/h5netcdf/h5netcdf/core.py in _determine_current_dimension_size(self, dim_name, max_size)
    286 
    287             for i, var_d in enumerate(var.dims):
--> 288                 name = _name_from_dimension(var_d)
    289                 if name == dim_name:
    290                     max_size = max(var.shape[i], max_size)

/notebooks/rsignell/github/h5netcdf/h5netcdf/core.py in _name_from_dimension(dim)
     34     # First value in a dimension is the actual dimension scale
     35     # which we'll use to extract the name.
---> 36     return dim[0].name.split('/')[-1]
     37 
     38 

AttributeError: 'NoneType' object has no attribute 'split'

This was after changing import h5py to import h5pyd as h5py in h5netcdf.
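
For anyone trying the same experiment without maintaining a fork, the swap can also be done by aliasing the module before h5netcdf is imported (a sketch, not a supported configuration; it assumes h5pyd exposes the parts of the h5py API that h5netcdf touches, and it runs into the same dimension-scale problem described below):

import sys
import h5pyd

# Make every "import h5py" inside h5netcdf resolve to h5pyd instead.
sys.modules["h5py"] = h5pyd

import h5netcdf.legacyapi as netcdf
ds = netcdf.Dataset("/home/rsignell/sandy2.nc", "r")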

ghost commented Dec 4, 2017

@rsignell-usgs We are aware of this problem with h5netcdf and h5pyd. h5pyd currently cannot return the HDF5 path name for HDF5 objects that are not accessed following the file's hierarchy. Returning HDF5 dimension scale datasets as h5py.Dataset is one of those types of access.
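
Concretely, the access pattern the traceback above shows failing is roughly this (a sketch; the variable name is illustrative, and it assumes h5pyd exposes the h5py dims API):

import h5pyd

f = h5pyd.File("/home/rsignell/sandy2.nc", "r")
var = f["zeta"]            # illustrative variable with attached dimension scales
scale = var.dims[0][0]     # first dimension scale attached to dimension 0
print(scale.name)          # h5py returns the HDF5 path; h5pyd currently returns None,
                           # so name.split('/') in h5netcdf raises AttributeError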

Are you working on enabling h5netcdf to work with h5pyd? I'm asking because I just started working on this in the last couple of days. No need for us to duplicate the effort.

@rsignell-usgs (Contributor, Author)

@ajelenak-thg, no, I'm not working on it. I just forked h5netcdf and replaced:
import h5py
with
import h5pyd as h5py
and then observed that it didn't work.

ghost commented Dec 4, 2017

@rsignell-usgs That's how far I was able to progress, too. 😃 I think @jreadey is working on a fix.
