Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cf_writer adds a _FillValue = NaN to coordinate variables #2845

Open
joleenf opened this issue Jun 28, 2024 · 1 comment
Open

cf_writer adds a _FillValue = NaN to coordinate variables #2845

joleenf opened this issue Jun 28, 2024 · 1 comment

Comments

@joleenf
Copy link
Contributor

joleenf commented Jun 28, 2024

The cf_writer adds a _FillValue to the final netCDF output for the lat/lon coordinates. In this case, adapted from an abi_fixed_grid, the dims are ("y", "x") while the coords are ("latitude", "longitude")
even so, the coordinates should not contain a _FillValue upon writing to the netCDF file.
pydata/xarray#1598

Reproduce:

import dask.array as da
import os
import xarray as xr

from pyresample import AreaDefinition
from satpy import Scene

area_id = "abi_geos"
description = "Strided ABI L2 File Area"
proj_id = "abi_geos"
projection = {'ellps': 'GRS80', 'h': '35786023', 'lon_0': '-75', 'no_defs': 'None', 'proj': 'geos', 'sweep': 'x',
              'type': 'crs', 'units': 'm', 'x_0': '0', 'y_0': '0'}
width = 100
height = 100
area_extent = (-3627271.341, 1583173.822, 1382771.9518, 4589199.7649)
area_def = AreaDefinition(area_id, description, proj_id, projection, width, height, area_extent)
lonlats = area_def.get_lonlats()


scn = Scene()

scn["test"] = xr.DataArray(data=da.zeros((100, 100)), dims=("y", "x"), attrs={"name": "test", "area": area_def})

cf_ds = scn.to_xarray()
print(f"Longitude Encoding: {cf_ds['longitude'].encoding}")
print(f"Longitude Attrs: {cf_ds['longitude'].attrs}")
print(f"Longitude Min: {lonlats[0].min()}, Latitude Min: {lonlats[1].min()}")
print(f"Longitude Max: {lonlats[0].max()}, Latitude Max: {lonlats[1].max()}")
outpath = os.path.join(os.path.expanduser("~"), "test.nc")
scn.save_datasets(filename=outpath)

Though the original attributes do not contain a _FillValue, the resulting netCDF does, the code above prints:

Longitude Encoding: {}
Longitude Attrs: {'name': 'longitude', 'standard_name': 'longitude', 'units': 'degrees_east'}
Longitude Min: -147.87682093113054, Latitude Min: 14.704946854219488
Longitude Max: inf, Latitude Max: inf

but an ncdump -h on test.nc shows the addition of the _FillValue:

	double longitude(y, x) ;
		longitude:_FillValue = NaN ;
		longitude:name = "longitude" ;
		longitude:standard_name = "longitude" ;
		longitude:units = "degrees_east" ;
	double latitude(y, x) ;
		latitude:_FillValue = NaN ;
		latitude:name = "latitude" ;
		latitude:standard_name = "latitude" ;
		latitude:units = "degrees_north" ;
	double test(y, x) ;
		test:_FillValue = NaN ;
		test:grid_mapping = "abi_geos" ;
		test:long_name = "test" ;
		test:coordinates = "latitude longitude" ;

However, it is possible to add encoding to save_datasets to save the lat/lon without _FillValue:
scn.save_datasets(filename=test.nc, encoding={"latitude": {"_FillValue": None}, "longitude": {"_FillValue": None}})

		longitude:name = "longitude" ;
		longitude:standard_name = "longitude" ;
		longitude:units = "degrees_east" ;
	double latitude(y, x) ;
		latitude:name = "latitude" ;
		latitude:standard_name = "latitude" ;
		latitude:units = "degrees_north" ;
	double test(y, x) ;
		test:_FillValue = NaN ;
		test:grid_mapping = "abi_geos" ;
		test:long_name = "test" ;
		test:coordinates = "latitude longitude" 

It seems that the encoding for writing netCDF data should include this somewhere in save_datasets rather than being typed explicitly.

@djhoese
Copy link
Member

djhoese commented Jun 28, 2024

I'm not sure I agree that there should be no _FillValue. The argument in this xarray issue (pydata/xarray#1865) if I remember correctly is more about matching the CF standard. In CF a coordinate variable is a 1D variable that matches the name of a dimension. As we discussed on slack, it makes sense (as mentioned in the xarray issue about the CF standard) that you can't have fill values on a coordinate 1D variable. You can't have a pixel of data that has a "location" of (NaN, NaN). It just doesn't make sense. BUT our 2D lon/lats in CF-land I don't think are technically coordinate variables at least as far as the missing value concern...is concerned. This section of the CF docs makes me think that it is expected:

http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#reduced-horizontal-grid

Storing this type of gridded data in two-dimensional arrays wastes space, and results in the presence of missing values in the 2D coordinate variables.

So on this page:

http://cfconventions.org/cf-conventions/cf-conventions.html#terminology

auxiliary coordinate variable

Any netCDF variable that contains coordinate data, but is not a coordinate variable (in the sense of that term defined by the NUG and used by this standard - see below). Unlike coordinate variables, there is no relationship between the name of an auxiliary coordinate variable and the name(s) of its dimension(s).

Which is used:

http://cfconventions.org/cf-conventions/cf-conventions.html#missing-data

Missing data is allowed in data variables and auxiliary coordinate variables. Generic applications should treat the data as missing where any auxiliary coordinate variables have missing values; special-purpose applications might be able to make use of the data. Missing data is not allowed in coordinate variables.

So if our 2D lon/lats are considered "auxiliary" then they're fine to have a _FillValue in CF.

Anyway, my opinion is that the lon/lat 2D arrays should have a _FillValue or at the very least a valid_range.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants