satpy.writers.cf_writer module

Writer for netCDF4/CF.

Example usage

The CF writer saves datasets in a Scene as a CF-compliant netCDF file. Here is an example with MSG SEVIRI data in HRIT format:

>>> from satpy import Scene
>>> import glob
>>> filenames = glob.glob('data/H*201903011200*')
>>> scn = Scene(filenames=filenames, reader='seviri_l1b_hrit')
>>> scn.load(['VIS006', 'IR_108'])
>>> scn.save_datasets(writer='cf', datasets=['VIS006', 'IR_108'], filename='seviri_test.nc',
                      exclude_attrs=['raw_metadata'])
  • You can select the netCDF backend using the engine keyword argument. If None, it follows xarray's to_netcdf() engine choices with a preference for ‘netcdf4’.

  • For datasets with an area definition, you can exclude lat/lon coordinates by setting include_lonlats=False. If the area has a projected CRS, units are assumed to be metres. If the area has a geographic CRS, units are assumed to be degrees. The writer does not verify that the CRS is supported by the CF conventions. One commonly used projected CRS not supported by the CF conventions is the equirectangular projection, such as EPSG 4087.

  • By default non-dimensional coordinates (such as scanline timestamps) are prefixed with the corresponding dataset name. This is because they are likely to be different for each dataset. If a non-dimensional coordinate is identical for all datasets, the prefix can be removed by setting pretty=True.

  • Some dataset names start with a digit, like AVHRR channels 1, 2, 3a, 3b, 4 and 5. This doesn’t comply with the CF naming conventions (https://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch02s03.html). Such channels are prefixed with "CHANNEL_" by default. This can be controlled with the numeric_name_prefix argument of save_datasets. Setting it to None or ‘’ will skip the prefixing.
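The renaming rule can be sketched as follows; prefix_numeric_name is a hypothetical helper illustrating the assumed behavior, not satpy's actual implementation:

```python
def prefix_numeric_name(name, prefix="CHANNEL_"):
    """Prepend `prefix` if `name` starts with a digit; otherwise return it unchanged."""
    if prefix and name and name[0].isdigit():
        return prefix + name
    return name

print(prefix_numeric_name("1"))             # CHANNEL_1
print(prefix_numeric_name("3a"))            # CHANNEL_3a
print(prefix_numeric_name("IR_108"))        # IR_108 (unchanged)
print(prefix_numeric_name("4", prefix=""))  # 4 (prefixing skipped)
```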

Grouping

All datasets to be saved must have the same projection coordinates x and y. If a scene holds datasets with different grids, the CF compliant workaround is to save the datasets to separate files. Alternatively, you can save datasets with common grids in separate netCDF groups as follows:

>>> scn.load(['VIS006', 'IR_108', 'HRV'])
>>> scn.save_datasets(writer='cf', datasets=['VIS006', 'IR_108', 'HRV'],
                      filename='seviri_test.nc', exclude_attrs=['raw_metadata'],
                      groups={'visir': ['VIS006', 'IR_108'], 'hrv': ['HRV']})

Note that the resulting file will not be fully CF compliant.

Dataset Encoding

Dataset encoding can be specified in two ways:

  1. Via the encoding keyword argument of save_datasets:

    >>> import numpy as np
    >>> my_encoding = {
    ...    'my_dataset_1': {
    ...        'compression': 'zlib',
    ...        'complevel': 9,
    ...        'scale_factor': 0.01,
    ...        'add_offset': 100,
    ...        'dtype': np.int16
    ...     },
    ...    'my_dataset_2': {
    ...        'compression': None,
    ...        'dtype': np.float64
    ...     }
    ... }
    >>> scn.save_datasets(writer='cf', filename='encoding_test.nc', encoding=my_encoding)
    
  2. Via the encoding attribute of the datasets in a scene. For example

    >>> scn['my_dataset'].encoding = {'compression': 'zlib'}
    >>> scn.save_datasets(writer='cf', filename='encoding_test.nc')
    

See the xarray encoding documentation for all encoding options.

Note

Chunk-based compression can be specified with the compression keyword since

netCDF4-1.6.0
libnetcdf-4.9.0
xarray-2022.12.0

The zlib keyword is deprecated. Make sure that the versions of these modules are either all above or all below those references. Otherwise, compression might fail or be silently ignored.
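The "all above or all below" condition can be illustrated with a small version comparison. This is only a sketch of the rule, not satpy's actual check; versions_consistent, REFERENCE and _parse are hypothetical names:

```python
# Reference versions from the note above (assumed thresholds for `compression` support).
REFERENCE = {"netCDF4": "1.6.0", "libnetcdf": "4.9.0", "xarray": "2022.12.0"}

def _parse(version):
    """Turn a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def versions_consistent(installed):
    """True if every installed version is >= its reference, or every one is below."""
    above = [_parse(installed[name]) >= _parse(REFERENCE[name]) for name in REFERENCE]
    return all(above) or not any(above)

# All at or above the reference versions: consistent.
print(versions_consistent({"netCDF4": "1.6.2", "libnetcdf": "4.9.1", "xarray": "2023.1.0"}))  # True
# Mixed: libnetcdf below the reference while the others are above.
print(versions_consistent({"netCDF4": "1.6.2", "libnetcdf": "4.8.1", "xarray": "2023.1.0"}))  # False
```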

Attribute Encoding

In the above examples, raw metadata from the HRIT files has been excluded. If you want all attributes to be included, just remove the exclude_attrs keyword argument. By default, dict-type dataset attributes, such as the raw metadata, are encoded as a string using json. Thus, you can use json to decode them afterwards:

>>> import xarray as xr
>>> import json
>>> # Save scene to nc-file
>>> scn.save_datasets(writer='cf', datasets=['VIS006', 'IR_108'], filename='seviri_test.nc')
>>> # Now read data from the nc-file
>>> ds = xr.open_dataset('seviri_test.nc')
>>> raw_mda = json.loads(ds['IR_108'].attrs['raw_metadata'])
>>> print(raw_mda['RadiometricProcessing']['Level15ImageCalibration']['CalSlope'])
[0.020865   0.0278287  0.0232411  0.00365867 0.00831811 0.03862197
 0.12674432 0.10396091 0.20503568 0.22231115 0.1576069  0.0352385]

Alternatively, it is possible to flatten dict-type attributes by setting flatten_attrs=True. This is more human-readable, as it creates a separate nc-attribute for each item in every dictionary. Keys are concatenated with underscore separators. The CalSlope attribute can then be accessed as follows:

>>> scn.save_datasets(writer='cf', datasets=['VIS006', 'IR_108'], filename='seviri_test.nc',
                      flatten_attrs=True)
>>> ds = xr.open_dataset('seviri_test.nc')
>>> print(ds['IR_108'].attrs['raw_metadata_RadiometricProcessing_Level15ImageCalibration_CalSlope'])
[0.020865   0.0278287  0.0232411  0.00365867 0.00831811 0.03862197
 0.12674432 0.10396091 0.20503568 0.22231115 0.1576069  0.0352385]

This is what the corresponding ncdump output would look like in this case:

$ ncdump -h seviri_test.nc
...
IR_108:raw_metadata_RadiometricProcessing_Level15ImageCalibration_CalOffset = -1.064, ...;
IR_108:raw_metadata_RadiometricProcessing_Level15ImageCalibration_CalSlope = 0.021, ...;
IR_108:raw_metadata_RadiometricProcessing_MPEFCalFeedback_AbsCalCoeff = 0.021, ...;
...
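The flattening above can be sketched as recursive key concatenation; flatten is a hypothetical helper mirroring the assumed behavior of flatten_attrs=True, not satpy's implementation:

```python
def flatten(attrs, prefix=""):
    """Flatten nested dict attributes, joining keys with underscores."""
    flat = {}
    for key, value in attrs.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name))  # recurse into nested dicts
        else:
            flat[name] = value
    return flat

raw = {"raw_metadata": {"RadiometricProcessing": {"Level15ImageCalibration": {"CalSlope": [0.020865]}}}}
print(flatten(raw))
# {'raw_metadata_RadiometricProcessing_Level15ImageCalibration_CalSlope': [0.020865]}
```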
class satpy.writers.cf_writer.AttributeEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: JSONEncoder

JSON encoder for dataset attributes.

Constructor for JSONEncoder, with sensible defaults.

If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.

If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.

If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause a RecursionError). Otherwise, no such check takes place.

If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.

If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.

If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.

If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.

If specified, default is a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.

_encode(obj)[source]

Encode the given object as a json-serializable datatype.

default(obj)[source]

Return a json-serializable object for obj.

In order to facilitate decoding, elements in dictionaries, lists/tuples and multi-dimensional arrays are encoded recursively.
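As a rough analogue (not satpy's AttributeEncoder), a custom JSONEncoder can route otherwise non-serializable objects through its default() hook; FallbackEncoder and its conversion rules here are illustrative assumptions:

```python
import json
from datetime import datetime

class FallbackEncoder(json.JSONEncoder):
    """Illustrative encoder: convert known non-JSON types, fall back to str()."""
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()      # datetimes become ISO strings
        if isinstance(obj, set):
            return sorted(obj)          # sets become sorted lists
        return str(obj)                 # last resort: string representation

attrs = {"start_time": datetime(2019, 3, 1, 12, 0), "bands": {"3a", "1"}}
encoded = json.dumps(attrs, cls=FallbackEncoder, sort_keys=True)
print(encoded)
# {"bands": ["1", "3a"], "start_time": "2019-03-01T12:00:00"}
```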

class satpy.writers.cf_writer.CFWriter(name=None, filename=None, base_dir=None, **kwargs)[source]

Bases: Writer

Writer producing NetCDF/CF compatible datasets.

Initialize the writer object.

Parameters:
  • name (str) – A name for this writer for log and error messages. If this writer is configured in a YAML file its name should match the name of the YAML file. Writer names may also appear in output file attributes.

  • filename (str) –

    Filename to save data to. This filename can and should specify certain python string formatting fields to differentiate between data written to the files. Any attributes provided by the .attrs of a DataArray object may be included. Format and conversion specifiers provided by the trollsift package may also be used. Any directories in the provided pattern will be created if they do not exist. Example:

    {platform_name}_{sensor}_{name}_{start_time:%Y%m%d_%H%M%S}.tif
    

  • base_dir (str) – Base destination directories for all created files.

  • kwargs (dict) – Additional keyword arguments to pass to the Plugin class.

static da2cf(dataarray, epoch='seconds since 1970-01-01 00:00:00', flatten_attrs=False, exclude_attrs=None, include_orig_name=True, numeric_name_prefix='CHANNEL_')[source]

Convert the dataarray to something cf-compatible.

Parameters:
  • dataarray (xr.DataArray) – The data array to be converted

  • epoch (str) – Reference time for encoding of time coordinates

  • flatten_attrs (bool) – If True, flatten dict-type attributes

  • exclude_attrs (list) – List of dataset attributes to be excluded

  • include_orig_name (bool) – Include the original dataset name in the netcdf variable attributes

  • numeric_name_prefix (str) – Prepend dataset name with this if starting with a digit

save_dataset(dataset, filename=None, fill_value=None, **kwargs)[source]

Save the dataset to a given filename.

save_datasets(datasets, filename=None, groups=None, header_attrs=None, engine=None, epoch='seconds since 1970-01-01 00:00:00', flatten_attrs=False, exclude_attrs=None, include_lonlats=True, pretty=False, include_orig_name=True, numeric_name_prefix='CHANNEL_', **to_netcdf_kwargs)[source]

Save the given datasets in one netCDF file.

Note that all datasets (if grouping: in one group) must have the same projection coordinates.

Parameters:
  • datasets (list) – List of xr.DataArray to be saved.

  • filename (str) – Output file

  • groups (dict) – Group datasets according to the given assignment: {‘group_name’: [‘dataset1’, ‘dataset2’, …]}. Group name None corresponds to the root of the file, i.e. no group will be created. Warning: The results will not be fully CF compliant!

  • header_attrs – Global attributes to be included.

  • engine (str) – Module to be used for writing netCDF files. Follows xarray’s to_netcdf() engine choices with a preference for ‘netcdf4’.

  • epoch (str) – Reference time for encoding of time coordinates.

  • flatten_attrs (bool) – If True, flatten dict-type attributes.

  • exclude_attrs (list) – List of dataset attributes to be excluded.

  • include_lonlats (bool) – Always include latitude and longitude coordinates, even for datasets with area definition.

  • pretty (bool) – Don’t modify coordinate names, if possible. Makes the file prettier, but possibly less consistent.

  • include_orig_name (bool) – Include the original dataset name as a variable attribute in the final netCDF.

  • numeric_name_prefix (str) – Prefix to add to each variable whose name starts with a digit. Use ‘’ or None to leave names unchanged.

static update_encoding(dataset, to_netcdf_kwargs)[source]

Update encoding info (deprecated).

satpy.writers.cf_writer._add_ancillary_variables_attrs(dataarray)[source]

Replace the ancillary_variables DataArrays with a list of their names.

satpy.writers.cf_writer._add_grid_mapping(dataarray)[source]

Convert an area to a CF grid mapping.

satpy.writers.cf_writer._add_history(attrs)[source]

Add ‘history’ attribute to dictionary.

satpy.writers.cf_writer._backend_versions_match()[source]
satpy.writers.cf_writer._check_backend_versions()[source]

Issue warning if backend versions do not match.

satpy.writers.cf_writer._collect_cf_dataset(list_dataarrays, epoch='seconds since 1970-01-01 00:00:00', flatten_attrs=False, exclude_attrs=None, include_lonlats=True, pretty=False, include_orig_name=True, numeric_name_prefix='CHANNEL_')[source]

Process a list of xr.DataArray and return a CF-compliant xr.Dataset.

Parameters:
  • list_dataarrays (list) – List of DataArrays to make CF compliant and merge into a xr.Dataset.

  • epoch (str) – Reference time for encoding the time coordinates (if available). Example format: “seconds since 1970-01-01 00:00:00”. If None, the default reference time EPOCH defined in this module is used.

  • flatten_attrs (bool, optional) – If True, flatten dict-type attributes.

  • exclude_attrs (list, optional) – List of xr.DataArray attribute names to be excluded.

  • include_lonlats (bool, optional) – If True, include ‘latitude’ and ‘longitude’ coordinates also for datasets defined on an AreaDefinition. If the ‘area’ attribute is a SwathDefinition, latitude and longitude coordinates are always included.

  • pretty (bool, optional) – Don’t modify coordinate names, if possible. Makes the file prettier, but possibly less consistent.

  • include_orig_name (bool, optional) – Include the original dataset name as a variable attribute in the xr.Dataset.

  • numeric_name_prefix (str, optional) – Prefix to add to each variable whose name starts with a digit. Use ‘’ or None to leave names unchanged.

Returns:

ds – A partially CF-compliant xr.Dataset

Return type:

xr.Dataset

satpy.writers.cf_writer._create_grid_mapping(area)[source]

Create the grid mapping instance for area.

satpy.writers.cf_writer._drop_exclude_attrs(dataarray, exclude_attrs)[source]

Remove user-specified list of attributes.

satpy.writers.cf_writer._encode_nc(obj)[source]

Try to encode obj as a netcdf compatible datatype which most closely resembles the object’s nature.

Raises:

ValueError if no such datatype could be found

satpy.writers.cf_writer._encode_python_objects(obj)[source]

Try to find the datatype which most closely resembles the object’s nature.

On failure, encode as a string. Plain lists are encoded recursively.

satpy.writers.cf_writer._format_prerequisites_attrs(dataarray)[source]

Reformat prerequisites attribute value to string.

satpy.writers.cf_writer._get_backend_versions()[source]
satpy.writers.cf_writer._get_groups(groups, list_datarrays)[source]

Return a dictionary with the list of xr.DataArray associated with each group.

If groups is None, return all DataArrays under a single None key. Otherwise, collect the DataArrays associated with each group.
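Assuming this behavior, the bucketing can be sketched with plain dictionaries; get_groups here is a hypothetical stand-in operating on dataset names rather than DataArrays:

```python
def get_groups(groups, dataarray_names):
    """Bucket dataset names per group; None means everything in the file root."""
    if groups is None:
        return {None: list(dataarray_names)}
    return {group: [name for name in dataarray_names if name in members]
            for group, members in groups.items()}

names = ["VIS006", "IR_108", "HRV"]
print(get_groups(None, names))
# {None: ['VIS006', 'IR_108', 'HRV']}
print(get_groups({"visir": ["VIS006", "IR_108"], "hrv": ["HRV"]}, names))
# {'visir': ['VIS006', 'IR_108'], 'hrv': ['HRV']}
```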

satpy.writers.cf_writer._handle_dataarray_name(original_name, numeric_name_prefix)[source]
satpy.writers.cf_writer._initialize_root_netcdf(filename, engine, header_attrs, to_netcdf_kwargs)[source]

Initialize root empty netCDF.

satpy.writers.cf_writer._preprocess_dataarray_name(dataarray, numeric_name_prefix, include_orig_name)[source]

Change the DataArray name by prepending numeric_name_prefix if the name starts with a digit.

satpy.writers.cf_writer._process_time_coord(dataarray, epoch)[source]

Process the ‘time’ coordinate, if existing.

Expand the DataArray with a time dimension if it does not yet exist.

The function assumes:

  • that the x and y dimensions have a shape larger than 1

  • that the time coordinate has size 1

satpy.writers.cf_writer._remove_none_attrs(dataarray)[source]

Remove attribute keys with None value.

satpy.writers.cf_writer._remove_satpy_attrs(new_data)[source]

Remove _satpy attribute.

satpy.writers.cf_writer._sanitize_writer_kwargs(writer_kwargs)[source]

Remove satpy-specific kwargs.

satpy.writers.cf_writer._set_default_chunks(encoding, dataset)[source]

Update encoding to preserve current dask chunks.

Existing user-defined chunks take precedence.
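The precedence rule can be sketched with dict.setdefault; apply_default_chunks and its arguments are illustrative names, not satpy's API:

```python
def apply_default_chunks(encoding, current_chunks):
    """Fill in current dask chunks as defaults without overriding user entries."""
    for var, chunks in current_chunks.items():
        # setdefault only writes when the key is missing, so user chunks win.
        encoding.setdefault(var, {}).setdefault("chunksizes", chunks)
    return encoding

user_encoding = {"IR_108": {"chunksizes": (1, 512, 512)}}
dask_chunks = {"IR_108": (1, 256, 256), "VIS006": (1, 256, 256)}
result = apply_default_chunks(user_encoding, dask_chunks)
print(result)
# IR_108 keeps the user-defined chunks; VIS006 gets the current dask chunks.
```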

satpy.writers.cf_writer._set_default_fill_value(encoding, dataset)[source]

Set default fill values.

Avoid _FillValue attribute being added to coordinate variables (https://github.com/pydata/xarray/issues/1865).

satpy.writers.cf_writer._set_default_time_encoding(encoding, dataset)[source]

Set default time encoding.

Make sure time coordinates and bounds have the same units. Default is xarray’s CF datetime encoding, which can be overridden by user-defined encoding.

satpy.writers.cf_writer._update_encoding_dataset_names(encoding, dataset, numeric_name_prefix)[source]

Ensure variable names of the encoding dictionary account for numeric_name_prefix.

Many channel names in satpy start with a digit. When preparing CF-compliant datasets, these channels are prefixed with numeric_name_prefix.

If variable names in the encoding dictionary start with a digit, they are prefixed with numeric_name_prefix as well.

satpy.writers.cf_writer.add_lonlat_coords(dataarray)[source]

Add ‘longitude’ and ‘latitude’ coordinates to DataArray.

satpy.writers.cf_writer.add_time_bounds_dimension(ds, time='time')[source]

Add time bound dimension to xr.Dataset.

satpy.writers.cf_writer.area2cf(dataarray, include_lonlats=False, got_lonlats=False)[source]

Convert an area to a CF grid mapping or longitude and latitude coordinates.

satpy.writers.cf_writer.assert_xy_unique(datas)[source]

Check that all datasets share the same projection coordinates x/y.

satpy.writers.cf_writer.collect_cf_datasets(list_dataarrays, header_attrs=None, exclude_attrs=None, flatten_attrs=False, pretty=True, include_lonlats=True, epoch='seconds since 1970-01-01 00:00:00', include_orig_name=True, numeric_name_prefix='CHANNEL_', groups=None)[source]

Process a list of xr.DataArray and return a dictionary with CF-compliant xr.Datasets.

If the xr.DataArrays do not share the same dimensions, a collection of xr.Datasets sharing the same dimensions is created.

Parameters:
  • list_dataarrays (list) – List of DataArrays to make CF compliant and merge into groups of xr.Datasets.

  • header_attrs (dict) – Global attributes of the output xr.Dataset.

  • epoch (str) – Reference time for encoding the time coordinates (if available). Example format: “seconds since 1970-01-01 00:00:00”. If None, the default reference time EPOCH defined in this module is used.

  • flatten_attrs (bool) – If True, flatten dict-type attributes.

  • exclude_attrs (list) – List of xr.DataArray attribute names to be excluded.

  • include_lonlats (bool) – If True, include ‘latitude’ and ‘longitude’ coordinates also for datasets defined on an AreaDefinition. If the ‘area’ attribute is a SwathDefinition, latitude and longitude coordinates are always included.

  • pretty (bool) – Don’t modify coordinate names, if possible. Makes the file prettier, but possibly less consistent.

  • include_orig_name (bool) – Include the original dataset name as a variable attribute in the xr.Dataset.

  • numeric_name_prefix (str) – Prefix to add to each variable whose name starts with a digit. Use ‘’ or None to leave names unchanged.

  • groups (dict) –

    Group datasets according to the given assignment:

    {‘<group_name>’: [‘dataset_name1’, ‘dataset_name2’, …]}

    It is used to create grouped netCDFs using the CF writer. If None (the default), no groups will be created.

Returns:

  • grouped_datasets (dict) – A dictionary of CF-compliant xr.Dataset: {group_name: xr.Dataset}

  • header_attrs (dict) – Global attributes to be attached to the xr.Dataset / netCDF4.

satpy.writers.cf_writer.encode_attrs_nc(attrs)[source]

Encode dataset attributes in a netcdf compatible datatype.

Parameters:

attrs (dict) – Attributes to be encoded

Returns:

Encoded (and sorted) attributes

Return type:

dict

satpy.writers.cf_writer.encode_nc(obj)[source]

Encode the given object as a netcdf compatible datatype.

satpy.writers.cf_writer.get_extra_ds(dataarray, keys=None)[source]

Get the ancillary_variables DataArrays associated with a dataset.

satpy.writers.cf_writer.has_projection_coords(ds_collection)[source]

Check if DataArray collection has a “longitude” or “latitude” DataArray.

satpy.writers.cf_writer.is_lon_or_lat_dataarray(dataarray)[source]

Check if the DataArray represents the latitude or longitude coordinate.

Link dataarrays and coordinates.

If the coordinates attribute of a data array links to other dataarrays in the scene, for example coordinates=’lon lat’, add them as coordinates to the data array and drop that attribute. In the final call to xr.Dataset.to_netcdf(), all coordinate relations will be resolved and the coordinates attributes will be set automatically.

satpy.writers.cf_writer.make_alt_coords_unique(datas, pretty=False)[source]

Make non-dimensional coordinates unique among all datasets.

Non-dimensional (or alternative) coordinates, such as scanline timestamps, may occur in multiple datasets with the same name and dimension but different values.

In order to avoid conflicts, prepend the dataset name to the coordinate name. If a non-dimensional coordinate is identical across all datasets and pretty=True, its name will not be modified.

Since all datasets must have the same projection coordinates, this is not applied to latitude and longitude.

Parameters:
  • datas (dict) – Dictionary of (dataset name, dataset)

  • pretty (bool) – Don’t modify coordinate names, if possible. Makes the file prettier, but possibly less consistent.

Returns:

Dictionary holding the updated datasets
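A pure-dict sketch of the renaming rule (uniquify_coord is hypothetical and works on plain dictionaries, not satpy's xarray-based implementation):

```python
def uniquify_coord(datasets, coord, pretty=False):
    """Return the per-dataset name to use for the non-dimensional coordinate `coord`."""
    values = [ds[coord] for ds in datasets.values() if coord in ds]
    identical = all(v == values[0] for v in values)
    if pretty and identical:
        return {name: coord for name in datasets}              # keep the plain name
    return {name: f"{name}_{coord}" for name in datasets}      # prefix with dataset name

datas = {"VIS006": {"acq_time": [1, 2]}, "IR_108": {"acq_time": [1, 2]}}
print(uniquify_coord(datas, "acq_time", pretty=True))
# {'VIS006': 'acq_time', 'IR_108': 'acq_time'}
print(uniquify_coord(datas, "acq_time", pretty=False))
# {'VIS006': 'VIS006_acq_time', 'IR_108': 'IR_108_acq_time'}
```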

satpy.writers.cf_writer.make_cf_dataarray(dataarray, epoch='seconds since 1970-01-01 00:00:00', flatten_attrs=False, exclude_attrs=None, include_orig_name=True, numeric_name_prefix='CHANNEL_')[source]

Make the xr.DataArray CF-compliant.

Parameters:
  • dataarray (xr.DataArray) – The data array to be made CF-compliant.

  • epoch (str, optional) – Reference time for encoding of time coordinates.

  • flatten_attrs (bool, optional) – If True, flatten dict-type attributes. The default is False.

  • exclude_attrs (list, optional) – List of dataset attributes to be excluded. The default is None.

  • include_orig_name (bool, optional) – Include the original dataset name in the netcdf variable attributes. The default is True.

  • numeric_name_prefix (str, optional) – Prepend dataset name with this if starting with a digit. The default is "CHANNEL_".

Returns:

new_data – CF-compliant xr.DataArray.

Return type:

xr.DataArray

satpy.writers.cf_writer.preprocess_datarray_attrs(dataarray, flatten_attrs, exclude_attrs)[source]

Preprocess DataArray attributes to be written into CF-compliant netCDF/Zarr.

satpy.writers.cf_writer.preprocess_header_attrs(header_attrs, flatten_attrs=False)[source]

Prepare file header attributes.

satpy.writers.cf_writer.update_encoding(dataset, to_netcdf_kwargs, numeric_name_prefix='CHANNEL_')[source]

Update encoding.

Preserve dask chunks, avoid fill values in coordinate variables and make sure that time & time bounds have the same units.