satpy.resample module

Resampling in Satpy.

Satpy provides multiple resampling algorithms for resampling geolocated data to uniform projected grids. The easiest way to perform resampling in Satpy is through the Scene object’s resample() method. Additional utility functions are also available to assist in resampling data. Below is more information on resampling with Satpy as well as links to the relevant API documentation for available keyword arguments.

Resampling algorithms

Available Resampling Algorithms
Resampler	Description	Related
nearest	Nearest Neighbor	`KDTreeResampler`
ewa	Elliptical Weighted Averaging	`DaskEWAResampler`
ewa_legacy	Elliptical Weighted Averaging (Legacy)	`LegacyDaskEWAResampler`
native	Native	`NativeResampler`
bilinear	Bilinear	`BilinearResampler`
bucket_avg	Average Bucket Resampling	`BucketAvg`
bucket_sum	Sum Bucket Resampling	`BucketSum`
bucket_count	Count Bucket Resampling	`BucketCount`
bucket_fraction	Fraction Bucket Resampling	`BucketFraction`
gradient_search	Gradient Search Resampling	`GradientSearchResampler`

The resampling algorithm used can be specified with the resampler keyword argument and defaults to nearest:

>>> scn = Scene(...)
>>> euro_scn = scn.resample('euro4', resampler='nearest')

Warning

Some resampling algorithms expect certain forms of data. For example, the EWA resampling expects polar-orbiting swath data and prefers if the data can be broken in to “scan lines”. See the API documentation for a specific algorithm for more information.

Resampling for comparison and composites

While all the resamplers can be used to put datasets of different resolutions on to a common area, the ‘native’ resampler is designed to match datasets to one resolution in the dataset’s original projection. This is extremely useful when generating composites between bands of different resolutions.

>>> new_scn = scn.resample(resampler='native')

By default this resamples to the highest resolution area (smallest footprint per pixel) shared between the loaded datasets. You can easily specify the lowest resolution area:

>>> new_scn = scn.resample(scn.coarsest_area(), resampler='native')

Providing an area that is neither the minimum or maximum resolution area may work, but behavior is currently undefined.

Caching for geostationary data

Satpy will do its best to reuse calculations performed to resample datasets, but it can only do this for the current processing and will lose this information when the process/script ends. Some resampling algorithms, like nearest and bilinear, can benefit by caching intermediate data on disk in the directory specified by cache_dir and using it next time. This is most beneficial with geostationary satellite data where the locations of the source data and the target pixels don’t change over time.

>>> new_scn = scn.resample('euro4', cache_dir='/path/to/cache_dir')

See the documentation for specific algorithms to see availability and limitations of caching for that algorithm.

Create custom area definition

See pyresample.geometry.AreaDefinition for information on creating areas that can be passed to the resample method:

>>> from pyresample.geometry import AreaDefinition
>>> my_area = AreaDefinition(...)
>>> local_scene = scn.resample(my_area)

Create dynamic area definition

See pyresample.geometry.DynamicAreaDefinition for more information.

Examples coming soon…

Store area definitions

Area definitions can be saved to a custom YAML file (see pyresample’s writing to disk) and loaded using pyresample’s utility methods (pyresample’s loading from disk):

>>> from pyresample import load_area
>>> my_area = load_area('my_areas.yaml', 'my_area')

Or using satpy.resample.get_area_def(), which will search through all areas.yaml files in your SATPY_CONFIG_PATH:

>>> from satpy.resample import get_area_def
>>> area_eurol = get_area_def("eurol")

For examples of area definitions, see the file etc/areas.yaml that is included with Satpy and where all the area definitions shipped with Satpy are defined.

class satpy.resample.BaseResampler(source_geo_def, target_geo_def)[source]

Bases: object

Base abstract resampler class.

Initialize resampler with geolocation information.

Parameters:

source_geo_def (SwathDefinition, AreaDefinition) – Geolocation definition for the data to be resampled
target_geo_def (CoordinateDefinition, AreaDefinition) – Geolocation definition for the area to resample data to.

_create_cache_filename(cache_dir, prefix='', fmt='.zarr', **kwargs)[source]: Create filename for the cached resampling parameters.

compute(data, **kwargs)[source]

Do the actual resampling.

This must be implemented by subclasses.

get_hash(source_geo_def=None, target_geo_def=None, **kwargs)[source]: Get hash for the current resample with the given kwargs.

precompute(**kwargs)[source]

Do the precomputation.

This is an optional step if the subclass wants to implement more complex features like caching or can share some calculations between multiple datasets to be processed.

resample(data, cache_dir=None, mask_area=None, **kwargs)[source]

Resample data by calling precompute and compute methods.

Only certain resampling classes may use cache_dir and the mask provided when mask_area is True. The return value of calling the precompute method is passed as the cache_id keyword argument of the compute method, but may not be used directly for caching. It is up to the individual resampler subclasses to determine how this is used.

Parameters:

data (xarray.DataArray) – Data to be resampled
cache_dir (str) – directory to cache precomputed results (default False, optional)
mask_area (bool) – Mask geolocation data where data values are invalid. This should be used when data values may affect what neighbors are considered valid.

Returns (xarray.DataArray): Data resampled to the target area

class satpy.resample.BilinearResampler(source_geo_def, target_geo_def)[source]

Bases: BaseResampler

Resample using bilinear interpolation.

This resampler implements on-disk caching when the cache_dir argument is provided to the resample method. This should provide significant performance improvements on consecutive resampling of geostationary data.

Parameters:

cache_dir (str) – Long term storage directory for intermediate results.
radius_of_influence (float) – Search radius cut off distance in meters
epsilon (float) – Allowed uncertainty in meters. Increasing uncertainty reduces execution time.
reduce_data (bool) – Reduce the input data to (roughly) match the target area.

Init BilinearResampler.

compute(data, fill_value=None, **kwargs)[source]: Resample the given data using bilinear interpolation.

load_bil_info(cache_dir, **kwargs)[source]: Load bilinear resampling info from cache directory.

precompute(mask=None, radius_of_influence=50000, epsilon=0, reduce_data=True, cache_dir=False, **kwargs)[source]: Create bilinear coefficients and store them for later use.

save_bil_info(cache_dir, **kwargs)[source]: Save bilinear resampling info to cache directory.

class satpy.resample.BucketAvg(source_geo_def, target_geo_def)[source]

Bases: BucketResamplerBase

Class for averaging bucket resampling.

Bucket resampling calculates the average of all the values that are closest to each bin and inside the target area.

Parameters:

fill_value (float (default: np.nan)) – Fill value to mark missing/invalid values in the input data, as well as in the binned and averaged output data.
skipna (boolean (default: True)) – If True, skips missing values (as marked by NaN or fill_value) for the average calculation (similarly to Numpy’s nanmean). Buckets containing only missing values are set to fill_value. If False, sets the bucket to fill_value if one or more missing values are present in the bucket (similarly to Numpy’s mean). In both cases, empty buckets are set to fill_value.

Initialize bucket resampler.

compute(data, fill_value=nan, skipna=True, **kwargs)[source]

Call the resampling.

Parameters:

data (numpy.Array, dask.Array) – Data to be resampled
fill_value (numpy.nan, int) – fill_value. Defaults to numpy.nan
skipna (boolean) – Skip NA’s. Default True

Returns:

dask.Array

class satpy.resample.BucketCount(source_geo_def, target_geo_def)[source]

Bases: BucketResamplerBase

Class for bucket resampling which implements hit-counting.

This resampler calculates the number of occurences of the input data closest to each bin and inside the target area.

Initialize bucket resampler.

compute(data, **kwargs)[source]: Call the resampling.

class satpy.resample.BucketFraction(source_geo_def, target_geo_def)[source]

Bases: BucketResamplerBase

Class for bucket resampling to compute category fractions.

This resampler calculates the fraction of occurences of the input data per category.

Initialize bucket resampler.

compute(data, fill_value=nan, categories=None, **kwargs)[source]: Call the resampling.

class satpy.resample.BucketResamplerBase(source_geo_def, target_geo_def)[source]

Bases: BaseResampler

Base class for bucket resampling which implements averaging.

Initialize bucket resampler.

compute(data, **kwargs)[source]: Call the resampling.

precompute(**kwargs)[source]: Create X and Y indices and store them for later use.

resample(data, **kwargs)[source]

Resample data by calling precompute and compute methods.

Parameters:: data (xarray.DataArray) – Data to be resampled

Returns (xarray.DataArray): Data resampled to the target area

class satpy.resample.BucketSum(source_geo_def, target_geo_def)[source]

Bases: BucketResamplerBase

Class for bucket resampling which implements accumulation (sum).

This resampler calculates the cumulative sum of all the values that are closest to each bin and inside the target area.

Parameters:

fill_value (float (default: np.nan)) – Fill value for missing data
skipna (boolean (default: True)) – If True, skips NaN values for the sum calculation (similarly to Numpy’s nansum). Buckets containing only NaN are set to zero. If False, sets the bucket to NaN if one or more NaN values are present in the bucket (similarly to Numpy’s sum). In both cases, empty buckets are set to 0.

Initialize bucket resampler.

compute(data, skipna=True, **kwargs)[source]: Call the resampling.

class satpy.resample.KDTreeResampler(source_geo_def, target_geo_def)[source]

Bases: BaseResampler

Resample using a KDTree-based nearest neighbor algorithm.

This resampler implements on-disk caching when the cache_dir argument is provided to the resample method. This should provide significant performance improvements on consecutive resampling of geostationary data. It is not recommended to provide cache_dir when the mask keyword argument is provided to precompute which occurs by default for SwathDefinition source areas.

Parameters:

cache_dir (str) – Long term storage directory for intermediate results.
mask (bool) – Force resampled data’s invalid pixel mask to be used when searching for nearest neighbor pixels. By default this is True for SwathDefinition source areas and False for all other area definition types.
radius_of_influence (float) – Search radius cut off distance in meters
epsilon (float) – Allowed uncertainty in meters. Increasing uncertainty reduces execution time.

Init KDTreeResampler.

_adjust_radius_of_influence(radius_of_influence)[source]: Adjust radius of influence.

_apply_cached_index(val, idx_name, persist=False)[source]: Reassign resampler index attributes.

_check_numpy_cache(cache_dir, mask=None, **kwargs)[source]: Check if there’s Numpy cache file and convert it to zarr.

_read_resampler_attrs()[source]: Read certain attributes from the resampler for caching.

compute(data, weight_funcs=None, fill_value=nan, with_uncert=False, **kwargs)[source]: Resample data.

load_neighbour_info(cache_dir, mask=None, **kwargs)[source]: Read index arrays from either the in-memory or disk cache.

precompute(mask=None, radius_of_influence=None, epsilon=0, cache_dir=None, **kwargs)[source]

Create a KDTree structure and store it for later use.

Note: The mask keyword should be provided if geolocation may be valid where data points are invalid.

save_neighbour_info(cache_dir, mask=None, **kwargs)[source]: Cache resampler’s index arrays if there is a cache dir.

class satpy.resample.NativeResampler(source_geo_def, target_geo_def)[source]

Bases: BaseResampler

Expand or reduce input datasets to be the same shape.

If data is higher resolution (more pixels) than the destination area then data is averaged to match the destination resolution.

If data is lower resolution (less pixels) than the destination area then data is repeated to match the destination resolution.

This resampler does not perform any caching or masking due to the simplicity of the operations.

Initialize resampler with geolocation information.

Parameters:

source_geo_def (SwathDefinition, AreaDefinition) – Geolocation definition for the data to be resampled
target_geo_def (CoordinateDefinition, AreaDefinition) – Geolocation definition for the area to resample data to.

classmethod _expand_reduce(d_arr, repeats)[source]: Expand reduce.

compute(data, expand=True, **kwargs)[source]: Resample data with NativeResampler.

resample(data, cache_dir=None, mask_area=False, **kwargs)[source]: Run NativeResampler.

class satpy.resample._LegacySatpyEWAResampler(source_geo_def, target_geo_def)[source]

Bases: BaseResampler

Resample using an elliptical weighted averaging algorithm.

This algorithm does not use caching or any externally provided data mask (unlike the ‘nearest’ resampler).

This algorithm works under the assumption that the data is observed one scan line at a time. However, good results can still be achieved for non-scan based data provided rows_per_scan is set to the number of rows in the entire swath or by setting it to None.

Parameters:

rows_per_scan (int, None) – Number of data rows for every observed scanline. If None then the entire swath is treated as one large scanline.
weight_count (int) – number of elements to create in the gaussian weight table. Default is 10000. Must be at least 2
weight_min (float) – the minimum value to store in the last position of the weight table. Default is 0.01, which, with a weight_distance_max of 1.0 produces a weight of 0.01 at a grid cell distance of 1.0. Must be greater than 0.
weight_distance_max (float) – distance in grid cell units at which to apply a weight of weight_min. Default is 1.0. Must be greater than 0.
weight_delta_max (float) – maximum distance in grid cells in each grid dimension over which to distribute a single swath cell. Default is 10.0.
weight_sum_min (float) – minimum weight sum value. Cells whose weight sums are less than weight_sum_min are set to the grid fill value. Default is EPSILON.
maximum_weight_mode (bool) – If False (default), a weighted average of all swath cells that map to a particular grid cell is used. If True, the swath cell having the maximum weight of all swath cells that map to a particular grid cell is used. This option should be used for coded/category data, i.e. snow cover.

Init _LegacySatpyEWAResampler.

_call_fornav(cols, rows, target_geo_def, data, grid_coverage=0, **kwargs)[source]: Wrap fornav() to run as a dask delayed.

_call_ll2cr(lons, lats, target_geo_def, swath_usage=0)[source]: Wrap ll2cr() for handling dask delayed calls better.

compute(data, cache_id=None, fill_value=0, weight_count=10000, weight_min=0.01, weight_distance_max=1.0, weight_delta_max=1.0, weight_sum_min=-1.0, maximum_weight_mode=False, grid_coverage=0, **kwargs)[source]: Resample the data according to the precomputed X/Y coordinates.

precompute(cache_dir=None, swath_usage=0, **kwargs)[source]: Generate row and column arrays and store it for later use.

resample(*args, **kwargs)[source]: Run precompute and compute methods.

Note

This sets the default of ‘mask_area’ to False since it is not needed in EWA resampling currently.

satpy.resample._aggregate(d, y_size, x_size)[source]: Average every 4 elements (2x2) in a 2D array.

satpy.resample._get_arg_to_pass_for_skipna_handling(**kwargs)[source]: Determine if skipna can be passed to the compute functions for the average and sum bucket resampler.

satpy.resample._get_replicated_chunk_sizes(d_arr, repeats)[source]

satpy.resample._mean(data, y_size, x_size)[source]

satpy.resample._move_existing_caches(cache_dir, filename)[source]: Move existing cache files out of the way.

satpy.resample._rechunk_if_nonfactor_chunks(dask_arr, y_size, x_size)[source]

satpy.resample._repeat_by_factor(data, block_info=None)[source]

satpy.resample._replicate(d_arr, repeats)[source]: Repeat data pixels by the per-axis factors specified.

satpy.resample.add_crs_xy_coords(data_arr, area)[source]

Add pyproj.crs.CRS and x/y or lons/lats to coordinates.

For SwathDefinition or GridDefinition areas this will add a crs coordinate and coordinates for the 2D arrays of lons and lats.

For AreaDefinition areas this will add a crs coordinate and the 1-dimensional x and y coordinate variables.

Parameters:

data_arr (xarray.DataArray) – DataArray to add the ‘crs’ coordinate.
area (pyresample.geometry.AreaDefinition) – Area to get CRS information from.

satpy.resample.add_xy_coords(data_arr, area, crs=None)[source]

Assign x/y coordinates to DataArray from provided area.

If ‘x’ and ‘y’ coordinates already exist then they will not be added.

Parameters:

data_arr (xarray.DataArray) – data object to add x/y coordinates to
area (pyresample.geometry.AreaDefinition) – area providing the coordinate data.
crs (pyproj.crs.CRS or None) – CRS providing additional information about the area’s coordinate reference system if available. Requires pyproj 2.0+.

Returns (xarray.DataArray): Updated DataArray object

satpy.resample.get_area_def(area_name)[source]

Get the definition of area_name from file.

The file is defined to use is to be placed in the $SATPY_CONFIG_PATH directory, and its name is defined in satpy’s configuration file.

satpy.resample.get_area_file()[source]

Find area file(s) to use.

The files are to be named areas.yaml or areas.def.

satpy.resample.get_fill_value(dataset)[source]: Get the fill value of the dataset, defaulting to np.nan.

satpy.resample.hash_dict(the_dict, the_hash=None)[source]: Calculate a hash for a dictionary.

satpy.resample.prepare_resampler(source_area, destination_area, resampler=None, **resample_kwargs)[source]: Instantiate and return a resampler.

satpy.resample.resample(source_area, data, destination_area, resampler=None, **kwargs)[source]: Do the resampling.

satpy.resample.resample_dataset(dataset, destination_area, **kwargs)[source]

Resample dataset and return the resampled version.

Parameters:

dataset (xarray.DataArray) – Data to be resampled.
destination_area – The destination onto which to project the data, either a full blown area definition or a string corresponding to the name of the area as defined in the area file.
**kwargs – The extra parameters to pass to the resampler objects.

Returns:

A resampled DataArray with updated .attrs["area"] field. The dtype of the array is preserved.

satpy.resample.update_resampled_coords(old_data, new_data, new_area)[source]

Add coordinate information to newly resampled DataArray.

Parameters:

old_data (xarray.DataArray) – Old data before resampling.
new_data (xarray.DataArray) – New data after resampling.
new_area (pyresample.geometry.BaseDefinition) – Area definition for the newly resampled data.