Auxiliary Data Download
Some Satpy components need extra data files to do their work, such as Look Up Tables (LUTs), coefficients, or Earth model data (e.g. elevations). This covers any file too large to ship in the Satpy Python package, which in practice means anything bigger than a small text file. To help with this, Satpy includes utilities for downloading and caching these files only when the component that needs them is used. This saves users from spending time and disk space on files they may never use. This functionality is built on the Pooch library.
Downloaded files are stored in the directory configured by Data Directory.
Adding download functionality
The utility functions for data downloading follow a two-step process:
Registering: Tell Satpy what files might need to be downloaded and used later.
Retrieving: Ask Satpy to download and store the files locally.
Registering
Registering a file for downloading tells Satpy the remote URL for the file and, optionally, a hash. The hash is used to verify a successful download. Registering can also include a filename to tell Satpy what to name the file when it is downloaded. If not provided, the name is determined from the URL.
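The role of the hash can be illustrated with a short sketch. It assumes the hash is written as "<algorithm>:<hexdigest>", as in the sha256 example later in this section. The function name matches_known_hash is hypothetical and is not part of Satpy or Pooch; it only shows the idea of comparing a downloaded payload against a registered digest.

```python
import hashlib
from typing import Optional


def matches_known_hash(data: bytes, known_hash: Optional[str]) -> bool:
    """Return True if ``data`` matches a hash written as "<algorithm>:<hexdigest>".

    Hypothetical helper for illustration only; not Satpy's or Pooch's code.
    """
    if known_hash is None:
        # No hash was registered, so there is nothing to verify.
        return True
    algorithm, _, expected = known_hash.partition(":")
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected.lower()


payload = b"example auxiliary data"
good_hash = "sha256:" + hashlib.sha256(payload).hexdigest()

print(matches_known_hash(payload, good_hash))         # digest matches
print(matches_known_hash(payload + b"!", good_hash))  # content changed
print(matches_known_hash(payload, None))              # check skipped
```

A mismatch usually means the download was truncated or the remote file changed, which is why registering a hash is worthwhile for files that should never change.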
Once registered, Satpy can be told to retrieve the file (see below) by using a “cache key”. Cache keys follow the general scheme of <component_type>/<filename> (ex. readers/README.rst).
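As a rough illustration of the cache key scheme, the sketch below builds a key from a component type and a URL, falling back to the last path segment of the URL when no filename is given. The helper make_cache_key is hypothetical; Satpy's own implementation may differ in detail.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse


def make_cache_key(component_type: str, url: str, filename: Optional[str] = None) -> str:
    """Build a "<component_type>/<filename>" key (hypothetical helper)."""
    if filename is None:
        # Fall back to the last path segment of the URL.
        filename = posixpath.basename(urlparse(url).path)
    return f"{component_type}/{filename}"


url = "https://raw.githubusercontent.com/pytroll/satpy/main/README.rst"
print(make_cache_key("readers", url))
print(make_cache_key("readers", url, filename="my_readme.rst"))
```

Because the key can be computed from the component type and filename alone, code that needs a file does not have to hold on to the value returned at registration time.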
Satpy includes a low-level function and a high-level Mixin class for registering files. The higher-level class is recommended for any Satpy component like readers, writers, and compositors. The lower-level register_file() function can be used for any other use case.
The DataDownloadMixin class is automatically included in the FileYAMLReader and Writer base classes. For any other component (like a compositor) you should include it as another parent class:
    from satpy.aux_download import DataDownloadMixin
    from satpy.composites import GenericCompositor

    class MyCompositor(GenericCompositor, DataDownloadMixin):
        """Compositor that uses downloaded files."""

        def __init__(self, name, url=None, known_hash=None, **kwargs):
            super().__init__(name, **kwargs)
            data_files = [{'url': url, 'known_hash': known_hash}]
            self.register_data_files(data_files)
However your code registers files, it must do so during initialization so that find_registerable_files() can discover them.
If your component isn’t a reader, writer, or compositor then this function
will need to be updated to find and load your registered files. See
Offline Downloads below for more information.
As mentioned, the mixin class is included in the base reader and writer classes. To register files in these cases, include a data_files section in your YAML configuration file. For readers this goes under the reader section and for writers under the writer section. This parameter is a list of dictionaries, each including a url, a known_hash, and an optional filename. For example:
    reader:
      name: abi_l1b
      short_name: ABI L1b
      long_name: GOES-R ABI Level 1b
      # ... other metadata ...
      data_files:
        - url: "https://example.com/my_data_file.dat"
        - url: "https://raw.githubusercontent.com/pytroll/satpy/main/README.rst"
          known_hash: "sha256:5891286b63e7745de08c4b0ac204ad44cfdb9ab770309debaba90308305fa759"
        - url: "https://raw.githubusercontent.com/pytroll/satpy/main/RELEASING.md"
          filename: "satpy_releasing.md"
          known_hash: null
See the DataDownloadMixin class for more information.
Retrieving
Files that have been registered (see above) can be retrieved by calling the retrieve() function. This function expects a single argument: the cache key. Cache keys are returned by registering functions, but can also be pre-determined by following the scheme <component_type>/<filename> (ex. readers/README.rst).
Retrieving a file will download it to local disk if needed and then return
the local pathname. Data is stored locally in the Data Directory.
It is up to the caller to then open the file.
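The "download only if needed" behaviour can be pictured with a small stand-alone sketch. It does not use Satpy or Pooch. The REGISTRY dictionary, the register() and retrieve_cached() functions, and the injected fetch callable are all hypothetical names, chosen so that the example runs without network access.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical in-memory registry mapping cache keys to remote URLs.
REGISTRY: Dict[str, str] = {}


def register(cache_key: str, url: str) -> str:
    """Record where a file lives remotely; nothing is downloaded yet."""
    REGISTRY[cache_key] = url
    return cache_key


def retrieve_cached(cache_key: str, data_dir: str,
                    fetch: Callable[[str, Path], None]) -> str:
    """Return the local path for ``cache_key``, fetching it only if missing.

    ``fetch(url, destination)`` stands in for the real download step.
    """
    local_path = Path(data_dir) / cache_key
    if not local_path.exists():
        local_path.parent.mkdir(parents=True, exist_ok=True)
        fetch(REGISTRY[cache_key], local_path)
    # The caller is responsible for opening the returned path.
    return str(local_path)
```

Calling retrieve_cached() twice with the same key invokes fetch only on the first call; the second call finds the file on disk and returns the path immediately.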
Offline Downloads
To assist with operational environments, Satpy includes a retrieve_all() function that will try to find all files that Satpy components may need to download in the future and download them to the current directory specified by Data Directory. This function allows you to specify a list of readers, writers, or composite_sensors to limit what components are checked for files to download.
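A pre-fetch step in a deployment script might look like the sketch below. It assumes retrieve_all() is importable from satpy.aux_download (the module used in the compositor example above) and accepts readers, writers, and composite_sensors as keyword arguments, as the text describes. The wrapper name prefetch_aux_data is hypothetical.

```python
def prefetch_aux_data(readers=None, writers=None, composite_sensors=None):
    """Download auxiliary files for the selected components ahead of time.

    Hypothetical wrapper; the keyword names are assumed to match the
    parameters of Satpy's retrieve_all() described in the text.
    """
    # Imported inside the function so that this module can still be
    # imported on machines where Satpy is not installed.
    from satpy.aux_download import retrieve_all

    retrieve_all(
        readers=readers,
        writers=writers,
        composite_sensors=composite_sensors,
    )
```

For example, prefetch_aux_data(readers=["abi_l1b"]) would restrict the reader check to the abi_l1b reader named in the YAML example. Check the retrieve_all() documentation for how arguments left as None are treated.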
The retrieve_all function is also available through a command line script called satpy_retrieve_all_aux_data. Run the following for usage information.
satpy_retrieve_all_aux_data --help
To make sure that no additional files are downloaded when running Satpy see Demo Data Directory.