Reading remote files
Using a single reader
Some Satpy readers can read data directly over various transfer protocols. This is done with fsspec and the protocol-specific packages it relies on.
As an example, reading ABI data from public AWS S3 storage can be done in the following way:
from satpy import Scene
storage_options = {'anon': True}
filenames = ['s3://noaa-goes16/ABI-L1b-RadC/2019/001/17/*_G16_s20190011702186*']
scn = Scene(reader='abi_l1b', filenames=filenames, reader_kwargs={'storage_options': storage_options})
scn.load(['true_color_raw'])
Reading from S3 as above requires the s3fs library to be installed in addition to fsspec.
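Before creating the Scene, it can be useful to check that both packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of Satpy or fsspec.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names in ``names`` that cannot be found for import (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


# fsspec and s3fs are the two packages named above for S3 access.
missing = missing_packages(["fsspec", "s3fs"])
if missing:
    print("Install before reading from S3:", ", ".join(missing))
```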
As an alternative, the storage options can be given using fsspec configuration. For the above example, the configuration could be saved to s3.json in the fsspec configuration directory (by default ~/.config/fsspec/ on Linux):
{
    "s3": {
        "anon": true
    }
}
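The configuration file can also be written from Python. This is a minimal sketch, not part of Satpy or fsspec: the helper name `write_fsspec_config` is hypothetical, and it assumes the default directory mentioned above unless the `FSSPEC_CONFIG_DIR` environment variable points elsewhere. The usage example writes to a temporary directory so that no real configuration is touched.

```python
import json
import os
import tempfile
from pathlib import Path


def write_fsspec_config(options, protocol="s3", config_dir=None):
    """Write ``{protocol: options}`` to ``<config_dir>/<protocol>.json`` (hypothetical helper)."""
    if config_dir is None:
        # Assumption: ~/.config/fsspec unless FSSPEC_CONFIG_DIR is set.
        config_dir = os.environ.get("FSSPEC_CONFIG_DIR", Path.home() / ".config" / "fsspec")
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / f"{protocol}.json"
    path.write_text(json.dumps({protocol: options}, indent=4))
    return path


# Demonstration in a throwaway directory instead of the real configuration directory.
with tempfile.TemporaryDirectory() as tmp:
    written = write_fsspec_config({"anon": True}, config_dir=tmp)
    print(written.read_text())
```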
Note
Options given in reader_kwargs override only the matching options given in the configuration file; everything else is left as-is. If data access fails, try removing the configuration file to see if that solves the issue.
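The precedence described in this note can be illustrated with plain dictionaries. This is a simplified model that assumes a shallow, key-by-key merge; it is not the code fsspec runs, and the function name and the example values are hypothetical.

```python
def effective_options(config_options, reader_options):
    """Model the precedence: options passed explicitly win, the rest come from the config."""
    merged = dict(config_options)
    merged.update(reader_options)
    return merged


# Hypothetical values: the config file sets two options, reader_kwargs overrides one.
from_config = {"anon": True, "client_kwargs": {"endpoint_url": "https://example.invalid"}}
from_reader_kwargs = {"anon": False}
print(effective_options(from_config, from_reader_kwargs))
# {'anon': False, 'client_kwargs': {'endpoint_url': 'https://example.invalid'}}
```

In this model an option that lives only in the configuration file, such as `client_kwargs` here, still takes effect, which is why removing the file is a useful troubleshooting step.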
For reference, reading SEVIRI HRIT data from a local S3 storage works the same way:
filenames = [
    's3://satellite-data-eumetcast-seviri-rss/H-000-MSG3*202204260855*',
]
storage_options = {
    "client_kwargs": {"endpoint_url": "https://PLACE-YOUR-SERVER-URL-HERE"},
    "secret": "VERYBIGSECRET",
    "key": "ACCESSKEY"
}
scn = Scene(reader='seviri_l1b_hrit', filenames=filenames, reader_kwargs={'storage_options': storage_options})
scn.load(['WV_073'])
With the fsspec configuration approach, the equivalent s3.json would look like this:
{
    "s3": {
        "client_kwargs": {"endpoint_url": "https://PLACE-YOUR-SERVER-URL-HERE"},
        "secret": "VERYBIGSECRET",
        "key": "ACCESSKEY"
    }
}
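To keep credentials out of scripts, the same storage_options dictionary could be assembled from environment variables. This is only a sketch: the helper and the variable names `S3_ENDPOINT_URL`, `S3_ACCESS_KEY` and `S3_SECRET_KEY` are hypothetical and are not read by Satpy or fsspec themselves.

```python
import os


def storage_options_from_env(environ=None):
    """Build S3 storage options from hypothetical environment variables."""
    environ = os.environ if environ is None else environ
    return {
        "client_kwargs": {"endpoint_url": environ["S3_ENDPOINT_URL"]},
        "key": environ["S3_ACCESS_KEY"],
        "secret": environ["S3_SECRET_KEY"],
    }


# Demonstration with a stand-in mapping instead of the real environment.
fake_env = {
    "S3_ENDPOINT_URL": "https://PLACE-YOUR-SERVER-URL-HERE",
    "S3_ACCESS_KEY": "ACCESSKEY",
    "S3_SECRET_KEY": "VERYBIGSECRET",
}
storage_options = storage_options_from_env(fake_env)
# The result can be passed as reader_kwargs={'storage_options': storage_options}.
```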
Using multiple readers
If multiple readers are used and the required credentials differ, the storage options are passed per reader like this:
reader1_filenames = [...]
reader2_filenames = [...]
filenames = {
    'reader1': reader1_filenames,
    'reader2': reader2_filenames,
}
reader1_storage_options = {...}
reader2_storage_options = {...}
reader_kwargs = {
    'reader1': {
        'option1': 'foo',
        'storage_options': reader1_storage_options,
    },
    'reader2': {
        'option1': 'foo',
        'storage_options': reader2_storage_options,
    }
}
scn = Scene(filenames=filenames, reader_kwargs=reader_kwargs)
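Building both dictionaries from a single per-reader description is one way to avoid pairing a reader with the wrong credentials. The sketch below is not part of Satpy; `build_scene_args`, the bucket names and the option values are hypothetical placeholders.

```python
def build_scene_args(readers):
    """Split a per-reader description into the filenames and reader_kwargs dictionaries."""
    filenames = {}
    reader_kwargs = {}
    for name, spec in readers.items():
        filenames[name] = list(spec["filenames"])
        kwargs = dict(spec.get("options", {}))
        # Each reader gets its own storage options, taken from its own entry.
        kwargs["storage_options"] = spec["storage_options"]
        reader_kwargs[name] = kwargs
    return filenames, reader_kwargs


# Placeholder reader names, paths and credentials for illustration only.
readers = {
    "reader1": {
        "filenames": ["s3://bucket-a/file1"],
        "options": {"option1": "foo"},
        "storage_options": {"anon": True},
    },
    "reader2": {
        "filenames": ["s3://bucket-b/file2"],
        "options": {"option1": "foo"},
        "storage_options": {"key": "ACCESSKEY", "secret": "VERYBIGSECRET"},
    },
}
filenames, reader_kwargs = build_scene_args(readers)
# scn = Scene(filenames=filenames, reader_kwargs=reader_kwargs)
```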
Caching the remote files
Caching the remote files locally can speed up the overall processing significantly, especially if the data are reused, for example when testing. The caching can be done by taking advantage of the fsspec caching mechanism:
reader_kwargs = {
    'storage_options': {
        's3': {'anon': True},
        'simplecache': {
            'cache_storage': '/tmp/s3_cache',
        }
    }
}
filenames = ['simplecache::s3://noaa-goes16/ABI-L1b-RadC/2019/001/17/*_G16_s20190011702186*']
scn = Scene(reader='abi_l1b', filenames=filenames, reader_kwargs=reader_kwargs)
scn.load(['true_color_raw'])
scn2 = scn.resample(scn.coarsest_area(), resampler='native')
scn2.save_datasets(base_dir='/tmp/', tiled=True, blockxsize=512, blockysize=512, driver='COG', overviews=[])
The following table shows the timings for running the above code with different cache statuses:
| Caching | Elapsed time | Notes |
|---|---|---|
| No caching | 650 s | remove reader_kwargs and simplecache:: from the code |
| File cache | 66 s | Initial run |
| File cache | 13 s | Second run |
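The measurement setup behind these numbers is not described here, so the following is only a sketch of one way to collect comparable wall-clock timings. The `timed` helper is hypothetical, and the workload inside the `with` block is a placeholder for the Scene creation, loading, resampling and saving steps.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in ``results[label]``."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("example", timings):
    sum(range(100_000))  # placeholder workload; replace with the Satpy processing steps
print(f"example: {timings['example']:.3f} s")
```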
Note
Neither Satpy nor fsspec cleans the cache, so the user is responsible for removing excess files from cache_storage.
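A periodic cleanup could look like the sketch below. It assumes that the cached files sit directly inside the cache_storage directory and that age based on modification time is an acceptable criterion; the helper `clean_cache` is hypothetical and not provided by Satpy or fsspec.

```python
import time
from pathlib import Path


def clean_cache(cache_dir, max_age_seconds, now=None):
    """Delete regular files in ``cache_dir`` older than ``max_age_seconds``; return removed paths."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(cache_dir).iterdir():
        # Only plain files are touched; subdirectories are left alone.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed


# Example call (commented out so that nothing is deleted by accident):
# clean_cache('/tmp/s3_cache', max_age_seconds=7 * 24 * 3600)
```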
Note
Only simplecache is considered thread-safe, so using the other caching mechanisms may or may not work depending on the reader, Dask scheduler or the phase of the moon.
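Since the caching example on this page selects the cache through a URL prefix, existing URLs can be switched to the cached form by prepending `simplecache::`. The helper below is a hypothetical convenience, shown as a sketch rather than as part of Satpy or fsspec.

```python
def with_cache_prefix(urls, cache_protocol="simplecache"):
    """Prepend ``<cache_protocol>::`` to each URL that does not already carry it."""
    prefix = cache_protocol + "::"
    return [url if url.startswith(prefix) else prefix + url for url in urls]


filenames = with_cache_prefix(
    ['s3://noaa-goes16/ABI-L1b-RadC/2019/001/17/*_G16_s20190011702186*']
)
print(filenames[0])
```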
Resources
See FSFile for direct usage of fsspec with Satpy, and the fsspec documentation for more details on connection options.