satpy.readers.core.grouping module

Grouping functionality for the readers.

satpy.readers.core.grouping._assign_files_to_readers(files_to_sort, reader_names, reader_kwargs)[source]

Assign files to readers.

Given a list of file names (paths), match those to reader instances.

Internal helper for group_files.

Parameters:

files_to_sort (Collection[str]) – Files to assign to readers.
reader_names (Collection[str]) – Readers to consider.
reader_kwargs (Mapping)

Returns:

Mapping[str, Tuple[reader, Set[str]]] – Mapping where the keys are reader names and the values are tuples of (reader_configs, filenames).

satpy.readers.core.grouping._check_raise_missing(missing, i, readers_without_files)[source]

satpy.readers.core.grouping._check_reader_file_status(reader_files, missing_ok)[source]

satpy.readers.core.grouping._check_sensor_status(sensor, sensor_supported)[source]

satpy.readers.core.grouping._filter_groups(groups, missing='pass')[source]

Filter multi-reader group-files behavior.

Helper for group_files. When group_files is called with multiple readers, enforce the desired behaviour for missing files. If missing is "raise", raise an exception if at least one group has at least one reader without files. If it is "skip", remove such groups. If it is "pass", do nothing. Yields the groups to be kept.

Parameters:

groups (List[Mapping[str, List[str]]]) – groups as found by group_files.
missing (str) – String controlling behaviour, see documentation above.

Yields:

Mapping[str, List[str]] – groups to be retained
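The three policies can be illustrated with a small, self-contained sketch. It is not Satpy's implementation: the function name filter_groups_sketch, the file names, and the up-front validation of the missing argument are assumptions made for illustration only.

```python
def filter_groups_sketch(groups, missing="pass"):
    """Yield the groups to keep, following the three policies described above.

    Hypothetical stand-in for the documented behaviour, not the Satpy code.
    """
    # Assumption of this sketch: reject unknown policy names up front.
    if missing not in ("pass", "skip", "raise"):
        raise ValueError(f"Unknown value for missing: {missing!r}")
    for index, group in enumerate(groups):
        # Readers whose file list is empty in this group.
        readers_without_files = {
            name for name, files in group.items() if len(files) == 0
        }
        if readers_without_files and missing == "raise":
            raise FileNotFoundError(
                f"No files for reader(s) {sorted(readers_without_files)} "
                f"in group {index}"
            )
        if readers_without_files and missing == "skip":
            continue  # drop the incomplete group
        yield group  # "pass", or a complete group


# Made-up groups: the second one has no files for reader_b.
groups = [
    {"reader_a": ["a_1200.nc"], "reader_b": ["b_1200.nc"]},
    {"reader_a": ["a_1210.nc"], "reader_b": []},
]
kept_pass = list(filter_groups_sketch(groups, missing="pass"))
kept_skip = list(filter_groups_sketch(groups, missing="skip"))
```

With "pass" both groups are kept, with "skip" only the first group survives, and with "raise" the incomplete second group triggers a FileNotFoundError once the generator reaches it.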

satpy.readers.core.grouping._get_file_keys_for_reader_files(reader_files, group_keys=None)[source]

From a mapping from _assign_files_to_readers, get file keys.

Given a mapping where each key is a reader name and each value is a tuple of reader instance (typically FileYAMLReader) and a collection of files, return a mapping with the same keys, but where the values are lists of tuples of (keys, filename), where keys are extracted from the filenames according to group_keys and filenames are the names those keys were extracted from.

Internal helper for group_files.

Returns:

Mapping[str, List[Tuple[Tuple, str]]], as described.

satpy.readers.core.grouping._get_group_status(gk, prev_key, threshold)[source]

satpy.readers.core.grouping._get_keys_with_empty_values(grp)[source]

Find mapping keys where values have length zero.

Helper for _filter_groups, which is in turn a helper for group_files. Given a mapping key -> Collection[Any], return the keys where the length of the collection is zero.

Parameters:

grp (Mapping[Any, Collection[Any]]) – dictionary to check

Returns:

set of keys
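As a rough illustration of the behaviour described here, the check reduces to a set comprehension. The function name and the reader names in the example mapping are hypothetical; this is a sketch, not the library source.

```python
def keys_with_empty_values(grp):
    """Return the set of keys whose collection has length zero."""
    return {key for key, values in grp.items() if len(values) == 0}


# Hypothetical group: one reader has files, the other has none.
example = {"reader_a": ["f1.nc", "f2.nc"], "reader_b": []}
empty = keys_with_empty_values(example)
```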

satpy.readers.core.grouping._get_loadables_for_reader_config(base_dir, reader, sensor, reader_configs, reader_kwargs, fs)[source]

Get loadables for reader configs.

Helper for find_files_and_readers.

Parameters:

base_dir (str) – as for find_files_and_readers
reader (str) – as for find_files_and_readers
sensor (str) – as for find_files_and_readers
reader_configs (dict) – reader metadata such as returned by configs_for_reader.
reader_kwargs (dict) – Keyword arguments to be passed to reader.
fs (fsspec.spec.AbstractFileSystem) – as for find_files_and_readers

satpy.readers.core.grouping._get_loadables_from_reader(reader_instance, base_dir, fs)[source]

satpy.readers.core.grouping._get_reader_instance(reader, reader_configs, **reader_kwargs)[source]

satpy.readers.core.grouping._get_sorted_file_groups(all_file_keys, time_threshold)[source]

Get sorted file groups.

Get a list of dictionaries, each mapping a tuple of keys to a mapping of reader names to files. The files listed in each list item are considered to belong to the same time group.

Parameters:

all_file_keys (Iterable) – as returned by _get_file_keys_for_reader_files
time_threshold (numbers.Number) – temporal threshold in seconds

Returns:

List[Mapping[Tuple, Mapping[str, List[str]]]], as described

Internal helper for group_files.
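The grouping idea can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be an iterable of (keys, reader_name, filename) tuples whose first key is a datetime, the threshold comparison is treated as inclusive, and a new group starts whenever the first key is more than time_threshold seconds from the first file of the current group or any remaining key differs. The actual layout of all_file_keys and the exact comparison in Satpy may differ.

```python
from datetime import datetime, timedelta


def sorted_file_groups_sketch(all_file_keys, time_threshold):
    """Group (keys, reader_name, filename) tuples by time proximity.

    Hypothetical sketch; the input layout is an assumption, not Satpy's.
    """
    threshold = timedelta(seconds=time_threshold)
    file_groups = {}
    group_key = None  # keys of the first file in the current group
    for keys, reader_name, filename in sorted(all_file_keys):
        starts_new_group = (
            group_key is None
            or abs(keys[0] - group_key[0]) > threshold  # too far in time
            or keys[1:] != group_key[1:]  # other keys must match exactly
        )
        if starts_new_group:
            group_key = keys
            file_groups[group_key] = {}
        file_groups[group_key].setdefault(reader_name, []).append(filename)
    # One single-entry dictionary per group, keyed by the group's key tuple.
    return [{key: readers} for key, readers in file_groups.items()]


# Made-up file keys: two files 5 s apart and one file 10 min later.
all_file_keys = [
    ((datetime(2019, 11, 17, 14, 40, 0),), "reader_a", "a_1440.nc"),
    ((datetime(2019, 11, 17, 14, 40, 5),), "reader_b", "b_1440.nc"),
    ((datetime(2019, 11, 17, 14, 50, 0),), "reader_a", "a_1450.nc"),
]
file_groups = sorted_file_groups_sketch(all_file_keys, time_threshold=10)
```

In this example the first two files land in one group because their start times are 5 seconds apart, while the third file opens a second group.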

satpy.readers.core.grouping._is_single_reader(reader)[source]

satpy.readers.core.grouping._set_filter_times(filter_parameters, start_time, end_time)[source]

satpy.readers.core.grouping._update_existing_group(file_groups, rn, prev_key, f)[source]

satpy.readers.core.grouping._update_file_keys(file_keys, group_keys, file_info, f, reader_name)[source]

satpy.readers.core.grouping._update_reader_files(reader_files, reader_instance, loadables)[source]

satpy.readers.core.grouping._walk_through_sorted_filetype_items(reader_instance, file_keys, files_to_sort, group_keys, reader_name)[source]

satpy.readers.core.grouping.find_files_and_readers(start_time=None, end_time=None, base_dir=None, reader=None, sensor=None, filter_parameters=None, reader_kwargs=None, missing_ok=False, fs=None)[source]

Find files matching the provided parameters.

Use start_time and/or end_time to limit found filenames by the times in the filenames (not the internal file metadata). Files are matched if they fall anywhere within the range specified by these parameters.

Searching is NOT recursive.

Files may be either on-disk or on a remote file system. By default, files are searched for locally. Users can search on remote filesystems by passing an instance of an implementation of fsspec.spec.AbstractFileSystem (strictly speaking, any object of a class implementing a glob method works).

If locating files on a local file system, the returned dictionary can be passed directly to the Scene object through the filenames keyword argument. If it points to a remote file system, it is the responsibility of the user to download the files first (directly reading from cloud storage is not currently available in Satpy).

The behaviour of time-based filtering depends on whether the filename contains information about the end time of the data:

if the end time is not present in the filename, the start time of the filename is used and has to fall between (inclusive) the requested start and end times

otherwise, the timespan of the filename has to overlap the requested timespan
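The two rules above can be written down as a small predicate. This is a sketch, not Satpy's code: the function name is hypothetical, both requested times are assumed to be given, and treating touching intervals as overlapping is an assumption.

```python
from datetime import datetime


def filename_matches_time_range(file_start, file_end, req_start, req_end):
    """Apply the two documented rules to times parsed from a filename.

    file_end is None when the filename carries no end time.
    """
    if file_end is None:
        # Rule 1: the filename start time must lie in the requested range,
        # endpoints included.
        return req_start <= file_start <= req_end
    # Rule 2: the file's timespan must overlap the requested timespan
    # (touching endpoints count as overlap in this sketch).
    return file_start <= req_end and file_end >= req_start


req_start = datetime(2019, 11, 17, 14, 40)
req_end = datetime(2019, 11, 17, 14, 50)

# No end time in the filename: only the start time is tested.
inside = filename_matches_time_range(
    datetime(2019, 11, 17, 14, 45), None, req_start, req_end)
# End time present: a file starting before the range still matches
# if its timespan reaches into the range.
overlapping = filename_matches_time_range(
    datetime(2019, 11, 17, 14, 35), datetime(2019, 11, 17, 14, 42),
    req_start, req_end)
# End time present, but the file ends before the range starts.
outside = filename_matches_time_range(
    datetime(2019, 11, 17, 14, 20), datetime(2019, 11, 17, 14, 30),
    req_start, req_end)
```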

Example usage for querying an S3 filesystem using the s3fs module:

>>> import s3fs, satpy.readers, datetime
>>> satpy.readers.find_files_and_readers(
...     base_dir="s3://noaa-goes16/ABI-L1b-RadF/2019/321/14/",
...     fs=s3fs.S3FileSystem(anon=True),
...     reader="abi_l1b",
...     start_time=datetime.datetime(2019, 11, 17, 14, 40))
{'abi_l1b': [...]}

Parameters:

start_time (datetime.datetime) – Limit used files by starting time.
end_time (datetime.datetime) – Limit used files by ending time.
base_dir (str) – The directory to search for files containing the data to load. Defaults to the current directory.
reader (str or list) – The name of the reader to use for loading the data or a list of names.
sensor (str or list) – Limit used files by provided sensors.
filter_parameters (dict) – Filename pattern metadata to filter on. start_time and end_time are automatically added to this dictionary. Shortcut for reader_kwargs['filter_parameters'].
reader_kwargs (dict) – Keyword arguments to pass to specific reader instances to further configure file searching.
missing_ok (bool) – If False (default), raise ValueError if no files are found. If True, return empty dictionary if no files are found.
fs (fsspec.spec.AbstractFileSystem) – Optional, instance of implementation of fsspec.spec.AbstractFileSystem (strictly speaking, any object of a class implementing .glob is enough). Defaults to searching the local filesystem.

Returns:

Dictionary mapping reader name string to list of filenames

Return type:

dict

satpy.readers.core.grouping.group_files(files_to_sort, reader=None, time_threshold=10, group_keys=None, reader_kwargs=None, missing='pass')[source]

Group series of files by file pattern information.

By default, this groups files by their filename start_time, assuming it exists in the pattern. Passing each of the dictionaries returned by this function to the Scene class's filenames keyword argument makes it easy to create a series of Scene objects.

Parameters:

files_to_sort (Iterable) – File paths to sort into groups.
reader (str or Collection[str]) – Reader or readers whose file patterns should be used to sort files. If not given, try all readers (slow, adding a list of readers is strongly recommended).
time_threshold (int) – Number of seconds used to consider time elements in a group as being equal. For example, if the ‘start_time’ item is used to group files then any time within time_threshold seconds of the first file’s ‘start_time’ will be seen as occurring at the same time.
group_keys (list or tuple) – File pattern information to use to group files. Keys are sorted in order and only the first key is used when comparing datetime elements with time_threshold (see above). This means it is recommended that datetime values should only come from the first key in group_keys. Otherwise, there is a good chance that files will not be grouped properly (datetimes being barely unequal). Defaults to a reader’s group_keys configuration (set in YAML), otherwise ('start_time',). When passing multiple readers, passing group_keys is strongly recommended as the behaviour without doing so is undefined.
reader_kwargs (dict) – Additional keyword arguments to pass to reader creation.
missing (str) – Parameter to control the behavior in the scenario where multiple readers were passed, but at least one group does not have files associated with every reader. Valid values are "pass" (the default), "skip", and "raise". If set to "pass", groups are passed as-is. Some groups may have zero files for some readers. If set to "skip", groups for which one or more readers have zero files are skipped (meaning that some files may not be associated to any group). If set to "raise", raise a FileNotFoundError in case there are any groups for which one or more readers have no files associated.

Returns:

List of dictionaries, each mapping a reader name to a list of filenames. Each of these dictionaries can be passed as filenames to a Scene object.