dep_tools.task module

Tasks form the core of the DEP scaling procedure. A task orchestrates processing by loading data, processing it, and writing the output. Tasks can be generic, but some are tailored to specific processing.

class dep_tools.task.Task(task_id, loader, processor, writer, logger)[source]

Bases: ABC

The abstract base for Task objects.

Task objects load data, process it, and write output. They are reusable for the same task operating on new data.

Parameters:
  • id – An identifier for a particular task.

  • loader (Loader) – A loader loads data, usually based on the id.

  • processor (Processor) – A processor processes loaded data.

  • writer (Writer) – A writer writes output from the processor.

  • logger (Logger) – A logger records processing information.

abstract run()[source]

The run method orchestrates the processing. A typical workflow is load -> process -> write.

Return type:

Any

Returns:

Typically a task will return artifacts from the processing (like a file path).
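The load -> process -> write flow can be sketched with stand-in classes. This is a minimal illustration and not the dep_tools implementation: SketchTask, ListLoader, DoublingProcessor, MemoryWriter, and LoadProcessWriteTask are hypothetical names, and the load/process/write method names and arguments are assumptions.

```python
from abc import ABC, abstractmethod
import logging


class SketchTask(ABC):
    """Hypothetical stand-in that mirrors the documented Task constructor."""

    def __init__(self, task_id, loader, processor, writer, logger=logging.getLogger()):
        self.id = task_id
        self.loader = loader
        self.processor = processor
        self.writer = writer
        self.logger = logger

    @abstractmethod
    def run(self):
        """Orchestrate the processing and return any artifacts."""


class ListLoader:
    def load(self, task_id):
        # Assumed signature: pretend the id selects which data to read.
        return [1, 2, 3]


class DoublingProcessor:
    def process(self, data):
        return [value * 2 for value in data]


class MemoryWriter:
    def __init__(self):
        self.store = {}

    def write(self, data, task_id):
        # Return a path-like string as the artifact of the write.
        path = f"memory://{task_id}"
        self.store[path] = data
        return path


class LoadProcessWriteTask(SketchTask):
    def run(self):
        data = self.loader.load(self.id)
        output = self.processor.process(data)
        return self.writer.write(output, self.id)


writer = MemoryWriter()
task = LoadProcessWriteTask("tile-1", ListLoader(), DoublingProcessor(), writer)
result = task.run()
```

Because the loader, processor, and writer are injected, the same task class can be reused with a different id or different components.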

class dep_tools.task.AreaTask(id, area, loader, processor, writer, logger=<RootLogger root (WARNING)>)[source]

Bases: Task

An AreaTask adds an area property to a basic Task.

Most other arguments are as for Task.

Parameters:

area (GeoDataFrame) – An area for use by the loader and/or processor. For instance, it can be used to clip data.

run()[source]

Run the task.

Return type:

str | list[str] | None

Returns:

The output of the writer, typically a list of paths as strings.
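As a rough illustration of how an area might be used to clip data, the sketch below passes a bounding box to a loader that drops points outside it. Everything here is hypothetical: the real area is a GeoDataFrame, and SketchAreaTask, BoxLoader, CountProcessor, and PathWriter are stand-ins with assumed method signatures.

```python
class BoxLoader:
    """Hypothetical loader that keeps only the points inside the area."""

    def __init__(self, points):
        self.points = points

    def load(self, area):
        # The area is a (min_x, min_y, max_x, max_y) tuple in this sketch.
        min_x, min_y, max_x, max_y = area
        return [
            (x, y)
            for x, y in self.points
            if min_x <= x <= max_x and min_y <= y <= max_y
        ]


class CountProcessor:
    def process(self, data):
        return len(data)


class PathWriter:
    def __init__(self):
        self.written = None

    def write(self, output, id):
        self.written = output
        # Return a list of paths as strings, as the documentation describes.
        return [f"out/{id}.txt"]


class SketchAreaTask:
    """Hypothetical task that hands its area to the loader."""

    def __init__(self, id, area, loader, processor, writer):
        self.id = id
        self.area = area
        self.loader = loader
        self.processor = processor
        self.writer = writer

    def run(self):
        data = self.loader.load(self.area)
        output = self.processor.process(data)
        return self.writer.write(output, self.id)


area_writer = PathWriter()
area_task = SketchAreaTask(
    "area-7",
    (0, 0, 10, 10),
    BoxLoader([(0, 0), (5, 5), (20, 20)]),
    CountProcessor(),
    area_writer,
)
paths = area_task.run()
```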

class dep_tools.task.StacTask(id, area, searcher, loader, processor, writer, post_processor=None, stac_creator=None, stac_writer=None, logger=<RootLogger root (WARNING)>)[source]

Bases: AreaTask

A StacTask extends AreaTask by adding a searcher and optional post-processor, STAC creator, and STAC writer.

Most arguments (id, area, loader, processor, writer, logger) are as for AreaTask.

Parameters:
  • searcher (Searcher) – The searcher searches for data, typically on the basis of the id and/or the area.

  • post_processor (Optional[Processor]) – A Processor that can prep data for writing, for example scaling or data type conversions.

  • stac_creator (Optional[StacCreator]) – Creates a STAC Item from the data.

  • stac_writer (Optional[Writer]) – Writes the STAC Item to storage.

run()[source]

Run the task.

Returns:

The output of the writer, typically a list of paths as strings.
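One plausible ordering of the StacTask components is sketched below: search, load, process, optional post-process, write, then optional STAC creation and writing. The ordering, the method names, and the SketchStacTask and Step classes are all assumptions made for illustration; check the linked source for the actual behaviour.

```python
from dataclasses import dataclass
from typing import Any, Optional

calls = []  # Records the order in which the components run.


class Step:
    """Hypothetical component that records its name and returns a fixed result."""

    def __init__(self, name, result):
        self.name = name
        self.result = result

    def _call(self, *args):
        calls.append(self.name)
        return self.result

    # Expose the same behaviour under each assumed method name.
    search = load = process = write = _call


@dataclass
class SketchStacTask:
    id: str
    area: Any
    searcher: Any
    loader: Any
    processor: Any
    writer: Any
    post_processor: Optional[Any] = None
    stac_creator: Optional[Any] = None
    stac_writer: Optional[Any] = None

    def run(self):
        items = self.searcher.search(self.area)
        data = self.loader.load(items, self.area)
        output = self.processor.process(data)
        if self.post_processor is not None:
            # For example scaling or data type conversion before writing.
            output = self.post_processor.process(output)
        paths = self.writer.write(output, self.id)
        if self.stac_creator is not None and self.stac_writer is not None:
            stac_item = self.stac_creator.process(output, self.id)
            self.stac_writer.write(stac_item, self.id)
        return paths


stac_task = SketchStacTask(
    id="tile-1",
    area="area",
    searcher=Step("search", ["item"]),
    loader=Step("load", "data"),
    processor=Step("process", "output"),
    writer=Step("write", ["out/tile-1.tif"]),
    post_processor=Step("post_process", "scaled"),
    stac_creator=Step("stac_create", {"id": "tile-1"}),
    stac_writer=Step("stac_write", "out/tile-1.json"),
)
stac_paths = stac_task.run()
```

Leaving post_processor, stac_creator, or stac_writer as None skips the corresponding step in this sketch.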

class dep_tools.task.AwsStacTask(itempath, id, area, searcher, loader, processor, post_processor=None, logger=<RootLogger root (WARNING)>, **kwargs)[source]

Bases: StacTask

A convenience class with values of writer, stac_creator, and stac_writer set to sensible defaults for writing to S3.

By default, an AwsDsCogWriter is used as the primary writer, an AwsStacWriter is used to write STAC Items, and the base StacCreator is used to create the STAC object.

All other arguments are as for StacTask.

Parameters:

**kwargs – Additional arguments passed to StacTask.

class dep_tools.task.ItemStacTask(id, item, loader, processor, writer, post_processor=None, stac_creator=None, stac_writer=None, logger=<RootLogger root (WARNING)>)[source]

Bases: Task

A task for a single STAC item.

Most arguments are as for StacTask, except area is dropped.

Parameters:

item (Item) – A pystac.Item representing the input data.

run()[source]

The run method orchestrates the processing. A typical workflow is load -> process -> write.

Returns:

Typically a task will return artifacts from the processing (like a file path).

class dep_tools.task.ErrorCategoryAreaTask(id, area, loader, processor, writer, logger=<RootLogger root (WARNING)>)[source]

Bases: AreaTask

An AreaTask with extra logging.

Errors logged include:
  • EmptyCollectionError from loader: logged as “no items for areas”

  • Other Exception from loader: logged as “load error”

  • Exception from processor: logged as “processor error”

  • Empty processor output: logged as “no output from processor”

  • Writer error: logged as “error” (when using dask, the failure may come from the writer or from an earlier, lazily evaluated step)

run()[source]

Run the task.

Returns:

The output of the writer, typically a list of paths as strings.
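The error categories listed for ErrorCategoryAreaTask could be produced by wrapping each stage in its own try block, as in the sketch below. This is an assumption about the approach and not the library's code: run_with_categories, the "id: message" log format, returning None after a failure, and the stand-in EmptyCollectionError and component classes are all hypothetical.

```python
import logging


class EmptyCollectionError(Exception):
    """Stand-in for the exception of the same name raised by a loader."""


def run_with_categories(task_id, area, loader, processor, writer, logger):
    try:
        data = loader.load(area)
    except EmptyCollectionError:
        logger.error("%s: %s", task_id, "no items for areas")
        return None
    except Exception:
        logger.error("%s: %s", task_id, "load error")
        return None

    try:
        output = processor.process(data)
    except Exception:
        logger.error("%s: %s", task_id, "processor error")
        return None

    if output is None:
        logger.error("%s: %s", task_id, "no output from processor")
        return None

    try:
        return writer.write(output, task_id)
    except Exception:
        # With lazy evaluation, earlier steps may only fail at write time.
        logger.error("%s: %s", task_id, "error")
        return None


class ListHandler(logging.Handler):
    """Collects log messages in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class EmptyLoader:
    def load(self, area):
        raise EmptyCollectionError()


class OkLoader:
    def load(self, area):
        return [1, 2, 3]


class NoneProcessor:
    def process(self, data):
        return None


class UnusedWriter:
    def write(self, output, task_id):
        return [f"out/{task_id}.tif"]


handler = ListHandler()
logger = logging.getLogger("error_category_sketch")
logger.addHandler(handler)
logger.propagate = False

first = run_with_categories("tile-1", None, EmptyLoader(), NoneProcessor(), UnusedWriter(), logger)
second = run_with_categories("tile-2", None, OkLoader(), NoneProcessor(), UnusedWriter(), logger)
```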

class dep_tools.task.MultiAreaTask(ids, areas, logger, task_class, fail_on_error=True, **kwargs)[source]

Bases: object

A “Task” object that iterates over multiple IDs and runs a task for each.

This class is useful when running multiple short tasks where the time to build the run environment (for instance, if running on a pod) adds considerably to overall processing time.

Parameters:
  • ids (list[str]) – A list of IDs.

  • areas (GeoDataFrame) – A geopandas.GeoDataFrame with index corresponding to the IDs.

  • task_class (type[AreaTask]) – The AreaTask subclass to use for each task.

  • fail_on_error (bool) – If True, exit on the first error. Otherwise, log the full exception and continue.

  • logger – A logger.

  • **kwargs – Additional arguments to the task_class constructor.
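The iteration and the fail_on_error switch can be sketched as a plain loop. This is an illustration under stated assumptions, not the MultiAreaTask implementation: run_many and FlakyTask are hypothetical, a dict stands in for the GeoDataFrame indexed by ID, re-raising stands in for "exit on error", and the returned list of results is an assumption because the documentation does not state what run() returns.

```python
import logging


def run_many(ids, areas, task_class, logger, fail_on_error=True, **kwargs):
    """Run one task per ID, optionally continuing past failures."""
    results = []
    for task_id in ids:
        try:
            # Extra keyword arguments are forwarded to the task constructor.
            task = task_class(task_id, areas[task_id], logger=logger, **kwargs)
            results.append(task.run())
        except Exception:
            if fail_on_error:
                raise
            # logger.exception records the message with the full traceback.
            logger.exception("task %s failed", task_id)
    return results


class FlakyTask:
    """Hypothetical task that fails when it has no area."""

    def __init__(self, id, area, logger, suffix=""):
        self.id = id
        self.area = area
        self.logger = logger
        self.suffix = suffix

    def run(self):
        if self.area is None:
            raise ValueError(f"no area for {self.id}")
        return f"{self.id}{self.suffix}"


multi_logger = logging.getLogger("multi_area_sketch")
multi_logger.addHandler(logging.NullHandler())
multi_logger.propagate = False

ids = ["a", "b", "c"]
areas = {"a": "geom-a", "b": None, "c": "geom-c"}
results = run_many(ids, areas, FlakyTask, multi_logger, fail_on_error=False, suffix=".tif")
```

With fail_on_error=False the failing ID "b" is logged and skipped; with fail_on_error=True the same call would raise on "b" and never reach "c".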

run()[source]

class dep_tools.task.SimpleLoggingAreaTask(id, area, loader, processor, writer, logger=<RootLogger root (WARNING)>)[source]

Bases: AreaTask

An AreaTask with basic logging.

All arguments are as for AreaTask.

run()[source]

Run the task.

Returns:

The output of the writer, typically a list of paths as strings.