
Parallelizing Processes

Parallelizing raster, storage, and metadata requests, and Workflows jobs

  • What and when to parallelize?
  • If you have something interactive, you want to parallelize it locally
  • If you want to send a large task to run for a long time reliably (i.e. overnight), use Tasks
  • Best practices for Tasks are published in the docs

 

Use cases for parallel processing

  • Speed up access to an API (e.g. requests to Storage, or Task results)
    • Download a large collection of model outputs (e.g. CSV files) from DL Storage
    • Quickly download imagery across many AOIs with Raster
  • Handle compute-heavy workloads that can be split into pieces
    • Deploy a model over a large AOI split into tiles
  • Scale handling of large datasets that can be split into pieces
    • Ingest a large dataset consisting of many images on an object storage service (Google Cloud Storage, Azure Storage)
    • Leverage the larger memory footprints of DL Tasks to download very large files and upload them to Catalog
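The I/O-bound use cases above (Storage downloads, Raster requests) are a natural fit for a thread pool, since each worker spends most of its time waiting on the network. A minimal sketch using Python's `concurrent.futures`; `download_item` and the key names are hypothetical stand-ins for your actual client call:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in for an I/O-bound call such as a DL Storage
# download; replace with your actual client call.
def download_item(key):
    return f"contents-of-{key}"

keys = [f"model_output_{i}.csv" for i in range(20)]

# Threads suit I/O-bound work: each worker mostly waits on the network,
# so several requests can be in flight at once.
results = {}
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(download_item, k): k for k in keys}
    for fut in as_completed(futures):
        results[futures[fut]] = fut.result()
```

`as_completed` yields results as they finish rather than in submission order, which keeps the pool busy even when individual downloads vary widely in size.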

 

Methods for parallel processing

  • Multithreading and multiprocessing on a single machine (e.g. Workbench)
  • Splitting across multiple machines in a scaled-compute environment (e.g. Tasks)
  • A combination of both methods (multithreading within a Tasks worker)
  • Workflows map functions and built-in parallelism (autoscaling)
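Whichever method you choose, the common step is partitioning the work into independent units. A sketch, with a hypothetical `make_tiles` grid and `process_tile` stand-in for model inference (a real workflow would tile the AOI with the platform's tiling utilities): locally the units are mapped over a thread pool, while in Tasks each unit (or chunk of units) would instead be submitted as a separate task to its own worker.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: split a bounding box into fixed-size tiles, each one
# an independent unit of work.
def make_tiles(width, height, tile_size):
    return [(x, y) for x in range(0, width, tile_size)
                   for y in range(0, height, tile_size)]

def process_tile(tile):
    # Stand-in for model inference over one tile.
    x, y = tile
    return (tile, x + y)

tiles = make_tiles(1024, 1024, 256)  # 16 independent units of work

# Locally, map over tiles with a thread pool; in Tasks, you would
# submit each tile (or a chunk of tiles) as a separate task instead.
with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = list(pool.map(process_tile, tiles))
```

Because the units are independent, the same partitioning works unchanged whether the map runs on one machine or is fanned out across many.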

 

Considerations to make when choosing a method

  • If the worker process is accessing an API, what are that API’s rate limits?
  • How much memory will each individual unit of work require? (Will the memory consumed by all active workers exceed local memory? If so, choose Tasks instead of local processing.)
  • Will a unit of work exceed Workflows memory constraints or complexity?
  • Can a unit of work be expressed in a Workflows query?
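When the worker process is hitting a rate-limited API, it helps to cap concurrency on the client side and retry failed calls with backoff. A minimal sketch, assuming a limit of four concurrent requests (check the actual API's published limits) and a generic callable; the helper name and parameters are illustrative:

```python
import random
import threading
import time

# Client-side throttle: a semaphore caps concurrent requests, and
# failed calls retry with exponential backoff plus jitter. The limit
# here is an assumption; check the API's published rate limits.
MAX_CONCURRENT = 4
_throttle = threading.Semaphore(MAX_CONCURRENT)

def call_with_backoff(fn, *args, retries=3, base_delay=0.1):
    for attempt in range(retries + 1):
        with _throttle:
            try:
                return fn(*args)
            except Exception:
                if attempt == retries:
                    raise
        # Sleep outside the semaphore so other workers can proceed.
        time.sleep(base_delay * (2 ** attempt) + random.random() * 0.05)
```

Wrapping each unit of work in a helper like this keeps a thread pool from overwhelming the API even when many workers are active at once.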