Internals

The following gives an overview of what happens in the background when the user calls:

python eodie_process.py --rasterdir ./testfiles/S2 --vector ./testfiles/shp/test_parcels_32635 --out ./results --id ID --index ndvi --platform s2 --statistics_out

Step 1: Validation

To make sure that all necessary inputs have been given and can be interpreted, EODIE first checks the inputs. If any of the checks fails, EODIE exits and reports what went wrong so that the input can be fixed accordingly.
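As an illustration of this kind of check, the following minimal sketch validates three of the arguments from the example call and collects readable error messages. The function name check_inputs, the accepted platform values and the assumption that ".shp" is appended to the --vector path are hypothetical and are not taken from EODIE's actual code.

```python
import os


def check_inputs(rasterdir, vector, platform, known_platforms=("s2",)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not os.path.isdir(rasterdir):
        problems.append(f"--rasterdir is not an existing directory: {rasterdir}")
    # Assumption: the vector path is given without extension, as in the example call.
    if not os.path.isfile(vector + ".shp"):
        problems.append(f"--vector file not found: {vector}.shp")
    if platform not in known_platforms:
        problems.append(f"--platform must be one of {known_platforms}, got: {platform}")
    return problems


if __name__ == "__main__":
    found = check_inputs("./testfiles/S2", "./testfiles/shp/test_parcels_32635", "s2")
    if found:
        # Report everything that went wrong, then stop.
        raise SystemExit("\n".join(found))
```

Collecting all problems before exiting lets the user fix every faulty input in one go instead of one per run.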

Step 2: Input vector conversion

If the input vector is an ESRI Shapefile (as in this example), this step is skipped. Otherwise, EODIE converts the vector input to ESRI Shapefile format for further processing.

Step 3: Matching data

The actual workflow begins with finding the data to be processed (from --rasterdir). For that, the input vector’s projection is adjusted to match that of the raster data, and a convex hull is created and overlaid with the Sentinel-2 grid (tileshp in user_config.yml) to find the names of the tiles overlapping the area of interest. Based on these tile names and the timeframe of interest (in this case the defaults for --start and --end), a list of all files to be processed is created. Up to this step, the process is the same on a personal computer and on an HPC system.
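The last part of this step, building the file list from tile names and a timeframe, can be sketched as follows. The sketch assumes that the products follow the standard Sentinel-2 naming convention (mission, level, sensing time, baseline, relative orbit, tile, discriminator). The function select_files, the example filenames and the tile names are hypothetical, and the spatial part (convex hull and grid overlay) is left out.

```python
import re
from datetime import date, datetime

# Sentinel-2 product name, e.g. S2A_MSIL2A_20200626T095031_N0214_R079_T35VLG_...
# Group 1 is the sensing date, group 2 the tile name.
S2_NAME = re.compile(
    r"^S2[A-Z]_MSIL\d[A-Z]_(\d{8})T\d{6}_N\d{4}_R\d{3}_T([0-9A-Z]{5})_"
)


def select_files(filenames, tiles, start, end):
    """Keep files whose tile is in `tiles` and whose sensing date lies in [start, end]."""
    selected = []
    for name in filenames:
        match = S2_NAME.match(name)
        if match is None:
            continue  # not a Sentinel-2 product name
        sensing_date = datetime.strptime(match.group(1), "%Y%m%d").date()
        if match.group(2) in tiles and start <= sensing_date <= end:
            selected.append(name)
    return selected


if __name__ == "__main__":
    candidates = [
        "S2A_MSIL2A_20200626T095031_N0214_R079_T35VLG_20200626T123130.SAFE",
        "S2B_MSIL2A_20200701T095029_N0214_R079_T34VFN_20200701T120000.SAFE",
    ]
    # Only the first name matches tile 35VLG within the year 2020.
    print(select_files(candidates, {"35VLG"}, date(2020, 1, 1), date(2020, 12, 31)))
```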

Step 4: Splitting vectors

For efficient processing, the input vector is split based on the tile grid into one shapefile per tile, which then goes into the process. Only polygons that are fully within a tile are considered. Because the tiles overlap, all polygons are still processed (with rare exceptions). The files in the list are then processed one after another.
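The "fully within a tile" rule and the effect of the tile overlap can be illustrated with a toy example. The sketch below is a simplification: it treats each tile as an axis-aligned rectangle in its own projection, so a polygon is inside the tile exactly when all of its vertices are. The function split_by_tile, the tile names "A" and "B" and all coordinates are made up for illustration, and real code would use a geometry library instead.

```python
def split_by_tile(polygons, tiles):
    """Map each tile name to the IDs of the polygons lying completely inside it.

    polygons: {polygon_id: [(x, y), ...]} vertex lists
    tiles:    {tile_name: (minx, miny, maxx, maxy)} axis-aligned extents
    """
    result = {}
    for tile_name, (minx, miny, maxx, maxy) in tiles.items():
        inside = [
            pid
            for pid, vertices in polygons.items()
            if all(minx <= x <= maxx and miny <= y <= maxy for x, y in vertices)
        ]
        if inside:
            result[tile_name] = inside
    return result


if __name__ == "__main__":
    # Two tiles that overlap between x = 90 and x = 100.
    tiles = {"A": (0, 0, 100, 100), "B": (90, 0, 190, 100)}
    polygons = {
        1: [(10, 10), (20, 10), (20, 20)],    # only inside A
        2: [(95, 10), (105, 10), (105, 20)],  # crosses the edge of A, but fully inside B
        3: [(92, 10), (98, 10), (98, 20)],    # inside the overlap, so inside both
    }
    print(split_by_tile(polygons, tiles))
```

Polygon 2 shows why the overlap matters: it is dropped from tile A but kept in tile B. Only a polygon that crosses the edges of every tile it touches would be lost.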

Step 5: Processing vectors

The process works through the list of split shapefiles, choosing the right vectorfile (--vector + _tilename + .extension) for each raster based on its tilename. On HPC systems, the rasters can be processed in parallel since the individual processes are independent of each other. Each process takes one raster, chooses the matching vectorfile, copies it to a temporary directory (which is automatically removed after the process) and applies the following workflow:

A binary cloudmask is extracted based on the scene classification of the Sentinel-2 tile. Since this is not the best possible cloudmask, external binary cloudmasks can also be supplied and used instead (using --external_cloudmask). Depending on the needs, the data is then processed to a vegetation index (--index) and resampled (to pixelsize in user_config.yml) if necessary. The cloudmask is then applied to each index/band, and the user-chosen statistics (here the default, count, since no --statistics are given) are extracted. This step takes into account all pixels that touch the polygon’s boundary (this can be changed to ‘all pixels whose midpoint is within the polygon’s boundary’ by using --exclude_border).
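The per-pixel logic of this workflow (mask, index, statistics) can be sketched on plain lists of pixel values belonging to one polygon. This is only a sketch: the set of scene classification classes that are masked is an assumption, the function names are hypothetical, and resampling and the polygon-to-pixel assignment are left out. The NDVI formula (NIR - red) / (NIR + red) is the standard definition.

```python
# Scene classification classes treated as unusable in this sketch (an assumption):
# 0 no data, 1 saturated or defective, 3 cloud shadow,
# 8 and 9 cloud medium/high probability, 10 thin cirrus.
MASKED_SCL_CLASSES = {0, 1, 3, 8, 9, 10}


def cloudmask(scl):
    """Binary mask: 1 where the pixel should be discarded, 0 where it is usable."""
    return [1 if value in MASKED_SCL_CLASSES else 0 for value in scl]


def ndvi(red, nir):
    """NDVI = (NIR - red) / (NIR + red) per pixel; None where the sum is zero."""
    return [(n - r) / (n + r) if (n + r) != 0 else None for r, n in zip(red, nir)]


def masked_statistics(values, mask):
    """Count and mean of the values that survive the mask."""
    valid = [v for v, m in zip(values, mask) if m == 0 and v is not None]
    mean = sum(valid) / len(valid) if valid else None
    return {"count": len(valid), "mean": mean}


if __name__ == "__main__":
    scl = [4, 9, 4, 5]               # the second pixel is classified as cloud
    red = [1000, 2000, 500, 3000]
    nir = [5000, 6000, 4500, 3000]
    print(masked_statistics(ndvi(red, nir), cloudmask(scl)))
```

In this example the cloudy pixel is excluded, so the count is 3 rather than 4.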

Step 6: Writing outputs

The extracted statistics are stored in CSV format, with one file per tile, per timepoint and per index/band, and one unique polygon per row. If needed, these CSV files can be combined into a time series per index/band by using one of the combine_statistics_x scripts in postprocesses.
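To show what such a combination amounts to, the sketch below merges several per-date CSV files into one time series per polygon. It is not the combine_statistics_x implementation. The function name and the way dates are passed in are hypothetical, and the column names "ID" and "count" are assumptions based on the example call (--id ID and the default statistic).

```python
import csv
from collections import defaultdict


def combine_statistics(dated_files, id_column="ID", statistic="count"):
    """Build {polygon_id: [(date, value), ...]} from per-date CSV files.

    dated_files: iterable of (date_string, csv_path) pairs for one index/band.
    Values are returned as the strings read from the CSV files.
    """
    timeseries = defaultdict(list)
    # Sorting the pairs puts ISO-style date strings in chronological order.
    for date_string, path in sorted(dated_files):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                timeseries[row[id_column]].append((date_string, row[statistic]))
    return dict(timeseries)
```

A call could look like combine_statistics([("20200626", "a.csv"), ("20200701", "b.csv")]), where the file names are placeholders for the statistics files of one index/band.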