Tutorial
This page contains examples of running EODIE in different environments (your own computer vs. a high-performance computing environment) and with different platforms. The basics are the same regardless of platform. Tutorials for Landsat 8 can be found after the Sentinel-2 tutorials.
Sentinel-2 tutorials
Case 1: growing season mean NDVI timeseries of agricultural fieldparcels of area x (larger than one Sentinel-2 tile)
Sentinel-2 data for years 2017-2020 covering the whole country
fieldparcel polygons of area x as ESRI shapefile, with unique ID fieldname ‘PlotID’
timeframe: April 1st - August 31st year 2018
SQLite database containing mean NDVI timeseries for each fieldparcel polygon
1. Call EODIE:
python eodie_process.py --platform s2 --rasterdir S2files/dir --vector full/path/to/shapefile.shp --out ./results --id PlotID --database_out --index ndvi --statistics mean
This results in a single SQLite database file (.db) containing the results in a table named “ndvi”.
2. (optional) Use the export_from_database.py script in postprocesses to extract values from the database into a single .csv file.
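If export_from_database.py does not fit your workflow, a similar export can be sketched with Python's standard sqlite3 and csv modules. This is a generic sketch: the table name "ndvi" follows from the --index argument, but the database filename and column layout shown in the usage comment are assumptions that depend on your run.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, table, csv_path):
    """Dump all rows of one table from an SQLite database into a CSV file."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(f"SELECT * FROM {table}")
        # Column names come from the cursor description
        header = [col[0] for col in cur.description]
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(cur)
    finally:
        con.close()


# Hypothetical usage (database filename depends on your EODIE run):
# export_table_to_csv("results/eodie.db", "ndvi", "ndvi.csv")
```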
Case 2: As Case 1 but field parcel array timeseries are the desired output
Sentinel-2 data for years 2017-2020 covering the whole country
fieldparcel polygons of area x as ESRI shapefile, with unique ID fieldname ‘PlotID’
timeframe: April 1st - August 31st year 2018
timeseries of fieldparcel arrays
Call EODIE:
python eodie_process.py --platform s2 --rasterdir S2files/dir --vector full/path/to/shapefile.shp --out ./results --id PlotID --array_out --index ndvi
This results in a number of pickle files, one for each tile and date, containing the arrays for all IDs.
(optional) Use arrayplot.py in postprocesses to show or save timeseries plots for the desired IDs.
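As a sketch, the resulting pickle files can also be inspected directly with Python's pickle and glob modules. Note that the file extension (.pickle here) and the structure of the stored objects are assumptions and may differ between EODIE versions:

```python
import glob
import os
import pickle


def load_arrays(result_dir):
    """Load every pickle file produced by --array_out from a directory.

    Returns a dict mapping each file path to its unpickled object; the
    layout of that object (e.g. a dict of ID -> array) is version-dependent.
    """
    arrays = {}
    for path in glob.glob(os.path.join(result_dir, "*.pickle")):
        with open(path, "rb") as f:
            arrays[path] = pickle.load(f)
    return arrays
```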
Case 3: As Case 1 but processing done on HPC environment with SLURM
Sentinel-2 data for years 2017-2020 covering the whole country
fieldparcel polygons of area x as ESRI shapefile, with unique ID fieldname ‘PlotID’
timeframe: April 1st - August 31st year 2018
database with NDVI timeseries for each fieldparcel polygon with statistics mean, median, standard deviation and range
Create a batch job script (the example below is for CSC's Puhti supercomputer) and fill in your data:
#!/bin/bash -l
#SBATCH --job-name= # Give the job a name
#SBATCH --account=project_ # The project number on which the resources will be spent
#SBATCH --output=/path/to/job/output/%J_out.txt # Path to where the output text files will be saved
#SBATCH --error=/path/to/job/output/%J_err.txt # Path to where the error text files will be saved
#SBATCH --time=02:00:00 # Estimation of the time it takes to process the files
#SBATCH --ntasks=1 # The number of tasks
#SBATCH --partition=small # The partition to use (available partitions are listed at https://docs.csc.fi/computing/running/batch-job-partitions/)
#SBATCH --mem-per-cpu=5000 # Estimation of how much memory is needed per CPU
#SBATCH --cpus-per-task=n # Change n to the number of CPUs per task
module load geoconda # Loads the needed module for processing
cd /path/to/the/program/EODIE/src/ # Needs to be in the EODIE directory to work properly
# The actual processing:
python eodie_process.py --platform s2 --rasterdir /path/to/directory/with/SAFEs/ --vector /path/to/vectorfile.shp --out ./results --id PlotID --database_out --index ndvi --statistics mean median std range
# More specific arguments and their purpose can be found in EODIE documentation: https://eodie.readthedocs.io/en/latest/
Call:
sbatch name_of_above_script.sh
Case 4: As Case 3 but with data in object storage
Sentinel-2 data for years 2017-2020 covering the whole country, in buckets named Sentinel2-MSIL2A-cloud-0-95-YEAR-TTILE
fieldparcel polygons of area x as ESRI shapefile, with unique ID fieldname ‘PlotID’
database with NDVI timeseries for each fieldparcel polygon in the whole country with statistics mean, median, standard deviation and range
Similar to Case 3, but an additional script, here called download_and_eodie.sh, is needed to download the input files from object storage and launch EODIE once the download has completed:
#!/bin/bash
# Usage: download_and_eodie.sh startdate enddate tile1 [tile2 ...]
start=$1
end=$2
startyear=$(echo "$start" | cut -c1-4)
endyear=$(echo "$end" | cut -c1-4)
shift 2
tiles=$@
basebucket="s3://Sentinel2-MSIL2A-cloud-0-95"
timeperiod=$(seq "$startyear" "$endyear")
for year in $timeperiod; do
    for tile in $tiles; do
        # Create a directory to download the imagery into
        mkdir "$year-$tile"
        # Define bucket name
        bucket="$basebucket-$year-T$tile"
        echo "$bucket"
        # Load files from bucket to directory
        s3cmd get -r "$bucket/" "$year-$tile/"
        # Send batch job with directory name as argument
        sbatch sbatch_smart.sh "$year-$tile/"
    done
done
The main batch job script, sbatch_smart.sh, is similar to the one in Case 3:
#!/bin/bash -l
#SBATCH --job-name=smart_xxx
#SBATCH --account=project_xxx
#SBATCH --output=/scratch/project_xxx/out/%J_out.txt
#SBATCH --error=/scratch/project_xxx/out/%J_err.txt
#SBATCH --time=02:00:00 # Depending on the complexity of your vectorfile, this time window might not be enough.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=5
#SBATCH --mem-per-cpu=8G
#SBATCH --partition=small
# Store argument into variable
path=$1
module load geoconda
cd /scratch/project_xxx/EODIE/src/
# Call EODIE
python eodie_process.py --platform s2 --rasterdir $path --vector path/to/vectorfile.shp --out ./results --id PlotID --database_out --index ndvi --statistics mean median std range
# When done, the contents of $path can be removed, as the files remain in object storage. Make sure you have reserved enough time and computational resources for the computation to finish, to avoid unnecessary deletion of raster files (or comment out the rm below).
rm -r $path/
Call
bash download_and_eodie.sh startdate enddate tile1 tile2 tile3
with dates in YYYYMMDD format and tile names in XX000 format. In this case the tile names need to be identified beforehand. This launches download_and_eodie.sh, which in turn launches EODIE for each requested tile and year.
Landsat 8 Tutorials
Please note: EODIE currently works only with Landsat 8 Collection 2 data.
Case 1: Growing season mean NDVI timeseries of agricultural fieldparcels of area x (larger than one Landsat 8 tile)
Landsat 8 data downloaded from Earth Explorer as .tar files, covering growing season 2019
fieldparcel polygons of area x as ESRI shapefile, with unique ID fieldname ‘PlotID’
SQLite database containing mean NDVI timeseries for each fieldparcel polygon
1. Downloading Landsat 8 data from Earth Explorer results in .tar files, from which the imagery needs to be extracted. The goal is to create a directory named after each tar file and extract its contents into that directory. This can be done with standard tools, but the helper scripts also include extract_from_tar.py, which performs this task for all (Landsat 8) tars in a given directory.
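If you prefer not to use the helper script, the same extraction can be sketched with Python's standard tarfile module; the directory naming below simply follows the description above:

```python
import pathlib
import tarfile


def extract_tars(tar_dir):
    """Extract each .tar in tar_dir into a directory named after the tar file."""
    for tar_path in pathlib.Path(tar_dir).glob("*.tar"):
        # e.g. LC08_scene.tar -> LC08_scene/
        target = tar_path.with_suffix("")
        target.mkdir(exist_ok=True)
        with tarfile.open(tar_path) as tf:
            tf.extractall(path=target)
```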
2. In addition to Landsat 8 imagery, the tiling grid is required. It can be downloaded here [Descending (daytime)]. After downloading, manually unzip and move the shapefile into EODIE's source directory, or use the auxiliary script unzip_ls8_grid.py.
3. After extracting the data, EODIE can be called. The call is essentially the same as for Sentinel-2 data, but the arguments --platform and --rasterdir need to be changed:
python eodie_process.py --platform ls8 --rasterdir dir/with/extracted/Landsat8/folders/ --vector full/path/to/shapefile.shp --out ./results --id PlotID --database_out --index ndvi --statistics mean median std range
Case 2: As Case 1 but processing done on HPC environment with SLURM
Landsat 8 data downloaded from Earth Explorer and extracted from .tar files, covering growing season 2019
fieldparcel polygons of area x as ESRI shapefile, with unique ID fieldname ‘PlotID’
timeframe: May 1st - July 31st year 2019
database with NDVI timeseries for each fieldparcel polygon with statistics mean, median, standard deviation and range
Create a batch job script (the example below is for CSC's Puhti supercomputer) and fill in your data:
#!/bin/bash -l
#SBATCH --job-name=EODIE_landsat # Give the job a name
#SBATCH --account=project_ # The project number on which the resources will be spent
#SBATCH --output=/path/to/job/output/%J_out.txt # Path to where the output text files will be saved
#SBATCH --error=/path/to/job/output/%J_err.txt # Path to where the error text files will be saved
#SBATCH --time=02:00:00 # Estimation of the time it takes to process the files
#SBATCH --ntasks=1 # The number of tasks
#SBATCH --partition=small # The partition to use (available partitions are listed at https://docs.csc.fi/computing/running/batch-job-partitions/)
#SBATCH --mem-per-cpu=5000 # Estimation of how much memory is needed per CPU
#SBATCH --cpus-per-task=n # Change n to the number of CPUs per task
module load geoconda # Loads the needed module for processing
cd /path/to/the/program/EODIE/src/ # Needs to be in the EODIE directory to work properly
# The actual processing:
python eodie_process.py --platform ls8 --rasterdir dir/with/extracted/Landsat8/folders/ --vector full/path/to/shapefile.shp --out ./results --id PlotID --database_out --index ndvi --statistics mean median std range --start 20190501 --end 20190731
# More specific arguments and their purpose can be found in EODIE documentation: https://eodie.readthedocs.io/en/latest/
Call:
sbatch name_of_above_script.sh