Dataset Specifications

Overview

GreenEarthNet is an updated version of the EarthNet2021 dataset. The main improvements are:

  • Files are now netCDF (having proper georeferencing)
  • Landcover map included
  • New cloud mask
  • New scoring with a focus on vegetation modeling
  • No more mesoscale weather

It covers the same minicube locations as EarthNet2021.

NOTE: In the paper the dataset is called GreenEarthNet, but in the codebase you will also find the following acronyms that were used during development: earthnet2021x, en21x.

One Minicube

One Minicube (one sample) of GreenEarthNet contains 20 variables of different dimensions:

  • Spatio-temporal:
    • Sentinel 2
      • Bands B02, B03, B04, B8A (blue, green, red, near-infrared)
      • 20m resolution
      • 5-daily (with NaN in between)
      • Variable names ["s2_B02", "s2_B03", "s2_B04", "s2_B8A"]
    • Sentinel 2 Auxiliary information
      • Improved cloud mask (variable name "s2_mask")
      • Scene classification layer SCL (variable name "s2_SCL")
      • Availability indicator (only temporal, variable name "s2_avail")
  • Temporal:
    • E-OBS meteorology
      • Wind speed "eobs_fg" (often missing!)
      • Relative humidity "eobs_hu"
      • Rainfall "eobs_rr"
      • Sea-level pressure "eobs_pp"
      • Shortwave downwelling radiation "eobs_qq"
      • Temperature (Daily Avg, Min, Max: "eobs_tg", "eobs_tn", "eobs_tx")
      • daily
  • Spatial:
    • Digital Elevation models
      • from NASA, ESA and JAXA
      • Variable names ["nasa_dem", "cop_dem", "alos_dem"]
      • Resampled to 20m
    • ESA WorldCover Landcover map
      • "esawc_lc"
    • Geomorpho90m terrain classification
      • "geom_cls"
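
The three groups above can be recovered from a dataset by looking at each variable's dimensions. The following is a minimal sketch on a synthetic stand-in for a minicube: the dimension names "lat" and "lon", the array sizes, and the helper group_by_dims are assumptions for illustration, not part of the dataset specification.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a minicube. The dimension names "lat"/"lon" and
# the sizes are assumptions; only a few of the 20 variables are mimicked.
toy = xr.Dataset(
    {
        "s2_B04": (("time", "lat", "lon"), np.zeros((10, 4, 4))),
        "s2_avail": (("time",), np.zeros(10)),
        "eobs_tg": (("time",), np.zeros(10)),
        "nasa_dem": (("lat", "lon"), np.zeros((4, 4))),
        "esawc_lc": (("lat", "lon"), np.zeros((4, 4))),
    }
)


def group_by_dims(ds):
    """Map each tuple of dimension names to the sorted variable names using it."""
    groups = {}
    for name, var in ds.data_vars.items():
        groups.setdefault(var.dims, []).append(name)
    return {dims: sorted(names) for dims, names in groups.items()}


groups = group_by_dims(toy)
for dims, names in groups.items():
    print(dims, names)
```

On a real minicube, passing the result of xr.open_dataset("path_to_minicube") to the same helper should list which variables are spatio-temporal, temporal only, or spatial only.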

Computing NDVI

The recommended way for computing the Normalized Difference Vegetation Index (NDVI) is using the Python package xarray:

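import xarray as xr

minicube = xr.open_dataset("path_to_minicube")
nir = minicube.s2_B8A    # near-infrared band
red = minicube.s2_B04    # red band
mask = minicube.s2_mask  # cloud mask

# Keep NDVI only where the mask is 0; all other pixels become NaN
# (the default fill value of .where()). The 1e-8 avoids division by zero.
ndvi = ((nir - red) / (nir + red + 1e-8)).where(mask == 0)

minicube["s2_ndvi"] = ndvi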

Getting Sentinel 2 dates

Sentinel 2 observations are available at most every 5 days within a minicube. They fall on every 5th time step, i.e. each Sentinel 2 observation is preceded by 4 days of meteorological observations.

You may select only dates with Sentinel 2 observations using the Python package xarray:

import xarray as xr
minicube = xr.open_dataset("path_to_minicube")
minicube_on_sen2_dates = minicube.isel(time = slice(4, None, 5))

Aggregating E-OBS data to 5-daily

You may want to aggregate E-OBS data to 5-daily to match with Sentinel 2 observations. This is possible using the Python package xarray:

import xarray as xr
minicube = xr.open_dataset("path_to_minicube")
minicube_5daily = minicube.coarsen(time = 5, coord_func = "max").mean()

Instead of mean(), you may use other aggregation functions such as min() or max().
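
To see how the aggregation lines up with the Sentinel 2 dates, here is a small sketch on a synthetic 30-day series. The toy dataset, its start date, and its values are assumptions for illustration. With coord_func = "max", each 5-day window is labelled with its last day, which is the same time step that isel(time = slice(4, None, 5)) selects.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 30-day daily series standing in for one E-OBS variable.
time = pd.date_range("2017-01-01", periods=30, freq="D")
toy = xr.Dataset(
    {"eobs_tg": (("time",), np.arange(30, dtype=float))},
    coords={"time": time},
)

# Every 5th time step, starting at index 4 (the Sentinel 2 dates).
on_sen2_dates = toy.isel(time=slice(4, None, 5))

# Mean over non-overlapping 5-day windows; each window is labelled
# with its latest date because coord_func="max".
five_daily = toy.coarsen(time=5, coord_func="max").mean()

same_dates = bool((five_daily.time.values == on_sen2_dates.time.values).all())
print(same_dates)
print(five_daily.eobs_tg.values)
```

Note that coarsen raises an error by default when the time dimension is not a multiple of the window size; xarray's boundary = "trim" option drops the incomplete window in that case.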

Folder structure

After downloading GreenEarthNet to your data_dir, you will have the following folder structure:

data_dir
├── train   			# training set
|  ├── 29SND      # Sentinel 2 tile: UTM zone 29, latitude band S, grid square ND
|  |  ├── 29SND_2017...124.nc 	# First training minicube, name cubename.nc
|  |  └── ...
|  ├── 29SPC      # there are 85 tiles in the train set
|  └── ...      # with 23816 .nc train minicubes in total
├── val_chopped    # validation set
|     └── ... # same as train, but with test samples
├── ood-t_chopped     # ood-t test set
|     └── ...
├── ood-s_chopped    # ood-s test set
|     └── ...
└── ood-st_chopped   # ood-st test set
      └── ...
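
A minimal sketch for walking this layout with the Python standard library is given below. The helper name list_minicubes is hypothetical, and the code assumes that every split folder follows the tile/minicube layout shown for train.

```python
from pathlib import Path


def list_minicubes(data_dir, split="train"):
    """Return {tile_name: sorted list of .nc paths} for one split folder."""
    split_dir = Path(data_dir) / split
    tiles = {}
    # Each subfolder of the split is assumed to be one Sentinel 2 tile.
    for tile_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        tiles[tile_dir.name] = sorted(tile_dir.glob("*.nc"))
    return tiles
```

For example, sum(len(v) for v in list_minicubes(data_dir).values()) counts the minicubes found in the train split, which can be compared against the total stated above.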