Daniela Szwarcman, Sujit Roy, Paolo Fraccaro, Þorsteinn Elí Gíslason, Benedikt Blumenstiel, Rinki Ghosal, Pedro Henrique de Oliveira, João Lucas de Sousa Almeida, Rocco Sedona, Yanghui Kang, Srija Chakraborty, Sizhe Wang, Ankur Kumar, Myscon Truong, Denys Godwin, Hyunho Lee, Chia-Yu Hsu, Ata Akbari Asanjan, Besart Mujeci, Trevor Keenan, Paulo Arévolo, Wenwen Li, Hamed Alemohammad, Pontus Olofsson, Christopher Hain, Robert Kennedy, Bianca Zadrozny, Gabriele Cavallaro, Campbell Watson, Manil Maskey, Rahul Ramachandran, Juan Bernabe Moreno
**IBM Research, NASA Marshall Space Flight Center, The University of Alabama in Huntsville, University of Iceland, Jülich Supercomputing Centre, Virginia Tech, Arizona State University, Oregon State University, Clark University, Boston University, University of California, Berkeley, Earth from Space Institute**
This repository contains code and examples based on the TerraTorch library for fine-tuning Prithvi-EO-2.0, a more powerful version of the Prithvi foundation model developed by IBM and NASA. Trained on 4.2M global time series samples from NASA's Harmonized Landsat and Sentinel-2 (HLS) data at 30 m resolution on the JUWELS HPC system at the Jülich Supercomputing Centre (JSC), it offers significant improvements over its predecessor.
- December 4, 2024: Prithvi-EO-2.0 pre-trained models and fine-tuning datasets released on Hugging Face.
- December 5, 2024: Prithvi-EO-2.0 paper released on arXiv. 🔥🔥
Prithvi-EO-2.0 is based on the ViT architecture, pretrained using a masked autoencoder (MAE) approach, with two major modifications as shown in the figure below.
First, we replaced the 2D patch embeddings and 2D positional embeddings with 3D versions to support inputs with spatiotemporal characteristics, i.e., a sequence of T images of size (H, W). Our 3D patch embeddings consist of a 3D convolutional layer that divides the 3D input into non-overlapping cubes of size (t, h, w) along the time, height, and width dimensions, respectively. For the 3D positional encodings, we first generate 1D sin/cos encodings individually for each dimension and then combine them into a single 3D positional encoding.
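The construction of the 3D positional encoding can be illustrated with a short sketch. The split of the embedding dimension across the time, height, and width axes below is an illustrative assumption (the released Prithvi-EO-2.0 code defines the exact allocation), but the idea is the one described above: 1D sin/cos tables per axis, broadcast over the patch grid and concatenated along the embedding dimension.

```python
import numpy as np

def sincos_1d(dim, positions):
    """1D sin/cos encoding with shape (len(positions), dim); dim must be even."""
    omega = 1.0 / 10000 ** (np.arange(dim // 2) / (dim // 2))
    angles = np.outer(positions, omega)                       # (P, dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def sincos_3d(embed_dim, t, h, w):
    """Combine per-axis 1D encodings into one (t*h*w, embed_dim) table."""
    # Illustrative split of embed_dim across the three axes (each part kept even).
    dim_t = embed_dim // 4
    dim_h = dim_w = (embed_dim - dim_t) // 2
    enc_t = sincos_1d(dim_t, np.arange(t))                    # (t, dim_t)
    enc_h = sincos_1d(dim_h, np.arange(h))                    # (h, dim_h)
    enc_w = sincos_1d(dim_w, np.arange(w))                    # (w, dim_w)
    # Broadcast each axis encoding over the full (t, h, w) grid, then concatenate.
    grid = np.zeros((t, h, w, embed_dim))
    grid[..., :dim_t] = enc_t[:, None, None, :]
    grid[..., dim_t:dim_t + dim_h] = enc_h[None, :, None, :]
    grid[..., dim_t + dim_h:] = enc_w[None, None, :, :]
    return grid.reshape(t * h * w, embed_dim)

# One positional encoding per (t, h, w) patch, e.g. 4 timesteps of a 14x14 patch grid.
pos_embed = sincos_3d(embed_dim=1024, t=4, h=14, w=14)        # (784, 1024)
```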
Second, we considered geolocation (center latitude and longitude) and date of acquisition (year and day-of-year, ranging from 1 to 365) in pretraining. Both the encoder and the decoder receive the time and location information for each sample and encode them independently using 2D sin/cos encodings. These encodings are added to the embedded tokens via a weighted sum with learned weights: one for time and one for location, with separate weights for the encoder and the decoder. Since this metadata is often unavailable, we added a drop mechanism during pretraining that randomly drops the geolocation and/or the temporal data, helping the model learn to handle their absence.
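The sketch below illustrates this metadata handling in PyTorch. The class and argument names are our own (not the released implementation); it shows the learned per-metadata weights and the random dropping of geolocation and temporal information during pretraining.

```python
import torch
import torch.nn as nn

class MetadataEncoding(nn.Module):
    """Illustrative sketch: add (lat, lon) and (year, day-of-year) embeddings to the
    patch tokens via learned weights, randomly dropping them during pretraining.
    A separate instance with its own weights would be used for encoder and decoder."""

    def __init__(self, embed_dim, drop_prob=0.1):
        super().__init__()
        self.embed_dim = embed_dim  # assumed divisible by 4
        self.drop_prob = drop_prob
        self.weight_time = nn.Parameter(torch.zeros(1))  # learned weight for time
        self.weight_loc = nn.Parameter(torch.zeros(1))   # learned weight for location

    def sincos_2d(self, values):
        """2D sin/cos encoding of a (B, 2) metadata pair into (B, embed_dim)."""
        half = self.embed_dim // 2
        omega = 1.0 / 10000 ** (torch.arange(half // 2, dtype=torch.float32) / (half // 2))
        parts = []
        for i in range(2):  # e.g. (lat, lon) or (year, day-of-year)
            angles = values[:, i:i + 1] * omega           # (B, half/2)
            parts.append(torch.cat([angles.sin(), angles.cos()], dim=1))
        return torch.cat(parts, dim=1)                    # (B, embed_dim)

    def forward(self, tokens, latlon, yeardoy):
        # tokens: (B, N, D); latlon, yeardoy: (B, 2)
        time_emb = self.sincos_2d(yeardoy)[:, None, :]    # (B, 1, D)
        loc_emb = self.sincos_2d(latlon)[:, None, :]
        if self.training:  # randomly drop metadata so the model tolerates its absence
            if torch.rand(1).item() < self.drop_prob:
                time_emb = torch.zeros_like(time_emb)
            if torch.rand(1).item() < self.drop_prob:
                loc_emb = torch.zeros_like(loc_emb)
        return tokens + self.weight_time * time_emb + self.weight_loc * loc_emb
```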
Model | Details | Weights |
---|---|---|
Prithvi-EO-2.0-300M | Pretrained 300M parameter model | https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-300M |
Prithvi-EO-2.0-300M-TL | Pretrained 300M parameter model with temporal and location embeddings | https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-300M-TL |
Prithvi-EO-2.0-600M | Pretrained 600M parameter model | https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M |
Prithvi-EO-2.0-600M-TL | Pretrained 600M parameter model with temporal and location embeddings | https://huggingface.co/ibm-nasa-geospatial/Prithvi-EO-2.0-600M-TL |
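For quick experimentation, the pretrained weights can be loaded as a TerraTorch backbone. The snippet below is a minimal sketch that assumes your installed TerraTorch version registers the Prithvi-EO-2.0 backbones under names such as `prithvi_eo_v2_300_tl`; check `BACKBONE_REGISTRY` in your installation for the exact names available.

```python
# Minimal sketch: load a pretrained Prithvi-EO-2.0 backbone through TerraTorch.
# The backbone name below is an assumption; confirm the registered names
# for your TerraTorch version before running.
import torch
from terratorch.registry import BACKBONE_REGISTRY

backbone = BACKBONE_REGISTRY.build("prithvi_eo_v2_300_tl", pretrained=True)

# A single-timestep, 6-band HLS-like input: (batch, channels, time, height, width)
x = torch.randn(1, 6, 1, 224, 224)
with torch.no_grad():
    features = backbone(x)  # backbone features (check the returned structure for your version)
```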
We validated the Prithvi-EO-2.0 models through extensive experiments using GEO-Bench, a widely used and rigorous benchmark framework for Earth Observation foundation models. Prithvi-EO-2.0-600M-TL outperforms the previous Prithvi-EO model by 8% across a range of tasks. It also outperforms six other geospatial foundation models when benchmarked on remote sensing tasks spanning different domains and resolutions (from 0.1 m to 15 m).
We have fine-tuned Prithvi-EO-2.0 for downstream tasks in different domains of interest using TerraTorch (see instructions on how to get started here). Below we provide a list of the downstream tasks, along with links to the datasets, sample TerraTorch configuration files (or custom code, in the case of Gross Primary Productivity) and sample notebooks for fine-tuning; a minimal programmatic fine-tuning sketch follows the table.
Task | Dataset | TerraTorch Config/Code |
---|---|---|
Flood Detection | https://github.com/cloudtostreet/Sen1Floods11 | sen1floods11.yaml |
Wildfire Scar Detection | https://huggingface.co/datasets/ibm-nasa-geospatial/hls_burn_scars | firescars.yaml |
Burn Scar Intensity | https://huggingface.co/datasets/ibm-nasa-geospatial/burn_intensity | burnintensity.yaml |
Landslide Detection | https://huggingface.co/datasets/ibm-nasa-geospatial/Landslide4sense | landslide.yaml |
Multi-temporal Crop Segmentation (US) | https://huggingface.co/datasets/ibm-nasa-geospatial/multi-temporal-crop-classification | multicrop.yaml |
Multi-temporal Land Cover and Crop Classification (Europe) | https://datapub.fz-juelich.de/sen4map/ | sen4map_land-cover.yaml, sen4map_crops.yaml |
Above Ground Biomass Estimation | https://huggingface.co/datasets/ibm-nasa-geospatial/BioMassters | biomassters.yaml |
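As a complement to the YAML configs (which are run with the TerraTorch CLI, e.g. `terratorch fit --config <config.yaml>`), the snippet below sketches a programmatic fine-tuning setup. It assumes TerraTorch's `SemanticSegmentationTask` and `EncoderDecoderFactory` with the argument names shown; the backbone identifier, band list, and neck indices are placeholders, so treat the released configs above as the authoritative recipes and copy exact values from them.

```python
# Illustrative sketch only: argument names, the backbone identifier, and the neck
# indices are assumptions -- copy the exact values from the released YAML configs.
import lightning.pytorch as pl
from terratorch.tasks import SemanticSegmentationTask

task = SemanticSegmentationTask(
    model_factory="EncoderDecoderFactory",
    model_args={
        "backbone": "prithvi_eo_v2_300_tl",
        "backbone_pretrained": True,
        "backbone_bands": ["BLUE", "GREEN", "RED", "NIR_NARROW", "SWIR_1", "SWIR_2"],
        # Necks reshape the ViT token sequence into 2D feature maps for the decoder.
        "necks": [
            {"name": "SelectIndices", "indices": [5, 11, 17, 23]},
            {"name": "ReshapeTokensToImage"},
        ],
        "decoder": "FCNDecoder",
        "num_classes": 2,  # e.g. water / no-water for flood detection
    },
    loss="ce",
    lr=1e-4,
    freeze_backbone=False,
)

trainer = pl.Trainer(accelerator="auto", max_epochs=50)
# trainer.fit(task, datamodule=...)  # supply a datamodule for the chosen dataset
```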