editorial and typo-y edits to tutorial and index pages
EmilyMarkowitz-NOAA committed May 8, 2024
1 parent deb0545 commit 789aa03
Showing 4 changed files with 65 additions and 29 deletions.
4 changes: 2 additions & 2 deletions index.md
@@ -4,9 +4,9 @@ subtitle: "Introduction to using earth data in the cloud for scientific workflow
---

<img src="images/cloud-overview.png" style="width:250px; float:right;">
Welcome to the NOAA Fisheries workshop focused on geospatial analysis using ocean 'big data'. Today, we are focused on using data from NASA [EarthData](https://www.earthdata.nasa.gov/) but the skills you will learn are transferable to other ways that you might get earth data, e.g. NESDIS, NCEI, ERDDAP servers, Copernicus, etc.
Welcome to the NOAA Fisheries workshop focused on geospatial analysis using ocean 'big data'. Today, we are focused on using data from NASA [EarthData](https://www.earthdata.nasa.gov/) but the skills you will learn are transferable to other ways that you might get earth data (e.g., NESDIS, NCEI, ERDDAP servers, Copernicus).

This workshop is focused on those who are brand new to working with earth data in the cloud and with geospatial packages. R users will be introduced to the earthdatalogin, terra and sf packages, while Python users will be introduced to the earthaccess and xarray packages. This workshop will also introduce working with JupyterHubs. We will use both Jupyter Lab (Python) and RStudio (R) within our JupyterHub.
This workshop is focused on those who are brand new to working with earth data in the cloud and with geospatial packages. R users will be introduced to the `earthdatalogin`, `terra` and `sf` packages, while Python users will be introduced to the `earthaccess` and `xarray` packages. This workshop will also introduce working with [JupyterHubs](https://jupyter.org/hub). We will use both Jupyter Lab (Python) and RStudio (R) within our JupyterHub.

## Topics for May 15, 2024

35 changes: 22 additions & 13 deletions tutorials/r/1-earthdatalogin.qmd
@@ -16,10 +16,23 @@ In this example we will use the `earthdatalogin` R package to search for data co

For more on `earthdatalogin` visit the [`earthdatalogin` GitHub](https://github.com/boettiger-lab/earthdatalogin/) page and/or the [`earthdatalogin` documentation](https://boettiger-lab.github.io/earthdatalogin/) site. Be aware that `earthdatalogin` is under active development.

## Terminology

- **`NetCDF` file**: network Common Data Form; a file format for storing multidimensional scientific data (variables) such as temperature, humidity, pressure, wind speed, and wind direction. Each of these variables can be displayed through a dimension (such as time) in ArcGIS by making a layer or table view from the netCDF file. Learn more [here](https://pro.arcgis.com/en/pro-app/latest/help/data/multidimensional/what-is-netcdf-data.htm).
- **`tif`, `tiff`, or GeoTIFF file**: an interchange format for georeferenced raster imagery. GeoTIFF is in wide use in NASA Earth science data systems. Learn more [here](https://www.earthdata.nasa.gov/esdis/esco/standards-and-practices/geotiff).
- **raster**: a matrix of cells (or pixels) organized into rows and columns (a grid), where each cell contains a value representing information, such as temperature. Rasters can be digital aerial photographs, imagery from satellites, digital pictures, or even scanned maps. Learn more [here](https://desktop.arcgis.com/en/arcmap/latest/manage-data/raster-and-images/what-is-raster-data.htm).
- **GDAL**: a translator library for raster and vector geospatial data formats. As a library, it presents a single raster abstract data model and a single vector abstract data model to the calling application for all supported formats. It also comes with a variety of useful command-line utilities for data translation and processing. Learn more [here](https://gdal.org/index.html).
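These pieces fit together in practice: `terra` (used later in this tutorial) reads both GeoTIFF and netCDF rasters through GDAL. A minimal sketch, with hypothetical file names as placeholders:

```{r eval=FALSE}
library(terra)

# terra reads rasters through GDAL; print the GDAL version terra is using
terra::gdal()

# The same function reads both formats (file names are placeholders)
r_tif <- terra::rast("example.tif")    # GeoTIFF
r_nc  <- terra::rast("example_sst.nc") # netCDF; requires GDAL's netCDF driver
```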

## Prerequisites

An Earthdata Login account is required to access data from NASA Earthdata. Please visit <https://urs.earthdata.nasa.gov> to register as a new user and manage your Earthdata Login account. This account is free to create and only takes a moment to set up.

### Import Required Packages

*Note: See the set-up tab (in left nav bar) for instructions on getting set up on your own computer, but be aware that it is common to run into trouble getting GDAL set up properly to handle netCDF files. Using a Docker image (and Python) is often less aggravating.*

## Load packages

*Note: See the [Earthdata login set-up tab](https://nmfs-opensci.github.io/EDMW-EarthData-Workshop-2024/content/02-earthdata.html) (in left nav bar) for instructions on getting set up on your own computer.*
Expand Down Expand Up @@ -48,11 +61,7 @@ This will put your login info in a `netrc` file located at:
earthdatalogin:::edl_netrc_path()
```

You can open a terminal and run this to see that it has your username and login.

```{}
cat /home/jovyan/.local/share/R/earthdatalogin/netrc
```
You can open a terminal and run `cat /home/jovyan/.local/share/R/earthdatalogin/netrc` to see that it has your username and password.

Once your `netrc` file is saved, you can use `earthdatalogin::edl_netrc()` to authenticate.

@@ -78,7 +87,7 @@ How can we find the `shortname`, `concept_id`, and `doi` for collections not in

![](images/SST_Blended_Earthdata_Search.png){width=50%}

If we hover over the top box, find and click on the more information button (an i with a circle around it). On this page, you will see the `DOI`. Now click "View More Info" to get to https://cmr.earthdata.nasa.gov/search/concepts/C1996881146-POCLOUD.html.
If we hover over the top box, find and click on the more information button (an i with a circle around it). On this page, you will see the `DOI`. Now click "View More Info" to get to [https://cmr.earthdata.nasa.gov/search/concepts/C1996881146-POCLOUD.html](https://cmr.earthdata.nasa.gov/search/concepts/C1996881146-POCLOUD.html).

On that page you will see the **"short name"** `MUR-JPL-L4-GLOB-v4.1`. Note the short name was also on the first search page (though it wasn't labeled as the short name there).

@@ -102,15 +111,15 @@ results <- earthdatalogin::edl_search(
version = "4.1",
temporal = tbox
)
length(results)
results[1:3]
length(results) # how many links were returned
results[1:3] # let's see the first 3 of these links
```

In this example we used the `short_name` parameter to search for our desired data set. However, there are multiple ways to specify the collection(s) we are interested in. Alternative parameters include:

- `doi`: request collection by digital object identifier (e.g., `doi = '10.5067/GHAAO-4BC21'`)

**NOTE:** Each Earthdata collect has a unique `concept_id` and `doi`. This is not the case with `short_name`. A **shortname** can be associated with multiple versions of a collection. If multiple versions of a collection are publicly available, using the `short_name` parameter with return all versions available. It is advised to use the `version` parameter in conjunction with the `short_name` parameter with searching.
**NOTE:** Each Earthdata collection has a unique `concept_id` and `doi`. This is not the case with `short_name`, which can be associated with multiple versions of a collection. If multiple versions of a collection are publicly available, using the `short_name` parameter will return all available versions. It is advised to use the `version` parameter in conjunction with the `short_name` parameter when searching.
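For example, a sketch of the same kind of search using the `doi` parameter instead (the DOI is the illustrative one from the list above, and the temporal range is assumed):

```{r eval=FALSE}
# Hypothetical sketch: search by DOI rather than by short_name
results_doi <- earthdatalogin::edl_search(
  doi = "10.5067/GHAAO-4BC21",
  temporal = c("2020-01-16", "2020-12-16")
)
length(results_doi) # how many links were returned
```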

We can refine our search by passing more parameters that describe the spatiotemporal domain of our use case. Here, we use the `temporal` parameter to request a date range and the `bounding_box` parameter to request granules that intersect with a bounding box.
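A sketch of such a refined query, assuming the same MUR SST collection as above (the bounding box values are illustrative; CMR expects `bounding_box` as a comma-separated `xmin,ymin,xmax,ymax` string in longitude/latitude):

```{r eval=FALSE}
# Sketch: restrict the search in both time and space
results_box <- earthdatalogin::edl_search(
  short_name   = "MUR-JPL-L4-GLOB-v4.1",
  version      = "4.1",
  temporal     = c("2020-01-16", "2020-12-16"),
  bounding_box = "-75.5,33.5,-73.5,35.5"
)
length(results_box)
```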

@@ -136,7 +145,7 @@ Following the search for data, you'll likely take one of two pathways with those

#### Download `earthdatalogin` results

In some cases you may want to download your assets. `earthdatalogin` makes downloading the data from the search results is very easy using the `earthdatalogin::edl_download()` function. The MUR SST files are 673 Gb file so I would prefer not to download. But you could.
In some cases you may want to download your assets. The `earthdatalogin::edl_download()` function makes downloading the data from the search results very easy. We won't download the MUR SST file for this tutorial because it is 673 Gb, but you could with the code below, if inclined.

```{r eval=FALSE}
earthdatalogin::edl_download(
@@ -158,7 +167,7 @@ oi <- earthdatalogin::edl_search(
oi
```

Let's try plotting this. I am going to authenticate again just to make sure my token did not expire. To search, we don't need to authenticate but to plot or download, we do.
Let's try plotting this. I am going to authenticate again just to make sure my token did not expire. To search, we don't need to authenticate, but to plot or download, we do.

```{r}
# Re-authenticate (just in case)
@@ -177,10 +186,10 @@ If you get the following error:
> Error: [rast] file does not exist: /vsicurl/https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/AVHRR_OI-NCEI-L4-GLOB-v2.1/20200115120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.1.nc
It is likely because you do not have the End User Licence Agreement (EULA)/permissions to use that data set or are not properly logged in using `earthdatalogin::edl_netrc()`. Another reason may be that your
GDAL installation is not properly handling netCDF files.
GDAL installation is not properly handling `netCDF` files.
:::

Also try this example script from the `?earthdatalogin::edl_netrc` documentation that uses a tif file instead of netCDF.
Also try this example script from the `?earthdatalogin::edl_netrc` documentation, which uses a `.tif` file instead of a netCDF file.

```{r}
url <- earthdatalogin::lpdacc_example_url()
31 changes: 28 additions & 3 deletions tutorials/r/2-subset-and-plot.qmd
@@ -10,6 +10,30 @@ author: Eli Holmes
3. How to crop a data cube to a box
:::

## Summary

In this example, we will utilize the `earthdatalogin` R package to retrieve, subset, and crop sea surface temperature data as a file and as a datacube from [NASA Earthdata search](https://search.earthdata.nasa.gov/search). The `earthdatalogin` R package simplifies the process of discovering and accessing NASA Earth science data.

For more on `earthdatalogin` visit the [`earthdatalogin` GitHub](https://github.com/boettiger-lab/earthdatalogin/) page and/or the [`earthdatalogin` documentation](https://boettiger-lab.github.io/earthdatalogin/) site. Be aware that `earthdatalogin` is under active development and that we are using the development version on GitHub.

## Terminology

- **`Zarr`**: a community project to develop specifications and software for the storage of large N-dimensional typed arrays, also commonly known as tensors. A particular focus of Zarr is providing support for storage on distributed systems like cloud object stores, and enabling efficient I/O for parallel computing applications. Learn more [here](https://zarr.dev/).
- **Open Data Cube (ODC)**: an open-source geospatial data management and analysis software project that helps you harness the power of satellite data. At its core, the ODC is a set of Python libraries and a PostgreSQL database that helps you work with geospatial raster data. The ODC seeks to increase the value and impact of global Earth observation satellite data by providing an open and freely accessible exploitation architecture. Learn more [here](https://www.opendatacube.org/).

## Prerequisites

The tutorials today can be run with the guest Earthdata Login that is in `earthdatalogin`.
However, if you will be using the NASA Earthdata portal more regularly, please register for an
Earthdata Login account. Please visit <https://urs.earthdata.nasa.gov> to register and manage your
Earthdata Login account. This account is free to create and only takes a moment to set up.

### Import Required Packages

*Note: See the set-up tab (in left nav bar) for instructions on getting set up on your own computer, but be aware that it is common to run into trouble getting GDAL set up properly to handle netCDF files. Using a Docker image (and Python) is often less aggravating.*

## Load packages

```{r message=FALSE}
@@ -101,9 +125,9 @@ plot(rc_sst,
main = titles)
```

## Reading in a Zarr file
## Reading in a `Zarr` file

Reading in Zarr files is easy in Python with `xarray` but currently this is difficult in R. See the `gdalcubes.qmd` file in the `tutorials/r` directory. However we can open individual files from a Zarr file.
Reading in `Zarr` files is easy in Python with [`xarray`](https://docs.xarray.dev/en/latest/index.html) but currently difficult in R. See the [`gdalcubes.qmd` file](https://github.com/nmfs-opensci/EDMW-EarthData-Workshop-2024/blob/main/tutorials/r/gdalcubes.qmd) in the [`tutorials/r`](https://github.com/nmfs-opensci/EDMW-EarthData-Workshop-2024/tree/main/tutorials/r) directory of this GitHub repository. However, we can open individual files within a `Zarr` store.

Read one file.

@@ -115,7 +139,8 @@ addr <- paste0(prefixes, url, slice)
y = terra::rast(addr)
```

Plot
Plot.

```{r}
e <- terra::ext(c(xmin=-75.5, xmax=-73.5, ymin=33.5, ymax=35.5 ))
y |> terra::crop(e) |> terra::plot()
24 changes: 13 additions & 11 deletions tutorials/r/3-extract-satellite-data-within-boundary.qmd
@@ -11,12 +11,16 @@ author: 'NOAA CoastWatch, NOAA Openscapes'

## Summary

In this example, we will utilize the earthdatalogin R package to retrieve sea surface temperature data from [NASA Earthdata search](https://search.earthdata.nasa.gov/search). The `earthdatalogin` package simplifies the process of discovering and accessing NASA Earth science data.
In this example, we will utilize the `earthdatalogin` R package to retrieve sea surface temperature data from [NASA Earthdata search](https://search.earthdata.nasa.gov/search). The `earthdatalogin` R package simplifies the process of discovering and accessing NASA Earth science data.

This example is adapted from the NOAA CoastWatch Satellite Data Tutorials. To explore the full range of tutorials on accessing and utilizing oceanographic satellite data, visit the [NOAA CoastWatch Tutorial Github repository.](https://github.com/coastwatch-training/CoastWatch-Tutorials)

For more on `earthdatalogin` visit the [`earthdatalogin` GitHub](https://github.com/boettiger-lab/earthdatalogin/) page and/or the [`earthdatalogin` documentation](https://boettiger-lab.github.io/earthdatalogin/) site. Be aware that `earthdatalogin` is under active development and that we are using the development version on GitHub.

## Terminology

- **shapefile**: a simple, nontopological format for storing the geometric location and attribute information of geographic features. Geographic features in a shapefile can be represented by points, lines, or polygons (areas). Learn more [here](https://desktop.arcgis.com/en/arcmap/latest/manage-data/shapefiles/what-is-a-shapefile.htm).

## Prerequisites

The tutorials today can be run with the guest Earthdata Login that is in `earthdatalogin`.
@@ -38,7 +42,7 @@ This NOAA blended SST is a moderate resolution satellite-based gap-free sea surf
**Longhurst Marine Provinces**\
The dataset represents the division of the world oceans into provinces as defined by Longhurst (1995; 1998; 2006). This division has been based on the prevailing role of physical forcing as a regulator of phytoplankton distribution.

The Longhurst Marine Provinces dataset is available online (https://www.marineregions.org/downloads.php) and within the shapes folder associated with this repository. For this exercise we will use the Gulf Stream province (ProvCode: GFST)
The Longhurst Marine Provinces dataset is available online (https://www.marineregions.org/downloads.php) and within the shapes folder associated with this repository. For this exercise we will use the Gulf Stream province (`ProvCode: GFST`)

![](../images/longhurst.png)

Expand All @@ -56,7 +60,6 @@ library(ggplot2)
The shapefile for the Longhurst marine provinces includes a list of regions.\
For this exercise, we will only use the boundary of one province, the Gulf Stream region ("GFST").


```{r read province boundaries from shapefiles}
# Set directory path for shapefile
dir_path <- '../resources/longhurst_v4_2010/'
@@ -124,47 +127,46 @@ shp <- vect(shapes)
GFST <- shp[shp$ProvCode == "GFST",]
```

Plot the SST data
Plot the SST data.
```{r plot_SST}
plot(ras_sst)
```

Plot GFST boundaries from shapefile
Plot GFST boundaries from shapefile.
```{r plot_GFST}
plot(GFST,col='red')
```

Mask SST with the GFST boundaries and plot
Mask SST with the GFST boundaries and plot.
```{r mask_SST}
masked_rc <- mask(ras_sst, GFST)
# Visualize the SST in GFST Province and crop to the GFST extent
plot(masked_rc, ext = GFST)
```


## Compute monthly average of SST

We will construct a data cube to compute the monthly average of sea surface temperature data within the boundary.
To minimize data loading times, the first 10 results, which correspond to approximately two months
of data, will be used for this exercise.

Select the SST results for end of Jan and beginning of Feb
Select the SST results for end of January and beginning of February.
```{r}
ras_all <- terra::rast(results[c(25:35)], vsi = TRUE)
```

Trim the SST data to the boundaries of GFST
Trim the SST data to the boundaries of GFST.
```{r}
rc_all <- terra::mask(ras_all, GFST)
```

SST data
Select SST data.
```{r}
rc_sst <- rc_all["analysed_sst", ]
```

Calculate monthly means
Calculate monthly SST means.
```{r get_means}
# Function to convert times to year-month format
year_month <- function(x) {
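As an aside, when the layer times are set on the raster, `terra::tapp()` can group layers by year-month directly; a compact sketch assuming `rc_sst` as built in the chunks above:

```{r eval=FALSE}
# Group layers by year-month and average within each group
monthly_mean <- terra::tapp(rc_sst, index = "yearmonths", fun = mean, na.rm = TRUE)
plot(monthly_mean)
```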
