Commit

Feature/glossary updates (#72)
* updated dataset maturity guide

* typo fix

* Fix quotation marks and update provisional use advice

* fixing typos

* add python to code block
CEKrause authored Jan 12, 2024
1 parent f5698a5 commit 6d64b04
Showing 3 changed files with 148 additions and 121 deletions.
44 changes: 34 additions & 10 deletions docs/guides/about/glossary.md
@@ -48,6 +48,13 @@ In the context of remote sensing, algorithms generally specify how to determine
lower-level source data. For example, algorithms prescribe how atmospheric temperature and moisture profiles are
determined from a set of radiation observations originally sensed by satellite sounding instruments.

{#ancillary}
## Ancillary datasets

Data which enhance processing and utilisation of remote sensing instrument data. Ancillary datasets are used to assist
in the analysis and classification of e.g. [ARD](#ard) by providing supporting data on conditions at the time of
satellite data acquisition, such as aerosol and water vapour concentrations.

{#aws}
## Amazon Web Services (AWS)

@@ -159,7 +166,7 @@ and the most up-to-date collection available.
The reproduction of the [collection](#collection), including all downstream products, with the
initial input being the rawest form ([telemetry](#telemetry)). Collections are updated when
there are fundamental changes and upgrades to the data suite that make it incompatible with the existing collection.
Therefore a collection upgrade is more akin to a movie franchise reboot than a re-release.
Therefore, a collection upgrade is more akin to a movie franchise reboot than a re-release.

{#ceos-seo}
## Committee on Earth Observations, Systems Engineering Office (CEOS-SEO)
@@ -268,6 +275,14 @@ capability. The ESA is a partner of the [Copernicus Programme](#cop-prog).

The angle between a ray reflected from a surface and the line perpendicular to the surface at the point of emergence.

{#final}
## Final

A stage in DEA's dataset maturity lifecycle. Final data is DEA's best-quality [ARD](#ard), produced using
high-quality [ancillary](#ancillary) datasets derived from observed data.

For more information, see [DEA dataset maturity](/guides/reference/dataset_maturity_guide#final).

{#fc}
## Fractional Cover (FC)

@@ -303,7 +318,7 @@ For more information, see [Geoscience Australia](https://www.ga.gov.au/).
## Geomedian

Geometric median is a robust high-dimensional statistic that maintains relationships between spectral bands, while
producing a multi-dimensional median over a timeseries of satellite images.
producing a multidimensional median over a timeseries of satellite images.

The Geometric Median provides information on the general conditions of a landscape over a timeseries.

@@ -338,6 +353,15 @@ a typical desktop computer or workstation in order to solve large problems in sc

The angle between a ray incident on a surface and the line perpendicular to the surface at the point of incidence.

{#interim}
## Interim

A stage in DEA's dataset maturity lifecycle. Interim production means that one or more [ancillary](#ancillary) datasets were not
available at the time of production, and the dataset has instead been corrected using a combination of [NRT](#nrt)
climatological ancillaries, and [final](#final) observed ancillaries.

For more information, see [DEA dataset maturity](/guides/reference/dataset_maturity_guide#interim).

{#nidem}
## Intertidal Elevation

@@ -474,12 +498,12 @@ Radiation just beyond the visible light spectrum. In Landsat and Sentinel 2 Eart
radiation between 0.7 - 0.9 micrometers.

{#nrt}
## Near-real time (NRT)
## Near real-time (NRT)

NRT data is a less refined/calibrated dataset, which is available much sooner after satellite acquisition than standard
[ARD](#ard) data.
A stage in DEA's dataset maturity lifecycle. NRT data is a less refined/calibrated dataset, which is available much
sooner after satellite acquisition than standard [ARD](#ard) data.

For more information, see [DEA dataset maturity](/guides/reference/dataset_maturity_guide/).
For more information, see [DEA dataset maturity](/guides/reference/dataset_maturity_guide#nrt).

{#odc}
## Open Data Cube (ODC)
@@ -562,7 +586,7 @@ definition which contains the product description and specification.
## Python

The programming language used to develop the [Open Data Cube](#odc) and most of [Digital Earth Australia](#dea).
It is an easy to use language, which also provides simple access to high performance processing capabilities.
It is an easy-to-use language, which also provides simple access to high performance processing capabilities.

For more information, see [Python](https://www.python.org/).

@@ -758,7 +782,7 @@ For more information, see [NASA Thematic Mapper Plus](https://landsat.gsfc.nasa.
{#thredds}
## Thematic Real-time Environmental Distributed Data Services (THREDDS)

An National Computational Infrastructure ([NCI](#nci)) server, which is a high-performance and high-availability
A National Computational Infrastructure ([NCI](#nci)) server, which is a high-performance and high-availability
installation of Unidata's Thematic Real-time Environmental Distributed Data Services (THREDDS).

THREDDS serves many of NCI’s open data collections at the file level, as well as some aggregations. It provides many
@@ -844,13 +868,13 @@ For more information, see [NASA: World Reference System](https://landsat.gsfc.na
{#xarray}
## xarray

An open source project and [Python](#python) package for working with labelled multi-dimensional arrays such as those
An open source project and [Python](#python) package for working with labelled multidimensional arrays such as those
returned by the [Open Data Cube](#odc).

{#yaml}
## Yet Another Markup Language (YAML)

A human readable data storage format. It is used throughout [DEA](#dea) for metadata files, product definitions and
A human-readable data storage format. It is used throughout [DEA](#dea) for metadata files, product definitions and
other configuration files.

{#zenith}
114 changes: 114 additions & 0 deletions docs/guides/reference/dataset_maturity_guide.md
@@ -0,0 +1,114 @@
# DEA Dataset Maturity

In December 2022, DEA implemented **dataset maturity** levels to simplify access to DEA Analysis Ready Data
([ARD](/guides/about/glossary/#ard)) datasets. All ARD data for a single sensor is now provided in a single
product, minimising the data handling required by users.

As higher quality corrections become available, they replace the rapid near real-time data within the same product.
Users now only need to connect to one data feed, which provides the best available, most up-to-date information
at any point in time.

Dataset maturity metadata attributes are currently implemented across all Sentinel 2 and Landsat Collection 3 ARD products.

## How does 'dataset maturity' work?

DEA produces ARD data to three maturity levels:
* **Near Real-Time (NRT)**
* **Interim**
* **Final**

{#nrt}
### Near Real-Time (NRT)

**Near Real-Time (NRT)** is a rapid ARD product produced within 48 hours of image capture. NRT
data is corrected using existing long-term climatology data rather than observed conditions on the day of the
satellite capture, because the observational datasets, called [ancillaries](/guides/about/glossary/#ancillary),
take a few weeks to be received by DEA. Because the corrections use average conditions rather than observations,
NRT data can be published quickly, but it is considered to be of slightly lower quality than '**final**' ARD data.

Over the next few weeks, higher quality ancillary datasets become available describing the specific
atmospheric conditions at the time and location the satellite image was captured. Using these
ancillaries, '**final**' maturity ARD is produced. This replaces the '**NRT**' or '**interim**' product (see below).

{#final}
### Final

**Final** ARD is DEA’s best quality ARD, produced using high quality ancillary datasets derived
from observed data. These ancillary datasets are slower to produce but are observational
datasets of the conditions at the time of image capture and so provide our most accurate dataset
corrections.

DEA uses the following dynamic ancillary datasets to produce its **final** ARD:
* Bidirectional reflectance distribution function ([BRDF](/guides/about/glossary/#brdf)) data from the United States Geological Survey
* Water vapour data from the United States National Oceanic and Atmospheric Administration (NOAA)

{#interim}
### Interim

If the high-quality ancillaries required for the **final** ARD model don't become available **within 23 days** of image capture,
'**interim**' maturity data is produced as a *stand-in* until the full ancillaries are available to produce the '**final**' version.
This is our fall-back until the issue is resolved.

**Interim** production means that one or more ancillary datasets were not available at the time of production, and the dataset has
instead been corrected using a combination of **NRT** climatological ancillaries, and **final** observed
ancillaries. If there are no delays in the ancillary datasets, then **interim** maturity is skipped and
datasets move directly from **NRT** to **final**.
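
The lifecycle described in the three sections above can be summarised as a small decision rule. The following is a minimal sketch only, not DEA's processing code: the function name `expected_maturity`, its parameters, and the treatment of day 23 itself as still inside the window are assumptions made for illustration.

```python
# Illustrative sketch; names and the exact boundary handling are assumptions.
INTERIM_CUTOFF_DAYS = 23  # from this guide: interim is produced if ancillaries miss this window


def expected_maturity(days_since_capture: float, observed_ancillaries_ready: bool) -> str:
    """Return the maturity level this guide would lead you to expect."""
    if observed_ancillaries_ready:
        # All observed ancillaries have arrived, so the best-quality ARD can be produced.
        return "final"
    if days_since_capture > INTERIM_CUTOFF_DAYS:
        # Ancillaries are overdue, so a stand-in is produced until they arrive.
        return "interim"
    # Still waiting inside the normal window: only the climatology-based product exists.
    return "nrt"


print(expected_maturity(1, False))   # early, no observed ancillaries yet
print(expected_maturity(10, True))   # ancillaries arrived on time
print(expected_maturity(30, False))  # ancillaries delayed past the cutoff
```

The rule deliberately ignores processing time and treats the ancillaries as a single yes/no flag, whereas interim production in practice can mix climatological and observed ancillaries.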

## Dataset maturity flowchart

All three maturity levels can be present inside a single product, with the maturity information stored
in the product metadata and as part of the filename, enabling users to choose an appropriate dataset
maturity level to suit their requirements. Datasets with a lower maturity level will be replaced by more
mature dataset versions (interim and final) as they are generated.

![dataset_maturity_flowchart](/_files/reference/dataset_maturity_flowchart.drawio.svg)

**Tip:** to view a larger version, right-click the image, then select **Open image in new tab**.
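
The replacement behaviour described in this section, where a more mature dataset supersedes a less mature one for the same acquisition, can be sketched in plain Python. This is an illustration under stated assumptions: the record layout (`time` and `maturity` keys), the `MATURITY_RANK` table, and the `most_mature` helper are hypothetical and are not part of the datacube API.

```python
# Hypothetical ranking: higher numbers supersede lower ones.
MATURITY_RANK = {"nrt": 0, "interim": 1, "final": 2}


def most_mature(records):
    """Keep one record per acquisition time, preferring the highest maturity."""
    best = {}
    for rec in records:
        current = best.get(rec["time"])
        if current is None or MATURITY_RANK[rec["maturity"]] > MATURITY_RANK[current["maturity"]]:
            best[rec["time"]] = rec
    # Return the survivors in time order.
    return [best[t] for t in sorted(best)]


# Made-up example records, not real DEA datasets.
records = [
    {"time": "2022-01-03", "maturity": "nrt"},
    {"time": "2022-01-03", "maturity": "final"},
    {"time": "2022-01-19", "maturity": "nrt"},
    {"time": "2022-01-19", "maturity": "interim"},
]
selected = most_mature(records)
print(selected)
```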

% Diagram editing notes for internal use:
% The SVG above contains an embedded copy of the source used to generate it.
% Download it, then drop it into https://app.diagrams.net/ to edit.
% When finished, *save* it, OR use *export as SVG* with the **Include a copy of my diagram** option checked.
% Then commit it back to the repo.

## How is this different from what DEA used to do?

Previously, DEA produced two separate products:

* Near Real-Time, which was kept as a rolling 90-day archive;
* ARD, published 9 to 16 days after satellite acquisition.

The user was then able to select which product they wanted to use according to their purpose.
It was more difficult to combine products as both NRT and the final ARD product contained data
for the same time step.

## How do I load only Near Real-Time/Interim/Final data using the datacube?

DEA data can be filtered to specific dataset maturity levels using the `dataset_maturity`
metadata field. Valid options are 'final', 'nrt' or 'interim'. For example,
to load only 'final' maturity Landsat 8 data:

```python
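import datacube

# Connect to the datacube index (requires a configured datacube environment)
dc = datacube.Datacube()

# Load only datasets whose maturity level is 'final'
ds = dc.load(
    product="ga_ls8c_ard_3",
    measurements=["nbart_red"],
    x=(150, 150.1),
    y=(-30, -30.1),
    time=("2022-01", "2022-02"),
    dataset_maturity="final",
)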
```
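
Because only three values are valid for `dataset_maturity`, it can help to check the value before building a query. The helper below is a hedged sketch: `maturity_query` and `VALID_MATURITIES` are hypothetical names introduced for illustration, and the code only builds a dictionary, so it runs without a datacube installation.

```python
# The three valid options listed in this guide.
VALID_MATURITIES = {"final", "interim", "nrt"}


def maturity_query(product, maturity, **extra):
    """Build a query dictionary, rejecting unknown maturity levels early."""
    if maturity not in VALID_MATURITIES:
        raise ValueError(
            f"Unknown dataset maturity {maturity!r}; expected one of {sorted(VALID_MATURITIES)}"
        )
    return {"product": product, "dataset_maturity": maturity, **extra}


query = maturity_query("ga_ls8c_ard_3", "nrt", time=("2022-01", "2022-02"))
print(query)
```

With a configured datacube, the resulting dictionary could then be unpacked into the call shown above, for example `dc.load(**query)`.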

{#provisional}
## What about provisional?

The term **provisional** is used by DEA to denote products or services that have not yet passed quality control and/or
have not yet been finalised for release. Provisional products or services may be, for example, beta versions
of new products, products at other stages of development, or products that have not yet passed
DEA's quality control standards.

**Provisional products are available for use, but should be used with appropriate caution.** See the individual product
or service metadata pages for information on product limitations and use.

Once a product is formally released, it is renamed to remove the provisional tag.
111 changes: 0 additions & 111 deletions docs/guides/reference/dataset_maturity_guide.rst

This file was deleted.
