Occurrence processing for Belgium (CUBE BE) #1

Open
peterdesmet opened this issue Feb 13, 2019 · 6 comments

@peterdesmet
Member

peterdesmet commented Feb 13, 2019

Preprocessing

  1. Download occurrences (20 million)
  2. Randomly assign coordinates to each occurrence, within its coordinateUncertainty circle
  3. Calculate the EEA 1 km reference grid cell (51.726)
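
A minimal sketch of steps 2 and 3, not the code actually used for the cube. It assumes the occurrence coordinates have already been projected to ETRS89-LAEA (EPSG:3035) in metres and that the uncertainty radius is given in metres. The function names are hypothetical, and the cell code format is inferred from codes such as `1kmE3865N3144` that appear later in this thread.

```python
import math
import random


def random_point_in_circle(x, y, radius_m, rng=random):
    """Draw a point uniformly from the disc of radius_m around (x, y)."""
    # Taking the square root of the uniform draw keeps the density uniform
    # over the area instead of clustering points near the centre.
    r = radius_m * math.sqrt(rng.random())
    theta = 2 * math.pi * rng.random()
    return x + r * math.cos(theta), y + r * math.sin(theta)


def eea_cell_code(x, y):
    """Build a 1 km cell code from the cell's lower-left corner, in km.

    Assumes x (easting) and y (northing) are EPSG:3035 metres.
    """
    return f"1kmE{math.floor(x / 1000)}N{math.floor(y / 1000)}"


# Hypothetical occurrence: projected position plus a 500 m uncertainty radius.
rng = random.Random(42)  # seeded only to make the example repeatable
new_x, new_y = random_point_in_circle(3865400.0, 3144900.0, 500.0, rng)
cell = eea_cell_code(new_x, new_y)
```

Because the point is redrawn at random, an occurrence with a large uncertainty can land in a different cell on each run unless the random seed is fixed.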

Baseline data

  1. Aggregate by:
    • kingdom
    • year
    • grid cell
  2. Summarize by:
    • occ_count: count(occurrences)

Years/grid cells without occurrences are not included.
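
The baseline aggregation could be sketched as follows, assuming each occurrence is a dictionary with `kingdom`, `year` and `eea_cell_code` fields. Both the record layout and the sample values are hypothetical.

```python
from collections import Counter


def baseline_cube(occurrences):
    """Count occurrences per (kingdom, year, grid cell)."""
    return Counter(
        (occ["kingdom"], occ["year"], occ["eea_cell_code"])
        for occ in occurrences
    )


# Hypothetical sample records, for illustration only.
sample = [
    {"kingdom": "Plantae", "year": 2018, "eea_cell_code": "1kmE4012N3120"},
    {"kingdom": "Plantae", "year": 2018, "eea_cell_code": "1kmE4012N3120"},
    {"kingdom": "Animalia", "year": 1940, "eea_cell_code": "1kmE3927N3130"},
]
cube = baseline_cube(sample)
```

Only combinations that actually occur become keys of the counter, which matches the rule that years and grid cells without occurrences are left out.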

Alien data

  1. Filter occurrences on checklist taxa (2.500):
    • For SPECIES => query on speciesKey (will include species synonyms, subspecies/varieties and their synonyms)
    • (For SUBSPECIES and VARIETY => query on acceptedKey (will include synonyms))
    • (For SYNONYM => query on taxonKey)
  2. Aggregate by:
    • species (or taxon)
    • kingdom
    • year
    • grid cell
  3. Summarize by:
    • occ_count: count(occurrences)

Note for step 1: as an initial step, we could work with SPECIES only.
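
A rough sketch of the filtering and aggregation described above, under several assumptions: checklist taxa are dictionaries with `rank`, `taxonomicStatus` and `nubKey`; occurrences carry `speciesKey`, `acceptedTaxonKey` and `taxonKey` (here `acceptedTaxonKey` stands in for the `acceptedKey` named in the list); and synonyms are recognised as any status other than ACCEPTED or DOUBTFUL. The function names are hypothetical.

```python
from collections import Counter

ACCEPTED_LIKE = {"ACCEPTED", "DOUBTFUL"}
INFRASPECIFIC = {"SUBSPECIES", "VARIETY"}


def occurrence_field_for(taxon):
    """Pick the occurrence field a checklist taxon is matched against."""
    if taxon["taxonomicStatus"] not in ACCEPTED_LIKE:
        return "taxonKey"  # synonym: match the exact taxon only
    if taxon["rank"] == "SPECIES":
        return "speciesKey"  # also covers infraspecific taxa and synonyms
    if taxon["rank"] in INFRASPECIFIC:
        return "acceptedTaxonKey"  # covers synonyms of the subspecies/variety
    raise ValueError(f"unsupported rank: {taxon['rank']}")


def alien_cube(occurrences, checklist):
    """Count occurrences per (taxon key, kingdom, year, grid cell)."""
    keys_by_field = {"speciesKey": set(), "acceptedTaxonKey": set(), "taxonKey": set()}
    for taxon in checklist:
        keys_by_field[occurrence_field_for(taxon)].add(taxon["nubKey"])

    cube = Counter()
    for occ in occurrences:
        # An occurrence is counted once for every checklist taxon it matches.
        for field, keys in keys_by_field.items():
            key = occ.get(field)
            if key in keys:
                cube[(key, occ["kingdom"], occ["year"], occ["eea_cell_code"])] += 1
    return cube
```

Restricting the checklist to taxa with rank SPECIES before calling `alien_cube` would give the simplified first version suggested in the note.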

@peterdesmet peterdesmet changed the title Occurrence processing for Belgium (CUBE) Occurrence processing for Belgium (CUBE 1) Feb 13, 2019
@peterdesmet peterdesmet changed the title Occurrence processing for Belgium (CUBE 1) Occurrence processing for Belgium (CUBE BE) Feb 13, 2019
@damianooldoni
Contributor

@peterdesmet: For step 1 (filtering on alien species), should we filter by downloading the unified checklist from GBIF or by using the GitHub version from https://github.com/trias-project/unified-checklist? Using the GBIF checklist would make the workflow more repeatable worldwide.

@peterdesmet
Member Author

I would say: querying from GBIF.

@damianooldoni
Contributor

Below is how I get three data cubes, one for each of the three filters mentioned above.

  1. For alien taxa where rank is SPECIES and taxonomicStatus is one of ACCEPTED or DOUBTFUL:
year eea_cell_code speciesKey n
1792 1kmE3865N3144 2769766 1

Occurrences referring to species synonyms, subspecies/varieties and their synonyms are automatically included.

  2. For alien taxa where rank is below SPECIES and taxonomicStatus is one of ACCEPTED or DOUBTFUL:
year eea_cell_code acceptedTaxonKey n
2018 1kmE4012N3120 6411098 3

Occurrences referring to synonyms are included in the query: we search for occurrences where acceptedTaxonKey is one of the nubKey values from the unified checklist. Contrary to name data, in occurrence data the field acceptedTaxonKey is always present, and it equals taxonKey for taxa that are not synonyms (see issue in gbif checklistbank repo).

  3. For alien taxa where taxonomicStatus is NOT one of ACCEPTED or DOUBTFUL, i.e. synonyms:
year eea_cell_code taxonKey n
1940 1kmE3927N3130 3022668 1

Occurrences where taxonKey is one of the nubKey values from the unified checklist.

Question

What should we call the column containing speciesKey/acceptedTaxonKey/taxonKey?

Possible solution

My suggestion is to use column name taxonKey:

year eea_cell_code taxonKey n
1792 1kmE3865N3144 2769766 1
2018 1kmE4012N3120 6411098 3
1940 1kmE3927N3130 3022668 1

@peterdesmet: what do you think?

@damianooldoni
Contributor

Discussed with @peterdesmet: we use taxonKey.
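
For illustration, merging the three cubes under a single taxonKey column might look like the sketch below. It assumes each cube is a list of row dictionaries, and the helper name `unify_key_column` is hypothetical. The rows are the example rows quoted earlier in this thread.

```python
def unify_key_column(rows, key_column):
    """Return copies of rows with key_column exposed as taxonKey."""
    return [
        {
            "year": row["year"],
            "eea_cell_code": row["eea_cell_code"],
            "taxonKey": row[key_column],
            "n": row["n"],
        }
        for row in rows
    ]


species_cube = [{"year": 1792, "eea_cell_code": "1kmE3865N3144", "speciesKey": 2769766, "n": 1}]
infraspecific_cube = [{"year": 2018, "eea_cell_code": "1kmE4012N3120", "acceptedTaxonKey": 6411098, "n": 3}]
synonym_cube = [{"year": 1940, "eea_cell_code": "1kmE3927N3130", "taxonKey": 3022668, "n": 1}]

# Stack the three cubes into one table with a shared key column.
merged = (
    unify_key_column(species_cube, "speciesKey")
    + unify_key_column(infraspecific_cube, "acceptedTaxonKey")
    + unify_key_column(synonym_cube, "taxonKey")
)
```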

@damianooldoni
Contributor

Done. Documentation about the structure and workflow has been added to the README.md of the repo. The issue can be closed or left open for reference.

@trias-project trias-project deleted a comment May 31, 2019