[July - Sep] Dataset Curation Workflow Tracking #138

Open
1 of 20 tasks
aditigopalan opened this issue Sep 4, 2024 · 0 comments

This ticket tracks curation workflow progression.

Note: Work in the three sections below can proceed in parallel, so curation workflows for different months may overlap.

1. Curation and Annotation

  • Run PubMed crawler to generate PublicationView manifest [205 publications generated, long sprint anticipated]
  • Send Amber and Jineta a copy of the PublicationView manifest from latest crawl to review for MC2 Center Newsletter publication highlights
  • Send Amber "News from CCKP" for MC2 Center Newsletter
  • Annotate publications in PublicationView manifest [In progress]
  • Generate ToolView and DatasetView manifests based on PublicationView manifest
  • Run the automated curation workflow to upload publications, datasets, and tools [This includes splitting manifests, processing and validating manifests, generating target Synapse IDs for upload, schema updates, upload to Synapse, and (in progress) a validation check for uploads]
  • Generate UNION tables
  • QC of staging tables
  • Perform automated portal sync to CCKP
  • Validate data on the CCKP
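
For illustration only, the "splitting manifests" and "validating manifests" steps named above could look roughly like the sketch below. This is not the actual workflow code: the grouping column `grantNumber`, the required column `title`, the function names, and the sample rows are all hypothetical and chosen only to show the shape of the step.

```python
import csv
import io
from collections import defaultdict


def split_manifest(csv_text, key="grantNumber"):
    """Group manifest rows by the value in `key` (hypothetical column name)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[key].strip()].append(row)
    return dict(groups)


def validate_rows(rows, required=("title",)):
    """Return (row index, column) pairs where a required value is missing."""
    return [
        (i, col)
        for i, row in enumerate(rows)
        for col in required
        if not (row.get(col) or "").strip()
    ]


# Made-up sample manifest; the grant labels are placeholders, not real grants.
SAMPLE = "grantNumber,title\nGRANT-A,Paper A\nGRANT-B,\nGRANT-A,Paper B\n"

if __name__ == "__main__":
    for grant, rows in split_manifest(SAMPLE).items():
        print(grant, len(rows), "rows; problems:", validate_rows(rows))
```

Splitting before validating keeps any problems scoped to a single sub-manifest, which may make it easier to route fixes back to the relevant contributors.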

Status check [Plan to report numbers for each category following PubMed crawl]:

  • Publication upload [ ]
  • Tool upload [ ]
  • Dataset upload [ ]

2. Data model

  • Update valid values in the data model and build
  • Generate templates from new model
  • Release new model version [no changes, no release]
  • Update DCA config [no changes, no update]

3. Contributor Engagement

  • Emails to contributors (by grant) with info on newly added manifests, link to project, DCA, and instructions on review, annotation, validation, and submission, as applicable
  • Jira Help Desk ticket tracking, review, triage, with data model updates, as needed (TBD)
  • Annotation gap-filling of DatasetView and ToolView manifests