type_id for all linker tables #140

Open
laceysanderson opened this issue Apr 11, 2024 · 1 comment

laceysanderson commented Apr 11, 2024

It would be really good to have an optional type_id added consistently to all linker tables in chado. That way we could indicate what type of link is being made and remove a lot of the ambiguity that is currently happening ;-p

Specific use cases

Contacts

Contact linkers (feature_contact, featuremap_contact, library_contact, nd_experiment_contact, project_contact, pubauthor_contact) currently do not have a type_id at all, which means there is no way to indicate the role that a contact has in what you are linking it to.

For example, you can see in detail in this Tripal issue that I would like to add attribution-specific fields to our experiment pages, such as data collector, data custodian, curator, funder, research organization, etc., but currently I have no way to indicate that a specific contact is fulfilling a given role for a specific experiment. If I try to use the type_id in the contact table, that assumes the contact always has that role, which is unrealistic 🤪 It also assumes that a researcher only ever fulfills one role, which is also unrealistic.

By having a type_id in the link I would be able to say that for this specific experiment, this contact fulfills a specific role.
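As a rough illustration of the idea, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The tables are heavily simplified versions of the Chado ones, the type_id column on project_contact is the proposed addition (it does not exist in Chado today), and the contact, project and role names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm  (cvterm_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project (project_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Simplified linker table. type_id is the PROPOSED column, not current Chado.
CREATE TABLE project_contact (
    project_contact_id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project (project_id),
    contact_id INTEGER NOT NULL REFERENCES contact (contact_id),
    type_id    INTEGER REFERENCES cvterm (cvterm_id),
    UNIQUE (project_id, contact_id, type_id)
);

-- Hypothetical example data.
INSERT INTO cvterm  VALUES (1, 'data collector'), (2, 'curator');
INSERT INTO contact VALUES (1, 'Contact A');
INSERT INTO project VALUES (1, 'Experiment 1'), (2, 'Experiment 2');

-- The same contact holds two roles on one experiment and one on another.
INSERT INTO project_contact (project_id, contact_id, type_id)
VALUES (1, 1, 1), (1, 1, 2), (2, 1, 2);
""")


def roles_for(project_id):
    """Return (contact name, role name) pairs for one project, sorted."""
    rows = conn.execute(
        """
        SELECT c.name, t.name
        FROM project_contact pc
        JOIN contact c ON c.contact_id = pc.contact_id
        JOIN cvterm  t ON t.cvterm_id  = pc.type_id
        WHERE pc.project_id = ?
        ORDER BY c.name, t.name
        """,
        (project_id,),
    )
    return rows.fetchall()


print(roles_for(1))
print(roles_for(2))
```

Because the role lives on the link rather than on the contact, the same contact can be a curator on one experiment and both curator and data collector on another.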

Projects

For projects, indicating the type of linkage is often in the form of describing if the project produced the item, used an existing item, referenced an item, improved upon an item, etc. Here are some specific examples:

project_feature

We often connect features with projects for multiple different reasons, and it would be good to have a clear way to indicate this. For example, it would be really helpful to be able to specify whether a genetic marker or gene is connected to a project because it was (a) generated by that project, (b) reused from a previous project or (c) referenced to provide context. If this is a project looking for the genetic control of flower colour, then was the gene attached because it has been implicated in previous literature, because it showed up within a QTL region in this analysis or because it may be a homolog of a known flower colour gene in another species?

project_analysis

Picture a genome assembly project. It may be linked to one analysis describing how the scaffolds were generated and another describing the annotation process. You may want to use the project_analysis.type_id to indicate genome assembly or annotation. Alternatively, you may want to indicate genome assembly or annotation as the analysis type and use the project_analysis.type_id to indicate that this project funded/did both analyses. Then later another researcher might do some work using this genome version that reveals an issue in the assembly process. We may want to link this new analysis to the genome assembly project to provide a warning to anyone using this version.

Implementation

  • Add a nullable type_id to all linker tables that do not already have one.
  • Because the column is nullable, this is a backwards-compatible change.
  • We would need to add the type_id to the unique constraint (e.g. to allow one contact to fulfill multiple roles).
  • Nulls in the type_id can cause duplicates (see Organism - unique constraint does not work for nullable columns #139). I would mitigate this by providing a default value for these new type_ids (e.g. "connected to" RO:0002170) and by using the new PostgreSQL 15 syntax when it is available.
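
The duplicate problem with nullable columns in a unique constraint, and the default-value mitigation, can be sketched as follows. This uses Python's built-in sqlite3 module as a stand-in, since SQLite, like PostgreSQL by default, treats NULLs as distinct from each other in unique constraints. The table is a simplified project_contact, the helper name is made up, and the default value 42 is a hypothetical cvterm_id standing in for a term such as "connected to".

```python
import sqlite3


def rows_after_duplicate_insert(type_id_ddl):
    """Insert the same (project, contact) link twice, omitting type_id,
    and return how many of the two inserts were accepted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"""
        CREATE TABLE project_contact (
            project_id INTEGER NOT NULL,
            contact_id INTEGER NOT NULL,
            {type_id_ddl},
            UNIQUE (project_id, contact_id, type_id)
        )
        """
    )
    accepted = 0
    for _ in range(2):
        try:
            conn.execute(
                "INSERT INTO project_contact (project_id, contact_id) "
                "VALUES (1, 1)"
            )
            accepted += 1
        except sqlite3.IntegrityError:
            # The unique constraint rejected the duplicate link.
            pass
    conn.close()
    return accepted


# Nullable with no default: type_id is NULL both times, so both rows get in.
nullable_rows = rows_after_duplicate_insert("type_id INTEGER")

# Nullable with a default (42 is a hypothetical cvterm_id): the second
# insert collides with the first and is rejected.
defaulted_rows = rows_after_duplicate_insert("type_id INTEGER DEFAULT 42")

print(nullable_rows, defaulted_rows)
```

Note that a default only helps when the column is omitted from the insert; an explicit NULL would still slip past the constraint. My understanding is that the PostgreSQL 15 syntax referred to above is UNIQUE NULLS NOT DISTINCT, which closes that gap, but that is not exercised here because SQLite does not support it.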

@dsenalik

This sounds like a great improvement, especially since it is backwards compatible. I would prefer consistency across all of the linker tables. A few linker tables, e.g. project_analysis and stock_feature, also have a rank column, which would be needed if you want to be able to specify an order, such as putting the "main" contact first. So this makes me wonder if we also want to include rank in all linker tables. We could of course allow NULL for backward compatibility, or follow the tables I listed, which specify NOT NULL but have a default value of 0.
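
A small sketch of the NOT NULL DEFAULT 0 variant of rank, again using Python's sqlite3 as a stand-in for PostgreSQL. The rank column on project_contact is hypothetical here (it is the suggested addition), and the contact ids are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE project_contact (
        project_id INTEGER NOT NULL,
        contact_id INTEGER NOT NULL,
        rank       INTEGER NOT NULL DEFAULT 0  -- hypothetical new column
    )
    """
)

# Existing-style insert that does not know about rank: it gets the default 0.
conn.execute("INSERT INTO project_contact (project_id, contact_id) VALUES (1, 7)")

# Newer inserts can give an explicit order for the remaining contacts.
conn.executemany(
    "INSERT INTO project_contact (project_id, contact_id, rank) VALUES (1, ?, ?)",
    [(3, 2), (5, 1)],
)

ordered_contacts = [
    row[0]
    for row in conn.execute(
        "SELECT contact_id FROM project_contact "
        "WHERE project_id = 1 ORDER BY rank"
    )
]
print(ordered_contacts)
```

With a default of 0, inserts written before the column existed keep working, and the contact with the lowest rank sorts first.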
