Add richer queries that combine AND/OR #37

Open
nicholastmosher opened this issue Apr 16, 2020 · 1 comment

Comments

@nicholastmosher

This is an outline of how to go about implementing a version of the functionality described in tair/ifad-frontend#22.

Current functionality

Queries in ifad-backend are currently created by choosing a list of "segments" (pairs of Aspect and Annotation Status) and applying either the AND (intersection) or OR (union) operator over the entire list. Because a single operator joins every segment, queries that mix AND and OR cannot be expressed.

Example desired query

Here is an example of a query we would like to support but currently cannot:

Fetch all genes and annotations which belong to both:

  • Known Experimental in Molecular Function, AND
  • Unknown in Biological Process OR Unannotated in Biological Process

Query Structure

To support richer queries like the one described above, we need a representation that describes a query as a tree of operations, rather than as a list as it is currently. In code, a query in the current system is a single operator and a list of segments, where the single operator is used to join all of the segments, like this:

query = {
  strategy: "union",
  segments: [
    { aspect: "F", annotationStatus: "KNOWN_EXP" },
    { aspect: "P", annotationStatus: "KNOWN_EXP" },
    { aspect: "C", annotationStatus: "KNOWN_EXP" }
  ]
}

Applying the query goes something like this:

segment(F, KNOWN_EXP) UNION segment(P, KNOWN_EXP) UNION segment(C, KNOWN_EXP)

In code, this looks slightly different, but the result is the same.
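As a rough illustration of what "applying" a flat query means, here is a minimal sketch. It is not the actual ifad-backend code: `StructuredData` is simplified to a set of gene IDs, `Dataset`, `FlatQuery`, and `applyFlatQuery` are hypothetical names, and the strategy value `"intersection"` is an assumption (only `"union"` appears above).

```typescript
// Hypothetical stand-ins; the real types live in ifad-backend.
type Segment = { aspect: "F" | "P" | "C"; annotationStatus: string };
type StructuredData = Set<string>; // simplified: a set of gene IDs
type Dataset = Record<string, string[]>; // toy index keyed by "aspect:status"
type FlatQuery = { strategy: "union" | "intersection"; segments: Segment[] };

const querySegment = (dataset: Dataset, segment: Segment): StructuredData =>
  new Set(dataset[`${segment.aspect}:${segment.annotationStatus}`] || []);

const union = (one: StructuredData, two: StructuredData): StructuredData =>
  new Set(Array.from(one).concat(Array.from(two)));

const intersect = (one: StructuredData, two: StructuredData): StructuredData =>
  new Set(Array.from(one).filter((gene) => two.has(gene)));

// Query every segment, then fold the results together with the one operator.
const applyFlatQuery = (dataset: Dataset, query: FlatQuery): StructuredData => {
  const results = query.segments.map((s) => querySegment(dataset, s));
  if (results.length === 0) return new Set<string>();
  return results.reduce(query.strategy === "union" ? union : intersect);
};

// Toy data for illustration only.
const toyDataset: Dataset = {
  "F:KNOWN_EXP": ["g1", "g2"],
  "P:KNOWN_EXP": ["g2", "g3"],
  "C:KNOWN_EXP": ["g2", "g4"],
};
const flatSegments: Segment[] = [
  { aspect: "F", annotationStatus: "KNOWN_EXP" },
  { aspect: "P", annotationStatus: "KNOWN_EXP" },
  { aspect: "C", annotationStatus: "KNOWN_EXP" },
];
const flatUnion = applyFlatQuery(toyDataset, { strategy: "union", segments: flatSegments });
```

Because the operator is fixed for the whole list, there is no way in this shape to say "intersect this segment with the union of those two".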

Proposed new query structure

To support richer queries, we'd like each part of a query to be able to choose its own operator for combining its components. Continuing with the use case above, segment(F, KNOWN_EXP) INTERSECT (segment(P, UNKNOWN) UNION segment(P, UNANNOTATED)), here's how the queries could be restructured to support it:

query = {
  tag: "intersect",
  components: [
    { tag: "segment",
      segment: Segment },
    { tag: "union",
      components: [
        { tag: "segment",
          segment: Segment },
        { tag: "segment",
          segment: Segment }
      ]
    }
  ]
}

Notice that this query is essentially a tree with three types of nodes: a "segment" leaf node which queries a single segment, and "union" and "intersect" nodes which have children in the components field.
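One way to pin this shape down is a TypeScript discriminated union on the tag field. The sketch below is only a suggestion: the `Query` type name, the `Segment` stand-in, the `countSegments` helper, and the exact status strings `"UNKNOWN"` and `"UNANNOTATED"` are assumptions for illustration, not existing ifad-backend definitions.

```typescript
// Hypothetical stand-in for the real Segment type.
type Segment = { aspect: "F" | "P" | "C"; annotationStatus: string };

// Three node kinds: a leaf ("segment") and two operators with children.
type Query =
  | { tag: "segment"; segment: Segment }
  | { tag: "union"; components: Query[] }
  | { tag: "intersect"; components: Query[] };

// The example use case written out with concrete segments.
const exampleQuery: Query = {
  tag: "intersect",
  components: [
    { tag: "segment", segment: { aspect: "F", annotationStatus: "KNOWN_EXP" } },
    {
      tag: "union",
      components: [
        { tag: "segment", segment: { aspect: "P", annotationStatus: "UNKNOWN" } },
        { tag: "segment", segment: { aspect: "P", annotationStatus: "UNANNOTATED" } },
      ],
    },
  ],
};

// Small helper showing that the tag lets the compiler narrow each node kind.
const countSegments = (query: Query): number =>
  query.tag === "segment"
    ? 1
    : query.components.reduce((total, child) => total + countSegments(child), 0);
```

With a tagged union, checking `query.tag` is enough for the compiler to know whether `segment` or `components` is available on a node.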

Implementing the new query structure

Implementing this new query should not be difficult because the query functions were defined in a modular way from the start. Let's look at the key types and functions (from queries.ts):

// Notice that the shape of query outputs is the same shape as the input
type QueryResult = StructuredData;

const querySegment = (dataset: StructuredData, segment: Segment): QueryResult => { ... }
const union = (one: StructuredData, two: StructuredData): QueryResult => { ... }
const intersect = (one: StructuredData, two: StructuredData): QueryResult => { ... }

The union and intersect functions make it easy to take any two subsets of data and combine them. This can be used to create a tree traversal over the query tree, combining the children of each node according to the operator specified by that node.
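The traversal described above could be sketched roughly as follows. This is a simplified illustration rather than the real implementation: `StructuredData` is reduced to a set of gene IDs, `querySegment`, `union`, and `intersect` are toy versions with the same signatures as in queries.ts, and `Dataset`, `Query`, and `evaluate` are hypothetical names. Returning an empty result for an operator node with no children is also an assumption.

```typescript
// Hypothetical stand-ins; the real definitions live in queries.ts.
type Segment = { aspect: "F" | "P" | "C"; annotationStatus: string };
type StructuredData = Set<string>; // simplified: a set of gene IDs
type QueryResult = StructuredData;
type Dataset = Record<string, string[]>; // toy index keyed by "aspect:status"

type Query =
  | { tag: "segment"; segment: Segment }
  | { tag: "union"; components: Query[] }
  | { tag: "intersect"; components: Query[] };

const querySegment = (dataset: Dataset, segment: Segment): QueryResult =>
  new Set(dataset[`${segment.aspect}:${segment.annotationStatus}`] || []);

const union = (one: StructuredData, two: StructuredData): QueryResult =>
  new Set(Array.from(one).concat(Array.from(two)));

const intersect = (one: StructuredData, two: StructuredData): QueryResult =>
  new Set(Array.from(one).filter((gene) => two.has(gene)));

// Evaluate children first, then combine them with the operator of this node.
const evaluate = (dataset: Dataset, query: Query): QueryResult => {
  if (query.tag === "segment") return querySegment(dataset, query.segment);
  const children = query.components.map((child) => evaluate(dataset, child));
  if (children.length === 0) return new Set<string>(); // assumption: empty node, empty result
  return children.reduce(query.tag === "union" ? union : intersect);
};

// Toy data for illustration only.
const toyDataset: Dataset = {
  "F:KNOWN_EXP": ["g1", "g2", "g3"],
  "P:UNKNOWN": ["g2"],
  "P:UNANNOTATED": ["g3", "g4"],
};

// segment(F, KNOWN_EXP) INTERSECT (segment(P, UNKNOWN) UNION segment(P, UNANNOTATED))
const richQuery: Query = {
  tag: "intersect",
  components: [
    { tag: "segment", segment: { aspect: "F", annotationStatus: "KNOWN_EXP" } },
    {
      tag: "union",
      components: [
        { tag: "segment", segment: { aspect: "P", annotationStatus: "UNKNOWN" } },
        { tag: "segment", segment: { aspect: "P", annotationStatus: "UNANNOTATED" } },
      ],
    },
  ],
};
const richResult = evaluate(toyDataset, richQuery);
```

Since each node only ever combines results of the same `StructuredData` shape, the recursion needs no special cases beyond the leaf.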

Aside: Gene Product Type filters

One last thing I'll mention is that the queries described above ignore gene product type filters. There are two ways these could be handled:

  • Apply gene product type filters to the total query, or
  • Allow gene product type filters to be chosen for each level of the query.

The second option is closer to the goal of supporting richer queries, and should not take considerably more work to implement. It would just require adding a filter field to the nodes of the query tree and applying it while traversing the tree.
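To make the second option concrete, here is a hedged sketch of a per-node filter. Everything in it is hypothetical: the `filter` field is modelled as an optional list of allowed gene product types, `Gene`, `applyFilter`, and `evaluate` are made-up names, the product type strings are invented, and `StructuredData` is simplified to a map from gene ID to gene.

```typescript
// Hypothetical stand-ins; none of these are the real ifad-backend types.
type Gene = { id: string; productType: string };
type StructuredData = Map<string, Gene>;
type Segment = { aspect: "F" | "P" | "C"; annotationStatus: string };
type Dataset = Record<string, Gene[]>; // toy index keyed by "aspect:status"

// Every node may carry a filter; a missing filter means "keep everything".
type Filtered = { filter?: string[] };
type Query =
  | ({ tag: "segment"; segment: Segment } & Filtered)
  | ({ tag: "union" | "intersect"; components: Query[] } & Filtered);

const querySegment = (dataset: Dataset, segment: Segment): StructuredData =>
  new Map(
    (dataset[`${segment.aspect}:${segment.annotationStatus}`] || []).map(
      (gene) => [gene.id, gene] as [string, Gene]
    )
  );

const union = (one: StructuredData, two: StructuredData): StructuredData =>
  new Map(Array.from(one).concat(Array.from(two)));

const intersect = (one: StructuredData, two: StructuredData): StructuredData =>
  new Map(Array.from(one).filter(([id]) => two.has(id)));

const applyFilter = (data: StructuredData, filter?: string[]): StructuredData =>
  filter === undefined
    ? data
    : new Map(Array.from(data).filter(([, gene]) => filter.indexOf(gene.productType) !== -1));

// Same traversal as before, except each node filters its own result.
const evaluate = (dataset: Dataset, query: Query): StructuredData => {
  let result: StructuredData;
  if (query.tag === "segment") {
    result = querySegment(dataset, query.segment);
  } else {
    const children = query.components.map((child) => evaluate(dataset, child));
    result =
      children.length === 0
        ? new Map<string, Gene>()
        : children.reduce(query.tag === "union" ? union : intersect);
  }
  return applyFilter(result, query.filter);
};

// Toy data for illustration only.
const toyDataset: Dataset = {
  "F:KNOWN_EXP": [
    { id: "g1", productType: "protein_coding" },
    { id: "g2", productType: "rna" },
  ],
  "P:UNKNOWN": [
    { id: "g2", productType: "rna" },
    { id: "g3", productType: "protein_coding" },
  ],
};

// The filter on the first leaf applies only to that leaf.
const perNodeQuery: Query = {
  tag: "union",
  components: [
    { tag: "segment", segment: { aspect: "F", annotationStatus: "KNOWN_EXP" }, filter: ["protein_coding"] },
    { tag: "segment", segment: { aspect: "P", annotationStatus: "UNKNOWN" } },
  ],
};
const perNodeResult = evaluate(toyDataset, perNodeQuery);
```

The first option (one filter for the total query) falls out as a special case: set `filter` on the root node only.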

@tberardini

Thanks for all that background, @nicholastmosher! The info will make it that much easier to pick up the work and keep going.
