Extend quality control and standardisation to QTL analysis #77

Open
Al-Murphy opened this issue Dec 8, 2021 · 2 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@Al-Murphy
Collaborator

Extend quality control and standardisation to QTL analysis. The following checks are specific to QTL studies and need to be added:

  • Allow for duplicated SNPs
  • Check effect region/gene region
  • Convert the genome build of the effect region/gene region (I think I have a script for this in R, and @bschilder has a function for it)
  • Standardise gene names for eQTLs (orthogene::map_genes might be helpful: it accepts any format (HGNC, Ensembl, Entrez, etc.) and converts them all to one format. It can also map transcript IDs onto standardised transcript IDs or HGNC symbols, and likewise for protein IDs)
  • Identify any QTL-specific column headers and decide what to standardise their names to (browse the eQTL Catalogue)

Al-Murphy added the help wanted and enhancement labels on Mar 16, 2022

@Al-Murphy
Collaborator Author

Al-Murphy commented Aug 12, 2022

v1.5.11 can now handle QTL sumstats; however, it only checks the SNPs, not the effect region (the gene, for eQTLs). Remember to set check_dups = FALSE when running MungeSumstats (MSS) for QTLs.
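
To illustrate why duplicate checking has to be switched off for QTLs, here is a minimal sketch. The toy table, its column names (`SNP`, `GENE`, `BETA`), the file path `eqtl_sumstats.tsv.gz`, and the `GRCh38` build are all assumptions for illustration; only `check_dups = FALSE` comes from the note above. The MungeSumstats call is guarded so that the block still runs when the package or the file is unavailable.

```r
# Toy eQTL table (hypothetical): rs1 is tested against two genes, so it appears twice.
qtl <- data.frame(
  SNP  = c("rs1", "rs1", "rs2"),
  GENE = c("GENE_A", "GENE_B", "GENE_A"),
  BETA = c(0.12, -0.05, 0.30)
)

# Deduplicating on SNP alone (a GWAS-style check) drops a valid association.
dedup_by_snp <- qtl[!duplicated(qtl$SNP), ]

# Deduplicating on the SNP-gene pair keeps every association.
dedup_by_pair <- qtl[!duplicated(qtl[, c("SNP", "GENE")]), ]

# Hypothetical input path: replace with your own QTL summary statistics file.
qtl_path <- "eqtl_sumstats.tsv.gz"

if (requireNamespace("MungeSumstats", quietly = TRUE) && file.exists(qtl_path)) {
  formatted <- MungeSumstats::format_sumstats(
    path = qtl_path,
    ref_genome = "GRCh38",  # assumed build; set to match your data
    check_dups = FALSE      # keep SNPs that recur across genes
  )
}
```

In this toy table, deduplicating by SNP keeps 2 of the 3 rows, while deduplicating by SNP-gene pair keeps all 3.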

@bschilder
Collaborator

That's awesome news!
Standardizing gene names for eQTLs should be doable using orthogene::map_genes(). It can map a variety of gene inputs (gene symbols, Ensembl IDs, Entrez IDs, transcript IDs, UniProt IDs) onto standardized IDs (e.g. gene symbols). If you want to avoid installing all of orthogene's dependencies, you could instead use the main function it relies on, gprofiler2::gconvert():
https://github.com/neurogenomics/orthogene/blob/4977da1e09074f5b063f1b0413aa00e08b65929b/R/map_genes.R#L61

For regions that are not gene-, transcript-, or protein-based (e.g. for methylation QTLs), I imagine some other approach would be necessary.
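
The gene-name standardisation suggested above could look roughly like the sketch below. The input identifiers and the choice of `species = "hsapiens"` are assumptions for illustration. The call needs the orthogene package and network access, so it is guarded and no output is shown.

```r
# Mixed identifier types for the same gene (illustrative input):
# an Ensembl gene ID and an HGNC symbol.
gene_ids <- c("ENSG00000130203", "APOE")

if (requireNamespace("orthogene", quietly = TRUE)) {
  # Queries g:Profiler under the hood, so this requires an internet connection.
  mapped <- orthogene::map_genes(
    genes = gene_ids,
    species = "hsapiens",
    drop_na = FALSE  # keep rows for inputs that could not be mapped
  )
  print(head(mapped))
}
```

Keeping unmapped inputs (`drop_na = FALSE`) makes it easier to see which effect-region identifiers would need a different approach.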
