SNPlocs.Hsapiens.dbSNP154.GRCh38 and SNPlocs.Hsapiens.dbSNP154.GRCh37 #3

Open
AhmedArslan opened this issue Nov 3, 2023 · 6 comments

Comments

@AhmedArslan

Hello, I would like to ask whether you could create dbSNP154.GRCh38/dbSNP154.GRCh37 packages, or provide guidance on how to build them for dbSNP154. The GWAS Catalogue uses dbSNP154, so this would be helpful when working on GWAS data.

Many thanks.

@Al-Murphy

Hey, I believe one of the main (if not only?) uses of these packages is for MungeSumstats, hence why I'm answering this. The creation of supplementary dbSNP release packages is something that has been discussed here.

The TL;DR is that creating these packages is very RAM-intensive and time-consuming (on the scale of 80 CPUs and 384 GB RAM running for a week for each package), so it isn't really feasible with the current approach. Really, we need to refactor how this is done, which isn't something @hpages or I have had time to do.

@hpages
Owner

hpages commented Nov 4, 2023

At least dbSNP154 is slightly smaller than dbSNP155 (729,491,867 RS count vs 1,085,850,277) so the requirements won't be so bad.

Let me know if you want to give this a try @AhmedArslan, by following the overview of the process I provided here. I'll be happy to answer questions and provide more detailed guidance if needed.

Best,
H.

@hpages
Owner

hpages commented Nov 4, 2023

@Al-Murphy Actually, now that I look at the numbers, I see that dbSNP156 has an RS count of 1,130,597,309, which is really not that much bigger than dbSNP155 (only 4% bigger), especially compared to the growth between dbSNP154 and dbSNP155, which was 49%. So maybe I'll take a shot at forging SNPlocs.Hsapiens.dbSNP156.GRCh38 and SNPlocs.Hsapiens.dbSNP156.GRCh37 after all, in the next couple of weeks or so.
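
For anyone who wants to double-check the percentages, here is a minimal sketch in Python. The RS counts are the ones quoted in this thread; the `growth_percent` helper is just an illustrative name, not part of any package.

```python
# RS counts per dbSNP build, as quoted in this thread.
RS_COUNTS = {
    "dbSNP154": 729_491_867,
    "dbSNP155": 1_085_850_277,
    "dbSNP156": 1_130_597_309,
}


def growth_percent(old_build: str, new_build: str) -> float:
    """Relative growth in RS count from old_build to new_build, in percent."""
    old, new = RS_COUNTS[old_build], RS_COUNTS[new_build]
    return (new - old) / old * 100


print(f"154 -> 155: {growth_percent('dbSNP154', 'dbSNP155'):.0f}%")  # 154 -> 155: 49%
print(f"155 -> 156: {growth_percent('dbSNP155', 'dbSNP156'):.0f}%")  # 155 -> 156: 4%
```

This only compares total RS counts. It says nothing about how many individual SNP ids are shared between builds.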

@AhmedArslan
Author

> At least dbSNP154 is slightly smaller than dbSNP155 (729,491,867 RS count vs 1,085,850,277) so the requirements won't be so bad.
>
> Let me know if you want to give this a try @AhmedArslan, by following the overview of the process I provided here. I'll be happy to answer questions and provide more detailed guidance if needed.
>
> Best, H.

@hpages the only limitation is that I do not have the resources to perform such an intensive analysis. Although if dbSNP155 is broadly different from dbSNP154 (as you mentioned) in terms of SNP ids, perhaps it's essential to produce dbSNP154?

@hpages
Owner

hpages commented Nov 6, 2023

> Although if dbSNP155 is broadly different from dbSNP154 (as you mentioned) in terms of SNP ids

Well, all I'm saying is that dbSNP155 has a lot more SNP ids than dbSNP154. That doesn't mean that the SNP ids in the latter are not in the former.

IIUC, dbSNP builds are incremental, with every new build mostly adding new SNPs to the previous one and making some corrections to the existing ones. So I would expect dbSNP155 to be a superset of dbSNP154, i.e. that most of the SNP ids found in the latter are still in the former. In other words, I would imagine that using dbSNP155 would still cover your use case.
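
If you want to verify that expectation on your own data, a rough sketch of the check is below. It assumes you already have the rs ids from your GWAS data and from the newer build as plain lists of strings. The `coverage` helper and the ids are made up for illustration and do not come from dbSNP or any package.

```python
def coverage(query_ids, reference_ids):
    """Return the fraction of query_ids found in reference_ids, plus the missing ids."""
    query = set(query_ids)
    reference = set(reference_ids)
    missing = query - reference
    # An empty query is treated as fully covered (an arbitrary convention).
    fraction = 1.0 if not query else 1 - len(missing) / len(query)
    return fraction, sorted(missing)


# Made-up ids, for illustration only (not real dbSNP records).
gwas_ids = ["rs_toy_1", "rs_toy_2", "rs_toy_3", "rs_toy_4"]
newer_build_ids = ["rs_toy_1", "rs_toy_2", "rs_toy_4", "rs_toy_5", "rs_toy_6"]

fraction, missing = coverage(gwas_ids, newer_build_ids)
print(f"{fraction:.0%} covered; missing: {missing}")  # 75% covered; missing: ['rs_toy_3']
```

A fraction close to 100% on real ids would support using the newer build. At the scale of a full dbSNP build, holding every id in an in-memory set may not be practical, so treat this as a sketch of the idea only.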

In the unlikely case that the SNPs in dbSNP154 have changed so much in dbSNP155 that the latter cannot be used to annotate the SNPs in the former, then this would suggest that the data in dbSNP154 is outdated, and that the GWAS Catalogue should probably be updated to be based on dbSNP155 in order to remain relevant.

What's the plan anyway for the GWAS Catalogue? How often do they switch to a more recent dbSNP build? dbSNP 154 is more than 3 years old now, so maybe it's time.

@hpages
Owner

hpages commented Nov 7, 2023

So earlier today I asked the GWAS folks about their plans to map to a more recent dbSNP build and I got the following answer:

Hi Hervé,

Thanks for your interest in the GWAS Catalog. We use dbSNP mappings from Ensembl, which is currently on Build 154. However, we expect that with the next release scheduled for this month, the mapping will be updated to dbSNP 156. See Ensembl’s page here: https://www.ensembl.info/2023/09/13/whats-coming-in-ensembl-111-ensembl-genomes-58/

I understand that build 155 will be skipped.

Best wishes,
Elliot Sollis
GWAS Catalog Curator

> On 6 Nov 2023, at 18:44, Hervé Pagès via gwas-info <[email protected]> wrote:
>
> Hi,
>
> Are there any plans to update the GWAS catalogue to map it to dbSNP Build 155 or 156 instead of dbSNP Build 154?
>
> Is there a timeline for that?
>
> Thanks,
>
> H.
> -- 
> Hervé Pagès
>
> Bioconductor Core Team
> [email protected]

One more reason to focus on dbSNP156!

I will start working on SNPlocs.Hsapiens.dbSNP156.[GRCh38|GRCh37] this week.
