Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GFF3 parent / id remapping issue. #109

Open
hexylena opened this issue May 26, 2016 · 2 comments
Open

GFF3 parent / id remapping issue. #109

hexylena opened this issue May 26, 2016 · 2 comments

Comments

@hexylena
Copy link

hexylena commented May 26, 2016

Just filing it since even though this library doesn't seem to see a lot of activity lately.

Script:

#!/usr/bin/env python
import sys
from BCBio import GFF
from Bio import SeqIO

seq_dict = SeqIO.to_dict(SeqIO.parse(sys.argv[1], "fasta"))
# Parse GFF3 records
for record in GFF.parse(sys.argv[2], base_dict=seq_dict):
    print record.id
    for feature in record.features:
        print feature.id, [x.id for x in feature.sub_features]

Input files:

>ctgA
ACTG
>ctgB
ACTG
##gff-version 3
##sequence-region ctgA 1 17039
ctgA    glimmer gene    39  209 .   +   .   ID=orf00001
ctgA    Glimmer3    CDS 39  209 .   +   0   ID=cds_orf00001;Parent=orf00001
ctgA    glimmer gene    215 637 .   +   .   ID=orf00002
ctgA    Glimmer3    CDS 215 637 .   +   0   ID=cds_orf00002;Parent=orf00002
ctgA    glimmer gene    637 3588    .   +   .   ID=orf00003
ctgA    Glimmer3    CDS 637 3588    .   +   0   ID=cds_orf00003;Parent=orf00003
ctgA    glimmer gene    3873    4085    .   -   .   ID=orf00004
ctgA    Glimmer3    CDS 3873    4085    .   -   0   ID=cds_orf00004;Parent=orf00004
##gff-version 3
##sequence-region ctgB 1 17040
ctgB    glimmer gene    21  209 .   +   .   ID=orf00001
ctgB    Glimmer3    CDS 21  209 .   +   0   ID=cds_orf00001;Parent=orf00001
ctgB    glimmer gene    215 637 .   +   .   ID=orf00002
ctgB    Glimmer3    CDS 215 637 .   +   0   ID=cds_orf00002;Parent=orf00002
ctgB    glimmer gene    637 3588    .   +   .   ID=orf00003
ctgB    Glimmer3    CDS 637 3588    .   +   0   ID=cds_orf00003;Parent=orf00003
ctgB    glimmer gene    3873    4085    .   -   .   ID=orf00004
ctgB    Glimmer3    CDS 3873    4085    .   -   0   ID=cds_orf00004;Parent=orf00004

Actual Output:

ctgA
orf00001 ['cds_orf00001']
orf00002 ['cds_orf00002', 'cds_orf00002']
orf00003 ['cds_orf00003', 'cds_orf00003']
orf00004 ['cds_orf00004', 'cds_orf00004']
ctgB
orf00001_2 ['cds_orf00001']
orf00002 []
orf00003 []
orf00004 []

Expected Output:

ctgA
orf00001 ['cds_orf00001']
orf00002 ['cds_orf00002']
orf00003 ['cds_orf00003']
orf00004 ['cds_orf00004']
ctgB
orf00001 ['cds_orf00001']
orf00002 ['cds_orf00002']
orf00003 ['cds_orf00003']
orf00004 ['cds_orf00004']
@chapmanb
Copy link
Owner

Eric;
Sorry about the issue. Have you tried looking at parsing in gffutils (https://github.com/daler/gffutils)? It's a better representation of the approach used here and is well maintained by Ryan. I've been generally pointing folks there to work on a single approach for parsing GFF in Python. Hope that works for you.

@hexylena
Copy link
Author

@chapmanb no problem. Yeah, I saw that comment from you on another issue and started to look into it, but 100% of our tooling is based on your lib, so it'll be a bit of a migration process. And yes, that'll work fine for us as a solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants