over url-encoding in attribute fields #86

Open
bobbyo opened this issue Apr 18, 2014 · 5 comments

Comments

@bobbyo

bobbyo commented Apr 18, 2014

When I add a Name=value attribute to my data and have GFFOutput.py write it, the value is fully URL-encoded, which differs from what the gff3 specification asks for.
In my case, it means attributes like:
NAME=jgi.p|Schco3|1037802
end up urlencoded like this:
NAME=jgi.p%7CSchco3%7C1037802
which causes problems with our downstream data use. I believe these should not be escaped according to the gff3 standard. The gff3 standard v 1.21 says:

URL escaping rules are used for tags or values containing the following characters: ",=;". Spaces are allowed in this field, but tabs must be replaced with the %09 URL escape.  -- http://www.sequenceontology.org/gff3.shtml 

So the rule seems to be:

  1. An attribute key or value should be fully URL-escaped when it contains any of the characters ",=;".
  2. TAB characters in an attribute key or value should always be escaped as %09, but the presence of a TAB does not by itself trigger full URL encoding of that key or value.

The attribute key and value in NAME=jgi.p|Schco3|1037802 do not contain ",=;". Hence this should not be escaped.
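As an illustration of the two rules above, here is a minimal Python sketch. It is not the patch discussed in this thread and not the current GFFOutput.py code; the names `TRIGGER_CHARS`, `escape_attribute_part`, and `format_attribute` are hypothetical, and the behaviour follows this issue's reading of the spec, which may not be the only valid one.

```python
from urllib.parse import quote

# Characters that, under this issue's reading of the spec, trigger full escaping.
TRIGGER_CHARS = ",=;"


def escape_attribute_part(text):
    """Escape one attribute key or value (hypothetical helper)."""
    if any(ch in text for ch in TRIGGER_CHARS):
        # Rule 1: fully URL-escape the whole string. safe="" means only
        # letters, digits and "_.-~" pass through unescaped.
        return quote(text, safe="")
    # Rule 2: otherwise only TAB characters are replaced.
    return text.replace("\t", "%09")


def format_attribute(key, value):
    """Join an escaped key and value into a column 9 entry (hypothetical helper)."""
    return escape_attribute_part(key) + "=" + escape_attribute_part(value)


print(format_attribute("NAME", "jgi.p|Schco3|1037802"))
print(format_attribute("Note", "a=b;c"))
```

Under these assumptions the first call leaves the pipe characters alone, while the second escapes the `=` and `;` inside the value. Note that full escaping with `quote` also turns spaces into `%20`, even though the quoted spec text says spaces are allowed.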

Do you agree? Would you like a patch to GFFOutput.py that provides a routine following those rules for escaping values?

@chapmanb
Owner

Bobby;
That would be great. I wish the spec had a more consistent and standard quoting approach instead of something custom, hence my use of urllib.quote/unquote. If it's causing issues with downstream tools, it would make sense to clean it up and I'd be happy to accept a patch. Sorry about the issues and thanks for looking at this.
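For context, a short sketch of why generic URL quoting produces the output reported above. It uses `urllib.parse.quote`, the Python 3 location of what Python 2 exposed as `urllib.quote`. How GFFOutput.py actually calls it (including any `safe` argument) is not shown in this thread, so the calls below are assumptions for illustration only.

```python
from urllib.parse import quote

value = "jgi.p|Schco3|1037802"

# Default quoting escapes every character outside letters, digits, "_.-~" and "/".
default_quoted = quote(value)

# Listing "|" as safe leaves this particular value untouched.
pipe_safe = quote(value, safe="|")

print(default_quoted)
print(pipe_safe)
```

Widening `safe` is one possible way to narrow the escaping; whether that matches the spec in every case is a separate question from the rule proposed in this issue.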

@bobbyo
Author

bobbyo commented May 9, 2014

Here is a patch; feel free to tighten/modify as you wish.

The gff3 standard makes this encoding a bit hard to use: how does one know
when URL-encoding-like procedures have been applied? For example, I'm
not clear on how you know for certain to use URL-decoding when reading
the gff3 data back in. But this patch does apply the *encoding* that the
gff3 standard seems to be requesting. I confess that in the case that
caused me to write the patch, the standard suggests the data should not be
encoded, which is the use case I tested.
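
The decoding concern can be made concrete with a small sketch. The writer below, `escape_if_triggered`, is a hypothetical stand-in that escapes only when ",=;" is present (it is not the attached patch), and the reader is assumed to always call `unquote`. The sample value containing a literal `%25` is invented for illustration.

```python
from urllib.parse import quote, unquote


def escape_if_triggered(text):
    """Hypothetical writer: escape fully only when ",=;" is present."""
    if any(ch in text for ch in ",=;"):
        return quote(text, safe="")
    return text.replace("\t", "%09")


plain = "jgi.p|Schco3|1037802"
# Literal text that happens to look like a URL escape sequence.
literal_percent = "identity 100%25"

# A reader that always URL-decodes what the writer produced.
roundtrip_plain = unquote(escape_if_triggered(plain))
roundtrip_percent = unquote(escape_if_triggered(literal_percent))

print(roundtrip_plain == plain)
print(roundtrip_percent == literal_percent)
```

In this sketch the ordinary value survives the round trip, but the value with a literal percent sequence does not, because nothing in the written text tells the reader whether escaping was applied.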

Best,
Bobby O

Robert P Otillar, PhD
Bioinformatics Analyst
Joint Genome Institute
Genomic Annotation Division
2800 Mitchell Drive
Walnut Creek, CA 94598
Tel: 925-296-5786
Fax: 925-296-5752

@chapmanb
Owner

Bobby;
Thanks much for looking at this. I didn't see a patch in your reply. Could you send a pull request, or post the patch as a Gist? Thanks again.

@bobbyo
Author

bobbyo commented May 15, 2014

Sorry; oddly I did see it attached to my earlier email; here it is again. I
definitely see it attached to this email, as attachment:

GFFOutput.col9_encoding_fix.patch (2k)

Let me know if it does not come through.

-B

@chapmanb
Owner

Bobby;
These e-mails come in as GitHub issue comments, and it looks like they remove attachments so I'm not getting it. You can see them on the issue page:

#86

A Gist (https://gist.github.com/) with the patch is probably the best approach. Thanks again.
