New metadata #1285

Open
drinks opened this issue Apr 22, 2014 · 5 comments

Comments

@drinks
Member

drinks commented Apr 22, 2014

There's been ongoing discussion about flagging forms that can be submitted successfully with a direct POST via cURL or some other means, to get some speed gains. Since it now appears that some forms are inherently broken, maybe now's as good a time as any to address both. I could see utility in adding two fields alongside bioguide, before contact_form:

notes, for listing caveats/comments in unstructured text, and

environments, where the value could be an array of javascript, curl, dom or similar keywords.

The latter would serve to indicate under what circumstances this form is able to work. The expectation would be that curl forms can succeed as a direct request, dom forms can be used through a non-js html interface such as mechanize, and javascript forms would work in a phantomjs-like environment. I'm not at all sold on these names, so suggest away.
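
To make the proposal concrete, here is a minimal sketch of how an integrating client might read an environments list and pick the cheapest option it supports. Only the field names bioguide, notes, environments and contact_form come from the proposal above. The record values, the cost ordering, and the function name pick_environment are assumptions made for illustration, not part of any existing schema or tool.

```python
# Hypothetical record. Only the field names come from the proposal;
# every value below is invented for illustration.
form = {
    "bioguide": "X000000",
    "notes": "Free-text caveats would go here.",
    "environments": ["dom", "javascript"],
    "contact_form": {},  # existing instruction set, omitted here
}

# Assumed cost ordering, cheapest first: a direct POST, then a non-JS HTML
# client such as Mechanize, then a PhantomJS-like headless browser.
PREFERENCE = ("curl", "dom", "javascript")


def pick_environment(form, supported):
    """Return the cheapest environment that both the form and the client support.

    Returns None when there is no overlap (or the field is absent), so the
    caller can decide what to do instead.
    """
    declared = form.get("environments", [])
    for env in PREFERENCE:
        if env in declared and env in supported:
            return env
    return None


print(pick_environment(form, {"curl", "dom"}))  # dom
print(pick_environment(form, {"curl"}))         # None
```

A client built on Mechanize would pass {"curl", "dom"} as its supported set. A None result tells it up front that the form needs something it cannot provide.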

And since there's a chance this thread could turn to multiple instruction sets per form I'd like to preemptively downvote that idea unless it becomes absolutely necessary; my hope is that the current schema can stay as close to simple manipulation of standard form elements as possible, allowing each type of end-user to interpret however works best for them.

So, thoughts?

@akosednar
Member

Wouldn't that just add more unneeded parts to the system (the environments)? Something that works via phantomjs should work via curl or another way as it simulates how a user is using the form.

On this topic, though, I think a better solution for broken forms would be to allow the injection of javascript into the page. I know we are simulating a user clicking an object, but sometimes adding a quick javascript snippet to the page (for example, adding a unique id to each form field based on its label) would help get over the humps while preserving authentic user interaction as much as possible.

@drinks
Member Author

drinks commented Apr 22, 2014

Well taken, but perhaps it's important to reinforce that congress-forms is a client for this data, not the client. There should be no assumptions made about the integrating system other than as dictated by realities of the forms themselves. So, it's not really safe to assume PhantomJS is present at all (Formageddon uses Mechanize) and I guess I don't feel as though indicating possible environments really muddies the waters on integration--unless you're unable to support one of them and it's needed. It's useful to me to know if a form requires js, for example, because it means I can save some CPU time and go straight to fax. @hainish--you've been wanting a flag for POST-able forms, any feelings about this?

@akosednar
Member

True! Makes sense, I guess: you don't want to limit whoever might eventually use this dataset.

@Hainish
Contributor

Hainish commented Apr 25, 2014

I could imagine a case where a javascript instruction is not necessary to fill out a form, but very preferable. For instance, say a form, to deter spammers, uses javascript to measure how long you spend on the page filling out fields. If you're on the page for less than, say, 20 seconds, it marks the message as spam and refuses to deliver it. Obviously, from a spam mitigation perspective, throttling can happen on the server side as well and that would be a more effective measure, but just imagine it's being done in js. In this case, on the implementation side we would either have to wait 20 seconds (thus keeping the phantomjs process running and allocated in RAM for that much longer) or game the system by using javascript to fast-forward 20 seconds and then fill in the form normally. I don't think any form is actually doing anything like this, but it is possible we might come across it, and we shouldn't limit ourselves to just UX interactions simply because we should be acting like end users.

@drinks I think an environments field in the YAML is fine. We could implement something like this in typhoeus, and I think it would be much quicker than PhantomJS. I can see a big gain even from specifying post request fields for a relatively small number of forms. At EFF I imagine a large minority of the form fills will go to the senators from California and NY. So by implementing this for just 4 senators, we could reduce load by a significant amount.

@drinks
Member Author

drinks commented Apr 25, 2014

@Hainish To me, that is a perfect case for a manual notes entry or similar mechanism. It's a pretty slippery slope to release the entire grammar of javascript into what I hope can remain a simple and small set of instructions, reducing the amount of crazy the end user has to account for to just get started.

Once we reach a degree of complexity where a file can't be dropped into an existing arbitrary (compliant) system and expected to work, there's no longer an advantage in trying to embed those instructions in a structured format versus just plain notes--it has to be touched by hand and accounted for in the implementing tool either way. Does that sound crazy? Every pattern we attempt to encapsulate/automate in script is another layer of complexity we introduce and my inclination is to KISS on this end and override quirks in the submitter. For example, I have a special clause in Formageddon for dealing with Gillibrand's radio button topic selector, and I imagine you do too.

One thing that just occurred to me as a possible means for communication of edge cases might be a set of 'exception codes', maybe in a caveats array, indicating any weird behaviors that occur widely enough to be considered a pattern. If the implementer's tool has a code path for dealing with a given type of quirk, they could automatically invoke it if its code is seen. Non-patternable quirks could remain in plain text, perhaps. Thoughts?
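
As a rough illustration of the exception-code idea, here is a minimal sketch of how an implementer's tool might map codes in a caveats array to its own code paths. The caveats field name comes from the comment above. The two codes, the handler functions, and apply_caveats are hypothetical, since the thread does not define any codes.

```python
# Hypothetical handlers: each stands in for a quirk-specific code path in the
# implementer's own submitter.
def handle_radio_topic(form):
    return "used radio-button topic workaround"


def handle_min_time(form):
    return "waited before submitting"


# Hypothetical exception codes mapped to the code paths this tool supports.
HANDLERS = {
    "radio_topic_selector": handle_radio_topic,
    "min_time_on_page": handle_min_time,
}


def apply_caveats(form):
    """Run a handler for each recognised caveat code.

    Returns (applied, unhandled): the handler results, and the codes this
    tool has no code path for, which still need attention by hand.
    """
    applied, unhandled = [], []
    for code in form.get("caveats", []):
        handler = HANDLERS.get(code)
        if handler is None:
            unhandled.append(code)
        else:
            applied.append(handler(form))
    return applied, unhandled


form = {"caveats": ["radio_topic_selector", "some_unknown_quirk"]}
applied, unhandled = apply_caveats(form)
print(applied)    # ['used radio-button topic workaround']
print(unhandled)  # ['some_unknown_quirk']
```

Unknown codes are collected instead of raising an error, so a client that lacks a code path for a quirk can still load the file and flag the form for manual handling.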

3 participants