Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escaping HTML characters to prevent injections #48

Open
5 tasks
infogulch opened this issue Dec 7, 2022 · 3 comments
Open
5 tasks

Escaping HTML characters to prevent injections #48

infogulch opened this issue Dec 7, 2022 · 3 comments

Comments

@infogulch
Copy link

infogulch commented Dec 7, 2022

I think it's reasonable to expect strings embedded in html to be html-escaped.

I modified the Node::Text match arm to call html_escape::encode_text_to_string on string literals in text nodes in this commit as a demonstration: 9626096 However, this approach is deficient. First, this implementation will become less useful once the unquoted-text branch is merged. But more importantly, this doesn't address the bigger issue which is serializing user-generated data into a node which will arrive via some embedded rust code. Further, it would be nice to be able to nest calls to html! but not double-escape the html tags generated by the inner html!.

Here are some cases I think would be reasonable to support:

// Case 1: Embedded string literals should be escaped (see demo commit):
let case1 = html! { <p>"Some string in a text node that contains lt (<) and gt (>) characters"</p> }`;

// Case 2: Dynamic strings embedded into nodes should be escaped:
let user_comment = r#"I'm an innocent comment 👼 <script src="//example.com/infogulchs_steal_your_money.js"></script>😈"#;
let case2 = html! { <p>{ user_comment }</p> };

// Case 3: HTML fragments previously generated by calling html! should *not* be escaped again:
let user_name = "infogulch";
let case3 = html! { <div>"By: "<i>{ user_name }</i><br>{ case2 }</div> };

// Case 4: There should be some way to bypass automatic escaping:
let case4 = html! { <div>{ this_is_safe_html_pinky_promise(my_vetted_html_template.render(user_comment)) }</div>

// Case 5: Escaping should work on attribute values as well:
let oops_id = "myid\"oops";
let case5 = html! { <div id={oops_id}>foo</div> };

To support Cases 3 and 4 I suspect the best approach will be to wrap the output of html! with a newtype that indicates that it will emit correctly escaped html content, where if wrapped again it will not escape twice.

Steps

This is probably too big an effort to resolve all at once, here are some intermediate steps that I think would be useful in the meantime:

  • Note in the docs and readme that syn-rsx and html-to-string-macro doesn't do any escaping and that users should be careful to correctly escape inputs to avoid injections.
    • Docs should direct users to an html escaping crate like html-escape.
  • Survey rust html macro crates for solutions to double escaping wrt cases 3,4
  • Collect test cases for various escaping scenarios
  • Evaluate whether the newtype solution to double-escaping could work
  • Implement escaping (aka "draw the rest of the owl" 😆). Probably release a new major version (well, a new 0.x series) because this changes behavior.

Thoughts?

🏹🎯, 👀☝️🌠🌠

@infogulch
Copy link
Author

I'll add notes here.

Searching I found display-as which seems to solve the double-escaping problem similarly to what I was thinking, though they're moving in a different direction template-wise...

@stoically
Copy link
Owner

stoically commented Dec 8, 2022

Thanks for pointing this out and the detailed issue. I think it'd be the right call to escape html by default wherever possible and give some way of explicitly opting out. This would also follow JSX' lead: https://reactjs.org/docs/introducing-jsx.html#jsx-prevents-injection-attacks.

Just not sure if it's feasible to introduce escaping on the parser level, while also allowing to opt put per node.

@gbj
Copy link
Contributor

gbj commented Apr 5, 2023

To be honest I think this is out of scope for syn-rsx itself and should be implemented by the library user, or by the framework that's using syn-rsx internally. The JSX docs you link to are telling: "By default, React DOM any values embedded in JSX before rendering them." It's the library, not the JSX transform, that escapes them.

I can't imagine how a proc-macro could solve problems 3, 4, or 5. If you wanted to build an additional runtime layer on top of it to do that, it makes sense, as long as it's loosely coupled enough to be optional for syn-rsx users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants