Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some node names that include numbers are not parsed #43

Open
gbj opened this issue Nov 12, 2022 · 3 comments
Open

Some node names that include numbers are not parsed #43

gbj opened this issue Nov 12, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@gbj
Copy link
Contributor

gbj commented Nov 12, 2022

Per the HTML spec, an attribute name like data-id-25 is valid (albeit maybe unusual)

Attribute names must consist of one or more characters other than the space characters, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E GREATER-THAN SIGN (>), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control characters, and any characters that are not defined by Unicode. link

This can't be parsed by syn-rsx at present because each segment of the node name needs to be an Ident, and 25 isn't a valid Rust identifier.

You're probably more familiar with syn than I am but I wonder if changing the Punctuated variant of NodeName to something like

Punctuated(Punctuated<NodeNameFragment, Punct>),

where

pub enum NodeNameFragment {
  Ident(Ident),
  Number(u32)
}

would be a possibility. This only expands the possibilities to include numbers, and not other possibly-valid attribute names that aren't valid Rust identifiers, but would be a start.

Here's an issue in my repo with an example of where someone would use it—using a class: syntax to toggle a Tailwind-like CSS class name that included a number after a dash.

@stoically
Copy link
Owner

stoically commented Nov 24, 2022

Turning the fragment into an enum sounds generally like a workaround that could work. However, while giving an implementation a quick look, it turned out that this might get tricky in terms of adjusting the node_name_punctuated_ident_with_alternate method.

An alternative would be to switch an approach that also relies on rust-lang/rust#54725, just like the quoted text feature. Basically: Parse all tree tokens until we hit any of the not allowed chars. But given that's nightly only, it's probably not something that would work as solution right now for your consumers? Because further expanding the node_name_punctuated_ident_with_alternate method, which already is kinda hack-ish doesn't sound too appealing, tho I'd be totally fine with doing so if we have to.

@stoically stoically added enhancement New feature or request help wanted Extra attention is needed labels Nov 24, 2022
@gbj
Copy link
Contributor Author

gbj commented Nov 24, 2022

I'd be comfortable relying on nightly if people want to be able to use this particular kind of node name. (As long as the library also works without nightly and just works as it does currently!)

We already encourage nightly for some features it enables, with an opt-out feature for stable, so it would work for me at least.

@stoically stoically removed the help wanted Extra attention is needed label Dec 8, 2022
@stoically
Copy link
Owner

Didn't have a chance to continue working on this and probably won't for some time still. There's an unquoted-text branch, in case you are interested to give it look. I somehow didn't manage to get the correct line/column numbers for spans.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants