Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Output JS-compatible line numbers #5264

Closed

Conversation

helixbass
Copy link
Collaborator

@helixbass helixbass commented Dec 17, 2019

@GeoffreyBooth this is an interesting one I came across while running the ESLint plugin against the Coffeescript repo source code:

It was crashing running ESLint against test/regex.coffee because of the literal U+2028/U+2029 (Unicode linebreak) characters in that file

Looking into it, it turns out our location data/line numbers aren't exactly JS-compatible - per eg here, JS considers newline (\n), carriage return (\r), carriage return + newline (\r\n), U+2028 and U+2029 to be line terminators

Whereas we strip carriage returns and then just consider newlines to be line terminators

So what was happening is that in a file with "unusual" line terminators according to JS (eg the U+2028 and U+2029 in test/regex.coffee), ESLint was doing the JS version of counting linebreaks and then getting confused when a certain line didn't have the number of columns reported by our location data

So thus far this PR swaps in the JS-compatible regex for detecting linebreaks in the obvious places. Which did get ESLint working against test/regex.coffee

But I'm probably inclined to not try and get this merged before initial release, as (a) it should only cause problems in files where someone's got "fake linebreak" characters floating around and (b) I think it's worth thinking through more fully

For one thing, I think that if someone had a \r in their source code that wasn't "attached" to a following \n, it'd need to be accounted for - since we strip out \r's, I think we'd need to go a step further than the locationDataCompensations (that we added into the lexer to get accurate location data despite things like \r's having been stripped out) and do something like additionally "remember" the source offsets of any "unattached" \r's and then take those into account when generating location data

But ultimately it seems like it'd be a good idea to be compatible with JS's definition of linebreaks in our location data?

Based on ast-splat-param-location-data, here is just the diff against that branch

@@ -1314,6 +1314,8 @@ OPERATOR = /// ^ (

WHITESPACE = /^[^\n\S]+/

exports.LINEBREAK = LINEBREAK = /\r\n|[\r\n\u2028\u2029]/ug
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex came from here

@GeoffreyBooth
Copy link
Collaborator

But ultimately it seems like it'd be a good idea to be compatible with JS's definition of linebreaks in our location data?

Yes, I feel like this might fix a lot of the bugs we get reported about wrong line numbers in source maps and stack traces.

@helixbass helixbass force-pushed the ast-js-compatible-line-numbers branch 4 times, most recently from e0ce666 to 771ea6e Compare December 22, 2019 20:35
@helixbass
Copy link
Collaborator Author

Reopened as #5277

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants