Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid free variable conflicts within descendant scopes instead of global #5398

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

edemaine
Copy link
Contributor

@edemaine edemaine commented Jan 25, 2022

Instead of a global referencedVars option that is computed at the lexer level (via all IDENTIFIER tokens), recurse through the AST to mark each scope with all variables used within it and descendant scopes.

Likely also more efficient via Sets instead of Arrays.

Also fix some missing children in For nodes, which otherwise caused this rewrite to fail.

Fixes #4865 by causing parameter renaming in fewer (more predictable) scenarios.

I've separated into separate commits for:

  1. Actual changes.
  2. Changes to the compiler that result from these changes (running cake build a second time), so you can see the reduced renaming.
  3. Rebuilding the browser version.

Instead of a global `referencedVars` option that is computed at the
lexer level (via all IDENTIFIER tokens), recurse through the AST to mark
each scope with all variables used within it and descendant scopes.

Likely also more efficient via `Set`s instead of `Array`s.

Also fix some missing `children` in `For` nodes, which otherwise caused
this rewrite to fail.

Fixes jashkenas#4865 by causing parameter renaming in fewer (more predictable)
scenarios.
@GeoffreyBooth
Copy link
Collaborator

This seems fine. Any idea why the prior implementation was written the way that it was? Or put another way, are there any disadvantages of this refactor?

Also wouldn’t referencedVars tell us a key thing we would need to know to implement const, at least within a function scope?

@edemaine
Copy link
Contributor Author

edemaine commented Jan 25, 2022

Any idea why the prior implementation was written the way that it was? Or put another way, are there any disadvantages of this refactor?

@lydell wrote the original, and in #4865 (comment) says:

Why? Because it’s simple. Tracking scope is more complicated for no gain. And even if we did that it wouldn’t help if the name is in the same scope.

So the main reason was simplicity. I imagine the original initialization is also somewhat faster (array scan vs tree scan), though lookups should be faster here; and the original approach was more robust to mistakes in children lists (but those would be bugs that should be fixed anyway).

I agree that the new approach doesn't help renaming if the name is in the same scope (or descendant scope), but in that scenario the renaming is not surprising, whereas the current nonlocal behavior is very surprising (and especially annoying with jsdoc).

@lydell, do you agree that this is a good approach?

Also wouldn’t referencedVars tell us a key thing we would need to know to implement const, at least within a function scope?

Yes, I had the same thought. I imagine findReferencedVars could be extended to do the static analysis first pass that I mentioned in #5377 (comment), if we also extend to block scopes.

@GeoffreyBooth
Copy link
Collaborator

I imagine the original initialization is also somewhat faster

I ran cake bench on this branch and on main and sometimes this branch was faster, sometimes main was. So the performance impact seems to be negligible.

I want to leave this open for a few days in case @lydell or anyone else wants to review or discuss it, but it looks good to me.

Yes, I had the same thought. I imagine findReferencedVars could be extended to do the static analysis first pass that I mentioned in #5377 (comment), if we also extend to block scopes.

I guess the issue with potentially outputting a declaration/first assignment “in place,” as opposed to moving it up to where the var line is now, is that the current code might be relying on the hoisting; like if something then a = 1, where a needs to be in the function scope and not in that if block scope. So we probably can’t avoid doing the work of tracking block scopes in addition to function scopes, if we want to ever output const. But I feel like doing the work to track block scopes will probably yield some benefits.

@lydell
Copy link
Collaborator

lydell commented Jan 26, 2022

I won’t be reviewing, but thanks for asking!

@STRd6
Copy link
Contributor

STRd6 commented Apr 17, 2022

I'm excited for this! Is there anything I can do to help?

@GeoffreyBooth
Copy link
Collaborator

I’m excited for this! Is there anything I can do to help?

This one was pretty much ready to merge, but I was hoping it could get another reviewer. If you don’t mind reviewing it, and updating it per latest main, I think that’s all it needs.

While you review, my main concern with this PR and with #5395, though more so with #5395, was that I found the code related to scope-tracking quite dense and hard to follow. It could benefit from a lot more commentary, from you or @edemaine if you have time to better explain what’s going on. See the review of #5395 to get a sense of the parts I found confusing.

@edemaine
Copy link
Contributor Author

Sorry for the long hiatus. In some ways, coming back to this code after a long delay is good for seeing how confusing it is. 🙂

It could benefit from a lot more commentary, from you or @edemaine if you have time to better explain what’s going on.

I added a bunch of comments to document how this specific PR works. There's still a lot to explain how the Scope class works, but that's really a task for #5395, which will be my next task (not sure when exactly).

I also merged with main, so this should be ready to review/merge.

@@ -106,6 +100,7 @@ exports.compile = compile = withPrettyErrors (code, options = {}) ->
ast.tokens = tokens
return ast

nodes.findReferencedVars()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this doing? Populating the referencedVars property of the root and all children? Can we add a comment explaining this?

Comment on lines +364 to +372
# `findReferencedVars` recurses through the source parse tree to mark every
# "scope" parse node (`Root` and `Code`s) with all referenced
# (accessed or assigned) variables in that source scope, so that every scope
# knows what to avoid when generating new variable names.
# Specifically, it takes in an array of `scopeNodes`,
# each of which already has a `@referencedVars` attribute that is a `Set`.
# For each found reference to an identifier (see `IdentifierLiteral`'s
# override for `findReferencedVars`), it adds the name (as a `String`)
# to each scope node's `referencedVars` Set.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great comment, just updating typography to match (most, if not all) other comments.

Suggested change
# `findReferencedVars` recurses through the source parse tree to mark every
# "scope" parse node (`Root` and `Code`s) with all referenced
# (accessed or assigned) variables in that source scope, so that every scope
# knows what to avoid when generating new variable names.
# Specifically, it takes in an array of `scopeNodes`,
# each of which already has a `@referencedVars` attribute that is a `Set`.
# For each found reference to an identifier (see `IdentifierLiteral`'s
# override for `findReferencedVars`), it adds the name (as a `String`)
# to each scope node's `referencedVars` Set.
# `findReferencedVars` recurses through the source parse tree to mark every
# scope parse node (`Root` and `Code`s) with all referenced
# (accessed or assigned) variables in that source scope, so that every scope
# knows what to avoid when generating new variable names.
# Specifically, it takes in an array of `scopeNodes`,
# each of which already has a `@referencedVars` attribute that is a `Set`.
# For each found reference to an identifier (see `IdentifierLiteral`s
# override for `findReferencedVars`), it adds the name (as a `String`)
# to each scope nodes `referencedVars` Set.

@@ -515,9 +528,16 @@ exports.Root = class Root extends Base
super()

@isAsync = (new Code [], @body).isAsync
@referencedVars = new Set # all referenced variable names in this scope
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@referencedVars = new Set # all referenced variable names in this scope
# This set contains all the referenced variable names in this scope.
@referencedVars = new Set


children: ['body']

findReferencedVars: (scopeNodes = []) ->
# This is called at the top level to start the recursion,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to make this “automatic,” in the sense that we don’t need to remember to call findReferencedVars on the root, outside of nodes.coffee, to trigger this recursion? I assume we probably can’t call it in Root‘s constructor, as that would be too early? Could it happen within compile or compileNode, as maybe those get called after all the child nodes are added?

@@ -3904,6 +3929,7 @@ exports.Code = class Code extends Base
@isGenerator = no
@isAsync = no
@isMethod = no
@referencedVars = new Set # all referenced variable names in this scope
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@referencedVars = new Set # all referenced variable names in this scope
# This set contains all the referenced variable names in this scope.
@referencedVars = new Set

@@ -5342,7 +5373,7 @@ exports.For = class For extends While
@addBody body
@addSource source

children: ['body', 'source', 'guard', 'step']
children: ['body', 'name', 'index', 'source', 'guard', 'step']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these added? I don’t see the connection to the “referenced vars” change. Not opposed, but I’m wondering if this should be here (or is part of a separate goal).

@@ -49,7 +47,8 @@ replDefaults =
isAsync = ast.isAsync
# Invoke the wrapping closure.
ast = new Root new Block [new Call ast]
js = ast.compile {bare: yes, locals: Object.keys(context), referencedVars, sharedScope: yes}
ast.findReferencedVars()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my comment above, I’m wondering if we could do without this call, that it could happen automatically as part of assembling the nodes tree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

constructor renaming parameters
4 participants