Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use process.getBuiltinModule for hybrid node modules #294

Open
pi0 opened this issue Aug 20, 2024 · 6 comments
Open

Use process.getBuiltinModule for hybrid node modules #294

pi0 opened this issue Aug 20, 2024 · 6 comments

Comments

@pi0
Copy link
Member

pi0 commented Aug 20, 2024

Following up on an idea during the discussion with @IgorMinar and @petebacondarwin.

Currently we have $cloudflare specific entries that allow hybrid support for runtime compatible features using process.getBuiltinModule(id). (example: node:buffer)

Since process.getBuiltinModule is a Node.js standard, we might instead just merge entries. We can still treeshake (each export prefers built-in module if available and fallback to polyfill).

Cons:

  • Unenv fallbacks are always bundled (although can be lazy evaluated and are small)
  • Less control over runtime (runtime can enable natives in the future that might be incompatible with polyfill, however unlikely)

Pros:

  • We don't need to maintain a list of supported Node.js APIs (workerd#2097)
  • Universal support for other runtimes to also support
  • Future compatible. If runtime supports a new built-in natively, unenv code adopts it. Being adaptable, means also unenv can smartly change behavior based on compat date and flags without need to know info on build time
@pi0 pi0 added enhancement New feature or request discussion and removed enhancement New feature or request labels Aug 20, 2024
@IgorMinar
Copy link
Collaborator

I really like the desire not to rely on cloudflare/workerd#2097 and instead create the hybrid polyfills on the fly, but what I'm concerned about and I don't quite understand is the claim that disabling tree-shaking doesn't matter much, or at least that's how I read "Unenv fallbacks are always bundled (although can be lazy evaluated and are small)"

@pi0 can you please help me understand how this would not matter? some of the unenv polyfills are not small (e.g. node streams, url, etc). So how can we ignore the significant size of these?

@pi0
Copy link
Member Author

pi0 commented Sep 13, 2024

You are right @IgorMinar we will bundle polyfills that's the inevitable cost if we do this for other benefits (still bundlers can tree-shake per export - not per module that was my side-point).

IIRC the initiatives behind the idea were mentioned when we were talking about the possibility of shipping unenv dist via workers module registry directly instead of bundler.

Out of curiosity, I did a POC to see how much is the cost of entire unenv node polyfills bundled. It is ~191.6kb (!). Even without module registry and even without making an optimized bundle that takes into account workerd native node support which makes that number much less (we can switch the entry) I think the ratio of this number compared to the whole next.js server (or any deployment that needs node compat) bundle is fairly good. As a side-note, it will also solve process.getBuiltinModule to actually work in runtime (instead of returning workerd native subsets, it actually will work).

@IgorMinar
Copy link
Collaborator

some in progress notes... still wip

Pros:

Cons:

  • some exports might be only partially implemented, and we'd need to iterate not only over module exports, but also over the symbol properties (e.g. globalThis.process) which can become error prone or expensive
  • if workerd has a problematic native implementation, we won't be able to override it in unenv (shouldn't happen, so maybe not a problem?)

@pi0
Copy link
Member Author

pi0 commented Sep 16, 2024

If workerd has a problematic native implementation, we won't be able to override it in unenv (shouldn't happen, so maybe not a problem?)

If workerd have a known issue, unenv polyfills can explicitly avoid using it (per export) and only use polyfill. (unenv has control per build/deployment)

If workerd has a problematic native impl for any reason or drops a feature in favor of polyfill (like #302 i guess?) or when it enables it back, this approach actually self-adopts without need to rebuild/deploy. (workerd has control at runtime)

some exports might be only partially implemented, and we'd need to iterate not only over module exports, but also over the symbol properties (e.g. globalThis.process) which can become error prone or expensive

You are right about the cost of fallback check. But it can be as low as a one time nullish check per export or symbol|prop i guess right? (globalThis.process.*, is polyfilled once)

@IgorMinar
Copy link
Collaborator

IgorMinar commented Sep 17, 2024

@pi0 the notes above were from my brainstorming session with @petebacondarwin which we had to wrap up quickly so I just posted the WIP notes we captured before we finished.

I see a lot of value in this but I'm also a bit uneasy because of the potential cost of this approach, and hard to predict future interactions between workerd and unenv, which evolve and are updated differently:

  • workerd keeps on evolving and we update it's version in prod daily/weekly
    • it's a semi unpinned dependency in prod
    • a semi because compat_date does pin the behavior but not the implementation somewhat
  • the unenv layer remains pinned and is versioned along with the deployed applications
    • is baked into the deployed application bundle
    • unenv version changes only if the developer uses a different wrangler version (which brings in different unenv version), and redeploys the application using this different version of wrangler

In an ideal world, the version skew issue should not be a problem, it's just very hard to predict all of the possible interactions and corner-cases given the magnitude of the surface area, and the fact that workerd, node.js, and unenv are continuously changing and evolving.

So maybe it makes sense to assume the ideal world conditions and consider version skew not to be a problem, and explore if performance cost would rule out this approach even before we need to worry about version skew.

And performance comes to two things:

  1. how expensive (in terms of storage) would it be to include the entirety of unenv runtime in each worker bundle
    • we should keep in mind that unenv continue to grow, and especially if we start porting larger chunks of node.js runtime to unenv, it could grow significantly)
  2. how expensive (in terms of CPU time) would it be to perform a granular workerd API surface check and assemble hybrid polyfill layer on the fly
    • unlike with size, it's unlikely that we need to worry too much about the CPU time increasing over time, as it's unlikely that the node.js API surface area would double or triple any time soon (even when considering new additions like sqlite api, etc)

Both of these could be explored via experimentation in real world conditions, and we could help explore these some time over the next few weeks (just not in September). Thoughts?

@pi0
Copy link
Member Author

pi0 commented Sep 18, 2024

Thanks for sharing your thoughts @IgorMinar.

Let's stay with the current approach (unenv-bundled-by-usage) for wrangler tooling, considering the current system works for you and the announced date is close, we can keep manual sync between workerd<>unenv<>wrangler versions 👍🏼


Considering past experiences of build-time-only polyfills we had with Nitro/Nuxt , i am really positive that increased [runtime] coverage and stability of this alternative approach worth experimenting.

I will put a pending label on this issue until we setup a workerd testing system in unenv repo so we can properly measure any possible overhead and test functionality in actual runtime directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants