Proxying: Test out using vercel's infra to proxy to our php app #1884

Open
danielbeardsley opened this issue Aug 4, 2023 · 12 comments

@danielbeardsley
Member

danielbeardsley commented Aug 4, 2023

Motivation

We've been using CloudFront as a means to serve both Next.js and our PHP app from the same domain. Requests flow like this:

flowchart TD
A[Internet] --> B[Cloudfront]
B --> C[Vercel CDN]
B --> D[PHP App]
C --> E[Vercel next.js]

This was familiar to set up, but has some cons:

  • CloudFront and Vercel CDN are on different platforms (Vercel CDN uses Cloudflare)
    • Thus requests may not take an optimal route between these two systems
  • Having two complex caching proxies between our servers and users has caused surprising problems
  • Dealing with session cookies through Cloudfront has required a lot of thinking and we still haven't done it

Proposal: Vercel as a Proxy

Have Vercel CDN do the front-end proxying of our www.ifixit.com domain and send php-app requests directly to us-east-1.

There are several options here:

Option 1 (Edge middleware with proxylbs)

Keep our proxylbs, let vercel's CDN proxy to proxylb.ifixit.com.

flowchart TD
A[Internet] --> B[Vercel CDN: global]
B --> C[PHP App: proxylbs]
C --> E[PHP App: us-east-1]
B --> D[Vercel next.js: us-east-1]

Option 2 (Edge middleware with one LB)

Have Vercel's CDN send requests directly to one (or two) load balancers in us-east (lb.ifixit.com).

flowchart TD
A[Internet] --> B[Vercel CDN: global]
B --> C[PHP App: us-east-1]
B --> D[Vercel next.js: us-east-1]
Loading

Option 3 (Next proxy)

Have Vercel's CDN send all requests to Next.js and have the proxying to our PHP app happen within the Next.js process. Both Next and our app are in us-east, so the network time between them is less than 4ms.

flowchart TD
A[Internet] --> B[Vercel CDN global]
D --> |proxy in next| C[PHP App: us-east-1]
B --> D[Vercel next.js: us-east-1]

Do This

Discuss!!

All of these are testable without changing our routing:

  • Edge middleware
    • Set up edge middleware in next (running on vercel's CDN)
    • Proxy /health-check?op=1 to proxylb.ifixit.com
    • Proxy /health-check?op=2 to lb.ifixit.com
  • Next Proxy
    • Use rewrites section of next config to proxy /health-check?op=3 to lb.ifixit.com
    • If that doesn't work, there are several other ways to do a proxy from next.js to somewhere else
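
A minimal sketch of the routing decision behind these test endpoints, assuming the `op` query parameter selects the upstream exactly as listed above. `pickUpstream` and `healthCheckRewrite` are hypothetical names, not existing code. In a real Next.js middleware the returned URL would be handed to `NextResponse.rewrite`, and the rewrite entry would be returned from `rewrites()` in the next config.

```typescript
// Hypothetical mapping from the `op` query value to the upstream host under test.
const UPSTREAMS: Record<string, string> = {
  "1": "proxylb.ifixit.com", // option 1: via the proxylbs
  "2": "lb.ifixit.com", // option 2: straight to the us-east LB
};

// Returns the upstream URL to proxy to, or null when the request is not
// one of the edge test endpoints and should be handled normally.
function pickUpstream(requestUrl: string): URL | null {
  const url = new URL(requestUrl);
  if (url.pathname !== "/health-check") return null;
  const host = UPSTREAMS[url.searchParams.get("op") ?? ""];
  if (!host) return null;
  // Keep the path and query string, swap only the host.
  return new URL(url.pathname + url.search, `https://${host}`);
}

// Option 3: a rewrite entry in the shape Next.js accepts from `rewrites()`,
// matched only when the query string contains op=3.
const healthCheckRewrite = {
  source: "/health-check",
  has: [{ type: "query", key: "op", value: "3" }],
  destination: "https://lb.ifixit.com/health-check",
};
```

Keeping the decision in a pure function makes it easy to check the mapping locally before deploying anything to the edge.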

Once these are set up, try pinging these URLs from places around the globe to compare RTT.

CC @sterlinghirsh @dhmacs @djmetzle @sctice-ifixit @davidrans @masonmcelvain ...

@djmetzle
Contributor

djmetzle commented Aug 4, 2023

One major concern here would be moving SSL termination to Vercel's CDN. With AWS, we have better control, and best-in-class SLAs.

Vercel and Cloudflare on the other hand are questionably reliable.

@danielbeardsley
Member Author

I was going to say I'm sure Vercel's SLA is good... but ooof. They aim for 99.99% uptime (probably realistic) but that doesn't include this 👇

Scheduled Downtime will not exceed eight (8) hours per month and will be scheduled in advance during off-peak hours (based on PT). We will notify you via email of any Scheduled Downtime that will exceed two (2) hours.

!! That's 1% downtime they allow as "Scheduled Maintenance". I don't see any public record of scheduled downtime, so I don't know if it doesn't happen? or they keep it under the radar. I haven't seen any "scheduled downtime" emails about our react-commerce app.
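
For reference, the arithmetic behind that figure can be sketched as below. It assumes a 30-day month (720 hours), which is an illustrative choice and not something the SLA text specifies.

```typescript
// Assumption: a 30-day month, i.e. 720 hours.
const HOURS_PER_MONTH = 30 * 24;

// Percentage of a month consumed by a given number of hours of downtime.
function downtimePercent(hoursDown: number): number {
  return (hoursDown / HOURS_PER_MONTH) * 100;
}

// Minutes of downtime per month permitted by an uptime target such as 99.99.
function allowedMinutes(uptimePercent: number): number {
  return HOURS_PER_MONTH * 60 * (1 - uptimePercent / 100);
}

console.log(downtimePercent(8).toFixed(2)); // "1.11"
console.log(allowedMinutes(99.99).toFixed(2)); // "4.32"
```

So eight scheduled hours is roughly 1.1% of the month, against about 4.3 minutes allowed by the 99.99% target.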

@davidrans
Member

I haven't seen any "scheduled downtime" emails about our react-commerce app.

I think we would have noticed if our store went down for 2 hours 😛 If we're seriously concerned about uptime, we probably shouldn't host the store on Vercel at all, right? It is kind of nice that it, currently, wouldn't take out the rest of the site though.

@djmetzle
Contributor

djmetzle commented Aug 4, 2023

I just want to point out that moving entirely off of AWS for routing is a non-trivial tradeoff.

Moving to the Vercel CDN isn't a bad idea in itself, but it will bring significant complications.

The Vercel DNS seems significantly less configurable than Cloudfront. It does not look like a good fit for us to eventually move *.ifixit.com to it.

@danielbeardsley
Member Author

It is kind of nice that it, currently, wouldn't take out the rest of the site though

Heh... just the part that makes us money 💸 :-)

The Vercel DNS seems significantly less configurable than Cloudfront. It does not look like a good fit for us to eventually move *.ifixit.com to it.

I'm a bit confused here. I presumed we'd be adding a CNAME from www.ifixit.com to whatever vercel says we should and that's pretty much it:

If you're using a Subdomain (e.g. docs.example.com), you will need to configure it with a CNAME record

At this point, resolutions of www.ifixit.com would point to (hopefully) the closest POP of their CDN.

Is there something else I'm not aware of?

but it will bring significant complications.

I think this would be a good time to enumerate them, before we put more research effort into this idea.

@sterlinghirsh
Member

I agree Vercel's SLA isn't the most promising, but I don't think we've seen anything close to that much downtime. They're based on CloudFlare, which promises 100% uptime (zero 9s!). I can't remember ever seeing a scheduled downtime for Vercel. Also, Shopify itself only promises 99.99%, so I don't think this is a big new risk for us.

We may have thought of this already but I think we'll have to do something fancy with the DNS since if www.ifixit.com points at Vercel, Vercel's requests to www.ifixit.com won't make it to our PHP app. Maybe we can have something like php.ifixit.com point to the lb or something and then rewrite the headers in varnish so it looks like www.

Of the proposed options, option 2 (Edge middleware with one LB) seems the simplest. I'm not sure I understand the point of having the proxy lbs in option 1, and I don't think there's any advantage to using the nodejs runtime (I think that's what you meant when you said next.js in your diagram) instead of the edge runtime.

Not sure if this would be best as an edge middleware or an edge function, but we can experiment.

I found some examples of similar behavior in the nextjs docs.

I think a solution like this will be fine for the majority of requests, but I'm still not sure how certain things like large file uploads will work.

@danielbeardsley
Member Author

we'll have to do something fancy with the DNS since if www.ifixit.com points at Vercel, Vercel's requests to www.ifixit.com won't make it to our PHP app.

Not fancy with DNS, but api requests will have to connect to something like lb.ifixit.com and explicitly set a Host: www.ifixit.com header. Or perhaps have our app treat lb and www as synonyms or something.
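
A rough sketch of what such a request could look like, assuming the server-side caller uses Node's `https.request`. `buildApiRequest` and the local interface are hypothetical; only the two hostnames come from this thread.

```typescript
// Local type covering the subset of Node's https.request options used here.
interface UpstreamRequestOptions {
  hostname: string;
  path: string;
  method: string;
  headers: Record<string, string>;
}

// Connect to the load balancer directly, but present the public hostname
// so the PHP app treats the request as one for www.
function buildApiRequest(path: string, method = "GET"): UpstreamRequestOptions {
  return {
    hostname: "lb.ifixit.com", // where the connection actually goes
    path,
    method,
    headers: { Host: "www.ifixit.com" }, // what the app sees as the site
  };
}

console.log(buildApiRequest("/health-check"));
```

Whether the certificate served at lb.ifixit.com validates for that name is something the experiment would need to confirm.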

I'm not sure I understand the point of having the proxy lbs in option 1,

I don't want to rely totally on my intuition here, which is why I'd suggest doing all of them and then testing their RTTs. My intuition says there's a chance that going from cloudflare to a nearest CF POP then over their fast connection to us-east could be faster than direct to us-east. Especially since the nearest pop would maintain an SSL connection to us-east.

I don't think there's any advantage to using the nodejs runtime (I think that's what you meant when you said next.js in your diagram) instead of the edge runtime.

It's not about runtime, it's about where the connection is coming from / going to. Each SSL connection takes 2-3x the RTT. So if you minimize the number of new SSL connections on each request, you can drastically reduce the RTT of normal user requests. This is practically the whole reason we have proxylbs in the first place (they keep an SSL connection alive to us-east).
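
A back-of-the-envelope model of that effect, as a sketch only. The RTT values are made up for illustration, and the handshake cost of 3 round trips is an assumption taken from the upper end of the 2-3x range above.

```typescript
// Rough latency model: a new TCP+TLS connection costs `handshakeRtts`
// round trips before the request itself (one more round trip) can be sent.
function requestTime(rttMs: number, newConnection: boolean, handshakeRtts = 3): number {
  return (newConnection ? handshakeRtts * rttMs : 0) + rttMs;
}

// Made-up numbers for a user far from us-east.
const userToOrigin = 160; // ms RTT, user -> us-east directly
const userToPop = 20; // ms RTT, user -> nearest POP
const popToOrigin = 150; // ms RTT, POP -> us-east

// Direct: the user pays the handshake over the long path.
const direct = requestTime(userToOrigin, true);
// Via POP: handshake over the short path, then a warm connection to origin.
const viaPop = requestTime(userToPop, true) + requestTime(popToOrigin, false);

console.log(direct, viaPop); // 640 230
```

Under these assumed numbers the extra hop wins easily, but only real measurements of the three options can say whether that holds for our traffic.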

but I'm still not sure how certain things like large file uploads will work.

I don't see a direct reason they won't work, CF already has a max request time of 60s that we've been dealing with. But experimentation will be needed. Another endpoint that just soaks up a post body and returns the MD5 of it or something would work.
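
The body-soaking endpoint could be as small as the sketch below. `md5OfBody` is a hypothetical helper; it assumes the request body is available as an async iterable of byte chunks, which holds for Node's readable streams.

```typescript
import { createHash } from "node:crypto";

// Consume every chunk of a request body and return the hex MD5 of the bytes.
// Chunks are hashed as they arrive, so the whole upload is never held in memory.
async function md5OfBody(body: AsyncIterable<Uint8Array>): Promise<string> {
  const hash = createHash("md5");
  for await (const chunk of body) {
    hash.update(chunk);
  }
  return hash.digest("hex");
}
```

Comparing the returned digest with one computed locally would show whether a large upload made it through each proxy layer intact.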

@dhmacs
Contributor

dhmacs commented Aug 11, 2023

All 3 options look good to me. I'd probably test option 3, since it's also the one mentioned in the docs, which @sterlinghirsh pointed out (https://nextjs.org/docs/app/api-reference/next-config-js/rewrites#incremental-adoption-of-nextjs).

It's probably the only option that automatically picks up new pages we migrate to Next without us having to directly maintain a list of paths.

To implement option 3 effectively, I think we should get rid of the catch all route (https://github.com/iFixit/react-commerce/blob/main/frontend/pages/%5B...slug%5D.tsx). It's been added so that the marketing team can create marketing pages at any path (by setting the path in Strapi), but right now it's only being used for /Store.
To make our lives easier we can restrict marketing pages to a subpath (with the exception of /Store of course) so that the Next.js proxy can fall back to iFixit PHP for routes that are not being handled.
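
A sketch of the fallback idea using the `fallback` group of Next.js rewrites, in the style of the incremental adoption section linked above. `PHP_ORIGIN` is an assumption about where the PHP app would be reachable, and the local `Rewrite` type just mirrors the fields used here.

```typescript
// Local type mirroring the rewrite entry fields used in this sketch.
interface Rewrite {
  source: string;
  destination: string;
}

// Assumption: the PHP app is reachable at this origin.
const PHP_ORIGIN = "https://lb.ifixit.com";

// `fallback` rewrites are consulted only after Next.js has checked its own
// pages and dynamic routes, so any path Next does not handle goes to PHP.
async function rewrites(): Promise<{ fallback: Rewrite[] }> {
  return {
    fallback: [{ source: "/:path*", destination: `${PHP_ORIGIN}/:path*` }],
  };
}
```

A root-level catch-all page matches every path before the fallback group is reached, which is why it would need to go or move under a subpath.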

By the way I endorse this plan, I think it'll make our life easier when testing CDN caching and adopting new stuff like edge runtime 👍

@sctice-ifixit
Member

@djmetzle You mentioned the other day that you thought it was possible to support an arbitrary number of path prefixes in CloudFront, or at least more than the <100 we've been assuming we'd be able to support, enough that we could support the #lang × #stores × #app route prefixes that we need. Can you explain the approach you had in mind?

@djmetzle
Contributor

We can set up an origin request handler to do content based origin routing:
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-examples.html#lambda-examples-content-based-routing-examples

That would let us handle a (nearly) unlimited number of NextJS paths.
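
A minimal sketch of such an origin request handler, loosely following the shape of the AWS content-based routing examples. The prefix list and `NEXT_ORIGIN` hostname are hypothetical placeholders, and the local types cover only the event fields used here.

```typescript
// Minimal local types for the parts of a CloudFront origin-request event used here.
interface CfHeader {
  key: string;
  value: string;
}
interface CfRequest {
  uri: string;
  headers: Record<string, CfHeader[]>;
  origin?: { custom?: Record<string, unknown> };
}
interface CfEvent {
  Records: { cf: { request: CfRequest } }[];
}

// Hypothetical values: which prefixes Next.js owns and the Vercel hostname.
const NEXT_PREFIXES = ["/Store", "/Parts", "/Tools"];
const NEXT_ORIGIN = "next-app.example.vercel.app";

async function handler(event: CfEvent): Promise<CfRequest> {
  const request = event.Records[0].cf.request;
  const isNext = NEXT_PREFIXES.some(
    (prefix) => request.uri === prefix || request.uri.startsWith(prefix + "/"),
  );
  if (isNext) {
    // Swap the origin for this request only; everything else keeps the default origin.
    request.origin = {
      custom: {
        domainName: NEXT_ORIGIN,
        port: 443,
        protocol: "https",
        path: "",
        sslProtocols: ["TLSv1.2"],
        readTimeout: 30,
        keepaliveTimeout: 5,
        customHeaders: {},
      },
    };
    request.headers["host"] = [{ key: "host", value: NEXT_ORIGIN }];
  }
  return request;
}
```

Because the match is ordinary code, the prefix check could be replaced by a regex covering language and store prefixes without touching the distribution's cache behaviors.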

@danielbeardsley
Member Author

We can set up an origin request handler to do content based origin routing:

Ha! I looked dearly for those examples... hmm maybe I found them and dismissed them because they use Lambda@Edge instead of CloudFront Functions.

I remember thinking that lambda@edge is really slow. I don't remember why, but I hunted for performance numbers and found this: https://medium.com/@pauly4it/cloudfront-functions-20-faster-than-cloudflare-workers-230-faster-than-lambda-edge-c65c26221296

This was 1.5 years ago but the methodology seems solid and results seem pretty damning.

@djmetzle
Contributor

I thought I'd found an example of using a plain CloudFront function to update the origin, but I can't seem to find anything along those lines now. We could rewrite the path of the request in a CloudFront function (to something like /nextjs/$1?) but that could take some rework.

Would keeping the Vercel origin completely contained at a specific path prefix be feasible in the Vercel CDN? That could help. Bonus points if it can support a path prefix in parallel with the current setup.
