Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't access objects that have slashes in the key (ID) #73

Open
fitnycdigitalinitiatives opened this issue Jul 11, 2022 · 18 comments
Open
Labels
bug Something isn't working

Comments

@fitnycdigitalinitiatives

Hello,

I'm testing out the new version, specifically the cloudfront-enabled version and I'm running into a wall. Logs are all showing this kind of error below. I deployed from the SAR. Just a basic deployment to test.

2022-07-11T21:16:44.995Z 5a2df24f-5d7d-43f8-95c8-40646f2eb2c1 ERROR NotFound: null at Request.extractError (/opt/nodejs/node_modules/aws-sdk/lib/services/s3.js:691:35) at Request.callListeners (/opt/nodejs/node_modules/aws-sdk/lib/sequential_executor.js:106:20) at Request.emit (/opt/nodejs/node_modules/aws-sdk/lib/sequential_executor.js:78:10) at Request.emit (/opt/nodejs/node_modules/aws-sdk/lib/request.js:686:14) at Request.transition (/opt/nodejs/node_modules/aws-sdk/lib/request.js:22:10) at AcceptorStateMachine.runTo (/opt/nodejs/node_modules/aws-sdk/lib/state_machine.js:14:12) at /opt/nodejs/node_modules/aws-sdk/lib/state_machine.js:26:10 at Request.<anonymous> (/opt/nodejs/node_modules/aws-sdk/lib/request.js:38:9) at Request.<anonymous> (/opt/nodejs/node_modules/aws-sdk/lib/request.js:688:12) at Request.callListeners (/opt/nodejs/node_modules/aws-sdk/lib/sequential_executor.js:116:18) { code: 'NotFound', region: null, time: 2022-07-11T21:16:44.990Z, requestId: 'C0C0V2MZNE9SSCKT', extendedRequestId: 'm2x+Yj50FwGfFicNhs+U2JYmw/Pe4EUuwqgMDc1TD6W3QGbA7CUVHenECjtDJ8fHouYtYp2kg3o=', cfId: undefined, statusCode: 404, retryable: false, retryDelay: 8.697074109187763 }

Thanks,

Joseph

@mbklein
Copy link
Member

mbklein commented Jul 11, 2022

What URL are you requesting?

  • If it's /iiif/2, try an actual image request. The former OK response to the root resource broke when I switched from API Gateway to Function URL.
  • If it's an image request (either info.json or an actual image response), make sure the image file corresponding to the ID you passed in actually exists in the S3 bucket you specified when you deployed.

I'm going to put in a fix for both of these – the former should give an OK response and the latter should return a 404 Not Found instead of failing the way it does.

@mbklein
Copy link
Member

mbklein commented Jul 11, 2022

Splitting into issues #74 and #75 and closing.

@mbklein mbklein closed this as completed Jul 11, 2022
@fitnycdigitalinitiatives
Copy link
Author

Hi Michael,

After some testing, it appears the issue is with accessing objects that have slashes in the key, and I am properly encoding them.
So, in the pre-lambda url version which I'm also running, this works: https://d2lezwyxk8gi6a.cloudfront.net/iiif/2/ad-test%2Fph_000207/info.json
But it doesn't work with the new version: https://bocmgyklbvlgynuva5z4kwvqny0ybsxd.lambda-url.us-east-1.on.aws/iiif/2/ad-test%2Fph_000207/info.json but this object without slashes does work https://bocmgyklbvlgynuva5z4kwvqny0ybsxd.lambda-url.us-east-1.on.aws/iiif/2/US.NNFIT.SC.8.1.2.1941-001/info.json

I wonder if this is something inherent in the Lambda URL approach because it doesn't look like anything changed regarding slashes in the code itself.

Thanks,

Joseph

@fitnycdigitalinitiatives
Copy link
Author

Also, occurs in both cloudfront-enabled and not. It's just that I was first testing in the cloudfront version...

@fitnycdigitalinitiatives
Copy link
Author

Hello again,

After some more testing, I've confirmed what's going. It seems that the Lambda URL converts the encoded slashes to real slashes before it ever gets to the function. So even though the initial URL is https://bocmgyklbvlgynuva5z4kwvqny0ybsxd.lambda-url.us-east-1.on.aws/iiif/2/ad-test%2Fph_000207/info.json the path in the event is /iiif/2/ad-test/ph_000207/info.json.

@mbklein mbklein reopened this Jul 13, 2022
@mbklein mbklein changed the title Can't deploy new cloudfront-enabled version Can't access objects that have slashes in the key (ID) Jul 13, 2022
@mbklein mbklein added the bug Something isn't working label Jul 13, 2022
@mbklein
Copy link
Member

mbklein commented Jul 13, 2022

I've reopened this issue as a bug and renamed it to reflect what's actually going on. This is similar to (but not exactly the same as) #52, and I believe they both need to be fixed at the node-iiif level.

I have a good idea of how to fix this reliably, but it might take me a few days to find time for it.

@fitnycdigitalinitiatives
Copy link
Author

fitnycdigitalinitiatives commented Jul 13, 2022

Ok, thanks so much.

Just curious, are you thinking in the parse_url function, instead of getting id by popping off the last segment before region, getting the string between region and .../iiif/2/?

Edit: though, I guess that would presume that iiif/2 is the prefix, so maybe that wouldn't work

@mbklein
Copy link
Member

mbklein commented Jul 13, 2022

In node-iiif, I was trying not to dictate the path prefix, even though serverless-iiif always uses /iiif/2/. The fix (which is almost done) involves node-iiif using a default path prefix of /iiif/2/ while allowing a different one to be specified in the constructor.

@mbklein
Copy link
Member

mbklein commented Jul 13, 2022

The PR is in. As soon as samvera/node-iiif#23 is reviewed and approved for merging, I'll be able to release v4.0.2 with the fix.

@fitnycdigitalinitiatives
Copy link
Author

Excellent, thanks so much

@ryantomaselli
Copy link

ryantomaselli commented Jul 14, 2022

Update:
I was able to add CORS entries for the function URL and things work great.

Question though: is there a reason the manifest uses the function URL rather than the cloudfront one? If it did then one would only need to configure CORS on the cloudfront end point.


Thanks for the fix @mbklein. The keys I was using also comprised of slashes and this latest release addresses that.

A couple of changes I had to make to get things working for me are the following:

1 . Update resolvers.js
Make the lambda handle non-tif files

const defaultStreamLocation = (id) => {
  const key = id + '.tif';
  return { Bucket: sourceBucket, Key: key };
};

to

const defaultStreamLocation = (id) => {
  return { Bucket: sourceBucket, Key: id };
};

2. Update index.js to enable CORS

const makeResponse = (result, event) => {
  const { isBase64 } = helpers;

  const base64 = isBase64(result);
  const content = base64 ? result.body.toString('base64') : result.body;

  return {
    statusCode: 200,
    headers: {
      'Content-Type': result.contentType,
      'Access-Control-Allow-Origin': event.headers.origin,
      'Access-Control-Allow-Credentials': 'true'
    },
    isBase64Encoded: base64,
    body: content
  };
};

3. Add a Response headers policy to the Cloudfront distribution created to satisfy CORS

So now my front end calls the end point at https://xxxxxxxx.cloudfront.net/iiif/2/path/to/file/in/s3/info.json that resolves fine but the @id in the manifest refers to @id: https://xxxxxxxx.lambda-url.us-east-1.on.aws/iiif/2//path/to/file/in/s3".

Not quite sure why it refers to the lambda-url rather than the cloudfront one...but then when my viewer starts asking for tiles I get a CORS error

Access to XMLHttpRequest at 'https://xxxxxxxx.lambda-url.us-east-1.on.aws/iiif/2//path/to/file/in/s3/4096,0,305,1024/153,/0/default.jpg' from origin 'https://mydomain.com' has been blocked by CORS policy: The 'Access-Control-Allow-Origin' header contains multiple values 'https://mydomain.com, *', but only one is allowed.

Am I doing something wrong here? Any pointers very much appreciated.

@mbklein
Copy link
Member

mbklein commented Jul 15, 2022

  1. .tif is going to remain the default for the moment, but I've created Add resolverTemplate to allow for simple resolution without a custom function #76 that should give you the flexibility you need without changing code.
  2. CORS handling was removed from index.js (in favor of letting the function URL config handle the most common case) in 4c9c741 as a fix for Cross-Origin Request Blocked on info.json files #71. If you want to add CORS back into the function, you'll also need to remove it from the template. This will also fix the multiple values issue in your third point.
  3. Ah, this is a regression. I'll add a bug issue and fix the @id issue as soon as I get a chance.

@ryantomaselli
Copy link

ryantomaselli commented Jul 15, 2022

Thanks @mbklein.

Question: would it be problematic to turn off public access to the cache bucket? I know I can override the template myself, just wondering if having it be default would work for your common use case. So you adding the following to the template.

      PublicAccessBlockConfiguration:
        BlockPublicAcls : true
        BlockPublicPolicy : true
        IgnorePublicAcls : true
        RestrictPublicBuckets : true

@mbklein
Copy link
Member

mbklein commented Jul 15, 2022

The cache bucket was never supposed to be publicly accessible. If it is, that's a bug. It should only be accessible by the CloudFront Origin Access Identity by policy.

@mbklein
Copy link
Member

mbklein commented Jul 15, 2022

Please see this comment for a discussion of the @id hostname issue. I had run into this before, but forgot what the real issue is (because we use a custom hostname).

@mbklein
Copy link
Member

mbklein commented Jul 15, 2022

Thanks @mbklein.

Question: would it be problematic to turn off public access to the cache bucket? I know I can override the template myself, just wondering if having it be default would work for your common use case. So you adding the following to the template.

Added in 5be56ee

@ryantomaselli
Copy link

Thanks @mbklein.
Question: would it be problematic to turn off public access to the cache bucket? I know I can override the template myself, just wondering if having it be default would work for your common use case. So you adding the following to the template.

Added in 5be56ee

Thanks for adding this @mbklein. One another nicety would be to add a comment for the distribution just so it displays more obviously in the AWS console.

I added the following to the template:

  CachingEndpoint:
    Type: "AWS::CloudFront::Distribution"
    Properties:
      DistributionConfig:
        Comment: !Sub "${AWS::StackName} Cloudfront Distribution"

@fitnycdigitalinitiatives
Copy link
Author

Hi @mbklein,

Back to the original slash issue, everything is working (thanks so much!!), but I did notice that the id in the info.json file that gets written includes the decoded slashes. That's probably an easy fix (just re-encode the id when writing that file in iiif-processor), but I'm not sure if it's really a big issue because everything works for the application either way, whether they're decoded or encoded. I just happened to notice it because it didn't jive with the authentication function I was working on, but I was able get that working on my end.

But for instance if you make a request at .../iiif/2/ad-test%2Fph_000198/info.json the info.json reads "@id":".../iiif/2/ad-test/ph_000198"

Thanks,

Joseph

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants