Fix filename regex #328

Open
adamjtaylor opened this issue Dec 6, 2023 · 8 comments · Fixed by #327 or #333

@adamjtaylor
Contributor

Per FDS-1416 we were getting a DCA crash due to an invalid regex for the attribute Filename.

Investigation revealed that the backslash escape of the forward slash was not needed.

@adamjtaylor adamjtaylor self-assigned this Dec 6, 2023
@adamjtaylor adamjtaylor linked a pull request Dec 6, 2023 that will close this issue
@adamjtaylor adamjtaylor reopened this Dec 8, 2023
@adamjtaylor
Contributor Author

Reopening as I'm still seeing an error. There seems to be entanglement between the regex and JSON escaping, and Milen mentioned that there may be some legacy escaping functionality within schematic itself.

I intend to test the behaviour in the refactor.
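
The entanglement between regex and JSON escaping described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not schematic's actual code path: the pattern, the `rule` key, and the helper names `round_trip` and `parses` are hypothetical.

```python
import json
import re

# Hypothetical pattern, for illustration only (not the rule from the data model).
PATTERN = r"^[\w\-.]+/[\w\-.]+$"


def round_trip(pattern: str) -> str:
    """Serialise a pattern to JSON and parse it back."""
    return json.loads(json.dumps({"rule": pattern}))["rule"]


def parses(json_text: str) -> bool:
    """Return True if json_text is valid JSON."""
    try:
        json.loads(json_text)
        return True
    except json.JSONDecodeError:
        return False


# 1. A JSON encoder doubles each backslash, so the round trip is lossless.
assert round_trip(PATTERN) == PATTERN

# 2. JSON accepts "\/" as an escape for "/", so that backslash vanishes on parsing.
assert json.loads(r'{"rule": "^a\/b$"}')["rule"] == "^a/b$"

# 3. A regex escape such as "\w" pasted unescaped into JSON text is invalid JSON.
assert not parses(r'{"rule": "^[\w]+$"}')

# 4. In Python's re module "/" needs no escape; "\/" and "/" match the same text.
assert re.fullmatch(r"a\/b", "a/b") and re.fullmatch(r"a/b", "a/b")
```

Cases 2 and 3 suggest why hand-written escapes can behave differently depending on which layer (CSV, JSON-LD, or the regex engine) interprets the backslash first.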

@adamjtaylor
Contributor Author

Reopening, as PR #333 was only a temporary fix.

@aclayton555
Contributor

As of the 23-12 close-out, Adam cannot get this to work. There seem to be multiple limitations at different stages (possible bugs re: escape characters).

  • The CSV to JSON-LD conversion might be adding some escape characters
  • Adam also cannot seem to add the right sequence of escape characters to the CSV.

Mitigation:

  • we are already checking for this in our release scripts
  • there are already some known oddities about the special characters that we are using versus what CDS and Velsera are using

Approach:

  • revisit this once we have some feedback from CDS and Velsera.
  • Ideally want to pursue something here before DR6
  • Or, consider strict approach in renewal

Come back to this after DR5. Another question is which platforms' file naming rules we need to support: Synapse, AWS, GCP, DRS, generic S3 protocols, SD, and CDS.

@adamjtaylor
Contributor Author

Email from Amanda: File Naming Conventions
“All projects are recommended to follow the guidelines provided in the link: https://docs.sevenbridges.com/docs/run-a-task#select-input-files
File naming convention: allowed characters in input file names are a-z, A-Z, 0-9, dash (-), period (.), semicolon (;), tilde (~) and hash (#). If there are input files containing other characters in their file names, task execution will not be started”

@adamjtaylor
Contributor Author

adamjtaylor commented Feb 8, 2024

For Synapse

This looks like the entity name regex; I'm not sure where it is in the docs. I think there's a helpful error response (including the allowable characters) if an entity name contains an invalid character.

ModelConstants.java

    public static final String VALID_ENTITY_NAME_REGEX = "^[a-zA-Z0-9,_. \\-+()']+";
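
For testing names on the Python side, the Java literal above can be transcribed as follows. This is a sketch under one assumption: that Synapse requires the whole name to match. The pattern has no trailing `$`, so `fullmatch` is used to enforce that. The function name `is_valid_entity_name` is hypothetical.

```python
import re

# Python transcription of the Java literal above: the doubled backslash in the
# Java string is a single backslash in the regex itself.
VALID_ENTITY_NAME = re.compile(r"^[a-zA-Z0-9,_. \-+()']+")


def is_valid_entity_name(name: str) -> bool:
    """Return True if every character in name is in the allowed set.

    Assumption: the whole name must match, hence fullmatch rather than match.
    """
    return VALID_ENTITY_NAME.fullmatch(name) is not None


for candidate in ["sample_01 (v2).tif", "folder/file.txt", "data#1.csv"]:
    print(candidate, is_valid_entity_name(candidate))
```

Note that the forward slash and the hash are both outside this character set, so a name containing either would be rejected by this check.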

@adamjtaylor
Contributor Author

For AWS S3

The following character sets are generally safe for use in key names.

Alphanumeric characters

  • 0-9
  • a-z
  • A-Z

Special characters

  • Exclamation point (!)
  • Hyphen (-)
  • Underscore (_)
  • Period (.)
  • Asterisk (*)
  • Single quote (')
  • Open parenthesis (()

@adamjtaylor
Contributor Author

adamjtaylor commented Feb 8, 2024

The intersection appears to be only:

  • Letters (a-z, A-Z)
  • Numbers (0-9)
  • Special characters: dash (-), period (.), underscore (_)

SB confirmed that underscore is permissible.
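
The intersection can be checked mechanically. The sketch below uses only the special characters quoted in the comments above (all three platforms allow a-z, A-Z and 0-9) and builds a candidate pattern from the result. The set names, `SAFE_NAME`, and `is_portable_name` are hypothetical, and the pattern covers a bare file name only, not a path containing `/`.

```python
import re

# Special characters as quoted in the comments above.
SEVEN_BRIDGES = set("-.;~#") | {"_"}  # underscore confirmed separately by SB
SYNAPSE = set(",_. -+()'")
AWS_S3 = set("!-_.*'(")  # as listed in the comment above

# Characters that all three platforms accept.
common = SEVEN_BRIDGES & SYNAPSE & AWS_S3

# Build a character class from the shared characters; re.escape keeps "-"
# and "." literal inside the class.
SAFE_NAME = re.compile(
    "[A-Za-z0-9" + "".join(re.escape(c) for c in sorted(common)) + "]+"
)


def is_portable_name(name: str) -> bool:
    """Return True if name uses only characters shared by all three platforms."""
    return SAFE_NAME.fullmatch(name) is not None


print(sorted(common))
print(is_portable_name("HTA1_001-image.ome.tiff"))
print(is_portable_name("my file.txt"))
```

Without the separate confirmation from SB, the underscore would drop out of `SEVEN_BRIDGES` and the shared special characters would be only the dash and the period.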

@aclayton555
Contributor

Through HTAN phase 1, take the approach of fixing any illegal file names (Clarisse is doing this already). Pick this back up as part of the ID/file naming guidance in the renewal.
