JED Checker to report linebreaks in language files #242

Open
toivo opened this issue Nov 29, 2023 · 4 comments

Comments

toivo commented Nov 29, 2023

The latest Joomla updates 4.4.1 and 5.0.1 break extensions if the language strings have linebreaks. The JED Checker is already reporting language strings that start or end with a space character. It should report linebreaks inside the language string so that webmasters and extension developers can fix the linebreaks that stop the translation of the language constants.

Ref. [#42416] - [4.4.1] Language constants of some extensions not translated

toivo (Author) commented Nov 30, 2023

Clarification: the linebreaks in question are hard returns, i.e. what used to be called carriage return and line feed characters. HTML linebreak tags (<br>) are allowed.
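
A minimal sketch of the kind of check requested here, assuming each entry has the usual KEY="value" shape with \" as the only escape inside the value. The function name `findUnterminatedValues` is hypothetical and is not taken from JED Checker. It reports the line on which a value opens with a double quote but is not closed on that same line, which is what a hard return inside a value looks like. HTML linebreak tags inside a value are not reported.

```php
<?php
/**
 * Hypothetical helper (not part of JED Checker): returns the 1-based line
 * numbers on which a double-quoted value is opened but not closed.
 */
function findUnterminatedValues(string $contents): array
{
    $problems = [];

    // Split on CRLF, CR or LF so every kind of hard return is recognised.
    $lines = preg_split('/\r\n|\r|\n/', $contents);

    // Assumed shape of the start of an entry: KEY="
    $opens = '/^[A-Z0-9_.\-]+\s*=\s*"/i';

    // Same entry, closed on the same line; \" (or any backslash pair) is
    // skipped over, and a trailing ; comment is tolerated.
    $closed = '/^[A-Z0-9_.\-]+\s*=\s*"(?:[^"\\\\]|\\\\.)*"\s*(?:;.*)?$/i';

    foreach ($lines as $index => $line) {
        $trimmed = trim($line);

        // Only lines that start an entry are inspected; comments, blank
        // lines and continuation lines never match $opens.
        if (preg_match($opens, $trimmed) !== 1) {
            continue;
        }

        if (preg_match($closed, $trimmed) !== 1) {
            $problems[] = $index + 1;
        }
    }

    return $problems;
}
```

Values that still use unescaped inner double quotes would also be reported by this sketch, so a real check would need to decide how to treat them.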

dryabov (Collaborator) commented Dec 1, 2023

To be honest, I'm a bit shocked by the latest security patch. I know it is possible to use PHP's constants in string values, but website visitors are not able to manipulate language files, so I don't see the attack vector here.

The difference between normal and raw parsers is not just the support for multiline strings (although it was probably used before, e.g. for initial values in textarea fields, and now developers have to rewrite code just to support the new syntax).

For example, with the normal parser you can specify values in either double or single quotes, but now single-quoted values will be parsed incorrectly (the quotes will be included in the translated string). Of course, we can warn developers to replace single quotes with double quotes (and to escape any double quotes inside the string).

But the most important thing: previously it was possible to use escape sequences, but now only \" is supported. For example, we discussed earlier that a backslash followed by a double quote should be encoded as \\\" according to PHP rules (and I even patched PHP core to handle it properly). And now that doesn't work, because in Joomla 5.0.0 it should be \\\" and in 5.0.1 it should be \\". There is no common denominator!
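
The two problems above (single-quoted values, and escape sequences other than \") could be flagged with a rough check like the following. This is only a sketch under stated assumptions: the function name `describeValueRisks` and the warning codes are hypothetical, the input is a single KEY=value line, and any backslash that is not directly followed by a double quote is treated as a risky escape.

```php
<?php
/**
 * Hypothetical helper (not part of JED Checker): returns warning codes
 * for one KEY=value line of a language file.
 */
function describeValueRisks(string $line): array
{
    $warnings = [];

    $pos = strpos($line, '=');
    if ($pos === false) {
        return $warnings; // Not a KEY=value line.
    }

    $value = trim(substr($line, $pos + 1));

    // Value wrapped in single quotes: reported above as ending up with
    // the quotes inside the translated string.
    if (strlen($value) >= 2 && $value[0] === "'" && substr($value, -1) === "'") {
        $warnings[] = 'single-quoted';
    }

    // A backslash that is not immediately followed by a double quote,
    // i.e. an escape sequence other than \".
    if (preg_match('/\\\\(?!")/', $value) === 1) {
        $warnings[] = 'unsupported-escape';
    }

    return $warnings;
}
```

Note that this sketch also flags a doubled backslash before a quote, since that is exactly the case where the expected encoding differs between versions.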

Finally, this patch affects performance, because strings were previously loaded as

$strings = parse_ini_file($fileName);

and now

$strings = parse_ini_file($fileName, false, INI_SCANNER_RAW);
$strings = str_replace('\"', '"', $strings);

so, Joomla has to do post-processing on each and every loaded string.

I'll try my best, but there's clearly more to it than just a warning about multiline strings.

toivo (Author) commented Dec 1, 2023

Thank you for your comments. Now I understand better what is involved, but it would be brilliant to get it done.

The change in 4.4.1 and 5.0.1 was a total surprise. I have been developing a component, and the documentation page "Creating a language definition file" does not say that a language string is limited to one line or that no CR/LF characters are allowed. I just added the following text there: "Note also that from Joomla 4.4.1 and 5.0.1 each value can only be one line of text. Hard linebreaks invalidate the whole language definition file." Others who know more, like you, may want to edit the section about the PHP INI parser.

dryabov (Collaborator) commented Jan 8, 2024

PR #245 comes with all the necessary checks.
