Translation File Structure

Nicholas K. Dionysopoulos edited this page Dec 4, 2023 · 1 revision

Types of files

Panopticon uses two kinds of files:

  • GNU GetText PO/POT files (*.po and *.pot). These are the “source” files which translators work with. The POT file is the ‘main’ language file (English, Great Britain). The PO files are the translations.
  • Joomla!-style INI translation files. The en-GB.ini file is managed directly by the developers, and gets automatically converted to the respective POT file. The other files are generated from the respective PO files. The INI files are used to display translated text in Panopticon.

There is a reason for this duality in translations.

INI files are extremely fast to parse, which is the most important reason they are used (language files need to be loaded with every request). Moreover, they allow for disambiguation of homographs, e.g. the verb ‘save’ and the noun ‘save’. They also allow the same adjective to be translated differently depending on context, which matters in languages with gender and / or plural differentiation. For example, ‘saved’ could be translated in Greek as «αποθηκευμένος», «αποθηκευμένη», «αποθηκευμένο», «αποθηκευμένοι», «αποθηκευμένες», or «αποθηκευμένα», depending on the singular / plural context and the gender of the noun the word refers to. Finally, INI files allow for easier handling of plural forms than the respective function in GNU GetText.

The downside of INI files is that there is virtually no tooling for translating them. You'd need to use a text editor, and you'd need to locate changed, added, and removed translations yourself. In practice this leads to either a very disorganised translation file, or translations with missing and / or superfluous language strings.

GNU GetText PO files are great for managing translations as there are plenty of tools to manage them, including Poedit. You don't need a text editor to edit them, and you won't miss any changed / added / removed strings. However, these files are normally keyed on the source language's natural language strings which do not allow for disambiguation. Furthermore, they are quite finicky when you are trying to manage plural forms in languages with more than one plural form (e.g. ecclesiastic Russian). Finally, they need to be converted to Machine Object (MO) files, and using these files is both cumbersome and very slow, making them fairly unsuitable for use in web software.

We are getting the best of both worlds by using both formats, and keying the PO translations against the INI language file's language key, using the PO file's optional message context (msgctxt) field. This is an approach that, to the best of our knowledge, has never been tried before by anybody else, and it works like a charm. We have managed to use easily translatable PO files to create the far more expressive, powerful and efficient INI language files using nothing but a bit of code and a generous amount of creativity.

Location

All translation files are located in the languages folder of the code repository.

File naming

The name of the translation source PO file is the language code followed by the extension .po. For example, the translation source PO file for Greek as spoken in Greece is el-GR.po.

The name of the translation INI file is the language code followed by the extension .ini. For example, the translation INI file for Greek as spoken in Greece is el-GR.ini.

The language code follows the Joomla! naming convention. It consists of the lowercase ISO 639-1 language code, followed by a dash, followed by the uppercase ISO 3166-1 alpha-2 country code. For example, the language code for German as spoken in Germany is de-DE, whereas German as spoken in Austria is de-AT.

⚠️ The country code SHOULD be one of the “Officially assigned code elements”. Using the “Reserved code elements” might confuse users.
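The naming rules above can be sketched in a few lines of Python. This is only an illustration, not part of Panopticon; the names `LANGUAGE_CODE`, `is_valid_language_code` and `translation_file_names` are hypothetical. It checks the shape of the code only and does not verify that the language or country code is officially assigned.

```python
import re
from typing import Tuple

# Two lowercase letters (ISO 639-1), a dash, two uppercase letters
# (ISO 3166-1 alpha-2). Shape check only: assignment is not verified.
LANGUAGE_CODE = re.compile(r"[a-z]{2}-[A-Z]{2}")


def is_valid_language_code(code: str) -> bool:
    """Return True if `code` has the shape described above, e.g. 'de-AT'."""
    return LANGUAGE_CODE.fullmatch(code) is not None


def translation_file_names(code: str) -> Tuple[str, str]:
    """Return the (PO, INI) file names for a language code."""
    if not is_valid_language_code(code):
        raise ValueError(f"not a valid language code: {code!r}")
    return f"{code}.po", f"{code}.ini"
```

For example, `translation_file_names("el-GR")` yields `el-GR.po` and `el-GR.ini`, whereas a bare `el` is rejected.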

Alternative language naming

It is possible to have a language code which consists only of the lowercase ISO 639-1 language code. For example, el for the Greek language, and de for the German language.

This naming form is considered deprecated, and support for it has been removed from Panopticon's interface.

The reason for this decision is that the same word, or phrase, in the same language may be formal and acceptable in one country, but may have a completely different or even derogatory meaning in the dialect of the language spoken in a different country. As a result, we discourage having a translation for a language without specifying which country's dialect of the language it refers to.

Difference from GNU GetText file naming

The GNU GetText handbook describes naming the translation files using only the lowercase ISO 639-1 language code. For example, el for the Greek language, and de for the German language.

As explained in the section above, we strongly discourage this naming and no longer support it, even if your language is only spoken in a single country (e.g. Hungarian).

INI file structure

The INI language format structure consists of zero or more lines in the form

TRANSLATION_KEY="Human-readable string"

The TRANSLATION_KEY is always in uppercase and consists of any combination of the English letters A-Z, underscore, and the numbers 0-9. It may not start with an underscore, or a number. Any other characters are considered invalid and are not guaranteed to be supported. The TRANSLATION_KEY MUST NOT have the same name as a PHP or environment constant.

The TRANSLATION_KEY is followed by zero or more whitespace (space, horizontal tab) characters, followed by an equals sign character (=), followed by zero or more whitespace (space, horizontal tab) characters, followed by the translated string.

The "Human-readable string" is the translated text. It MUST be surrounded by double quotes ("). The translated text MAY have C-style escaped sequences, most commonly \n for newlines. The translated text MUST NOT have verbatim new line characters (line feed, 0x0A, or carriage return, 0x0D).

⚠️ Even though you can technically specify the translated text without surrounding it by double quotes, this practice is DISCOURAGED and may not be supported in the future.

⚠️ DO NOT use "_QQ_" or \" in lieu of a double quote (") in the human-readable text. It is unnecessary, and it confuses the translation scripts. Use a literal double quote (") character instead.

Lines starting with a semicolon (;) are comments and are completely ignored.

Empty lines are completely ignored as well.

You MAY include sections (e.g. [Some section]) in your INI files. Section lines are ignored when parsing the file. If you use them, it's for your own reference. Panopticon's main language (English, Great Britain) DOES NOT use sections.
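The rules in this section can be summarised with a small Python sketch of a reader for this format. It is not Panopticon's actual parser (which is written in PHP); the names `LINE` and `parse_ini` are hypothetical. It also makes three simplifying assumptions: lines that do not fit the format are silently skipped, leading whitespace before a comment or key is tolerated, and only the \n escape sequence is decoded.

```python
import re

# KEY, optional spaces/tabs, "=", optional spaces/tabs, a double-quoted value.
# The greedy (.*) runs to the last double quote on the line, so literal
# double quotes inside the text are kept as they are.
LINE = re.compile(r'^([A-Z][A-Z0-9_]*)[ \t]*=[ \t]*"(.*)"[ \t]*$')


def parse_ini(text: str) -> dict:
    """Parse INI-style translation lines into a {key: string} dictionary."""
    strings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Empty lines, comments and section headers are ignored.
        if not line or line.startswith(";") or line.startswith("["):
            continue
        match = LINE.match(line)
        if match is None:
            continue  # Assumption: skip anything that does not fit the format.
        key, value = match.groups()
        # Only \n is decoded here; other escape sequences are left untouched.
        strings[key] = value.replace("\\n", "\n")
    return strings
```

Because the value is taken from the first to the last double quote on the line, a line such as `EXAMPLE_QUOTE="Click "OK" to continue"` needs no escaping, which matches the advice above to use literal double quotes.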

Plural Forms

Sometimes you want to express the same thing for zero, one, or more countable objects. This is managed using translation keys expressing plural forms.

First, you have the canonical representation of multiple items, let's say EXAMPLE_HAVE_ITEMS:

EXAMPLE_HAVE_ITEMS="I have %d items"

The %d (or %u) in such a string is a PHP sprintf() format directive, where %d formats the number as a signed integer and %u formats it as an unsigned integer.

Then, you add a specialisation for a specific number of items by appending an underscore and the corresponding number to the translation key. So, to create a special translation for exactly one item you can do:

EXAMPLE_HAVE_ITEMS_1="I have one item"

Likewise, to create a translation key for zero items you can do:

EXAMPLE_HAVE_ITEMS_0="I have no items"

Note that the %d and %u format string directives are optional. If you have a known quantity you can spell it out. In most languages this makes sense when you have exactly zero, or exactly one item. Of course, this doesn't mean that you are limited to _1 and _0 items! You could have a special formatting string for 12 and 24:

EXAMPLE_HAVE_ITEMS_12="I have a dozen items"
EXAMPLE_HAVE_ITEMS_24="I have two dozen items"

The pluralisation code in Akeeba Web Framework, the underlying framework of Panopticon, is dead simple:

  • Do I have a string for this specific number of items? Use it.
  • Otherwise, use the string without the specifier for the number of items.
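The two rules above can be sketched in Python as follows. This is an illustration of the lookup logic only, not the Akeeba Web Framework code; the function name `plural` and the dictionary-based storage are assumptions made for the example.

```python
def plural(strings: dict, key: str, count: int) -> str:
    """Pick the string for `count` items, following the two rules above."""
    specific = f"{key}_{count}"
    if specific in strings:
        # Rule 1: a string for this exact number of items wins.
        template = strings[specific]
    else:
        # Rule 2: fall back to the key without the number suffix.
        template = strings[key]
    # Python's % operator understands %d but not PHP's %u, so %u is
    # rewritten to %d for this illustration.
    if "%d" in template or "%u" in template:
        return template.replace("%u", "%d") % count
    return template  # The quantity is spelled out; nothing to substitute.


strings = {
    "EXAMPLE_HAVE_ITEMS": "I have %d items",
    "EXAMPLE_HAVE_ITEMS_0": "I have no items",
    "EXAMPLE_HAVE_ITEMS_1": "I have one item",
    "EXAMPLE_HAVE_ITEMS_12": "I have a dozen items",
}
print(plural(strings, "EXAMPLE_HAVE_ITEMS", 12))  # I have a dozen items
print(plural(strings, "EXAMPLE_HAVE_ITEMS", 5))   # I have 5 items
```

Note that the lookup is keyed on the exact number, not on a grammatical category, which is why any count can get its own string.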

PO/POT file structure

The structure of PO/POT files is explained in GNU GetText's documentation.

The only thing of note is that Panopticon treats the three main entries of each language string slightly differently than GNU GetText proper:

  • msgid (the original English string) is completely ignored.
  • msgctxt (the translation ‘context’) is the TRANSLATION_KEY of the INI language file.
  • msgstr is the translated string, just like regular GNU GetText.

Unlike GNU GetText, Panopticon's translation files DO NOT allow plural forms using msgid_plural and msgstr[0], msgstr[1], etc. Instead, plural forms are managed using the value of the msgctxt. See the Plural Forms section above.
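To make the mapping concrete, here is a deliberately simplified Python sketch that turns PO entries into INI lines keyed on msgctxt. It is not Panopticon's actual conversion script; the names `FIELD` and `po_to_ini` are hypothetical. It assumes every field fits on a single line, and it assumes that entries with an empty msgstr are left out of the output.

```python
import re

# Matches single-line fields such as: msgctxt "EXAMPLE_KEY"
FIELD = re.compile(r'^(msgctxt|msgid|msgstr)\s+"(.*)"\s*$')


def po_to_ini(po_text: str) -> str:
    """Turn PO entries into INI lines keyed on msgctxt (simplified sketch)."""
    lines = []
    context = None
    for raw in po_text.splitlines():
        match = FIELD.match(raw.strip())
        if match is None:
            continue  # Comments, blank lines and continuation lines.
        field, value = match.groups()
        if field == "msgctxt":
            context = value  # This is the INI TRANSLATION_KEY.
        elif field == "msgstr":
            # msgid is never consulted. Entries without a context (such as
            # the PO header) are skipped; skipping untranslated entries is
            # an assumption of this sketch.
            if context and value:
                # PO escapes double quotes as \"; the INI format uses
                # literal double quotes instead.
                value = value.replace('\\"', '"')
                lines.append(f'{context}="{value}"')
            context = None
    return "\n".join(lines)
```

Given an entry with msgctxt "EXAMPLE_HAVE_ITEMS_1" and msgstr "Ich habe einen Artikel", the sketch emits the line `EXAMPLE_HAVE_ITEMS_1="Ich habe einen Artikel"`, so the plural form travels in the key rather than in msgid_plural.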

Language name

There are two special strings in each translation file with the keys LANGUAGE_NAME_IN_ENGLISH and LANGUAGE_NAME_TRANSLATED. These strings MUST be translated to reflect the language you are translating into.

For example, when translating to German (Germany) you should use:

LANGUAGE_NAME_IN_ENGLISH="German - Germany"
LANGUAGE_NAME_TRANSLATED="Deutsch - Deutschland"

DO NOT leave these keys untranslated. DO NOT make these keys read a different language than what you are translating into. Failure to follow this instruction will lead to the rejection of the Pull Request. Remember, humans rely on the content of these language keys to figure out which language file they need to use.
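A translator could check this requirement before opening a Pull Request with something like the following Python sketch. It is not part of Panopticon's tooling; the function name `language_name_problems` is hypothetical, and both arguments are assumed to be plain dictionaries of translation keys to strings (the translation and the en-GB source, respectively).

```python
REQUIRED_KEYS = ("LANGUAGE_NAME_IN_ENGLISH", "LANGUAGE_NAME_TRANSLATED")


def language_name_problems(translation: dict, source: dict) -> list:
    """List problems with the two language-name keys of a translation."""
    problems = []
    for key in REQUIRED_KEYS:
        value = translation.get(key, "").strip()
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value == source.get(key, "").strip():
            # Same text as the source language: most likely left untranslated.
            problems.append(f"{key} is identical to the source language")
    return problems
```

An empty list means both keys are present and differ from the source. It cannot tell whether the names describe the correct language, so a human still needs to review them.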
