Make migrate.transforms.REPLACE work for more variants of gecko placable formats

RESOLVED FIXED

Status

defect
RESOLVED FIXED
3 years ago
9 months ago

People

(Reporter: Pike, Assigned: Pike)

Tracking

(Blocks 1 bug)

Details

(Whiteboard: [gecko-l20n])

Attachments

(2 attachments)

REPLACE currently is very limited to literal matches of placables in the source code to create results.

We'll need to enhance that to work with all the various flavors of placables in particular in properties.

Taking this out of the initial work on the python module in bug 1316640 to get to something landable.
What formats should we support? I'm aware of StringBundle.getFormattedString's %S (and %1$S etc.). I also saw #1, #2 in files I worked with. What are those? Do they get replaced manually in the calling code?

Right now REPLACE takes a dict of literal specifiers to target FTL expression. I wonder if we should also allow an iterable of FTL expressions as an argument. The order would matter and we'd run a logic similar to that of StringBundle.getFormattedString to replace %S etc.
I can think of at least:
* %1$S, %S, $0.S
* %d (devtools)
* {{variable}} (PDF Viewer but also devtools)
* $BrandShortName (in installer properties)
Thanks. Am I right thinking that these:

> * {{variable}} (PDF Viewer but also devtools)
> * $BrandShortName (in installer properties)

Are fully handled by a map of replacements as implemented right now?

IIUC, the challenge is with the %S specifiers which may read %S in en-US but %2$S in the localization. This breaks in the map approach.
(In reply to Staś Małolepszy :stas from comment #3)
> Thanks. Am I right thinking that these:
> 
> > * {{variable}} (PDF Viewer but also devtools)
> > * $BrandShortName (in installer properties)
> 
> Are fully handled by a map of replacements as implemented right now?

I can't answer about that, I'm not really familiar with the migration code.

To answer the other part of your question, #1, #2 etc. are normally used in plural forms
https://transvision.mozfr.org/?recherche=%231&repo=gecko_strings&sourcelocale=en-US&locale=en-US&search_type=strings_entities

But there seem to be some weird cases, where those placeholders are replaced
https://hg.mozilla.org/l10n/gecko-strings/file/default/browser/chrome/browser/browser.properties#l404
https://hg.mozilla.org/l10n/gecko-strings/file/default/mobile/android/chrome/browser.properties#l69
http://searchfox.org/mozilla-central/rev/1ebd2eff44617df3b82eea7d2f3ca1b60cc591a0/mobile/android/chrome/content/browser.js#5139
(In reply to Francesco Lodolo [:flod] from comment #4)

> I can't answer about that, I'm not really familiar with the migration code.

The current code will look for a literal string "{{variable}}" or "$BrandShortName" and will replace it with some FTL expression as defined in teh migration.

> But there seem to be some weird cases, where those placeholders are replaced
> https://hg.mozilla.org/l10n/gecko-strings/file/default/browser/chrome/
> browser/browser.properties#l404
> https://hg.mozilla.org/l10n/gecko-strings/file/default/mobile/android/chrome/
> browser.properties#l69
> http://searchfox.org/mozilla-central/rev/
> 1ebd2eff44617df3b82eea7d2f3ca1b60cc591a0/mobile/android/chrome/content/
> browser.js#5139

Thanks for these examples. As long as localizers can't do anything funky to the specifier, i.e. change #1 to #1s or something, the mapping will work fine. The developer will say: replace #1 with FTL.Placeable(...) and in all locales, #1 will be found and replaced.

My understanding is that the above is not true for %S which can be spelled differently, e.g. as %1$S, in the localized files.
(In reply to Staś Małolepszy :stas from comment #5)
> My understanding is that the above is not true for %S which can be spelled
> differently, e.g. as %1$S, in the localized files.

Yes. Localizers are free to swap ordered/unordered placeholders, or use the zero-width trick to silence compare-locales warnings.
https://transvision.mozfr.org/?recherche=0.S&repo=gecko_strings&sourcelocale=en-US&locale=it&search_type=strings_entities
https://transvision.mozfr.org/?recherche=%251%240.S&repo=gecko_strings&sourcelocale=en-US&locale=it&search_type=strings_entities
Thanks again, this is exactly the kind of input I needed. This looks pretty complicated, too. Can there be both a %S and %1$S in one string?
(In reply to Staś Małolepszy :stas from comment #7)
> Thanks again, this is exactly the kind of input I needed. This looks pretty
> complicated, too. Can there be both a %S and %1$S in one string?

No, that's the only limit: you can't mix ordered and unordered placeholders in the same string.
Posted patch TestsSplinter Review
I wrote a few tests to assert my understanding of the problem. Would you be able to take a look at them and check if they look right?

The setUp method mocks the legacy localized content from which the migration will source translations. Then, each test defines a migration spec (msg = ...) which in real-life would be written by the developer or l10n-drivers.
Assignee: nobody → stas
Status: NEW → ASSIGNED
Attachment #8923789 - Flags: feedback?(l10n)
Attachment #8923789 - Flags: feedback?(francesco.lodolo)
Comment on attachment 8923789 [details] [diff] [review]
Tests

Review of attachment 8923789 [details] [diff] [review]:
-----------------------------------------------------------------

It seems to make sense to me.

::: tests/migrate/test_replace.py
@@ +152,5 @@
> +    def setUp(self):
> +        self.strings = parse(PropertiesParser, '''
> +            implicit = %S %S
> +            explicit = %2$S %1$S
> +            zero-width-implicit = %S%0.S

Add space: %S %0.S

@@ +153,5 @@
> +        self.strings = parse(PropertiesParser, '''
> +            implicit = %S %S
> +            explicit = %2$S %1$S
> +            zero-width-implicit = %S%0.S
> +            zero-width-explicit = %2$S%1$0.S

Add space here too (consistency with the non zero width example): %2$S %1$0.S

@@ +212,5 @@
> +
> +        self.assertEqual(
> +            evaluate(self, msg).to_json(),
> +            ftl_message_to_json('''
> +                implicit = { $one } %S

Possibly unrelated to the tests, but we should raise a warning if we find ourselves in this kind of situation during migration.
Attachment #8923789 - Flags: feedback?(francesco.lodolo) → feedback+
I modeled the %0.S examples after https://transvision.mozfr.org/?recherche=0.S&repo=gecko_strings&sourcelocale=en-US&locale=it&search_type=strings_entities. If we add the space the end result will be "{ $one } " with the trailing whitespace.

Also, the way this will be migrated will trigger compare-locales warnings for missing placeables. I don't know how to fix it but I can file a bug for this to start the discussion.
(In reply to Staś Małolepszy :stas from comment #11)
> I modeled the %0.S examples after
> https://transvision.mozfr.org/?recherche=0.
> S&repo=gecko_strings&sourcelocale=en-
> US&locale=it&search_type=strings_entities. If we add the space the end
> result will be "{ $one } " with the trailing whitespace.

True, but normally there would be other stuff in the middle. I think it's worth having a trailing whitespace, or maybe adding a word in the middle to making the test clearer?.

> Also, the way this will be migrated will trigger compare-locales warnings
> for missing placeables. I don't know how to fix it but I can file a bug for
> this to start the discussion.

Interesting. Unrelated question: wow do I avoid using a placeable and at the same time, not having compare-locales yelling at me?
(In reply to Francesco Lodolo [:flod] from comment #12)

> Interesting. Unrelated question: wow do I avoid using a placeable and at the
> same time, not having compare-locales yelling at me?

I feel like this is a very related question :) There's no way to do it in Fluent right now. We might want to use semantic comments for this (https://github.com/projectfluent/fluent/issues/16) in the future.
Comment on attachment 8923789 [details] [diff] [review]
Tests

Review of attachment 8923789 [details] [diff] [review]:
-----------------------------------------------------------------

To add to what flod said, we shouldn't create strings with printf params (if we have a printf substitution, that is).

Actually, I think we should either:

- not migrate this string at all
- create a proper fluent string with an unknown placable, creating a corresponding fluent c-l error.

implicit = { $one } { $fixme }

could be one.
Attachment #8923789 - Flags: feedback?(l10n) → feedback+
No longer blocks: fluent.migrate
Component: Python Library → Fluent Migration
Product: L20n → Localization Infrastructure and Tools
Summary: Make l20n.migrate.transforms.REPLACE work for more variants of gecko placable formats → Make migrate.transforms.REPLACE work for more variants of gecko placable formats
From previous discussion with Stas, a good approach would be to normalize both source and target strings to use ordered arguments.

During normalization we should also remove zero-width arguments, e.g. (%0.S, %1$0.S)

Impossibile connettersi a Sync. Riprovare?%0.S
Non è stato possibile installare il motore di ricerca da: %2$S%1$0.S
Taking this. Also, not as simple as I had hoped, as bug 1496127 broke replacing numbered printfs (re.find("$") does weird things.)
Assignee: stas → l10n
%S %.0S %S -> %1$S  %3$S

We normalize un-numbered printf arguments to numbered ones,
with just keeping the spec from the original format.
If the width was '0', then the argument was hidden in printf,
replicate that by removing the parameter completely.

Side-riding fix for a regression from bug 1496127, we need to
re.escape to find literal text fragments, otherwise things
like '$' end up being regex tokens.
Landed as https://hg.mozilla.org/l10n/fluent-migration/rev/635c8a4d83251ea18f4dd1e9c8fc5850df22e121.
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.