Closed Bug 1453557 Opened 6 years ago Closed 4 years ago

It should be possible to migrate strings using FTL files as source

Categories

(Localization Infrastructure and Tools :: Fluent Migration, enhancement, P3)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: flod, Assigned: Pike)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

Right now, migration recipes can migrate strings from .properties and DTD files to FTL.

We should extend support to allow FTL files to be used as source, to move strings around as needed.
Full-on FTL to FTL migrations are significantly harder to do than DTD/properties to FTL because Fluent messages are asymmetric. This design makes it hard (or impossible) to reliably build new messages by taking parts of existing messages. OTOH it's likely that we don't actually need the advanced features of DTD/properties migrations. For instance, I don't expect the PLURALS transform to be useful.

That said, as soon as we have some migration support for FTL->FTL, we're likely to see requests to allow renaming external arguments, renaming message references etc. I'd like to discuss this bug in three layers of scope, in order of feasibility:

  1. Moving FTL messages between files, without making any changes to them.
  2. Building new FTL messages out of unmodified values or attributes of existing FTL messages.
  3. Building new FTL messages by modifying values and attributes of existing FTL messages.

A few thoughts:

  - I think we should focus on #1 for now.
  - Is #2 even useful without #3?
  - #3 is the hardest and I admit I'm not sure to what extent it should allow modifying the existing messages.
    - Scanning the AST for all occurrences of ExternalArgument or MessageReference is easy, so simple renames should be OK to do.
    - In bug 1453480, Zibi would like to change

          search-results-need-help = Need help? Visit <a>{ -brand-short-name } Support</a>

      to

          search-results-need-help = Need help? Visit <a data-l10n-name="url">{ -brand-short-name } Support</a>

      This would require some search & replace in TextElements functionality.
    - More complex transformations, in particular those changing the shape of messages, shouldn't be possible. For instance, it shouldn't be possible to transform a simple text value into a SelectExpression because some translations might already be using SelectExpressions in the source message. We can't make any assumption about the shape of the AST in Fluent.
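
As a rough illustration of the two "safe" operations above (renaming references and search & replace in text), here is a minimal sketch on a toy AST. The classes `Text`, `VariableRef` and `Select` and the helpers `walk`, `rename_variable` and `replace_text` are hypothetical stand-ins invented for this example; they are not the python-fluent or fluent.migrate API. The point is only that both operations visit every branch and never change the shape of the message.

```python
from dataclasses import dataclass


# Toy AST nodes, hypothetical and much simpler than the real Fluent AST.
@dataclass
class Text:
    value: str


@dataclass
class VariableRef:
    name: str


@dataclass
class Select:
    selector: VariableRef
    variants: dict  # variant key -> pattern (list of elements)


def walk(pattern):
    """Yield every element of a pattern, descending into all variants."""
    for element in pattern:
        yield element
        if isinstance(element, Select):
            yield element.selector
            for variant in element.variants.values():
                yield from walk(variant)


def rename_variable(pattern, old, new):
    """Rename every reference to `old`, whatever the message shape is."""
    for element in walk(pattern):
        if isinstance(element, VariableRef) and element.name == old:
            element.name = new


def replace_text(pattern, search, replacement):
    """Search & replace inside text elements only, in all branches."""
    for element in walk(pattern):
        if isinstance(element, Text):
            element.value = element.value.replace(search, replacement)


# A localization that happens to use a select expression in the source.
pattern = [
    Select(VariableRef("num"), {
        "one": [Text("Visit <a>Support</a> for "), VariableRef("num"), Text(" tab")],
        "other": [Text("Visit <a>Support</a> for "), VariableRef("num"), Text(" tabs")],
    })
]
rename_variable(pattern, "num", "tabCount")
replace_text(pattern, "<a>", '<a data-l10n-name="url">')
```

Because the helpers only touch leaf values, the same calls work whether a translation is a flat text or a nested select.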
I think we should be use-case driven here. If we don't have a use-case for #1, maybe it's not worth focusing on? As in, if #3 is what we need, but we don't have resources to get there, maybe we shouldn't do #1 either for now.

We should look at this in the light of bug 1452107, which talks about making recipes look like ftl files. The challenge there is already about plurals, if we talk about AST transforms, things don't get much prettier.

That said, a few thoughts:

We can reason pretty strongly about the shape of a message we're transforming. In the end, we know it's a pattern, and that means that in the generated message, we can stuff the origin in a variant. That might not look pretty, but it's a valid message. We might end up doing a select expression on OS inside a select expression on OS, but that's OK. In a second step, we could think about ways to canonicalize messages, but I think that's an option.
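
To make the "stuff the origin in a variant" idea concrete, here is a small sketch on a toy model. `Text`, `Select` and `wrap_in_variant` are hypothetical names made up for this example, and the variant contents are invented; this is not how fluent.migrate represents messages. It only shows that an existing pattern of unknown shape can be placed, unmodified, inside a variant of a new select expression.

```python
from dataclasses import dataclass


# Toy nodes, hypothetical; a pattern is a list of elements.
@dataclass
class Text:
    value: str


@dataclass
class Select:
    selector: str
    variants: dict  # variant key -> pattern
    default: str


def wrap_in_variant(source_pattern, selector, key, other_variants):
    """Build a new pattern whose `key` variant is the untouched source pattern."""
    variants = dict(other_variants)
    variants[key] = source_pattern
    return [Select(selector, variants, default=key)]


# The existing translation already selects on the platform.
localized = [
    Select("PLATFORM()", {
        "macos": [Text("Preferences")],
        "other": [Text("Options")],
    }, default="other")
]

# The generated message nests it: a select on OS inside a select on OS.
migrated = wrap_in_variant(
    localized, "PLATFORM()", "other", {"windows": [Text("Settings")]}
)
```

The result is not pretty, but it is well-formed in this model, and a later canonicalization step could flatten it.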

If we're talking about transforming content in the message, we need a better way to get to it. This immediately brings me back to fluent.query or so ;-)
(In reply to Axel Hecht [:Pike] from comment #2)
> I think we should be use-case driven here. If we don't have a use-case for
> #1, maybe it's not worth focusing on? 

We already have at least two use cases:
* For #1, bug 1453486.
* For #3, bug 1453480.
Thanks Stas for a great intro to the problem scope.

I agree that #1 is the most important one (and likely the easiest?). Having this ability would reduce the significant role that l10n plays in preventing code refactors, which I find very concerning.

For #2/#3 I agree with Pike - we should aim for case-by-case simple functionality. I don't have a good vision for what the API should look like, but I agree that it should not attempt to operate on the message shape (since it's asymmetric) and instead perform tasks, such as:
 - search_and_replace in the message - scan the value/attribute walking all branches for FTL.TextElement that contains the search part and replace.
 - similar search_and_replace for external argument names
 - remove an attribute
 - add an attribute

Just that would be a great start
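
A minimal sketch of the last two tasks in that list, on a toy message model. `Message`, `remove_attribute` and `add_attribute` are hypothetical names for illustration only (patterns are plain strings here), not an existing fluent.migrate API. Both helpers leave the value untouched and return a new message.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


# Toy message model, hypothetical: patterns are plain strings.
@dataclass(frozen=True)
class Message:
    id: str
    value: Optional[str] = None
    attributes: dict = field(default_factory=dict)


def remove_attribute(message, name):
    """Return a copy of the message without the named attribute."""
    remaining = {k: v for k, v in message.attributes.items() if k != name}
    return replace(message, attributes=remaining)


def add_attribute(message, name, pattern):
    """Return a copy with a new attribute; refuse to overwrite an existing one."""
    if name in message.attributes:
        raise ValueError(f"{message.id} already has attribute {name!r}")
    return replace(message, attributes={**message.attributes, name: pattern})


button = Message("save-button", "Save", {"accesskey": "S", "tooltiptext": "Save file"})
without_tooltip = remove_attribute(button, "tooltiptext")
with_title = add_attribute(without_tooltip, "title", "Save file")
```

Refusing to overwrite in `add_attribute` is a design choice for this sketch: a localization may already carry an attribute the migration does not know about.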
Most of our code-refactor requests are well outside of the fluent scope, like Android DTDs and such. To me, #1 isn't a good use-case for fluent.migrate.

I think that the #1 use-case should be handled outside of fluent.migrate, and try to build on compare-locales and the learnings from x-channel merge for serialization.
> Most of our code-refactor requests are well outside of the fluent scope

I don't understand. We just started migrating Firefox to Fluent so I don't know how fluent code-refactor requests could have been accrued in the past.

Looking into the future, as we migrate more of Firefox to Fluent, we have a chance to make the Fluent ecosystem in Firefox be free of that burden or maintain the status quo from DTD/properties.

Are you saying that in the past DTD/properties requests in Firefox weren't common? Because my experience of migrating Preferences indicates that a lot of the code around l10n seemed to be an atavism from the old Preferences structure which was then refactored to the new one (css+js+xul+xbl) except for everything around l10n due to lack of #1.

And that results in a pretty ugly tech debt that we now remove with Fluent migration, but will lose after that unless we solve #1 in the context of Firefox.
We're not going to benefit from fluent.migrate for refactorings in strings.xml for example. We're not going to benefit from fluent.migrate in XLIFF for iOS projects. Who knows what we'll end up with in mozilla.org, but for now, .lang is not going to benefit. And the way that fluent.migrate is designed, it won't adapt to those. The way compare-locales is designed, they will, as we grow formats supported by compare-locales.
(In reply to Axel Hecht [:Pike] from comment #5)
> Most of our code-refactor requests are well outside of the fluent scope,
> like Android DTDs and such. To me, #1 isn't a good use-case for
> fluent.migrate.
> 
> I think that the #1 use-case should be handled outside of fluent.migrate,
> and try to build on compare-locales and the learnings from x-channel merge
> for serialization.

If I understand correctly, you're not saying "let's not do this", but "let's do it elsewhere". Is it correct?

I honestly don't have a strong preference on where to do it, as long as we do, but it still sounds really confusing that we migrate X to FTL in fluent-migrate, and Y/W/Z in a different place.
Thanks Axel! I consulted with Stas on this and he explained the value of switching to x-channel algorithm a bit more to me. That sounds good.

My main concern now is timing. I don't have visibility into how much time it would take to add #1 in fluent.migrate vs. if we blocked on x-channel algorithm.
I'd like us to be able to take such things into account when discussing the value.

If adding #1 to fluent.migrate is a simple change (which I naively would hope it is, but don't know), and building #1 on top of compare-locales is a long term serious effort, then I'd like to suggest getting the FTL-only #1 first.
If, on the other hand, #1 in fluent.migrate is also a moderate to high volume of work that will not end up being salvageable when we switch to compare-locales, then I agree that it's not worth it.

In other words. From the perspective of Firefox engineering, getting #1 ASAP has a much higher value than getting the same for other formats and other operations (like search_and_replace).
I do think that doing #1 is a significant amount of work. Doing it wrong will break #2 and #3, as well as doing it outside of FTL, so we'll need to at least sketch how to do the other things. My assumption is that this requires face-to-face time between me and stas.

I also think that it's not urgent right now, as for the use-cases we have right now, we've been able to migrate from DTD/properties again.
We hashed it out on IRC with Axel. According to https://wiki.mozilla.org/Bug_Triage/Process/Triage#Weekly_or_More_Frequently_.28depending_on_the_component.29 this falls into P3 because we don't have capacity to handle that in any short timeframe either by extending fluent.migrate or via compare-locales.

If I observe an increase of scenarios where #1 is needed for FTL only in the meantime, I may have to write a python-fluent one-off script to just move messages between files on my own.
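
For reference, such a one-off script could look roughly like the sketch below. It is a text-level approximation using only the standard library, not python-fluent: the `move_messages` helper and `ENTRY_START` pattern are hypothetical, it handles messages only (not terms), it leaves comments in the source file, and it does not cope with blank lines inside multiline patterns. A real script should parse and serialize with a proper Fluent parser.

```python
import re

# A message starts at column 0 with an identifier followed by "=".
ENTRY_START = re.compile(r"^([a-zA-Z][a-zA-Z0-9_-]*) *=")


def move_messages(source, target, ids):
    """Move the messages named in `ids` from `source` text to the end of `target`.

    Returns the new (source, target) texts. Messages are copied verbatim,
    including their indented continuation lines and attributes.
    """
    kept, moved = [], []
    bucket = kept
    for line in source.splitlines(keepends=True):
        match = ENTRY_START.match(line)
        if match:
            # A new message starts: decide where it and its continuations go.
            bucket = moved if match.group(1) in ids else kept
        elif not line.startswith(" "):
            # Comments, blank lines and anything unindented stay in the source.
            bucket = kept
        bucket.append(line)
    new_target = target
    if moved:
        if new_target and not new_target.endswith("\n"):
            new_target += "\n"
        new_target += "".join(moved)
    return "".join(kept), new_target


source = "a = A\nb = B\n    .title = T\n\n# note\nc = C\n"
target = "x = X\n"
new_source, new_target = move_messages(source, target, {"b"})
```

Running the same function over every locale's copy of the two files would give the "move without changes" behavior of scope #1.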
Priority: -- → P3
Let me put on record that re-migrating strings doesn't come for free. 

It involves me writing scripts, comparing existing FTL values with the result of the re-migration, and fixing strings in Pontoon. 

All that in order to avoid asking localizers to fix strings a second time, if they didn't fix the original strings in DTDs and properties after the first migration (they should have, but I can't run after them with a stick). And it's not uncommon to see fixes, because migrations resurface strings that haven't been looked at for 10 years (I found a typo myself).

It really bothers me that all discussions seem to ignore that, just because it doesn't come out as "let's find resources to code this". 

That's the second time we need this in a week, and I have no way to tell if that's an exception.
I totally agree with you Flod. I'm sorry we didn't put it more explicitly but I believe in the whole discussion we do recognize the cost of lack of FTL->FTL migration, and the per-case cost of using current fluent.migrate to workaround missing components of our stack.

From what Axel is saying, this is not a new problem; it's been raised and requested by the engineering teams for a long time and we never managed to address it. The only difference is that we now have a new l10n system, and expectations for it are higher, including "this will resolve all the major woes from the past", so it's additionally disappointing when we encounter cases where we don't have the resources to prioritize and address it properly.

> That's the second time we need this in a week, and I have no way to tell if that's an exception.

It is not. The whole Fluent migration is going to be filled with such refactors, and quite frankly, would also be much, much easier if we would have the FTL->FTL migrations because now every decision in DTD->FTL is a huge one because it sets in stone things we may not fully understand yet, but we won't be able to fix later.
I already see places where the code moved around and l10n stayed because of the lack of it, and now I'm in a pickle because I'm migrating the scope of the code, but some strings should really be in different places.

I'm afraid that there's a point at which we'll have to consider the value of preserving a couple of strings vs. the cost of accruing technical debt by keeping those l10n strings in a wrong component linked to the new component.
And I don't think that historically we've done a great job at being able to balance those value/cost scenarios, so I'm worried about opening up that Pandora's box, but, like you, I see the collision as inevitable and coming at us much sooner than we would like.
Depends on: 1456499

For the record, we have use cases for #2 (from comment 1): Building new FTL messages out of unmodified values or attributes of existing FTL messages.

In both cases we'd need to move a string from a value attribute to the message value (as is)
https://phabricator.services.mozilla.com/D18489
https://phabricator.services.mozilla.com/D17964
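
A toy sketch of that operation, assuming a simplified message model with string patterns. `Message` and `promote_attribute` are hypothetical names for this example, not fluent.migrate API, and the message ID and text are invented. The attribute's pattern is copied as is and never inspected, which is what keeps this within scope #2.

```python
from dataclasses import dataclass, field
from typing import Optional


# Toy message model, hypothetical: patterns are opaque strings.
@dataclass
class Message:
    id: str
    value: Optional[str] = None
    attributes: dict = field(default_factory=dict)


def promote_attribute(message, name):
    """Return a new message whose value is the named attribute's pattern."""
    if message.value is not None:
        raise ValueError(f"{message.id} already has a value")
    pattern = message.attributes[name]  # raises KeyError if the attribute is missing
    remaining = {k: v for k, v in message.attributes.items() if k != name}
    return Message(message.id, pattern, remaining)


old = Message("check-updates", None, {"value": "Check for updates", "accesskey": "C"})
new = promote_attribute(old, "value")
```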

I would expect this to happen more frequently as we move more content to Fluent.

We have another couple of cases
https://phabricator.services.mozilla.com/D23950 (4 strings)

Bug 1532651 doesn't have a patch, but they're discussing changing all checkboxes…

Bug 1536507 resolved our initial story #2, that is, we can copy patterns around, and with that, also copy Messages. We can't copy Terms, as they have unknown attributes.

Is that good enough to resolve this bug?

We might still need some sort of REPLACE()? For example, if we decide to change the syntax for overlays, we're pretty much losing all existing strings.

This only has the bare core in transforms, with examples on
how to use it in test_transform_pattern.

We should collect some actual examples in practice before
declaring some of them to be an actual API. In the meantime,
recipes can do what the tests do, and just implement the
NodeTransformer bits to do their magic.

Assignee: nobody → l10n
Status: NEW → ASSIGNED

Gonna cut a release with this soon.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED