Closed Bug 1616056 Opened 6 years ago Closed 6 years ago

Enable trim by default in transforms

Categories

(Localization Infrastructure and Tools :: Fluent Migration, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: flod, Assigned: stas)

Details

Attachments

(1 file)

Right now we only trim whitespaces when the parameter is explicitly set. The reality is that we should keep whitespaces in very few cases (DTD concatenations), most of the whitespaces existing in localizations are artifact from old tools (Pootle).

A few examples
https://hg.mozilla.org/l10n-central/mai/log?rev=Remove+unnecessary+leading%2Ftrailing+whitespaces

https://hg.mozilla.org/l10n-central/ml/log?rev=Remove+unnecessary+leading%2Ftrailing+whitespaces

It also affects .properties, not just DTDs.

We should set the default for trimming to True, and update the documentation accordingly.

Any chance to prioritize this? It should be relatively straightforward to fix, and right now I'm just fixing migration by hands after they land (no point in teaching folks to add trim, when we plan to remove it).

I started looking into this last week. The change to the implementation is simple, but there are a lot of tests which will need to be updated, as well as the docs.

(In reply to Staś Małolepszy :stas from comment #2)

I started looking into this last week. The change to the implementation is simple, but there are a lot of tests which will need to be updated, as well as the docs.

How does this get prioritized against other bugs in this component? When will you have cycles to address it?

Bugzilla's "triage owner" being me is misleading and doesn't represent the status quo. I haven't worked on fluent-migration for a long time now, and I don't feel comfortable making prioritization calls. I looked at this bug because Flod raised the issue during the weekly call, and I wanted to help out.

Fair enough. Who is the owner? Is that Zibi? The shortlog would have me believe that it's Axel. Is that accurate? https://hg.mozilla.org/l10n/fluent-migration/shortlog

Yeah, I'll defer to Axel for prioritization. I'll be happy to make time for fixing this even tomorrow morning, if it's urgent.

Assignee: nobody → stas
Status: NEW → ASSIGNED

I'm concerned about the API surface. I know we're talking about just a single boolean param, but ideally, we'd want to write our tests twice to account for the different possible values of trim. Pike suggested that CONCAT could disable trim automatically, and I think this would be a good path forward if we could completely remove trim along the way. Flod pointed out that there are rare strings like https://transvision.flod.org/string/?entity=browser/browser/sanitize.ftl:clear-time-duration-prefix.value&repo=gecko_strings which aren't concatenated but which still require trim, which made me think that we can't remove trim completely.

Today, I've had two more thoughts:

  • If those edge-cases are very rare, would fixing them manually be an acceptable alternative to keeping the trim parameter?
  • Or, would CONCAT(COPY(...)) be an acceptable solution for the one-offs, assuming that CONCAT would always turn trim to False? It would be the same as COPY(..., trim=False).
Flags: needinfo?(l10n)
Flags: needinfo?(francesco.lodolo)

(In reply to Staś Małolepszy :stas from comment #8)

  • If those edge-cases are very rare, would fixing them manually be an acceptable alternative to keeping the trim parameter?

What are you thinking of with "fix manually"?

  • Or, would CONCAT(COPY(...)) be an acceptable solution for the one-offs, assuming that CONCAT would always turn trim to False?

Are we using trim on CONCAT() at all?
https://github.com/flodolo/fluent-migrations/search?q=CONCAT&unscoped_q=CONCAT

I think we normally concatenate text elements, or COPY() and REPLACE() where we can set trim.

Flags: needinfo?(francesco.lodolo)

(In reply to Francesco Lodolo [:flod] from comment #9)

What are you thinking of with "fix manually"?

I meant the same process as in https://hg.mozilla.org/l10n-central/mai/log?rev=whitespace. I understand this isn't optimal, but I'm trying to gauge how often we'd need these manual interventions. If it's a once-every-quarter event, then I'd like to suggest we consider that as an option, too.

Are we using trim on CONCAT() at all?
https://github.com/flodolo/fluent-migrations/search?q=CONCAT&unscoped_q=CONCAT

I think we normally concatenate text elements, or COPY() and REPLACE() where we can set trim.

What I meant was that we could remove trim completely from the public API, and then:

  • COPY always sets trim=True internally.
  • CONCAT always changes its children to trim=False.
  • Consequently, CONCAT(COPY) would disable trimming in the COPY (just like COPY(..., trim=False) does right now), which could be used as a solution for weird one-offs.

(In reply to Staś Małolepszy :stas from comment #10)

(In reply to Francesco Lodolo [:flod] from comment #9)

What are you thinking of with "fix manually"?

I meant the same process as in https://hg.mozilla.org/l10n-central/mai/log?rev=whitespace. I understand this isn't optimal, but I'm trying to gauge how often we'd need these manual interventions. If it's a once-every-quarter event, then I'd like to suggest we consider that as an option, too.

Then no, it's not an option. We're talking about manually changing strings for 150 repositories, the whitespace issues used to happen to maybe 4 or 5, occasionally.

What I meant was that we could remove trim completely from the public API, and then:

  • COPY always sets trim=True internally.
  • CONCAT always changes its children to trim=False.
  • Consequently, CONCAT(COPY) would disable trimming in the COPY (just like COPY(..., trim=False) does right now), which could be used as a solution for weird one-offs.

That seems really not intuitive, I'd personally prefer keeping the trim parameter.

Thanks for getting the patch up. It's interesting to see the impact that inheritance has here.

Also, I think the patch clarifies that we really need he special handling on CONCAT.

I personally think that we should keep the power of trim, but with better defaults than we used to.

I agree that the testing of this is uncomfortable. I want to dig in a bit deeper on that later, let's let quarantine figure out when.

Keeping the NI open on that.

(In reply to Francesco Lodolo [:flod] from comment #11)

Then no, it's not an option. We're talking about manually changing strings for 150 repositories, the whitespace issues used to happen to maybe 4 or 5, occasionally.

Thanks, it's good to know that this isn't an option.

That seems really not intuitive, I'd personally prefer keeping the trim parameter.

The point I'm trying to make is that a solution for very rare situations doesn't need to be intuitive. It needs to be available :)

(In reply to Axel Hecht [:Pike] from comment #12)

Also, I think the patch clarifies that we really need he special handling on CONCAT.

Yes, at first I suggested being explicit and requiring trim=False in all CANCATs, but this patch made me realize that this would be error-prone.

I agree that the testing of this is uncomfortable. I want to dig in a bit deeper on that later, let's let quarantine figure out when.

If we keep trim, we'd probably want to have two (or three?) variants of most tests: default, trim=False, trim=True. When transforms are nested, we end up with a quadratic number of test cases.

The CONCAT(COPY()) idea has the benefit of applying the same semantics consistently in every situtation. We'd just need to test COPY, REPLACE etc. once, assuming trim=True, and test CONCAT once as well, knowing that it flips trim for its children.

(In reply to Staś Małolepszy :stas from comment #13)

If we keep trim, we'd probably want to have two (or three?) variants of most tests: default, trim=False, trim=True. When transforms are nested, we end up with a quadratic number of test cases.

Sorry, that's not correct. I forgot CONCAT itself doesn't take kwargs. So it's not quadratic, but rather linear (x2 or x3 for some test files).

I think we should trust unit tests here. Test that transforms do what they do, with default trim. Test that COPY handles trim right. Test that CONCAT treats trim right (not even by transforming, but just by inspecting c.elements).

Flags: needinfo?(l10n)

Let me ask a different question, and hopefully we can make a decision here and land this :)

What are the advantages of keeping trim compared to my CONCAT(COPY()) proposal, given that—IIUC—there has so far been a single case where an explicit COPY(trim=False) outside CANCAT would have been useful?

I'm not fond of CONCAT(COPY()). I think it's hard to explain. Or to rephrase that, my brain considers that to be a bug with side-effects, not a feature. We also declared CONCAT to be a persona non grata in transforms_from.

I updated the patch in Phabricator to use None as the default for trim. CONCAT sets it to False, unless it's been explicitly set in COPY to a non-None value.

I filed bug 1637176 and bug 1637182 about the issues I found when working on this.

Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: