Enable trim by default in transforms
Categories
(Localization Infrastructure and Tools :: Fluent Migration, task)
Tracking
(Not tracked)
People
(Reporter: flod, Assigned: stas)
Details
Attachments
(1 file)
Right now we only trim whitespaces when the parameter is explicitly set. The reality is that we should keep whitespaces in very few cases (DTD concatenations), most of the whitespaces existing in localizations are artifact from old tools (Pootle).
A few examples
https://hg.mozilla.org/l10n-central/mai/log?rev=Remove+unnecessary+leading%2Ftrailing+whitespaces
https://hg.mozilla.org/l10n-central/ml/log?rev=Remove+unnecessary+leading%2Ftrailing+whitespaces
It also affects .properties, not just DTDs.
We should set the default for trimming to True, and update the documentation accordingly.
| Reporter | ||
Comment 1•6 years ago
|
||
Any chance to prioritize this? It should be relatively straightforward to fix, and right now I'm just fixing migration by hands after they land (no point in teaching folks to add trim, when we plan to remove it).
| Assignee | ||
Comment 2•6 years ago
|
||
I started looking into this last week. The change to the implementation is simple, but there are a lot of tests which will need to be updated, as well as the docs.
Comment 3•6 years ago
|
||
(In reply to Staś Małolepszy :stas from comment #2)
I started looking into this last week. The change to the implementation is simple, but there are a lot of tests which will need to be updated, as well as the docs.
How does this get prioritized against other bugs in this component? When will you have cycles to address it?
| Assignee | ||
Comment 4•6 years ago
|
||
Bugzilla's "triage owner" being me is misleading and doesn't represent the status quo. I haven't worked on fluent-migration for a long time now, and I don't feel comfortable making prioritization calls. I looked at this bug because Flod raised the issue during the weekly call, and I wanted to help out.
Comment 5•6 years ago
|
||
Fair enough. Who is the owner? Is that Zibi? The shortlog would have me believe that it's Axel. Is that accurate? https://hg.mozilla.org/l10n/fluent-migration/shortlog
| Assignee | ||
Comment 6•6 years ago
•
|
||
Yeah, I'll defer to Axel for prioritization. I'll be happy to make time for fixing this even tomorrow morning, if it's urgent.
| Assignee | ||
Comment 7•6 years ago
|
||
Updated•6 years ago
|
| Assignee | ||
Comment 8•6 years ago
|
||
I'm concerned about the API surface. I know we're talking about just a single boolean param, but ideally, we'd want to write our tests twice to account for the different possible values of trim. Pike suggested that CONCAT could disable trim automatically, and I think this would be a good path forward if we could completely remove trim along the way. Flod pointed out that there are rare strings like https://transvision.flod.org/string/?entity=browser/browser/sanitize.ftl:clear-time-duration-prefix.value&repo=gecko_strings which aren't concatenated but which still require trim, which made me think that we can't remove trim completely.
Today, I've had two more thoughts:
- If those edge-cases are very rare, would fixing them manually be an acceptable alternative to keeping the
trimparameter? - Or, would
CONCAT(COPY(...))be an acceptable solution for the one-offs, assuming thatCONCATwould always turntrimtoFalse? It would be the same asCOPY(..., trim=False).
| Reporter | ||
Comment 9•6 years ago
|
||
(In reply to Staś Małolepszy :stas from comment #8)
- If those edge-cases are very rare, would fixing them manually be an acceptable alternative to keeping the
trimparameter?
What are you thinking of with "fix manually"?
- Or, would
CONCAT(COPY(...))be an acceptable solution for the one-offs, assuming thatCONCATwould always turntrimtoFalse?
Are we using trim on CONCAT() at all?
https://github.com/flodolo/fluent-migrations/search?q=CONCAT&unscoped_q=CONCAT
I think we normally concatenate text elements, or COPY() and REPLACE() where we can set trim.
| Assignee | ||
Comment 10•6 years ago
|
||
(In reply to Francesco Lodolo [:flod] from comment #9)
What are you thinking of with "fix manually"?
I meant the same process as in https://hg.mozilla.org/l10n-central/mai/log?rev=whitespace. I understand this isn't optimal, but I'm trying to gauge how often we'd need these manual interventions. If it's a once-every-quarter event, then I'd like to suggest we consider that as an option, too.
Are we using trim on CONCAT() at all?
https://github.com/flodolo/fluent-migrations/search?q=CONCAT&unscoped_q=CONCATI think we normally concatenate text elements, or COPY() and REPLACE() where we can set trim.
What I meant was that we could remove trim completely from the public API, and then:
COPYalways setstrim=Trueinternally.CONCATalways changes its children totrim=False.- Consequently,
CONCAT(COPY)would disable trimming in theCOPY(just likeCOPY(..., trim=False)does right now), which could be used as a solution for weird one-offs.
| Reporter | ||
Comment 11•6 years ago
|
||
(In reply to Staś Małolepszy :stas from comment #10)
(In reply to Francesco Lodolo [:flod] from comment #9)
What are you thinking of with "fix manually"?
I meant the same process as in https://hg.mozilla.org/l10n-central/mai/log?rev=whitespace. I understand this isn't optimal, but I'm trying to gauge how often we'd need these manual interventions. If it's a once-every-quarter event, then I'd like to suggest we consider that as an option, too.
Then no, it's not an option. We're talking about manually changing strings for 150 repositories, the whitespace issues used to happen to maybe 4 or 5, occasionally.
What I meant was that we could remove
trimcompletely from the public API, and then:
COPYalways setstrim=Trueinternally.CONCATalways changes its children totrim=False.- Consequently,
CONCAT(COPY)would disable trimming in theCOPY(just likeCOPY(..., trim=False)does right now), which could be used as a solution for weird one-offs.
That seems really not intuitive, I'd personally prefer keeping the trim parameter.
Comment 12•6 years ago
|
||
Thanks for getting the patch up. It's interesting to see the impact that inheritance has here.
Also, I think the patch clarifies that we really need he special handling on CONCAT.
I personally think that we should keep the power of trim, but with better defaults than we used to.
I agree that the testing of this is uncomfortable. I want to dig in a bit deeper on that later, let's let quarantine figure out when.
Keeping the NI open on that.
| Assignee | ||
Comment 13•6 years ago
•
|
||
(In reply to Francesco Lodolo [:flod] from comment #11)
Then no, it's not an option. We're talking about manually changing strings for 150 repositories, the whitespace issues used to happen to maybe 4 or 5, occasionally.
Thanks, it's good to know that this isn't an option.
That seems really not intuitive, I'd personally prefer keeping the trim parameter.
The point I'm trying to make is that a solution for very rare situations doesn't need to be intuitive. It needs to be available :)
(In reply to Axel Hecht [:Pike] from comment #12)
Also, I think the patch clarifies that we really need he special handling on
CONCAT.
Yes, at first I suggested being explicit and requiring trim=False in all CANCATs, but this patch made me realize that this would be error-prone.
I agree that the testing of this is uncomfortable. I want to dig in a bit deeper on that later, let's let quarantine figure out when.
If we keep trim, we'd probably want to have two (or three?) variants of most tests: default, trim=False, trim=True. When transforms are nested, we end up with a quadratic number of test cases.
The CONCAT(COPY()) idea has the benefit of applying the same semantics consistently in every situtation. We'd just need to test COPY, REPLACE etc. once, assuming trim=True, and test CONCAT once as well, knowing that it flips trim for its children.
| Assignee | ||
Comment 14•6 years ago
|
||
(In reply to Staś Małolepszy :stas from comment #13)
If we keep
trim, we'd probably want to have two (or three?) variants of most tests: default, trim=False, trim=True. When transforms are nested, we end up with a quadratic number of test cases.
Sorry, that's not correct. I forgot CONCAT itself doesn't take kwargs. So it's not quadratic, but rather linear (x2 or x3 for some test files).
Comment 15•6 years ago
|
||
I think we should trust unit tests here. Test that transforms do what they do, with default trim. Test that COPY handles trim right. Test that CONCAT treats trim right (not even by transforming, but just by inspecting c.elements).
| Assignee | ||
Comment 16•6 years ago
|
||
Let me ask a different question, and hopefully we can make a decision here and land this :)
What are the advantages of keeping trim compared to my CONCAT(COPY()) proposal, given that—IIUC—there has so far been a single case where an explicit COPY(trim=False) outside CANCAT would have been useful?
Comment 17•6 years ago
|
||
I'm not fond of CONCAT(COPY()). I think it's hard to explain. Or to rephrase that, my brain considers that to be a bug with side-effects, not a feature. We also declared CONCAT to be a persona non grata in transforms_from.
| Assignee | ||
Comment 18•6 years ago
|
||
I updated the patch in Phabricator to use None as the default for trim. CONCAT sets it to False, unless it's been explicitly set in COPY to a non-None value.
| Assignee | ||
Comment 19•6 years ago
|
||
I filed bug 1637176 and bug 1637182 about the issues I found when working on this.
| Assignee | ||
Comment 20•6 years ago
|
||
Landed in https://hg.mozilla.org/l10n/fluent-migration/rev/0059e9cdcf81384a21fa5882e164aca0faf72107.
Description
•