Closed Bug 1659691 Opened 4 years ago Closed 3 years ago

Update cross-channel algorithm to not replay history

Categories

(Localization Infrastructure and Tools :: General, defect)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: flod, Assigned: mozilla)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 3 obsolete files)

Attached patch mozxchannel.patch (obsolete) — Splinter Review

As discussed in our last meeting l10n+RelEng, filing a bug to keep track of this alternative approach for cross-channel.

The general idea is to keep using cross-channel without changes, as long as it works.

Pros:

  • No ties to Mercurial's internals
  • Generally better code

Cons:

  • gecko-strings becomes useless for blame/annotate
  • It would become harder to understand what caused issues in the output

Attaching here Axel's notes, and a patch that could be resurrected in case we need to go down this road.

The history-aware code-base for l10n cross-channel needs a lot of
information on the graph and data from the mercurial source
repositories. Thus, it's written as python code directly interacting
with the internal APIs of Mercurial.

That it has close ties to the data enables it to create very meaningful
output, replaying history in the gecko-strings repository very strongly
tied to the history in mozilla-central. Being able to reason about the
generated commits in strong context of the original change it links to
makes reviewing quarantine much more efficient.

The downside is that new combinations of data can trigger crashing bugs
in the code, or unwanted output. Finding out where the problem comes
from exactly is hard. Also, updates to Mercurial require updates to the
code, though this hasn't happened since Mercurial 4.8 back in May 2019.

Another mixed bag is the split between mozilla-central and comm-central
history. With both repository families being processed independently,
it's hard to reason which files in the repository are actually needed.
As they could be needed by the other repo. This is a challenge when
dropping branches or projects.

An alternative approach is creating all content for one set of revisions
among all repositories, ignoring history. The needs here limit to
revision identification, and loading data for a file and a revision.
This can be safely done with hglib. That way we have a stable API,
though it's not terribly performant. Also, there's no need to analyze
the history graph, so it's just overall less code.

Plus, it's new code, so it's genuinely in better shape.

The downside is that changes to gecko-strings can't be traced back to
original commits immediately. One needs to investigate all revranges in
both repos and all branches to see which commit probably changed the
input responsible for the offending output.

Attached patch mozxchannel-mach-fix.patch (obsolete) — Splinter Review

I was testing out creating a comm-strings repository using this code as a starting point, and found that it does not run.
Simple fix though.

Assignee: nobody → rob
Assignee: rob → nobody
Blocks: 1713490

Will follow-up via email, but we're officially broken
https://bugzilla.mozilla.org/show_bug.cgi?id=1713490

No longer blocks: 1713490
Flags: needinfo?(mtabara)

I've tried the patch locally, out of curiosity, and it doesn't run (it gets stuck on comm-beta).

I manually cloned comm-central in /comm, but I guess there's more to it to get other branches available. Sadly, that's not obvious to me how to do it.

Blocks: 1713490

(In reply to Francesco Lodolo [:flod] from comment #3)

I've tried the patch locally, out of curiosity, and it doesn't run (it gets stuck on comm-beta).

I manually cloned comm-central in /comm, but I guess there's more to it to get other branches available. Sadly, that's not obvious to me how to do it.

OK, you manually need to pull comm-release, comm-beta and comm-esr78 to create those tags in /comm.

The diff seems OK for the mozilla-central part, can't really tell for comm-*
https://hg.mozilla.org/users/flodolo_mozilla.com/gecko-strings-test-20210530/rev/d4e73bb9e72a74a9f767886bbab07bedad803dbe

With that said, the code needs to be reviewed, and hopefully we can set up automation that pushes to quarantine without me running code locally. The only manual thing left would be periodically pushing from quarantine to the official gecko-strings repository.

Attached patch bug1659691.patchSplinter Review
Attachment #9170636 - Attachment is obsolete: true
Attachment #9222459 - Attachment is obsolete: true
Type: enhancement → defect

:flod, sounds like you've got this working for bug 1713490. a) is your current working code in the attached patch here? and b) is all that's left to do here is get review and land?

Flags: needinfo?(mtabara) → needinfo?(francesco.lodolo)

(In reply to Aki Sasaki [:aki] (he/him) (UTC-7) from comment #6)

:flod, sounds like you've got this working for bug 1713490. a) is your current working code in the attached patch here? and b) is all that's left to do here is get review and land?

Yes, the code I used is the patch attached here. It's basically the original patch rebased (removed the python2 file), and with the fix from comment 1.

Ideally, I'd like someone to look at the code, submit an actual patch, and have it reviewed by another person. That would fix the immediate problem, then we'll need to look into putting this in CI (see also email thread).

Flags: needinfo?(francesco.lodolo)

I'm playing with the latest patch. In CI, we have a few options:

  1. generate the changeset and possibly notify. A human will need to land the patch.
  2. generate the changeset and submit it to Phabricator, flagging a review group for review. A human will need to review and Lando.
  3. generate the changeset and push it to the strings repo without review.

Do we want to rule any of the above out?
(1) seems easiest to add to CI, but it can be prone to human error or bad actors. It could be an intermediate stopgap, however. (2) seems preferable in terms of security and human verification, but still requires someone to do something. (3) is completely hands-off, but we lose the ability to catch issues before they land, and we have to open the repo up to automated writes without review.

Flags: needinfo?(francesco.lodolo)
Attached patch strings.diff (obsolete) — Splinter Review

This is the patch that I generate locally. It seems to nuke most of calendar/, mail/, and suite/; did I mess something up?
Ah, looks like I have to softlink comm/en-US to ./en-US.

Flags: needinfo?(rob)
Flags: needinfo?(rob)
Attached patch strings.diffSplinter Review
Attachment #9224566 - Attachment is obsolete: true

(In reply to Aki Sasaki [:aki] (he/him) (UTC-7) from comment #9)

Created attachment 9224566 [details] [diff] [review]
strings.diff

This is the patch that I generate locally. It seems to nuke most of calendar/, mail/, and suite/; did I mess something up?
Ah, looks like I have to softlink comm/en-US to ./en-US.

Can you clarify what you did? I had to clone comm-central in /comm (within my clone of mozilla-unified), then manually pull the branches the first time

cd mozilla-unified
hg clone https://hg.mozilla.org/comm-central comm/
cd comm
hg pull comm-beta
hg pull comm-release
hg pull comm-esr78

As for the patch in comment 10, it looks slightly different from what my code pushed
https://hg.mozilla.org/users/axel_mozilla.com/gecko-strings-quarantine/rev/529ace05bf25a9e22954a110efc02a0b6d65735b

Flags: needinfo?(francesco.lodolo)

(In reply to Aki Sasaki [:aki] (he/him) (UTC-7) from comment #8)

I'm playing with the latest patch. In CI, we have a few options:

  1. generate the changeset and possibly notify. A human will need to land the patch.
  2. generate the changeset and submit it to Phabricator, flagging a review group for review. A human will need to review and Lando.
  3. generate the changeset and push it to the strings repo without review.

Do we want to rule any of the above out?
(1) seems easiest to add to CI, but it can be prone to human error or bad actors. It could be an intermediate stopgap, however. (2) seems preferable in terms of security and human verification, but still requires someone to do something. (3) is completely hands-off, but we lose the ability to catch issues before they land, and we have to open the repo up to automated writes without review.

I'd rule out 2, with my preference for 3 over 1.

The goal of this code is to push changes to a quarantine repository (that we should move out of a personal space, and probably place under /l10n)
https://hg.mozilla.org/users/axel_mozilla.com/gecko-strings-quarantine/

This repository is not hooked up to anything, it just receives updates. This content is periodically (and manually) pushed to gecko-strings, at which point it's officially exposed for localization via Pontoon, and used in other automation (e.g. linting, fluent migration testing, etc.).

There's a risk that the content will have issues, and we need to fix it somehow (stop automation, potentially nuke the content), but that doesn't justify the extra work required for 2 in terms of reviews. There will be at least one patch every day, and how do we deal with multiple patches accumulating over 3/4 days? The latter would be an issue also for 1.

Nuking the content of quarantine was the only option for the old code, since the references in each commit were needed to replay history. That wouldn't be the case here: we could stop automation, push a manual commit fix, and things would still work.

(In reply to Francesco Lodolo [:flod] from comment #12)

(In reply to Aki Sasaki [:aki] (he/him) (UTC-7) from comment #9)

Created attachment 9224566 [details] [diff] [review]
strings.diff

This is the patch that I generate locally. It seems to nuke most of calendar/, mail/, and suite/; did I mess something up?
Ah, looks like I have to softlink comm/en-US to ./en-US.

Can you clarify what you did? I had to clone comm-central in /comm (within my clone of mozilla-unified), then manually pull the branches the first time

cd mozilla-unified
hg clone https://hg.mozilla.org/comm-central comm/
cd comm
hg pull comm-beta
hg pull comm-release
hg pull comm-esr78

Described in https://phabricator.services.mozilla.com/D116537#3793862 .

As for the patch in comment 10, it looks slightly different from what my code pushed
https://hg.mozilla.org/users/axel_mozilla.com/gecko-strings-quarantine/rev/529ace05bf25a9e22954a110efc02a0b6d65735b

Yes, Phab allows you to push many versions of the same patch, and diff them, e.g. https://phabricator.services.mozilla.com/D116537?vs=446163&id=446169#toc

(In reply to Francesco Lodolo [:flod] from comment #13)

I'd rule out 2, with my preference for 3 over 1.

The goal of this code is to push changes to a quarantine repository (that we should move out of a personal space, and probably place under /l10n)
https://hg.mozilla.org/users/axel_mozilla.com/gecko-strings-quarantine/

Ah, ok. Let's limit the user who pushes here to only be able to push to the quarantine repo (possibly level 1 as well?), then we should be good.

comm-release is not used or updated any more. No need to pull strings from it.

See Also: → 1714239
Assignee: nobody → aki

(In reply to Francesco Lodolo [:flod] from comment #13)

I'd rule out 2, with my preference for 3 over 1.

The goal of this code is to push changes to a quarantine repository (that we should move out of a personal space, and probably place under /l10n)
https://hg.mozilla.org/users/axel_mozilla.com/gecko-strings-quarantine/

This repository is not hooked up to anything, it just receives updates. This content is periodically (and manually) pushed to gecko-strings, at which point it's officially exposed for localization via Pontoon, and used in other automation (e.g. linting, fluent migration testing, etc.).

There's a risk that the content will have issues, and we need to fix it somehow (stop automation, potentially nuke the content), but that doesn't justify the extra work required for 2 in terms of reviews. There will be at least one patch every day, and how do we deal with multiple patches accumulating over 3/4 days? The latter would be an issue also for 1.

Hm.
I'm actually thinking that the manual step of copying commits from quarantine to the production gecko-strings repo may be more work than:

  • looking at https://phabricator.services.mozilla.com/ for reviews requested from the review group
  • reviewing the strings patches, and requesting they land via Lando; Lando can land those string changes directly to gecko-strings rather than having to go to a quarantine repo

We could even make the Phabricator submission intelligent, so the patches depend on each other, and they can all land as a stack. Fixing up the strings would involve submitting a new patch to Phabricator, getting review, and landing, or pushing directly to the gecko-strings repository for a restricted set of users.

I'm not sure if you use the quarantine repo for anything specific other than looking at commits. If you only look at the commits, then Phabricator may be sufficient. If you use it for some other purpose, then Phabricator may require more changes.

Maybe I'll implement the quarantine repo for now, and we can consider the Phabricator approach in the future.

Depends on: 1714474

What I currently do is:

  1. Review every patch in Phabricator, to make sure we catch issues early.
  2. Translate `mozilla-central patch-by-patch.
  3. Use quarantine to confirm that the content generation worked as expected

I'd note that you can't really have the third point without the second, you'd need to fully trust the system.

In this scenario, having automation pushing to quarantine is ideal, because pushing to gecko-strings is just a one-line command, and I don't actually review the quarantine repository.

The alternative would be to drop the quarantine repository completely, and have Phabricator land directly in gecko-strings:

  • Automation create content and opens a Phabricator diff. If one already exists, automation can update that. Having stacked commits doesn't help here, quite the opposite.
  • Patches are reviewed once or twice a week, and land directly in gecko-strings.

This has the huge benefit that anyone could replace me, e.g. if I'm on PTO or not around anymore (I don't expect any other PM to translate content, or review patches). The concern is around Phabricator, and the fact that we're not exactly sure what's the long term plan for it.

(In reply to Aki Sasaki [:aki] (he/him) (UTC-7) from comment #17)

Maybe I'll implement the quarantine repo for now, and we can consider the Phabricator approach in the future.

This sounds like a good approach to me:

  • Let's land the patch and set up some automation. That's the priority, as the whole of Firefox is still depending on my Mac, running a VM, with an improvised/unreviewed patch on top of mozilla-central ;-)
  • Let's consider using Phabricator and getting rid of the quarantine repository as a follow-up. This heavily depends on the trust that we have in Phabricator's future, and how easy it would be to reimplement this in a different system.

:rjl, do you know why hg pull comm{,-beta,-esr78} works for me locally, but wouldn't in automation?
https://firefoxci.taskcluster-artifacts.net/Ov_0d4uiQSWvEBZizQIlvA/0/public/logs/live_backing.log

I wonder if I need to hg pull https://hg.mozilla.org/releases/comm-beta and manually create a comm-beta bookmark, or if there's something else I need to do.

Flags: needinfo?(rob)

Firefoxtree extension enabled? There's no comm-unified repo with all of the "bookmarks" set up like mozilla-unified has.

Flags: needinfo?(rob)

:flod, if we don't have any urgent string pushes, could you skip tonight's string pushes to the quarantine repo? I'd like to test pushes from maple tomorrow before I request review.

Flags: needinfo?(francesco.lodolo)

(In reply to Aki Sasaki [:aki] (he/him) (UTC-7) from comment #24)

:flod, if we don't have any urgent string pushes, could you skip tonight's string pushes to the quarantine repo? I'd like to test pushes from maple tomorrow before I request review.

Yep, I can definitely do that.

Flags: needinfo?(francesco.lodolo)

Push testing in automation didn't happen due to bug 1716076. I did push https://hg.mozilla.org/l10n/gecko-strings-quarantine/rev/a6c819605dd0cc72b34ced6d379d234c80bd5d20 manually via ./mach l10n-cross-channel prep create push

Attachment #9226034 - Attachment description: WIP: Bug 1659691 - grant access to try secrets for cross-channel-quarantine tasks → Bug 1659691 - create cron hook for l10n-cross-channel. r=#releng
Attachment #9224581 - Attachment description: WIP: Bug 1659691 - Update cross-channel algorithm to not replay history → Bug 1659691 - Update cross-channel algorithm to not replay history. r=#build
Attachment #9226039 - Attachment description: WIP: Bug 1659691 - automation to run l10n-cross-channel in taskgraph cron. → Bug 1659691 - automation to run l10n-cross-channel in taskgraph cron. r=#taskgraph

Works! https://hg.mozilla.org/l10n/gecko-strings-quarantine/rev/cbcfc7c9a4ab1ca2873e280fd9db4856cf4eb591
:flod, let me know if there are any changes to the commit or comments or anything you want changed. In the meantime, I'll work on getting these patches reviewed and landed, hopefully later this week. We may want manual string pushes until that point. But at least in theory, we'll get automated string pushes to quarantine once these patches land.

Flags: needinfo?(francesco.lodolo)

Can we simplify a bit the date in commit message?

Current: cross-channel content for 2021-06-14T19:12:05.998373

self.message = f"cross-channel content for {datetime.utcnow().isoformat()}"

Possibly: Update cross-channel content for 2021-06-14 19:12

self.message = f"Update cross-channel content for {datetime.utcnow().strftime('%Y-%m-%d %H:%M')}"

Should I keep pushing content manually in the meantime?

Flags: needinfo?(francesco.lodolo)

(In reply to Francesco Lodolo [:flod] from comment #28)

Can we simplify a bit the date in commit message?

Done.

Should I keep pushing content manually in the meantime?

Hm, I'm going to see if I can cron this somewhere until it lands on central.

Blocks: 1717096

I have lando'ed the patches, and disabled the temp cron jobs. We may have a gap in strings in the quarantine repo until this lands on Central.

Pushed by asasaki@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8bcf35827a03 Update cross-channel algorithm to not replay history. r=firefox-build-system-reviewers,releng-reviewers,bhearsum,mhentges https://hg.mozilla.org/integration/autoland/rev/1ca65cdfef04 automation to run l10n-cross-channel in taskgraph cron. r=taskgraph-reviewers,jmaher
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Oh, reopening til this merges in central.

Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED

the patch is landed in central, do we need to add milestone and tracking flags?

Flags: needinfo?(aki)

I don't think so. The automation only runs from central.
Thank you!

Flags: needinfo?(aki)
Pushed by asasaki@mozilla.com: https://hg.mozilla.org/ci/ci-configuration/rev/852de67395cb create cron hook for l10n-cross-channel. r=releng-reviewers,jmaher
Depends on: 1722708
See Also: → 1727300
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: