Closed Bug 848284 Opened 7 years ago Closed 4 years ago

Add l10n repackage visibility to try and inbound

Categories

(Release Engineering :: General, defect)

x86_64
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jimm, Assigned: nthomas)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

Attachments

(3 files, 4 obsolete files)

We recently had some work land on mc that broke the l10n builds that occur. The only visibility we had on this was over on the old l10n waterfall page. Despite testing l10n repacks before landing, we still managed to screw up a step builders go through in doing the packaging resulting in a l10n lang pack outage that lasted for a couple weeks. (I think there were actually two things broken, we fixed one and then another showed up.)

I don't think most developers even know these build steps take place. They certainly wouldn't know if something broke until the l10n community files a bug on build bustage.

So I would like to propose we add the ability to trigger a repackage for a couple languages on try (something like -p l10n), and also do a set of test repacks for a couple languages on inbound checkins. Pike suggested french and italian over in bug 845247.
We definitely need that on try. But I don't think we need this on inbound. However, exposing the l10n status on tbpl for nightlies would be *very* appreciated. Even an aggregated status. But that should probably be a separate bug.
(In reply to Jim Mathies [:jimm] from comment #0)
> We recently had some work land on mc that broke the l10n builds that occur.
> The only visibility we had on this was over on the old l10n waterfall page.
Per discussions with Axel: this old l10n waterfall page and the server it is on will go away soon (bug 843383), once we can stand up alternate l10n reporting (bug 698910).

Most localizers use the separate and unrelated l10n dashboards, which will remain as-is.

 
> Despite testing l10n repacks before landing, we still managed to screw up a
> step builders go through in doing the packaging resulting in a l10n lang
> pack outage that lasted for a couple weeks. (I think there were actually two
> things broken, we fixed one and then another showed up.)
> 
> I don't think most developers even know these build steps take place. They
> certainly wouldn't know if something broke until the l10n community files a
> bug on build bustage.
> 
> So I would like to propose we add the ability to trigger a repackage for a
> couple languages on try (something like -p l10n), and also do a set of test
> repacks for a couple languages on inbound checkins. Pike suggested french
> and italian over in bug 845247.
I can see how being able to do l10n repacks as non-default on try could be helpful. Not sure how easy/hard that is, but will investigate. Alternatively, would nightly builds with l10n repacks available on projects/build-system have solved the same need?
(In reply to John O'Duinn [:joduinn] from comment #2)
> I can see how being able to do l10n repacks as non-default on try could be
> helpful. Not sure how easy/hard that is, but will investigate.
> Alternatively, would nightly builds with l10n repacks available on
> projects/build-system have solved the same need?

In our particular case, having test repacks on elm would have alerted us to the problem. This wouldn't address everyday inbound checkins though.
Product: mozilla.org → Release Engineering
Attached patch: enable l10n on try (obsolete)
Completely untested patches. A brief inspection of the diff looks somewhat sane. Hopefully we can find some time next week to do some testing in staging.
I've started to work on bug 1237678, using the in-tree version of compare-locales.

Which is fully in mozharness land, really, so having l10n on try would be great.

Additional trick is that I don't want to repack a nightly, but the build I've built, with mozharness in the revision that I pushed.
With Mark doing changes to how we repack for loop, we've got another change that busted the tree in various ways:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=1dbe350b57b1&exclusion_profile=false&filter-job_group_symbol=L10n

catlee, not sure how I can be helpful, I'll sure try, though.
Depends on: 1250626
Depends on: 1250629
Blocks: l10nrepacks
No longer depends on: l10nrepacks
Blocks: 1107630
Compared to your WIP, this reorders the config files passed to desktop_l10n.py, which are processed in the order you specify them (unlike fx_desktop_build.py!). This lets upload_env be overridden on try instead of getting the production/staging settings. It also suppresses the Balrog config file on try, and makes the builder name nicer for the 1 chunk case.
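To illustrate why the order of the config files matters, here is a minimal sketch assuming a shallow, last-wins merge. The function name, the dictionary keys other than upload_env, and the host names are hypothetical and are not taken from mozharness or the patch.

```python
def merge_configs(config_dicts):
    """Merge config dicts in the order given; later entries win (shallow merge)."""
    merged = {}
    for cfg in config_dicts:
        merged.update(cfg)
    return merged


# Hypothetical stand-ins for a branch config and a try-specific override.
branch_config = {
    "upload_env": {"UPLOAD_HOST": "prod.example.com"},
    "balrog": True,
}
try_config = {
    "upload_env": {"UPLOAD_HOST": "try.example.com"},
}

# Listing the try config last lets its upload_env replace the branch one.
final = merge_configs([branch_config, try_config])
print(final["upload_env"]["UPLOAD_HOST"])  # try.example.com
```

If the two configs were passed in the opposite order, the branch upload_env would win instead, which is the situation the reordering avoids.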
Attachment #8697646 - Attachment is obsolete: true
Attachment #8740899 - Flags: review?(catlee)
Sets up the config variables needed for l10n, suppresses the dep scheduler, and limits to a single builder (the idea is that the user truncates the all-locales file).
Attachment #8697644 - Attachment is obsolete: true
Attachment #8740900 - Flags: review?(catlee)
Roughly in order of appearance:
* enable tooltool caching for !win (see bug 1263575 for windows globally)
* defines a new upload_environment of prod/stage to be used in upload hostname (only on try)
* copies configs/single_locale/mozilla-central.py and customizes for try
 * default to last m-c nightly, and l10n repos
 * don't update the gecko source to whatever you find in that nightly
 * clone try by rev, with purging and longer timeout, getting revision from buildbot
 * override the upload env vars for try special case
* in desktop_l10n.py:
 * revision may come from cmdline arg, or buildbot; track enUS revision separately for central & aurora
 * config variable interpolation needs to pass through non-string config items, see also pull()
 * steal _query_who() from mozilla/building/buildbase.py
 * massage _query_upload_env() to give the new variables we need for post_upload call. post_upload_extra is for staging support
 * _query_revision() is based on the similar function in buildbase.py
 * move updating gecko source to en-US build behind a variable which is False on try
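The point about config variable interpolation can be sketched roughly as follows: string values get formatted, and anything else (booleans, numbers) is returned untouched rather than raising. This is a simplified illustration, not the code in desktop_l10n.py; the function names and config keys below are hypothetical.

```python
def interpolate(value, replace_dict):
    """Apply %-style interpolation to strings; pass other types through unchanged."""
    if isinstance(value, str):
        return value % replace_dict
    return value


def interpolate_config(config, replace_dict):
    """Interpolate every value of a flat config dict."""
    return {key: interpolate(val, replace_dict) for key, val in config.items()}


# Hypothetical repository config mixing strings with non-string items.
repo_config = {
    "repo": "https://hg.mozilla.org/%(branch)s",
    "clone_by_revision": True,
    "timeout": 3600,
}

resolved = interpolate_config(repo_config, {"branch": "try"})
print(resolved["repo"])  # https://hg.mozilla.org/try
```

Without the type check, applying the % operator to True or 3600 would either fail or silently produce the wrong value.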

This works when pushed to try and then built on Mac in staging; see:
http://dev-master2.bb.releng.use1.mozilla.com:8711/builders/Firefox%20try%20macosx64%20l10n/builds/13
https://ftp.stage.mozaws.net/pub/firefox/try-builds/nobody@example.com-72a5bcd9edd7/

Log upload by the master should work, although I haven't tested it, and not sure how/if it will show up on Treeherder.

Intended workflow (initially):
* modify configs/single_locale/try.py if m-a is preferred, or other en-US build
* hack on mozharness
* push to try without 'try:' message (no builds start)
* use 'Treeherder > Add new jobs' to force builds on relevant platforms (TBC once live in prod)

Open question - do we want to submit to Balrog and generate partials with funsize for full-path testing?
Attachment #8740904 - Flags: review?(catlee)
(In reply to Nick Thomas [:nthomas] from comment #10)
> Created attachment 8740900 [details] [diff] [review]
> [buildbot-configs] Add l10n to try
> 
> Sets up the config variables needed for l10n, suppress dep scheduler, and
> limit to a single builder (the idea is the user truncates the all-locales
> file).

My perception is that l10n jobs should run on try by default, for a specific list of locales. The latter really only for load-limiting.

The reason is that on-purpose building on try is nice, but it's also that often folks don't expect to break l10n, and thus I think try only develops its real power if it's run by default. Probably also implied in the responses to my tier-1 thread https://groups.google.com/forum/#!topic/mozilla.dev.planning/gGAxHfciixs

My recommendation for locales that are active, and somewhat active: 'it', 'zh-TW'. Italian is most often complete, and zh-TW is occasionally updated from aurora, has a region code and non-latin script.
Will those repacks fail if the locale repo is not up-to-date in some way? I'd hate to have Try results that depend on something other than the patch submitted to succeed. The build peers are interested in this just for testing the repack machinery.
We need successful builds with not-up-to-date repo, which is why I want one of those to be in the test scenario. The idea is that the l10n nightlies will still succeed once the patch lands.

I'm trying to test somewhat up-to-date locales so that if we (at some later point) add runtime testing, we can confirm with screenshots that it's actually Italian or Chinese, and not just English. In the English case, we'd have trouble telling whether it's a repack problem or just merged strings.
Okay. I just wonder if for these purposes a synthetic locale produced with pseudolocalization from the en-US might be more useful?
pseudo would be great, but I'm concerned about all those funky edge-case strings we have with css and numbers and flag words for locale configs. In gaia, we were successful with pseudo. In gecko land, I'm scared of the edges and the various file and in particular variable formats. https://github.com/l20n/l20n.js/blob/master/src/lib/pseudo.js for inspiration.

Really, the flurry of variable formats we have, the total lack of any check that we're not adding new ones. Gaia just has one precisely defined variable format, and that's it.

Mostly just scared, I guess, of the work that comes with it.
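To make the worry about variable formats concrete, here is a small sketch of a pseudolocalizer that accents vowels but has to leave placeholders alone. It only knows two formats (printf-style %S / %1$S and &entity; references); that choice is an assumption for illustration, and it is not the algorithm used by the l20n pseudo.js linked above. Any format it doesn't know about would get mangled, which is exactly the edge-case risk.

```python
import re

# Spans that must not be altered: %S, %1$S style variables and &entity; references.
PLACEHOLDER = re.compile(r"(%(?:\d+\$)?[A-Za-z]|&[A-Za-z0-9_.\-]+;)")

# Map plain vowels to accented ones so pseudolocalized text is easy to spot.
ACCENTS = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")


def pseudolocalize(text):
    """Accent the translatable text while keeping known placeholders intact."""
    parts = PLACEHOLDER.split(text)
    # re.split with one capturing group puts the matched placeholders at odd indexes.
    return "".join(
        part if index % 2 else part.translate(ACCENTS)
        for index, part in enumerate(parts)
    )


print(pseudolocalize("Open %S in &brandShortName;"))
```

A string using some other variable syntax would pass through the translate step and come out broken, so every new format needs its own entry in the pattern.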
Let's leave the decision/discussion of what locales to build by default until later. Right now we need to get this working on an opt-in basis on try.
Attachment #8740899 - Flags: review?(catlee) → review+
Attachment #8740900 - Flags: review?(catlee) → review+
Attachment #8740904 - Flags: review?(catlee) → review+
Pushed to try - https://treeherder.mozilla.org/#/jobs?repo=try&revision=6c4d51201ce1. Mac and the two Linux worked OK.

Issues to follow up
* win32 failed with an error in configure:
02:38:44     INFO -   1:13.97 DEBUG: configure: error: To build the installer you must have the latest MozillaBuild or NSIS version 3.0b1 or greater in your path.
* win64 failed to clone try quickly enough and keeps hitting:
02:58:12     INFO -  command: hg clone --traceback -U -r 6c4d51201ce1 https://hg.mozilla.org/try c:\builds\hg-shared\try
02:58:12     INFO -  command: cwd: c:\builds\moz2_slave\try-w64-l10n-00000000000000000\build
02:58:12     INFO -  command: env: {'HGPLAIN': '1'}
03:28:13     INFO -  process, is taking too long: 1800.25s (timeout 1800s).Terminating
* treeherder shows the jobs with a ?
* forcing builds on buildbot doesn't set up the ownership (who) properly, so uploads end up in http://bucketlister-delivery.prod.mozaws.net/pub/firefox/try-builds/nobody@example.com-6c4d51201ce1/ instead of nthomas@mozilla.com-6c4d51201ce10dc10540a58c40776f184892c6bb/
* log uploads fail because the post_upload call is missing --who (we have calls like post_upload.py -b try -p firefox --revision 6c4d51201ce1 --builddir try-win32 --release-to-try-builds <dir> <files>). This leads to dead command queue alerts, and incorrect treeherder state
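As a rough sketch of the last issue, the upload command could be assembled so that --who is added whenever the owner is known. Only the flags quoted above are used; the helper name, the position of --who in the argument list, and the example paths are assumptions.

```python
def build_post_upload_cmd(branch, product, revision, builddir, upload_dir, files, who=None):
    """Assemble a post_upload.py argument list, adding --who when an owner is known."""
    cmd = [
        "post_upload.py",
        "-b", branch,
        "-p", product,
        "--revision", revision,
        "--builddir", builddir,
        "--release-to-try-builds",
    ]
    if who:
        # Without this, the upload directory falls back to a placeholder owner.
        cmd += ["--who", who]
    cmd.append(upload_dir)
    cmd.extend(files)
    return cmd


# Hypothetical upload directory and file name, for illustration only.
cmd = build_post_upload_cmd(
    "try", "firefox", "6c4d51201ce1", "try-win32",
    "/tmp/upload", ["build.log"], who="nthomas@mozilla.com",
)
print(" ".join(cmd))
```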
I don't know if this is covered by one of your existing issues, but that try push shows all the jobs as "no logs available for this job".
Assignee: nobody → nthomas
Depends on: 1265573
Now that treeherder knows about the builder names it's possible to use 'Add new jobs' (in the little drop down in the top right corner of the UI). The logs even get uploaded properly, because the 'who' from the original push is present.

Issues to fix
* windows jobs possibly fail on new instances when cloning try (timeout too short? prepopulated try share not working properly?)
* windows jobs fail in configure, perhaps due to a path issue; m-c is not affected
14:55:37     INFO -   0:56.88 DEBUG: configure: error: To build the installer you must have the latest MozillaBuild or NSIS version 3.0b1 or greater in your path.
* generally, files are uploaded to a different place to logs (going by what linux is doing)
This patch:
* removes the 3600s timeout on cloning try, since it's not making it to hgtool invocation
* adds NSIS 3.0b1 to the head of the PATH on win32 and win64 (fixes the configure error on those platforms)
* fixes _query_who() to find the actual user, so that binaries upload to the correct place on archive.m.o  (they're also on taskcluster)
* casts the revision to a non-unicode string, otherwise POST_UPLOAD_CMD ends up as unicode and that is not permitted on windows

Carrying catlee's r+ over.
Attachment #8740904 - Attachment is obsolete: true
Attachment #8745179 - Flags: review+
This should be the final set of tweaks, and there are enough of them to r? again. Still waiting on a full set of green builds on https://treeherder.mozilla.org/#/jobs?repo=try&revision=a72add04d7e0&filter-searchStr=l10n

* set TOOLTOOL_CACHE & TOOLTOOL_HOME on all platforms, remove unused tooltool_cache on mac & linux. Use TOOLTOOL_CACHE to call tooltool.py with -c <cache_dir> (similar to what you did for desktop non-l10n)
* set hgtool_base_bundle_urls to work around slow clones on windows when starting from scratch (bug 1267913). Remove timeout setting which was being ignored
* fix _query_who() to look in the right place for the buildbot property. Make sure who and revision are cast into a str() so that unicode doesn't break POST_UPLOAD_CMD on windows
* make sure NSIS 3.0b1 is at the start of the path, to avoid failure in configure on windows
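The PATH change in the last bullet can be sketched as a small helper that moves a directory to the front of PATH in an environment dict. The NSIS directory shown is hypothetical, and the real patch sets this in the mozharness config rather than through a helper like this one.

```python
import os


def prepend_to_path(env, directory, sep=os.pathsep):
    """Return a copy of env with directory first in PATH, dropping later duplicates."""
    existing = [p for p in env.get("PATH", "").split(sep) if p and p != directory]
    new_env = dict(env)
    new_env["PATH"] = sep.join([directory] + existing)
    return new_env


# Hypothetical install location; the separator is given explicitly for Windows.
nsis_dir = r"C:\mozilla-build\nsis-3.0b1"
env = {"PATH": r"C:\Windows;C:\mozilla-build\nsis-3.0b1;C:\tools"}
new_env = prepend_to_path(env, nsis_dir, sep=";")
print(new_env["PATH"])
```

Putting the directory first matters because configure picks up the first makensis it finds, so an older NSIS earlier in PATH would still trigger the error.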
Attachment #8745179 - Attachment is obsolete: true
Attachment #8745956 - Flags: review?
Attachment #8745956 - Flags: review? → review?(catlee)
Attachment #8745956 - Flags: review?(catlee) → review+
Keywords: leave-open
Comment on attachment 8745956 [details] [diff] [review]
[gecko] l10n script support for try, v3

How to use this:

1. Optionally modify browser/locales/all-locales to reduce run time (e.g. limit to top locales like de fr ja ja-JP-mac ru zh-TW)
2. Push to try; no trychooser message is needed
3. Go to treeherder for your push and filter with 'l10n'
4. Use the drop-down menu just below the filter box to 'Add new jobs'. Select the platforms of interest and click on 'Trigger New Jobs'

Builds and logs will be uploaded to archive.m.o, as notified by email on push and the usual UI on treeherder.
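For the optional first step, a quick way to trim all-locales is sketched below, assuming the file lists one locale code per line. The helper name is made up for illustration; the locale list is the one suggested above.

```python
TOP_LOCALES = ["de", "fr", "ja", "ja-JP-mac", "ru", "zh-TW"]


def truncate_locales(all_locales_text, keep):
    """Keep only the wanted locale codes, preserving the file's original order."""
    keep_set = set(keep)
    kept = [
        line.strip()
        for line in all_locales_text.splitlines()
        if line.strip() in keep_set
    ]
    return "".join(code + "\n" for code in kept)


# Small made-up excerpt standing in for browser/locales/all-locales.
sample = "ach\nde\nfr\nit\nja\nzh-TW\n"
print(truncate_locales(sample, TOP_LOCALES), end="")
```

To apply it for real, read browser/locales/all-locales, pass its contents through the function, and write the result back before pushing to try.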
Attachment #8745956 - Flags: checked-in+
L10n on m-c is green, so let's close this out and file new bugs for any followups.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Nick: thanks for all the hard work here! This is something we've wanted for so long, it's awesome to see it finally happen.
Blocks: 1312934
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Component: General Automation → General