Open Bug 1036563 Opened 10 years ago Updated 2 years ago

'make l10n-check' fails when run in parallel

Categories

(Firefox Build System :: General, defect, P3)

Tracking

(Not tracked)

People

(Reporter: mshal, Unassigned)

Details

Running 'make l10n-check -j4', at least on Windows, fails with errors like:

cp: cannot stat `../installer/windows/l10ngen/helper.exe': No such file or directory
cp: cannot stat `../installer/windows/l10ngen/setup.exe': No such file or directory

A 'make l10n-check -j1' works just fine, however. I suspect there are some dependencies missing between the steps.
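A rough sketch of the suspected failure mode, in case it helps whoever picks this up: if a recipe reads a file that another rule generates but does not list it as a prerequisite, make is free to run the two in either order under -j. The model below is only an illustration. The two file paths come from the errors above, but the rule name `repack-step` and the rule layout are hypothetical and not taken from the real makefiles.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    declared: Set[str]  # prerequisites listed for the target
    reads: Set[str]     # files the recipe actually consumes


def undeclared_inputs(rules: Dict[str, Rule]) -> List[Tuple[str, str]]:
    """Return (target, file) pairs where a recipe reads a generated file
    that it does not declare as a prerequisite."""
    generated = set(rules)  # every key is something a rule produces
    problems = []
    for target, rule in sorted(rules.items()):
        for path in sorted(rule.reads - rule.declared):
            if path in generated and path != target:
                problems.append((target, path))
    return problems


HELPER = "../installer/windows/l10ngen/helper.exe"
SETUP = "../installer/windows/l10ngen/setup.exe"

# Hypothetical rule set: "repack-step" copies both files but declares neither.
RULES = {
    HELPER: Rule(declared=set(), reads=set()),
    SETUP: Rule(declared=set(), reads=set()),
    "repack-step": Rule(declared=set(), reads={HELPER, SETUP}),
}

if __name__ == "__main__":
    for target, path in undeclared_inputs(RULES):
        print(f"{target} reads {path} without depending on it")
```

With -j1 the order in which make happens to walk the targets can hide such a gap, which would match the observation that the serial run works.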
You didn't expect the l10n code to actually work with j>1, did you? How long were you on vacation? ;)

In all honesty, we should at least detect this footgun. I doubt anyone is signing up to fix these make files.
Hah :). I was mostly logging this here so I have something to point to in a comment. I'm working around it at the moment in bug 978211 by running each individual automation step with -j1 (so even if multiple automation steps are run in parallel, each individual step has a single process).
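For reference, the workaround amounts to something like the sketch below: several automation steps may run side by side, but each one invokes make with -j1. This is not the code from bug 978211. The function names, the `runner` parameter, and the example step names are assumptions made for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def step_command(target: str) -> List[str]:
    """Build the make invocation for one step, pinned to a single job."""
    return ["make", "-j1", target]


def run_steps(
    targets: Sequence[str],
    runner: Callable[[List[str]], int] = subprocess.call,
    max_workers: int = 4,
) -> List[int]:
    """Run several steps concurrently while each step stays serial.

    Returns the exit codes in the same order as `targets`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda t: runner(step_command(t)), targets))


# Example (hypothetical step names; this would really invoke make):
#   exit_codes = run_steps(["l10n-check", "check"])
```

The parallelism lives entirely in the outer pool, so the missing dependencies inside a single step are never exercised.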
Product: Core → Firefox Build System

This job is #3 in terms of usage in the CI. We should probably work on this.

Priority: -- → P1

(In reply to Sylvestre Ledru [:Sylvestre] from comment #3)

This job is #3 in terms of usage in the CI. We should probably work on this.

There are two ways we can address this. The obvious one is to fix whatever parallelism problems l10n-check has, which is a reasonable thing to do in its own right.

The other way is to ask: does l10n-check really need to run on every build that we do? Or can we--like the Python unit tests--split it out to run per-platform? Or can we just run a single l10n-check job on each push? Either way would be a significant reduction in the amount of time that we spend running this, and as a bonus, builds generally would run faster.

Pike, what's the right thing to do here and how hard is doing the right thing (or any of the above)?

Flags: needinfo?(l10n)

There's something weird here. l10n-check is disabled on most automated builds, https://searchfox.org/mozilla-central/search?q=L10N_CHECK&case=false&regexp=false&path=.
Also, looking at https://firefoxci.taskcluster-artifacts.net/UeirCy-VS5u6ZqYnk1Vwag/0/public/build/build_resources.json, the l10n-check time is smaller than the check time.

Maybe there's a single data point that's way off or something?

As for this bug, I bet I fixed this back in 2017 with bug 1385227.

Flags: needinfo?(l10n)

(In reply to Axel Hecht [:Pike][OOO - Jan 4] from comment #5)

There's something weird here. l10n-check is disabled on most automated builds, https://searchfox.org/mozilla-central/search?q=L10N_CHECK&case=false&regexp=false&path=.

They appear to be running on Windows builds (at least opt)? It's hard to go from 40-odd results to what kinds of builds are actually running l10n-check.

But to the original question, do we need to be running them per-build? Can we do per-platform or even per-push instead?

Also, looking at https://firefoxci.taskcluster-artifacts.net/UeirCy-VS5u6ZqYnk1Vwag/0/public/build/build_resources.json, the l10n-check time is smaller than the check time.

Picking a random Windows opt build off autoland:

https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=283357712&revision=fdb7d4f7cacb1c22a97b59734d7fcd1fdf12c802
https://firefoxci.taskcluster-artifacts.net/Di9JmdYoTt2MO1R_lXd__Q/0/public/build/build_resources.html

l10n-check is taking much longer than check. Maybe we should be turning it off there as well? But then...what builds should we be running l10n-check on, if we're turning it off in so many different mozconfigs?

Flags: needinfo?(l10n)

(In reply to Axel Hecht [:Pike][OOO - Jan 4] from comment #5)

Maybe there's a single data point that's way off or something?

I looked at the data for the whole year, and the same pattern holds.

See the Redash query if you want to investigate:
https://sql.telemetry.mozilla.org/queries/67342/source?p_start_date_undefined=2019-11-01&p_end_date_undefined=2019-11-30&p_start_date_67342=2019-01-01&p_end_date_67342=2019-12-31#170520

I've spun off the conversations here into two other bugs, bug 1607191 for some oddities around the data, and bug 1607193 to find out what to do with l10n-check itself.

This bug is originally about dependency problems in repacks, and I'd prefer it to stay that way.

Flags: needinfo?(l10n)

(In reply to Axel Hecht [:Pike][OOO - Jan 4] from comment #8)

and bug 1607191 to find out what to do with l10n-check itself.

I think this was Bug 1607193

Priority: P1 → P3
Severity: normal → S3