Closed
Bug 1333234
Opened 8 years ago
Closed 8 years ago
L10n Routing on Aurora is too large, breaks amqp
Categories
(Firefox Build System :: Task Configuration, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Assigned: Callek)
Attachments
(1 file)
In Bug 1323792 we initially thought that bumping the route limits would not affect much overall, but it turns out we have hit a hard AMQP limit.
The overall message header size is configured to be a max of ~4 KB, and that header has to hold the route lengths plus some other stuff.
Since the l10n tasks are stable on central (for now) but break the decision task on aurora, we'll up the chunking on aurora, while simultaneously investigating/investing in an alternate way to define the indexes used.
This is currently blocking aurora linux/linux64/android nightlies.
Comment hidden (mozreview-request)
Comment 2 • 8 years ago (Assignee)
Using a desktop nightlies parameters file, I generated a full taskgraph on central and on aurora. For each task I then summed the lengths of all its routes with jq (equivalent to concatenating the routes into one string and taking its length) and found the following total route lengths:
== Central ==
cat ../jobs_test1.json | jq '[. | to_entries[] | select(.key) | .value =reduce .value.task.routes[] as $item (0; . + ( $item | length ) ) ] | from_entries' | grep "nightly-l10n\|build.*nightly" | grep -v beetmover
"build-android-api-15-nightly/opt": 548,
"build-android-api-15-nightly/opt-upload-symbols": 156,
"build-android-x86-nightly/opt": 536,
"build-android-x86-nightly/opt-upload-symbols": 156,
"build-linux-nightly/opt": 516,
"build-linux-nightly/opt-upload-symbols": 156,
"build-linux64-nightly/opt": 524,
"build-linux64-nightly/opt-upload-symbols": 156,
"nightly-l10n-android-api-15-nightly-1/opt": 2162,
"nightly-l10n-android-api-15-nightly-2/opt": 2162,
"nightly-l10n-android-api-15-nightly-3/opt": 2144,
"nightly-l10n-android-api-15-nightly-4/opt": 2171,
"nightly-l10n-android-api-15-nightly-5/opt": 2162,
"nightly-l10n-android-api-15-nightly-6/opt": 2171,
"nightly-l10n-linux-nightly-1/opt": 1997,
"nightly-l10n-linux-nightly-2/opt": 2012,
"nightly-l10n-linux-nightly-3/opt": 1976,
"nightly-l10n-linux-nightly-4/opt": 1734,
"nightly-l10n-linux-nightly-5/opt": 1743,
"nightly-l10n-linux-nightly-6/opt": 1734,
"nightly-l10n-linux64-nightly-1/opt": 2039,
"nightly-l10n-linux64-nightly-2/opt": 2054,
"nightly-l10n-linux64-nightly-3/opt": 2018,
"nightly-l10n-linux64-nightly-4/opt": 1770,
"nightly-l10n-linux64-nightly-5/opt": 1779,
"nightly-l10n-linux64-nightly-6/opt": 1770,
== Aurora ==
cat ../jobs_test1.json | jq '[. | to_entries[] | select(.key) | .value =
reduce .value.task.routes[] as $item (0; . + ( $item | length ) ) ] | from_entries' | grep "nightly-l10n\|build.*nightly" | grep -v beetmover
"build-android-api-15-nightly/opt": 542,
"build-android-api-15-nightly/opt-upload-symbols": 154,
"build-android-x86-nightly/opt": 530,
"build-android-x86-nightly/opt-upload-symbols": 154,
"build-linux-nightly/opt": 510,
"build-linux-nightly/opt-upload-symbols": 154,
"build-linux64-nightly/opt": 518,
"build-linux64-nightly/opt-upload-symbols": 154,
"nightly-l10n-android-api-15-nightly-1/opt": 4393,
"nightly-l10n-android-api-15-nightly-2/opt": 4426,
"nightly-l10n-android-api-15-nightly-3/opt": 4417,
"nightly-l10n-android-api-15-nightly-4/opt": 4384,
"nightly-l10n-android-api-15-nightly-5/opt": 4417,
"nightly-l10n-android-api-15-nightly-6/opt": 4118,
"nightly-l10n-linux-nightly-1/opt": 4293,
"nightly-l10n-linux-nightly-2/opt": 4323,
"nightly-l10n-linux-nightly-3/opt": 4314,
"nightly-l10n-linux-nightly-4/opt": 4278,
"nightly-l10n-linux-nightly-5/opt": 4320,
"nightly-l10n-linux-nightly-6/opt": 4296,
"nightly-l10n-linux64-nightly-1/opt": 4389,
"nightly-l10n-linux64-nightly-2/opt": 4419,
"nightly-l10n-linux64-nightly-3/opt": 4410,
"nightly-l10n-linux64-nightly-4/opt": 4374,
"nightly-l10n-linux64-nightly-5/opt": 4416,
"nightly-l10n-linux64-nightly-6/opt": 4392,
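For anyone without jq to hand, the same measurement can be sketched in Python. This is a minimal sketch, not the tooling actually used here: it assumes the full taskgraph JSON maps each task label to an object with a `task.routes` list (the shape implied by the jq expression above), the names `route_lengths` and `nightly_only` are made up for illustration, and `SAMPLE` is invented data rather than real routes.

```python
import re

# Mirrors the two greps above: keep nightly builds and nightly l10n, drop beetmover.
KEEP = re.compile(r"nightly-l10n|build.*nightly")


def route_lengths(taskgraph):
    """Map each task label to the summed character length of its routes."""
    return {
        label: sum(len(route) for route in entry["task"].get("routes", []))
        for label, entry in taskgraph.items()
    }


def nightly_only(lengths):
    """Apply the same label filter as the grep pipeline."""
    return {
        label: total
        for label, total in lengths.items()
        if KEEP.search(label) and "beetmover" not in label
    }


# Invented sample data, only to show the shape of the input.
SAMPLE = {
    "build-linux-nightly/opt": {"task": {"routes": ["index.a", "index.bc"]}},
    "nightly-l10n-linux-nightly-1/opt": {
        "task": {"routes": ["index.x.en-GB", "index.x.de"]}
    },
    "beetmover-build-linux-nightly/opt": {"task": {"routes": ["index.z"]}},
    "test-linux/opt": {"task": {}},
}

result = nightly_only(route_lengths(SAMPLE))
for label, total in sorted(result.items()):
    print(f"{label}: {total}")
```

Unlike the jq expression, this sketch treats a task with no `routes` key as length 0 instead of failing. To run it on a real graph, load the file with `json.load` and pass the resulting dict to `route_lengths`.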
Comment 3 • 8 years ago (Assignee)
(In reply to Justin Wood (:Callek) from comment #1)
> Created attachment 8829684 [details]
> Bug 1333234 - L10n Routing on Aurora is too large.
To save you the trouble, after the patch the route lengths are:
cat ../jobs_test1.json | jq '[. | to_entries[] | select(.key) | .value =
reduce .value.task.routes[] as $item (0; . + ( $item | length ) ) ] | from_entries' | grep "nightly-l10n\|build.*nightly" | grep -v beetmover
"build-android-api-15-nightly/opt": 542,
"build-android-api-15-nightly/opt-upload-symbols": 154,
"build-android-x86-nightly/opt": 530,
"build-android-x86-nightly/opt-upload-symbols": 154,
"build-linux-nightly/opt": 510,
"build-linux-nightly/opt-upload-symbols": 154,
"build-linux64-nightly/opt": 518,
"build-linux64-nightly/opt-upload-symbols": 154,
"nightly-l10n-android-api-15-nightly-1/opt": 2704,
"nightly-l10n-android-api-15-nightly-10/opt": 2423,
"nightly-l10n-android-api-15-nightly-2/opt": 2698,
"nightly-l10n-android-api-15-nightly-3/opt": 2728,
"nightly-l10n-android-api-15-nightly-4/opt": 2710,
"nightly-l10n-android-api-15-nightly-5/opt": 2704,
"nightly-l10n-android-api-15-nightly-6/opt": 2686,
"nightly-l10n-android-api-15-nightly-7/opt": 2713,
"nightly-l10n-android-api-15-nightly-8/opt": 2710,
"nightly-l10n-android-api-15-nightly-9/opt": 2695,
"nightly-l10n-linux-nightly-1/opt": 2748,
"nightly-l10n-linux-nightly-10/opt": 2485,
"nightly-l10n-linux-nightly-2/opt": 2730,
"nightly-l10n-linux-nightly-3/opt": 2778,
"nightly-l10n-linux-nightly-4/opt": 2751,
"nightly-l10n-linux-nightly-5/opt": 2745,
"nightly-l10n-linux-nightly-6/opt": 2736,
"nightly-l10n-linux-nightly-7/opt": 2494,
"nightly-l10n-linux-nightly-8/opt": 2494,
"nightly-l10n-linux-nightly-9/opt": 2479,
"nightly-l10n-linux64-nightly-1/opt": 2808,
"nightly-l10n-linux64-nightly-10/opt": 2539,
"nightly-l10n-linux64-nightly-2/opt": 2790,
"nightly-l10n-linux64-nightly-3/opt": 2838,
"nightly-l10n-linux64-nightly-4/opt": 2811,
"nightly-l10n-linux64-nightly-5/opt": 2805,
"nightly-l10n-linux64-nightly-6/opt": 2796,
"nightly-l10n-linux64-nightly-7/opt": 2548,
"nightly-l10n-linux64-nightly-8/opt": 2548,
"nightly-l10n-linux64-nightly-9/opt": 2533,
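As a rough cross-check of the l10n figures in this bug, the worst-case chunk per platform can be compared against the header limit. This is only a sketch under stated assumptions: the "~4kb" limit is read as exactly 4096, route characters are treated as one byte each, and the "other stuff" in the header is ignored, so the real headroom is smaller than what this prints. The numbers are copied from the aurora listings before and after the patch; the names below are made up for illustration.

```python
# Assumption: "~4kb" interpreted as 4096 bytes; the real cap may differ.
HEADER_BUDGET = 4096

# Total route lengths per l10n chunk on aurora before the patch (6 chunks).
AURORA_BEFORE = {
    "android-api-15": [4393, 4426, 4417, 4384, 4417, 4118],
    "linux": [4293, 4323, 4314, 4278, 4320, 4296],
    "linux64": [4389, 4419, 4410, 4374, 4416, 4392],
}

# Total route lengths per l10n chunk on aurora after the patch (10 chunks).
AURORA_AFTER = {
    "android-api-15": [2704, 2698, 2728, 2710, 2704, 2686, 2713, 2710, 2695, 2423],
    "linux": [2748, 2730, 2778, 2751, 2745, 2736, 2494, 2494, 2479, 2485],
    "linux64": [2808, 2790, 2838, 2811, 2805, 2796, 2548, 2548, 2533, 2539],
}


def headroom(lengths_by_platform, budget=HEADER_BUDGET):
    """Budget minus the largest chunk per platform; negative means over budget."""
    return {
        platform: budget - max(lengths)
        for platform, lengths in lengths_by_platform.items()
    }


before = headroom(AURORA_BEFORE)
after = headroom(AURORA_AFTER)
for platform in AURORA_BEFORE:
    print(f"{platform}: before {before[platform]:+d}, after {after[platform]:+d}")
```

Under these assumptions every platform was over the budget with 6 chunks and has more than 1 KB to spare with 10.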
Updated • 8 years ago
Assignee: nobody → bugspam.Callek
Comment 4 • 8 years ago
mozreview-review
Comment on attachment 8829684 [details]
Bug 1333234 - L10n Routing on Aurora is too large.
https://reviewboard.mozilla.org/r/106686/#review107858
As a followup if you have time, it might be nice to log the maximum of those values in the decision task (in the task-creation loop). Then if we run into this again, some log parsing will give us a nice threshold value rather than the guesstimates we have now.
Attachment #8829684 - Flags: review?(dustin) → review+
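The logging idea from this review could look roughly like the following. It is a hypothetical sketch, not the decision task's actual code: `create_tasks_with_route_stats` and the `create_task` callback are stand-ins invented for illustration, and the input shape (label mapped to an object with `task.routes`) is assumed from the jq expression used earlier in this bug.

```python
import logging

logger = logging.getLogger("decision-sketch")


def create_tasks_with_route_stats(taskgraph, create_task):
    """Create every task and log the largest total route length seen.

    `create_task` is a hypothetical callback taking (label, task_definition).
    """
    max_total, max_label = 0, None
    for label, entry in taskgraph.items():
        task_def = entry["task"]
        total = sum(len(route) for route in task_def.get("routes", []))
        if total > max_total:
            max_total, max_label = total, label
        create_task(label, task_def)
    # One summary line after the loop, so it is easy to find in the log.
    logger.info("max total route length: %d (%s)", max_total, max_label)
    return max_total, max_label
```

Emitting a single line after the loop keeps the log parsing trivial, at the cost of not showing how close the other tasks were.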
Comment 5 • 8 years ago (Assignee)
(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> Comment on attachment 8829684 [details]
> Bug 1333234 - L10n Routing on Aurora is too large.
>
> https://reviewboard.mozilla.org/r/106686/#review107858
>
> As a followup if you have time, it might be nice to log the maximum of those
> values in the decision task (in the task-creation loop). Then if we run
> into this again, some log parsing will give us a nice threshold value rather
> than the guesstimates we have now.
We could probably run this across a range of the full .json files produced by decision tasks more easily than we could scan/skim logs for a value, and without clogging up terminals with a cryptic debugging line.
Maybe we use grafana or something similar to graph the max instead? And make an estimate of an error threshold (say 3.5k or 4k)?
https://hg.mozilla.org/releases/mozilla-aurora/rev/c5a33bbf8cb46bf40fc4d9f2b619189e7f377230
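A sketch of the offline approach suggested in this comment: walk a directory of saved full-taskgraph .json files, take the largest per-task route total in each, and flag anything at or above a threshold. Everything here is illustrative: the directory layout, the file names, and the function names are assumptions, the 3500 threshold is just the "3.5k" floated above, and the demo data is invented.

```python
import json
import tempfile
from pathlib import Path

WARN_THRESHOLD = 3500  # the "3.5k" suggested above; tune as needed


def max_route_length(path):
    """Return (label, total) for the task with the longest summed routes."""
    graph = json.loads(Path(path).read_text())
    totals = {
        label: sum(len(route) for route in entry["task"].get("routes", []))
        for label, entry in graph.items()
    }
    if not totals:
        return None, 0
    label = max(totals, key=totals.get)
    return label, totals[label]


def scan(directory, threshold=WARN_THRESHOLD):
    """One (file, label, total, over_threshold) row per .json file."""
    rows = []
    for path in sorted(Path(directory).glob("*.json")):
        label, total = max_route_length(path)
        rows.append((path.name, label, total, total >= threshold))
    return rows


# Demo with invented graphs written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    small = {"build-linux-nightly/opt": {"task": {"routes": ["index.a", "index.bc"]}}}
    large = {"nightly-l10n-linux-nightly-1/opt": {"task": {"routes": ["x" * 3600]}}}
    Path(tmp, "central.json").write_text(json.dumps(small))
    Path(tmp, "aurora.json").write_text(json.dumps(large))
    report = scan(tmp)

for name, label, total, over in report:
    print(f"{name}: {label} = {total}{' OVER THRESHOLD' if over else ''}")
```

The per-file maximum is also the value that could be fed to a graphing tool if the trend is wanted over time.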
Comment 6 • 8 years ago
That's a great point regarding analyzing .json files after the fact. We don't (yet) have a way to create statistics in a task, so I don't think we can gather it that easily, just yet. Hopefully we won't need to!
Comment 7 • 8 years ago (Assignee)
The follow-on (better) solution is being worked on in https://bugzil.la/1333255, where Jonas is working on a way to actually index all these routes on the tasks without needing the extra chunking.
Comment 8 • 8 years ago
As :Callek asked on IRC, and now that merge day has happened, I relanded this patch on Aurora at: https://hg.mozilla.org/releases/mozilla-aurora/rev/577083e852674484f8064f45a9b99cf13e1f9b6f
Updated • 7 years ago
Product: TaskCluster → Firefox Build System