Closed Bug 1333234 Opened 3 years ago Closed 3 years ago

L10n Routing on Aurora is too large, breaks amqp

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Assigned: Callek)

Attachments

(1 file)

In Bug 1323792 we initially thought that bumping the route limits would not have much overall effect, but it turns out we have hit a hard AMQP limit.

The overall message header size is configured with a maximum of ~4 KB, and that budget has to hold the combined route lengths plus some other header fields.
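As a rough illustration of that budget, here is a minimal Python sketch that checks whether a task's routes fit. It assumes "~4 KB" means 4096 bytes and uses a hypothetical `OTHER_HEADER_BYTES` allowance for the non-route fields; the actual AMQP header layout is not modelled.

```python
# Sketch only: estimate whether a task's routes fit in the message header budget.
# ASSUMPTIONS: the ~4 KB limit is modelled as 4096 bytes, and OTHER_HEADER_BYTES is a
# hypothetical allowance for the non-route header fields, not a measured value.
HEADER_LIMIT_BYTES = 4096
OTHER_HEADER_BYTES = 256  # hypothetical


def route_bytes(routes):
    """Total UTF-8 encoded size of a task's routes, in bytes."""
    return sum(len(route.encode("utf-8")) for route in routes)


def fits_in_header(routes, limit=HEADER_LIMIT_BYTES, overhead=OTHER_HEADER_BYTES):
    """True if the routes plus the assumed overhead stay within the limit."""
    return route_bytes(routes) + overhead <= limit
```

Route names are plain ASCII, so the byte count here matches the character count that jq's `length` reports.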

Since the l10n tasks are stable on central (for now) but break the decision task on aurora, we'll increase the chunking on aurora while investigating (and investing in) an alternate way to define the indexes we use.
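Why more chunks helps: each l10n chunk carries routes for the locales it repacks, so spreading the same locales over more chunks shrinks each task's route list. A minimal sketch, where the route template, locale names, and helper names are hypothetical stand-ins rather than what the real l10n transforms generate:

```python
# Sketch only: more chunks -> fewer locales per task -> shorter summed routes.
# ASSUMPTION: ROUTE_TEMPLATE is a hypothetical per-locale index route, not the real one.
ROUTE_TEMPLATE = "index.gecko.v2.{project}.nightly.latest.{locale}.{platform}"


def chunkify(items, chunks):
    """Split items into `chunks` contiguous, nearly equal slices."""
    size, extra = divmod(len(items), chunks)
    out, start = [], 0
    for i in range(chunks):
        end = start + size + (1 if i < extra else 0)
        out.append(items[start:end])
        start = end
    return out


def max_route_length(locales, chunks, project="mozilla-aurora", platform="linux64"):
    """Largest summed route length over all chunks for the given chunk count."""
    lengths = []
    for chunk in chunkify(locales, chunks):
        routes = [
            ROUTE_TEMPLATE.format(project=project, locale=loc, platform=platform)
            for loc in chunk
        ]
        lengths.append(sum(len(r) for r in routes))
    return max(lengths)
```

Because per-locale routes dominate, going from 6 to 10 chunks cuts the worst case to roughly 6/10 of its former size, minus whatever fixed routes every task carries.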

This is currently blocking aurora linux/linux64/android nightlies.

Using a desktop nightly parameters file, I generated a full taskgraph on central and on aurora. For each task I concatenated all of its routes into one string and took the length (using jq). These are the resulting route lengths:

== Central ==
cat ../jobs_test1.json | jq '[. | to_entries[] | select(.key) | .value = reduce .value.task.routes[] as $item (0; . + ( $item | length ) ) ] | from_entries' | grep "nightly-l10n\|build.*nightly" | grep -v beetmover
  "build-android-api-15-nightly/opt": 548,
  "build-android-api-15-nightly/opt-upload-symbols": 156,
  "build-android-x86-nightly/opt": 536,
  "build-android-x86-nightly/opt-upload-symbols": 156,
  "build-linux-nightly/opt": 516,
  "build-linux-nightly/opt-upload-symbols": 156,
  "build-linux64-nightly/opt": 524,
  "build-linux64-nightly/opt-upload-symbols": 156,
  "nightly-l10n-android-api-15-nightly-1/opt": 2162,
  "nightly-l10n-android-api-15-nightly-2/opt": 2162,
  "nightly-l10n-android-api-15-nightly-3/opt": 2144,
  "nightly-l10n-android-api-15-nightly-4/opt": 2171,
  "nightly-l10n-android-api-15-nightly-5/opt": 2162,
  "nightly-l10n-android-api-15-nightly-6/opt": 2171,
  "nightly-l10n-linux-nightly-1/opt": 1997,
  "nightly-l10n-linux-nightly-2/opt": 2012,
  "nightly-l10n-linux-nightly-3/opt": 1976,
  "nightly-l10n-linux-nightly-4/opt": 1734,
  "nightly-l10n-linux-nightly-5/opt": 1743,
  "nightly-l10n-linux-nightly-6/opt": 1734,
  "nightly-l10n-linux64-nightly-1/opt": 2039,
  "nightly-l10n-linux64-nightly-2/opt": 2054,
  "nightly-l10n-linux64-nightly-3/opt": 2018,
  "nightly-l10n-linux64-nightly-4/opt": 1770,
  "nightly-l10n-linux64-nightly-5/opt": 1779,
  "nightly-l10n-linux64-nightly-6/opt": 1770,

== Aurora ==
cat ../jobs_test1.json | jq '[. | to_entries[] | select(.key) | .value = reduce .value.task.routes[] as $item (0; . + ( $item | length ) ) ] | from_entries' | grep "nightly-l10n\|build.*nightly" | grep -v beetmover
  "build-android-api-15-nightly/opt": 542,
  "build-android-api-15-nightly/opt-upload-symbols": 154,
  "build-android-x86-nightly/opt": 530,
  "build-android-x86-nightly/opt-upload-symbols": 154,
  "build-linux-nightly/opt": 510,
  "build-linux-nightly/opt-upload-symbols": 154,
  "build-linux64-nightly/opt": 518,
  "build-linux64-nightly/opt-upload-symbols": 154,
  "nightly-l10n-android-api-15-nightly-1/opt": 4393,
  "nightly-l10n-android-api-15-nightly-2/opt": 4426,
  "nightly-l10n-android-api-15-nightly-3/opt": 4417,
  "nightly-l10n-android-api-15-nightly-4/opt": 4384,
  "nightly-l10n-android-api-15-nightly-5/opt": 4417,
  "nightly-l10n-android-api-15-nightly-6/opt": 4118,
  "nightly-l10n-linux-nightly-1/opt": 4293,
  "nightly-l10n-linux-nightly-2/opt": 4323,
  "nightly-l10n-linux-nightly-3/opt": 4314,
  "nightly-l10n-linux-nightly-4/opt": 4278,
  "nightly-l10n-linux-nightly-5/opt": 4320,
  "nightly-l10n-linux-nightly-6/opt": 4296,
  "nightly-l10n-linux64-nightly-1/opt": 4389,
  "nightly-l10n-linux64-nightly-2/opt": 4419,
  "nightly-l10n-linux64-nightly-3/opt": 4410,
  "nightly-l10n-linux64-nightly-4/opt": 4374,
  "nightly-l10n-linux64-nightly-5/opt": 4416,
  "nightly-l10n-linux64-nightly-6/opt": 4392,
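For anyone without jq handy, the measurement above can be sketched in Python. This assumes the same JSON layout the jq filter implies (task label mapped to an object with `task.routes`); the function names are illustrative. Unlike the jq filter, it treats a task with no routes as length 0 instead of erroring, and it sorts labels, which reproduces the ordering of the listings.

```python
import json
import re

# Same filters as the shell pipeline: keep l10n and nightly build tasks, drop beetmover.
KEEP = re.compile(r"nightly-l10n|build.*nightly")
DROP = "beetmover"


def route_lengths(graph):
    """Map task label -> summed length of its routes (mirrors the jq reduce)."""
    return {
        label: sum(len(route) for route in entry["task"].get("routes", []))
        for label, entry in graph.items()
    }


def filtered(lengths):
    """Apply the grep filters, keeping labels in sorted order."""
    return {
        label: n
        for label, n in sorted(lengths.items())
        if KEEP.search(label) and DROP not in label
    }


def report(path):
    """Print lines shaped like the jq output for the task-graph JSON at `path`."""
    with open(path) as fh:
        graph = json.load(fh)
    for label, n in filtered(route_lengths(graph)).items():
        print(f'  "{label}": {n},')
```

Usage: `report("../jobs_test1.json")`.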
(In reply to Justin Wood (:Callek) from comment #1)
> Created attachment 8829684
> Bug 1333234 - L10n Routing on Aurora is too large.


To save you the trouble, here are the route lengths on aurora after the patch:

cat ../jobs_test1.json | jq '[. | to_entries[] | select(.key) | .value = reduce .value.task.routes[] as $item (0; . + ( $item | length ) ) ] | from_entries' | grep "nightly-l10n\|build.*nightly" | grep -v beetmover
  "build-android-api-15-nightly/opt": 542,
  "build-android-api-15-nightly/opt-upload-symbols": 154,
  "build-android-x86-nightly/opt": 530,
  "build-android-x86-nightly/opt-upload-symbols": 154,
  "build-linux-nightly/opt": 510,
  "build-linux-nightly/opt-upload-symbols": 154,
  "build-linux64-nightly/opt": 518,
  "build-linux64-nightly/opt-upload-symbols": 154,
  "nightly-l10n-android-api-15-nightly-1/opt": 2704,
  "nightly-l10n-android-api-15-nightly-10/opt": 2423,
  "nightly-l10n-android-api-15-nightly-2/opt": 2698,
  "nightly-l10n-android-api-15-nightly-3/opt": 2728,
  "nightly-l10n-android-api-15-nightly-4/opt": 2710,
  "nightly-l10n-android-api-15-nightly-5/opt": 2704,
  "nightly-l10n-android-api-15-nightly-6/opt": 2686,
  "nightly-l10n-android-api-15-nightly-7/opt": 2713,
  "nightly-l10n-android-api-15-nightly-8/opt": 2710,
  "nightly-l10n-android-api-15-nightly-9/opt": 2695,
  "nightly-l10n-linux-nightly-1/opt": 2748,
  "nightly-l10n-linux-nightly-10/opt": 2485,
  "nightly-l10n-linux-nightly-2/opt": 2730,
  "nightly-l10n-linux-nightly-3/opt": 2778,
  "nightly-l10n-linux-nightly-4/opt": 2751,
  "nightly-l10n-linux-nightly-5/opt": 2745,
  "nightly-l10n-linux-nightly-6/opt": 2736,
  "nightly-l10n-linux-nightly-7/opt": 2494,
  "nightly-l10n-linux-nightly-8/opt": 2494,
  "nightly-l10n-linux-nightly-9/opt": 2479,
  "nightly-l10n-linux64-nightly-1/opt": 2808,
  "nightly-l10n-linux64-nightly-10/opt": 2539,
  "nightly-l10n-linux64-nightly-2/opt": 2790,
  "nightly-l10n-linux64-nightly-3/opt": 2838,
  "nightly-l10n-linux64-nightly-4/opt": 2811,
  "nightly-l10n-linux64-nightly-5/opt": 2805,
  "nightly-l10n-linux64-nightly-6/opt": 2796,
  "nightly-l10n-linux64-nightly-7/opt": 2548,
  "nightly-l10n-linux64-nightly-8/opt": 2548,
  "nightly-l10n-linux64-nightly-9/opt": 2533,
Assignee: nobody → bugspam.Callek
Comment on attachment 8829684
Bug 1333234 - L10n Routing on Aurora is too large.

https://reviewboard.mozilla.org/r/106686/#review107858

As a followup, if you have time, it might be nice to log the maximum of those values in the decision task (in the task-creation loop). Then if we run into this again, some log parsing will give us a nice threshold value rather than the guesstimates we have now.
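A minimal sketch of that suggestion: track the largest summed route length while tasks are created, then log it once. `create_tasks_with_route_stats` and the `submit` callback are hypothetical stand-ins for illustration, not taskgraph's actual task-creation code or signatures.

```python
import logging

logger = logging.getLogger(__name__)


def create_tasks_with_route_stats(tasks, submit):
    """Hypothetical task-creation loop that also records the largest route total.

    `tasks` maps label -> task definition (with an optional "routes" list);
    `submit` is a stand-in for whatever actually creates each task.
    Returns (label, total) for the task with the largest summed route length.
    """
    max_label, max_len = None, -1
    for label, task in tasks.items():
        total = sum(len(r) for r in task.get("routes", []))
        if total > max_len:
            max_label, max_len = label, total
        submit(label, task)
    if max_label is not None:
        # One greppable line per decision task instead of per-task noise.
        logger.info("largest summed route length: %d (%s)", max_len, max_label)
    return max_label, max_len
```

Logging a single summary line keeps the decision task output readable while still leaving a value that later log parsing can pick up.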
Attachment #8829684 - Flags: review?(dustin) → review+
(In reply to Dustin J. Mitchell [:dustin] from comment #4)
> Comment on attachment 8829684
> Bug 1333234 - L10n Routing on Aurora is too large.
> 
> https://reviewboard.mozilla.org/r/106686/#review107858
> 
> As a followup if you have time, it might be nice to log the maximum of those
> values in the decision task (in the task-creation loop).  Then if we run
> into this again, some log parsing will give us a nice threshold value rather
> than the guesstimates we have now.

It would probably be easier to run this across a range of the full .json files produced by decision tasks than to scan or skim logs for a value, and it would avoid cluttering the output with a cryptic debugging line.

Maybe we could use Grafana or something similar to graph the maximum instead, and estimate an error threshold (say 3.5k or 4k)?
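A rough sketch of the after-the-fact analysis: compute the maximum summed route length for each saved task-graph JSON and flag files at or above a threshold. The 3500 default is just the lower of the two values floated above, the JSON layout is assumed to match what the jq filter reads, and the function names are illustrative. The resulting numbers could then feed a graph.

```python
import json
from pathlib import Path

WARN_THRESHOLD = 3500  # assumption: the lower of the thresholds suggested above


def largest_route_total(graph):
    """(label, total) for the task with the largest summed route length."""
    return max(
        (
            (label, sum(len(r) for r in entry["task"].get("routes", [])))
            for label, entry in graph.items()
        ),
        key=lambda pair: pair[1],
        default=(None, 0),
    )


def scan(paths, threshold=WARN_THRESHOLD):
    """Yield (path, label, total, over_threshold) for each task-graph JSON file."""
    for path in paths:
        graph = json.loads(Path(path).read_text())
        label, total = largest_route_total(graph)
        yield path, label, total, total >= threshold
```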

https://hg.mozilla.org/releases/mozilla-aurora/rev/c5a33bbf8cb46bf40fc4d9f2b619189e7f377230
That's a great point regarding analyzing .json files after the fact.  We don't (yet) have a way to create statistics in a task, so I don't think we can gather it that easily, just yet.  Hopefully we won't need to!
The follow-on (better) solution is being worked on in https://bugzil.la/1333255, where jonas is working on a way to index all of these routes on the tasks without needing the extra chunking.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
See Also: → 1333255
As requested on IRC by :Callek, and now that merge day has happened, I relanded this patch on Aurora at: https://hg.mozilla.org/releases/mozilla-aurora/rev/577083e852674484f8064f45a9b99cf13e1f9b6f
Product: TaskCluster → Firefox Build System