Work around slow packet.net instances
Categories
(Testing :: General, defect, P1)
Tracking
(firefox68 fixed)
People
(Reporter: gbrown, Assigned: gbrown)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
See bug 1545308: From the task perspective, some packet.net instances are slow. Sometimes tests run at approx 1/4 normal speed, and /proc/cpuinfo shows approx 1/4 the normal MHz. We have tried 2 potential solutions without success.
Android 4.3 emulator tests on AWS have experienced a similar issue: the emulator runs slowly on certain AWS instances. There is an existing mechanism in mozharness to fail a task and trigger an automatic retry when the emulator reports bogomips below a configurable threshold.
Android x86_64 7.0 tests normally see an emulator bogomips value of 7000 - 8000; a quick random selection of problematic tasks found emulator bogomips values in the 1500 - 2500 range.
I would prefer to see us fully understand the problem and solve it in bug 1545308, but perhaps the time has come for a workaround. We should still be able to find bad instances by reviewing retried tasks, so work can continue in bug 1545308.
Let's enable the mozharness minimum bogomips check for packet.net Android tests.
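The decision behind such a check can be sketched roughly as follows. This is an illustration only, not the mozharness code: the function name `should_retry` and the threshold `MIN_BOGOMIPS` are hypothetical, and the threshold is simply a value placed between the slow and normal ranges quoted above.

```python
from typing import Optional

# Assumption: an illustrative threshold chosen between the slow range
# (1500 - 2500) and the normal range (7000 - 8000) quoted above. The
# value actually configured in mozharness is not stated in this bug.
MIN_BOGOMIPS = 4000.0


def should_retry(bogomips: Optional[float], minimum: float = MIN_BOGOMIPS) -> bool:
    """Return True when the emulator looks too slow to give reliable results."""
    if bogomips is None:
        # No reading available: do not retry on missing data.
        return False
    return bogomips < minimum


if __name__ == "__main__":
    for reading in (7500.0, 2000.0, None):
        action = "retry" if should_retry(reading) else "run tests"
        print(f"bogomips={reading}: {action}")
```

Treating a missing reading as "do not retry" is a design choice made for this sketch; a real harness might prefer to log the gap or fail loudly.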
Comment 1 • 5 years ago (Assignee)
Enable the existing android mozharness support for retrying a task when the
bogomips are insufficient, for packet.net tests. This has worked well for
Android 4.3.
Comment 2 • 5 years ago (Assignee)
There are some other ideas in flight in bug 1545308; I don't mind putting this workaround aside if another solution is imminent, but we really should get tests running reliably as soon as possible.
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d51a13b4f0ef Work around slow packet.net instances with min bogomips check; r=jmaher
Comment 4 • 5 years ago (bugherder)
Comment 5 • 5 years ago (Assignee)
This isn't working: Android 7.0 /proc/cpuinfo is slightly different from Android 4.3.
Comment 6 • 5 years ago (Assignee)
Older Android reported "BogoMIPS"; newer Android reports "bogomips".
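A case-insensitive search handles both spellings. The sketch below shows the idea with a hypothetical helper, `parse_bogomips`; it is not the patch that landed, and the sample cpuinfo text and values are made up for illustration.

```python
import re
from typing import Optional

# Match a "bogomips" line regardless of capitalization, e.g.
# "BogoMIPS : 1234.56" (older Android) or "bogomips : 7183.99" (newer).
_BOGOMIPS_RE = re.compile(
    r"^\s*bogomips\s*:\s*([0-9]+(?:\.[0-9]+)?)",
    re.IGNORECASE | re.MULTILINE,
)


def parse_bogomips(cpuinfo_text: str) -> Optional[float]:
    """Return the first bogomips value found in cpuinfo text, or None."""
    match = _BOGOMIPS_RE.search(cpuinfo_text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    # Illustrative samples only; not copied from real devices.
    older = "Processor\t: ARMv7\nBogoMIPS\t: 1234.56\n"
    newer = "processor\t: 0\nbogomips\t: 7183.99\n"
    print(parse_bogomips(older))
    print(parse_bogomips(newer))
```

Only the first match is returned; on a multi-core emulator each core may report its own line, and whether to take the first, the minimum, or an average is left open here.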
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cff3c3d2b3c5 Ignore case when searching for android bogomips; r=jmaher
Comment 8 • 5 years ago (bugherder)
Comment 9 • 5 years ago (Assignee)
Now this is definitely working. Many Android 7.0 retries are evident, and the majority are due to the bogomips check. A survey of these retries shows an excellent correlation with the problematic worker IDs showing 800 MHz in cpuinfo.
There have been 0 Android 7.0 failures reported in bug 1474758 and bug 1411358 since this landing: an effective workaround.
Comment 10 • 5 years ago
excellent:)
Is there a high cost associated with this? How much extra CPU time do we spend for each slow instance check/retry? How many do we have? I care more about capacity planning at this point.
Comment 11 • 5 years ago (Assignee)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #10)
The workaround is inefficient; please see https://bugzilla.mozilla.org/show_bug.cgi?id=1545308#c36.