Closed
Bug 777436
Opened 12 years ago
Closed 12 years ago
Ascertain which Android test suites have >30% failure rate and hide them
Categories
(Tree Management Graveyard :: TBPL, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
References
Details
(Whiteboard: [sheriff-want])
Attachments
(2 files)
3.86 KB, patch
48.47 KB, image/jpeg
The current overall Android test failure rate is extremely high, leading to:
* People not taking *any* Android test failures seriously, real or otherwise.
* Many retriggers, increasing load dramatically, which is a contributor to bug 772458 (and combined with things like bug 777273 the situation becomes dire).
* Sheriffs just starring all Android failures generically with 'a' (as they have been for months), since there are just too many failures to open each log and star with the correct bug across 10+ trees. (Even starring generically, Android test failures eat up a significant proportion of my day.)
People like gbrown, jmaher, Callek, armeng (and many more) are working hard at getting the failures (both infra/hardware and test specific) resolved (for which I'm exceptionally grateful); however, we cannot wait any longer. As it is, it will be weeks, if not months, before we undo the conditioning of all devs to just ignore all Android test failures - even once the tests are routinely green.
Many of the failures are in only a handful of the test suites - so in the short term, we should just hide the worst behaving suites (those with over 30% failure rate).
This will:
* stop people being tempted to retrigger the unreliable test suites, reducing tegra load.
* improve the overall perception of the reliability of Android tests - starting the slow journey back to people trusting them.
* mean that the tests can still be viewed by adding &noignore=1 to the TBPL URL - it's not like we're disabling them and losing coverage.
Assignee | ||
Comment 1•12 years ago
|
||
Native Android:
from: 5bd9db1381a6 (bbondy@moco – Sat Jul 21 18:58:07 2012 UTC+1)
to: fe77957061ea (jmathies@moco – Wed Jul 25 10:28:32 2012 UTC+1)
~mochitest-1~
28 failing, 73 total (38%)
~mochitest-2~
17 failing, 67 total (25%)
~mochitest-3~
27 failing, 70 total (39%)
~mochitest-4~
7 failing, 64 total (11%)
~mochitest-5~
7 failing, 65 total (11%)
~mochitest-6~
9 failing, 71 total (13%)
~mochitest-7~
12 failing, 69 total (17%)
~mochitest-8~
28 failing, 71 total (39%)
~robocop~
50 failing, 88 total (57%)
~crashtest-2~
7 failing, 63 total (11%)
~crashtest-3~
3 failing, 64 total ( 5%)
~jsreftest-1~
11 failing, 72 total (15%)
~jsreftest-2~
10 failing, 66 total (15%)
~jsreftest-3~
4 failing, 61 total ( 7%)
~reftest-1~
9 failing, 67 total (13%)
~reftest-2~
41 failing, 81 total (51%)
~reftest-3~
46 failing, 86 total (53%)
~remote-tdhtml~
3 failing, 59 total ( 5%)
~remote-trobocheck~
24 failing, 66 total (36%)
~remote-trobocheck2~
18 failing, 64 total (28%)
~remote-trobocheck3~
32 failing, 78 total (41%)
~remote-trobopan~
26 failing, 66 total (39%)
~remote-troboprovider~
20 failing, 71 total (28%)
~remote-tsvg~
8 failing, 60 total (13%)
~remote-tp4m_nochrome~
5 failing, 63 total ( 8%)
~remote-ts~
45 failing, 86 total (52%)
Assignee | ||
Comment 2•12 years ago
|
||
Meant to add: the figures in comment 1 and here exclude the blue retries, but include all other failures.
XUL Android:
(Same timeframe as comment 1)
~mochitest-1~
6 failing, 66 total ( 9%)
~mochitest-2~
6 failing, 69 total ( 9%)
~mochitest-3~
2 failing, 64 total ( 3%)
~mochitest-4~
6 failing, 63 total (10%)
~mochitest-5~
2 failing, 66 total ( 3%)
~mochitest-6~
4 failing, 63 total ( 6%)
~mochitest-7~
2 failing, 63 total ( 3%)
~mochitest-8~
10 failing, 67 total (15%)
~crashtest-2~
4 failing, 63 total ( 6%)
~crashtest-3~
4 failing, 62 total ( 6%)
~jsreftest-1~
16 failing, 67 total (24%)
~jsreftest-2~
15 failing, 68 total (22%)
~jsreftest-3~
3 failing, 59 total ( 5%)
~reftest-1~
2 failing, 63 total ( 3%)
~reftest-2~
3 failing, 62 total ( 5%)
~reftest-3~
9 failing, 65 total (14%)
Assignee | ||
Comment 3•12 years ago
|
||
Setting a threshold of 30% for native Android, and 20% for XUL (given that we only need it to ensure the metro/B2G stuff still works), leaves the following:
Native:
* mochitest-1 38%
* mochitest-3 39%
* mochitest-8 39%
* reftest-2 51%
* reftest-3 53%
* robocop 57%
* remote-trobocheck 36%
* remote-trobocheck3 41%
* remote-trobopan 39%
* remote-ts 52%
XUL:
* jsreftest-1 24%
* jsreftest-2 22%
The above have been hidden on:
* mozilla-central
* mozilla-inbound
* fx-team
* services-central
* try
I have left out project repos, since many are not running Android tests and sheriffs don't have to star them, so it makes little difference.
Will file dependants for unhiding each, next.
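For reference, the selection in this comment can be reproduced from the raw counts in comment 1 and comment 2. This is only a minimal sketch, not the process that was actually used: the helper name `suites_over_threshold` and the dict layout are hypothetical, and it assumes "over the threshold" means strictly greater than.

```python
# Failing/total counts copied from comment 1 (native Android).
NATIVE = {
    "mochitest-1": (28, 73), "mochitest-2": (17, 67), "mochitest-3": (27, 70),
    "mochitest-4": (7, 64), "mochitest-5": (7, 65), "mochitest-6": (9, 71),
    "mochitest-7": (12, 69), "mochitest-8": (28, 71), "robocop": (50, 88),
    "crashtest-2": (7, 63), "crashtest-3": (3, 64),
    "jsreftest-1": (11, 72), "jsreftest-2": (10, 66), "jsreftest-3": (4, 61),
    "reftest-1": (9, 67), "reftest-2": (41, 81), "reftest-3": (46, 86),
    "remote-tdhtml": (3, 59), "remote-trobocheck": (24, 66),
    "remote-trobocheck2": (18, 64), "remote-trobocheck3": (32, 78),
    "remote-trobopan": (26, 66), "remote-troboprovider": (20, 71),
    "remote-tsvg": (8, 60), "remote-tp4m_nochrome": (5, 63),
    "remote-ts": (45, 86),
}

# Failing/total counts copied from comment 2 (XUL Android).
XUL = {
    "mochitest-1": (6, 66), "mochitest-2": (6, 69), "mochitest-3": (2, 64),
    "mochitest-4": (6, 63), "mochitest-5": (2, 66), "mochitest-6": (4, 63),
    "mochitest-7": (2, 63), "mochitest-8": (10, 67),
    "crashtest-2": (4, 63), "crashtest-3": (4, 62),
    "jsreftest-1": (16, 67), "jsreftest-2": (15, 68), "jsreftest-3": (3, 59),
    "reftest-1": (2, 63), "reftest-2": (3, 62), "reftest-3": (9, 65),
}


def suites_over_threshold(counts, threshold):
    """Map suite name -> rounded failure percentage, for suites whose
    failure rate is strictly above `threshold` (given as a fraction)."""
    return {
        name: round(100 * failing / total)
        for name, (failing, total) in counts.items()
        if failing / total > threshold
    }


hidden_native = suites_over_threshold(NATIVE, 0.30)
hidden_xul = suites_over_threshold(XUL, 0.20)
```

With the thresholds from this comment (30% native, 20% XUL), `hidden_native` contains the ten native suites listed above and `hidden_xul` contains the two XUL jsreftest suites.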
Assignee | ||
Updated•12 years ago
|
Whiteboard: [sheriff-want]
Assignee | ||
Comment 4•12 years ago
|
||
I've just added "Show all Android tests" links to the TBPL status messages for each of the trees in comment 3, linking to " ...&jobname=Android&noignore=1" (similar to how we've done it for Spidermonkey builds on inbound for some time).
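As an illustration of what such a link looks like, here is a small sketch that builds one. The function name `android_unhidden_url` is hypothetical; the `tree`, `jobname` and `noignore` parameters are taken from the URLs quoted in this bug, while the parameter order and the exact links used in the status messages are assumptions.

```python
from urllib.parse import urlencode

# Base URL as it appears in the TBPL link quoted later in this bug.
TBPL_BASE = "https://tbpl.mozilla.org/"


def android_unhidden_url(tree):
    """Build a TBPL URL showing all Android jobs for `tree`,
    including the hidden ones (noignore=1)."""
    query = urlencode({"tree": tree, "jobname": "Android", "noignore": 1})
    return f"{TBPL_BASE}?{query}"


url = android_unhidden_url("Mozilla-Inbound")
```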
Comment 5•12 years ago
|
||
(In reply to Ed Morley [:edmorley] from comment #0)
> * improve the overall perception of the reliability of Android tests -
> starting the slow journey back to people trusting them.
I think this bug is a good change, but I did the math on the percentages in comment 1. Even with everything with a 30% failure rate or higher discarded, there's a 92% chance of a bogus failure in at least one of the remaining tests. So perhaps this could help perception somewhat, but I'd hold off on any announcements that everything is better to avoid a "never cry wolf" effect. (Never cry no wolf?)
It does mean that the odds of only needing a single retrigger on a visible failure are much higher.
I suppose automated retriggering would get things to a decent state, but that seems very bad from a load perspective.
Perhaps if a build of a later changeset is green, it could "auto-star" earlier builds? (Just in the tbpl UI, I mean.)
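The 92% figure can be checked with a short calculation. This sketch is a reconstruction, not necessarily the math done for this comment: it assumes the suites fail independently of each other, uses only the native Android suites from comment 1 that are below 30%, and takes the rounded percentages as given. The names `VISIBLE_NATIVE_RATES` and `chance_of_any_failure` are hypothetical.

```python
from math import prod

# Rounded failure rates (percent) from comment 1 for the native suites
# that stay visible, i.e. those below the 30% threshold.
VISIBLE_NATIVE_RATES = {
    "mochitest-2": 25, "mochitest-4": 11, "mochitest-5": 11,
    "mochitest-6": 13, "mochitest-7": 17,
    "crashtest-2": 11, "crashtest-3": 5,
    "jsreftest-1": 15, "jsreftest-2": 15, "jsreftest-3": 7,
    "reftest-1": 13,
    "remote-tdhtml": 5, "remote-trobocheck2": 28,
    "remote-troboprovider": 28, "remote-tsvg": 13,
    "remote-tp4m_nochrome": 8,
}


def chance_of_any_failure(rates_percent):
    """P(at least one suite fails) = 1 - product of per-suite pass rates,
    assuming the suites fail independently."""
    return 1 - prod(1 - rate / 100 for rate in rates_percent)


p_any = chance_of_any_failure(VISIBLE_NATIVE_RATES.values())
print(f"{p_any:.0%}")
```

Under those assumptions the chance that every visible suite passes on a given push is only about 8%, which is where the roughly 92% comes from.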
Assignee | ||
Comment 6•12 years ago
|
||
(In reply to Steve Fink [:sfink] (vacation Jul30-Aug10) from comment #5)
> I think this bug is a good change, but I did the math on the percentages in
> comment 1. Even with everything with a 30% failure rate or higher
> discarded, there's a 92% chance of a bogus failure in at least one of the
> remaining tests. So perhaps this could help perception somewhat, but I'd
> hold off on any announcements that everything is better to avoid a "never
> cry wolf" effect. (Never cry no wolf?)
Yeah I wasn't going to shout anything yet :-)
(This bug was as much for my sheriffing sanity as anything else - at least in the short term).
Post bug 775227's bad test disabling, the revised figures (inbound since ~Fri) for the hidden Native mochitests are:
Native:
* mochitest-1: 21 failing, 81 total (26%)
* mochitest-3: 76 failing, 78 total (97%)
* mochitest-8: 15 failing, 82 total (18%)
* robocop : 39 failing, 81 total (48%)
I still haven't filed the dependent bugs; will do so after the a-team meeting. Will also try to track down where the m3 failure rate jumped up from the figure in comment 3, and un-hide m8 since it seems much better behaved now :-)
Comment 8•12 years ago
|
||
looks like we won't have anything running before too long:)
Assignee | ||
Comment 9•12 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #8)
> looks like we won't have anything running before too long:)
At least that might mean we get buy-in from platform... :-)
Joking aside, unless something drastic happens, we should only be unhiding them from this point forwards (eg M8 which has already improved enough to unhide).
Assignee | ||
Comment 10•12 years ago
|
||
This patch allows TBPL to show failure stats for the current view. Bit hacky but gets the job done.
To use:
1) Check out http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/
2) Apply patch
3) Run index.html from the local filesystem
4) Adjust filters to whichever Android suite(s) you would like included in the stats
5) Stats displayed top right of the UI, where the unstarred count normally is.
Since this method uses the server side components from prod, the data imports and hidden builders will all match prod TBPL :-)
It also changes the default TBPL refresh rate from 120 secs to 99999, to avoid the extreme janking you get when trying to look at the last several days of runs & it does a "Loading...".
Assignee: nobody → bmo
Status: NEW → ASSIGNED
Assignee | ||
Comment 11•12 years ago
|
||
Assignee | ||
Comment 12•12 years ago
|
||
Green glorious green...! [1]
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=cd626d738af7&jobname=Android
[1] http://www.youtube.com/watch?v=hEQDllvuy1I
Assignee | ||
Comment 13•12 years ago
|
||
The last hidden Android {native,XUL} suite was unhidden in bug 778954 -> closing this out :-)
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•10 years ago
|
Product: Webtools → Tree Management
Updated•10 years ago
|
Product: Tree Management → Tree Management Graveyard