Closed Bug 777436 Opened 12 years ago Closed 12 years ago

Ascertain which Android test suites have >30% failure rate and hide them

Categories: Tree Management Graveyard :: TBPL, defect
Platform: ARM, Android
Type: defect
Priority: Not set
Severity: major
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: emorley, Assigned: emorley
Whiteboard: [sheriff-want]

Attachments (2 files)

TBPL failure stats (patch, 3.86 KB), 12 years ago, Ed Morley [:emorley]
Now that's what I'm talking about! :-D (image/jpeg, 48.47 KB), 12 years ago, Ed Morley [:emorley]

Ed Morley [:emorley]

Assignee

Description

•

12 years ago

The current overall Android test failure rate is extremely high, leading to:
* People not taking *any* Android test failures seriously, real or otherwise.
* Many retriggers, increasing load dramatically, which is a contributor to bug 772458 (and combined with things like bug 777273 the situation becomes dire).
* Sheriffs starring all Android failures generically with 'a' (as they have been for months), since there are just too many failures to open each log and star with the correct bug across 10+ trees. (Even when starring generically, Android test failures eat up a significant proportion of my day.)

People like gbrown, jmaher, Callek, armeng (and many more) are working hard at getting the failures (both infra/hardware and test specific) resolved, for which I'm exceptionally grateful; however, we cannot wait any longer. As it is, it will be weeks, if not months, before we undo the conditioning of all devs to just ignore all Android test failures - even once the tests are routinely green.

Many of the failures are in only a handful of the test suites - so in the short term, we should just hide the worst behaving suites (those with over 30% failure rate).

This will:
* stop people being tempted to retrigger the unreliable test suites, reducing tegra load.
* improve the overall perception of the reliability of Android tests - starting the slow journey back to people trusting them.
* mean that the tests can still be viewed by adding &noignore=1 to the TBPL URL - it's not like we're disabling them and losing coverage.

Ed Morley [:emorley]

Assignee

Comment 1

•

12 years ago

Native Android:

from: 5bd9db1381a6 (bbondy@moco – Sat Jul 21 18:58:07 2012 UTC+1)
to: fe77957061ea (jmathies@moco – Wed Jul 25 10:28:32 2012 UTC+1)

~mochitest-1~
 28 failing, 73 total (38%)
~mochitest-2~
 17 failing, 67 total (25%)
~mochitest-3~
 27 failing, 70 total (39%)
~mochitest-4~
  7 failing, 64 total (11%)
~mochitest-5~
  7 failing, 65 total (11%)
~mochitest-6~
  9 failing, 71 total (13%)
~mochitest-7~
 12 failing, 69 total (17%)
~mochitest-8~
 28 failing, 71 total (39%)
~robocop~
 50 failing, 88 total (57%)
~crashtest-2~
  7 failing, 63 total (11%)
~crashtest-3~
  3 failing, 64 total ( 5%)
~jsreftest-1~
 11 failing, 72 total (15%)
~jsreftest-2~
 10 failing, 66 total (15%)
~jsreftest-3~
  4 failing, 61 total ( 7%)
~reftest-1~
  9 failing, 67 total (13%)
~reftest-2~
 41 failing, 81 total (51%)
~reftest-3~
 46 failing, 86 total (53%)
~remote-tdhtml~
  3 failing, 59 total ( 5%)
~remote-trobocheck~
 24 failing, 66 total (36%)
~remote-trobocheck2~
 18 failing, 64 total (28%)
~remote-trobocheck3~
 32 failing, 78 total (41%)
~remote-trobopan~
 26 failing, 66 total (39%)
~remote-troboprovider~
 20 failing, 71 total (28%)
~remote-tsvg~
  8 failing, 60 total (13%)
~remote-tp4m_nochrome~
  5 failing, 63 total ( 8%)
~remote-ts~
 45 failing, 86 total (52%)

Ed Morley [:emorley]

Assignee

Comment 2

•

12 years ago

Meant to add: the figures in comment 1 and here exclude the blue retries, but include all other failures.

XUL Android:
(Same timeframe as comment 1)

~mochitest-1~
  6 failing, 66 total ( 9%)
~mochitest-2~
  6 failing, 69 total ( 9%)
~mochitest-3~
  2 failing, 64 total ( 3%)
~mochitest-4~
  6 failing, 63 total (10%)
~mochitest-5~
  2 failing, 66 total ( 3%)
~mochitest-6~
  4 failing, 63 total ( 6%)
~mochitest-7~
  2 failing, 63 total ( 3%)
~mochitest-8~
 10 failing, 67 total (15%)
~crashtest-2~
  4 failing, 63 total ( 6%)
~crashtest-3~
  4 failing, 62 total ( 6%)
~jsreftest-1~
 16 failing, 67 total (24%)
~jsreftest-2~
 15 failing, 68 total (22%)
~jsreftest-3~
  3 failing, 59 total ( 5%)
~reftest-1~
  2 failing, 63 total ( 3%)
~reftest-2~
  3 failing, 62 total ( 5%)
~reftest-3~
  9 failing, 65 total (14%)

Ed Morley [:emorley]

Assignee

Comment 3

•

12 years ago

Setting a threshold of 30% for native Android, and 20% for XUL (given that we only need it to ensure the metro/B2G stuff still works), leaves the following:

Native:
* mochitest-1          38%
* mochitest-3          39%
* mochitest-8          39%
* reftest-2            51%
* reftest-3            53%
* robocop              57%
* remote-trobocheck    36%
* remote-trobocheck3   41%
* remote-trobopan      39%
* remote-ts            52%

XUL:
* jsreftest-1          24%
* jsreftest-2          22%

The above have been hidden on:
* mozilla-central
* mozilla-inbound
* fx-team
* services-central
* try

I have left out project repos, since many are not running Android tests and sheriffs don't have to star them, so it makes little difference.

Will file dependants for unhiding each, next.
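
The selection in this comment can be reproduced from the raw counts in comment 1 and comment 2. The following is a minimal sketch, not the method actually used (the figures came from TBPL itself); the dictionary layout and the helper name `suites_over` are assumptions made for illustration.

```python
# Failure counts copied from comment 1 (native) and comment 2 (XUL).
# Layout is hypothetical: suite name -> (failing runs, total runs).
NATIVE = {
    "mochitest-1": (28, 73), "mochitest-2": (17, 67),
    "mochitest-3": (27, 70), "mochitest-4": (7, 64),
    "mochitest-5": (7, 65), "mochitest-6": (9, 71),
    "mochitest-7": (12, 69), "mochitest-8": (28, 71),
    "robocop": (50, 88),
    "crashtest-2": (7, 63), "crashtest-3": (3, 64),
    "jsreftest-1": (11, 72), "jsreftest-2": (10, 66),
    "jsreftest-3": (4, 61),
    "reftest-1": (9, 67), "reftest-2": (41, 81), "reftest-3": (46, 86),
    "remote-tdhtml": (3, 59), "remote-trobocheck": (24, 66),
    "remote-trobocheck2": (18, 64), "remote-trobocheck3": (32, 78),
    "remote-trobopan": (26, 66), "remote-troboprovider": (20, 71),
    "remote-tsvg": (8, 60), "remote-tp4m_nochrome": (5, 63),
    "remote-ts": (45, 86),
}
XUL = {
    "mochitest-1": (6, 66), "mochitest-2": (6, 69),
    "mochitest-3": (2, 64), "mochitest-4": (6, 63),
    "mochitest-5": (2, 66), "mochitest-6": (4, 63),
    "mochitest-7": (2, 63), "mochitest-8": (10, 67),
    "crashtest-2": (4, 63), "crashtest-3": (4, 62),
    "jsreftest-1": (16, 67), "jsreftest-2": (15, 68),
    "jsreftest-3": (3, 59),
    "reftest-1": (2, 63), "reftest-2": (3, 62), "reftest-3": (9, 65),
}


def suites_over(stats, threshold):
    """Return (suite, failure rate) pairs above the threshold, worst first."""
    rates = {name: failing / total for name, (failing, total) in stats.items()}
    over = [(name, rate) for name, rate in rates.items() if rate > threshold]
    return sorted(over, key=lambda item: item[1], reverse=True)


native_hidden = suites_over(NATIVE, 0.30)  # 30% threshold for native Android
xul_hidden = suites_over(XUL, 0.20)        # 20% threshold for XUL

for label, hidden in (("Native", native_hidden), ("XUL", xul_hidden)):
    print(f"{label}:")
    for name, rate in hidden:
        print(f"* {name:<20} {rate:.0%}")
```

With these counts the sketch selects the same ten native suites and two XUL suites listed above, ordered by failure rate rather than by suite name.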

Ed Morley [:emorley]

Assignee

Updated

•

12 years ago

Whiteboard: [sheriff-want]

Ed Morley [:emorley]

Assignee

Comment 4

•

12 years ago

I've just added "Show all Android tests" links to the TBPL status messages for each of the trees in comment 3, linking to " ...&jobname=Android&noignore=1" (similar to how we've done it for Spidermonkey builds on inbound for some time).
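
For illustration, a link of this shape can be assembled from the query parameters mentioned in this bug (`tree`, `jobname`, `noignore`). This is only a sketch: the helper `tbpl_url` is hypothetical, and the exact link used in the status messages is elided above, so the output is an example rather than a copy of it.

```python
from urllib.parse import urlencode

TBPL_BASE = "https://tbpl.mozilla.org/"


def tbpl_url(tree, jobname=None, show_hidden=False):
    """Build a TBPL view URL; show_hidden appends noignore=1 (hypothetical helper)."""
    params = {"tree": tree}
    if jobname:
        params["jobname"] = jobname   # filter the view to matching job names
    if show_hidden:
        params["noignore"] = "1"      # also show builders that are hidden by default
    return TBPL_BASE + "?" + urlencode(params)


url = tbpl_url("Mozilla-Inbound", jobname="Android", show_hidden=True)
print(url)
```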

Steve Fink [:sfink] [:s:]

Comment 5

•

12 years ago

(In reply to Ed Morley [:edmorley] from comment #0)
> * improve the overall perception of the reliability of Android tests -
> starting the slow journey back to people trusting them.

I think this bug is a good change, but I did the math on the percentages in comment 1. Even with everything with a 30% failure rate or higher discarded, there's a 92% chance of a bogus failure in at least one of the remaining tests. So perhaps this could help perception somewhat, but I'd hold off on any announcements that everything is better to avoid a "never cry wolf" effect. (Never cry no wolf?)

It does mean that the odds of only needing a single retrigger on a visible failure are much higher.

I suppose automated retriggering would get things to a decent state, but that seems very bad from a load perspective.

Perhaps if a build of a later changeset is green, it could "auto-star" earlier builds? (Just in the tbpl UI, I mean.)
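
The 92% figure in this comment can be checked with a short calculation. The sketch below assumes that each still-visible native suite runs once per push and fails independently at the rate observed in comment 1; neither assumption is stated in the bug, and the variable names are made up for illustration.

```python
from math import prod

# Native suites from comment 1 with a failure rate under 30%
# (the ones left visible): suite name -> (failing runs, total runs).
VISIBLE_NATIVE = {
    "mochitest-2": (17, 67), "mochitest-4": (7, 64),
    "mochitest-5": (7, 65), "mochitest-6": (9, 71),
    "mochitest-7": (12, 69),
    "crashtest-2": (7, 63), "crashtest-3": (3, 64),
    "jsreftest-1": (11, 72), "jsreftest-2": (10, 66),
    "jsreftest-3": (4, 61),
    "reftest-1": (9, 67),
    "remote-tdhtml": (3, 59), "remote-trobocheck2": (18, 64),
    "remote-troboprovider": (20, 71), "remote-tsvg": (8, 60),
    "remote-tp4m_nochrome": (5, 63),
}

# Probability that every visible suite is green on a given push,
# assuming independent failures.
p_all_green = prod(
    1 - failing / total for failing, total in VISIBLE_NATIVE.values()
)

# Probability that at least one visible suite fails.
p_any_failure = 1 - p_all_green

print(f"P(at least one failure per push) = {p_any_failure:.1%}")
```

Under those assumptions the result lands at roughly 92%, in line with the number quoted above.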

Ed Morley [:emorley]

Assignee

Comment 6

•

12 years ago

(In reply to Steve Fink [:sfink] (vacation Jul30-Aug10) from comment #5)
> I think this bug is a good change, but I did the math on the percentages in
> comment 1. Even with everything with a 30% failure rate or higher
> discarded, there's a 92% chance of a bogus failure in at least one of the
> remaining tests. So perhaps this could help perception somewhat, but I'd
> hold off on any announcements that everything is better to avoid a "never
> cry wolf" effect. (Never cry no wolf?)

Yeah I wasn't going to shout anything yet :-)
(This bug was as much for my sheriffing sanity as anything else - at least in the short term).

After the bad tests were disabled in bug 775227, the revised figures (inbound since ~Fri) for the hidden Native suites are:

Native:
* mochitest-1: 21 failing, 81 total (26%)
* mochitest-3: 76 failing, 78 total (97%)
* mochitest-8: 15 failing, 82 total (18%)
* robocop    : 39 failing, 81 total (48%)

I still haven't filed the dependent bugs; will do so after the a-team meeting. Will also try to track down where the m3 failure rate jumped up from that in comment 3, and un-hide m8 since it seems much better behaved now :-)

Ed Morley [:emorley]

Assignee

Comment 7

•

12 years ago

Native Android M8 unhidden on all trees listed in comment 3.

Joel Maher ( :jmaher ) (UTC -8)

Comment 8

•

12 years ago

looks like we won't have anything running before too long:)

Ed Morley [:emorley]

Assignee

Updated

•

12 years ago

Depends on: 778952, 778954, 778956, 778958, 778960, 778961, 778962, 778963, 778964, 778965, 778967

Ed Morley [:emorley]

Assignee

Comment 9

•

12 years ago

(In reply to Joel Maher (:jmaher) from comment #8)
> looks like we won't have anything running before too long:)

At least that might mean we get buy-in from platform... :-)

Joking aside, unless something drastic happens, we should only be unhiding them from this point forwards (eg M8 which has already improved enough to unhide).

Ed Morley [:emorley]

Assignee

Comment 10

•

12 years ago

Attached patch: TBPL failure stats

This patch allows TBPL to show failure stats for the current view. Bit hacky but gets the job done.

To use:
1) Check out http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/
2) Apply patch
3) Run index.html from the local filesystem
4) Adjust filters to whichever Android suite(s) you would like included in the stats
5) Stats displayed top right of the UI, where the unstarred count normally is.

Since this method uses the server side components from prod, the data imports and hidden builders will all match prod TBPL :-)

It also changes the default TBPL refresh rate from 120 secs to 99999, to avoid the extreme janking you get when you are trying to look at the last several days of runs and it does a "Loading...".

Assignee: nobody → bmo

Status: NEW → ASSIGNED

Ed Morley [:emorley]

Assignee

Updated

•

12 years ago

Depends on: 771626

Ed Morley [:emorley]

Assignee

Updated

•

12 years ago

Depends on: 779871

Ed Morley [:emorley]

Assignee

Comment 11

•

12 years ago

Attached image: Now that's what I'm talking about! :-D

Ed Morley [:emorley]

Assignee

Comment 12

•

12 years ago

Green glorious green...! [1]
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=cd626d738af7&jobname=Android

[1] http://www.youtube.com/watch?v=hEQDllvuy1I

Ed Morley [:emorley]

Assignee

Comment 13

•

12 years ago

The last hidden Android {native,XUL} suite was unhidden in bug 778954 -> closing this out :-)

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

10 years ago

Product: Webtools → Tree Management

Nobody; OK to take it and work on it

Updated

•

9 years ago

Product: Tree Management → Tree Management Graveyard
