Closed Bug 917361 Opened 11 years ago Closed 6 years ago

Meet tbpl's visibility requirements for Androidx86

Categories

(Testing :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: armenzg, Assigned: gbrown)

https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#Requirements_for_being_shown_in_the_default_TBPL_view

1) Has an active owner
Mobile team

2) Breakage is expected to be followed by tree closure or backout
Yes

3) Runs on mozilla-central and all trees that merge into it
Yes

4) Scheduled on every push
Yes

5) Easily run on try server
Bug 915870

6) Outputs failures in a TBPL-starrable format
Yes

7) Low intermittent failure rate
Yes

8) Must avoid patterns known to cause non deterministic failures
Bug 916923 - I think

9) Supports the disabling of individual tests
Yes

10) Has sufficient documentation
gbrown?

11) Easy for a dev to run locally
Bug 917324
Depends on: 915870
Depends on: 916923
Depends on: 917324
Depends on: 917558
Depends on: 920627
I want to try to get things visible before Monday (end of the quarter).

If we get bug 920627 and bug 915870 done before Monday, would you mind if we un-hide them?

I promise to finish bug 917324 soon after.

I can also promise that I will try to figure things out on bug 917558 but I don't know how long it will take.
We're decidedly unlikely to get bug 917562 fixed by Monday, so fortunately we don't need to decide whether or not to do the wrong thing purely to meet a completely arbitrary and pointless calendar date.
(In reply to Phil Ringnalda (:philor) from comment #2)
> We're decidedly unlikely to get bug 917562 fixed by Monday,
>
That wasn't even in the list of dependencies! Any more we should get our eyes on?
Could you please add whatever you think is needed to meet the requirements?

> so fortunately we don't need to decide whether or not to do the wrong
> thing purely to meet a completely arbitrary and pointless calendar date.
>
I don't like it either, but sometimes I am asked to request it or to try to make it happen.
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #3)
> That wasn't even in the list of dependencies! Any more we should get our eyes
> on?

I would guess it's via:
https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#7.29_Low_intermittent_failure_rate
Not so much via low failure rate as via the unwritten visibility rule, "Don't run on a bogus platform." (That said, the combined rate is pretty high for the two CSS crashes I filed, the two or more SVG crashes I filed (I don't remember exactly), several more SVG crashes I haven't filed, the Graphics crashes I have and haven't filed, and a few more in the weeds that I haven't filed. Crashing, and thus needing a retrigger to get a full test run, is also pretty bad behavior.) Absolutely any new patch that lands might trigger a new bogus crash, intermittent or 100% permanent, only detectable as being bogus by froydnj looking at the disassembly.

I didn't have bug 917562 blocking this because until yesterday it was just one crash among many; now that it's one crash among many that probably are or maybe aren't all from bogus emulator behavior, the thing that should be blocking this is either an unfiled bug to install a version of Qemu that doesn't yet exist, or an unfiled bug to build our own patched Qemu. And I didn't file either one since I'm the wrong one to be deciding which.
(In reply to Ed Morley [:edmorley UTC+1] from comment #4)
> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4)
> from comment #3)
> > That wasn't even in the list of dependencies! Any more we should get our eyes
> > on?
> 
> I would guess it's via:
> https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#7.29_Low_intermittent_failure_rate

I meant the list of dependencies in this bug. Can we please add whatever you guys deem necessary to fix before being visible?

On another note, is there a way that we can get help with measuring this?
>  Therefore as a rough guide a new platform/testsuite must have at most a 5% per job failure rate initially, and ideally <1% longer term.
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #6)
> I meant the list of dependencies in this bug. Can we please add whatever you
> guys deem necessary to fix before being visible?

"Tests are generally flaky" isn't really something we can file/attach? It doesn't seem like things are even close to being green enough that we need to point out what's missing? (Don't mean that harshly, just a bit confused as to the ask here).

(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #6)
> On another note, is there a way that we can get help with measuring this?
> >  Therefore as a rough guide a new platform/testsuite must have at most a 5% per job failure rate initially, and ideally <1% longer term.

Either by filtering (e.g. https://tbpl.mozilla.org/?tree=Cedar&jobname=Android%204.2%20x86) and counting by eye, or else jmaher has a useful JS scratchpad script that he uses for Android suite failure stats - ask him? :-)
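
For anyone tallying results by hand, the arithmetic behind the 5% / <1% guideline quoted above can be sketched roughly as follows. This is only an illustration, not jmaher's script: the result labels (`success`, `testfailed`, `busted`) and the sample tallies are made-up placeholders, and the code does not query TBPL.

```python
from collections import Counter

# Thresholds taken from the policy excerpt quoted above.
INITIAL_MAX_FAILURE_RATE = 0.05    # "at most a 5% per job failure rate initially"
LONG_TERM_MAX_FAILURE_RATE = 0.01  # "ideally <1% longer term"


def failure_rate(results):
    """Return the fraction of jobs whose result is anything other than 'success'."""
    if not results:
        raise ValueError("need at least one job result")
    counts = Counter(results)
    failed = len(results) - counts["success"]
    return failed / len(results)


def classify(rate):
    """Map a per-job failure rate onto the rough guideline."""
    if rate < LONG_TERM_MAX_FAILURE_RATE:
        return "meets long-term target"
    if rate <= INITIAL_MAX_FAILURE_RATE:
        return "meets initial threshold"
    return "too flaky"


# Hypothetical hand-tallied results for one suite (placeholder data).
sample = ["success"] * 97 + ["testfailed"] * 2 + ["busted"]
rate = failure_rate(sample)
print(f"{rate:.1%} -> {classify(rate)}")
```

Counting every non-green result as a failure is the simplest choice here; a stricter tally might want to separate infrastructure failures from test failures.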
(In reply to Ed Morley [:edmorley UTC+1] from comment #7)
> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4)
> from comment #6)
> > I meant the list of dependencies in this bug. Can we please add whatever you
> > guys deem necessary to fix before being visible?
> 
> "Tests are generally flaky" isn't really something we can file/attach? It
> doesn't seem like things are even close to being green enough that we need
> to point out what's missing? (Don't mean that harshly, just a bit confused
> as to the ask here).
> 
> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4)
> from comment #6)
> > On another note, is there a way that we can get help with measuring this?
> > >  Therefore as a rough guide a new platform/testsuite must have at most a 5% per job failure rate initially, and ideally <1% longer term.
> 
> Either by filtering (e.g.
> https://tbpl.mozilla.org/?tree=Cedar&jobname=Android%204.2%20x86) and counting
> by eye, or else jmaher has a useful JS scratchpad script that he uses for
> Android suite failure stats - ask him? :-)

FTR, I'm only asking for the green sets not for the sets that are orange on Cedar.

Let me take a step back, what should I fix to show sets 1 & 2 visibly on tbpl?
https://tbpl.mozilla.org/?jobname=Android%204.2%20x86&showall=1

I can have bug 920627 and bug 915870 fixed for Monday.
Would that be good enough as long as I promise to fix bug 917324 soon after?
Depends on: 892688
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #8)
> Let me take a step back, what should I fix to show sets 1 & 2 visibly on
> tbpl?
> https://tbpl.mozilla.org/?jobname=Android%204.2%20x86&showall=1
> 
> I can have bug 920627 and bug 915870 fixed for Monday.
> Would that be good enough as long as I promise to fix bug 917324 soon after?

I must not be explaining the concern with bug 917562 well enough.

"Visible" means "if you cause it to fail, we will back you out."

We know that because of a QEMU bug, any perfectly correct patch could at any time cause every single androidx86 test run to crash on startup. If a perfectly correct patch did so because of that QEMU bug, we would not back out that perfectly correct patch. The only way to tell the difference between a perfectly correct patch which causes us to crash and a completely wrong patch that causes us to crash is to have froydnj look at the disassembly. So, running with the current emulator built on a buggy QEMU, no androidx86 test suite may be made visible.

I don't have a strong opinion about what else you would have to do by Monday, but you would have to build and deploy a new emulator image built off an unreleased QEMU with froydnj's patch. Maybe I'm overestimating the difficulty of doing that, but to me that sounds like the full, complete, more than sufficient explanation of why this quarterly goal will not be met: "nobody realized early enough on that we can and do very easily trigger a crashing bug in QEMU, so the current version of the emulator is not usable."
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #8)
> FTR, I'm only asking for the green sets not for the sets that are orange on
> Cedar.

Ah sorry I thought you meant all of them (and there are many permaorange at the moment).

(In reply to Phil Ringnalda (:philor) from comment #9)
> I must not be explaining the concern with bug 917562 well enough.

Ah I had missed that bug - thank you for the explanation :-)
bug 915870 and bug 920627 have been completed.

For now, I'm not going to help with bug 917324 until I see other bugs clearing up (including the QEMU one).

Status update:
##############
* Our longest pole comes from trying to improve the reliability of the emulator used (bug 892688)
* In particular, bug 917562 is bad because it shows that we have a bad emulator

Assigning to gbrown to help figure this out.
Assignee: nobody → gbrown
Sheriffs: for bug 917324 (make it easy to run for developers), would it be sufficient to make it easy to run on a releng machine, or is it mandatory to run it on the developer's machine?

If the latter is needed, we have some components inside of test-x86.tar.gz that might not be redistributable. We might need to find a place to put them behind LDAP, or ask developers to follow the instructions at https://bugzilla.mozilla.org/show_bug.cgi?id=894507#c19
Flags: needinfo?(emorley)
If they can use Try and we can also give loaners out to people readily and/or give them the non-distributables securely, then I think that should be sufficient? (Given this is a bit of an edge-case)
Flags: needinfo?(emorley)
Depends on: 936226
Depends on: 964589
I'm not finding time for this and x86 testing is not currently a high priority.

We recently opened access to AVDs and documented how to set up a 2.3 emulator locally -- x86 would be very similar.

https://wiki.mozilla.org/Mobile/Fennec/Android#Running_the_Android_2.3_ARM_emulator
Assignee: gbrown → nobody
A limited set of tests now run on Android x86 as tier 1.
Assignee: nobody → gbrown
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME