Closed Bug 1263072 (autophone-tier2) Opened 4 years ago Closed 4 years ago

Autophone - make Autophone a Tier 2 job on Treeherder

Categories

(Testing :: Autophone, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bc, Assigned: bc)

Details

(Keywords: meta)

Attachments

(3 files)

Autophone is currently a Tier 3 Job on Treeherder. This means that it is hidden by default and is not regularly viewed by Sheriffs or Developers. In order for Autophone to deliver maximum value, it must be more visible and accessible to Sheriffs and Developers.

It is impossible for Autophone to become a Tier 1 Job because of the limited number of devices available for testing, but Autophone can meet the requirements for a Tier 2 job as listed in: https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#Requirements_for_jobs_shown_in_the_default_Treeherder_view

Autophone can also meet a limited or reduced set of Tier 1 requirements as listed in: https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#Additional_requirements_for_Tier_1_jobs such as

* Runs on mozilla-central, mozilla-inbound and fx-team
* Scheduled on every push for supported repositories
* Must avoid patterns known to cause non-deterministic failures
* Low intermittent failure rate
* Easily run on try server
* Easy for a dev to run locally

Since Autophone's source is hosted in https://github.com/mozilla/autophone/ and is not updated on the servers automatically, it is not possible to support the disabling of individual tests by Sheriffs.

An additional requirement for Autophone's Tier 2 status is reducing the need for manual intervention to fix network and USB connectivity issues.

This bug is a tracking bug for this project.
Depends on: 1268571, 1260824, 1126448
Depends on: 1277921
Depends on: 1278077
Depends on: 1278536
Created https://wiki.mozilla.org/EngineeringProductivity/Autophone as the new home for Autophone on the wiki.

Redirected the old Auto-tools Autophone pages to the new page.

Added a link to the above from https://developer.mozilla.org/en-US/docs/Mozilla/QA/Automated_testing

Note: I just did one entry under functional testing and didn't add anything to the Talos centric performance section.
I'm ok with moving to tier-2 also, with bc's mail to the sheriffs.

Wes, any objections/concerns before we move this to tier 2?
Flags: needinfo?(wkocher)
bc: can you file a treeherder:visibility request bug and ask camd if he can make the change? Thanks!
Great, can do! I'm not quite ready to pull the switch yet since I still have to do some work to green up the unit tests plus fix some minor reporting issues, but will do soonest.
Blocks: 1280160
Alias: autophone-tier2
Depends on: 1281511
submit tier 2 to Treeherder.
Attachment #8765999 - Flags: review?(jmaher)
Disables perma orange unit tests on mozilla-central. They are still available on try.

I tried and tried to get a subset of tests which would run perma-green with rare oranges, but the tests are so flaky that it was impossible and I've given up on finding a green partition. The approach I think will work is to fix each and every test to be better behaved and then disable the ones that are perma or semi perma orange.

dminor: I hope you are ok with disabling Mw on mozilla-central.
Attachment #8766000 - Flags: review?(jmaher)
Attachment #8766000 - Flags: feedback?(dminor)
Comment on attachment 8765999 [details] [diff] [review]
bug-1263072-submit-tier2-v1.patch

Review of attachment 8765999 [details] [diff] [review]:
-----------------------------------------------------------------

oh, this is wonderful!
Attachment #8765999 - Flags: review?(jmaher) → review+
Comment on attachment 8766000 [details] [diff] [review]
bug-1263072-disable-perma-orange-v1.patch

Review of attachment 8766000 [details] [diff] [review]:
-----------------------------------------------------------------

do we have bugs on file to get these jobs enabled again?  I know of a few, just want to ensure we do for all of them!
Attachment #8766000 - Flags: review?(jmaher) → review+
Not yet. There are literally so many, and their behavior is so random, that I had a hard time figuring out how to proceed, especially with regard to which versions of Android the timeout/crash/failure occurred on, since it would change depending on the phase of the moon.

One thing that is problematic is that timeouts/crashes hide timeouts/crashes which hide timeouts/crashes ad infinitum. I ended up trying to bundle them all into related patches but didn't file individual bugs on them.

The approach I would like to take going forward is to have tracking bugs for the test suites and deal with each one at a time and file/fix bugs on each orange for the suite until it is reasonably green. I'll make sure I have tracking bugs for each one I disable.
deployed 2016-06-28 11:34
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Comment on attachment 8766000 [details] [diff] [review]
bug-1263072-disable-perma-orange-v1.patch

Review of attachment 8766000 [details] [diff] [review]:
-----------------------------------------------------------------

Can we agree upon what an acceptable orange rate would be before they are turned on again?

Looking at the Mw results on central for June, other than test_getUserMedia_playAudioTwice.html permafailing on the android-6 phones, things look more green than orange to me; I would not call Mw perma orange. These tests are also intermittent on the emulators, where they run as tier-1. Are the orange rates substantially worse on autophone than on the emulators? Are things worse on try than on central?
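
To make the "orange rate" question concrete, here is a minimal sketch of how such a rate could be computed and compared against an agreed threshold. It is illustrative only: the function names, the result labels ("green", "orange"), the choice to ignore any other result, and the sample data are all assumptions, not something taken from Autophone or Treeherder.

```python
from collections import Counter


def orange_rate(results):
    """Fraction of green-or-orange jobs that were orange.

    `results` is a list of result labels. Only "green" and "orange" are
    counted (an assumption); any other label is ignored.
    """
    counts = Counter(results)
    completed = counts["green"] + counts["orange"]
    if completed == 0:
        return 0.0
    return counts["orange"] / completed


def within_threshold(results, threshold):
    """True if the orange rate is at or below the agreed threshold."""
    return orange_rate(results) <= threshold


# Hypothetical sample data, purely for illustration.
sample = {
    "autophone": ["green", "orange", "green", "green"],
    "emulator": ["green", "green", "green", "orange", "green"],
}

rates = {platform: orange_rate(r) for platform, r in sample.items()}
```

With a threshold agreed in advance, `within_threshold(sample["autophone"], 0.25)` would answer whether a suite qualifies for re-enabling; the threshold value itself is the open question here.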

I wish you had let me know earlier that the Mw tests were at risk of being disabled; I would have prioritized working on getting them green(er). It looks like I need to investigate test_peerConnection_simulcastOffer.html, which failed in several jobs. It is also intermittent on the emulators but might be worse here. And I need to have a look at the android-6 test failure; possibly it is masking other failures. That said, we don't need to run on android-6 phones right away.

Is there anything else you noticed that I should have a look at?
Attachment #8766000 - Flags: feedback?(dminor)
dminor: Sorry. The only perma orange we see in production is on the nexus-6p devices. When attempting to find the set of tests which would pass, I kept running into issues with timeouts or crashes on practically everything. We can re-enable Mw on everything except nexus-6p if you like. I'll work up a patch.
Attachment #8766031 - Flags: review?(dminor)
bc: If just disabling nexus-6p is sufficient to keep Mw running on central, that would be great, please do so. I'll be happy to look at the nexus-6p specific failures on Try. Thanks!
Comment on attachment 8766031 [details] [diff] [review]
bug-1263072-Mw.patch

Review of attachment 8766031 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you!
Attachment #8766031 - Flags: review?(dminor) → review+
I filed Bug 1282897 to track getting things green on the nexus-6p devices.