Closed Bug 1403283 (opened 7 years ago, closed 7 years ago)

Autophone - re-enable Unit Tests

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: bc, Assigned: bc)

References

(Depends on 3 open bugs)

Details

(Keywords: meta)

Attachments

(9 files)

unittest-failures.txt 7 years ago Bob Clary [:bc] (inactive) 180.45 KB, text/plain
bug-1403283-temp-assign-try.patch 7 years ago Bob Clary [:bc] (inactive) 35.73 KB, patch; jmaher: review+
bug-1403283-01-full-coverage-v1.patch 7 years ago Bob Clary [:bc] (inactive) 102.09 KB, patch; jmaher: review+, snorp: feedback+
bug-1403283-manifests-full.txt 7 years ago Bob Clary [:bc] (inactive) 107.00 KB, text/plain
bug-1403283-02-disable-beta-release-v1.patch 7 years ago Bob Clary [:bc] (inactive) 36.73 KB, patch; jmaher: review+
bug-1403283-03-spares-v1.patch 7 years ago Bob Clary [:bc] (inactive) 1.41 KB, patch; jmaher: review+
bug-1403283-04-disable-new-v1.patch 7 years ago Bob Clary [:bc] (inactive) 15.68 KB, patch; jmaher: review+
bug-1403283-05-disable-Mdm1-v1.patch 7 years ago Bob Clary [:bc] (inactive) 3.24 KB, patch; jmaher: review+
Screenshot-2017-10-13 Autophone results.png 7 years ago Bob Clary [:bc] (inactive) 186.11 KB, image/png

Bob Clary [:bc] (inactive)

Assignee

Description

7 years ago

Attached file unittest-failures.txt

Now that a number of the crashes are fixed, it is time to revisit the Unit Tests. I ran two sets of tests:

1. Using the default try device assignments in production across the 3 servers autophone-{1,2,3}: <https://treeherder.mozilla.org/#/jobs?repo=try&revision=4d48cb6b54df52b6d083921a58e61aacdc56fc66&filter-tier=1&filter-tier=2&filter-tier=3&group_state=expanded>
2. Using autophone-4 and the Nexus 5 and Pixel devices attached there: <https://treeherder.allizom.org/#/jobs?repo=mozilla-central&revision=e6b3498a39b94616ba36798fe0b71a3090b1b14c&filter-searchStr=autophone&group_state=expanded>

#2 is much worse than #1. This may be because #2 runs all of the tests on a single host, while #1 spreads them across 3 different hosts. It could also be a function of the different devices being used.

Apart from timeouts, crashes are pretty common. We already have a few bugs filed, which I'll set to block this bug.

17 application crashed [@ libc.so + 0x2173c]
16 application crashed [@ libc.so + 0x48484]
1 application crashed [@ libc.so + 0x490a4]
1 application crashed [@ libart.so + 0x313064]
1 application crashed [@ boot.oat + 0x56d2cc]

Another issue appears to be related to a browser hang, where we end up exceeding 330 seconds and then have to try SIGABRT to kill the browser:

org.mozilla.fennec_aurora still alive after SIGABRT: waiting...

Then we have "Timed out while waiting for websocket/process bridge startup." However, this appears to be common to the allizom and try runs, with the same number of errors.

Then we have the various test timeouts.

If bwu's and snorp's teams can fix the [@ libc.so + 0x2173c] and [@ libc.so + 0x48484] crashes, that will help.

I will:

* investigate whether multiple unit tests running on a host cause server startup issues, and whether I can improve the situation.
* enable the devices which were previously used for production unit tests to run unit tests on try, so that we get results for the devices which will actually run in production.
Suggestions welcome.
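A minimal sketch of the hang handling described above: wait for a process up to a time budget, then send SIGABRT, and escalate to SIGKILL if the process is still alive after a grace period. This runs against a local process using Python's `subprocess` module rather than against the browser on a device over adb, and the function name `stop_hung_process`, the grace period, and the shortened timeouts are illustrative assumptions, not Autophone's actual code.

```python
import signal
import subprocess
import sys


def stop_hung_process(proc, timeout, grace=5.0):
    """Wait up to `timeout` seconds for `proc` to exit on its own.

    If it is still running, send SIGABRT; if it survives `grace` more
    seconds, send SIGKILL. Returns (returncode, signal_used_or_None).
    On POSIX a negative returncode means "killed by that signal number".
    """
    try:
        return proc.wait(timeout=timeout), None
    except subprocess.TimeoutExpired:
        pass

    # The process looks hung: ask it to abort (this may also leave a
    # crash dump, which helps when diagnosing the hang).
    proc.send_signal(signal.SIGABRT)
    try:
        return proc.wait(timeout=grace), "SIGABRT"
    except subprocess.TimeoutExpired:
        # Analogue of "still alive after SIGABRT: waiting...": escalate.
        proc.kill()  # sends SIGKILL on POSIX
        return proc.wait(), "SIGKILL"


if __name__ == "__main__":
    # Stand-in for a hung browser: a child that sleeps far longer than the
    # (hypothetical, shortened) 1-second budget used here instead of 330 s.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    code, used = stop_hung_process(child, timeout=1.0, grace=2.0)
    print(code, used)  # a negative return code and the signal that ended it
```

Sending SIGABRT first rather than SIGKILL keeps the chance of getting a crash report out of the hung process; SIGKILL is only the fallback.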

Bob Clary [:bc] (inactive)

Assignee

Comment 1

7 years ago

Attached patch bug-1403283-temp-assign-try.patch

fyi, I handle autophone-4 separately by applying local patches. This patch will allow me to run our normal unittests on each of the idle devices. This will also increase the number of unittests running simultaneously on autophone-{1..3} and allow me to judge the interference between multiple instances of the test runners and local web servers started by the unit tests.

Attachment #8912401 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Updated

7 years ago

Attachment #8912401 - Flags: review?(jmaher) → review+

Bob Clary [:bc] (inactive)

Assignee

Comment 2

7 years ago

https://github.com/mozilla/autophone/commit/16194e6751875e92a15995b1adbb361a9acc2288

Bob Clary [:bc] (inactive)

Assignee

Comment 3

7 years ago

Seems autophone had stalled consuming pulse messages sometime after I submitted my try job. Now that I've applied the patch and restarted the servers, additional jobs for the new devices have been created.

Bob Clary [:bc] (inactive)

Assignee

Comment 4

7 years ago

gbrown is fixing an issue with killing orphaned xpcshell and ssltunnel processes. The try runs

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6bc6ceb4adce778f89a5e8d03de202a79a4c8f86
https://treeherder.allizom.org/#/jobs?repo=try&revision=6bc6ceb4adce778f89a5e8d03de202a79a4c8f86

both show marked improvement. The remaining issues in the unit tests appear very solvable, and we should make good progress once bug 1403501 lands.

Bob Clary [:bc] (inactive)

Assignee

Updated

7 years ago

Depends on: 1408241

Bob Clary [:bc] (inactive)

Assignee

Comment 5

7 years ago

Attached patch bug-1403283-01-full-coverage-v1.patch

This patch achieves full coverage by adding whatever devices are necessary. It also includes 10 spare devices of each type on top of the requirements for full coverage, which gives us the ability to recover as devices wear out and have to be recycled. If we don't wish to add this many devices, we can decide which tests are required for each device type and cut down on the coverage.

device type   #devices required
nexus 4       20
nexus 5       10
nexus 6p      11
pixel          4

Follow-up patches will trim the manifests to match our current inventory until the new devices are available.

Attachment #8918294 - Flags: review?(jmaher)

Attachment #8918294 - Flags: feedback?(snorp)

Bob Clary [:bc] (inactive)

Assignee

Comment 6

7 years ago

Attached file bug-1403283-manifests-full.txt

device/test/repo assignments.

Bob Clary [:bc] (inactive)

Assignee

Comment 7

7 years ago

Attached patch bug-1403283-02-disable-beta-release-v1.patch

This patch disables the unit tests on beta and release until we can confirm they are green there.

Attachment #8918296 - Flags: review?(jmaher)

Bob Clary [:bc] (inactive)

Assignee

Comment 8

7 years ago

Attached patch bug-1403283-03-spares-v1.patch

This patch disables the spares that are planned.

Attachment #8918297 - Flags: review?(jmaher)

Bob Clary [:bc] (inactive)

Assignee

Comment 9

7 years ago

Attached patch bug-1403283-04-disable-new-v1.patch

This patch disables the new devices until they have been acquired.

Attachment #8918298 - Flags: review?(jmaher)

Bob Clary [:bc] (inactive)

Assignee

Comment 10

7 years ago

Attached patch bug-1403283-05-disable-Mdm1-v1.patch

This patch disables Mdm1 until bug 1408241 is fixed or until we disable the failing test.

Attachment #8918299 - Flags: review?(jmaher)

Bob Clary [:bc] (inactive)

Assignee

Comment 11

7 years ago

I forgot to call out that the initial patch also renames the devices so that we no longer have single-digit names: nexus-5-1 is now nexus-5-01. This prevents some confusion when matching device names in scripts, and it also displays the devices in a more logical sorted order. It does require that I update the MySQL database for phonedash to change the names when we deploy the patches.
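A minimal Python sketch of why the zero-padding helps: with single-digit names, a plain string sort puts nexus-5-10 between nexus-5-1 and nexus-5-2, and a prefix match on nexus-5-1 also matches nexus-5-10. The regular expression assumes device names of the form `<model>-<number>`, and `pad_device_name` and `rename_map` are hypothetical helpers, not Autophone or phonedash code. An old-to-new mapping like the one `rename_map` builds is the kind of list a database rename would need, but the phonedash schema is not shown here.

```python
import re

# Assumed naming pattern: "<model>-<number>", e.g. "nexus-5-1" or "nexus-6p-3".
_DEVICE_RE = re.compile(r"^(?P<model>.+)-(?P<num>\d+)$")


def pad_device_name(name, width=2):
    """Return `name` with its trailing device number zero-padded to `width`."""
    m = _DEVICE_RE.match(name)
    if m is None:
        return name  # names without a trailing "-<number>" are left untouched
    return "{}-{}".format(m.group("model"), m.group("num").zfill(width))


def rename_map(names, width=2):
    """Map each old name to its padded form, skipping names that do not change."""
    mapping = {}
    for old in names:
        new = pad_device_name(old, width)
        if new != old:
            mapping[old] = new
    return mapping


if __name__ == "__main__":
    old = ["nexus-5-10", "nexus-5-2", "nexus-5-1"]
    print(sorted(old))                              # ['nexus-5-1', 'nexus-5-10', 'nexus-5-2']
    print(sorted(pad_device_name(n) for n in old))  # ['nexus-5-01', 'nexus-5-02', 'nexus-5-10']
    # Prefix matching is no longer ambiguous once names are padded.
    print("nexus-5-10".startswith("nexus-5-1"))     # True
    print("nexus-5-10".startswith("nexus-5-01"))    # False
```

A width of 2 is enough as long as no device type has more than 99 units; beyond that the same padding argument would need a wider field.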

Joel Maher ( :jmaher ) (UTC -8)

Comment 12

7 years ago

Comment on attachment 8918294
bug-1403283-01-full-coverage-v1.patch

Review of attachment 8918294:
-----------------------------------------------------------------

Many changes here; some rubber stamping, much real review.

Attachment #8918294 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 13

7 years ago

Comment on attachment 8918296
bug-1403283-02-disable-beta-release-v1.patch

Review of attachment 8918296:
-----------------------------------------------------------------

Do we get much value out of mozilla-release? The volume is low there, and do bugs get filed? Possibly we can reduce some complexity by ignoring that.

Attachment #8918296 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Updated

7 years ago

Attachment #8918297 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 14

7 years ago

Comment on attachment 8918298
bug-1403283-04-disable-new-v1.patch

Review of attachment 8918298:
-----------------------------------------------------------------

Could we make this simpler, so that we don't have to comment out each line? Maybe change the filename or something, or use a [default] section at the top.
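A minimal sketch of the [default]-section idea, assuming an INI-style manifest read with Python's `configparser`. The section names and the `enabled` key are hypothetical and do not reflect Autophone's actual manifest format. `configparser` names its default section `DEFAULT` unless `default_section` is passed; here it is set to `default` to match the suggestion. Every device section inherits `enabled = false`, so a new device stays off until one line turns it on.

```python
import configparser

# Hypothetical manifest: sections inherit `enabled = false` from [default].
MANIFEST = """
[default]
enabled = false

[nexus-5-01]
enabled = true

[nexus-5-02]
enabled = true

[pixel-05]
"""


def enabled_devices(text):
    """Return the section names whose (possibly inherited) `enabled` is true."""
    parser = configparser.ConfigParser(default_section="default")
    parser.read_string(text)
    # sections() excludes the default section and keeps file order.
    return [name for name in parser.sections()
            if parser.getboolean(name, "enabled")]


if __name__ == "__main__":
    print(enabled_devices(MANIFEST))  # ['nexus-5-01', 'nexus-5-02']
```

With this layout, disabling or enabling every not-yet-acquired device is a one-line change in the default section instead of commenting out each entry.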

Joel Maher ( :jmaher ) (UTC -8)

Updated

7 years ago

Attachment #8918298 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Comment 15

7 years ago

Comment on attachment 8918299
bug-1403283-05-disable-Mdm1-v1.patch

Review of attachment 8918299:
-----------------------------------------------------------------

looks good

Attachment #8918299 - Flags: review?(jmaher) → review+

Bob Clary [:bc] (inactive)

Assignee

Comment 16

7 years ago

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #13)
> do we get much value out of mozilla-release? the volume is low there and do
> bugs get filed? Possibly we can reduce some complexity by ignoring that.

Not much value. I don't think we have ever filed a bug from release, but I do think it is important that we test what is shipping "just in case".

(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #14)
> could we make it simpler to not have to comment out each line, maybe change
> the filename or something? or use a [default] section at the top

I suppose we could do something better. For me, the autophone-4 data was just a place to hang my hat (devices, tests) while I calculated how to distribute the devices. A quick find-and-replace takes just seconds, while coming up with an alternative approach would take longer to design, code and test. With our plans to revamp how Autophone handles devices, tests and repos, I really don't want to invest any time in improving something that has a limited lifetime anyway.

Joel Maher ( :jmaher ) (UTC -8)

Comment 17

7 years ago

As a note, we don't do anything with the mozilla-release data in Perfherder; maybe we only run some unittests there?

Bob Clary [:bc] (inactive)

Assignee

Comment 18

7 years ago

Attached image Screenshot-2017-10-13 Autophone results.png

Let's let snorp decide whether he finds any value in release. We won't be enabling unit tests there for a while anyway. Attaching a screenshot of release phonedash since June 1 for reference.

Flags: needinfo?(snorp)

Bob Clary [:bc] (inactive)

Assignee

Comment 19

7 years ago

https://github.com/mozilla/autophone/commit/611d574024d073090c2744fbdd83cba72d9fdc96
https://github.com/mozilla/autophone/commit/4ec828fdd47674f9ce4986e5b4da6cecebc170d2
https://github.com/mozilla/autophone/commit/ad3570430ab9714fc9864d71c777a62534cbf531
https://github.com/mozilla/autophone/commit/ad191462945ae0430044b7fe170eabda91429d08
https://github.com/mozilla/autophone/commit/eac051edea05e3aa7524a53825618435947b205b

Deploying now.

Blocks: autophone-deployments

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Bob Clary [:bc] (inactive)

Assignee

Updated

7 years ago

Depends on: 1409016

Bob Clary [:bc] (inactive)

Assignee

Updated

7 years ago

Depends on: 1409073

Bob Clary [:bc] (inactive)

Assignee

Updated

7 years ago

Depends on: 1409203

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Updated

7 years ago

Attachment #8918294 - Flags: feedback?(snorp) → feedback+

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Updated

7 years ago

Flags: needinfo?(snorp)

BMO Automation

Updated

3 years ago

Product: Testing → Testing Graveyard
