Closed Bug 869030 Opened 7 years ago Closed 6 years ago

Robocop testDoorHanger and testSystemPages cause future reboots on pandas

Categories: Firefox for Android :: General, defect
Platform: x86, Android
Status: RESOLVED FIXED
Target Milestone: Firefox 24
Reporter: gbrown, Assigned: gbrown
Attachments: 3 files

Consider this try run, which enables testDoorHanger: https://tbpl.mozilla.org/?tree=Try&rev=9055382da417. testDoorHanger runs in rc2. On Tegra, rc2 runs to completion and all tests pass. On Panda, rc2 fails on nearly every run, usually with a "blue" and "Remote Device Error: unable to connect to %s after %s attempts" -- the Panda is rebooting during the rc2 run.

Some logs from that try run:
- https://tbpl.mozilla.org/php/getParsedLog.php?id=22612336&tree=Try&full=1#error0
 - reboot during testFindInPage
- https://tbpl.mozilla.org/php/getParsedLog.php?id=22612334&tree=Try&full=1#error0 and https://tbpl.mozilla.org/php/getParsedLog.php?id=22612332&full=1&branch=try#error0
 - reboot while setting up testInputAwesomeBar (before browser launch)

Most logs show:

W/Watchdog( 1401): *** WATCHDOG KILLING SYSTEM PROCESS:

I think something in testDoorHanger is triggering the reboot, but the reboot is delayed, so it happens during one of the subsequent tests.
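
To see how far a watchdog kill lands from the test suspected of triggering it, the logcat can be scanned for the message quoted above. This is only a minimal sketch: the function name `find_watchdog_kills` and the sample lines are made up for illustration, and the pattern assumes the log format shown in this bug.

```python
import re

# Matches logcat lines such as:
#   W/Watchdog( 1401): *** WATCHDOG KILLING SYSTEM PROCESS:
WATCHDOG_RE = re.compile(
    r"W/Watchdog\(\s*(\d+)\):.*WATCHDOG KILLING SYSTEM PROCESS"
)


def find_watchdog_kills(logcat_lines):
    """Return (line_index, pid) for every watchdog kill message found."""
    hits = []
    for index, line in enumerate(logcat_lines):
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append((index, int(match.group(1))))
    return hits


# Hypothetical sample input; the first line is invented for illustration.
sample = [
    "I/GeckoApp( 2210): hypothetical unrelated line",
    "W/Watchdog( 1401): *** WATCHDOG KILLING SYSTEM PROCESS:",
]
print(find_watchdog_kills(sample))  # [(1, 1401)]
```

The line index can then be compared against the position of each test's start marker in the same log to tell which test was running when the system process was killed.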

Here is a try run that runs testDoorHanger at the end of rc2. Now rc2 is green only slightly more often, and the failures are similar but (almost) always occur during testDoorHanger:

https://tbpl.mozilla.org/?tree=Try&rev=0b71939742e1
Mobile platform team discussed this bug as a "deep dive" item today. Ideas:
 - collect /data/anr/traces.txt
 - try removing geolocation part of test
Geolocation could be the cause here. We could insert the contents of /data/anr/traces.txt into the logcat output; that should not be too difficult.
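
A rough sketch of that idea follows. It assumes `adb` is on the PATH and can reach the device, which may not match how the panda harness actually talks to the board; the helper names and the `ANRTraces` tag are invented for illustration. The file reader is passed in as a parameter so that a harness-specific reader could be swapped in.

```python
import subprocess


def run_adb_cat(path):
    """Read a file from the device via `adb shell cat` (assumes adb on PATH)."""
    result = subprocess.run(
        ["adb", "shell", "cat", path],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def format_anr_traces(traces_text, tag="ANRTraces"):
    """Prefix every line so the traces are easy to spot in the combined log."""
    return [f"I/{tag}: {line}" for line in traces_text.splitlines()]


def dump_anr_traces(read_file=run_adb_cat, path="/data/anr/traces.txt"):
    """Fetch the ANR traces and return them as logcat-style lines."""
    return format_anr_traces(read_file(path))
```

Calling `print("\n".join(dump_anr_traces()))` right after the logcat is printed would place the traces next to the watchdog messages in the same log.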
Reboots persist when geolocation is removed. Here is a try run without geolocation and also without offline storage (just the login doorhanger test remains): https://tbpl.mozilla.org/?tree=Try&rev=825b9868b2c2
Trials arising from Comment 3 led to the discovery that the reboots happen even without testDoorHanger. Compare the baseline: https://tbpl.mozilla.org/?tree=Try&rev=02e2aa264a48 (8 reboots in 18 rc2 runs) against runs with testSystemPages disabled: https://tbpl.mozilla.org/?tree=Try&rev=50aab5a39b5e (no reboots in 18 rc2 runs).
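
As a sanity check on whether 8 of 18 versus 0 of 18 could plausibly be chance, a one-sided Fisher exact test can be computed by hand. This is an illustrative calculation only, not something that was run as part of this bug, and it assumes the runs are independent; the function name is made up.

```python
from math import comb


def fisher_one_sided(fail_a, n_a, fail_b, n_b):
    """Probability that group A receives at least fail_a of all failures,
    assuming both configurations reboot at the same underlying rate."""
    total_fail = fail_a + fail_b
    denom = comb(n_a + n_b, total_fail)
    p = 0.0
    for k in range(fail_a, min(n_a, total_fail) + 1):
        p += comb(n_a, k) * comb(n_b, total_fail - k) / denom
    return p


# Baseline: 8 reboots in 18 runs; testSystemPages disabled: 0 in 18 runs.
p_value = fisher_one_sided(8, 18, 0, 18)
print(f"one-sided p = {p_value:.4f}")  # one-sided p = 0.0014
```

A value this small suggests the difference between the two try runs is unlikely to be noise, although it cannot rule out infrastructure effects that changed between the two pushes.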
Summary: Robocop testDoorHanger causes future reboots on pandas → Robocop testDoorHanger and testSystemPages cause future reboots on pandas
Disable testSystemPages to reduce panda reboots while we investigate further.
Attachment #749283 - Flags: review?(jmaher)
Comment on attachment 749283 [details] [diff] [review]
disable testSystemPages

Review of attachment 749283 [details] [diff] [review]:
-----------------------------------------------------------------

Alright, let's get green and work back towards enabling these tests :)
Attachment #749283 - Flags: review?(jmaher) → review+
Depends on: 872244
With testSystemPages disabled, enabling testDoorHanger still causes fairly frequent reboots:

https://tbpl.mozilla.org/?tree=Try&rev=3ea00c1d7687

With the geolocation part of testDoorHanger disabled, rc2 seems to run much better:

https://tbpl.mozilla.org/?tree=Try&rev=6f84a8bab1ca

I want to test this again in a few days -- see if these results are consistent over time.
Duplicate of this bug: 874848
The same failure pattern is seen here, during a reftest: https://tbpl.mozilla.org/php/getParsedLog.php?id=23488550&tree=Mozilla-Inbound (thanks :philor!)
For the last week or so, I have been unable to reproduce these failures even with testDoorHanger fully enabled.

For example:

https://tbpl.mozilla.org/?tree=Try&rev=3d8489b17222
https://tbpl.mozilla.org/?tree=Try&rev=9d4f8f89d46a

At wit's end, I propose enabling the test again, but I am open to other ideas.
Assignee: nobody → gbrown
Attachment #758605 - Flags: review?(jmaher)
Comment on attachment 758605 [details] [diff] [review]
enable testDoorHanger

Review of attachment 758605 [details] [diff] [review]:
-----------------------------------------------------------------

oh fun stuff.
Attachment #758605 - Flags: review?(jmaher) → review+
enable testDoorHanger: https://hg.mozilla.org/integration/mozilla-inbound/rev/e110fde7785e

I will keep an eye on rc failures on inbound at least for the remainder of the day.
So far so good -- I do not see any signs of increased instability with testDoorHanger enabled.

Try runs with testSystemPages enabled look fine also. Compare:

https://tbpl.mozilla.org/?tree=Try&rev=34b36e6d4369 (testDoorHanger enabled)
https://tbpl.mozilla.org/?tree=Try&rev=2e498167be49 (testDoorHanger + testSystemPages enabled)

I feel like I have been chasing a ghost in this bug.
Attachment #759108 - Flags: review?(jmaher)
Comment on attachment 759108 [details] [diff] [review]
enable testSystemPages

Review of attachment 759108 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for looking into this. I wonder if it was something in Firefox that has since been fixed? I would really like to see more retriggers on panda rc2 to check whether we can keep the failure rate below 7%.
Attachment #759108 - Flags: review?(jmaher) → review+
Final tally on https://tbpl.mozilla.org/?tree=Try&rev=2e498167be49:
43 green
2 orange (both testMasterPassword)
5 blues (appear to be unrelated)
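
For comparison with the 7% figure mentioned in the review, the tally can be turned into rates. This is simple arithmetic on the numbers above; whether the blues should count as failures is a judgment call, so they are reported separately.

```python
# Counts taken from the final tally above.
tally = {"green": 43, "orange": 2, "blue": 5}

total = sum(tally.values())
rates = {color: count / total for color, count in tally.items()}

for color, rate in rates.items():
    print(f"{color}: {rate:.0%}")
# green: 86%
# orange: 4%
# blue: 10%
```

The orange rate of 4% is under 7%; counting the apparently unrelated blues as well would bring the combined rate to 14%.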

https://hg.mozilla.org/integration/mozilla-inbound/rev/9e276a0276b9
Whiteboard: [leave open]
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 24
See Also: → 1098962