Closed
Bug 855406
Opened 13 years ago
Closed 12 years ago
Un-hide Android 4.0 panda robocop tests when the failure rate isn't unacceptably high
Categories
(Tree Management Graveyard :: Visibility Requests, defect)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 865557
People
(Reporter: RyanVM, Unassigned)
These tests are timing out about as often as they are running to completion these days. That is unacceptable for visible-by-default tests. I have hidden them on mozilla-central-based branches until the failure rate improves enough for them to be shown by default again.
Comment 1•13 years ago
(Moving to TBPL's component, since that's where the unhiding takes place and it's the component the sheriffs watch. Fixing the failure rate to get us to the point where we can unhide will be covered by dependent bugs.)
Component: General → Tinderboxpushlog
Product: Testing → Webtools
Version: unspecified → Trunk
Reporter
Comment 2•13 years ago
With bug 814282 fixed, the only oranges I'm seeing in these tests are the regular timeouts. However, the timeout rate is still pretty high, so I'm going to leave them hidden for now.
Comment 3•13 years ago
Panda rc failures continue, with the most common cause being "2400 seconds without output". As far as I can tell, these are genuine failures: the buildbot slave sees no output from the device for 40 minutes. Robocop tests are affected much more frequently than M2-M8 or any of the reftest or talos tests.
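For reference, the "2400 seconds without output" limit is buildbot's no-output watchdog: if the harness produces no log output for that long, the run is killed and reported as a timeout. A minimal Python sketch of that behavior (illustrative only, not the actual buildbot implementation):

import subprocess
import sys
import threading

NO_OUTPUT_TIMEOUT = 2400  # seconds, matching the "2400 seconds without output" limit

def run_with_output_timeout(cmd, timeout=NO_OUTPUT_TIMEOUT):
    """Run cmd; kill it if it produces no output for `timeout` seconds."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    timer = threading.Timer(timeout, proc.kill)
    timer.start()
    try:
        for line in proc.stdout:
            sys.stdout.write(line)
            # Each line of output resets the watchdog.
            timer.cancel()
            timer = threading.Timer(timeout, proc.kill)
            timer.start()
    finally:
        timer.cancel()
    return proc.wait()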
I used Try pushes to verify that only certain rc tests are implicated, but there are 7 such tests, and they seem unrelated:
- testAxisLocking
- testAboutPage
- testWebContentContextMenu
- testFormHistory
- testDoorHanger
- testClearPrivateData
- testSystemPages
https://tbpl.mozilla.org/?tree=Try&rev=1670b69a0d31
For most (all?) of these timeouts, when reported as oranges, the logcat is not retrieved and I see no clues in the logs as to the root cause.
However, there are also less-frequent retries ("blues") which often do have logcats, and these often have evidence of ANRs leading up to system watchdog-initiated reboots. For example:
https://tbpl.mozilla.org/php/getParsedLog.php?id=21664782&tree=Mozilla-Inbound&full=1#error0
Notice particularly:
04-10 16:20:45.078 I/Process ( 1402): Sending signal. PID: 1402 SIG: 3
04-10 16:20:45.078 I/dalvikvm( 1402): threadid=3: reacting to signal 3
04-10 16:20:45.093 I/dalvikvm( 1402): Wrote stack traces to '/data/anr/traces.txt'
04-10 16:21:15.210 I/dalvikvm( 1402): threadid=3: reacting to signal 3
04-10 16:21:15.210 I/Process ( 1402): Sending signal. PID: 1402 SIG: 3
04-10 16:21:15.226 I/dalvikvm( 1402): Wrote stack traces to '/data/anr/traces.txt'
04-10 16:21:15.242 I/Process ( 1402): Sending signal. PID: 1543 SIG: 3
04-10 16:21:15.242 I/dalvikvm( 1543): threadid=3: reacting to signal 3
04-10 16:21:15.242 I/dalvikvm( 1543): Wrote stack traces to '/data/anr/traces.txt'
04-10 16:21:17.343 I/Watchdog_N( 1402): dumpKernelStacks
04-10 16:21:17.476 I/dalvikvm-heap( 1402): Grow heap (frag case) to 9.190MB for 187400-byte allocation
04-10 16:21:17.585 I/dalvikvm-heap( 1402): Grow heap (frag case) to 9.160MB for 140848-byte allocation
04-10 16:21:19.421 I/Process ( 1402): Sending signal. PID: 1402 SIG: 9
04-10 16:21:19.421 W/Watchdog( 1402): *** WATCHDOG KILLING SYSTEM PROCESS: com.android.server.am.ActivityManagerService
04-10 16:21:19.476 I/ServiceManager( 1285): service 'accessibility' died
04-10 16:21:19.476 I/ServiceManager( 1285): service 'input_method' died
04-10 16:21:19.476 I/ServiceManager( 1285): service 'notification' died
...
But notice that this happened more than a minute after the test completed and Fennec was force-stopped.
Can we retrieve some ANR traces from a set of pandas?
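One illustrative way to do that, assuming direct adb access to the boards (the device serials below are hypothetical, and the production pandas are driven through the test harness rather than raw adb, so treat this as a sketch):

import subprocess

PANDAS = ["panda-0123", "panda-0456"]  # hypothetical serials

for serial in PANDAS:
    dest = "traces-%s.txt" % serial
    result = subprocess.run(
        ["adb", "-s", serial, "pull", "/data/anr/traces.txt", dest],
        capture_output=True, text=True)
    if result.returncode == 0:
        print("%s: saved ANR traces to %s" % (serial, dest))
    else:
        print("%s: pull failed: %s" % (serial, result.stderr.strip()))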
Comment 4•13 years ago
:jchen -- Can you have a look at https://tbpl.mozilla.org/php/getParsedLog.php?id=21664782&tree=Mozilla-Inbound&full=1#error0 with an eye to the ANR messages? Now I am seeing many ANRs here, even though the test appears to be progressing normally.
Flags: needinfo?(nchen)
Comment 5•13 years ago
(In reply to Geoff Brown [:gbrown] from comment #4)
> :jchen -- Can you have a look at
> https://tbpl.mozilla.org/php/getParsedLog.php?id=21664782&tree=Mozilla-Inbound&full=1#error0
> with an eye to the ANR messages? Now I am seeing many ANRs here, even
> though the test appears to be progressing normally.
They don't appear to be normal ANRs -- we always log a "processing Gecko ANR" message when we get normal ANRs in the ANR reporter (which should be enabled in the tbpl nightlies).
I just pushed a try (https://tbpl.mozilla.org/?tree=Try&rev=893fd4e59bbe) to enable more logging in the ANR reporter. If we still don't see more logs from the ANR reporter, it means these are not Android-generated ANRs. It could be that some code somewhere is simply trying to generate a stack dump and is simulating an ANR.
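Until then, a rough way to triage a full log is to count SIG 3 stack-dump requests (as in the logcat excerpt in comment 3) against "processing Gecko ANR" messages from the ANR reporter. A sketch (hypothetical helper, not part of the harness):

import re
import sys

def classify_anrs(log_path):
    """Count stack-dump signals vs. ANR-reporter activity in a log."""
    sig3 = anr_reporter = 0
    with open(log_path, errors="replace") as f:
        for line in f:
            if re.search(r"Sending signal\. PID: \d+ SIG: 3", line):
                sig3 += 1
            if "processing Gecko ANR" in line:
                anr_reporter += 1
    print("SIG 3 stack-dump requests:", sig3)
    print("ANR reporter activations:", anr_reporter)
    if sig3 and not anr_reporter:
        print("Stack dumps without the ANR reporter firing -- "
              "consistent with simulated ANRs.")

if __name__ == "__main__":
    classify_anrs(sys.argv[1])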
Flags: needinfo?(nchen)
Comment 6•13 years ago
I thought bug 852467 might be a factor; a try run with the cache disabled is not very conclusive: https://tbpl.mozilla.org/?tree=Try&rev=a76c26d572d2. Is that a moderate improvement, or luck?
Comment 7
It looks like a minor improvement, but it could just be luck; it is really hard to tell.
Comment 8•12 years ago
These have been unhidden for a while, via bug 865557.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Assignee
Updated•11 years ago
Product: Webtools → Tree Management
Updated•11 years ago
Component: TBPL → Visibility Requests
Updated•7 years ago
Product: Tree Management → Tree Management Graveyard