Closed
Bug 1098962
Opened 10 years ago
Closed 9 years ago
Android 4.0 robocop-1 has an unacceptable retry rate
Categories
(Firefox for Android Graveyard :: Testing, defect)
Firefox for Android Graveyard
Testing
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Assigned: gbrown)
References
Details
https://treeherder.mozilla.org/ui/#/jobs?repo=mozilla-central&revision=7f0d92595432&searchQuery=Android%204.0%20Panda%20mozilla-central%20opt%20test%20robocop-1 with a dozen retries before I got sick of seeing the waste of electricity is pretty typical. Stop it.
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → gbrown
Summary: robocop-1 has an unacceptable retry rate → Android 4.0 robocop-1 has an unacceptable retry rate
Assignee | ||
Comment 1•10 years ago
|
||
It looks like this has been happening all week. The first obvious rev on inbound is 6c2fb5df16ba (Fri Nov 7 11:17:34 2014 PST) but there may be earlier failures too: 9f896cb289a9, 33d0247e5290. Is this another instance of "Pandas reboot when test job runs for too long"?
Assignee | ||
Comment 2•10 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #1) > Is this another instance of "Pandas reboot when test job runs for too long"? Yes, I think so. Logcats for the failures often show something like: 11-14 09:56:10.968 I/Process ( 1401): Sending signal. PID: 1401 SIG: 3 11-14 09:56:10.968 I/dalvikvm( 1401): threadid=3: reacting to signal 3 11-14 09:56:11.007 I/dalvikvm( 1401): Wrote stack traces to '/data/anr/traces.txt' 11-14 09:56:20.742 W/WindowManager( 1401): Input event injection timed out. 11-14 09:56:41.015 I/Process ( 1401): Sending signal. PID: 1401 SIG: 3 11-14 09:56:41.015 I/dalvikvm( 1401): threadid=3: reacting to signal 3 11-14 09:56:41.046 I/dalvikvm( 1401): Wrote stack traces to '/data/anr/traces.txt' 11-14 09:56:41.046 I/Process ( 1401): Sending signal. PID: 1579 SIG: 3 11-14 09:56:41.046 I/dalvikvm( 1579): threadid=3: reacting to signal 3 11-14 09:56:41.054 I/dalvikvm( 1579): Wrote stack traces to '/data/anr/traces.txt' 11-14 09:56:43.132 I/Watchdog_N( 1401): dumpKernelStacks 11-14 09:56:43.210 D/dalvikvm( 1401): GC_FOR_ALLOC freed 299K, 28% free 9239K/12679K, paused 46ms 11-14 09:56:43.281 D/dalvikvm( 1401): GC_CONCURRENT freed 31K, 28% free 9256K/12679K, paused 4ms+20ms 11-14 09:56:43.343 D/dalvikvm( 1401): GC_FOR_ALLOC freed 97K, 28% free 9158K/12679K, paused 56ms 11-14 09:56:43.343 I/dalvikvm-heap( 1401): Grow heap (frag case) to 9.211MB for 197928-byte allocation 11-14 09:56:43.375 D/dalvikvm( 1401): GC_FOR_ALLOC freed 0K, 28% free 9351K/12935K, paused 54ms 11-14 09:56:43.468 D/dalvikvm( 1401): GC_FOR_ALLOC freed 266K, 30% free 9182K/12935K, paused 55ms 11-14 09:56:45.171 I/Process ( 1401): Sending signal. PID: 1401 SIG: 9 11-14 09:56:45.171 W/Watchdog( 1401): *** WATCHDOG KILLING SYSTEM PROCESS: com.android.server.PowerManagerService approximately 30 minutes after the start of the test. Here's a try run which simulates running Android 4.0 Robocop as 7 chunks instead of 5: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=df79bab7f533 There are only a few retries, and they only happen in cleanupDevice -- not during the test job. Compare that to the same revision run as 5 chunks: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=dd048eae7f2a
Assignee | ||
Comment 3•10 years ago
|
||
btw, in the above log pid 1401 is "system" and 1579 is "com.android.phone" -- that pattern appears to hold for all the logs I have examined.
Assignee | ||
Comment 4•10 years ago
|
||
http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/148aae9111bce7aba4e97f99c35b4254578a4a0f16c2feef6165d119e80b0f248175ebc4f5b2fc39328a0ab9ac8ee6ba1106a533c07f7a07cc6fa5bc0468f6a4
Comment 5•10 years ago
|
||
The ANR report in the log for one of the retries[0] shows a deadlock in the PowerManagerService thread. 10:39:04 INFO - "PowerManagerService" prio=5 tid=20 MONITOR 10:39:04 INFO - | group="main" sCount=1 dsCount=0 obj=0x41398f60 self=0x199f668 10:39:04 INFO - | sysTid=1428 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=26764208 10:39:04 INFO - | schedstat=( 0 0 0 ) utm=3 stm=3 core=1 10:39:04 INFO - at com.android.server.PowerManagerService$TimeoutTask.run(PowerManagerService.java:~1298) 10:39:04 INFO - - waiting to lock <0x41209160> (a com.android.server.PowerManagerService$LockList) held by tid=26 (InputDispatcher) 10:39:04 INFO - at android.os.Handler.handleCallback(Handler.java:605) 10:39:04 INFO - at android.os.Handler.dispatchMessage(Handler.java:92) 10:39:04 INFO - at android.os.Looper.loop(Looper.java:137) 10:39:04 INFO - at android.os.HandlerThread.run(HandlerThread.java:60) It says the lock is held by InputDispatcher, whose stack looks like this: 10:39:04 INFO - "InputDispatcher" prio=10 tid=26 MONITOR 10:39:04 INFO - | group="main" sCount=1 dsCount=0 obj=0x4132fa70 self=0x19c3890 10:39:04 INFO - | sysTid=1434 nice=-8 sched=0/0 cgrp=[fopen-error:2] handle=26378376 10:39:04 INFO - | schedstat=( 0 0 0 ) utm=125 stm=171 core=1 10:39:04 INFO - at com.android.server.am.BatteryStatsService.noteUserActivity(BatteryStatsService.java:~189) 10:39:04 INFO - - waiting to lock <0x4120ee28> (a com.android.internal.os.BatteryStatsImpl) held by tid=13 (ProcessStats) 10:39:04 INFO - at com.android.server.PowerManagerService.userActivity(PowerManagerService.java:2419) 10:39:04 INFO - at com.android.server.PowerManagerService.userActivity(PowerManagerService.java:2361) 10:39:04 INFO - at dalvik.system.NativeStart.run(Native Method) So that one is waiting on some lock held by ProcessStats: 0:39:04 INFO - "ProcessStats" prio=5 tid=13 MONITOR 10:39:04 INFO - | group="main" sCount=1 dsCount=0 obj=0x4127d4b8 self=0x199d230 10:39:04 INFO - | sysTid=1419 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=26431952 10:39:04 INFO - | schedstat=( 0 0 0 ) utm=1 stm=0 core=1 10:39:04 INFO - at com.android.server.am.ActivityManagerService.broadcastIntent(ActivityManagerService.java:~12504) 10:39:04 INFO - - waiting to lock <0x4120b2a0> (a com.android.server.am.ActivityManagerService) held by tid=21 (AlarmManager) 10:39:04 INFO - at android.app.ContextImpl.sendBroadcast(ContextImpl.java:918) 10:39:04 INFO - at com.android.server.DropBoxManagerService.add(DropBoxManagerService.java:236) 10:39:04 INFO - at android.os.DropBoxManager.addText(DropBoxManager.java:272) 10:39:04 INFO - at com.android.server.am.ActivityManagerService$11.run(ActivityManagerService.java:7704) 10:39:04 INFO - at com.android.server.am.ActivityManagerService.addErrorToDropBox(ActivityManagerService.java:7709) 10:39:04 INFO - at com.android.server.am.ActivityManagerService.handleApplicationWtf(ActivityManagerService.java:7522) 10:39:04 INFO - at com.android.internal.os.RuntimeInit.wtf(RuntimeInit.java:345) 10:39:04 INFO - at android.util.Log$1.onTerribleFailure(Log.java:103) 10:39:04 INFO - at android.util.Log.wtf(Log.java:278) 10:39:04 INFO - at com.android.internal.os.BatteryStatsImpl.getNetworkStatsDetailGroupedByUid(BatteryStatsImpl.java:5738) 10:39:04 INFO - at com.android.internal.os.BatteryStatsImpl.access$100(BatteryStatsImpl.java:76) 10:39:04 INFO - at com.android.internal.os.BatteryStatsImpl$Uid.computeCurrentTcpBytesReceived(BatteryStatsImpl.java:2457) 10:39:04 INFO - at com.android.internal.os.BatteryStatsImpl$Uid.getTcpBytesReceived(BatteryStatsImpl.java:2446) 10:39:04 INFO - at com.android.internal.os.BatteryStatsImpl.writeSummaryToParcel(BatteryStatsImpl.java:5437) 10:39:04 INFO - at com.android.internal.os.BatteryStatsImpl.writeLocked(BatteryStatsImpl.java:4836) 10:39:04 INFO - at com.android.internal.os.BatteryStatsImpl.writeAsyncLocked(BatteryStatsImpl.java:4818) 10:39:04 INFO - at com.android.server.am.ActivityManagerService.updateCpuStatsNow(ActivityManagerService.java:1652) 10:39:04 INFO - at com.android.server.am.ActivityManagerService$3.run(ActivityManagerService.java:1534) That's waiting on a lock held by AlarmManager: 10:39:04 INFO - "AlarmManager" prio=5 tid=21 MONITOR 10:39:04 INFO - | group="main" sCount=1 dsCount=0 obj=0x413271f8 self=0x196ac88 10:39:04 INFO - | sysTid=1429 nice=0 sched=0/0 cgrp=[fopen-error:2] handle=26742432 10:39:04 INFO - | schedstat=( 0 0 0 ) utm=3 stm=1 core=1 10:39:04 INFO - at com.android.server.am.ActivityManagerService.retrieveServiceLocked(ActivityManagerService.java:~10563) 10:39:04 INFO - - waiting to lock <0x4120ee28> (a com.android.internal.os.BatteryStatsImpl) held by tid=13 (ProcessStats) 10:39:04 INFO - at com.android.server.am.ActivityManagerService.startServiceLocked(ActivityManagerService.java:11096) 10:39:04 INFO - at com.android.server.am.ActivityManagerService.startServiceInPackage(ActivityManagerService.java:11149) 10:39:04 INFO - at com.android.server.am.PendingIntentRecord.sendInner(PendingIntentRecord.java:260) 10:39:04 INFO - at com.android.server.am.PendingIntentRecord.send(PendingIntentRecord.java:182) 10:39:04 INFO - at android.app.PendingIntent.send(PendingIntent.java:509) 10:39:04 INFO - at android.app.PendingIntent.send(PendingIntent.java:463) 10:39:04 INFO - at com.android.server.AlarmManagerService$AlarmThread.run(AlarmManagerService.java:665) AlarmManager is waiting on ProcessStats. And now we're deadlocked. [0] http://ftp.mozilla.org/pub/mozilla.org/mobile/try-builds/gbrown@mozilla.com-dd048eae7f2a/try-android/try_panda_android_test-robocop-1-bm100-tests1-panda-build2884.txt.gz
Comment 6•10 years ago
|
||
It looks like we could work around this specific instance by avoiding the 'wtf' error in BatteryStatsImpl (ProcessStats thread above)
Comment 7•10 years ago
|
||
If we're able to make changes to the Android build, commenting out this line should fix the deadlock described above: http://androidxref.com/4.0.4/xref/frameworks/base/core/java/com/android/internal/os/BatteryStatsImpl.java#5738
Assignee | ||
Comment 8•10 years ago
|
||
https://wiki.mozilla.org/Auto-tools/Projects/Pandaboard_Setup
Assignee | ||
Comment 9•10 years ago
|
||
Android 4.0 Robocop is running as 7 chunks now and the frequent retries on rc1 are resolved. Let's keep this bug open for now while we sort out a permanent solution (follow-up on Comments 5 - 8).
Assignee | ||
Comment 10•9 years ago
|
||
We have a new instance of this problem, and a solution on the way, in bug 1142553. Even so, given the declining importance and anticipated obsolescence of the pandaboards, I am reluctant to follow-up on Comments 5 - 8.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•3 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•