Closed Bug 1054292 Opened 11 years ago Closed 10 years ago

Intermittent Android TEST-UNEXPECTED-FAIL | remoteautomation.py | application timed out after 330 seconds with no output ("org.mozilla.fennec still alive after SIGABRT: waiting...", [@ libc.so + 0xd1fc])

Categories

(Testing :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1085591

People

(Reporter: emorley, Assigned: gbrown)

References

Details

(Keywords: intermittent-failure)

Android 4.0 Panda mozilla-inbound opt test plain-reftest-6 on 2014-08-14 20:11:21 PDT for push f9c51223ffc9 slave: panda-0392 https://tbpl.mozilla.org/php/getParsedLog.php?id=45998403&tree=Mozilla-Inbound { 20:22:13 INFO - REFTEST INFO | runreftest.py | Running tests: start. 20:22:15 INFO - INFO | automation.py | Application pid: 2103 20:28:00 INFO - org.mozilla.fennec still alive after SIGABRT: waiting... 20:28:05 WARNING - TEST-UNEXPECTED-FAIL | remoteautomation.py | application timed out after 330 seconds with no output 20:28:05 INFO - INFO | automation.py | Application ran for: 0:05:51.357393 20:28:05 INFO - INFO | zombiecheck | Reading PID log: /tmp/tmpGKgzUdpidlog 20:28:05 INFO - Contents of /data/anr/traces.txt: 20:28:05 INFO - /data/tombstones does not exist; tombstone check skipped 20:28:06 INFO - mozcrash INFO | Downloading symbols from: https://ftp-ssl.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-inbound-android/1408067296/fennec-34.0a1.en-US.android-arm.crashreporter-symbols.zip 20:28:10 WARNING - PROCESS-CRASH | remoteautomation.py | application crashed [@ libc.so + 0xd1fc] 20:28:10 INFO - Crash dump filename: /tmp/tmpMch4kb/19fdc152-4a13-64da-34603340-4d515c8b.dmp 20:28:10 INFO - Operating system: Android 20:28:10 INFO - 0.0.0 Linux 3.2.0+ #2 SMP PREEMPT Thu Nov 29 08:06:57 EST 2012 armv7l pandaboard/pandaboard/pandaboard:4.0.4/IMM76I/5:eng/test-keys 20:28:10 INFO - CPU: arm 20:28:10 INFO - 2 CPUs 20:28:10 INFO - Crash reason: SIGABRT 20:28:10 INFO - Crash address: 0x874 20:28:10 INFO - Thread 0 (crashed) 20:28:10 INFO - 0 libc.so + 0xd1fc 20:28:10 INFO - r4 = 0x4006b62c r5 = 0x00000000 r6 = 0x00000002 r7 = 0x000000f0 20:28:10 INFO - r8 = 0x409cb840 r9 = 0x40079d68 r10 = 0x00000003 fp = 0xbeabe6cc 20:28:10 INFO - sp = 0xbeabe5b0 lr = 0x40018db4 pc = 0x400141fc 20:28:10 INFO - Found by: given as instruction pointer in context 20:28:10 INFO - 1 libdvm.so + 0xd783e 20:28:10 INFO - sp = 0xbeabe5cc pc = 0x409cb840 20:28:10 INFO - Found by: stack scanning 20:28:10 INFO - 2 libc.so + 0x13d5b 20:28:10 INFO - sp = 0xbeabe5d8 pc = 0x4001ad5d 20:28:10 INFO - Found by: stack scanning 20:28:10 INFO - 3 dalvik-heap (deleted) + 0x77279e 20:28:10 INFO - sp = 0xbeabe5f8 pc = 0x412e57a0 20:28:10 INFO - Found by: stack scanning 20:28:10 INFO - 4 dalvik-heap (deleted) + 0x7727b8 20:28:10 INFO - sp = 0xbeabe5fc pc = 0x412e57ba 20:28:10 INFO - Found by: stack scanning 20:28:10 INFO - 5 libdvm.so + 0xd783e 20:28:10 INFO - sp = 0xbeabe608 pc = 0x409cb840 20:28:10 INFO - Found by: stack scanning 20:28:10 INFO - 6 libc.so + 0x17009 20:28:10 INFO - sp = 0xbeabe618 pc = 0x4001e00b 20:28:10 INFO - Found by: stack scanning 20:28:10 INFO - 7 dalvik-heap (deleted) + 0x77279e 20:28:10 INFO - sp = 0xbeabe61c pc = 0x412e57a0 20:28:10 INFO - Found by: stack scanning 20:28:10 INFO - 8 libdvm.so + 0x6ff8b 20:28:10 INFO - sp = 0xbeabe620 pc = 0x40963f8d 20:28:10 INFO - Found by: stack scanning 20:28:10 INFO - 9 libdvm.so + 0xd783e 20:28:10 INFO - sp = 0xbeabe638 pc = 0x409cb840 }
It looks like all of these are on Android 4.0 Pandas. All, or at least most, of the reftest chunks are affected. Jobs hang on startup, after lib loading but before any tests are run. I do not see any errors in the logs (before the hang) and it is not apparent to me what the test is waiting for.
$ ./intermittent_stats.py 1054292 OS's: 145 (100.0%): Android 4.0 Panda Branches: 90 (62.1%): mozilla-inbound 21 (14.5%): mozilla-central 19 (13.1%): fx-team 10 (6.9%): b2g-inbound 5 (3.4%): try Build-Type: 144 (99.3%): opt 1 (0.7%): debug Test Group: 15 (10.3%): plain-reftest-8 14 (9.7%): plain-reftest-1 13 (9.0%): plain-reftest-2 12 (8.3%): plain-reftest-6 11 (7.6%): jsreftest-2 11 (7.6%): jsreftest-1 10 (6.9%): plain-reftest-4 10 (6.9%): plain-reftest-5 10 (6.9%): plain-reftest-7 9 (6.2%): plain-reftest-3 9 (6.2%): jsreftest-3 6 (4.1%): crashtest 5 (3.4%): mochitest-6 3 (2.1%): mochitest-2 2 (1.4%): mochitest-8 2 (1.4%): mochitest-1 1 (0.7%): mochitest-7 1 (0.7%): robocop-1 1 (0.7%): mochitest-5 Slaves: 3 (2.1%): panda-0054 2 (1.4%): panda-0278 2 (1.4%): panda-0780 2 (1.4%): panda-0187 2 (1.4%): panda-0717 2 (1.4%): panda-0556 2 (1.4%): panda-0551 2 (1.4%): panda-0908 2 (1.4%): panda-0762 2 (1.4%): panda-0614 + 135 more
:snorp -- Can you find an owner for this frequent Panda reftest failure? Fennec seems to hang on startup, before any reftests are run.
Flags: needinfo?(snorp)
See Also: → 1054456
This bug is the #1 orange on this week's War On Orange report. (And 1054456, which may be the same cause, is #3!)
Most of these logcats end like this: 13:13:47 INFO - 08-26 13:07:51.328 D/GeckoBrowser( 2243): zerdatime 1409058471333 - browser chrome startup finished. 13:13:47 INFO - 08-26 13:07:52.031 D/GeckoLayerClient( 2243): Screen-size changed to (1280,672) 13:13:47 INFO - 08-26 13:07:52.031 D/GeckoLayerClient( 2243): Window-size changed to (1280,616) 13:13:47 INFO - 08-26 13:07:52.062 D/Telemetry( 2243): SendUIEvent: event = policynotification.success.1:true method = null timestamp = 332895 extras = null 13:13:47 INFO - 08-26 13:07:52.093 I/Gecko ( 2243): Attempting load of libEGL.so 13:13:47 INFO - 08-26 13:07:52.093 I/Gecko ( 2243): Can't find symbol '_Z35eglQueryStringImplementationANDROIDPvi'. 13:13:47 INFO - 08-26 13:08:07.234 I/SUTAgentAndroid( 1896): 10.26.137.18 : activity 13:13:47 INFO - 08-26 13:08:27.257 I/SUTAgentAndroid( 1896): 10.26.137.18 : activity 13:13:47 INFO - 08-26 13:08:47.281 I/SUTAgentAndroid( 1896): 10.26.137.18 : activity instead of continuing like this (from a successful run): 08-26 14:10:13.773 D/GeckoBrowser( 2254): zerdatime 1409062213772 - browser chrome startup finished. 08-26 14:10:14.273 D/GeckoLayerClient( 2254): Screen-size changed to (1280,672) 08-26 14:10:14.273 D/GeckoLayerClient( 2254): Window-size changed to (1280,616) 08-26 14:10:14.296 D/Telemetry( 2254): SendUIEvent: event = policynotification.success.1:true method = null timestamp = 337696 extras = null 08-26 14:10:14.320 I/Gecko ( 2254): Attempting load of libEGL.so 08-26 14:10:14.320 I/Gecko ( 2254): Can't find symbol '_Z35eglQueryStringImplementationANDROIDPvi'. 08-26 14:10:14.406 E/GeckoConsole( 2254): OpenGL compositor Initialized Succesfully. 08-26 14:10:14.406 E/GeckoConsole( 2254): Version: OpenGL ES 2.0 build 1.8@550175 08-26 14:10:14.406 E/GeckoConsole( 2254): Vendor: Imagination Technologies 08-26 14:10:14.406 E/GeckoConsole( 2254): Renderer: PowerVR SGX 540 08-26 14:10:14.406 E/GeckoConsole( 2254): FBO Texture Target: TEXTURE_2D 08-26 14:10:14.429 E/GeckoConsole( 2254): Adding HealthReport:RequestSnapshot observer. 08-26 14:10:14.445 D/GeckoWebapps( 2254): Saving /mnt/sdcard/tests/webapps/webapps.json 08-26 14:10:14.453 D/GeckoLayerClient( 2254): Window-size changed to (1280,672) 08-26 14:10:14.460 D/GeckoWebapps( 2254): Saving /mnt/sdcard/tests/webapps/webapps.json 08-26 14:10:14.460 W/GeckoScreenOrientation( 2254): screenOrientationFromString: unknown orientation string ...
In try runs, I'm seeing a hang in CompositorOGL::Initialize, likely in CompositorOGL::GetShaderProgramFor.
Depends on: 1059797
:snorp and :glandium are working on bug 1059797, which I hope will address this bug.
Flags: needinfo?(snorp)
(In reply to Geoff Brown [:gbrown] from comment #665) > :snorp and :glandium are working on bug 1059797, which I hope will address this bug. The fix for bug 1059797 significantly reduced the frequency of time-outs reported here. However, as this bug is a catch-all for test time-outs, failure reports continue, presumably from other causes.
(In reply to Treeherder Robot from comment #1804) > log: > https://treeherder.mozilla.org/logviewer.html#?repo=mozilla- > inbound&job_id=9280416 > repository: mozilla-inbound > start_time: 2015-04-24T11:47:12 > who: rvandermeulen[at]mozilla[dot]com > machine: tst-linux64-spot-382 > buildname: Android 4.3 armv7 API 11+ mozilla-inbound opt test mochitest-gl-2 > revision: a12f0f05779c 12:02:08 INFO - 04-24 11:54:49.333 E/GeckoApp( 787): Exception handling message "Gecko:Ready": 12:02:08 INFO - 04-24 11:54:49.333 E/GeckoApp( 787): java.lang.NullPointerException 12:02:08 INFO - 04-24 11:54:49.333 E/GeckoApp( 787): at org.mozilla.gecko.GeckoApp.geckoConnected(GeckoApp.java:2387) 12:02:08 INFO - 04-24 11:54:49.333 E/GeckoApp( 787): at org.mozilla.gecko.GeckoApp.handleMessage(GeckoApp.java:667) 12:02:08 INFO - 04-24 11:54:49.333 E/GeckoApp( 787): at org.mozilla.gecko.BrowserApp.handleMessage(BrowserApp.java:1822) 12:02:08 INFO - 04-24 11:54:49.333 E/GeckoApp( 787): at org.mozilla.gecko.EventDispatcher.dispatchEvent(EventDispatcher.java:208) 12:02:08 INFO - 04-24 11:54:49.333 E/GeckoApp( 787): at org.mozilla.gecko.EventDispatcher.dispatchEvent(EventDispatcher.java:178) 12:02:08 INFO - 04-24 11:54:49.333 E/GeckoApp( 787): at org.mozilla.gecko.GeckoAppShell.handleGeckoMessage(GeckoAppShell.java:2253) 12:02:08 INFO - 04-24 11:54:49.333 E/GeckoApp( 787): at org.mozilla.gecko.mozglue.GeckoLoader.nativeRun(Native Method) 12:02:08 INFO - 04-24 11:54:49.333 E/GeckoApp( 787): at org.mozilla.gecko.GeckoAppShell.runGecko(GeckoAppShell.java:382) 12:02:08 INFO - 04-24 11:54:49.333 E/GeckoApp( 787): at org.mozilla.gecko.GeckoThread.run(GeckoThread.java:187) Same thing happened in Comment 1803.
Depends on: 1158309
Comments 1806, 1807 are also NPE handling "Gecko:Ready".
No more startup NPEs -- yahoo! Now most of the recent failures show: 11:14:19 INFO - 05-07 11:10:01.241 E/SQLiteLog( 778): (1) no such table: history 11:14:19 INFO - 05-07 11:12:15.562 I/dalvikvm( 778): threadid=3: reacting to signal 3 11:14:19 INFO - 05-07 11:12:15.762 I/dalvikvm( 778): Wrote stack traces to '/data/anr/traces.txt' 11:14:19 INFO - 05-07 11:12:22.902 D/Zygote ( 38): Process 778 terminated by signal (6)
Assignee: nobody → gbrown
Depends on: 1173180
Nearly all recent failure logs show something like a startup hang: I see no explicit failures but it is apparent that the test page is not loaded and browser startup is incomplete. https://treeherder.mozilla.org/#/jobs?repo=try&revision=da1962367cf7 reproduced this failure with additional startup logging. Failure logcats confirm that GeckoApp.loadStartupTab(), http://hg.mozilla.org/mozilla-central/annotate/abaec2f7ed2c/mobile/android/base/GeckoApp.java#l1447 is not called. Failure logcats also show that GeckoApp.onWindowFocusChanged(), http://hg.mozilla.org/mozilla-central/annotate/abaec2f7ed2c/mobile/android/base/GeckoApp.java#l1952 is not called. (See bug 973085 and bug 1085591.) I also notice "ActivityManager: Launch timeout has expired, giving up wake lock!" reported in failure logcats. Failure logcat: http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/3576df391eb7ccd91526ab0a8aec526a23a61b4b7fa34c6d1e38ffd3c0208d7473aa755acbd67844e6fa8c9ecb353995682babe78903b2cb4f9f26ae87fb0a8b 06-12 19:29:21.507 64 75 W ActivityManager: Launch timeout has expired, giving up wake lock! 06-12 19:29:22.272 64 75 W ActivityManager: Activity idle timeout for HistoryRecord{408b59b0 org.mozilla.fennec/.App} ... 06-12 19:29:26.171 617 627 E GeckoLibLoad: Loaded libs in 9563ms total, 4230ms(6500ms) user, 1360ms(2060ms) system, 0(0) faults 06-12 19:29:26.191 617 627 W GeckoThread: zerdatime 340210 - runGecko Success: http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/70ec687579b28fb82fe4befa509cbc7569391ffb11d5d17da8779df30cd1136188de3fc186a122ef4142759e6c3f17153e4d003062bb580763c128d185bb9fdc 06-12 19:28:13.241 64 91 I ActivityManager: Displayed org.mozilla.fennec/.App: +6s559ms 06-12 19:28:13.261 616 616 I ZZZ : GeckoApp onWindowFocusChanged ... 06-12 19:28:13.380 616 625 I ZZZ : GeckoApp loadStartupTab: http://mochi.test:8888/tests/?autorun=1&closeWhenDone=1&logFile=%2Fmnt%2Fsdcard%2Ftests%2Flogs%2Fmochitest.log&fileLevel=INFO&consoleLevel=INFO&hideResultsTable=1&manifestFile=tests.json&dumpOutputDirectory=%2Fmnt%2Fsdcard%2Ftests ... 06-12 19:28:19.350 616 627 E GeckoLibLoad: Loaded libs in 9150ms total, 3380ms(6050ms) user, 1100ms(2030ms) system, 0(0) faults 06-12 19:28:19.360 616 627 W GeckoThread: zerdatime 263237 - runGecko
For the failure logcat, I meant to also show: 06-12 19:29:20.922 64 92 I ActivityManager: Displayed org.mozilla.fennec/.App: +9s277ms
See frameworks/base/services/java/com/android/server/am/ActivityStack.java in aosp for the launch timeout; the timer expires after 10 seconds. That seems to be 10 seconds from "ActivityManager: Starting: Intent".
Depends on: 1085591
Note that a fix for bug 1085591 landed on mozilla-central at 2015-07-03 03:11:29 PDT. All failures since have been on aurora.
I'm convinced that all recent failures here were caused by bug 1085591 (onWindowFocusChanged not always called). All sorts of hangs could produce this generic "application timed out after 330 seconds with no output" message in future...when that happens, we should open a new bug.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.