Closed Bug 1136634 Opened 6 years ago Closed 5 years ago

Crash in xpcshell toolkit/components/telemetry/tests/unit/test_TelemetryController.js on Android 4.3 emulator

Categories

(Firefox for Android :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

RESOLVED FIXED
Firefox 46
Tracking Status
firefox46 --- fixed

People

(Reporter: gfritzsche, Assigned: gbrown)

On my Android emulator I get this segfault (sorry, I still don't have properly working symbols for some reason):

I/Gecko   ( 4101): 1424858489114	Toolkit.Telemetry	TRACE	TelemetrySession::getMetadata - Reason gather-payload
I/Gecko   ( 4101): Attempting load of libEGL.so
D/libEGL  ( 4101): Emulator without GPU support detected. Fallback to software renderer.
D/libEGL  ( 4101): loaded /system/lib/egl/libGLES_android.so
E/libEGL  ( 4101): dlopen("system/lib/libGLESv1_CM.so") failed: dlopen failed: library "system/lib/libGLESv1_CM.so" not found
F/libEGL  ( 4101): couldn't load system OpenGL ES wrapper libraries
F/libc    ( 4101): Fatal signal 11 (SIGSEGV) at 0xdeadbaad (code=1), thread 4101 (xpcshell)
W/NativeCrashListener(  276): Couldn't find ProcessRecord for pid 4101
I/DEBUG   ( 2557): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
I/DEBUG   ( 2557): Build fingerprint: 'generic/sdk/generic:4.3/JB_MR2/774058:eng/test-keys'
I/DEBUG   ( 2557): Revision: '0'
I/DEBUG   ( 2557): pid: 4101, tid: 4101, name: UNKNOWN  >>> /data/local/xpcb/xpcshell <<<
I/DEBUG   ( 2557): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr deadbaad
I/DEBUG   ( 2557):     r0 00000027  r1 00000000  r2 00000008  r3 deadbaad
I/DEBUG   ( 2557):     r4 00000000  r5 bedaaa3c  r6 00013740  r7 00013720
I/DEBUG   ( 2557):     r8 45cda800  r9 0000000b  sl 00000001  fp bedabf54
I/DEBUG   ( 2557):     ip 00013720  sp bedaaa38  lr 40039fcb  pc 40037524  cpsr 60000030
I/DEBUG   ( 2557):     d0  4018000000000000  d1  7e37e43c8800759c
I/DEBUG   ( 2557):     d2  3e91e9eebc4246d4  d3  00000007bcc0c499
I/DEBUG   ( 2557):     d4  407f800000000000  d5  3ff0000000000000
I/DEBUG   ( 2557):     d6  4039000000000000  d7  437a000000000000
I/DEBUG   ( 2557):     d8  0000000000000000  d9  0000000000000000
I/DEBUG   ( 2557):     d10 0000000000000000  d11 0000000000000000
I/DEBUG   ( 2557):     d12 0000000000000000  d13 0000000000000000
I/DEBUG   ( 2557):     d14 0000000000000000  d15 0000000000000000
I/DEBUG   ( 2557):     scr 60000011
I/DEBUG   ( 2557): 
I/DEBUG   ( 2557): backtrace:
I/DEBUG   ( 2557):     #00  pc 0001e524  /system/lib/libc.so
I/DEBUG   ( 2557):     #01  pc 0001c4e4  /system/lib/libc.so (abort+4)
I/DEBUG   ( 2557):     #02  pc 0000888b  /system/lib/libcutils.so (__android_log_assert+86)
I/DEBUG   ( 2557):     #03  pc 000342f7  /system/lib/libEGL.so
I/DEBUG   ( 2557):     #04  pc 0000dff5  /system/lib/libEGL.so
I/DEBUG   ( 2557):     #05  pc 0000e77d  /system/lib/libEGL.so (eglGetDisplay+24)
I/DEBUG   ( 2557):     #06  pc 010d912d  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #07  pc 010caa67  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #08  pc 010caf51  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #09  pc 010ca41b  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #10  pc 02694695  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #11  pc 026945c3  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #12  pc 0269319f  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #13  pc 0053220f  /data/local/xpcb/libxul.so (NS_InvokeByIndex+66)
I/DEBUG   ( 2557):     #14  pc 00df3cbf  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #15  pc 00df22b1  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #16  pc 00dddc49  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #17  pc 00defc21  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #18  pc 00de480d  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #19  pc 03b04965  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #20  pc 03accac9  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #21  pc 03acce39  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #22  pc 03acd21d  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #23  pc 03ae7df7  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #24  pc 03af9e63  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #25  pc 03afa061  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #26  pc 03ae8413  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #27  pc 039670f3  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #28  pc 0396607f  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #29  pc 03c5b7b5  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #30  pc 03c5be5d  /data/local/xpcb/libxul.so
I/DEBUG   ( 2557):     #31  pc 03c77ef3  /data/local/xpcb/libxul.so

[...]
avd info:

    Name: ook
  Device: Nexus S (Google)
    Path: /Users/gfritzsche/.android/avd/ook.avd
  Target: Android 4.3.1 (API level 18)
 Tag/ABI: default/armeabi-v7a
    Skin: 480x800
  Sdcard: 500M

"Use host GPU" on or off doesn't make a difference.
Version: unspecified → Trunk
This seems to be fixed; it doesn't happen anymore on my system.
This crash is back.
Whiteboard: [measurement:client:tracking]
It's crashing when TelemetryEnvironment tries to gather GFX information [0], and the log differs slightly from the one in comment 0:

I/GeckoConsole( 2049): 1451473821099	Toolkit.Telemetry	TRACE	TelemetryEnvironment::constructor
I/Gecko   ( 2049): Attempting load of libEGL.so
D/libEGL  ( 2049): loaded /system/lib/egl/libEGL_emulation.so
D/        ( 2049): HostConnection::get() New Host Connection established 0x11870, tid 2049
D/libEGL  ( 2049): loaded /system/lib/egl/libGLESv1_CM_emulation.so
D/libEGL  ( 2049): loaded /system/lib/egl/libGLESv2_emulation.so
E/libEGL  ( 2049): dlopen("system/lib/libGLESv1_CM.so") failed: dlopen failed: library "system/lib/libGLESv1_CM.so" not found
F/libEGL  ( 2049): couldn't load system OpenGL ES wrapper libraries

[0] - https://dxr.mozilla.org/mozilla-central/rev/22f51211915bf7daff076180847a7140d35aa353/toolkit/components/telemetry/TelemetryEnvironment.jsm#1216
I'm confused: I don't see this test on mozilla-central.
See Also: → 1144395
(In reply to Geoff Brown [:gbrown] from comment #5)
> I'm confused: I don't see this test on mozilla-central.

Correct, this test was renamed recently.

test_TelemetryController.js was consistently crashing on the 4.3 emulator, so it was disabled [0]. Enabling the test again should reproduce the crash quickly.

Also, bug 1230213 was backed out due to similar issues. I just checked locally and applying that patch produces the error from comment 4.

[0] - https://dxr.mozilla.org/mozilla-central/rev/22f51211915bf7daff076180847a7140d35aa353/toolkit/components/telemetry/tests/unit/xpcshell.ini#41
Summary: Crash in xpcshell toolkit/components/telemetry/tests/unit/test_TelemetryPing.js on Android 4.3 emulator → Crash in xpcshell toolkit/components/telemetry/tests/unit/test_TelemetryController.js on Android 4.3 emulator
Geoff, could this be a problem with the Emulator?
Flags: needinfo?(gbrown)
It could be, but I'm not sure what the cause of the problem is.

Others have reported similar problems on actual phones: https://groups.google.com/forum/#!topic/android-ndk/KZJqqisUwz4.

/system/lib/libGLESv1_CM.so exists on the 4.3 emulator, as it does on x86 and 2.3 (where this test runs fine).

I'll take a closer look.
Assignee: nobody → gbrown
Flags: needinfo?(gbrown)
There's an obvious error in libEGL in our Android 4.3 build (JLS36I) and I don't see a fix on any 4.3 branch. 

It is easily fixed:

--- aosp431r1/frameworks/native/opengl/libs/EGL/Loader.cpp	2015-03-08 16:26:36.657499112 -0600
+++ frameworks/native/opengl/libs/EGL/Loader.cpp	2016-01-06 16:57:35.179946862 -0700
@@ -207,8 +207,8 @@
             "couldn't find the default OpenGL ES implementation "
             "for default display");
 
-    cnx->libGles2 = load_wrapper("system/lib/libGLESv2.so");
-    cnx->libGles1 = load_wrapper("system/lib/libGLESv1_CM.so");
+    cnx->libGles2 = load_wrapper("/system/lib/libGLESv2.so");
+    cnx->libGles1 = load_wrapper("/system/lib/libGLESv1_CM.so");
     LOG_ALWAYS_FATAL_IF(!cnx->libGles2 || !cnx->libGles1,
             "couldn't load system OpenGL ES wrapper libraries");

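A rough sketch of why the missing leading slash matters, assuming the usual loader rule that a name containing a slash is opened as a pathname (so a relative one is resolved against the process's current working directory). The helper name and the two working directories are hypothetical, chosen only for illustration; the second one is simply the directory the xpcshell binary lives in according to the log in comment 0, not a confirmed cwd.

```python
import posixpath


def resolve_library_path(name: str, cwd: str) -> str:
    """Approximate which file a loader opens for a name that contains '/'.

    Hypothetical helper: absolute names are used as-is, relative names
    are joined onto the current working directory.
    """
    if "/" not in name:
        raise ValueError("names without a slash use the library search path")
    # posixpath.join discards cwd when name is already absolute
    return posixpath.normpath(posixpath.join(cwd, name))


BROKEN = "system/lib/libGLESv1_CM.so"   # string in the unpatched Loader.cpp
FIXED = "/system/lib/libGLESv1_CM.so"   # string after the patch

# Both cwd values are assumptions for illustration only.
for cwd in ("/", "/data/local/xpcb"):
    print(cwd, "->", resolve_library_path(BROKEN, cwd))
    print(cwd, "->", resolve_library_path(FIXED, cwd))
```

Under that assumption the unpatched string only reaches the real library when the working directory happens to be `/`, while the patched string resolves to the same file from anywhere. That would be consistent with the file existing on the emulator and dlopen still reporting it as not found.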
Then to patch libEGL.so in the 4.3 AVD:

  $ emulator -avd mozemulator-4.3 -partition-size 800
  # remount /system read-write, replace the library, then remount read-only
  $ adb -e shell mount -o remount,rw /system
  $ adb -e push libEGL.so /system/lib/libEGL.so
  $ adb -e shell chmod 644 /system/lib/libEGL.so
  $ adb -e shell mount -o remount,ro /system
  # save the emulator's temporary system image
  $ cp /tmp/android-gbrown/emulator* system.img
  # close the emulator, copy system.img, re-package the AVD

In local tests, this allows test_TelemetryController.js to pass. :)
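For anyone repeating this, one possible sanity check before pushing a rebuilt library is to scan it for the old relative strings. This is only a sketch: it assumes the two paths from the diff are stored as plain string literals in the binary, and the function name is made up for this example.

```python
# Paths exactly as they appear on the removed lines of the diff.
RELATIVE_WRAPPERS = (b"system/lib/libGLESv2.so", b"system/lib/libGLESv1_CM.so")


def relative_wrapper_paths(blob: bytes) -> list:
    """Return the wrapper paths that occur in blob without a leading '/'."""
    found = []
    for needle in RELATIVE_WRAPPERS:
        start = 0
        while True:
            idx = blob.find(needle, start)
            if idx == -1:
                break
            # An occurrence is relative unless the preceding byte is '/'.
            if idx == 0 or blob[idx - 1:idx] != b"/":
                found.append(needle.decode("ascii"))
                break
            start = idx + 1
    return found


# Synthetic stand-ins for the string tables of the two builds.
unpatched = b"\x00system/lib/libGLESv2.so\x00system/lib/libGLESv1_CM.so\x00"
patched = b"\x00/system/lib/libGLESv2.so\x00/system/lib/libGLESv1_CM.so\x00"
print(relative_wrapper_paths(unpatched))
print(relative_wrapper_paths(patched))
```

On a real file the input would come from something like `open("libEGL.so", "rb").read()`; an empty result suggests the rebuilt library carries the absolute paths.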


Next up: I'll test on try.
(In reply to Geoff Brown [:gbrown] from comment #10)
> There's an obvious error in libEGL in our Android 4.3 build (JLS36I) and I
> don't see a fix on any 4.3 branch. 
> 
> It is easily fixed:
> ...
> In local tests, this allows test_TelemetryController.js to pass. :)

Wow, good catch! It's great, as we won't have to disable our Telemetry tests on Android anymore. Can't wait for this to land!
I was able to un-skip test_TelemetryController, test_TelemetryControllerBuildID, and test_TelemetrySession.
Duplicate of this bug: 1144395
https://hg.mozilla.org/mozilla-central/rev/710854950d59
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 46
Whiteboard: [measurement:client:tracking]
Hi :gbrown,

Based on [1], do you think we can have a formal release built from the latest Android source for the 4.3 branch? If that formal release causes this bug's crash to come back, should we reopen this bug or file a new one for tracking? Thanks

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1254443#c29
Flags: needinfo?(gbrown)
I cannot support that change without justification: Why is it important? Even if the crash is avoided in dom/media/test/test_bug879717.html, you still haven't demonstrated that you can run that test reliably (passing) on the emulator. Even if you get test_bug879717 passing and enabled (and I note it would be the only dom/media/test test running on the 4.3 emulator), why is that more important than running these telemetry xpcshell tests? 

Also, I would be more supportive if I saw a logical flaw in the fix applied in this bug, or if we understood the logical connection between the crash in test_bug879717 and the fix applied in this bug: Why should libEGL fail to load libGLESv2 and libGLESv1_CM? Why does that failure avoid the crash?
Flags: needinfo?(gbrown)
(In reply to Geoff Brown [:gbrown] from comment #18)
> I cannot support that change without justification: Why is it important?
> Even if the crash is avoided in dom/media/test/test_bug879717.html, you
> still haven't demonstrated that you can run that test reliably (passing) on
> the emulator.

I think the crash and the test failure in dom/media/test/test_bug879717.html are different things. At this point I want to figure out the crash first. Another reason I'm not handling them together is that someone else may pick up the test failure rather than me. It would be better to have a formal release with the two crash issues fixed first, and then start looking into the test failure in dom/media/test/test_bug879717.html.