LUL on x86_64-{android, linux}: accept .eh_frame with type as either SHT_PROGBITS or SHT_X86_64_UNWIND
Categories
(Core :: Gecko Profiler, defect, P1)
Tracking
| Release | Tracking | Status |
|---|---|---|
| firefox-esr60 | --- | unaffected |
| firefox-esr68 | --- | unaffected |
| firefox69 | --- | unaffected |
| firefox70 | --- | unaffected |
| firefox71 | + | fixed |
| firefox72 | --- | verified |
People
(Reporter: dluca, Assigned: jseward)
References
(Regression)
Details
(Keywords: regression)
Attachments
(3 files)
[Tracking Requested - why for this release]:
Failure log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=269028923&repo=try&lineNumber=5203
INFO - TEST-START | AccessibleCaretManagerTester.TestScrollInCursorModeWithCaretShownWhenLongTappingOnEmptyContentPref
[task 2019-09-30T11:55:19.961Z] 11:55:19 INFO - TEST-PASS | AccessibleCaretManagerTester.TestScrollInCursorModeWithCaretShownWhenLongTappingOnEmptyContentPref | test completed (time: 0ms)
[task 2019-09-30T11:55:19.961Z] 11:55:19 INFO - TEST-START | LulIntegration.unwind_consistency
[task 2019-09-30T11:55:19.962Z] 11:55:19 WARNING - TEST-UNEXPECTED-FAIL | LulIntegration.unwind_consistency | Value of: nTestsPassed == nTests
[task 2019-09-30T11:55:19.962Z] 11:55:19 INFO - Actual: false
[task 2019-09-30T11:55:19.962Z] 11:55:19 INFO - Expected: true
[task 2019-09-30T11:55:19.962Z] 11:55:19 INFO - Not all tests passed @ /builds/worker/workspace/build/src/tools/profiler/tests/gtest/LulTest.cpp:48
[task 2019-09-30T11:55:19.962Z] 11:55:19 WARNING - TEST-UNEXPECTED-FAIL | LulIntegration.unwind_consistency | test completed (time: 54ms)
[task 2019-09-30T11:55:19.962Z] 11:55:19 INFO - TEST-START | LulDwarfCFI.EmptyRegion
[task 2019-09-30T11:55:19.962Z] 11:55:19 INFO - TEST-PASS | LulDwarfCFI.EmptyRegion | test completed (time: 0ms)
[task 2019-09-30T11:55:19.962Z] 11:55:19 INFO - TEST-START | LulDwarfCFI.IncompleteLength32
[task 2019-09-30T11:55:19.962Z] 11:55:19 INFO - TEST-PASS | LulDwarfCFI.IncompleteLength32 | test completed (time: 0ms)
[task 2019-09-30T11:55:19.962Z] 11:55:19 INFO - TEST-START | LulDwarfCFI.IncompleteLength64
[task 2019-09-30T11:55:19.962Z] 11:55:19 INFO - TEST-PASS | LulDwarfCFI.IncompleteLength64 | test completed (time: 1ms)
Comment 1•6 years ago
This first failed on the 28th September beta sim: https://treeherder.mozilla.org/#/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&revision=2be31e08110c8fccc9035f60b7fc1c9f53a7f47c&selectedJob=268916363&searchStr=Android%2C7.0%2Cx86-64%2Copt%2Ctest-android-em-7.0-x86_64%2Fopt-geckoview-gtest-1proc%2C%28GTest%29
Central pushlog between the last good central rev and the first bad is: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=aa8f530a1a35ba9b1c84303dbe15107d0026d77c&tochange=72a8d8c20180a068fd37f0bbf4619963486b0755
Gerald could you please have a look over this?
The solution proposed in bug 1583868 (which was due to the no-sampling mode) doesn't fix this one here.
I'll need to investigate further...
Updated•6 years ago
Update: (I've learned many ways to fail)
I had a quick look through the pushlog from comment 0, but didn't see anything related to LUL. I'll try to explore further tomorrow.
I made the failing assertion more explicit, and added some more logging:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=269372732&repo=try&lineNumber=5210
This shows that none of the 6 tests pass! But there's no logging appearing, so that's not helping much.
I managed to run the Android emulator (first time!) and to build and run geckoview, but I could not run gtest:
gtest TEST-UNEXPECTED-FAIL | gtest | org.mozilla.geckoview.test failed to start
And I also didn't manage to connect the firefox debugger to geckoview in the simulator.
Reading the test code in LulMain.cpp, I see that it relies on the code not getting too optimized.
Around the time of the failure, clang 9 landed for a short time; could it be related? (That could explain why the failure only happens in the "opt" build.)
One of the patches in the pushlog is "Bug 1583907 - Add MOZ_NEVER_INLINE to LifoAlloc::mark to work around Clang 9 miscompilation on Android."
But then, the clang 9 patch was quickly backed out, so that's probably not it. And the test is still failing as of Oct 2, 20:42:10 AEST (3 hours before this comment) :
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d90ffb989758014ec4219049634172554013fc48&selectedJob=269406767
I'll probably need help, either with running and debugging gtest on Android, or directly with the failure in the profiler code or test.
Comment 4•6 years ago
Started with bug 1577220 (update to the r20 Android NDK) according to bisection.
James, Nathan is away, can you help Gerald (see comment 3)?
| Comment hidden (Intermittent Failures Robot) |
Comment 6•6 years ago
Nick, can you help with getting the issue resolved by the end of this week (Gecko 71 syncs to the beta branch starting next Monday)? Thank you.
Comment 7•6 years ago
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #6)
Nick, can you help with getting the issue resolved by the end of this week (Gecko 71 syncs to the beta branch starting next Monday)? Thank you.
I can try, but my role will have to be more traffic director than anything else. Nathan and James are both out this week. I expect Markus will have the most context on the interactions between the profiler (LUL) and Android, so I'll NI him. I have never tried to run gtest locally at all, but I may be able to help with that; I vaguely recall a conversation about producing a gtest APK but that never went anywhere (Bug 1544496). I do, however, see some additional issues with LUL and gtest in some work gbrown did: Bug 1558885.
markus: can you see what you can add to Gerald's analysis? This is well outside of my knowledge.
gbrown: does anything here look familiar? This is really about trying to get more eyes on this issue...
Comment 9•6 years ago
I'm not familiar with this test or its domain, but I did help set up gtest for android.
Have folks noticed the linker errors in the logcat ("logcat-emulator-5554.log" artifact linked to the test task)? I see stuff like:
10-02 07:43:56.199 3996 4011 I gtest : TEST-START | LulIntegration.unwind_consistency
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/bin/app_process64: dynamic header type #15 not handled
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/bin/app_process64: dynamic header type #6ffffef5 not handled
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/bin/app_process64: dynamic header type #20 not handled
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/bin/app_process64: dynamic header type #21 not handled
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/bin/app_process64: unhandled flags #8 not handled
10-02 07:43:56.206 3996 4011 E GeckoLinker: /system/bin/app_process64: Missing or broken DT_HASH
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/lib64/libcutils.so: dynamic header type #6ffffef5 not handled
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/lib64/libcutils.so: unhandled flags #8 not handled
10-02 07:43:56.206 3996 4011 E GeckoLinker: /system/lib64/libcutils.so: Missing or broken DT_HASH
10-02 07:43:56.207 3996 4011 E linker : library "/system/lib64/libcutils.so" ("/system/lib64/libcutils.so") needed or dlopened by "/data/app/org.mozilla.geckoview.test-1/lib/x86_64/libmozglue.so" is not accessible for the namespace: [name="classloader-namespace", ld_library_paths="", default_library_paths="/data/app/org.mozilla.geckoview.test-1/lib/x86_64:/data/app/org.mozilla.geckoview.test-1/base.apk!/lib/x86_64", permitted_paths="/data:/mnt/expand:/data/data/org.mozilla.geckoview.test"]
...
However, I see similar warnings and errors on mozilla-central, so maybe that's not useful.
Comment 10•6 years ago
In bug 1558885 I have a long-dormant patch which moves the gtest libxul.so to a new path, outside of the apk install directory; that patch generally works, but breaks exactly one test: unwind_consistency. I've never found the time to investigate.
Comment 11•6 years ago
I have verified that 'mach gtest' and 'mach gtest <test>', like 'mach gtest LulIntegration.unwind_consistency' work for me on the x86_64 emulator. It is important to run from an x86_64 build: If run from an arm build, the arm emulator will be used, and gtest will fail because of bug 1558885. Even an x86 build may fail (I'm not sure).
My mozconfig is:
ac_add_options --enable-application=mobile/android
ac_add_options --target=x86_64
ac_add_options --enable-debug
Comment 12•6 years ago
(In reply to Geoff Brown [:gbrown] from comment #9)
I'm not familiar with this test or its domain, but I did help set up gtest for android.
Have folks noticed the linker errors in the logcat ("logcat-emulator-5554.log" artifact linked to the test task)? I see stuff like:
10-02 07:43:56.199 3996 4011 I gtest : TEST-START | LulIntegration.unwind_consistency
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/bin/app_process64: dynamic header type #15 not handled
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/bin/app_process64: dynamic header type #6ffffef5 not handled
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/bin/app_process64: dynamic header type #20 not handled
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/bin/app_process64: dynamic header type #21 not handled
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/bin/app_process64: unhandled flags #8 not handled
10-02 07:43:56.206 3996 4011 E GeckoLinker: /system/bin/app_process64: Missing or broken DT_HASH
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/lib64/libcutils.so: dynamic header type #6ffffef5 not handled
10-02 07:43:56.206 3996 4011 W GeckoLinker: /system/lib64/libcutils.so: unhandled flags #8 not handled
10-02 07:43:56.206 3996 4011 E GeckoLinker: /system/lib64/libcutils.so: Missing or broken DT_HASH
10-02 07:43:56.207 3996 4011 E linker : library "/system/lib64/libcutils.so" ("/system/lib64/libcutils.so") needed or dlopened by "/data/app/org.mozilla.geckoview.test-1/lib/x86_64/libmozglue.so" is not accessible for the namespace: [name="classloader-namespace", ld_library_paths="", default_library_paths="/data/app/org.mozilla.geckoview.test-1/lib/x86_64:/data/app/org.mozilla.geckoview.test-1/base.apk!/lib/x86_64", permitted_paths="/data:/mnt/expand:/data/data/org.mozilla.geckoview.test"]
...
However, I see similar warnings and errors on mozilla-central, so maybe that's not useful.
Indeed, this is a restriction of Android's. It's possible that it will impact this ticket, although since we're not seeing crashes during the test, perhaps not. See Bug 1580999 for issues profiling i686 builds on the x86/x86_64 emulator, which may be relevant here. If profiling x86_64 builds works, that would be great news for me: I will try locally shortly.
Comment 13•6 years ago
I don't have anything I can add here, sorry. Maybe Julian can?
Updated•6 years ago
Comment 15•6 years ago
My initial reaction to the report is to think that the central-as-beta builds have had some change of compiler, or build flags, or something else that changes the debug information (Dwarf CFI) created. This somehow is causing a problem with LUL, which reads and processes that data, from all the shared objects we create, most notably libxul.so. Was there some change of compiler, flags, or anything related to debug info, that happened just prior to 28 Sept?
I'm also unclear what the actual build target is. I see android-em-7.0-x86_64, but what does that actually mean?
Comment 16•6 years ago
android-em-7.0-x86_64 is more a description of the test environment: An emulator running Android 7.0 with tests running against an --target=x86_64 build of geckoview.
Comment 17•6 years ago
The Android NDK got updated: bug 1577220
Comment 18•6 years ago
I can't reproduce the failure with a native build on x86_64-linux, using gcc 8.3.1 and -g -Og. I'll try again with clang 8 or 9.
If anybody has a build that shows the failure, it would be useful to set DEBUG_LUL_TEST to 1 in tools/profiler/tests/gtest/LulTest.cpp, to see if that gives any useful info.
I am generally confused by the appearance of two copies of Lul in the tree, at tools/profiler/lul and mozglue/baseprofiler/lul. How are they related? Do we have to bug-fix both of them?
Comment 19•6 years ago
An attempt to build with clang9 on x86_64-linux ended in a link failure. Can anyone advise me on how to build and repro this problem locally?
Comment 20•6 years ago
If anybody has a build that shows the failure, it would be useful to set DEBUG_LUL_TEST to 1 in tools/profiler/tests/gtest/LulTest.cpp, to see if that gives any useful info.
Would that output be shown in automation? https://treeherder.mozilla.org/#/jobs?repo=try&revision=73ad9e1418386368ebdfeb8e269892806b172bcc&selectedJob=270899120 doesn't show more info.
Comment 21•6 years ago
(In reply to Julian Seward [:jseward] from comment #19)
An attempt to build with clang9 on x86_64-linux ended in a link failure. Can anyone advise me on how to build and repro this problem locally?
In the past, I've had to add the following to my mozconfig:
CC="/Users/mstange/.mozbuild/clang/bin/clang"
CXX="/Users/mstange/.mozbuild/clang/bin/clang++"
I don't know if that'll fix your problem, though. I don't have a build that reproduces the problem that this bug is about.
Comment 22•6 years ago
Looking at the emulator adb logcat, I see:
10-11 17:08:05.960 3994 4009 I gtest : TEST-START | LulIntegration.unwind_consistency
10-11 17:08:05.950 3994 3994 I Gecko : type=1400 audit(0.0:4): avc: denied { getattr } for path="/data/local/gtest/libxul.so" dev="vdc" ino=21241 scontext=u:r:untrusted_app:s0:c512,c768 tcontext=u:object_r:system_data_file:s0 tclass=file permissive=1
10-11 17:08:05.969 3994 4009 W GeckoLinker: /system/bin/app_process64: dynamic header type #15 not handled
10-11 17:08:05.969 3994 4009 W GeckoLinker: /system/bin/app_process64: dynamic header type #6ffffef5 not handled
10-11 17:08:05.969 3994 4009 W GeckoLinker: /system/bin/app_process64: dynamic header type #20 not handled
10-11 17:08:05.969 3994 4009 W GeckoLinker: /system/bin/app_process64: dynamic header type #21 not handled
10-11 17:08:05.969 3994 4009 W GeckoLinker: /system/bin/app_process64: unhandled flags #8 not handled
10-11 17:08:05.969 3994 4009 E GeckoLinker: /system/bin/app_process64: Missing or broken DT_HASH
10-11 17:08:05.969 3994 4009 W GeckoLinker: /system/lib64/libcutils.so: dynamic header type #6ffffef5 not handled
10-11 17:08:05.969 3994 4009 W GeckoLinker: /system/lib64/libcutils.so: unhandled flags #8 not handled
10-11 17:08:05.969 3994 4009 E GeckoLinker: /system/lib64/libcutils.so: Missing or broken DT_HASH
...
10-11 17:08:06.000 3994 4009 W GeckoLinker: /system/lib64/libstagefright_omx.so: dynamic header type #6ffffef5 not handled
10-11 17:08:06.000 3994 4009 W GeckoLinker: /system/lib64/libstagefright_omx.so: unhandled flags #8 not handled
10-11 17:08:06.000 3994 4009 E GeckoLinker: /system/lib64/libstagefright_omx.so: Missing or broken DT_HASH
10-11 17:08:06.000 3994 4009 E linker : library "/system/lib64/libstagefright_omx.so" ("/system/lib64/libstagefright_omx.so") needed or dlopened by "/data/app/org.mozilla.geckoview.test-1/lib/x86_64/libmozglue.so" is not accessible for the namespace: [name="classloader-namespace", ld_library_paths="", default_library_paths="/data/app/org.mozilla.geckoview.test-1/lib/x86_64:/data/app/org.mozilla.geckoview.test-1/base.apk!/lib/x86_64", permitted_paths="/data:/mnt/expand:/data/data/org.mozilla.geckoview.test"]
10-11 17:08:06.000 3994 4009 W GeckoLinker: /system/lib64/libstagefright_yuv.so: dynamic header type #6ffffef5 not handled
10-11 17:08:06.000 3994 4009 W GeckoLinker: /system/lib64/libstagefright_yuv.so: unhandled flags #8 not handled
10-11 17:08:06.000 3994 4009 E GeckoLinker: /system/lib64/libstagefright_yuv.so: Missing or broken DT_HASH
10-11 17:08:06.000 3994 4009 E linker : library "/system/lib64/libstagefright_yuv.so" ("/system/lib64/libstagefright_yuv.so") needed or dlopened by "/data/app/org.mozilla.geckoview.test-1/lib/x86_64/libmozglue.so" is not accessible for the namespace: [name="classloader-namespace", ld_library_paths="", default_library_paths="/data/app/org.mozilla.geckoview.test-1/lib/x86_64:/data/app/org.mozilla.geckoview.test-1/base.apk!/lib/x86_64", permitted_paths="/data:/mnt/expand:/data/data/org.mozilla.geckoview.test"]
10-11 17:08:06.001 3994 4009 W GeckoLinker: /system/lib64/libvorbisidec.so: dynamic header type #6ffffef5 not handled
10-11 17:08:06.001 3994 4009 W GeckoLinker: /system/lib64/libvorbisidec.so: unhandled flags #8 not handled
10-11 17:08:06.001 3994 4009 E GeckoLinker: /system/lib64/libvorbisidec.so: Missing or broken DT_HASH
10-11 17:08:06.001 3994 4009 E linker : library "/system/lib64/libvorbisidec.so" ("/system/lib64/libvorbisidec.so") needed or dlopened by "/data/app/org.mozilla.geckoview.test-1/lib/x86_64/libmozglue.so" is not accessible for the namespace: [name="classloader-namespace", ld_library_paths="", default_library_paths="/data/app/org.mozilla.geckoview.test-1/lib/x86_64:/data/app/org.mozilla.geckoview.test-1/base.apk!/lib/x86_64", permitted_paths="/data:/mnt/expand:/data/data/org.mozilla.geckoview.test"]
10-11 17:08:06.001 3994 4009 W GeckoLinker: /system/lib64/libpowermanager.so: dynamic header type #6ffffef5 not handled
10-11 17:08:06.001 3994 4009 W GeckoLinker: /system/lib64/libpowermanager.so: unhandled flags #8 not handled
10-11 17:08:06.001 3994 4009 E GeckoLinker: /system/lib64/libpowermanager.so: Missing or broken DT_HASH
<snip>
10-11 17:08:06.010 3994 4009 W GeckoLinker: /system/lib64/libwebviewchromium_loader.so: unhandled flags #8 not handled
10-11 17:08:06.010 3994 4009 E GeckoLinker: /system/lib64/libwebviewchromium_loader.so: Missing or broken DT_HASH
10-11 17:08:06.010 3994 4009 E linker : library "/system/lib64/libwebviewchromium_loader.so" ("/system/lib64/libwebviewchromium_loader.so") needed or dlopened by "/data/app/org.mozilla.geckoview.test-1/lib/x86_64/libmozglue.so" is not accessible for the namespace: [name="classloader-namespace", ld_library_paths="", default_library_paths="/data/app/org.mozilla.geckoview.test-1/lib/x86_64:/data/app/org.mozilla.geckoview.test-1/base.apk!/lib/x86_64", permitted_paths="/data:/mnt/expand:/data/data/org.mozilla.geckoview.test"]
10-11 17:08:06.010 3994 4009 E GeckoLinker: /system/framework/oat/x86_64/android.test.runner.odex: Failed to mmap
10-11 17:08:06.010 3994 4009 E linker : library "/system/framework/oat/x86_64/android.test.runner.odex" ("/system/framework/oat/x86_64/android.test.runner.odex") needed or dlopened by "/data/app/org.mozilla.geckoview.test-1/lib/x86_64/libmozglue.so" is not accessible for the namespace: [name="classloader-namespace", ld_library_paths="", default_library_paths="/data/app/org.mozilla.geckoview.test-1/lib/x86_64:/data/app/org.mozilla.geckoview.test-1/base.apk!/lib/x86_64", permitted_paths="/data:/mnt/expand:/data/data/org.mozilla.geckoview.test"]
10-11 17:08:06.010 3994 4009 E GeckoLinker: /data/app/org.mozilla.geckoview.test-1/oat/x86_64/base.odex: Failed to mmap
10-11 17:08:06.010 3994 4009 W GeckoLinker: /data/app/org.mozilla.geckoview.test-1/lib/x86_64/liblgpllibs.so: unhandled flags #8 not handled
10-11 17:08:06.010 3994 4009 W GeckoLinker: /system/lib64/hw/gralloc.ranchu.so: dynamic header type #6ffffef5 not handled
10-11 17:08:06.010 3994 4009 W GeckoLinker: /system/lib64/hw/gralloc.ranchu.so: unhandled flags #8 not handled
10-11 17:08:06.010 3994 4009 E GeckoLinker: /system/lib64/hw/gralloc.ranchu.so: Missing or broken DT_HASH
10-11 17:08:06.010 3994 4009 E linker : library "/system/lib64/hw/gralloc.ranchu.so" ("/system/lib64/hw/gralloc.ranchu.so") needed or dlopened by "/data/app/org.mozilla.geckoview.test-1/lib/x86_64/libmozglue.so" is not accessible for the namespace: [name="classloader-namespace", ld_library_paths="", default_library_paths="/data/app/org.mozilla.geckoview.test-1/lib/x86_64:/data/app/org.mozilla.geckoview.test-1/base.apk!/lib/x86_64", permitted_paths="/data:/mnt/expand:/data/data/org.mozilla.geckoview.test"]
10-11 17:08:06.010 3994 4009 I gtest : TEST-UNEXPECTED-FAIL | LulIntegration.unwind_consistency | Value of: nTestsPassed == nTests
10-11 17:08:06.010 3994 4009 I gtest : Actual: false
10-11 17:08:06.010 3994 4009 I gtest : Expected: true
10-11 17:08:06.010 3994 4009 I gtest : Not all tests passed @ /builds/worker/workspace/build/src/tools/profiler/tests/gtest/LulTest.cpp:48
That all looks like (existing) Android restrictions, see discussion in https://bugzilla.mozilla.org/show_bug.cgi?id=1580999#c5. I don't see anything that looks like extra logging.
Aryx, did you enable the define in that try build? Also, can these tests be run against the debug (not opt) build?
jseward, if we can run against debug builds, then you might get good mileage from an artifact build with the try build artifacts (assuming we can run gtest against artifact builds, which I have never tried). gbrown might be able to confirm. The reason I think that using artifact builds with exactly those artifacts will be fruitful is that we think this is a compiler issue, and therefore the precise toolchain and build environment matters greatly.
Comment 23•6 years ago
I usually build (against mozilla-central) with a mozconfig of only:
ac_add_options --enable-debug
ac_add_options --enable-application=mobile/android
ac_add_options --target=x86_64
Then you can 'mach gtest' or 'mach gtest LulIntegration.unwind_consistency' to run in the emulator.
nalexander's caution about using the exact same artifacts seems wise. I have never tried that with gtest, and I think it might be difficult.
Comment 24•6 years ago
(In reply to Nick Alexander :nalexander [he/him] from comment #22)
Aryx, did you enable the define in that try build? Also, can these tests be run against the debug (not opt) build?
Yes: https://hg.mozilla.org/try/rev/fd153dba45aded1a9594da70dc0f7937f5e0fccb
It has been requested for debug (where it doesn't fail): https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=270899120&revision=73ad9e1418386368ebdfeb8e269892806b172bcc&searchStr=gtest
Comment 25•6 years ago
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #24)
(In reply to Nick Alexander :nalexander [he/him] from comment #22)
Aryx, did you enable the define in that try build? Also, can these tests be run against the debug (not opt) build?
Yes: https://hg.mozilla.org/try/rev/fd153dba45aded1a9594da70dc0f7937f5e0fccb
It has been requested for debug (where it doesn't fail): https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=270899120&revision=73ad9e1418386368ebdfeb8e269892806b172bcc&searchStr=gtest
The plot thickens! This points ever more strongly at a compiler issue.
Comment 26•6 years ago
Sebastian, is it possible to get that logging output for an opt build instead?
Comment 27•6 years ago
It's the log linked by Nick:
Looking at the emulator adb logcat
Comment 28•6 years ago
I've now built natively on x86_64-linux with clang9 at -Og and -O2, and the test won't fail. This inclines me away from thinking this is a compiler problem. Whereas ..
It's the log linked by Nick:
Looking at the emulator adb logcat
.. this makes me think it is a runtime linker problem. Specifically, when LUL starts up, it tries to read CFI from all shared objects in the process. Typically there are around 100 of them. And that log shows what looks like around 100 failures. Also, it shows no evidence that LUL managed to do any unwinding, which is consistent with it failing to load CFI for all objects.
(In reply to Julian Seward [:jseward] from comment #18)
I am generally confused by the appearance of two copies of Lul in the tree, at tools/profiler/lul and mozglue/baseprofiler/lul. How are they related? Do we have to bug-fix both of them?
This is a temporary situation: Base Profiler is new-ish, a subset of the good old Gecko Profiler (tools/profiler) that can work without xpcom; I wanted to implement Base Profiler first, to make sure it was viable before trying to merge both. I plan to start deduplicating this quarter, see meta bug 1557566.
In the meantime, please concentrate on the one in tools/profiler/lul. Don't worry about fixing the one in mozglue, unless there are some build issues of course, or it's easy enough; I'll deal with changes when I start deduplicating...
Thank you all!
Comment 30•6 years ago
Using the mozconfig in comment 23 and mach bootstrap to install the relevant tools, I managed to build it. mach gtest LulIntegration.unwind_consistency did indeed start an emulator, and produced a window and button bar, but that seemed to hang after a while. I control-C'd it and reran; now I don't even get a window, but the run fails after a few seconds. In ~/.mozbuild/android-device/emulator.log, I see:
emulator: ERROR: Not enough disk space to run AVD 'mozemulator-x86-7.0'. Exiting...
How do I move past this?
Comment 31•6 years ago
I think that means you need more disk space on the host (your desktop). You could try some disk cleanup and try again.
But if it worked once, maybe the avd is taking up too much space itself? You can reset the avd by running 'mach android-emulator --force-update'; if that works (the emulator starts), run gtest -- it should automatically use the existing emulator session.
Comment 32•6 years ago
Ah, you're right about the disk space. Fixing that, all 102 Lul tests run successfully, including LulIntegration.unwind_consistency.
Comment 33•6 years ago
I've now tried the Android build with four different mozconfigs, including --enable-optimize, --enable-release, and enablement of LTO. All those builds succeeded, but none of them show the unwind failure. I guess the next thing to try is with the "central as beta" config, so I tried to guess what the mozconfig was based on the "Android 5.0 x86-64 opt" build log from the "Central as beta" link in comment 0. But that build fails to link.
So I'm out of ideas. Can someone point me at the definitive mozconfig as used by the failing test here?
Comment 34•6 years ago
For modifying the configurations for the central as beta simulation, see https://wiki.mozilla.org/Sheriffing/How_To/Beta_simulations#TRUNK_AS_EARLY_BETA
build + mozconfig at https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=270996235&repo=try&lineNumber=628
Comment 35•6 years ago
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #34)
build + mozconfig at https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=270996235&repo=try&lineNumber=628
Working from that gives the mozconfig below. But then running mach build produces
ERROR: /home/sewardj/MOZ/BUG-1584976/mozconfig_tmp directly or indirectly includes an in-tree mozconfig.
ERROR: In-tree mozconfigs make strong assumptions about and are only meant to be used by Mozilla automation.
ERROR: Please don't use them.
By repeatedly inlining all the . "$topsrcdir/build/blah" statements by hand and then unsetting MOZ_AUTOMATION_MOZCONFIG, I can get past that. But then I'm in a maze of configuration failures.
The initial mozconfig is:
. "$topsrcdir/mobile/android/config/mozconfigs/common"
ac_add_options --with-android-min-sdk=21
ac_add_options --target=x86_64-linux-android
ac_add_options --with-branding=mobile/android/branding/nightly
export FENNEC_NIGHTLY=1
export MOZILLA_OFFICIAL=1
export AR="/home/sewardj/Tools/InstClang900/bin/llvm-ar"
export NM="/home/sewardj/Tools/InstClang900/bin/llvm-nm"
export RANLIB="/home/sewardj/Tools/InstClang900/bin/llvm-ranlib"
export MOZ_LTO=cross
Updated•6 years ago
RyanVM, do you know how this central-as-beta config works? Or do you know who does?
Comment 38•6 years ago
I'm not very familiar with the android mozconfigs. It looks like we use the nightly mozconfig on all branches for release and opt builds; there does not appear to be any change in the mozconfig between branches. The $MOZ_UPDATE_CHANNEL passed in will vary, and beta builds and central-as-beta will have it set to beta.
Note that this failure appears to be on beta as well now: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=5d743fd9d6bd149589ae42e2de7f08e569b6ed9d&selectedJob=271526433
| Comment hidden (Intermittent Failures Robot) |
Comment 40•6 years ago
Finally, I can reproduce with the attached mozconfig. The host clang version is clang 9. Investigating.
Comment 41•6 years ago
I am seeing the same linker errors as mentioned in comment 12, and it seems like a fair bet that those are the cause of the problem, as described in comment 28. In particular, e.g.
10-21 16:18:19.535 9321 9336 E GeckoLinker: /system/lib64/libutils.so: Missing or broken DT_HASH
10-21 16:18:19.535 9321 9336 E linker : library "/system/lib64/libutils.so" ("/system/lib64/libutils.so") needed or dlopened by "/data/app/org.mozilla.geckoview.test-1/lib/x86_64/libmozglue.so" is not accessible for the namespace: [name="classloader-namespace", ld_library_paths="", default_library_paths="/data/app/org.mozilla.geckoview.test-1/lib/x86_64:/data/app/org.mozilla.geckoview.test-1/base.apk!/lib/x86_64", permitted_paths="/data:/mnt/expand:/data/data/org.mozilla.geckoview.test"]
and specifically the bit library "/system/lib64/libutils.so" .. is not accessible for the namespace: [name="classloader-namespace", ..]
That sounds to me like some kind of permissions problem ("you may only read the following set of files", etc). Who here is the keeper of, or knows of, this mechanism?
Comment 42•6 years ago
I should add .. I am happy to help fix this, but I am only here this week; I am on PTO the weeks commencing 28 Oct and 4 Nov.
Comment 43•6 years ago
Mike, do you know anything about the linker errors shown in comment 41?
Comment 44•6 years ago
So. There are two things going on here. Both relate to Lul -- the unwinder
library in use here -- failing to read unwind info.
(1) Per comment 41 and comment 12, the Android linker denies Lul read access
to any of the Android system libraries, so it fails to read unwind info
for them. That limitation doesn't apply to libxul.so. But for libxul.so,
Lul fails to read unwind info for a second reason:
(2) At least for this build configuration, libxul.so's unwind information is
in an ELF section called .eh_frame. Lul expects this section to have type
SHT_PROGBITS, but this appears to have changed to SHT_X86_64_UNWIND, which
means Lul ignores the section. I suspect this change is related to which
linker created libxul.so. I also checked some x86_64-linux builds I have,
and I noticed that some -- perhaps done by clang 9 -- also have this
change. So it's not Android specific.
(2) is the immediate cause for LulIntegration.unwind_consistency failing.
Fixing it is trivial; I'll attach a patch shortly.
However, until such time as (1) is fixed, unwinding (for the Gecko profiler)
is likely to be still largely broken on x86_64-android, since many stack
traces involve system libraries at some point.
Updated•6 years ago
Comment 45•6 years ago
Comment 46•6 years ago
Hi Julian, your latest patch makes the failures go away:
Updated•6 years ago
Comment 47•6 years ago
The LUL unwinder on x86_64-android and x86_64-linux tries to read Dwarf unwind
information from the ELF section named ".eh_frame", and expects the section to
have type SHT_PROGBITS. It appears that that section might instead have the
type SHT_X86_64_UNWIND, possibly as a result of recent toolchain (linker?)
changes. This patch tracks that change.
Comment 48•6 years ago
Comment 49•6 years ago
bugherder
Updated•6 years ago
Comment 50•6 years ago
Comment on attachment 9103947 [details]
Bug 1584976 - LUL on x86_64-{android, linux}: accept .eh_frame with type as either SHT_PROGBITS or SHT_X86_64_UNWIND. r=mstange.
Beta/Release Uplift Approval Request
- User impact if declined: Expected to be minimal, as it affects mostly-developer-only functionality on three non-mainstream targets.
This concerns the Gecko profiler on x86_64-android (presumably a very rare configuration) and x86_64-linux or i686-linux (more common). In some build configurations, depending, I think, on which linker linked libxul.so, backtrace recovery during profiling will be mostly broken, so using the Gecko profiler with native backtraces on those targets will be impossible.
Patch has been on nightly now for circa 12 hours and hasn't bounced. I'm not sure what "verified in Nightly" means. It is covered by the gtest "LulIntegration.unwind_consistency".
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce: Build for android-x86_64. Do this using the mozconfig shown at https://bugzilla.mozilla.org/show_bug.cgi?id=1584976#c40. The exact mozconfig is important. I tried many mozconfigs before finding one that showed the failure.
Then run ./mach gtest LulIntegration.unwind_consistency. This should start an Android emulator. The test will fail.
- List of other uplifts needed: none
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): It only affects x86_64-android, x86_64-linux and i686-linux, and then only for non-user-facing functionality. Patch is very small.
- String changes made/needed: none
Comment 51•6 years ago
Small note: I am on vacation as of end of today. Markus Stange [:mstange] has
kindly agreed to shepherd this through the rest of the beta uplift process in
my absence.
Comment 52•6 years ago
Comment on attachment 9103947 [details]
Bug 1584976 - LUL on x86_64-{android, linux}: accept .eh_frame with type as either SHT_PROGBITS or SHT_X86_64_UNWIND. r=mstange.
Fix for broken behaviour in the Gecko profiler; no end-user impact, but we are early in the beta cycle and the patch is minimal and has tests. Uplift approved for 71 beta 5, thanks.
Comment 53•6 years ago
Fix verified in yesterday's beta-sims: https://treeherder.mozilla.org/#/jobs?repo=try&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception%2Crunnable&searchStr=gtest&author=ncsoregi%40mozilla.com&fromchange=f240cd4eb2e712483535f23a8d9175931fd388f1&selectedJob=273071829
Comment 54•6 years ago
bugherder uplift
| Comment hidden (Intermittent Failures Robot) |
Updated•4 years ago