Closed Bug 727959 Opened 12 years ago Closed 12 years ago

Browser crashes after loading start page on Android/x86 (android.view.InflateException: Binary XML file line #28: Error inflating class <unknown>: at android.view.LayoutInflater.createView(LayoutInflater.java:606))

Categories

(Firefox for Android Graveyard :: General, defect, P1)

x86
Android
defect

Tracking

(blocking-fennec1.0 -)

RESOLVED FIXED
Firefox 14
Tracking Status
blocking-fennec1.0 --- -

People

(Reporter: thomas.g.eaton, Assigned: glandium)

Details

(Keywords: crash, Whiteboard: [native-crash])

Crash Data

Attachments

(6 files, 7 obsolete files)

Attached file logcat_firefox_crash.txt (obsolete) —
User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.1.4322; InfoPath.2; MS-RTC LM 8; .NET4.0C; .NET4.0E)

Steps to reproduce:

Build Fennec for x86/Android and run it on a tablet


Actual results:

It crashes shortly after loading the start screen


Expected results:

It should have continued to run.
OS: Windows 7 → Android
Hardware: x86_64 → x86
Thomas, can you apply this patch, which won't fix your problem, but help identify it more easily, and paste new logs with it applied? Thanks.
Attachment #598159 - Flags: review?
Attachment #598159 - Flags: review? → review?(taras.mozilla)
Severity: normal → critical
Status: UNCONFIRMED → NEW
Crash Signature: [@ android.view.InflateException: Binary XML file line #28: Error inflating class <unknown>: at android.view.LayoutInflater.createView(LayoutInflater.java) ]
Ever confirmed: true
Keywords: crash
Summary: Browser crashes after loading start page on Android/x86 → Browser crashes after loading start page on Android/x86 (android.view.InflateException: Binary XML file line #28: Error inflating class <unknown>: at android.view.LayoutInflater.createView(LayoutInflater.java:606))
Whiteboard: [native-crash]
Attachment #598159 - Flags: review?(taras.mozilla) → review+
Attached file New log prior to adding debug patch. (obsolete) —
This log appears to have the "unknown" value filled in.
Attachment #597955 - Attachment is obsolete: true
Attached file log after adding debug patch (obsolete) —
This log is from after adding glandium's debug patch.
Here's a link to the apk via dropbox if you want to grab it and look at the libs you mentioned earlier.  

http://dl.dropbox.com/u/59285258/fennec-13.0a1.en-US.android-i686.apk
Assignee: nobody → mh+mozilla
(In reply to Thomas G Eaton from comment #3)
> Created attachment 598441 [details]
> log after adding debug patch
> 
> This log is from after adding glandium's debug patch.

Unfortunately, there aren't any GeckoLinker logs here. (which is essentially unexpected, you should at least get messages about unhandled dynamic headers)
If the original log corresponds to that apk, there's something seriously wrong with the system linker, then, because the symbols that resolve to NULL are __wrap_malloc, which is in libmozglue.so, and getnameinfo, which is in libc.so. Are there ICS images for use in a virtual machine available somewhere or do we need to rely on actual x86 tablets for testing?
The original log doesn't correspond to that apk.  I no longer had that apk and wasn't able to pull it from the the device.  I added the "export MOZ_OLD_LINKER=1" to the mozconfig and it fixed the crash issue.  I assumed that you were looking at the initial "Error inflating class" line which was still in the log even though the browser was no longer crashing.  

Would you like me to rebuild without the old linker flag and with your debug patch?  

I'm not aware of any images for use in a virtual machine.  All my testing has been on a tablet or phone.
Thomas, just wondering which NDK version are you using?  I had the same issue building on the Mac in which I was using the NDK 6b.  mfinkle had told me that I needed to add the "export MOZ_OLD_LINKER=1" in order to not crash.  

There's a fix that was just recently pushed (bug 720621) that might be related?
> There's a fix that was just recently pushed (bug 720621) that might be
> related?

That bug leads to "Text relocation not supported" messages. The original log in this bug is about NULL relocations, which is a different problem.

(In reply to Thomas G Eaton from comment #7)
> Would you like me to rebuild without the old linker flag and with your debug
> patch?  

Yes please.
Attachment #598159 - Attachment description: Add symbol name to relocation errors in the linker → Add symbol name to relocation errors in the linker [checked-in]
Whiteboard: [native-crash] → [native-crash][leave open after inbound merge]
Attached file Logcat with error and debug patch (obsolete) —
Here is a link to the apk that corresponds to this logcat. http://dl.dropbox.com/u/59285258/fennec-13.0a1.en-US.android-i686.apk
Attachment #598440 - Attachment is obsolete: true
Attachment #598441 - Attachment is obsolete: true
So, the failing relocation in both cases (mozsqlite and xul) is for __register_frame_info_bases. This symbol comes from the CRT object files, and is found nowhere in the libs provided by the NDK. Where is this symbol defined in system libraries?
I'm not certain, but is it defined in libgcc?

$ grep -ir register_frame_info_bases *
Binary file android_ndk/android-ndk-r6b/platforms/android-9/arch-x86/usr/lib/crtbegin_static.o matches
Binary file android_ndk/android-ndk-r6b/platforms/android-9/arch-x86/usr/lib/crtbegin_so.o matches
Binary file android_ndk/android-ndk-r6b/platforms/android-9/arch-x86/usr/lib/crtbegin_dynamic.o matches
Binary file android_ndk/android-ndk-r6b/sources/cxx-stl/stlport/libs/x86/libstlport_shared.so matches
Binary file android_ndk/android-ndk-r6b/toolchains/x86-4.4.3/prebuilt/linux-x86/lib/gcc/i686-android-linux/4.4.3/crtbeginS.o matches
Binary file android_ndk/android-ndk-r6b/toolchains/x86-4.4.3/prebuilt/linux-x86/lib/gcc/i686-android-linux/4.4.3/crtbeginT.o matches
Binary file android_ndk/android-ndk-r6b/toolchains/x86-4.4.3/prebuilt/linux-x86/lib/gcc/i686-android-linux/4.4.3/libgcc.a matches
Binary file android_ndk/android-ndk-r6b/toolchains/x86-4.4.3/prebuilt/linux-x86/lib/gcc/i686-android-linux/4.4.3/crtbegin.o matches
Ah I missed it in libgcc.a. So you need to figure out why libgcc.a is (apparently) not linked in the libraries.
Attached file Another log
Clean build from current inbound. Installed over earlier Fennec build from inbound (which at some point in the past little while started exhibiting this problem).

Figured I'd post a log.
Uninstalling Fennec on the device, and installing the exact same APK again: everything works.

(I should point out that I was getting the exact same error as the original report, but no crash: instead the Bookmarks view was dying, but the app stayed up and responsive.)
https://hg.mozilla.org/mozilla-central/rev/e18282cd5f61
Status: NEW → ASSIGNED
Target Milestone: --- → Firefox 13
Priority: -- → P1
This is what is happening:
On x86 android NDK, crtbegin_so.o contains a frame_dummy function that calls __register_frame_info_bases. The reference to this symbol is weak. That function is defined in libgcc.a. ld is called with both crtbegin_so.o and libgcc.a, and when linking in that case, it throws away libgcc.a's definition of the function and declares the symbol as undefined. This is actually a feature. I don't know why, but it's the way it works (see further below).

Anyways, the result is that whatever we do, with the current content of the NDK, we can only end up with a relocation for a weak __register_frame_info_bases. However, as far as I can see, frame_dummy is called from nowhere, so, that the relocation ends up writing a NULL pointer in the PLT is not going to be a runtime problem.

On the other hand, on more general terms, there are legitimate uses of weak symbols that can end up with the same scheme, and that shouldn't fail either. So what we actually need to do is not to fail on weak symbol resolution ending up in a NULL, even in the PLT. This should allow the android x86 build to go further in the startup process.

For reference, the linking "problem" can be reproduced with the following:
$ echo "int b() __attribute__((weak)); int a() { return b(); }" > a.c
$ echo "int b() { return 42; }" > b.c
$ gcc -o a.o -c a.c -fPIC
$ gcc -o b.o -c b.c -fPIC
$ ar cr b.a b.o
$ ld -shared -o foo.so a.o b.o
$ objdump -T foo.so | grep b$
0000000000000260 g    DF .text	000000000000000b b
$ ld -shared -o bar.so a.o b.a
$ objdump -T bar.so | grep b$
0000000000000000  w   D  *UND*	0000000000000000 b
Attachment #599907 - Flags: review?(taras.mozilla)
Looking much further in crtbegin_so.o, it appears that frame_dummy *is* called, as a static initializer. BUT, fortunately, it does something like:
  if (__register_frame_info_bases)
    __register_frame_info_bases(args);

so in practice, this shouldn't be a problem.
Looking even much further, it appears that the state of definition of __register_frame_info_bases in the various c++ shared libraries in NDK r7b matches support for exceptions, which is where frame info registration is important. So all in all, since we're not using exceptions and linking against stlport, which doesn't support them anyways, we're just fine without this call.
Attached file Logcat crash with latest patch (obsolete) —
Comment on attachment 599907 [details] [diff] [review]
Don't error out when missing symbol for PLT relocations is weak [checked-in]

For nitpicky linker stuff, Nathan is a better reviewer
Attachment #599907 - Flags: review?(taras.mozilla) → review?(nfroyd)
(In reply to Thomas G Eaton from comment #22)
> Created attachment 600202 [details]
> Logcat crash with latest patch

Can you update to the latest mozilla-central (or inbound), add a #define MOZ_DEBUG_LINKER in mozglue/linker/Logging.h, and try again? (log will be significantly more verbose)
Comment on attachment 599907 [details] [diff] [review]
Don't error out when missing symbol for PLT relocations is weak [checked-in]

Review of attachment 599907 [details] [diff] [review]:
-----------------------------------------------------------------

Works for me.
Attachment #599907 - Flags: review?(nfroyd) → review+
I still often get this on Nightly (02/29).
blocking-fennec1.0: --- → ?
Crash Signature: [@ android.view.InflateException: Binary XML file line #28: Error inflating class <unknown>: at android.view.LayoutInflater.createView(LayoutInflater.java) ] → ComponentInfo{org.mozilla.fennec/org.mozilla.gecko.AwesomeBar}: android.view.InflateException: Binary XML file line #18: Error inflating class <unknown>: at android.app.ActivityThread.performLaunchAc... ] [@ android.view.InflateException: Binary XML file …
Mike, you ready to land?
blocking-fennec1.0: ? → +
(In reply to Doug Turner (:dougt) from comment #27)
> Mike, you ready to land?

Apparently, this is not enough to fix the linker issue. I'm waiting for Thomas to respond to comment 24.
Sorry for the delay the last week.  I'll be able to build and capture the logs with the more verbose logging tomorrow.
I wasn't able to get my dev machine up and working today to get this done.  I'll be out of the office for the rest of the week.  I should be able to provide the logs next week.  There is a tablet device in Mountain View where this issue could be replicated if progress needs to be made on this bug prior to Monday.  Sorry for the delay.
Comment on attachment 599907 [details] [diff] [review]
Don't error out when missing symbol for PLT relocations is weak [checked-in]

https://hg.mozilla.org/integration/mozilla-inbound/rev/973003f8b502
Attachment #599907 - Attachment description: Don't error out when missing symbol for PLT relocations is weak → Don't error out when missing symbol for PLT relocations is weak [checked-in]
blocking-fennec1.0: + → ---
blocking-fennec1.0: --- → -
Attached file Logcat with MOZ_DEBUG_LINKER defined (obsolete) —
(In reply to Thomas G Eaton from comment #33)
> Created attachment 605203 [details]
> Logcat with MOZ_DEBUG_LINKER defined

You need to apply the patches from this bug, too :)
Sorry, I only added the patch from early in the bug and missed that latest one.  Here's an updated logcat with the patch from comment 31.
Attachment #599207 - Attachment is obsolete: true
Attachment #600202 - Attachment is obsolete: true
Attachment #605203 - Attachment is obsolete: true
Thanks. That very much looks like a toolchain bug. I guess we'll have to work around it. Could you attach the libmozsqlite3.so file from that build?
OS: Android → MeeGo
Target Milestone: Firefox 13 → ---
Version: Trunk → Firefox 14
OS: MeeGo → Android
Version: Firefox 14 → Trunk
Attached patch tentative workaround (obsolete) — Splinter Review
Can you try this patch and see if you go further?
It loads with no issues with that patch and I am able to do a google image search with no issues.
Attachment #598159 - Attachment is obsolete: true
Attachment #605452 - Attachment is obsolete: true
Attachment #598159 - Attachment is obsolete: false
Attachment #605512 - Attachment filename: bug727959 → bug727959-3
Comment on attachment 605512 [details] [diff] [review]
Ignore 0xffffffff entries the x86 Android NDK puts in .{init,fini}_array

Feels like this should spit out an adb log message about linker being busted
Attachment #605512 - Flags: review?(taras.mozilla) → review+
> Feels like this should spit out an adb log message about linker being busted

I'm not convinced it's worth polluting logs with something we already know.
https://hg.mozilla.org/integration/mozilla-inbound/rev/8133f0c6fc4e
Whiteboard: [native-crash][leave open after inbound merge] → [native-crash]
https://hg.mozilla.org/mozilla-central/rev/8133f0c6fc4e
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 14
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: