Closed Bug 955876 Opened 11 years ago Closed 11 years ago

Dynamically loaded components crash libxul apps with Linux i686 + gold + no PGO

Categories

(Core :: XPCOM, defect, P5)

27 Branch
x86
Linux
defect

RESOLVED WORKSFORME

People

(Reporter: connor.behan, Unassigned)

Attachments

(4 files)

User Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36

Steps to reproduce:

Somewhere between 20.0 and 26.0, there has been an i686-specific regression. Apparently, it happens when building xulrunner but not firefox. I used gcc 4.8.2 and -O2.

Actual results:

Executables do not appear to be valid. They fail with:

    Inconsistency detected by ld.so: dl-lookup.c: 871: _dl_setup_hash: Assertion `(bitmask_nwords & (bitmask_nwords - 1)) == 0' failed!

This first shows up during the "make install" or "make package" step. It fails because xpcshell cannot be executed. If I do the following:

    cd obj-i686-pc-linux-gnu/dist/bin
    ./run-mozilla.sh ./xpcshell

I get the above error. Running it in gdb simply hangs the debugger. If I work around this (hack packager.py so that no js files are precompiled and no xpcshell needs to be called), the xulrunner build I end up installing is useless. Programs compiled against it, like firefox and instantbird, crash with the same glibc inconsistency.

Expected results:

Bisecting this would take a long time. I really hope the error is informative to someone.
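For readers unfamiliar with the assertion, here is a minimal Python sketch of the check that ld.so appears to be making. It assumes the usual GNU hash layout, in which the section starts with four little-endian 32-bit words (bucket count, symbol offset, Bloom filter word count, Bloom shift); that layout is not stated in this report, and the function names are purely illustrative. The input is the raw bytes of a `.gnu.hash` section, however they were obtained.

```python
import struct


def parse_gnu_hash_header(section: bytes) -> dict:
    """Decode the four 32-bit words assumed to start a .gnu.hash section."""
    if len(section) < 16:
        raise ValueError("a .gnu.hash header needs at least 16 bytes")
    # "<4I" = four little-endian unsigned 32-bit integers (i686 is little-endian).
    nbuckets, symoffset, bitmask_nwords, shift = struct.unpack_from("<4I", section)
    return {
        "nbuckets": nbuckets,
        "symoffset": symoffset,
        "bitmask_nwords": bitmask_nwords,
        "shift": shift,
    }


def passes_ld_so_check(bitmask_nwords: int) -> bool:
    """Same expression as the failing assertion: true for 0 and powers of two."""
    return (bitmask_nwords & (bitmask_nwords - 1)) == 0


def loader_would_accept(section: bytes) -> bool:
    """True if the Bloom filter word count would satisfy the assertion."""
    return passes_ld_so_check(parse_gnu_hash_header(section)["bitmask_nwords"])
```

A word count that is not a power of two (6, for example) fails the check. That would suggest either a hash section written incorrectly by the linker or a loader reading the wrong memory; the sketch cannot tell which.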
Attached file i686 log
Attached file x86_64 log
Summary: Working 64-bit xulrunner build fails when repeated with "setarch i686" → Dynamically loaded components crash xulrunner-26.0 compiled with gcc-4.8 on i686 but not x86_64
The problem is with components. If I delete components/libdbusservice.so and components/libmozgnome.so, xpcshell will start properly. If I link a firefox installation to this xulrunner build, firefox will give the same "inconsistency" unless I delete components/libbrowsercomps.so. The error *message* goes away if I link with -Wl,--hash-style=sysv but there is still a crash when loading components. The components themselves are proper executables. It is just something wrong with how they are loaded. Other dynamically loaded things like NPAPI plugins are fine.
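Deleting components one at a time can be approximated by loading each one in a throwaway process, so that a fatal ld.so error kills only the probe. The following is a rough Python sketch under several assumptions: the interpreter has the same architecture as the build (a 64-bit Python cannot load i686 libraries), `LD_LIBRARY_PATH` already points at `dist/bin` so that libxul.so resolves, and a bare `dlopen` is close enough to what XPCOM does. The helper name `probe_components` is hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# One-liner run in a child interpreter: dlopen the library given as argv[1].
PROBE = "import ctypes, sys; ctypes.CDLL(sys.argv[1])"


def probe_components(components_dir, timeout=30):
    """Map each *.so in components_dir to the exit status of a dlopen attempt.

    0 means the library loaded, any other integer means the child failed
    (an exception or a signal), and None means the child timed out.
    """
    results = {}
    for lib in sorted(Path(components_dir).glob("*.so")):
        try:
            proc = subprocess.run(
                [sys.executable, "-c", PROBE, str(lib.resolve())],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            results[lib.name] = proc.returncode
        except subprocess.TimeoutExpired:
            results[lib.name] = None
    return results
```

Calling `probe_components("obj-i686-pc-linux-gnu/dist/bin/components")` would then list which components fail to load in isolation. A component that loads cleanly here but still crashes xpcshell would point towards the interaction with libxul.so rather than the component file itself.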
Component: Build Config → XPCOM
I just tried --disable-elf-hack and it does not fix this.
Keywords: regression
I can confirm that xulrunner-25.0 is fine in this respect.
Priority: -- → P5
The problem did not go away with xulrunner-27.0. Can we set this to a higher priority since it seems to be a persistent bug?
Version: 26 Branch → 27 Branch
Since Firefox continues to work in general, this is not a high priority, no. You will need to debug and fix this yourself. Firefox itself is compiled with clang nowadays.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #7)
> Since Firefox continues to work in general, this is not a high priority, no.
> You will need to debug and fix this yourself. Firefox itself is compiled
> with clang nowadays.

Not on Linux!
Luckily at least one of us wants the xulrunner builds to not be broken. The bisect is finally finished and pointed to mozilla-central changeset 142780:6f24ebad0ad8. I guess i686 xulrunner should not use gold.
See Also: → 904979
Attachment #8372945 - Flags: review?(benjamin)
Comment on attachment 8372945 [details] [diff] [review]
Let people know about xulrunner + x86 + gold breakage

I haven't seen any explanation of why this is true, or confirmation from anyone else that this is true.
Attachment #8372945 - Flags: review?(benjamin) → review?(mh+mozilla)
Comment on attachment 8372945 [details] [diff] [review]
Let people know about xulrunner + x86 + gold breakage

Review of attachment 8372945 [details] [diff] [review]:
-----------------------------------------------------------------

Considering this is likely a gold bug you're hitting, I don't think this should be "documented" in configure.in without at least a version check. Also, if you really want to do this, that needs to be done in build/autoconf/compiler-opts.m4.
Attachment #8372945 - Flags: review?(mh+mozilla) → review-
Also, it would be good to know which of libxul.so or components are the broken ones (do components linked with BFD ld work with gold-linked.so, and vice-versa?).
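When mixing linkers like this, it helps to confirm which linker actually produced each file. The sketch below relies on an assumption not stated in this bug: that gold records a section named `.note.gnu.gold-version` in its output, so the name shows up in the file's bytes. Treat it as a heuristic only; the function names are illustrative.

```python
from pathlib import Path

# Section name that gold is assumed to emit; BFD ld output should lack it.
GOLD_NOTE = b".note.gnu.gold-version"


def looks_gold_linked(path) -> bool:
    """Heuristic: does the file contain the gold version note's section name?"""
    return GOLD_NOTE in Path(path).read_bytes()


def classify(paths) -> dict:
    """Label each file 'gold' or 'other' according to the heuristic above."""
    return {
        str(p): ("gold" if looks_gold_linked(p) else "other")
        for p in paths
    }
```

Running `classify` over libxul.so and everything in components/ would show whether a supposedly mixed build really is mixed before any conclusions are drawn from it.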
(In reply to Mike Hommey [:glandium] from comment #12)
> Considering this is likely a gold bug you're hitting, I don't think this
> should be "documented" in configure.in without at least a version check.
> Also, if you really want to do this, that needs to be done in
> build/autoconf/compiler-opts.m4.

I thought about doing that but compiler-opts.m4 is sourced before configure.in checks the value of --enable-application. So would that need rearranging or is there some magic in m4 where things don't have to be sequential?
Presumably, if there's a gold bug, it's going to happen for any application. But really, I'd rather know what it's all about before changing anything.
(In reply to Mike Hommey [:glandium] from comment #15)
> Presumably, if there's a gold bug, it's going to happen for any application.
> But really, i'd rather know what it's all about before changing anything.

Ok, you're right. Whether I build browser or xulrunner from the mozilla-release tarball, the same bug happens. I originally thought that this only happened for xulrunner because the Archlinux i686 firefox package builds correctly (I have been using the same Archlinux build server this whole time). However, that may be because of a different mozconfig. They are using default libs and PGO. I am using system libs and no PGO.
Scratch that, it's actually just cairo where they don't use the system version.
Attached file Broken mozconfig
Do you mean that removing --enable-system-cairo from your mozconfig fixes the issue for you?
(In reply to Mike Hommey [:glandium] from comment #13)
> Also, it would be good to know which of libxul.so or components are the
> broken ones (do components linked with BFD ld work with gold-linked.so, and
> vice-versa.

Unfortunately, it is acting like a Heisenbug. If I start with a working --enable-release build, I cannot break the build by going back and manually relinking stuff with gold.
(In reply to Mike Hommey [:glandium] from comment #19)
> Do you mean that removing --enable-system-cairo from your mozconfig fixes
> the issue for you?

I just tried and no, the issue remains. It must be something in PGO that causes the Archlinux firefox package to not run into this.
Confirmed... PGO makes a difference too.
Keywords: regression
Summary: Dynamically loaded components crash xulrunner-26.0 compiled with gcc-4.8 on i686 but not x86_64 → Dynamically loaded components crash libxul apps with Linux i686 + gold + no PGO
There is only one machine where I can build libxul in a reasonable amount of time, and I don't have permission to change the toolchain on it. So I just know that it happens with glibc-2.18, gcc-4.8.2 and binutils-2.24. PGO and --enable-release are workarounds for now. I will try again when there is an update to the GNU stuff.
You can also use --disable-gold
Mozilla 27 still fails but mozilla 28 just worked. Did any patches land with the aim of working around bugs in gold? Or was this fixed by accident? I would close this bug and re-open if it happens again.
Status: UNCONFIRMED → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME