Closed Bug 628618 Opened 9 years ago Closed 9 years ago

elfhack broken with some versions of binutils gold

Categories

(Firefox Build System :: General, defect)

x86_64
Linux
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
mozilla2.0b11

People

(Reporter: octoploid, Assigned: glandium)

References

(Depends on 1 open bug)

Details

(Whiteboard: [workaround: ac_add_options --disable-elf-hack])

Attachments

(14 files, 2 obsolete files)

84.23 KB, application/octet-stream
Details
65.62 KB, application/octet-stream
Details
1.08 KB, patch
taras.mozilla
: review+
sdwilsh
: approval2.0+
Details | Diff | Splinter Review
1.01 KB, patch
taras.mozilla
: review+
sdwilsh
: approval2.0+
Details | Diff | Splinter Review
39.71 KB, application/x-bzip
Details
25.73 KB, text/plain
Details
47.92 KB, application/x-bzip
Details
84.29 KB, application/octet-stream
Details
1.52 KB, application/x-object
Details
928 bytes, patch
taras.mozilla
: review+
sdwilsh
: approval2.0+
Details | Diff | Splinter Review
1.85 KB, patch
taras.mozilla
: review+
sdwilsh
: approval2.0+
Details | Diff | Splinter Review
2.92 KB, patch
taras.mozilla
: review+
sdwilsh
: approval2.0+
Details | Diff | Splinter Review
6.64 KB, patch
taras.mozilla
: review+
sdwilsh
: approval2.0+
Details | Diff | Splinter Review
1.51 KB, patch
taras.mozilla
: review+
sdwilsh
: approval2.0+
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:2.0b10pre) Gecko/20100101 Firefox/4.0b10pre
Build Identifier: 

% ld -v
GNU gold (Linux/GNU Binutils 2.21.51.0.6.20110118) 1.10

elfhack[4667]: segfault at 14 ip 000000000040a362 sp 00007fff76bb76c0 error 4 in elfhack[400000+11000]

c++ -fno-rtti -fno-exceptions -Wall -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof -Wno-variadic-macros -Werror=return-type -Wno-long-long -march=native -fno-strict-aliasing -fshort-wchar -pthread -pipe -fexceptions  -DNDEBUG -DTRIMMED -O3 -fPIC -shared -Wl,-z,defs -Wl,-h,test.so -o test.so test.o
rm -f test.so.bak
/var/tmp/mozilla-central/moz-build-dir/build/unix/elfhack/elfhack -b test.so
make[6]: *** [test.so] Segmentation fault
make[6]: *** Deleting file `test.so'


Reproducible: Always
Attached file elfhack binary
Assignee: nobody → mh+mozilla
Attached file test.so
 % /lib/libc.so.6
GNU C Library stable release version 2.13, by Roland McGrath et al.
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.5.2.
Compiled on a Linux 2.6.36 system on 2011-01-18.
Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
(gdb) bt
#0  0x000000000040a362 in ElfLocation::ElfLocation(unsigned int, Elf*) ()
#1  0x0000000000406192 in Elf::Elf(std::basic_ifstream<char, std::char_traits<char> >&) ()
#2  0x000000000040a975 in do_file(char const*, bool) ()
#3  0x000000000040b7c8 in main ()
The crash is due to the elf entry point pointing nowhere in an elf section on this test.so.
This patch fixes the crash but unveils another bug, in that the .dynamic section is not found if it follows .tbss.
Attachment #506709 - Flags: review?(tglek)
This forces the PT_DYNAMIC segment to only contain SHT_DYNAMIC sections, avoiding .tbss to get in the way.

Could you test if all works for you with the 2 patches applied?
Yes, it all works fine now. Many thanks for the quick fixes.
Attachment #506710 - Flags: review?(tglek)
Looks good:
cd ../../dist/firefox; find . -name "*.so" | xargs ../../build/unix/elfhack/elfhack
./components/libnkgnomevfs.so: No gain. Aborting
./components/libbrowsercomps.so: Reduced by 12128 bytes
./components/libdbusservice.so: No gain. Aborting
./components/libmozgnome.so: No gain. Aborting
./libnspr4.so: Reduced by 8032 bytes
./libfreebl3.so: Reduced by 3936 bytes
./libnssutil3.so: Reduced by 8024 bytes
./libnssckbi.so: Reduced by 73568 bytes
./libplc4.so: No gain. Aborting
./libmozsqlite3.so: Reduced by 8032 bytes
./libnssdbm3.so: No gain. Aborting
./libplds4.so: No gain. Aborting
./libmozalloc.so: No gain. Aborting
./libnss3.so: Reduced by 8032 bytes
./libxul.so: Reduced by 4652896 bytes
./libssl3.so: Reduced by 3936 bytes
./libsoftokn3.so: Reduced by 3936 bytes
./libsmime3.so: No gain. Aborting
./libxpcom.so: No gain. Aborting
Stripping package directory...
Unfortunately libnssutil3.so doesn't survive the size reduction:

firefox-bin[15642]: segfault at 7f1af6e95d60 ip 00007f1af6e95d60 sp 00007fff49d80598 error 7 in libnssutil3.so[7f1af6e
92000+4000]
(In reply to comment #9)
> Unfortunately libnssutil3.so doesn't survive the size reduction:
> 
> firefox-bin[15642]: segfault at 7f1af6e95d60 ip 00007f1af6e95d60 sp
> 00007fff49d80598 error 7 in libnssutil3.so[7f1af6e
> 92000+4000]

When does that happen? During startup or later?
Yes, during startup.
A --with-system-nss build runs just fine.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
(In reply to comment #11)
> Yes, during startup.

Could you attach the original libnssutil3.so (it should be in dist/bin), as well as the content of /proc/pid/maps and segfault address when the crash happens ?
I will have to rebuild the library (because I've deleted the build
directory).

In the meantime here's a backtrace:

(gdb) run
Starting program: /usr/lib/firefox-4.0b10pre/firefox-bin --sync
[Thread debugging using libthread_db enabled]
warning: Loadable segment ".elfhack.text.v0" outside of ELF segments
warning: Loadable segment ".elfhack.data.v0" outside of ELF segments
warning: Loadable segment ".elfhack.text.v0" outside of ELF segments
warning: Loadable segment ".elfhack.data.v0" outside of ELF segments
warning: Loadable segment ".elfhack.text.v0" outside of ELF segments
warning: Loadable segment ".elfhack.data.v0" outside of ELF segments

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff52f5d60 in ?? () from /usr/lib/firefox-4.0b10pre/libnssutil3.so
(gdb) bt
#0  0x00007ffff52f5d60 in ?? () from /usr/lib/firefox-4.0b10pre/libnssutil3.so
#1  0x00007ffff7dec809 in call_init (l=0x7ffff7e074f0, argc=2, argv=0x7fffffffdee8, env=0x7fffffffdf00) at dl-init.c:70
#2  0x00007ffff7dec947 in _dl_init (main_map=0x7ffff7ffe128, argc=2, argv=0x7fffffffdee8, env=0x7fffffffdf00) at dl-init.c:134
#3  0x00007ffff7ddfaea in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#4  0x0000000000000002 in ?? ()
#5  0x00007fffffffe237 in ?? ()
#6  0x00007fffffffe25e in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) q
Attached file buggy libnssutil3.so
Attached file /proc/28147/maps
firefox-bin[28006]: segfault at 7fdb3c8add60 ip 00007fdb3c8add60 sp 00007ffffdff9348 error 7 in libnssutil3.so[7fdb3c8
aa000+4000]


Program received signal SIGSEGV, Segmentation fault.
0x00007ffff52f5d60 in ?? () from /usr/lib/firefox-4.0b10pre/libnssutil3.so
(gdb) bt
#0  0x00007ffff52f5d60 in ?? () from /usr/lib/firefox-4.0b10pre/libnssutil3.so
#1  0x00007ffff7dec809 in call_init (l=0x7ffff7e074f0, argc=2, argv=0x7fffffffdec8, env=0x7fffffffdee0) at dl-init.c:70
#2  0x00007ffff7dec947 in _dl_init (main_map=0x7ffff7ffe128, argc=2, argv=0x7fffffffdec8, env=0x7fffffffdee0) at dl-init.c:134
#3  0x00007ffff7ddfaea in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
Attachment #506730 - Attachment is patch: true
Attachment #506730 - Attachment mime type: application/octet-stream → text/plain
(In reply to comment #14)
> Created attachment 506729 [details]
> buggy libnssutil3.so

I'm more interested in the original one.
(In reply to comment #17)
> (In reply to comment #14)
> > Created attachment 506729 [details]
> > buggy libnssutil3.so
> 
> I'm more interested in the original one.

It should be in dist/bin, untouched.
Attachment #506730 - Attachment is patch: false
  [ 7] .rela.dyn         RELA             0000000000003970  00003970
       00000000000003f0  0000000000000018   A       1     0     8
  [ 8] .elfhack.text.v0  PROGBITS         0000000000003d60  00203d60
       000000000000005c  0000000000000000  AX       0     0     32
  [ 9] .elfhack.data.v0  PROGBITS         0000000000003dc0  00203dc0
       0000000000001378  0000000000000008   A       0     0     8
  [10] .rela.plt         RELA             00000000000077d8  000047d8
       0000000000000a20  0000000000000018   A       1    12     8

That doesn't make sense... And I can't reproduce with a local elfhack with your original libnssutil3.so :(
Could you attach elfhack and inject/x86_64.o ?
Attached file elfhack
Attached file inject/x86_64.o
I can't even reproduce with your files :(

Could you try to run build/unix/elfhack/elfhack -b dist/bin/libnssutil3.so (the non broken file will be saved as dist/bin/libnssutil3.so.bak) and see what readelf -S dist/bin/libnssutil3.so has to say, compared to comment 20 ?
I do see other problems, though, with the PT_LOAD segments
It says:
  [ 7] .rela.dyn         RELA             0000000000003970  00003970
       00000000000003f0  0000000000000018   A       1     0     8
  [ 8] .elfhack.text.v0  PROGBITS         0000000000003d60  00003d60
       000000000000005c  0000000000000000  AX       0     0     32
  [ 9] .elfhack.data.v0  PROGBITS         0000000000003dc0  00003dc0
       0000000000001378  0000000000000008   A       0     0     8
  [10] .rela.plt         RELA             00000000000077d8  000057d8
       0000000000000a20  0000000000000018  AI       1    12     8
Looks identical except A vs. AI in .rela.plt
BTW could this be related to the following glibc commit http://sourceware.org/git/?p=glibc.git;a=commit;h=4a531bb0b3b582cb693de9f76d2d97d970f9a5d5 ?
(In reply to comment #25)
> Looks identical except A vs. AI in .rela.plt

Not identical, check the last column.
Another thing that I've just observed is that libnssutil3.so was
linked without using my exported LDFLAGS="-Wl,-O1,--hash-style=gnu,--as-needed".
(There is no .gnu.hash section).
This makes isRelocatable return the proper value even when the ElfSection has other flags set, like
  [10] .rela.plt         RELA             00000000000077d8  000057d8
       0000000000000a20  0000000000000018  AI       1    12     8
that has the SHF_INFO_LINK flag set. This solves the PT_LOAD problem I was seeing with your libnssutil3.so, but also unveils another bug with the PT_PHDR segment.
Attachment #506751 - Flags: review?(tglek)
Attachment #506709 - Flags: review?(tglek) → review+
Attachment #506710 - Flags: review?(tglek) → review+
Attachment #506751 - Flags: review?(tglek) → review+
Still not enough, but that solves an issue I have locally with gold 2.20.1:
elfhack: elf.cpp:284: Elf::Elf(std::ifstream&): Assertion `segment->getFileSize() == phdr.p_filesz' failed.
Attachment #506812 - Flags: review?(tglek)
Attachment #506812 - Flags: review?(tglek) → review+
Summary: elfhack segfaults with latest binutils gold → elfhack broken with binutils gold
Whiteboard: [workaround: ac_add_options --disable-elf-hack]
Summary: elfhack broken with binutils gold → elfhack broken with some versions of binutils gold
While GNU ld always keeps some empty slots in the .dynamic section, it looks like gold doesn't.
This patch avoids growing the .dynamic section, which isn't very well supported.
I have another patch to raise an actual error when growing the .dynamic section, but I'll make a bundle with other error raising and put in bug 628627.

With this patch, the testcase finally works for me with gold, though the attached libnssutil3.so still doesn't.
Attachment #507052 - Flags: review?(tglek)
This should be the final part. Octoploid, could you check that applying the 6 patches from here fixes the issues for you?

The last problem that appeared from the attached libnssutil3.so and that this patch fixes was that the .dynamic section is not positioned by gold at the closest place considering the previous section position and alignment requirement, while GNU ld does that for all sections. As such, we can't rely on our own algorithm to place the sections before doing anything to them. We ended up calculating that .dynamic wasn't in the PT_LOAD is was supposed to be part of, making it smaller than it was supposed to be, thus leading to this assert():
elfhack: elf.cpp:285: Elf::Elf(std::ifstream&): Assertion `segment->getFileSize() == phdr.p_filesz' failed.
Attachment #507058 - Flags: review?(tglek)
Yes everything runs fine again with the six patches applied.

(You might be interested in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46770 .
I've tested this gcc patch and it works fine.
% objdump -s -j .init_array /usr/lib/firefox-4.0b10pre/libxul.so

/usr/lib/firefox-4.0b10pre/libxul.so:     file format elf64-x86-64

Contents of section .init_array:
 22d6ac0 00d85b00 00000000 20d85b00 00000000  ..[..... .[.....
 22d6ad0 40d85b00 00000000 60d85b00 00000000  @.[.....`.[.....
 22d6ae0 a0d85b00 00000000 c0d85b00 00000000  ..[.......[.....
 22d6af0 e0d85b00 00000000 00d95b00 00000000  ..[.......[.....
 22d6b00 20d95b00 00000000 40d95b00 00000000   .[.....@.[.....
 22d6b10 60d95b00 00000000 00db5b00 00000000  `.[.......[.....
 22d6b20 20db5b00 00000000 60db5b00 00000000   .[.....`.[.....
...

So maybe one day in the not too distant future we won't need your elf hack anymore.)
(In reply to comment #33)
> Yes everything runs fine again with the six patches applied.
> 
> (You might be interested in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46770

I'm the one who filed this bug ;)

> So maybe one day in the not too distant future we won't need your elf hack
> anymore.)

Actually, it doesn't address what elfhack addresses, but it addresses bug 606137.
(In reply to comment #34)
> (In reply to comment #33)
> > Yes everything runs fine again with the six patches applied.
> > 
> > (You might be interested in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46770
> 
> I'm the one who filed this bug ;)

Haha, didn't notice.

> > So maybe one day in the not too distant future we won't need your elf hack
> > anymore.)
> 
> Actually, it doesn't address what elfhack addresses, but it addresses bug
> 606137.

OK, I thought this was already a part of elfhack.
Same as the previous one, but also delay code injection after parsing relocations.
Attachment #507098 - Flags: review?(tglek)
Attachment #507058 - Attachment is obsolete: true
Attachment #507058 - Flags: review?(tglek)
Attachment #507052 - Flags: review?(tglek) → review+
Attachment #507098 - Flags: review?(tglek) → review+
Attachment #506709 - Flags: approval2.0?
Attachment #506710 - Flags: approval2.0?
Attachment #506751 - Flags: approval2.0?
Attachment #506812 - Flags: approval2.0?
Attachment #507052 - Flags: approval2.0?
Attachment #507098 - Flags: approval2.0?
There was a problem on arm, where the init_array section could be moved (it has its own section type, contrary to .ctors) and end up not being properly relocated by the injected code at runtime. With a whitelist of sections that can be moved, we avoid moving some sections unexpectedly.
Attachment #507401 - Flags: review?(tglek)
Checking the SHF_ALLOC flag is still needed.
Attachment #507422 - Flags: review?(tglek)
Attachment #507401 - Attachment is obsolete: true
Attachment #507401 - Flags: review?(tglek)
Attachment #507422 - Flags: review?(tglek) → review+
Attachment #507422 - Flags: approval2.0?
The patch queue here improves correctness of elfhack, and fixes various issues that show up when using gold instead of GNU ld.

It has been tested on try in ceae7158f7f6 (build + xpcshell, all green), except for a small change in patch 7 that only has an effect on some elf headers that runtime doesn't care about. IOW, risk is low.
Attachment #506709 - Flags: approval2.0? → approval2.0+
Attachment #506710 - Flags: approval2.0? → approval2.0+
Attachment #506751 - Flags: approval2.0? → approval2.0+
Attachment #506812 - Flags: approval2.0? → approval2.0+
Attachment #507052 - Flags: approval2.0? → approval2.0+
Attachment #507098 - Flags: approval2.0? → approval2.0+
Attachment #507422 - Flags: approval2.0? → approval2.0+
Target Milestone: --- → mozilla2.0b11
Blocks: 606145
Depends on: 676198
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.