Closed Bug 606145 Opened 14 years ago Closed 13 years ago

Compress/pack relocations in ELF binaries (elfhack)

Categories

(Firefox Build System :: General, defect)

All
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla2.0b11

People

(Reporter: glandium, Assigned: glandium)

References

Details

Attachments

(3 files, 15 obsolete files)

3.33 KB, patch
khuey
: review+
benjamin
: approval2.0+
Details | Diff | Splinter Review
69.29 KB, patch
taras.mozilla
: review+
jbramley
: feedback+
benjamin
: approval2.0+
Details | Diff | Splinter Review
7.95 KB, patch
Details | Diff | Splinter Review
The idea, that I'm going to experiment in the coming days, is the following:
- Insert a constructor function (.ctor) that would handle a compressed or packed relocations section, and put it at the end of the .ctors section with something like what is in bug 606137.
- Get rid of the relocations except for the ones necessary to our constructor function.
- Modify the ELF header such that libc won't remove write permissions on the mapped memory, which we would remove ourselves.

The 2 first steps might be replaced by modifying the ELF entry pointer, if that works out.

This could benefit all linux builds including android.

This could also be extended to do fancy stuff such as madvise()ing.
Assignee: nobody → mh+mozilla
See http://glandium.org/blog/?p=1177#relocations for some details, and http://hg.mozilla.org/users/mh_glandium.org/elfhack/ for the current implementation.

I'm going to attach two patches for the build system integration.
Attachment #495508 - Flags: review?(ted.mielczarek)
I'm not attaching the source code itself (which will be part 2), as I want to fix a bunch of glitches I encountered, first.
Attachment #495509 - Flags: review?(ted.mielczarek)
Attached patch part 2 - Import elfhack code (obsolete) — Splinter Review
Attachment #495526 - Flags: review?(ted.mielczarek)
Comment on attachment 495526 [details] [diff] [review]
part 2 - Import elfhack code

This doesn't build on buildbots because of problems with kernel headers:

/usr/include/linux/byteorder/swab.h: In function '__u16 __fswab16(__u16)':
/usr/include/linux/byteorder/swab.h:134: error: ISO C++ forbids braced-groups within expressions
/usr/include/linux/byteorder/swab.h:134: error: ISO C++ forbids braced-groups within expressions
/usr/include/linux/byteorder/swab.h: In function '__u16 __swab16p(const __u16*)':
/usr/include/linux/byteorder/swab.h:138: error: ISO C++ forbids braced-groups within expressions
/usr/include/linux/byteorder/swab.h:138: error: ISO C++ forbids braced-groups within expressions
/usr/include/linux/byteorder/swab.h: In function 'void __swab16s(__u16*)':
/usr/include/linux/byteorder/swab.h:142: error: ISO C++ forbids braced-groups within expressions
/usr/include/linux/byteorder/swab.h:142: error: ISO C++ forbids braced-groups within expressions
Attachment #495526 - Flags: review?(ted.mielczarek)
Added missing defines.
Attachment #495509 - Attachment is obsolete: true
Attachment #495533 - Flags: review?(ted.mielczarek)
Attachment #495509 - Flags: review?(ted.mielczarek)
Comment on attachment 495533 [details] [diff] [review]
part 3 - Integrate elfhack with the build system

There are a few more issues to fix. I think I'll first go with a configure option defaulting to disabled, so that we can land this asap and enhance/fix it before defaulting to enabled.
Attachment #495533 - Flags: review?(ted.mielczarek)
Attached patch part 2 - Import elfhack code (obsolete) — Splinter Review
This has been successfully tested on maemo, android, linux and linux64 (after some small fixes)
Attachment #495526 - Attachment is obsolete: true
Attachment #495791 - Flags: review?(ted.mielczarek)
This is now disabled by default, and requires --enable-elf-hack to be enabled. This could thus land ASAP as NPOTB, and we can further improve before enabling by default.
Attachment #495533 - Attachment is obsolete: true
Attachment #495792 - Flags: review?(ted.mielczarek)
Comment on attachment 495791 [details] [diff] [review]
part 2 - Import elfhack code

Taras agreed to review this part.
Attachment #495791 - Flags: review?(ted.mielczarek) → review?(tglek)
Attachment #495791 - Flags: review?(jimb)
This needs a README(or a comment) to describe what it does. At least one testcase to verify that it is working.
Comment on attachment 495791 [details] [diff] [review]
part 2 - Import elfhack code

Use .h instead of .hxx

void isNextOf makes no sense, rename it. 

I think this code needs a basic sanity check testcase within elfhack/ directory since it will be useful outside of Mozilla too.

Rest of the code looks like reasonable elf-fiddling.
Attachment #495791 - Flags: review?(tglek)
Attachment #495791 - Flags: review?(jimb)
Attachment #495791 - Flags: review+
Attachment #495791 - Flags: feedback?(jimb)
(In reply to comment #11)
> This needs a README(or a comment) to describe what it does. At least one
> testcase to verify that it is working.

Yeah, I agree.
This hopefully addresses all your comments. Unfortunately, the testcase won't run when cross compiling (e.g. for android)
Attachment #499328 - Flags: review?(tglek)
Adjusted for the testcase added in attachment #499328 [details] [diff] [review]
Attachment #495792 - Attachment is obsolete: true
Attachment #499329 - Flags: review?(ted.mielczarek)
Attachment #495792 - Flags: review?(ted.mielczarek)
(In reply to comment #14)
> Created attachment 499328 [details] [diff] [review]
> Addition to part 2 to address review comments
> 
> This hopefully addresses all your comments. Unfortunately, the testcase won't
> run when cross compiling (e.g. for android)

There's additionally a fix that makes ./elfhack elfhack not segfault.
Attached patch part 2 - Import elfhack code (obsolete) — Splinter Review
Updated again (with irc comments from Taras for the README), and merged both subparts.
Attachment #495791 - Attachment is obsolete: true
Attachment #499328 - Attachment is obsolete: true
Attachment #495791 - Flags: feedback?(jimb)
Attachment #499328 - Flags: review?(tglek)
As agreed on IRC with Taras, the naming of the injected sections, as implemented in the last iteration of part 2, is the last step before enabling the hack by default, which I'm now doing by removing the configure parts, and USE_ELF_HACK conditionals.
Attachment #499329 - Attachment is obsolete: true
Attachment #499387 - Flags: review?(ted.mielczarek)
Attachment #499329 - Flags: review?(ted.mielczarek)
Attachment #499386 - Flags: review?(tglek) → review+
Attached patch part 2 - Import elfhack code (obsolete) — Splinter Review
With http://hg.mozilla.org/users/mh_glandium.org/elfhack/rev/28bd20ed70d7, as discussed on irc ; carrying r+ over.
Attachment #499386 - Attachment is obsolete: true
Attachment #499395 - Flags: review+
We had a misunderstanding on irc. So we do want to keep the option to disable the hack, but we still enable it by default. Sorry for the noise. This time, this should be the last iteration.
Attachment #499396 - Flags: review?(ted.mielczarek)
Attachment #499387 - Attachment is obsolete: true
Attachment #499387 - Flags: review?(ted.mielczarek)
Comment on attachment 499396 [details] [diff] [review]
part 3 - Integrate elfhack with the build system

r+ w/nits

>diff --git a/build/unix/elfhack/Makefile.in b/build/unix/elfhack/Makefile.in

This file is a little messy, but I'm not sure how to make it better short of recursing into inject/, which I'm not thrilled about either.

>+inject:
>+	[ -d $@ ] || mkdir $@

$(NSINSTALL) -D $@?

>diff --git a/configure.in b/configure.in
>--- a/configure.in
>+++ b/configure.in
>@@ -7698,16 +7698,26 @@ dnl ====================================
> dnl = --enable-elf-dynstr-gc
> dnl ========================================================
> MOZ_ARG_ENABLE_BOOL(elf-dynstr-gc,
> [  --enable-elf-dynstr-gc  Enable elf dynstr garbage collector (opt builds only)],
>     USE_ELF_DYNSTR_GC=1,
>     USE_ELF_DYNSTR_GC= )
> 
> dnl ========================================================
>+dnl = --enable-elf-dynstr-gc

--disable-elf-hack?
Attachment #499396 - Flags: review?(ted.mielczarek) → review+
Attachment #495508 - Flags: approval2.0?
Attachment #499395 - Flags: approval2.0?
Attachment #499396 - Flags: approval2.0?
I see this has tests. Can someone give a quick blurb about risk and reward of taking this? Also, have this been through try?
Also, I think we should have a clear on/off switch for any somewhat larger thing we take into a stable/beta release. I guess in this case this is the configure code that turns USE_ELF_HACK to 1 or empty. Would it make sense in some way to land it in a way that it's very near to a one-liner to completely turn off the impact of this patch set?
I'm just talking the possibility of risk minimization and easily returning to the previous state even without a complete code-backout.
Addressed review comments. Carrying r+ over.
Attachment #499396 - Attachment is obsolete: true
Attachment #499408 - Flags: review+
Attachment #499408 - Flags: approval2.0?
Attachment #499396 - Flags: approval2.0?
(In reply to comment #23)
> I see this has tests. Can someone give a quick blurb about risk and reward of
> taking this? Also, have this been through try?

It's been through try during previous iterations, and going through try right now, both for all linux targets including unit tests, and other targets without unit tests to double check that the build doesn't break, i.e. that the conditionals are correct.

I'll post how much is saved on each impacted platform in terms of file sizes, according to the try builds.

(In reply to comment #24)
> Also, I think we should have a clear on/off switch for any somewhat larger
> thing we take into a stable/beta release. I guess in this case this is the
> configure code that turns USE_ELF_HACK to 1 or empty. Would it make sense in
> some way to land it in a way that it's very near to a one-liner to completely
> turn off the impact of this patch set?
> I'm just talking the possibility of risk minimization and easily returning to
> the previous state even without a complete code-backout.

Technically, replacing USE_ELF_HACK=1 with USE_ELF_HACK= before the MOZ_ARG_DISABLE_BOOL line will disable it, but still leave a --enable-elf-hack option doing the appropriate thing (which MOZ_ARG_DISABLE_BOOL conveniently implements).

(In reply to comment #22)
> >diff --git a/build/unix/elfhack/Makefile.in b/build/unix/elfhack/Makefile.in
> 
> This file is a little messy, but I'm not sure how to make it better short of
> recursing into inject/, which I'm not thrilled about either.

Eventually, I want to find a way to only need one copy of the injected code built to handle both with init and noinit cases. The Makefile will be simplified then, but that'll be for later.
Try build for all impacted platforms:
http://tbpl.mozilla.org/?tree=MozillaTry&rev=edea484124b0
There's already a failure on maemo, which was addressed with http://hg.mozilla.org/users/mh_glandium.org/elfhack/rev/3a6206ac5351 but apparently, this wasn't enough.

This is the build for other platforms, that shouldn't fail to build.
http://tbpl.mozilla.org/?tree=MozillaTry&rev=028bbc995640
(In reply to comment #27)
> Try build for all impacted platforms:
> http://tbpl.mozilla.org/?tree=MozillaTry&rev=edea484124b0

There were oranges, but they are the same as for the parent changeset:
http://tbpl.mozilla.org/?rev=14ae20ed2bd5

As for the bustages on maemo, I identified them and found a way to reproduce. A fix should follow soon.

> This is the build for other platforms, that shouldn't fail to build.
> http://tbpl.mozilla.org/?tree=MozillaTry&rev=028bbc995640

This was all green.

As for binary size reductions:
On Linux opt:
libxul.so: Reduced by 1413008 bytes

(all others offered no gain, so were not modified by elfhack)

On Linux64 opt:
libxul.so: Reduced by 4792160 bytes

libnssckbi.so: Reduced by 73568 bytes

libnspr4.so: Reduced by 12128 bytes

libnss3.so: Reduced by 8032 bytes

libmozsqlite3.so: Reduced by 8032 bytes

libfreebl3.so: Reduced by 3936 bytes

components/libbrowsercomps.so: Reduced by 12128 bytes

libssl3.so: Reduced by 3936 bytes

libnssutil3.so: Reduced by 8032 bytes

libsoftokn3.so: Reduced by 3928 bytes


On Android:
libxul.so: Reduced by 1437580 bytes

libmozsqlite3.so: Reduced by 3980 bytes

I don't know how that reflects on the corresponding apk, I don't know where to find the corresponding apk for the parent commit to compare.

All these file size reductions are on portions that are always read on startup (except for prelinked binaries), so the gains are real.
Attached patch part 2 - Import elfhack code (obsolete) — Splinter Review
This is the version I pushed to try for maemo and android:
http://tbpl.mozilla.org/?tree=MozillaTry&rev=8f252fef68d0

It includes this fix:
http://hg.mozilla.org/users/mh_glandium.org/elfhack/rev/b2b392bc2df1
(I wish the ARM ELF documentation was clearer)
Maybe Jacob could take a look to the above commit diff. As far as my testing goes, this produces the correct branch call in the resulting binary.

As it modifies ARM related code, I pushed on all architectures using ARM. Other architectures are not impacted.
Attachment #499395 - Attachment is obsolete: true
Attachment #499491 - Flags: review?(Jacob.Bramley)
Attachment #499491 - Flags: approval2.0?
Attachment #499395 - Flags: approval2.0?
Comment on attachment 499491 [details] [diff] [review]
part 2 - Import elfhack code

Actually, I'll carry over previous r+, and request the additional review from Jacob for http://hg.mozilla.org/users/mh_glandium.org/elfhack/rev/b2b392bc2df1 which is the only change since last iteration.
Attachment #499491 - Flags: review+
I got maemo to build (apparently, it doesn't load libraries with LD_PRELOAD), but it turns out there is yet another problem. What I propose to go forward is to modify part 3 to exclude ARM for now (it's just a matter of removing arm from the test in toolkit/mozapps/installer/packager.mk), land the current patches, and file a followup bug for ARM issues.
(In reply to comment #28)
> All these file size reductions are on portions that are always read on startup
> (except for prelinked binaries), so the gains are real.

Do we have perf numbers from try as well? I'd guess that we should see some impact on perf due to those size reductions (and if they're all going in the right direction, that could be a huge argument for getting this in).
(In reply to comment #32)
> Do we have perf numbers from try as well? I'd guess that we should see some
> impact on perf due to those size reductions (and if they're all going in the
> right direction, that could be a huge argument for getting this in).

AIUI, ts_cold was removed, and it's basically the only test that would have shown a significant improvement, despite its flaws. So, I didn't run talos on the try pushes.
(In reply to comment #33)
> (In reply to comment #32)
> > Do we have perf numbers from try as well? I'd guess that we should see some
> > impact on perf due to those size reductions (and if they're all going in the
> > right direction, that could be a huge argument for getting this in).
> 
> AIUI, ts_cold was removed, and it's basically the only test that would have
> shown a significant improvement, despite its flaws. So, I didn't run talos on
> the try pushes.

Would you mind running with full talos? I understand you don't expect a perf impact, but stranger things have happened.
I found what the maemo issues were about:
- Original DT_INIT function was not called at all, but that didn't lead to any actual problem because the only function being called this way is call_gmon_start, which only does actual work if the binary is built with profiling stuff (either for PGO or gcov). That part was addressed in previous patch.
- Somehow, the testcase didn't run through the LD_PRELOAD variable, so I had to replace /bin/false with a binary doing the same, but linked against the test library.
- The DT_INIT_ARRAYSZ dynamic field ended up with the wrong value because of the .tbss section being logically at the same address as .init_array on maemo binaries. This is what caused the crashes, because only one init function was called (meaning most static initializers wouldn't run). This was a regression from http://hg.mozilla.org/users/mh_glandium.org/elfhack/rev/93194e8a4bcb, and I added extra code to the testcase to avoid this issue in the future. Corresponding commits are the following:
http://hg.mozilla.org/users/mh_glandium.org/elfhack/rev/755b083e2565
http://hg.mozilla.org/users/mh_glandium.org/elfhack/rev/afaaa7ffa029
http://hg.mozilla.org/users/mh_glandium.org/elfhack/rev/c13b41574183

I still would like feedback from Jacob for http://hg.mozilla.org/users/mh_glandium.org/elfhack/rev/b2b392bc2df1 though it apparently works properly.

I ran this through try, again, and nothing broke. I also tested the testcase with the broken elf.cpp code on maemo, which properly spotted the error and failed the build.
Attachment #499491 - Attachment is obsolete: true
Attachment #500021 - Flags: review?(tglek)
Attachment #500021 - Flags: feedback?(Jacob.Bramley)
Attachment #500021 - Flags: approval2.0?
Attachment #499491 - Flags: review?(Jacob.Bramley)
Attachment #499491 - Flags: approval2.0?
This adds support for building the dummy binary needed for the testcase to work on maemo.
Attachment #499408 - Attachment is obsolete: true
Attachment #500022 - Flags: review?(khuey)
Attachment #500022 - Flags: approval2.0?
Attachment #499408 - Flags: approval2.0?
Attachment #500021 - Flags: review?(tglek) → review+
Comment on attachment 500021 [details] [diff] [review]
part 2 - Import elfhack code

I know very little about ELF so I'm not really qualified to review this. The code looks sensible enough (at a glance), and the theory appears sound. I've given it '+' for feedback, but be aware that it doesn't count for much.
Attachment #500021 - Flags: feedback?(Jacob.Bramley) → feedback+
Here are some numbers I got on my setup, over 50 startups for each:
Plain 4.0b8:
x86 	3,228.76 ± 0.57%
x86-64 	3,382.0 ± 0.51%

4.0b8+elfhack:
x86 	3,149.32 ± 0.62%
x86-64 	3,191.58 ± 0.62%

See http://glandium.org/blog/?p=1296 for more details
(In reply to comment #40)
> Here are some numbers I got on my setup, over 50 startups for each:
> Plain 4.0b8:
> x86     3,228.76 ± 0.57%
> x86-64     3,382.0 ± 0.51%
> 
> 4.0b8+elfhack:
> x86     3,149.32 ± 0.62%
> x86-64     3,191.58 ± 0.62%
> 
> See http://glandium.org/blog/?p=1296 for more details

And if you read the post, it's clear that this + other changes result in a very significant startup win when combined.
Attachment #495508 - Flags: approval2.0? → approval2.0+
Attachment #500021 - Flags: approval2.0? → approval2.0+
Comment on attachment 500022 [details] [diff] [review]
part 3 - Integrate elfhack with the build system

Is there a particular reason we're doing this at package-time instead of build time? It seems to me that we could do this in the $(SHARED_LIBRARY) target of rules.mk.
(In reply to comment #42)
> Comment on attachment 500022 [details] [diff] [review]
> part 3 - Integrate elfhack with the build system
> 
> Is there a particular reason we're doing this at package-time instead of build
> time?

Mostly, to catch nss files, where there's a win on x86-64.
Attachment #500022 - Flags: approval2.0? → approval2.0+
Unfortunately, there are now 2 problems with Android:
- elfhack doesn't support that the injected code is thumb, which was turned on by default recently. This can be easily worked around with a patch removing the thumb arguments from CFLAGS.
- elfhack apparently unveals a dynamic linker bug that makes the last text relocation (not packed by elfhack) not applied correctly. It also looks like from /proc/pid/maps that not all PT_LOAD sections are loaded ; I couldn't find traces of the data sections in /proc/pid/maps, but that's really weird because it should fail much earlier than it does if that were actually the case.

Anyways, I'm looking into this last issue, the first one being a matter of filter-out in the Makefile. Procedure-wise, do I need review and approval for this fixup?
If it winds up being a trivial fix, then no. If you wind up making substantial changes, then it should get re-review/approval. Can we just disable this for Android for now while you fix those issues in a followup?
Removed thumb flags (i think they are used on meego) when building elfhack and disabled for android. Will file followup bugs.
Attachment #500022 - Attachment is obsolete: true
Blocks: 606137
Depends on: 628281
Depends on: 628283
This causes my build to fail on Ubuntu 10.10:

elfhack: /home/dbaron/builds/ssd/mozilla-central/mozilla/build/unix/elfhack/elf.cpp:284: Elf::Elf(std::ifstream&): Assertion `segment->getFileSize() == phdr.p_filesz' failed.
You can build with --disable-elf-hack to not hit this issue
Target Milestone: --- → mozilla2.0b11
On Fedora 14 hg:ac6c96109dcf build failed at
c++ -o host_elf.o -c -fno-rtti -fno-exceptions -Wall -Wpointer-arith -Woverloade
d-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno
-invalid-offsetof -Wno-variadic-macros -Werror=return-type -Wno-long-long -gdwar
f-2 -fno-strict-aliasing -fshort-wchar -pthread -pipe -fexceptions  -DNDEBUG -DT
RIMMED -g -O2 -g -pipe -Wp,-D_FORTIFY_SOURCE=2 -fno-exceptions -fstack-protector
 --param=ssp-buffer-size=4 -m64 -mtune=generic  -I. -I. -I../../../dist/include 
-I../../../dist/include/nsprpub  -I/usr/include/nspr4 -I/usr/include/nss3      -
I/usr/include/nspr4 elf.cpp
elf.cpp: In constructor 'Elf::Elf(std::ifstream&)':
elf.cpp:149:54: error: exception handling disabled, use -fexceptions to enable

According to https://developer.mozilla.org/en/C___Portability_Guide#Don%27t_use_exceptions
we cannot use exceptions for Mozilla products.
And we got the following warning message during compiling build/unix/elfhack.test.c
test.c:116:5: warning: C++ style comments are not allowed in ISO C90
test.c:116:5: warning: (this will be reported only once per input file)
(In reply to comment #50)
> elf.cpp: In constructor 'Elf::Elf(std::ifstream&)':
> elf.cpp:149:54: error: exception handling disabled, use -fexceptions to enable
> 
> According to
> https://developer.mozilla.org/en/C___Portability_Guide#Don%27t_use_exceptions
> we cannot use exceptions for Mozilla products.

It's only a linux-specific build tool, portability is not exactly important here. It doesn't end up in the final binary.
It's interesting, though, that your compiler doesn't handle the presence of both -fexceptions and -fno-exceptions like others.

(In reply to comment #51)
> And we got the following warning message during compiling
> build/unix/elfhack.test.c
> test.c:116:5: warning: C++ style comments are not allowed in ISO C90
> test.c:116:5: warning: (this will be reported only once per input file)

Not really a problem, but that can surely be easily fixed.
Filed bug 628593 and bug 628595
(In reply to comment #52)
> (In reply to comment #50)
> It's only a linux-specific build tool, portability is not exactly important
> here. It doesn't end up in the final binary.
> It's interesting, though, that your compiler doesn't handle the presence of
> both -fexceptions and -fno-exceptions like others.

Ah, compiler option I set is complicated.
-fno-exceptions ... -fexceptions ... -fno-exceptions
So I suppose -fno-exceptions is finally set.

I found that I manually set -fno-exceptions (-fexceptions is replaced
with -fno-exceptions in Fedora xulrunner spec file).
When I remove my -fno-exceptions, the build goes well
Depends on: 628618
Depends on: 628593
Depends on: 628595
This was disabled in bug 637243. The bug to re-enable it is bug 637317.
Summary: Compress/pack relocations in ELF binaries → Compress/pack relocations in ELF binaries (elfhack)
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: