Closed
Bug 716295
Opened 13 years ago
Closed 12 years ago
Elf-hack failing on b2g builds
Categories
(Core :: mozglue, defect, P1)
Tracking
()
People
(Reporter: cjones, Assigned: glandium)
References
Details
Attachments
(6 files, 4 obsolete files)
2.36 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
3.54 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
3.45 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
5.33 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
4.15 KB,
patch
|
glandium
:
review+
|
Details | Diff | Splinter Review |
19.01 KB,
patch
|
glandium
:
review+
|
Details | Diff | Splinter Review |
I get the following spew before the exception
/home/cjones/mozilla/b2g/glue/gonk/prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/../lib/gcc/arm-eabi/4.4.3/../../../../arm-eabi/bin/ld: warning: /home/cjones/mozilla/b2g/glue/gonk/prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/../lib/gcc/arm-eabi/4.4.3/thumb/android/libgcc.a(unwind-arm.o) uses 32-bit enums yet the output is to use variable-size enums; use of enum values across objects may fail
/home/cjones/mozilla/b2g/glue/gonk/prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/../lib/gcc/arm-eabi/4.4.3/../../../../arm-eabi/bin/ld: warning: /home/cjones/mozilla/b2g/glue/gonk/prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/../lib/gcc/arm-eabi/4.4.3/thumb/android/libgcc.a(pr-support.o) uses 32-bit enums yet the output is to use variable-size enums; use of enum values across objects may fail
/home/cjones/mozilla/b2g/glue/gonk/prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/../lib/gcc/arm-eabi/4.4.3/../../../../arm-eabi/bin/ld: warning: /home/cjones/mozilla/b2g/glue/gonk/out/target/product/galaxys2/obj/lib/liblog.so uses 32-bit enums yet the output is to use variable-size enums; use of enum values across objects may fail
/home/cjones/mozilla/b2g/glue/gonk/prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/../lib/gcc/arm-eabi/4.4.3/../../../../arm-eabi/bin/ld: warning: /home/cjones/mozilla/b2g/glue/gonk/out/target/product/galaxys2/obj/lib/libstdc++.so uses 32-bit enums yet the output is to use variable-size enums; use of enum values across objects may fail
/home/cjones/mozilla/b2g/glue/gonk/prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/../lib/gcc/arm-eabi/4.4.3/../../../../arm-eabi/bin/ld: warning: /home/cjones/mozilla/b2g/glue/gonk/out/target/product/galaxys2/obj/lib/libm.so uses 32-bit enums yet the output is to use variable-size enums; use of enum values across objects may fail
/home/cjones/mozilla/b2g/glue/gonk/prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/../lib/gcc/arm-eabi/4.4.3/../../../../arm-eabi/bin/ld: warning: /home/cjones/mozilla/b2g/glue/gonk/out/target/product/galaxys2/obj/lib/libc.so uses 32-bit enums yet the output is to use variable-size enums; use of enum values across objects may fail
/home/cjones/mozilla/b2g/glue/gonk/prebuilt/linux-x86/toolchain/arm-eabi-4.4.3/bin/../lib/gcc/arm-eabi/4.4.3/../../../../arm-eabi/bin/ld: warning: /home/cjones/mozilla/b2g/glue/gonk/out/target/product/galaxys2/obj/lib/libdl.so uses 32-bit enums yet the output is to use variable-size enums; use of enum values across objects may fail
===
=== If you get failures below, please file a bug describing the error
=== and your environment (compiler and linker versions), and use
=== --disable-elf-hack until this is fixed.
===
test.so: terminate called after throwing an instance of 'std::runtime_error'
what(): Moving section would require overlapping segments
I'm using the following mozconfig
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/objdir-prof-gonk
# XXX our build system doesn't seem to respect toplevel -j ...
# hard-coding this sucks
mk_add_options MOZ_MAKE_FLAGS="-s $MAKE_FLAGS"
# XXX need a plan for toolchain and multi-dev-platform support
gonk="$GONK_PATH"
sys=`uname -s | tr "[[:upper:]]" "[[:lower:]]"`
ac_add_options --target=arm-android-eabi
ac_add_options --with-gonk="$gonk"
ac_add_options --with-endian=little
ac_add_options --enable-application=b2g
# ac_add_options --disable-elf-hack
ac_add_options --enable-debug-symbols
# ac_add_options --enable-profiling
ac_add_options --with-ccache
# Enable dump() from JS.
export CXXFLAGS=-DMOZ_ENABLE_JS_DUMP
The environment is https://github.com/andreasgal/B2G/commit/f12f7a7f557c0506d05be75eb7439493b76653d2. The compiler is included prebuilt in the repository. It corresponds to the android NDK as of 2.3.5, which I believe is NDK 6.
Note that this mozconfig isn't the default for b2g builds (obviously! :).
Reporter | ||
Comment 1•12 years ago
|
||
It came out in bug 774131 that we're spending ~500ms between fork() and main() when launching content processes. glandium reminded us that we disabled elfhack, which should win some startup time.
After bug 774131 fixes the biggest user-perceived lag in launching content processes, the fork()-to-main() overhead will be the biggest to seeing app pixels appear on screen. XPCOM startup is biggest after that.
This should block for figuring out what's required to reenable elfhack and what win we can get from it.
Blocks: b2g-e10s-work
blocking-basecamp: --- → +
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → mh+mozilla
Assignee | ||
Updated•12 years ago
|
Component: Build Config → mozglue
OS: Linux → Gonk
Hardware: x86_64 → ARM
Reporter | ||
Comment 2•12 years ago
|
||
glandium, I now see
===
=== If you get failures below, please file a bug describing the error
=== and your environment (compiler and linker versions), and use
/home/cjones/mozilla/new-b2g/objdir-gecko/config/nsinstall -D ../../dist/
=== --disable-elf-hack until this is fixed.
===
cd ../../dist/bin; find . -name "*.so" | xargs ../../build/unix/elfhack/elfhack
./libnssckbi.so: terminate called after throwing an instance of 'std::runtime_error'
what(): Moving section would require overlapping segments
xargs: ../../build/unix/elfhack/elfhack: terminated by signal 6
make[4]: *** [elfhack] Error 125
make[4]: Leaving directory `/home/cjones/mozilla/new-b2g/objdir-gecko/b2g/installer'
make[3]: *** [make-package] Error 2
make[3]: Leaving directory `/home/cjones/mozilla/new-b2g/objdir-gecko/b2g/installer'
make[2]: *** [default] Error 2
make[2]: Leaving directory `/home/cjones/mozilla/new-b2g/objdir-gecko/b2g/installer'
make[1]: *** [package] Error 2
make[1]: Leaving directory `/home/cjones/mozilla/new-b2g/objdir-gecko'
make: *** [out/target/product/crespo/obj/DATA/gecko_intermediates/gecko] Error 2
with a vanilla b2g build for the Nexus S, with |ac_add_options --disable-elf-hack| commented out in $b2g/gonk-misc/default-gecko-config. Can you take a look?
Reporter | ||
Comment 3•12 years ago
|
||
Mid-aired, I'm going to assume that's a yes :).
Assignee | ||
Comment 4•12 years ago
|
||
Yeah, I already found the problem. Fix isn't very difficult, but the tricky part is to test the fix properly, considering the problem only happens on cross-compiled environment, where we can't verify that elfhack doesn't break the resulting binaries, so I'd like the fix to be tested on linux builds as well, and that requires forging a proper test case with a linker script, or maybe using -nostartfiles.
Reporter | ||
Comment 5•12 years ago
|
||
Ooc, why doesn't the bug appear in firefox-android builds?
Assignee | ||
Comment 6•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #5)
> Ooc, why doesn't the bug appear in firefox-android builds?
Because ld, on firefox-android builds, leaves empty entries in the .dynamic section, which the b2g ld doesn't. Elfhack adds an entry to .dynamic in both cases, but only on android builds it has room to do so.
The way I want to fix this is to avoid elfhack adding an entry to .dynamic at all.
Reporter | ||
Comment 7•12 years ago
|
||
Hm, ok. I'm surprised we're setting things up differently from firefox-android.
Do you have a patch I can try?
Assignee | ||
Comment 8•12 years ago
|
||
I have a preliminary patch that addresses the .dynamic issue for rel locations (so, for x86 and ARM, but not x64), but it uncovered another problem...
Assignee | ||
Comment 9•12 years ago
|
||
This works for me on otoro.
Reporter | ||
Comment 10•12 years ago
|
||
Representative timings with patch applied
I/Gecko ( 78): LAUNCH [Parent] Before fork(), time is 52 sec 517 ms
I/Gecko ( 78): LAUNCH [Parent] LaunchAndWait() --- 12.6202 ms
I/Gecko ( 261): LAUNCH [Child] In XRE_InitChildProcess(), time is 52 sec 879 ms
I/Gecko ( 261): LAUNCH [Child] ContentProcess::Init()
I/Gecko ( 261): LAUNCH [Child] mContent.Init() --- 1.98187 ms
I/Gecko ( 261): LAUNCH [Child] mXREEmbed.Start() ---- 396.717 ms
I/Gecko ( 261): LAUNCH [Child] mContent.InitXPCOM() --- 396.858 ms
So this takes fork()-to-main() time down by about 200ms :).
Updated•12 years ago
|
Priority: -- → P1
Comment 13•12 years ago
|
||
Dumb question, shouldn't we be doing relocations only once for the main b2g process and fork()ing post-relocation for the rest?
Assignee | ||
Comment 14•12 years ago
|
||
Attachment #650259 -
Flags: review?(nfroyd)
Assignee | ||
Comment 15•12 years ago
|
||
Attachment #650260 -
Flags: review?(nfroyd)
Assignee | ||
Comment 16•12 years ago
|
||
Attachment #650261 -
Flags: review?(nfroyd)
Assignee | ||
Updated•12 years ago
|
Attachment #649714 -
Attachment is obsolete: true
Assignee | ||
Comment 17•12 years ago
|
||
Attachment #650262 -
Flags: review?(nfroyd)
Assignee | ||
Comment 18•12 years ago
|
||
Attachment #650263 -
Flags: review?(nfroyd)
Assignee | ||
Comment 19•12 years ago
|
||
Attachment #650264 -
Flags: review?(nfroyd)
Comment 20•12 years ago
|
||
Comment on attachment 650259 [details] [diff] [review]
part 1 - Fail more gracefully when .dynamic section can't be grown
Review of attachment 650259 [details] [diff] [review]:
-----------------------------------------------------------------
::: build/unix/elfhack/elf.cpp
@@ +656,5 @@
> // If we get here, this means we didn't match for the given tag
> + // Most of the time, there are a few DT_NULL entries, that we can
> + // use to add our value, but if we are on the last entry, we can't.
> + if (i >= shdr.sh_size / shdr.sh_entsize - 1)
> + return false;
Given the number of times shdr.sh_size / shdr.sh_entsize is used in this function, I think you should give it a name.
I'd chuck the assert above this given that we're no longer twiddling with the size of the section.
@@ +662,2 @@
> dyns[i].tag = tag;
> dyns[i++].value = val;
Nit: just dyns[i] here.
Attachment #650259 -
Flags: review?(nfroyd) → review+
Updated•12 years ago
|
Attachment #650260 -
Flags: review?(nfroyd) → review+
Updated•12 years ago
|
Attachment #650261 -
Flags: review?(nfroyd) → review+
Comment 21•12 years ago
|
||
Comment on attachment 650263 [details] [diff] [review]
part 5 - Add support for R_ARM_THM_CALL relocations
Review of attachment 650263 [details] [diff] [review]:
-----------------------------------------------------------------
::: build/unix/elfhack/elfhack.cpp
@@ +248,5 @@
> s = (tmp & (1 << 24)) >> 24;
> j1 = ((tmp & (1 << 23)) >> 23) ^ !s;
> j2 = ((tmp & (1 << 22)) >> 22) ^ !s;
>
> return 0xf000 | (s << 10) | ((tmp & (0x3ff << 12)) >> 12) |
Nit: trailing whitespace. Please fix while you're here.
Attachment #650263 -
Flags: review?(nfroyd) → review+
Updated•12 years ago
|
Attachment #650264 -
Flags: review?(nfroyd) → review+
Comment 22•12 years ago
|
||
Comment on attachment 650262 [details] [diff] [review]
part 4 - Interpose elfhack injected code in DT_INIT_ARRAY's first entry when possible
Review of attachment 650262 [details] [diff] [review]:
-----------------------------------------------------------------
I think this looks sane enough. I want to give myself some time to look at this in the morning, though.
::: build/unix/elfhack/elfhack.cpp
@@ +368,5 @@
> + addr.serialize(const_cast<char *>(loc.getBuffer()), Elf_Addr::size(elf->getClass()), elf->getClass(), elf->getData());
> +}
> +
> +void set_relative_reloc(Elf_Rela *rel, Elf *elf, unsigned int value) {
> + set_relative_reloc((Elf_Rel *)rel, elf, value);
This looks unnecessary (and a bit ugly), since the reloc holds the addend already. Am I missing something?
@@ +478,1 @@
> (relro && (i->r_offset >= relro->getAddr()) &&
Nit: indentation here and below.
@@ +508,5 @@
> + Rel_Type *rel = &new_rels[init_array_reloc - 1];
> + unsigned int addend = get_addend(rel, elf);
> + // Use relocated value of DT_INIT_ARRAY's first entry for the
> + // function to be called by the injected code.
> + if (ELF32_R_TYPE(rel->r_info) == rel_type) {
This macro usage (and others throughout) should depend on the class of |elf|, yes?
Attachment #650262 -
Flags: review?(nfroyd) → feedback+
Assignee | ||
Comment 23•12 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #22)
> Comment on attachment 650262 [details] [diff] [review]
> part 4 - Interpose elfhack injected code in DT_INIT_ARRAY's first entry when
> possible
>
> Review of attachment 650262 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I think this looks sane enough. I want to give myself some time to look at
> this in the morning, though.
>
> ::: build/unix/elfhack/elfhack.cpp
> @@ +368,5 @@
> > + addr.serialize(const_cast<char *>(loc.getBuffer()), Elf_Addr::size(elf->getClass()), elf->getClass(), elf->getData());
> > +}
> > +
> > +void set_relative_reloc(Elf_Rela *rel, Elf *elf, unsigned int value) {
> > + set_relative_reloc((Elf_Rel *)rel, elf, value);
>
> This looks unnecessary (and a bit ugly), since the reloc holds the addend
> already. Am I missing something?
It's for consistency with ld, which keeps the value in both places (and makes one wonder why use RELA at all), although that depends on versions. I'm tempted to make it conditional on whether the value is the old addend or not. ld.so doesn't care, though.
> @@ +508,5 @@
> > + Rel_Type *rel = &new_rels[init_array_reloc - 1];
> > + unsigned int addend = get_addend(rel, elf);
> > + // Use relocated value of DT_INIT_ARRAY's first entry for the
> > + // function to be called by the injected code.
> > + if (ELF32_R_TYPE(rel->r_info) == rel_type) {
>
> This macro usage (and others throughout) should depend on the class of
> |elf|, yes?
Once read from disk, everything is actually stored in Elf32 structs.
Assignee | ||
Comment 24•12 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #23)
> (In reply to Nathan Froyd (:froydnj) from comment #22)
> > Comment on attachment 650262 [details] [diff] [review]
> > part 4 - Interpose elfhack injected code in DT_INIT_ARRAY's first entry when
> > possible
> >
> > Review of attachment 650262 [details] [diff] [review]:
> > -----------------------------------------------------------------
> >
> > I think this looks sane enough. I want to give myself some time to look at
> > this in the morning, though.
> >
> > ::: build/unix/elfhack/elfhack.cpp
> > @@ +368,5 @@
> > > + addr.serialize(const_cast<char *>(loc.getBuffer()), Elf_Addr::size(elf->getClass()), elf->getClass(), elf->getData());
> > > +}
> > > +
> > > +void set_relative_reloc(Elf_Rela *rel, Elf *elf, unsigned int value) {
> > > + set_relative_reloc((Elf_Rel *)rel, elf, value);
> >
> > This looks unnecessary (and a bit ugly), since the reloc holds the addend
> > already. Am I missing something?
>
> It's for consistency with ld, which keeps the value in both places (and
> makes one wonder why use RELA at all), although that depends on versions.
> I'm tempted to make it conditional on whether the value is the old addend or
> not. ld.so doesn't care, though.
Actually, looking more closely, for relative relocations all versions of ld I tested are storing the value in both places. What differs is, for R_X86_64_64/R_386_32/..., some versions of ld put the value of the symbol in the relocated data field, while it's (obviously) not the addend, which is 0.
Assignee | ||
Comment 25•12 years ago
|
||
Assignee | ||
Updated•12 years ago
|
Attachment #650259 -
Attachment is obsolete: true
Assignee | ||
Comment 26•12 years ago
|
||
Added a comment, and addressed review comments.
Assignee | ||
Updated•12 years ago
|
Attachment #650262 -
Attachment is obsolete: true
Assignee | ||
Comment 27•12 years ago
|
||
Comment on attachment 650445 [details] [diff] [review]
part 1 - Fail more gracefully when .dynamic section can't be grown
Carry r+ over.
Attachment #650445 -
Flags: review+
Assignee | ||
Updated•12 years ago
|
Attachment #650446 -
Flags: review?(nfroyd)
Comment 28•12 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #23)
> > ::: build/unix/elfhack/elfhack.cpp
> > @@ +368,5 @@
> > > + addr.serialize(const_cast<char *>(loc.getBuffer()), Elf_Addr::size(elf->getClass()), elf->getClass(), elf->getData());
> > > +}
> > > +
> > > +void set_relative_reloc(Elf_Rela *rel, Elf *elf, unsigned int value) {
> > > + set_relative_reloc((Elf_Rel *)rel, elf, value);
> >
> > This looks unnecessary (and a bit ugly), since the reloc holds the addend
> > already. Am I missing something?
>
> It's for consistency with ld, which keeps the value in both places (and
> makes one wonder why use RELA at all), although that depends on versions.
> I'm tempted to make it conditional on whether the value is the old addend or
> not. ld.so doesn't care, though.
My guess is that ld does what it does for simplicity. If you always put the value in the relocation's location, then many places won't have to care whether you're using REL or RELA. And this gracefully handles those architectures that can do either/or (or both simultaneously *shudder*).
> > @@ +508,5 @@
> > > + Rel_Type *rel = &new_rels[init_array_reloc - 1];
> > > + unsigned int addend = get_addend(rel, elf);
> > > + // Use relocated value of DT_INIT_ARRAY's first entry for the
> > > + // function to be called by the injected code.
> > > + if (ELF32_R_TYPE(rel->r_info) == rel_type) {
> >
> > This macro usage (and others throughout) should depend on the class of
> > |elf|, yes?
>
> Once read from disk, everything is actually stored in Elf32 structs.
Argh, I saw that earlier but didn't make the connection to here.
Comment 29•12 years ago
|
||
Comment on attachment 650446 [details] [diff] [review]
part 4 - Interpose elfhack injected code in DT_INIT_ARRAY's first entry when possible
Review of attachment 650446 [details] [diff] [review]:
-----------------------------------------------------------------
::: build/unix/elfhack/elfhack.cpp
@@ +360,5 @@
> +unsigned int get_addend(Elf_Rela *rel, Elf *elf) {
> + return rel->r_addend;
> +}
> +
> +void set_relative_reloc(Elf_Rel *rel, Elf *elf, unsigned int value, unsigned int old_value) {
old_value looks unused in both overloads of this function.
@@ +370,5 @@
> +
> +void set_relative_reloc(Elf_Rela *rel, Elf *elf, unsigned int value, unsigned int old_value) {
> + // ld puts the value of relocated relocations both in the addend and
> + // at r_offset. For consistency, keep it that way.
> + set_relative_reloc((Elf_Rel *)rel, elf, value);
I appreciate the comment, but there's no three-argument set_relative_reloc...?
Attachment #650446 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 30•12 years ago
|
||
Doh, I had forgotten to remove these, which remained from an attempt at doing something smarter, which ended up being wrong.
Assignee | ||
Updated•12 years ago
|
Attachment #650446 -
Attachment is obsolete: true
Assignee | ||
Comment 31•12 years ago
|
||
Comment on attachment 650548 [details] [diff] [review]
part 4 - Interpose elfhack injected code in DT_INIT_ARRAY's first entry when possible
Carrying r+ over.
Attachment #650548 -
Flags: review+
Assignee | ||
Comment 32•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/e043018e92ba
https://hg.mozilla.org/integration/mozilla-inbound/rev/23fdc9b894d3
https://hg.mozilla.org/integration/mozilla-inbound/rev/ac414499bc14
https://hg.mozilla.org/integration/mozilla-inbound/rev/80caa5061f40
https://hg.mozilla.org/integration/mozilla-inbound/rev/8d4e4fc8adf8
https://hg.mozilla.org/integration/mozilla-inbound/rev/076954b3555e
Target Milestone: --- → mozilla17
Comment 33•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/e043018e92ba
https://hg.mozilla.org/mozilla-central/rev/23fdc9b894d3
https://hg.mozilla.org/mozilla-central/rev/ac414499bc14
https://hg.mozilla.org/mozilla-central/rev/80caa5061f40
https://hg.mozilla.org/mozilla-central/rev/8d4e4fc8adf8
https://hg.mozilla.org/mozilla-central/rev/076954b3555e
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•