Closed Bug 788974 Opened 8 years ago Closed 3 years ago

Don't disable elfhack when enabling profiling on platforms supporting dl_iterate_phdr.

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox37 fixed)

RESOLVED FIXED
mozilla37
Tracking Status
firefox37 --- fixed

People

(Reporter: glandium, Assigned: jseward)

References

Details

Attachments

(3 files, 5 obsolete files)

Now that bug 749518 is fixed, we should be able re-enable elfhack when profiling is enabled.
Depends on: 892355
Provided profiling works (it's supposed to)
Attachment #773803 - Flags: review?(ted)
Assignee: nobody → mh+mozilla
In fact, b2g still doesn't have dl_iterate_phdr, and we still fallback to /proc/self/maps there.
Attachment #773806 - Flags: review?(ted)
Attachment #773803 - Attachment is obsolete: true
Attachment #773803 - Flags: review?(ted)
Also removing the corresponding --disable-elf-hack that are not unnecessary.
Attachment #773808 - Flags: review?(ted)
Attachment #773806 - Attachment is obsolete: true
Attachment #773806 - Flags: review?(ted)
Attachment #773808 - Flags: review?(ted) → review+
(In reply to Mike Hommey [:glandium] from comment #3)
> Created attachment 773808 [details] [diff] [review]

Using M-I, this patch and M-I, with mobile/android/config/mozconfigs/android/nightly
I get a lot of unwinding failure on ARM/Android -- averaging 5.2 frames/unwind.  

Re-adding "ac_add_options --disable-elf-hack" changes that to 28.9 frames/unwind,
which is what I expect to get.

Looking at the profile data in the GUI, for the elfhack'd build .. it's clear
there's a lot of unwind failure inside libxul.
Julian, could you check how things go nowadays with elfhack+profiling?
Flags: needinfo?(jseward)
Blocks: 1052262
(In reply to Mike Hommey [:glandium] from comment #5)
> Julian, could you check how things go nowadays with elfhack+profiling?

I tested x86_64-linux, x86-linux and arm-android.  None of the mozconfigs
had --disable-elfhack.  As far as I can see the LUL unwinder worked OK.
But now I am not sure whether or not that was the correct test.  Do I need
to explicitly say --enable-elfhack, or is it assumed if --disable-elfhack
is not present?
Flags: needinfo?(jseward)
Note the option is --disable-elf-hack, and the default is --enable-elf-hack. However, with neither --disable-elf-hack or --enable-elf-hack, the default is to enable it *if possible*, and configure.in considers it not possible if --enable-profiling is there too.

So, to enable elfhack, you do need to add an explicit --enable-elf-hack. Elfhack also only applies when you run make package (or mach package). You can check libxul is elfhacked by looking at its ELF sections and looking whether there are "elfhack" sections.
Retesting this more thoroughly, with explicit --enable-elf-hack, doing
"./mach package" along the way, and verifying the elfhackedness with readelf -S,
gives the following result:

   x86_64-linux  LUL works ok
   x86-linux     ./mach package fails
   arm-android   LUL fails

so AFAICS, apart from the x86-linux fail, the situation is unchanged from before.
(In reply to Julian Seward [:jseward] from comment #8)
> Retesting this more thoroughly, with explicit --enable-elf-hack, doing
> "./mach package" along the way, and verifying the elfhackedness with readelf
> -S,
> gives the following result:
> 
>    x86_64-linux  LUL works ok
>    x86-linux     ./mach package fails

How does mach package fail?

>    arm-android   LUL fails
> 
> so AFAICS, apart from the x86-linux fail, the situation is unchanged from
> before.

The code being different now, it would be interesting to know why that fails now.
Attached file bug788974c10_log.txt
> How does mach package fail?

See attached log file.  This is a 32-bit build on a 64-bit host.
Maybe relevant?
(In reply to Mike Hommey [:glandium] from comment #9)
> >    arm-android   LUL fails
> The code being different now, it would be interesting to know why that fails
> now.

I'll look into it.  One point of commonality is that LUL's Elf parser
is derived from Breakpad's Elf parser, so if the latter has a problem
with elfhacked objects then I would expect the former to also have
such a problem.
(In reply to Julian Seward [:jseward] from comment #11)
> (In reply to Mike Hommey [:glandium] from comment #9)
> > >    arm-android   LUL fails
> > The code being different now, it would be interesting to know why that fails
> > now.
> 
> I'll look into it.  One point of commonality is that LUL's Elf parser
> is derived from Breakpad's Elf parser, so if the latter has a problem
> with elfhacked objects then I would expect the former to also have
> such a problem.

Elfhack doesn't do things differently between arm-android and x86*-linux. Why would it fail for the former but not the latter?

(In reply to Julian Seward [:jseward] from comment #10)
> Created attachment 8503184 [details]
> bug788974c10_log.txt
> 
> > How does mach package fail?
> 
> See attached log file.  This is a 32-bit build on a 64-bit host.
> Maybe relevant?

The error suggests that the value of STRIP that you got out of configure is wrong.
Fixes LUL's EXIDX reader so as to be able to deal with elfhacked ARM
binaries.  This simplifies and cleans up LUL's reading of
.ARM.{exidx,extab} sections by reading directly from the in-process
image rather than from the temporarily mapped object file, thereby
sidestepping the previous not-really-correct biasing scheme used when
reading EXIDX.

This relies on the assumption that, for elfhacked binaries, the bias
(difference between stated-in-the-file and actual-in-the-process
virtual addresses) for the two text sections is the same.  I think
that ELF guarantees this for us, but I am not sure.  Mike, do you
know?  In any case I would say that the reader is more correct now
than it was before.

WIP patch; do not commit.  Needs further tidying up to do with the
range checks that guarantee we will not segfault when reading this
unwind info.
> This relies on the assumption that, for elfhacked binaries, the bias
> (difference between stated-in-the-file and actual-in-the-process
> virtual addresses) for the two text sections is the same.

If you mean differences in sh_addr, then yes, that's guaranteed. That said, I won't claim to know how those sections work, but it /should/ be possible to use them purely in-memory, without reading the sections table. The exidx section is pointed at by PT_EXIDX, and I think extab elements are directly pointed at from exidx. (and if you don't want to parse elf headers at all, you can even get the exidx address, as well as its size, by calling __gnu_Unwind_Find_exidx)
(In reply to Mike Hommey [:glandium] from comment #14)
> If you mean differences in sh_addr, then yes, that's guaranteed. That said,
> I won't claim to know how those sections work, but it /should/ be possible
> to use them purely in-memory, without reading the sections table. The exidx
> section is pointed at by PT_EXIDX, and I think extab elements are directly
> pointed at from exidx.

Yes: the exidx points to both the text and (when necessary) the extab with relative addresses.  All that's necessary are the address/length from the PT_EXIDX.

> (and if you don't want to parse elf headers at all,
> you can even get the exidx address, as well as its size, by calling
> __gnu_Unwind_Find_exidx)

Isn't that one of those functions that's visible only because Android winds up exporting libgcc's private symbols from libc (and a few other libraries, IIRC)?
Per comment 13, fixes LUL's EXIDX reader so it works with elfhacked
objects.

* reads EXIDX from the running process image, not the temporarily
  mapped object from which CFI etc is read

* this simplifies the EXIDX reader and removes hard-to-understand
  and probably buggy biasing calculations

* redo range checking so that we don't segfault should the .exidx/.extab
  not have been mapped in with at least r perms, for whatever reason
Attachment #8511953 - Attachment is obsolete: true
Attachment #8515915 - Flags: review?(mh+mozilla)
Attachment #8515915 - Flags: review?(mh+mozilla) → review+
Could you also refresh attachment 773808 [details] [diff] [review]? It predates b2g, for instance.
Rebased.  I didn't change it in any other way.
Attachment #773808 - Attachment is obsolete: true
(In reply to Julian Seward [:jseward] from comment #18)
> Created attachment 8518123 [details] [diff] [review]
> "elfhack and profiling don't conflict anymore", rebased
> 
> Rebased.  I didn't change it in any other way.

The point was that there are plenty of new mozconfigs that have --disable-elf-hack that didn't exist back then.
(In reply to Mike Hommey [:glandium] from comment #19)
Ok, I'll redo it, removing all the --disable-elf-hacks I can find.
(In reply to Julian Seward [:jseward] from comment #16)
> Created attachment 8515915 [details] [diff] [review]
> bug788974-fix-exidx-and-elfhack-2.diff

https://hg.mozilla.org/integration/mozilla-inbound/rev/d13417400605
Keywords: leave-open
I found it a bit difficult to decide where to remove --disable-elf-hack.  In
the end I removed it everywhere except from build/unix/mozconfig.asan.  This
is on the basis that it could interact badly at least with the following
tools:

(1) LUL
(2) Jed Davis' EXIDX reader
(3) Valgrind
(4) ASan

but (1) and (3) are AFAICT ok with elfhack now, and I'd guess that (2)
isn't affected by it.  That leaves (4); I assume that ASan has its own
unwinder and I don't know the elfhack-happyness status of that.

I'm not too sure about all this, so would appreciate feedback.  Is
indeed (2) elfhack-OK and (4) not-elfhack-OK?  Are there other tools
to be considered?
Attachment #8518123 - Attachment is obsolete: true
Attachment #8519888 - Flags: feedback?(mh+mozilla)
Attachment #8519888 - Flags: feedback?(jld)
Comment on attachment 8519888 [details] [diff] [review]
bug788974-att773808-rebase2.diff

My ARM EH unwinder should be elfhack-compatible.  If I recall correctly, I was trying to avoid breaking elfhack in particular (multiple executable load commands, varying/nonzero difference between offset and vaddr, etc.), and avoid unnecessary assumptions about ELF usage in general.  I also recall testing it with elfhack at one point, though not recently. 


For the sake of completeness: the one corner of ELF that I know it doesn't handle is if there isn't a PT_LOAD covering the ELF header and program headers.  The one instance of that that I saw was the Android dynamic linker, but that's no longer the case as of [dcbc378] which is in JB and up.

[dcbc378]: https://android.googlesource.com/platform/bionic/+/dcbc3787bfb9a272a010
Attachment #8519888 - Flags: feedback?(jld) → feedback+
Comment on attachment 8519888 [details] [diff] [review]
bug788974-att773808-rebase2.diff

Review of attachment 8519888 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know the status of ASAN vs. elfhack, but chances are it works. I'm fine leaving elfhack off for those builds until it is completely ruled out as being a problem, though.
Attachment #8519888 - Flags: feedback?(mh+mozilla) → feedback+
Attachment #8519888 - Flags: feedback+ → review+
Assignee: mh+mozilla → jseward
Keywords: leave-open
Target Milestone: --- → mozilla37
Is there still a reason this is leave-open and not closed?
Flags: needinfo?(jseward)
I have no idea why this is leave-open.  I think we can close it now.
Flags: needinfo?(jseward)
Actually, the leave-open flag was removed 2.5 years ago... but the bug was not marked as FIXED (although it was marked status-firefox37: fixed).
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Pushed by mozilla@hocat.ca:
https://hg.mozilla.org/comm-central/rev/85e4ac0d421f
Port bug 788974: Don't disable elfhack when enabling profiling; r=me
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.