Don't disable elfhack when enabling profiling on platforms supporting dl_iterate_phdr.

RESOLVED FIXED in Firefox 37

Status

defect
RESOLVED FIXED
7 years ago
Last year

People

(Reporter: glandium, Assigned: jseward)

Tracking

unspecified
mozilla37
Dependency tree / graph

Firefox Tracking Flags

(firefox37 fixed)

Details

Attachments

(3 attachments, 5 obsolete attachments)

Reporter

Description

7 years ago
Now that bug 749518 is fixed, we should be able re-enable elfhack when profiling is enabled.
Reporter

Updated

6 years ago
Depends on: 892355
Reporter

Comment 1

6 years ago
Provided profiling works (it's supposed to)
Attachment #773803 - Flags: review?(ted)
Reporter

Updated

6 years ago
Assignee: nobody → mh+mozilla
Reporter

Comment 2

6 years ago
In fact, b2g still doesn't have dl_iterate_phdr, and we still fallback to /proc/self/maps there.
Attachment #773806 - Flags: review?(ted)
Reporter

Updated

6 years ago
Attachment #773803 - Attachment is obsolete: true
Attachment #773803 - Flags: review?(ted)
Reporter

Comment 3

6 years ago
Also removing the corresponding --disable-elf-hack that are not unnecessary.
Attachment #773808 - Flags: review?(ted)
Reporter

Updated

6 years ago
Attachment #773806 - Attachment is obsolete: true
Attachment #773806 - Flags: review?(ted)
Attachment #773808 - Flags: review?(ted) → review+
Assignee

Comment 4

6 years ago
(In reply to Mike Hommey [:glandium] from comment #3)
> Created attachment 773808 [details] [diff] [review]

Using M-I, this patch and M-I, with mobile/android/config/mozconfigs/android/nightly
I get a lot of unwinding failure on ARM/Android -- averaging 5.2 frames/unwind.  

Re-adding "ac_add_options --disable-elf-hack" changes that to 28.9 frames/unwind,
which is what I expect to get.

Looking at the profile data in the GUI, for the elfhack'd build .. it's clear
there's a lot of unwind failure inside libxul.
Reporter

Comment 5

5 years ago
Julian, could you check how things go nowadays with elfhack+profiling?
Flags: needinfo?(jseward)
Blocks: 1052262
Assignee

Comment 6

5 years ago
(In reply to Mike Hommey [:glandium] from comment #5)
> Julian, could you check how things go nowadays with elfhack+profiling?

I tested x86_64-linux, x86-linux and arm-android.  None of the mozconfigs
had --disable-elfhack.  As far as I can see the LUL unwinder worked OK.
But now I am not sure whether or not that was the correct test.  Do I need
to explicitly say --enable-elfhack, or is it assumed if --disable-elfhack
is not present?
Flags: needinfo?(jseward)
Reporter

Comment 7

5 years ago
Note the option is --disable-elf-hack, and the default is --enable-elf-hack. However, with neither --disable-elf-hack or --enable-elf-hack, the default is to enable it *if possible*, and configure.in considers it not possible if --enable-profiling is there too.

So, to enable elfhack, you do need to add an explicit --enable-elf-hack. Elfhack also only applies when you run make package (or mach package). You can check libxul is elfhacked by looking at its ELF sections and looking whether there are "elfhack" sections.
Assignee

Comment 8

5 years ago
Retesting this more thoroughly, with explicit --enable-elf-hack, doing
"./mach package" along the way, and verifying the elfhackedness with readelf -S,
gives the following result:

   x86_64-linux  LUL works ok
   x86-linux     ./mach package fails
   arm-android   LUL fails

so AFAICS, apart from the x86-linux fail, the situation is unchanged from before.
Reporter

Comment 9

5 years ago
(In reply to Julian Seward [:jseward] from comment #8)
> Retesting this more thoroughly, with explicit --enable-elf-hack, doing
> "./mach package" along the way, and verifying the elfhackedness with readelf
> -S,
> gives the following result:
> 
>    x86_64-linux  LUL works ok
>    x86-linux     ./mach package fails

How does mach package fail?

>    arm-android   LUL fails
> 
> so AFAICS, apart from the x86-linux fail, the situation is unchanged from
> before.

The code being different now, it would be interesting to know why that fails now.
Assignee

Comment 10

5 years ago
> How does mach package fail?

See attached log file.  This is a 32-bit build on a 64-bit host.
Maybe relevant?
Assignee

Comment 11

5 years ago
(In reply to Mike Hommey [:glandium] from comment #9)
> >    arm-android   LUL fails
> The code being different now, it would be interesting to know why that fails
> now.

I'll look into it.  One point of commonality is that LUL's Elf parser
is derived from Breakpad's Elf parser, so if the latter has a problem
with elfhacked objects then I would expect the former to also have
such a problem.
Reporter

Comment 12

5 years ago
(In reply to Julian Seward [:jseward] from comment #11)
> (In reply to Mike Hommey [:glandium] from comment #9)
> > >    arm-android   LUL fails
> > The code being different now, it would be interesting to know why that fails
> > now.
> 
> I'll look into it.  One point of commonality is that LUL's Elf parser
> is derived from Breakpad's Elf parser, so if the latter has a problem
> with elfhacked objects then I would expect the former to also have
> such a problem.

Elfhack doesn't do things differently between arm-android and x86*-linux. Why would it fail for the former but not the latter?

(In reply to Julian Seward [:jseward] from comment #10)
> Created attachment 8503184 [details]
> bug788974c10_log.txt
> 
> > How does mach package fail?
> 
> See attached log file.  This is a 32-bit build on a 64-bit host.
> Maybe relevant?

The error suggests that the value of STRIP that you got out of configure is wrong.
Assignee

Comment 13

5 years ago
Fixes LUL's EXIDX reader so as to be able to deal with elfhacked ARM
binaries.  This simplifies and cleans up LUL's reading of
.ARM.{exidx,extab} sections by reading directly from the in-process
image rather than from the temporarily mapped object file, thereby
sidestepping the previous not-really-correct biasing scheme used when
reading EXIDX.

This relies on the assumption that, for elfhacked binaries, the bias
(difference between stated-in-the-file and actual-in-the-process
virtual addresses) for the two text sections is the same.  I think
that ELF guarantees this for us, but I am not sure.  Mike, do you
know?  In any case I would say that the reader is more correct now
than it was before.

WIP patch; do not commit.  Needs further tidying up to do with the
range checks that guarantee we will not segfault when reading this
unwind info.
Reporter

Comment 14

5 years ago
> This relies on the assumption that, for elfhacked binaries, the bias
> (difference between stated-in-the-file and actual-in-the-process
> virtual addresses) for the two text sections is the same.

If you mean differences in sh_addr, then yes, that's guaranteed. That said, I won't claim to know how those sections work, but it /should/ be possible to use them purely in-memory, without reading the sections table. The exidx section is pointed at by PT_EXIDX, and I think extab elements are directly pointed at from exidx. (and if you don't want to parse elf headers at all, you can even get the exidx address, as well as its size, by calling __gnu_Unwind_Find_exidx)
(In reply to Mike Hommey [:glandium] from comment #14)
> If you mean differences in sh_addr, then yes, that's guaranteed. That said,
> I won't claim to know how those sections work, but it /should/ be possible
> to use them purely in-memory, without reading the sections table. The exidx
> section is pointed at by PT_EXIDX, and I think extab elements are directly
> pointed at from exidx.

Yes: the exidx points to both the text and (when necessary) the extab with relative addresses.  All that's necessary are the address/length from the PT_EXIDX.

> (and if you don't want to parse elf headers at all,
> you can even get the exidx address, as well as its size, by calling
> __gnu_Unwind_Find_exidx)

Isn't that one of those functions that's visible only because Android winds up exporting libgcc's private symbols from libc (and a few other libraries, IIRC)?
Assignee

Comment 16

5 years ago
Per comment 13, fixes LUL's EXIDX reader so it works with elfhacked
objects.

* reads EXIDX from the running process image, not the temporarily
  mapped object from which CFI etc is read

* this simplifies the EXIDX reader and removes hard-to-understand
  and probably buggy biasing calculations

* redo range checking so that we don't segfault should the .exidx/.extab
  not have been mapped in with at least r perms, for whatever reason
Attachment #8511953 - Attachment is obsolete: true
Attachment #8515915 - Flags: review?(mh+mozilla)
Reporter

Updated

5 years ago
Attachment #8515915 - Flags: review?(mh+mozilla) → review+
Reporter

Comment 17

5 years ago
Could you also refresh attachment 773808 [details] [diff] [review]? It predates b2g, for instance.
Assignee

Comment 18

5 years ago
Rebased.  I didn't change it in any other way.
Attachment #773808 - Attachment is obsolete: true
Reporter

Comment 19

5 years ago
(In reply to Julian Seward [:jseward] from comment #18)
> Created attachment 8518123 [details] [diff] [review]
> "elfhack and profiling don't conflict anymore", rebased
> 
> Rebased.  I didn't change it in any other way.

The point was that there are plenty of new mozconfigs that have --disable-elf-hack that didn't exist back then.
Assignee

Comment 20

5 years ago
(In reply to Mike Hommey [:glandium] from comment #19)
Ok, I'll redo it, removing all the --disable-elf-hacks I can find.
Assignee

Comment 21

5 years ago
(In reply to Julian Seward [:jseward] from comment #16)
> Created attachment 8515915 [details] [diff] [review]
> bug788974-fix-exidx-and-elfhack-2.diff

https://hg.mozilla.org/integration/mozilla-inbound/rev/d13417400605
Keywords: leave-open
Assignee

Comment 22

5 years ago
I found it a bit difficult to decide where to remove --disable-elf-hack.  In
the end I removed it everywhere except from build/unix/mozconfig.asan.  This
is on the basis that it could interact badly at least with the following
tools:

(1) LUL
(2) Jed Davis' EXIDX reader
(3) Valgrind
(4) ASan

but (1) and (3) are AFAICT ok with elfhack now, and I'd guess that (2)
isn't affected by it.  That leaves (4); I assume that ASan has its own
unwinder and I don't know the elfhack-happyness status of that.

I'm not too sure about all this, so would appreciate feedback.  Is
indeed (2) elfhack-OK and (4) not-elfhack-OK?  Are there other tools
to be considered?
Attachment #8518123 - Attachment is obsolete: true
Attachment #8519888 - Flags: feedback?(mh+mozilla)
Attachment #8519888 - Flags: feedback?(jld)
Comment on attachment 8519888 [details] [diff] [review]
bug788974-att773808-rebase2.diff

My ARM EH unwinder should be elfhack-compatible.  If I recall correctly, I was trying to avoid breaking elfhack in particular (multiple executable load commands, varying/nonzero difference between offset and vaddr, etc.), and avoid unnecessary assumptions about ELF usage in general.  I also recall testing it with elfhack at one point, though not recently. 


For the sake of completeness: the one corner of ELF that I know it doesn't handle is if there isn't a PT_LOAD covering the ELF header and program headers.  The one instance of that that I saw was the Android dynamic linker, but that's no longer the case as of [dcbc378] which is in JB and up.

[dcbc378]: https://android.googlesource.com/platform/bionic/+/dcbc3787bfb9a272a010
Attachment #8519888 - Flags: feedback?(jld) → feedback+
Reporter

Comment 25

5 years ago
Comment on attachment 8519888 [details] [diff] [review]
bug788974-att773808-rebase2.diff

Review of attachment 8519888 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know the status of ASAN vs. elfhack, but chances are it works. I'm fine leaving elfhack off for those builds until it is completely ruled out as being a problem, though.
Attachment #8519888 - Flags: feedback?(mh+mozilla) → feedback+
Reporter

Updated

5 years ago
Attachment #8519888 - Flags: feedback+ → review+
Reporter

Updated

5 years ago
Assignee: mh+mozilla → jseward
Keywords: leave-open
Target Milestone: --- → mozilla37
Reporter

Comment 28

2 years ago
Is there still a reason this is leave-open and not closed?
Flags: needinfo?(jseward)
Assignee

Comment 29

2 years ago
I have no idea why this is leave-open.  I think we can close it now.
Flags: needinfo?(jseward)
Reporter

Comment 30

2 years ago
Actually, the leave-open flag was removed 2.5 years ago... but the bug was not marked as FIXED (although it was marked status-firefox37: fixed).
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED

Comment 31

Last year
Pushed by mozilla@hocat.ca:
https://hg.mozilla.org/comm-central/rev/85e4ac0d421f
Port bug 788974: Don't disable elfhack when enabling profiling; r=me

Updated

Last year
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.