Closed Bug 822584 Opened 11 years ago Closed 11 years ago

Breakpad can't deal with elfhacked binaries on ARM (B2G crash stacks are broken/corrupted)

Categories

(Firefox Build System :: General, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-basecamp:-, firefox19 fixed, b2g18 fixed)

RESOLVED FIXED
mozilla20
blocking-basecamp -
Tracking Status
firefox19 --- fixed
b2g18 --- fixed

People

(Reporter: glandium, Assigned: glandium)

References

Details

Attachments

(3 files)

+++ This bug was initially created as a clone of Bug #821353 +++

Let's keep bug 821353 for the breakpad issue. This bug is going to be about the elfhack workaround. The idea is to add another PT_LOAD to force the linker to leave no anonymous memory gap between the two executable PT_LOADs elfhack creates.
This effectively reverts part of bug 637341, and I don't think it's going to be a problem. (As a matter of fact, the new packager code from bug 780561 does that too)
Almost two years later, I'm not sure why I wanted to do bug 637341 in the first place.

This is required for the upcoming elfhack patch because stripping the elfhacked binary breaks the hack (strip modified the filler PT_LOAD in a way that makes it useless)
Attachment #693303 - Flags: review?(ted)
Assignee: nobody → mh+mozilla
The elfhack patch is going to be on top of bug 816494, so either we backport bug 816494 or a patch will have to be written for b2g18 alone. The patch in bug 816494 has been tested for long enough that I'm confident it won't break anything on b2g18.
Depends on: 816494
Someone will have to test these patches on a device. All I get from a b2g build on my otoro at the moment is a white screen, and i can't find a way to trigger a crash that generates a minidump.
Attachment #693303 - Flags: review?(ted) → review+
Attachment #693332 - Flags: review?(ted) → review+
(In reply to Mike Hommey [:glandium] from comment #5)
> Someone will have to test these patches on a device. All I get from a b2g
> build on my otoro at the moment is a white screen, and i can't find a way to
> trigger a crash that generates a minidump.

Ok, I was able to get a minidump from my build, and the result looks good: one MDRawModule entry covering libxul.so in its entirety. The memory mapping is not entirely as I would like it to be though, the linker maps the filler PT_LOAD read/executable, while there's no flag on the segment (glibc properly loads it with no flag ; probably a bionic bug), but it doesn't impair functionality.
(In reply to Mike Hommey [:glandium] from comment #5)
> i can't find a way to
> trigger a crash that generates a minidump.

FYI, a |kill -ABRT| on the b2g or a plugin-container process should cause our crash reporter to kick in and generate/send a report.
(In reply to Mike Hommey [:glandium] from comment #1)
> This effectively reverts part of bug 637341

IIUC, after this, we need to make sure that the output of |make buildsymbols| and |make package| are still working well together, right?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #8)
> (In reply to Mike Hommey [:glandium] from comment #1)
> > This effectively reverts part of bug 637341
> 
> IIUC, after this, we need to make sure that the output of |make
> buildsymbols| and |make package| are still working well together, right?

They do, afaict.

(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #7)
> (In reply to Mike Hommey [:glandium] from comment #5)
> > i can't find a way to
> > trigger a crash that generates a minidump.
> 
> FYI, a |kill -ABRT| on the b2g or a plugin-container process should cause
> our crash reporter to kick in and generate/send a report.

The subtle thing is, the crash reporter is not enabled by default on custom builds. I keep forgetting that. (and bug 822646 didn't help)
Nathan is out all week, do we have an alternate reviewer?
Comment on attachment 693330 [details] [diff] [review]
Workaround in elfhack to accomodate for breakpad not handling the memory mapping induced by the elfhack/bionic linker combination

(In reply to JP Rosevear [:jpr] from comment #10)
> Nathan is out all week, do we have an alternate reviewer?

Let's try jimb or ted.
Attachment #693330 - Flags: review?(ted)
Attachment #693330 - Flags: review?(nfroyd)
Attachment #693330 - Flags: review?(jimb)
Target Milestone: --- → B2G C3 (12dec-1jan)
Comment on attachment 693330 [details] [diff] [review]
Workaround in elfhack to accomodate for breakpad not handling the memory mapping induced by the elfhack/bionic linker combination

With bug 822432 under work, we can wait for a review from someone familiar with this code.
Attachment #693330 - Flags: review?(ted)
Attachment #693330 - Flags: review?(nfroyd)
Attachment #693330 - Flags: review?(jimb)
Due to bug 822432 this is no longer a blocker, we have a server side work around
blocking-basecamp: + → -
Comment on attachment 693330 [details] [diff] [review]
Workaround in elfhack to accomodate for breakpad not handling the memory mapping induced by the elfhack/bionic linker combination

Review of attachment 693330 [details] [diff] [review]:
-----------------------------------------------------------------

This looks sane enough.  I'd like to get my concerns addressed before giving r+ on this.

::: build/unix/elfhack/elf.cpp
@@ +584,5 @@
>  }
>  
>  unsigned int ElfSegment::getFileSize()
>  {
> +    if (type == PT_GNU_RELRO || (type == PT_LOAD && flags == 0))

It would be an improvement if the new check was pulled out into its own function with some obvious name.

@@ +603,5 @@
>  }
>  
>  unsigned int ElfSegment::getMemSize()
>  {
> +    if (type == PT_GNU_RELRO || (type == PT_LOAD && flags == 0))

...and used here.

@@ +621,5 @@
>          (sections.front()->getAddr() != vaddr))
>          throw std::runtime_error("PT_GNU_RELRO segment doesn't start on a section start");
>  
> +    // Neither bionic nor glibc linkers seem to like when the offset of that segment is 0
> +    if (type == PT_LOAD && flags == 0)

...and here.

@@ +633,5 @@
>      if ((type == PT_GNU_RELRO) && !sections.empty() &&
>          (sections.front()->getAddr() != vaddr))
>          throw std::runtime_error("PT_GNU_RELRO segment doesn't start on a section start");
>  
> +    return sections.empty() ? vaddr : sections.front()->getAddr();

...and I think here, too, even?  Maybe like:

if (isElfHackLoadSegment())
  return vaddr;

would be sufficient, instead of changing the return statement?

::: build/unix/elfhack/elfhack.cpp
@@ +415,5 @@
> +            phdr.p_vaddr = (previous->getAddr() + previous->getSize() + segment->getAlign() - 1) & ~(segment->getAlign() - 1);
> +            phdr.p_paddr = phdr.p_vaddr + segment->getVPDiff();
> +            phdr.p_flags = 0;
> +            phdr.p_align = 0;
> +            phdr.p_filesz = (section->getAddr() & ~(newSegment->getAlign() - 1)) - phdr.p_vaddr;

The computations here don't look right.  We have:

...:
  previous @ address Y
newSegment:
  section @ address X
  ...more sections
newAnonymousSegment: /* we are creating this one */

and we're basing the vaddr of newAnonymousSegment off of Y (though I think p_vaddr here ultimately winds up near X) and the filesize off of X - (X - epsilon)--which looks like it's going to wrap or be very small?

In any event, I think setting p_filesz to 0 and p_memsz to whatever size we compute is more correct in this case; this fake section is effectively .bss.

@@ +420,5 @@
> +            phdr.p_memsz = phdr.p_filesz;
> +            if (phdr.p_filesz) {
> +                newSegment = new ElfSegment(&phdr);
> +                elf->insertSegmentAfter(segment, newSegment);
> +            }

I think this would be slightly clearer as:

if (fill)
  insertFakeLoadSegment(elf, section, ...);
break;

but you may have to pass a lot of args to insertFakeLoadSegment, so I'm not sure if it's worth it.

@@ +699,5 @@
>      ElfSegment *second = elf.getSegmentByType(PT_LOAD, first);
> +    ElfSegment *filler = NULL;
> +    // If the second PT_LOAD has empty flags, the file was elfhacked with
> +    // the --fill option.
> +    if (!second->getFlags()) {

...can use the new function here too.
Attachment #693330 - Flags: review?(nfroyd) → review-
(In reply to Nathan Froyd (:froydnj) from comment #14)
> Comment on attachment 693330 [details] [diff] [review]
> Workaround in elfhack to accomodate for breakpad not handling the memory
> mapping induced by the elfhack/bionic linker combination
> 
> Review of attachment 693330 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks sane enough.  I'd like to get my concerns addressed before giving
> r+ on this.
> 
> ::: build/unix/elfhack/elf.cpp
> @@ +584,5 @@
> >  }
> >  
> >  unsigned int ElfSegment::getFileSize()
> >  {
> > +    if (type == PT_GNU_RELRO || (type == PT_LOAD && flags == 0))
> 
> It would be an improvement if the new check was pulled out into its own
> function with some obvious name.
> 
> @@ +603,5 @@
> >  }
> >  
> >  unsigned int ElfSegment::getMemSize()
> >  {
> > +    if (type == PT_GNU_RELRO || (type == PT_LOAD && flags == 0))
> 
> ...and used here.
> 
> @@ +621,5 @@
> >          (sections.front()->getAddr() != vaddr))
> >          throw std::runtime_error("PT_GNU_RELRO segment doesn't start on a section start");
> >  
> > +    // Neither bionic nor glibc linkers seem to like when the offset of that segment is 0
> > +    if (type == PT_LOAD && flags == 0)
> 
> ...and here.
> 
> @@ +633,5 @@
> >      if ((type == PT_GNU_RELRO) && !sections.empty() &&
> >          (sections.front()->getAddr() != vaddr))
> >          throw std::runtime_error("PT_GNU_RELRO segment doesn't start on a section start");
> >  
> > +    return sections.empty() ? vaddr : sections.front()->getAddr();
> 
> ...and I think here, too, even?  Maybe like:
> 
> if (isElfHackLoadSegment())
>   return vaddr;
> 
> would be sufficient, instead of changing the return statement?
> 
> ::: build/unix/elfhack/elfhack.cpp
> @@ +415,5 @@
> > +            phdr.p_vaddr = (previous->getAddr() + previous->getSize() + segment->getAlign() - 1) & ~(segment->getAlign() - 1);
> > +            phdr.p_paddr = phdr.p_vaddr + segment->getVPDiff();
> > +            phdr.p_flags = 0;
> > +            phdr.p_align = 0;
> > +            phdr.p_filesz = (section->getAddr() & ~(newSegment->getAlign() - 1)) - phdr.p_vaddr;
> 
> The computations here don't look right.  We have:
> 
> ...:
>   previous @ address Y
> newSegment:
>   section @ address X
>   ...more sections
> newAnonymousSegment: /* we are creating this one */
> 
> and we're basing the vaddr of newAnonymousSegment off of Y (though I think
> p_vaddr here ultimately winds up near X) and the filesize off of X - (X -
> epsilon)--which looks like it's going to wrap or be very small?

The point is to have a segment starting at the first page after Y and ending at the end of the page before X. Which that computation just does.

> In any event, I think setting p_filesz to 0 and p_memsz to whatever size we
> compute is more correct in this case; this fake section is effectively .bss.

It can't be effectively .bss, because the linker would use anonymous memory, and we'd be back to square one.
Comment on attachment 693330 [details] [diff] [review]
Workaround in elfhack to accomodate for breakpad not handling the memory mapping induced by the elfhack/bionic linker combination

Ah, you're right!  I was getting the calls to insertSegmentAfter mixed up: I see that you're inserting them both after |segment|.

r=me with the |type == PT_LOAD && flags == 0| check moved out as suggested; do what you like with the code for the creation of the fake segment.
Attachment #693330 - Flags: review- → review+
Do we want this landed on b2g18?
Target Milestone: B2G C3 (12dec-1jan) → ---
Comment on attachment 693303 [details] [diff] [review]
Run elfhack on packaged binaries instead of built binaries

[Approval Request Comment]
See comment 1 as to why this is required in this patch queue.
Testing completed (on m-c, etc.): On m-c since yesterday.
Risk to taking this patch (and alternatives if risky): This only affects the debug symbol files we can download from symbols.mozilla.org for Linux and Android builds (not the breakpad symbol files, the ones for gdb). As far as my testing goes, gdb is happy with the discrepancy.
String or UUID changes made by this patch: None
Attachment #693303 - Flags: approval-mozilla-b2g18?
Attachment #693303 - Flags: approval-mozilla-aurora?
Comment on attachment 693330 [details] [diff] [review]
Workaround in elfhack to accomodate for breakpad not handling the memory mapping induced by the elfhack/bionic linker combination

[Approval Request Comment]
User impact if declined: crash dumps are wrong in a way that breakpad doesn't like when doing stack walking, effectively breaking crash reports in socorro. We do have a workaround in place on the server side to unbreak the crash dumps when they arrive, but it would be better if they weren't received broken.
Testing completed (on m-c, etc.): This was only tested locally, because the builds on m-c don't enable elfhack, but it works as expected on my local builds and generates proper crash dumps.
Risk to taking this patch (and alternatives if risky): The patch itself only adds a new feature to elfhack, which is enabled with the 3rd patch in this bug. Backing out the 3rd patch is enough to get back to the state where crash dumps are received broken. The default behaviour of elfhack is unchanged.
String or UUID changes made by this patch: None
Attachment #693330 - Flags: approval-mozilla-b2g18?
Attachment #693330 - Flags: approval-mozilla-aurora?
Comment on attachment 693332 [details] [diff] [review]
Enable elfhack workaround for breakpad on b2g/gonk

[Approval Request Comment]
Enables the feature provided by the 2nd patch on B2G only, when elfhack is enabled.
See approval request comment for the 2nd patch.
Attachment #693332 - Flags: approval-mozilla-b2g18?
Attachment #693332 - Flags: approval-mozilla-aurora?
Comment on attachment 693303 [details] [diff] [review]
Run elfhack on packaged binaries instead of built binaries

Build-only workaround for bug 821353.
Attachment #693303 - Flags: approval-mozilla-b2g18?
Attachment #693303 - Flags: approval-mozilla-b2g18+
Attachment #693303 - Flags: approval-mozilla-aurora?
Attachment #693303 - Flags: approval-mozilla-aurora+
Attachment #693332 - Flags: approval-mozilla-b2g18?
Attachment #693332 - Flags: approval-mozilla-b2g18+
Attachment #693332 - Flags: approval-mozilla-aurora?
Attachment #693332 - Flags: approval-mozilla-aurora+
Attachment #693330 - Flags: approval-mozilla-b2g18?
Attachment #693330 - Flags: approval-mozilla-b2g18+
Attachment #693330 - Flags: approval-mozilla-aurora?
Attachment #693330 - Flags: approval-mozilla-aurora+
Depends on: 835214
Blocks: 835214
No longer depends on: 835214
Depends on: 940250
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.