Open Bug 1035892 Opened 6 years ago Updated 2 months ago

All 64-bit mode crashes on OS X with reason EXC_BAD_ACCESS have crash addresses truncated to 32-bits

Categories

(Toolkit :: Crash Reporting, defect, P2, major)

x86_64
macOS
defect

Tracking

()

REOPENED
mozilla70

People

(Reporter: smichaud, Assigned: gsvelto)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

[This bug is spun off from bug 1002564, which has grown too unwieldy.  This bug covers issue #2 from bug 1002564 comment #14.]

Bugs in OS X Breakpad code cause crashes with "reason" EXC_BAD_ACCESS to have their crash addresses > 0xffffffff truncated to 32 bits.  (Crash addresses > 0x80000000 are also sign extended.)

A patch for this was already posted at bug 1002564 comment #26.  I'll update it to current trunk and repost it here.
Severity: normal → major
Attached patch Fix (obsolete) — Splinter Review
I based this partly on the following patch from Vlad:

http://people.mozilla.com/~vladimir/misc/breakpad-64-bit-exc.patch

I also found valuable information here:

http://stackoverflow.com/questions/2824105/handling-mach-exceptions-in-64bit-os-x-application

The only real documentation for this stuff is in the xnu source code.  The source for Apple's implementation of it is available at http://opensource.apple.com/.

Unlike Vlad, I made Breakpad (and the OS) use 64-bit "subcodes" in both 64-bit and 32-bit mode.  This works just fine in 32-bit mode (which I tested), and it makes the code simpler.

Ted, if you think this needs more reviewers, please add them as you see fit.

I've started tryserver builds here:
https://tbpl.mozilla.org/?tree=Try&rev=d9283e658e9c
Assignee: nobody → smichaud
Attachment #8452400 - Flags: review?(ted)

Taking this.

Assignee: smichaud → gsvelto
Status: NEW → ASSIGNED
Attachment #8452400 - Attachment is obsolete: true

I've updated the patch trying to address the review comments as best as I could. I've also double-checked other implementations and run both automated and manual tests and everything seems to work correctly. Heavy emphasis on seems considering how many unknown corner-cases there might be.

There seems to be an issue with marionette tests, it didn't show up locally in debug builds only try opt ones: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd1961ad3347270a4827ab719dff4fd303cd18cb&selectedJob=238146499

I'm testing again with a local opt build.

While I couldn't reproduce the failure locally it seems like it's a race in the minidump generation or something along the lines. In the marionette logs I can see three lock failures before the actual error:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=238146533&repo=try&lineNumber=39510-39513

Locally everything works fine, even after hundreds of runs with a heavy background load to slow down the system. This is definitely a race and possibly a narrow one. My changes here might have made it wide enough to reproduce frequently on try.

Interestingly we have a bunch of similar issues regarding marionette that occur only very rarely and I haven't been able to debug them yet (see bug 1523583). Hopefully this is something similar. There is also a known race in the crash reporting code (bug 1489536) but I don't think that's what causing the problem here.

Priority: -- → P2
Blocks: 1523276

I finally found some time to investigate the issue I encountered in comment 6... and it's gone. I'm running more tests to make sure it's really gone for good. It looked like a race between the crash generation and the front-end code so either the race is still there but not triggered or it was solved (most likely in the front-end code). Either way if testing comes up clean I'll land this, it's about time. This is my latest try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=900578d15112851fecfe57dd53dde0a8e04dcc2f

Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2b1ffc8004d5
Handle 64-bit addresses for EXC_BAD_ACCESS exceptions on Mac r=froydnj
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70
Status: RESOLVED → REOPENED
Flags: needinfo?(gsvelto)
Resolution: FIXED → ---

I have just tested on my Mac... and I don't reproduce, no matter how hard I try. I'll push this to try again but I'm wondering why this doesn't reproduce locally nor it happened on try before landing. Sigh. I'll never land this fix.

Flags: needinfo?(gsvelto)
Duplicate of this bug: 1002564
No longer blocks: 1523276
You need to log in before you can comment on or make changes to this bug.