Open Bug 1035892 Opened 6 years ago Updated 2 months ago

All 64-bit mode crashes on OS X with reason EXC_BAD_ACCESS have crash addresses truncated to 32 bits


(Toolkit :: Crash Reporting, defect, P2, major)






(Reporter: smichaud, Assigned: gsvelto)


(Blocks 1 open bug)



(1 file, 1 obsolete file)

[This bug is spun off from bug 1002564, which has grown too unwieldy.  This bug covers issue #2 from bug 1002564 comment #14.]

Bugs in OS X Breakpad code cause crashes with "reason" EXC_BAD_ACCESS to have their crash addresses > 0xffffffff truncated to 32 bits.  (Crash addresses > 0x80000000 are also sign extended.)
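To make the symptom concrete, here is a minimal sketch of how an address would be mangled if it passed through a signed 32-bit field and were then widened back to 64 bits. This is only one plausible reading of the description above, not the actual Breakpad code; the function names are hypothetical, and the sketch assumes sign extension kicks in whenever bit 31 of the truncated value is set.

```python
MASK_32 = 0xFFFFFFFF
MASK_64 = 0xFFFFFFFFFFFFFFFF


def truncate_to_32_bits(address: int) -> int:
    """Keep only the low 32 bits, as a 32-bit-wide field would."""
    return address & MASK_32


def sign_extend_32_to_64(value: int) -> int:
    """Treat a 32-bit value as signed and widen it to an unsigned 64-bit value."""
    value &= MASK_32
    if value & 0x80000000:  # bit 31 set: the value is negative as an int32
        value -= 1 << 32
    return value & MASK_64


def reported_address(real_address: int) -> int:
    """Hypothetical model of the address that ends up in the crash report."""
    return sign_extend_32_to_64(truncate_to_32_bits(real_address))


if __name__ == "__main__":
    # Illustrative addresses only; they are not taken from real crash reports.
    for real in (0x1000, 0x00007FFF5FBFF000, 0x0000000190000000):
        print(f"real {real:#018x} -> reported {reported_address(real):#018x}")
```

Under this model a low address survives unchanged, a typical 64-bit address loses its upper half, and an address whose low 32 bits have bit 31 set comes back with the upper 32 bits filled with ones.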

A patch for this was already posted at bug 1002564 comment #26.  I'll update it to current trunk and repost it here.
Severity: normal → major
Attached patch Fix (obsolete) — Splinter Review
I based this partly on the following patch from Vlad:

I also found valuable information here:

The only real documentation for this stuff is in the xnu source code.  The source for Apple's implementation of it is available at

Unlike Vlad, I made Breakpad (and the OS) use 64-bit "subcodes" in both 64-bit and 32-bit mode.  This works just fine in 32-bit mode (which I tested), and it makes the code simpler.

Ted, if you think this needs more reviewers, please add them as you see fit.

I've started tryserver builds here:
Assignee: nobody → smichaud
Attachment #8452400 - Flags: review?(ted)

Taking this.

Assignee: smichaud → gsvelto
Attachment #8452400 - Attachment is obsolete: true

I've updated the patch to address the review comments as best I could. I've also double-checked other implementations and run both automated and manual tests, and everything seems to work correctly. Heavy emphasis on "seems", considering how many unknown corner cases there might be.

There seems to be an issue with the Marionette tests. It didn't show up locally in debug builds, only in opt builds on try:

I'm testing again with a local opt build.

While I couldn't reproduce the failure locally, it looks like a race in the minidump generation or something along those lines. In the Marionette logs I can see three lock failures before the actual error:

Locally everything works fine, even after hundreds of runs with a heavy background load to slow down the system. This is definitely a race and possibly a narrow one. My changes here might have made it wide enough to reproduce frequently on try.

Interestingly, we have a bunch of similar Marionette issues that occur only very rarely and that I haven't been able to debug yet (see bug 1523583). Hopefully this is something similar. There is also a known race in the crash reporting code (bug 1489536), but I don't think that's what's causing the problem here.

Priority: -- → P2
Blocks: 1523276

I finally found some time to investigate the issue I encountered in comment 6... and it's gone. I'm running more tests to make sure it's really gone for good. It looked like a race between the crash generation and the front-end code, so either the race is still there but no longer triggered, or it was fixed (most likely in the front-end code). Either way, if testing comes up clean I'll land this; it's about time. This is my latest try run:

Pushed by
Handle 64-bit addresses for EXC_BAD_ACCESS exceptions on Mac r=froydnj
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70
Flags: needinfo?(gsvelto)
Resolution: FIXED → ---

I have just tested on my Mac... and I can't reproduce it, no matter how hard I try. I'll push this to try again, but I'm wondering why this doesn't reproduce locally and why it didn't happen on try before landing. Sigh. I'll never land this fix.

Flags: needinfo?(gsvelto)
Duplicate of this bug: 1002564
No longer blocks: 1523276