Closed Bug 1018360 Opened 10 years ago Closed 10 years ago

Publicly display thread state for crashing thread on Intel 64-bit processors

Categories

(Socorro :: General, task)

x86_64
All
task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: smichaud, Unassigned)

References

Details

Intel 64-bit (i.e. amd64) CPUs have a quirk that sometimes prevents the "crash address" for illegal memory access crashes from being accurately reported:  Their address space isn't actually 64 bits wide.  So if you crash by trying to access an invalid address (one > 0x7fffffffffff), the crash address isn't stored or reported.

If you crash trying to access a valid address that you don't have permission to access, the Intel 64-bit CPU stores this value in the CR2 register, and it gets reported to Breakpad's crash handler by kernel-level code.  But this doesn't happen if your "address" > 0x7fffffffffff.  In this case the "crash address" gets reported as 0xffffffffffffffff (on Windows, see bug 974420) or 0 (on OS X, see bug 1002564 comment #14).

Needless to say, this makes it much more difficult to figure out these crashes.

Short of Intel/AMD changing the design of their 64-bit CPUs, there is no cure for this problem.  But the crash address will almost always have been stored in a user-level register just before the crash (and the crash will have been triggered by trying to access memory at the "address" in that register).  So seeing the contents of all these registers (the "thread state") as of the crash will often allow us to guess/infer the crash address, even if it wasn't reported by kernel-level code.
Breakpad currently stores this information in minidumps, plus much more:  It stores the thread state for every thread, and for each frame of each thread's stack.  (The actual thread state is stored for the top frame.)

But this information isn't displayed by Socorro, at least publicly.  Surely part of the reason is that it's very bulky.  But apparently people also think that this information could be used to identify the person who submitted a given minidump.

However, the thread state (properly so called) for the crashing thread (for its top frame) isn't particularly bulky.  And I find it hard to believe that having this information would make it any easier to match up a particular crash report with a particular user.

Any qualms about these issues are heavily outweighed by the fact that having this information is often extremely useful.  It will often make the difference between whether or not a bug can be fixed.  Bug 997908 is a good case.
Severity: normal → major
Up until very recently, minidump-stackwalk didn't report the full thread state for even the top frame of each thread's stack.  But this has now been fixed.  See bug 1002564 comment #39 and http://breakpad.appspot.com/7654002/.
> But this has now been fixed

Only partly.  See bug 1002564 comment #43.
Benjamin, how would we best present this in Socorro so that it's as devs would expect?

Ted, does Breakpad expose what we need?
Kairo: per the preceding comment, we don't have this in the JSON output yet, but it's fairly easy to add.
Currently in the "raw dump" tab we display the pipe-delimited output. We should switch to displaying the JSON output (prettified if necessary) without the restricted data. This is strictly more information, and would make it possible to show other data in the future without having to code custom display.
bug 976077 covers using the JSON dump to build everything in report/index, probably falls under the purview of that.
My patch in bug 1030276 will let you view this in the "Raw Dump" tab, that's probably sufficient for now.
Ted:

When/if you do a mockup, please let us know.  Otherwise let us know when your patch has landed and we can see its results in production Socorro.
> http://people.mozilla.org/~tmielczarek/raw-json.png

That actually doesn't show all the registers.  Could you make the window bigger, or scroll it down, before making another snapshot?
I'll just copy and paste what's there:
    "registers": {
        "r10": "0x00007fff928d2c70",
        "r11": "0x00007f14226002cf",
        "r12": "0x00007f13fea509e8",
        "r13": "0xfffffffffffffffc",
        "r14": "0x0000000000000008",
        "r15": "0x00007fff928d2dc0",
        "r8": "0x00007fff928d2cb0",
        "r9": "0x00007f13febd73f8",
        "rax": "0x00007f14226002f1",
        "rbp": "0x00007fff928d2d20",
        "rbx": "0x00007fff928d2dd0",
        "rcx": "0x00007f1431c13ff0",
        "rdi": "0x0000000000000000",
        "rdx": "0xffffffffffffff51",
        "rip": "0x00007f14226002f1",
        "rsi": "0x00007f13febd73f8",
        "rsp": "0x00007fff928d2d00"
    }
Thanks!  Looks good to me.
The fix for bug 1030276 has gotten into production Socorro, so I'd say this is now fixed.

But before I mark it so, I'm going to play around with various kinds of crashes on various threads.
I tested with various crash addresses, on the main thread and off, and didn't see any problems.  In all cases the raw dump displayed the user-level registers for the crashing thread, including one that contained the crash address.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I should mention that I tested only on OS X.  But, as bug 1030276 only changed "cross-platform" code, I think this bug must also be fixed for crash reports on Windows and Linux.
Thanks again, Ted!

Now that we've started memory poisoning, there will be lots more cases where the "crash address" > 0x7fffffffffff (in 64-bit mode).  So this fix will be crucial in detecting cases of memory poisoning.

It will also make it possible to work around our current lack of a reviewed patch for bug 1002564.
You're welcome, I'm glad that was useful! It should work on any platform, the underlying data structures are the same.
See Also: → 974420
You need to log in before you can comment on or make changes to this bug.