Closed Bug 1322611 Opened 8 years ago Closed 7 years ago

Create new field for client_crash_id to be populated from breakpad

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla55
Tracking Status
firefox55 --- verified

People

(Reporter: ddurst, Assigned: gsvelto)

References

Details

Attachments

(1 file)

We want to be able to align the breakpad-generated crash id with a crash report. To do that, we would pass the crash id along with the crash report.

I'm nominating 'client_crash_id' just to keep it unique across both systems (and not tied to breakpad), but something else would be fine.

This would be part of bug 1293656, as we were trying to do this with 'crash_id' previously.

(I wasn't sure where to file this, so feel free to change as necessary.)
Flags: needinfo?(chris.lonnen)
Blocks: 1328657
This needs to be fixed in the crashreporter client: in the standalone C++ app, CrashSubmit.jsm, and the Android CrashReporter.java. It should be as simple as taking the GUID from the .dmp filename and adding it as an additional POST field.
Component: General → Breakpad Integration
Product: Socorro → Toolkit
@ddurst -- I've thought a little more about this, now. I'd rather not trust a client-assigned ID. How could we rely on a server-assigned ID instead? Submitting both payloads in the same request?
Flags: needinfo?(chris.lonnen)
lonnen and I chatted about this briefly on IRC and I understand his concern here, since the IDs will be publicly-visible and it'd be easy for a malicious client to submit existing IDs to collide. I suggested that we instead generate the ID by hashing the minidump file contents, so Socorro could generate it server-side, and the client could generate it when doing the client-side stackwalking.
This wfm, since we get one step closer to having this be the same submission.
+1
I'm starting work on this, any preference for the hashing function? Should I use something like SHA256? Also, do we still want to call this client_crash_id? Maybe minidump_hash or minidump_sha256_hash would be better?
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
After some offline discussions it looks like this is going to be a SHA256 hash of the minidump so Socorro can re-generate it easily. I was thinking of calling the ping's field |minidumpSha256Hash| for maximum clarity since it's not really an id (i.e. it doesn't guarantee that there won't be any duplicates, it's just very, very unlikely).
Flags: needinfo?(ddurst)
Flags: needinfo?(ddurst)
I've updated the patch with Fennec support so this works across all platforms now.
Comment on attachment 8839844 [details]
Bug 1322611 - After a crash compute the SHA256 hash of a minidump and attach it to the crash ping;

https://reviewboard.mozilla.org/r/114426/#review118700

::: mobile/android/base/java/org/mozilla/gecko/CrashReporter.java:274
(Diff revision 2)
>      public void onRestartClick(View v) {  // bound via crash_reporter.xml
>          doRestart();
>          backgroundSendReport();
>      }
>  
> +    private void computeMinidumpHash(File extraFile, File minidump) {

Can you get someone from the Android team to review this? I don't feel like I'm a competent reviewer of Java code.

::: toolkit/components/crashes/CrashService.js:28
(Diff revision 2)
> + *          minidump. If the hash could not be computed then null is returned
> + *          instead.
> + */
> +function computeMinidumpHash(id) {
> +  let cr = Cc["@mozilla.org/toolkit/crash-reporter;1"]
> +             .getService(Components.interfaces.nsICrashReporter);

Random thought: there's a `crashmanager` on Services.jsm, maybe we should add a `crashreporter` there as well?

::: toolkit/components/crashes/CrashService.js:43
(Diff revision 2)
> +
> +      let hashBin = hasher.finish(false);
> +      let hash = "";
> +
> +      for (let i = 0; i < hashBin.length; i++) {
> +        // Every character in the hash string contains a byte of the hash data

Wow, do we really not have a `tohexString()` method on whatever `nsICryptoHash.finish` returns?

::: toolkit/crashreporter/client/crashreporter.cpp:439
(Diff revision 2)
>  {
>    srand(time(0));
>    return ((rand() % 100) < MOZ_CRASHREPORTER_ENABLE_PERCENT);
>  }
>  
> +// Class used to compute a SHA256 hash

Can we please do almost anything other than add another C++ hash implementation? I would be fine with attempting to use system libraries, importing a Rust crate, even linking the crashreporter against Firefox's copy of NSS and using that.
Attachment #8839844 - Flags: review?(ted) → review-
Aside from my one (admittedly non-trivial) sticking point the rest of the patch looks fine.
Comment on attachment 8839844 [details]
Bug 1322611 - After a crash compute the SHA256 hash of a minidump and attach it to the crash ping;

https://reviewboard.mozilla.org/r/114426/#review118700

> Can you get someone from the Android team to review this? I don't feel like I'm a competent reviewer of Java code.

Sure, I'll find someone.

> Random thought: there's a `crashmanager` on Services.jsm, maybe we should add a `crashreporter` there as well?

Good point, it's used in a bunch of places already and we're only going to use it more as we ramp up the number of processes we have.

> Wow, do we really not have a `tohexString()` method on whatever `nsICryptoHash.finish` returns?

Yup. nsICryptoHash has provisions to turn that into a base64 string but not into a hex one. Maybe I could add a method there instead.

> Can we please do almost anything other than add another C++ hash implementation? I would be fine with attempting to use system libraries, importing a Rust crate, even linking the crashreporter against Firefox's copy of NSS and using that.

I understand your feelings on this, working on the crashreporter sometimes feel like getting an assignment in college where you have to reimplement something trivial and already widely available. I'll gladly use NSS instead though as with everything that's crashreporter-related I'll need to jump through some hoops to get it working. On Windows and Linux I can just link with the in-tree NSS. On Linux thought I'll need to dlopen() it instead, here's why: breakpad uses libcurl internally to send the crashreport to our servers and since breakpad doesn't bundle libcurl (nor do we) it dlopen()s it. But libcurl depends on NSS and we don't want it to pick the in-tree NSS because it can be quite a bit different than whatever the system libcurl was built agains:

https://dxr.mozilla.org/mozilla-central/rev/eb23648534779c110f3a1f2baae1849ae4a9c570/toolkit/crashreporter/nsExceptionHandler.cpp#873

So the best we can in Linux is also dlopen() NSS; and dynamically check if it supports SHA256 hashes. In the unlikely case it doesn't we can just skip generating that field safely.

If this sounds OK-ish I'll go ahead and do it.
Quick update: I'm working on it and the basic approach seems to work (dlopen() NSS on Linux, link directly on Mac and Windows) but it needs some extra massaging on Mac because the libraries we package are not in the same folder as the crashreporter app so I either need to set the appropriate DYLD_LIBRARY_PATH when launching it.
I've updated the patch to use NSS instead of custom code to compute the SHA256 hash in the crashreporter client and flagged James for review for the Fennec/Java changes.

Using NSS turned out to be complicated, the changes I had to make are the following:

- On Windows just use the NSS library we build and bundle

- On Mac use the NSS library we build but provide a custom DYLD_LIBRARY_PATH override so that the crashreporter client can pick it up correctly. Since I was at it I've removed the Mac-specific code that used posix_spawnp() from the exception handler and used the same fork()/exec() path we use on Linux. posix_spawnp() was introduced to allow us to pick the right processor in PowerPC/x86/x86-64 universal builds. We now only support x86-64 so a separate path isn't needed anymore.

- On Linux I dlopen() the system-provided NSS library because the breakpad code dlopen()s libcurl, and libcurl is linked against NSS so we can't use the version we build ourselves.
Attachment #8839844 - Flags: review?(snorp) → review?(nchen)
Comment on attachment 8839844 [details]
Bug 1322611 - After a crash compute the SHA256 hash of a minidump and attach it to the crash ping;

https://reviewboard.mozilla.org/r/114426/#review120574

::: mobile/android/base/java/org/mozilla/gecko/CrashReporter.java:275
(Diff revision 3)
>          doRestart();
>          backgroundSendReport();
>      }
>  
> +    private void computeMinidumpHash(File extraFile, File minidump) {
> +        try {

The patch closes `writer` but not `stream`, and in general, you need two try blocks to properly close the file in case of exception. e.g.,

    try {
        Stream steam = new Stream();
        
        try {
            stream.read();
        } finally {
            stream.close();
        }
    } catch (IOException e) {
        // ...
    }

::: mobile/android/base/java/org/mozilla/gecko/CrashReporter.java:287
(Diff revision 3)
> +            while ((readBytes = stream.read(buffer)) != -1) {
> +                md.update(buffer, 0, readBytes);
> +            }
> +
> +            byte[] digest = md.digest();
> +            StringWriter hash = new StringWriter(84);

Use `StringBuilder` instead of `StringWriter`

::: mobile/android/base/java/org/mozilla/gecko/CrashReporter.java:301
(Diff revision 3)
> +            hash.write('\n');
> +
> +            FileWriter writer = new FileWriter(extraFile, /* append */ true);
> +            writer.write(hash.toString());
> +            writer.close();
> +        } catch (Exception e) {

IOException
Attachment #8839844 - Flags: review?(nchen) → review-
Thanks for the review Jim, I'll address your comments and post an updated patch shortly.
Attachment #8839844 - Flags: review?(snorp) → review?(nchen)
Comment on attachment 8839844 [details]
Bug 1322611 - After a crash compute the SHA256 hash of a minidump and attach it to the crash ping;

https://reviewboard.mozilla.org/r/114426/#review121612

r+ for CrashReporter.java
Attachment #8839844 - Flags: review?(nchen) → review+
(In reply to Gabriele Svelto [:gsvelto] from comment #16)
> - On Linux I dlopen() the system-provided NSS library because the breakpad
> code dlopen()s libcurl, and libcurl is linked against NSS so we can't use
> the version we build ourselves.

I'm a little worried that this won't work reliably since I think there are distros where libcurl is linked against OpenSSL or GNUTLS. I guess if we fail gracefully in these situations and just don't send the minidump hash it won't be super harmful. I guess we'll be able to measure this in practice by counting crash reports that are missing the dump hash.
Comment on attachment 8839844 [details]
Bug 1322611 - After a crash compute the SHA256 hash of a minidump and attach it to the crash ping;

https://reviewboard.mozilla.org/r/114426/#review123900

::: toolkit/components/crashes/CrashService.js:49
(Diff revision 4)
> +        hash += ("0" + hashBin.charCodeAt(i).toString(16)).slice(-2);
> +      }
> +
> +      return hash;
> +    } catch (e) {
> +      Cu.reportError(e);

As a follow-up, maybe we could figure out a better error reporting story here? I guess we can wait and see if we get crash reports that are missing the dump hash to see if this actually occurs in practice.

::: toolkit/crashreporter/client/crashreporter.cpp:472
(Diff revision 4)
> +
> +  if (!lib) {
> +    return "";
> +  }
> +
> +  SECStatus (*NSS_Initialize)(const char*, const char*, const char*,

I know I've said it before, but now I'm practically shaking with glee at the thought of ripping all this code out and replacing it with Rust.

::: toolkit/crashreporter/client/crashreporter.cpp:547
(Diff revision 4)
> +  HASH_Destroy(hashContext);
> +
> +  if (!error) {
> +    ostringstream hash;
> +
> +    for (size_t i = 0; i < SHA256_LENGTH; i++) {

I feel like I would just write this like:
```
string hash(SHA256_LENGTH * 2 + 1, '\0');
for (size_t i = 0; i < SHA256_LENGTH; i++) {
  snprintf(&hash[i * 2], 3, "%02x", result[i]);
}
```

::: toolkit/crashreporter/client/crashreporter.cpp:767
(Diff revision 4)
>      // Assemble and send the crash ping
> +    string hash;
>      string pingUuid;
>  
> -    if (SendCrashPing(queryParameters, pingUuid)) {
> +    hash = ComputeDumpHash();
> +    AppendToEventFile("MinidumpSha256Hash", hash);

Do you want an `if (!hash.empty())` here?

::: toolkit/crashreporter/client/crashreporter_osx.mm:902
(Diff revision 4)
>  bool UIDeleteFile(const string& file)
>  {
>    return (unlink(file.c_str()) != -1);
>  }
>  
> -std::ifstream* UIOpenRead(const string& filename)
> +std::ifstream* UIOpenRead(const string& filename, bool binary)

Can the Linux and OSX impls just go into `_unix_common` since they're identical?

::: toolkit/crashreporter/nsExceptionHandler.cpp
(Diff revision 4)
>  
>      CloseHandle(pi.hProcess);
>      CloseHandle(pi.hThread);
>    }
>  #elif defined(XP_UNIX)
> -#ifdef XP_MACOSX

Thanks for removing this code while you were here!

::: toolkit/crashreporter/nsExceptionHandler.cpp:843
(Diff revision 4)
> +#ifdef XP_LINUX
>      // need to clobber this, as libcurl might load NSS,
>      // and we want it to load the system NSS.
>      unsetenv("LD_LIBRARY_PATH");
> +#else // XP_MACOSX
> +    // Needed to locate NSS and its dependencies

I'm sure there's something smarter we could do here but I'm not overly motivated to make you find out. (Presumably we'd have to make the library references in the crashreporter binary be `@executable_path/../../../MacOS` or something like that.)
Attachment #8839844 - Flags: review?(ted) → review+
Comment on attachment 8839844 [details]
Bug 1322611 - After a crash compute the SHA256 hash of a minidump and attach it to the crash ping;

https://reviewboard.mozilla.org/r/114426/#review123900

> As a follow-up, maybe we could figure out a better error reporting story here? I guess we can wait and see if we get crash reports that are missing the dump hash to see if this actually occurs in practice.

Yeah, I've designed most of the error handling to fail silently without adding the annotation. If we measure that it's impacting a signficant number of users we can change it and do it in a smarter way.

> I know I've said it before, but now I'm practically shaking with glee at the thought of ripping all this code out and replacing it with Rust.

I feel your pain. I'll make it happen, I promise, I can't stand writing more of these C++ hacks myself :)

> Do you want an `if (!hash.empty())` here?

Yes, good catch.

> Can the Linux and OSX impls just go into `_unix_common` since they're identical?

Yes, I'll do it.

> I'm sure there's something smarter we could do here but I'm not overly motivated to make you find out. (Presumably we'd have to make the library references in the crashreporter binary be `@executable_path/../../../MacOS` or something like that.)

I tried using `@executable_path` but I couldn't get it to work and since I found no way of figuring out reliably what the correct path would have been I decided to stick with `DYLD_LIBRARY_PATH`. This is also going to go away with a Rust rewrite though :)
For some reason after 8h the try run is still running. I'll trigger another one; I don't want to land without being sure everything is green.
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8cddbe0c5f50
After a crash compute the SHA256 hash of a minidump and attach it to the crash ping; r=Ted, nchen
https://hg.mozilla.org/mozilla-central/rev/8cddbe0c5f50
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
I prepared the test cases [1] and ran them on Nightly 55.0a1 under Win 10 64-bit, Mac OS X 10.11 and Ubuntu 16.04 64-bit.
There were no new issues found so far.

I'm marking this bug as verified based on the testing made so far. We have planned the mid-Nightly sign-off for next week.

The test plan [2] was updated to reflect this information.

[1] https://testrail.stage.mozaws.net/index.php?/suites/view/925&group_by=cases:section_id&group_order=asc
[2] Test Plan: https://wiki.mozilla.org/QA/Client_Crash_Id
Status: RESOLVED → VERIFIED
Blocks: 1376574
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: