Closed
Bug 1408041
Opened 7 years ago
Closed 6 years ago
[tracker] expose MinidumpSha256Hash
Categories
(Socorro :: General, task)
Socorro
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: lonnen, Unassigned)
References
(Blocks 1 open bug)
Details
Bug 1322611 introduced MinidumpSha256Hash to the breakpad crash. Socorro should use this in a few places: 1. verify the client generated value and, if mismatched, alert and abort before it ends up in a public dataset. a processor rule seems like the right place for this. 2. send this field with other publicly available data to telemetry so we can join data downstream 3. expose this as a field in supersearch for easier cross-system lookups between telemetry and socorro uncertain of this last one: 4a. the search box in the header should, when provided a sha256 hash, load the most recently submitted report with this sha -or- 4b. the search box in the header should, when provided a sha256 hash, load the search page for this sha iff more than one crash has this exact sha, or load the report directly if a single crash has this sha
Comment 1•7 years ago
|
||
Slight correction here: bug 1322611 added that field to the telemetry crash ping only, not the data that gets submitted to Socorro, so step 1 will need to be "calculate the SHA-256 hash of the minidump file". We already have md5sums of minidump files in Socorro (as `dump_checksums`) so presumably there's already a straightforward place to shoehorn this. I'm not sure if either bit of step 4 is needed, but I don't think we have enough experience with the crash ping to know what the workloads there will look like. As long as there's some way to take the hash from a crash ping and look up a corresponding crash in Socorro that should be sufficient to start. Having the data in the telemetry pipeline is probably the most important thing, so that we'll be able to write an analysis over crash ping data and make it join to Socorro crashes when available.
Comment 2•7 years ago
|
||
I assumed that we were only going to go the other way (from Socorro -> ping) because that field was intended to only be available to logged-in users, and that workflow would enforce that.
Comment 3•7 years ago
|
||
It doesn't seem particularly harmful to make it public, in any event. It's not PII in and of itself, and it won't allow you to get any PII you didn't already have access to.
Comment 4•7 years ago
|
||
I vaguely remember Benjamin mentioning that the problem with making the hash public was about enabling an external onlooker to correlate the two and recover data that way. I can't find on which bug that was though.
Comment 5•7 years ago
|
||
Unless you can source that, I'm unclear why correlating two sets of publicly available information (both of which are basically the same thing in slightly different forms) is any more of a risk to user privacy.
Comment 6•7 years ago
|
||
Crash reports can have multiple minidump files. Are we explicitly talking about the one named "upload_file_minidump"? If there isn't one (pretty sure I've seen crashes like this), then what value should the MinidumpSha256Hash field have? Empty string? Null/None?
Comment 7•7 years ago
|
||
Exposing just the one from upload_file_minidump is probably sufficient. If not we can figure out how to expose the others in a followup. Are there really crashes that are completely missing a minidump? I know we have plenty of examples where the minidump is zero bytes. In any event, having a Null/None for that case would be fine.
Comment 8•7 years ago
|
||
I don't know if they're "valid crashes", but when I was working on Antenna I saw some crashes with no "upload_file_minidump" and it's one of the scenarios I was testing with. If we're doing sha256 hashes of multiple dumps, then we probably want to make this field a dict of dump name -> sha256 hash. Does that sound right? If that's true, then it seems like we might be better off replacing dump_checksums since we'll effectively have the same data in two places. I don't honestly know what uses dump_checksums. I see it being produced in Socorro. There are a couple of places where we use the keys to figure out what dumps are associated with the raw crash. Otherwise, I don't see it being used anywhere.
Comment 9•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #8) > If we're doing sha256 hashes of multiple dumps, then we probably want to > make this field a dict of dump name -> sha256 hash. Does that sound right? We could do that, as long as it's straightforward to query for the data for upload_file_minidump, since that's all we're going to care about for the purposes of matching crashes between Socorro and the crash ping for now. > If that's true, then it seems like we might be better off replacing > dump_checksums since we'll effectively have the same data in two places. I > don't honestly know what uses dump_checksums. I see it being produced in > Socorro. There are a couple of places where we use the keys to figure out > what dumps are associated with the raw crash. Otherwise, I don't see it > being used anywhere. I'm not aware of anything that uses dump_checksums. I think it was probably part of the work to filter out duplicate crashes that happened a while back. If so, I don't think swapping a md5 hash for a sha-256 hash is harmful (and is actually better in practice).
Comment 10•7 years ago
|
||
When working on analysis for duplicate crash reports, I looked at 14,000 Firefox crash reports. Of those 14,000, 418 had 0-byte upload_file_minidump files. We need to keep that in mind here, too. Maybe we ignore 0-byte upload_file_minidump files.
Comment 11•7 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #10) > We need to keep that in mind here, too. Maybe we ignore 0-byte > upload_file_minidump files. It's not super important what exactly we do with those since they don't have any actionable data in the dump anyway, so there won't be any real use in correlating a crash ping to a crash in Socorro for those.
Comment 12•7 years ago
|
||
Ted: Is that true even if they have other dumps that aren't 0 bytes? For example, I was looking at this one when I was looking at 14,000 crashes earlier: https://crash-stats.mozilla.com/report/index/6ba7b1b2-1e66-4b2c-93e9-e63540171011 That has an upload_file_minidump_browser.
Comment 13•7 years ago
|
||
Yes. The other dumps aren't crashes, they're just showing the state of other processes at the time of that crash. In this case we killed a content process because of an IPC error, so it's useful to know what the browser process was doing at that time. However, if we don't have an actual minidump for the content process then we don't have a stack, and we won't have anything to put in the crash ping anyway.
Comment 14•6 years ago
|
||
I landed code in the collector to calculate the MinidumpSha256Hash and code in the processor to carry that along and make sure it ends up in the crash data Socorro sends to Telemetry. Further, minidump_sha256_hash is a searchable field in Super Search. That was done under bug #1409427. It'll get pushed to prod next week. The only thing that's not done is item 4. I'm voting we push that off to a separate bug when someone asks for it.
Comment 15•6 years ago
|
||
It's in prod! Marking this as FIXED.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•