Closed
Bug 1144311
Opened 10 years ago
Closed 7 years ago
Investigate how many dupes we see
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: selenamarie, Unassigned)
References
Details
Using the checksum field, look into how many dupes we're really seeing.
Reporter
Comment 1•10 years ago
I grabbed about 600k checksums from 3/13-3/14 (inclusive) and found the following:
3741 duplicate crashes, of which 3184 consisted of only 2 crashes.
Per-day duplicates were pretty close to half of the two-day totals.
24 duplicate crashes contained processed crashes whose max(date_processed) - min(date_processed) was greater than 1 hour.
The most repeated crash (138 times) had the signature 'mozilla::dom::indexedDB::IDBTransaction::IsOpen' and had a dump that couldn't be analyzed. Same with the other top duplicates that I spot checked.
Our 'reports_duplicates' table contains 9374 "duplicate" crashes for 3/13 and 7982 "duplicates" for 3/14.
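The counts above can be reproduced with a single grouping pass over the sampled rows. The following is a minimal sketch, not the query that was actually run: it assumes each row is a (checksum, crash_id, date_processed) tuple and reads a "duplicate crash" as a group of crashes sharing one checksum. The row layout, the function name, and the result keys are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def summarize_duplicates(rows, spread=timedelta(hours=1)):
    """Group crashes by checksum and summarize the duplicate groups.

    rows: iterable of (checksum, crash_id, date_processed) tuples
    (a hypothetical layout chosen for this sketch, not Socorro's schema).
    """
    groups = defaultdict(list)
    for checksum, crash_id, date_processed in rows:
        groups[checksum].append((crash_id, date_processed))

    # A checksum counts as duplicated when more than one crash shares it.
    dupes = {c: members for c, members in groups.items() if len(members) > 1}

    groups_of_two = sum(1 for members in dupes.values() if len(members) == 2)

    # Groups whose max - min(date_processed) exceeds the allowed spread.
    spread_over_limit = sum(
        1
        for members in dupes.values()
        if max(d for _, d in members) - min(d for _, d in members) > spread
    )

    largest = max(dupes.items(), key=lambda kv: len(kv[1]), default=None)
    return {
        "duplicate_groups": len(dupes),
        "groups_of_two": groups_of_two,
        "spread_over_limit": spread_over_limit,
        "largest_group": largest[0] if largest else None,
    }
```

Grouping in memory is fine for a sample of roughly 600k checksums; for anything larger, a GROUP BY on the checksum column would likely be the more natural choice.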
Reporter
Comment 2•10 years ago
A duplicates table based on checksums appears to need about 25 MB/day to maintain. I'd say just truncate this data after 4 weeks, and regenerate it if needed for backfilling.
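As a rough check on what that retention policy implies, here is a small sketch of the arithmetic. The 25 MB/day figure and the 4-week window come from the comment above; the constant and function names are hypothetical.

```python
from datetime import date, timedelta

MB_PER_DAY = 25          # estimate quoted in the comment above
RETENTION_DAYS = 4 * 7   # "truncate this data after 4 weeks"


def steady_state_size_mb(mb_per_day=MB_PER_DAY, days=RETENTION_DAYS):
    """Approximate table size once the retention window is full."""
    return mb_per_day * days


def retention_cutoff(today, days=RETENTION_DAYS):
    """Rows processed before this date would be truncated."""
    return today - timedelta(days=days)
```

Under these assumptions the table would level off at about 700 MB.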
Comment 3•10 years ago
(In reply to Selena Deckelmann :selenamarie :selena from comment #1)
> The most repeated crash (138 times) had the signature
> 'mozilla::dom::indexedDB::IDBTransaction::IsOpen' and had a dump that
> couldn't be analyzed. Same with the other top duplicates that I spot checked.
I wonder how a dump cannot be analyzed if we have a signature, which IIRC needs analysis of the dump.
> Our 'reports_duplicates' table contains 9374 "duplicate" crashes for 3/13
> and 7982 "duplicates" for 3/14.
The older duplicate algorithm used for that table was intentionally created not only to detect actual duplicates but also to mark crashes that are similar but not real duplicates. So it's expected that this table contains way more entries.
Comment 4•10 years ago
If a dump produces a signature I would expect it to be valid. If we were seeing duplicates of dumps that don't get signatures it might be that there are common types of malformed dumps. That would be interesting to know.
Reporter
Comment 5•10 years ago
:ted.mielczarek - I read the output wrong. It was just exploitability that wasn't able to analyze the dump, rather than the stackwalker.
Here's records from the top 10 dupes:
count|21307
checksum|d41d8cd9-8f00-b204-e980-0998ecf8427e
crash_id|0001da4c-f86d-4d3d-a9d7-589082150313
processor_notes|sp-processor10_phx1_mozilla_com.1142:2015; MozillaProcessorAlgorithm2015; MDSW failed on 'timeout -s KILL 600 /data/socorro/stackwalk/bin/stackwalker --raw-json /tmp/0001da4c-f86d-4d3d-a9d7-589082150313.Thread-6.TEMPORARY.json --symbols-url https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1 --symbols-url https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-private/v1 --symbols-cache /tmp/symbols /home/socorro/temp/0001da4c-f86d-4d3d-a9d7-589082150313.upload_file_minidump.TEMPORARY.dump 2>/dev/null': ERROR_NO_MINIDUMP_HEADER; MDSW did not identify the crashing thread; exploitability information missing; no 'topmost_file' name because ''crash_info'' is missing; CSignatureTool: No signature could be created because we do not know which thread crashed; skunk_classifier: reject - not a plugin hang
signature|EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER
count|138
checksum|45b43275-e8a3-6e7e-c13c-81dc77509d30
crash_id|00833826-1bf2-4cbc-aac9-d2ecb2150314
processor_notes|sp-processor05_phx1_mozilla_com.1959:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|mozilla::dom::indexedDB::IDBTransaction::IsOpen
count|94
checksum|7d87d880-dd17-2cd8-a437-524643965afd
crash_id|0283c466-0aaa-4aea-aec0-03e172150314
processor_notes|sp-processor02_phx1_mozilla_com.23358:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|snapshot_start
count|94
checksum|67849b05-06ec-c72b-aad4-f18546a8715e
crash_id|068265a5-78e8-450a-af59-fd6702150314
processor_notes|sp-processor03_phx1_mozilla_com.19110:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|mozalloc_abort | NS_DebugBreak_P | mozilla::dom::ContentChild::ProcessingError
count|90
checksum|be13c247-ac93-88bd-6d8e-a9ed6d3adf29
crash_id|00c13e57-4cbd-4206-8845-77e4c2150314
processor_notes|sp-processor03_phx1_mozilla_com.19110:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|mozalloc_abort | NS_DebugBreak_P | mozilla::dom::ContentChild::ProcessingError
count|83
checksum|f7d46b1d-300c-319a-f3e6-a94b40b39bbb
crash_id|00408f80-ad46-4bdb-8ccc-99afa2150314
processor_notes|sp-processor07_phx1_mozilla_com.7654:2015; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
signature|android_atomic_or
count|62
checksum|fc40439d-0536-23a7-b646-e7e0af9a5df1
crash_id|0366a8ed-0c87-4d8d-b6a2-963292150314
processor_notes|sp-processor04_phx1_mozilla_com.13206:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|snapshot_start
count|60
checksum|b63af555-581f-66f5-6cb6-eb3222343362
crash_id|03379328-90b6-4c7f-a3bc-714fa2150314
processor_notes|sp-processor05_phx1_mozilla_com.1959:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|snapshot_start
count|54
checksum|e98e5c34-9f8b-79fa-c2c6-4e523f54d4cb
crash_id|04ec8649-c945-4c3e-b794-ae73d2150313
processor_notes|sp-processor06_phx1_mozilla_com.24826:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|@0x0 | mozilla::dom::indexedDB::PIndexedDBRequestChild::OnMessageReceived
count|48
checksum|db9d9465-13e0-6606-a14c-a068fec5eb9e
crash_id|06b73e58-425d-4516-9cb5-8121c2150313
processor_notes|sp-processor09_phx1_mozilla_com.4163:2015; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
signature|strlen | __vfprintf
Reporter
Comment 6•10 years ago
Here's the top 10 from a similar SQL query but using the 'reports_duplicates' table (old dupe detection) instead:
-[ RECORD 1 ]---+----------------------------------------------------------------------------------------------------------------------------------------------------------------------
count | 184
duplicate_of | 00c13e57-4cbd-4206-8845-77e4c2150314
processor_notes | sp-processor03_phx1_mozilla_com.19110:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature | mozalloc_abort | NS_DebugBreak_P | mozilla::dom::ContentChild::ProcessingError
-[ RECORD 2 ]---+----------------------------------------------------------------------------------------------------------------------------------------------------------------------
count | 93
duplicate_of | 0283c466-0aaa-4aea-aec0-03e172150314
processor_notes | sp-processor02_phx1_mozilla_com.23358:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature | snapshot_start
-[ RECORD 3 ]---+----------------------------------------------------------------------------------------------------------------------------------------------------------------------
count | 83
duplicate_of | 00408f80-ad46-4bdb-8ccc-99afa2150314
processor_notes | sp-processor07_phx1_mozilla_com.7654:2015; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
signature | android_atomic_or
-[ RECORD 4 ]---+----------------------------------------------------------------------------------------------------------------------------------------------------------------------
count | 65
duplicate_of | da8538e1-70f0-4abe-b2af-3381c2150314
processor_notes | sp-processor07_phx1_mozilla_com.7654:2015; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
signature | GetPropCompiler::generateStub(JSObject*, js::Shape const*)
-[ RECORD 5 ]---+----------------------------------------------------------------------------------------------------------------------------------------------------------------------
count | 62
duplicate_of | 0366a8ed-0c87-4d8d-b6a2-963292150314
processor_notes | sp-processor04_phx1_mozilla_com.13206:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature | snapshot_start
-[ RECORD 6 ]---+----------------------------------------------------------------------------------------------------------------------------------------------------------------------
count | 60
duplicate_of | 03379328-90b6-4c7f-a3bc-714fa2150314
processor_notes | sp-processor05_phx1_mozilla_com.1959:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature | snapshot_start
-[ RECORD 7 ]---+----------------------------------------------------------------------------------------------------------------------------------------------------------------------
count | 58
duplicate_of | f68e57d5-d390-4d89-a359-d52d22150314
processor_notes | sp-processor07_phx1_mozilla_com.7654:2015; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
signature | `anonymous namespace''::WorkerThreadProxySyncRunnable::Dispatch(JSContext*)
-[ RECORD 8 ]---+----------------------------------------------------------------------------------------------------------------------------------------------------------------------
count | 54
duplicate_of | 04ec8649-c945-4c3e-b794-ae73d2150313
processor_notes | sp-processor06_phx1_mozilla_com.24826:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature | @0x0 | mozilla::dom::indexedDB::PIndexedDBRequestChild::OnMessageReceived
-[ RECORD 9 ]---+----------------------------------------------------------------------------------------------------------------------------------------------------------------------
count | 51
duplicate_of | 02f95a4e-0232-4d6f-a6d4-003302150314
processor_notes | sp-processor03_phx1_mozilla_com.19110:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature | mozilla::dom::indexedDB::IDBTransaction::IsOpen
-[ RECORD 10 ]--+----------------------------------------------------------------------------------------------------------------------------------------------------------------------
count | 46
duplicate_of | 01432ff3-049c-40b8-8d2f-5b6702150313
processor_notes | sp-processor09_phx1_mozilla_com.4163:2015; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
signature | strlen | __vfprintf
Reporter
Comment 7•10 years ago
So my question is -- what's a reasonable way to determine how similar these approaches are?
I was going to start with a simple diff on the signatures that are determined to be duplicates (what shows up in the checksum vs the old duplicates detection). And I counted the wrong thing earlier --
The old duplicate detection found 17665 duplicate crashes across 5422 signatures from 3/14-3/15.
The duplicate detection by checksum found 31253 duplicate crashes across 3742 signatures from 3/14-3/15.
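One simple way to quantify how similar the two approaches are is to compare the sets of signatures (or crash IDs) that each one flags. The following is a minimal sketch of that idea, assuming the two result sets have already been pulled into Python collections; the function name and the result keys are hypothetical.

```python
def compare_detectors(old_signatures, checksum_signatures):
    """Compare the signatures flagged by two duplicate detectors.

    Returns the overlap counts and the Jaccard index
    (size of intersection divided by size of union).
    """
    old, new = set(old_signatures), set(checksum_signatures)
    union = old | new
    return {
        "both": len(old & new),
        "old_only": len(old - new),
        "checksum_only": len(new - old),
        # Two empty result sets are treated as identical.
        "jaccard": len(old & new) / len(union) if union else 1.0,
    }
```

A Jaccard index near 1 would mean both methods flag nearly the same signatures. The "old_only" and "checksum_only" counts show which method is contributing the disagreements.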
Reporter
Comment 8•10 years ago
Oops - duplicate detection by checksum found 9946 duplicate crashes that were processed. The remaining crashes had the stackwalker fail because of a lack of a minidump header.
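The corrected figure is consistent with the numbers earlier in this bug: 31253 - 21307 = 9946, where 21307 is the count for the checksum in comment 5 whose stackwalker run failed with ERROR_NO_MINIDUMP_HEADER. A minimal sketch of that exclusion follows, assuming the per-checksum counts are available as a dict; the constant and function names are hypothetical, and the thread does not state outright that this is how the number was derived.

```python
# Checksum of the group whose dumps had no minidump header (from comment 5).
EMPTY_DUMP_CHECKSUM = "d41d8cd9-8f00-b204-e980-0998ecf8427e"


def count_processed_duplicates(counts_by_checksum):
    """Sum crashes in duplicate groups, skipping the empty-dump checksum."""
    return sum(
        count
        for checksum, count in counts_by_checksum.items()
        if checksum != EMPTY_DUMP_CHECKSUM and count > 1
    )
```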
Comment 9•10 years ago
(In reply to Selena Deckelmann :selenamarie :selena from comment #5)
> Here's records from the top 10 dupes:
>
> count|21307
> checksum|d41d8cd9-8f00-b204-e980-0998ecf8427e
> crash_id|0001da4c-f86d-4d3d-a9d7-589082150313
> processor_notes|sp-processor10_phx1_mozilla_com.1142:2015;
> MozillaProcessorAlgorithm2015; MDSW failed on 'timeout -s KILL 600
> /data/socorro/stackwalk/bin/stackwalker --raw-json
> /tmp/0001da4c-f86d-4d3d-a9d7-589082150313.Thread-6.TEMPORARY.json
> --symbols-url
> https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1
> --symbols-url
> https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-private/
> v1 --symbols-cache /tmp/symbols
> /home/socorro/temp/0001da4c-f86d-4d3d-a9d7-589082150313.upload_file_minidump.
> TEMPORARY.dump 2>/dev/null': ERROR_NO_MINIDUMP_HEADER; MDSW did not identify
> the crashing thread;
This is most likely a timeout downloading symbols (the timeout command killed stackwalker) - how many of these do we have?
Comment 10•10 years ago
(In reply to Robert Helmer [:rhelmer] from comment #9)
> (In reply to Selena Deckelmann :selenamarie :selena from comment #5)
> > Here's records from the top 10 dupes:
> >
> > count|21307
> > checksum|d41d8cd9-8f00-b204-e980-0998ecf8427e
> > crash_id|0001da4c-f86d-4d3d-a9d7-589082150313
> > processor_notes|sp-processor10_phx1_mozilla_com.1142:2015;
> > MozillaProcessorAlgorithm2015; MDSW failed on 'timeout -s KILL 600
> > /data/socorro/stackwalk/bin/stackwalker --raw-json
> > /tmp/0001da4c-f86d-4d3d-a9d7-589082150313.Thread-6.TEMPORARY.json
> > --symbols-url
> > https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1
> > --symbols-url
> > https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-private/
> > v1 --symbols-cache /tmp/symbols
> > /home/socorro/temp/0001da4c-f86d-4d3d-a9d7-589082150313.upload_file_minidump.
> > TEMPORARY.dump 2>/dev/null': ERROR_NO_MINIDUMP_HEADER; MDSW did not identify
> > the crashing thread;
>
> This is most likely a timeout downloading symbols (the timeout command
> killed stackwalker) - how many of these do we have?
Actually let me double-check this, I might be wrong.
Comment 11•10 years ago
(In reply to Robert Helmer [:rhelmer] from comment #9)
> > TEMPORARY.dump 2>/dev/null': ERROR_NO_MINIDUMP_HEADER; MDSW did not identify
> > the crashing thread;
>
> This is most likely a timeout downloading symbols (the timeout command
> killed stackwalker) - how many of these do we have?
Nope, that's an empty minidump. I wouldn't worry about it.
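The checksum itself supports this reading: d41d8cd9-8f00-b204-e980-0998ecf8427e is the MD5 digest of zero bytes of input, written in UUID form. The sketch below demonstrates that. It assumes the checksum field is an MD5 of the dump contents stored as a UUID, which matches this value but is not confirmed elsewhere in this bug; the helper name is hypothetical.

```python
import hashlib
import uuid


def checksum_as_uuid(data: bytes) -> str:
    """MD5 of data, rendered in UUID form (assumed storage format)."""
    return str(uuid.UUID(hex=hashlib.md5(data).hexdigest()))


# The digest of an empty payload matches the top "duplicate" checksum.
EMPTY_CHECKSUM = checksum_as_uuid(b"")
```

Any number of unrelated crashes that upload a zero-length dump would therefore share this checksum without being duplicates of one another.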
Comment 12•10 years ago
(In reply to Selena Deckelmann :selenamarie :selena from comment #5)
> signature|EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER
Those are NOT dupes and should not be eliminated. Those are empty minidumps.
> signature|snapshot_start
Ouch. Those are B2G crashes from the ZTE Open. I wonder if we end up re-sending the same crash over and over or if the devices are so similar that the dumps end up identical due to that.
There are definitely some interesting ones here, though. I wonder if we can find out why we end up with duplicates, especially for those DOM signatures.
Comment 13•10 years ago
The minidump header includes a timestamp: https://code.google.com/p/google-breakpad/source/browse/trunk/src/google_breakpad/common/minidump_format.h#260
so if we're seeing byte-identical minidumps that aren't empty or full of zeroes they are the same crash submitted more than once.
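To illustrate the point about the header timestamp, here is a minimal sketch that reads it from the first 32 bytes of a dump. The field order follows the MDRawHeader struct in minidump_format.h (equivalent to the Windows MINIDUMP_HEADER); the function and constant names are hypothetical, and this is not code from Socorro or Breakpad.

```python
import struct
from datetime import datetime, timezone

# 32-byte minidump header: signature, version, stream_count,
# stream_directory_rva, checksum, time_date_stamp (all uint32),
# followed by flags (uint64). Little-endian, no padding.
HEADER = struct.Struct("<IIIIIIQ")
MDMP_SIGNATURE = 0x504D444D  # the bytes "MDMP" read as a little-endian uint32


def header_timestamp(dump: bytes):
    """Return the header timestamp as a UTC datetime.

    Returns None when the data is too short or does not start with
    the minidump signature (for example an empty or zero-filled dump).
    """
    if len(dump) < HEADER.size:
        return None
    signature, _, _, _, _, time_date_stamp, _ = HEADER.unpack_from(dump)
    if signature != MDMP_SIGNATURE:
        return None
    return datetime.fromtimestamp(time_date_stamp, tz=timezone.utc)
```

Since the timestamp has one-second resolution, two byte-identical dumps would have to come from crashes in the same second with otherwise identical contents, which is why identical checksums point to resubmission of a single crash.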
Comment 14•10 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #13)
> The minidump header includes a timestamp:
> https://code.google.com/p/google-breakpad/source/browse/trunk/src/
> google_breakpad/common/minidump_format.h#260
>
> so if we're seeing byte-identical minidumps that aren't empty or full of
> zeroes they are the same crash submitted more than once.
That makes those DOM and B2G dupes even more interesting to investigate. We should never be sending the same crash more than 50 times, even more so from a phone.
Comment 15•7 years ago
We are moving towards crash pings as the source of record for crash rates going forward. We are unlikely to invest more in deduplication, except as part of some proposed data compaction plans.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED