Closed Bug 1144311 Opened 10 years ago Closed 7 years ago

Investigate how many dupes we see

Product/Component: Socorro :: General
Type: task
Platform: x86_64 Linux
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: selenamarie
Assignee: Unassigned

Using the checksum field, look into how many dupes we're really seeing.
Blocks: 579136
I grabbed about 600k checksums from 3/13-3/14 (inclusive) and found the following:

3741 duplicate crashes, of which 3184 consisted of only 2 crashes.
Per-day duplicates were pretty close to half of the two-day totals.
24 crashes contained processed crashes whose max - min(date_processed) time was greater than 1 hour.
The most repeated crash (138 times) had the signature 'mozilla::dom::indexedDB::IDBTransaction::IsOpen' and had a dump that couldn't be analyzed. Same with the other top duplicates that I spot checked.
Our 'reports_duplicates' table contains 9374 "duplicate" crashes for 3/13 and 7982 "duplicates" for 3/14.
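A minimal sketch of how counts like these could be derived, assuming the checksums have been exported as (crash_id, checksum) pairs. The function name and the sample rows are hypothetical and are not taken from Socorro:

```python
from collections import Counter


def summarize_duplicates(rows):
    """Summarize duplicate checksums in an iterable of (crash_id, checksum) pairs."""
    counts = Counter(checksum for _crash_id, checksum in rows)
    # A checksum counts as "duplicated" when more than one crash shares it.
    dupes = {checksum: n for checksum, n in counts.items() if n > 1}
    most_repeated = max(dupes.items(), key=lambda kv: kv[1]) if dupes else None
    return {
        "duplicated_checksums": len(dupes),
        "exactly_two": sum(1 for n in dupes.values() if n == 2),
        "most_repeated": most_repeated,
    }


# Made-up sample rows, not real crash IDs or checksums.
sample = [
    ("c1", "aaa"), ("c2", "aaa"),
    ("c3", "bbb"),
    ("c4", "ccc"), ("c5", "ccc"), ("c6", "ccc"),
]
summary = summarize_duplicates(sample)
```

On real data the same grouping would more likely be done in SQL with GROUP BY ... HAVING count(*) > 1; the Python version only illustrates the logic.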
A duplicates table based on checksums appears to need about 25 MB/day to maintain. I'd say just truncate this data after 4 weeks, and regenerate it if needed for backfilling.
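A rough sketch of the 4-week truncation idea, using an in-memory SQLite database as a stand-in for the real store. The table name checksum_duplicates and its columns are assumptions made for illustration, not the actual Socorro schema:

```python
import sqlite3
from datetime import date, timedelta

RETENTION_DAYS = 28  # "4 weeks", per the comment above
MB_PER_DAY = 25      # estimate quoted in the comment above


def prune_old_duplicates(conn, today):
    """Delete rows older than the retention window; return how many were removed."""
    cutoff = (today - timedelta(days=RETENTION_DAYS)).isoformat()
    # ISO-8601 date strings sort chronologically, so a text comparison works here.
    cur = conn.execute(
        "DELETE FROM checksum_duplicates WHERE date_processed < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE checksum_duplicates "
    "(crash_id TEXT, checksum TEXT, date_processed TEXT)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO checksum_duplicates VALUES (?, ?, ?)",
    [
        ("c1", "aaa", "2015-02-01"),
        ("c2", "aaa", "2015-03-13"),
        ("c3", "bbb", "2015-03-14"),
    ],
)
removed = prune_old_duplicates(conn, date(2015, 3, 15))
steady_state_mb = MB_PER_DAY * RETENTION_DAYS
```

At 25 MB/day, a 28-day window works out to roughly 700 MB of retained data.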
(In reply to Selena Deckelmann :selenamarie :selena from comment #1)
> The most repeated crash (138 times) had the signature
> 'mozilla::dom::indexedDB::IDBTransaction::IsOpen' and had a dump that
> couldn't be analyzed. Same with the other top duplicates that I spot checked.

I wonder how a dump cannot be analyzed if we have a signature, which IIRC needs analysis of the dump.

> Our 'reports_duplicates' table contains 9374 "duplicate" crashes for 3/13
> and 7982 "duplicates" for 3/14.

That older duplicate algorithm was intentionally created to detect not only actual duplicates but also to mark crashes that are similar but not real duplicates. So it's expected that this table contains way more entries.
If a dump produces a signature I would expect it to be valid. If we were seeing duplicates of dumps that don't get signatures it might be that there are common types of malformed dumps. That would be interesting to know.
:ted.mielczarek - I read the output wrong. It was just exploitability that wasn't able to analyze the dump, rather than the stackwalker.

Here's records from the top 10 dupes:

count|21307
checksum|d41d8cd9-8f00-b204-e980-0998ecf8427e
crash_id|0001da4c-f86d-4d3d-a9d7-589082150313
processor_notes|sp-processor10_phx1_mozilla_com.1142:2015; MozillaProcessorAlgorithm2015; MDSW failed on 'timeout -s KILL 600 /data/socorro/stackwalk/bin/stackwalker --raw-json /tmp/0001da4c-f86d-4d3d-a9d7-589082150313.Thread-6.TEMPORARY.json --symbols-url https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1 --symbols-url https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-private/v1 --symbols-cache /tmp/symbols /home/socorro/temp/0001da4c-f86d-4d3d-a9d7-589082150313.upload_file_minidump.TEMPORARY.dump 2>/dev/null': ERROR_NO_MINIDUMP_HEADER; MDSW did not identify the crashing thread; exploitability information missing; no 'topmost_file' name because ''crash_info'' is missing; CSignatureTool: No signature could be created because we do not know which thread crashed; skunk_classifier: reject - not a plugin hang
signature|EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER

count|138
checksum|45b43275-e8a3-6e7e-c13c-81dc77509d30
crash_id|00833826-1bf2-4cbc-aac9-d2ecb2150314
processor_notes|sp-processor05_phx1_mozilla_com.1959:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|mozilla::dom::indexedDB::IDBTransaction::IsOpen

count|94
checksum|7d87d880-dd17-2cd8-a437-524643965afd
crash_id|0283c466-0aaa-4aea-aec0-03e172150314
processor_notes|sp-processor02_phx1_mozilla_com.23358:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|snapshot_start

count|94
checksum|67849b05-06ec-c72b-aad4-f18546a8715e
crash_id|068265a5-78e8-450a-af59-fd6702150314
processor_notes|sp-processor03_phx1_mozilla_com.19110:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|mozalloc_abort | NS_DebugBreak_P | mozilla::dom::ContentChild::ProcessingError

count|90
checksum|be13c247-ac93-88bd-6d8e-a9ed6d3adf29
crash_id|00c13e57-4cbd-4206-8845-77e4c2150314
processor_notes|sp-processor03_phx1_mozilla_com.19110:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|mozalloc_abort | NS_DebugBreak_P | mozilla::dom::ContentChild::ProcessingError

count|83
checksum|f7d46b1d-300c-319a-f3e6-a94b40b39bbb
crash_id|00408f80-ad46-4bdb-8ccc-99afa2150314
processor_notes|sp-processor07_phx1_mozilla_com.7654:2015; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
signature|android_atomic_or

count|62
checksum|fc40439d-0536-23a7-b646-e7e0af9a5df1
crash_id|0366a8ed-0c87-4d8d-b6a2-963292150314
processor_notes|sp-processor04_phx1_mozilla_com.13206:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|snapshot_start

count|60
checksum|b63af555-581f-66f5-6cb6-eb3222343362
crash_id|03379328-90b6-4c7f-a3bc-714fa2150314
processor_notes|sp-processor05_phx1_mozilla_com.1959:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|snapshot_start

count|54
checksum|e98e5c34-9f8b-79fa-c2c6-4e523f54d4cb
crash_id|04ec8649-c945-4c3e-b794-ae73d2150313
processor_notes|sp-processor06_phx1_mozilla_com.24826:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature|@0x0 | mozilla::dom::indexedDB::PIndexedDBRequestChild::OnMessageReceived

count|48
checksum|db9d9465-13e0-6606-a14c-a068fec5eb9e
crash_id|06b73e58-425d-4516-9cb5-8121c2150313
processor_notes|sp-processor09_phx1_mozilla_com.4163:2015; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
signature|strlen | __vfprintf
Here's the top 10 from a similar SQL query but using the 'reports_duplicates' table (old dupe detection) instead:

-[ RECORD 1 ]---+---
count           | 184
duplicate_of    | 00c13e57-4cbd-4206-8845-77e4c2150314
processor_notes | sp-processor03_phx1_mozilla_com.19110:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature       | mozalloc_abort | NS_DebugBreak_P | mozilla::dom::ContentChild::ProcessingError
-[ RECORD 2 ]---+---
count           | 93
duplicate_of    | 0283c466-0aaa-4aea-aec0-03e172150314
processor_notes | sp-processor02_phx1_mozilla_com.23358:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature       | snapshot_start
-[ RECORD 3 ]---+---
count           | 83
duplicate_of    | 00408f80-ad46-4bdb-8ccc-99afa2150314
processor_notes | sp-processor07_phx1_mozilla_com.7654:2015; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
signature       | android_atomic_or
-[ RECORD 4 ]---+---
count           | 65
duplicate_of    | da8538e1-70f0-4abe-b2af-3381c2150314
processor_notes | sp-processor07_phx1_mozilla_com.7654:2015; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
signature       | GetPropCompiler::generateStub(JSObject*, js::Shape const*)
-[ RECORD 5 ]---+---
count           | 62
duplicate_of    | 0366a8ed-0c87-4d8d-b6a2-963292150314
processor_notes | sp-processor04_phx1_mozilla_com.13206:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature       | snapshot_start
-[ RECORD 6 ]---+---
count           | 60
duplicate_of    | 03379328-90b6-4c7f-a3bc-714fa2150314
processor_notes | sp-processor05_phx1_mozilla_com.1959:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature       | snapshot_start
-[ RECORD 7 ]---+---
count           | 58
duplicate_of    | f68e57d5-d390-4d89-a359-d52d22150314
processor_notes | sp-processor07_phx1_mozilla_com.7654:2015; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
signature       | `anonymous namespace''::WorkerThreadProxySyncRunnable::Dispatch(JSContext*)
-[ RECORD 8 ]---+---
count           | 54
duplicate_of    | 04ec8649-c945-4c3e-b794-ae73d2150313
processor_notes | sp-processor06_phx1_mozilla_com.24826:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature       | @0x0 | mozilla::dom::indexedDB::PIndexedDBRequestChild::OnMessageReceived
-[ RECORD 9 ]---+---
count           | 51
duplicate_of    | 02f95a4e-0232-4d6f-a6d4-003302150314
processor_notes | sp-processor03_phx1_mozilla_com.19110:2015; MozillaProcessorAlgorithm2015; non-integer value of "SecondsSinceLastCrash"; skunk_classifier: reject - not a plugin hang
signature       | mozilla::dom::indexedDB::IDBTransaction::IsOpen
-[ RECORD 10 ]--+---
count           | 46
duplicate_of    | 01432ff3-049c-40b8-8d2f-5b6702150313
processor_notes | sp-processor09_phx1_mozilla_com.4163:2015; MozillaProcessorAlgorithm2015; skunk_classifier: reject - not a plugin hang
signature       | strlen | __vfprintf
So my question is -- what's a reasonable way to determine how similar these approaches are? I was going to start with a simple diff on the signatures that are determined to be duplicates (what shows up in the checksum vs the old duplicates detection). And I counted the wrong thing earlier -- The old duplicate detection found 17665 duplicate crashes across 5422 signatures from 3/14-3/15. The duplicate detection by checksum found 31253 duplicate crashes across 3742 signatures from 3/14-3/15.
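One possible starting point for the "simple diff on the signatures" is a plain set comparison. The sketch below uses the distinct signatures from the two top-10 listings earlier in this thread as sample input; the function name is hypothetical, and the Jaccard ratio is just one convenient overlap measure, not something proposed in the bug:

```python
def compare_signature_sets(checksum_sigs, old_sigs):
    """Compare the signatures flagged as duplicates by each detection approach."""
    a, b = set(checksum_sigs), set(old_sigs)
    union = a | b
    return {
        "both": sorted(a & b),
        "checksum_only": sorted(a - b),
        "old_only": sorted(b - a),
        # Jaccard similarity: shared signatures divided by all signatures seen.
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Distinct signatures from the top-10 records posted in the comments above.
checksum_top = [
    "EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER",
    "mozilla::dom::indexedDB::IDBTransaction::IsOpen",
    "snapshot_start",
    "mozalloc_abort | NS_DebugBreak_P | mozilla::dom::ContentChild::ProcessingError",
    "android_atomic_or",
    "@0x0 | mozilla::dom::indexedDB::PIndexedDBRequestChild::OnMessageReceived",
    "strlen | __vfprintf",
]
old_top = [
    "mozalloc_abort | NS_DebugBreak_P | mozilla::dom::ContentChild::ProcessingError",
    "snapshot_start",
    "android_atomic_or",
    "GetPropCompiler::generateStub(JSObject*, js::Shape const*)",
    "`anonymous namespace''::WorkerThreadProxySyncRunnable::Dispatch(JSContext*)",
    "@0x0 | mozilla::dom::indexedDB::PIndexedDBRequestChild::OnMessageReceived",
    "mozilla::dom::indexedDB::IDBTransaction::IsOpen",
    "strlen | __vfprintf",
]
result = compare_signature_sets(checksum_top, old_top)
```

The top-10 lists are only a small sample; a real comparison would feed in the full signature lists from both queries.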
Oops - duplicate detection by checksum found 9946 duplicate crashes that were processed. The remaining crashes had the stackwalker fail because of a lack of a minidump header.
(In reply to Selena Deckelmann :selenamarie :selena from comment #5)
> Here's records from the top 10 dupes:
>
> count|21307
> checksum|d41d8cd9-8f00-b204-e980-0998ecf8427e
> crash_id|0001da4c-f86d-4d3d-a9d7-589082150313
> processor_notes|sp-processor10_phx1_mozilla_com.1142:2015; MozillaProcessorAlgorithm2015; MDSW failed on 'timeout -s KILL 600 /data/socorro/stackwalk/bin/stackwalker --raw-json /tmp/0001da4c-f86d-4d3d-a9d7-589082150313.Thread-6.TEMPORARY.json --symbols-url https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1 --symbols-url https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-private/v1 --symbols-cache /tmp/symbols /home/socorro/temp/0001da4c-f86d-4d3d-a9d7-589082150313.upload_file_minidump.TEMPORARY.dump 2>/dev/null': ERROR_NO_MINIDUMP_HEADER; MDSW did not identify the crashing thread;

This is most likely a timeout downloading symbols (the timeout command killed stackwalker) - how many of these do we have?
(In reply to Robert Helmer [:rhelmer] from comment #9)
> (In reply to Selena Deckelmann :selenamarie :selena from comment #5)
> > Here's records from the top 10 dupes:
> >
> > count|21307
> > checksum|d41d8cd9-8f00-b204-e980-0998ecf8427e
> > crash_id|0001da4c-f86d-4d3d-a9d7-589082150313
> > processor_notes|sp-processor10_phx1_mozilla_com.1142:2015; MozillaProcessorAlgorithm2015; MDSW failed on 'timeout -s KILL 600 /data/socorro/stackwalk/bin/stackwalker --raw-json /tmp/0001da4c-f86d-4d3d-a9d7-589082150313.Thread-6.TEMPORARY.json --symbols-url https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-public/v1 --symbols-url https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.symbols-private/v1 --symbols-cache /tmp/symbols /home/socorro/temp/0001da4c-f86d-4d3d-a9d7-589082150313.upload_file_minidump.TEMPORARY.dump 2>/dev/null': ERROR_NO_MINIDUMP_HEADER; MDSW did not identify the crashing thread;
>
> This is most likely a timeout downloading symbols (the timeout command
> killed stackwalker) - how many of these do we have?

Actually let me double-check this, I might be wrong.
(In reply to Robert Helmer [:rhelmer] from comment #9)
> > TEMPORARY.dump 2>/dev/null': ERROR_NO_MINIDUMP_HEADER; MDSW did not identify
> > the crashing thread;
>
> This is most likely a timeout downloading symbols (the timeout command
> killed stackwalker) - how many of these do we have?

Nope, that's an empty minidump. I wouldn't worry about it.
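One way to sanity-check the "empty minidump" reading: the checksum on the 21307-count record, d41d8cd9-8f00-b204-e980-0998ecf8427e, is the well-known MD5 digest of zero bytes of input, written in UUID form. The sketch below assumes the checksum field is an MD5 of the dump contents rendered as a UUID; the thread does not state which hash is used, so treat that as an inference. The helper name is hypothetical:

```python
import hashlib
import uuid


def checksum_as_uuid(data: bytes) -> str:
    """Render the MD5 digest of `data` in 8-4-4-4-12 UUID form (assumed format)."""
    return str(uuid.UUID(hashlib.md5(data).hexdigest()))


# Checksum of a zero-length dump under the assumption above.
empty_checksum = checksum_as_uuid(b"")
matches_top_record = empty_checksum == "d41d8cd9-8f00-b204-e980-0998ecf8427e"
```

If that assumption holds, every zero-length upload collapses onto this single checksum, which would explain why it dominates the counts without being a true duplicate.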
(In reply to Selena Deckelmann :selenamarie :selena from comment #5)
> signature|EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER

Those are NOT dupes and should not be eliminated. Those are empty minidumps.

> signature|snapshot_start

Ouch. Those are B2G crashes from the ZTE Open. I wonder if we end up re-sending the same crash over and over or if the devices are so similar that the dumps end up identical due to that.

There's definitely some interesting ones here, though, I wonder if we can find out why we end up with duplicates esp. for those DOM signatures.
The minidump header includes a timestamp: https://code.google.com/p/google-breakpad/source/browse/trunk/src/google_breakpad/common/minidump_format.h#260 so if we're seeing byte-identical minidumps that aren't empty or full of zeroes they are the same crash submitted more than once.
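To illustrate the point about the header timestamp, here is a sketch that reads the time_date_stamp field from the fixed 32-byte minidump header (MDRawHeader in Breakpad's minidump_format.h). The function name and the synthetic header bytes are made up for the example; no real dump is parsed here:

```python
import struct
from datetime import datetime, timezone

# MDRawHeader layout: signature, version, stream_count, stream_directory_rva,
# checksum, time_date_stamp (all uint32), then flags (uint64); little-endian.
MD_HEADER_FORMAT = "<IIIIIIQ"
MD_HEADER_SIGNATURE = 0x504D444D  # the bytes "MDMP" read as a little-endian uint32


def minidump_timestamp(data: bytes):
    """Return the header timestamp as a UTC datetime, or None if there is no valid header."""
    if len(data) < struct.calcsize(MD_HEADER_FORMAT):
        return None  # empty or truncated dump: no minidump header
    fields = struct.unpack_from(MD_HEADER_FORMAT, data)
    signature, time_date_stamp = fields[0], fields[5]
    if signature != MD_HEADER_SIGNATURE:
        return None
    return datetime.fromtimestamp(time_date_stamp, tz=timezone.utc)


# Synthetic header for illustration: timestamp 1426291200 is 2015-03-14 00:00:00 UTC.
fake_header = struct.pack(MD_HEADER_FORMAT, MD_HEADER_SIGNATURE, 0xA793, 0, 0, 0, 1426291200, 0)
stamp = minidump_timestamp(fake_header)
```

Because the timestamp is written when the dump is created, two independent crashes would be expected to differ in at least this field, which is why byte-identical non-empty dumps point to resubmission.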
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #13)
> The minidump header includes a timestamp:
> https://code.google.com/p/google-breakpad/source/browse/trunk/src/google_breakpad/common/minidump_format.h#260
>
> so if we're seeing byte-identical minidumps that aren't empty or full of
> zeroes they are the same crash submitted more than once.

That makes those DOM and B2G dupes even more interesting to investigate. We should never be sending the same crash more than 50 times, even more so from a phone.
We are moving towards crash pings as the source of record for crash rates going forward. We are unlikely to invest more in deduplication, except as part of some proposed data compaction plans.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED