[research] look into dip in process_crash mean times
Categories
(Socorro :: General, task, P2)
Tracking
(Not tracked)
People
(Reporter: willkg, Assigned: willkg)
Details
Attachments
(1 file: 164.46 KB, image/png)
The Socorro processor's process_crash mean time has been pretty steady at slightly over 6s for a long time. Then around April 21st, it dipped to slightly under 4s with clear day/night cycles. Then on April 30th, the mean started going back up to about 6s.
This bug covers looking into the dip. What happened?
Comment 1•5 years ago
Brian did a graph of crashes processed over the last 30 days. Looks like there's an increase in crashes processed between April 20th and April 30th, then it goes down again.
Maybe the processor was processing more crashes during this period that didn't have minidumps?
In bug #1635164, Gabriele opined that shutdownkill crash reports are steadily increasing. Maybe this is related to that?
Maybe this is Fission-related?
Maybe this is Fenix-related? When Fenix crashes in Java-land, the crash report it sends doesn't include a minidump.
It'd be nice if we had tools to look at crashes by (some filter) by day. I know I was working on something like that with crashstats-tools, but I don't think I finished it.
Grabbing this to look into.
Comment 2•5 years ago
Looking at the last month of crash report counts, we have this:
| date | Fenix | Fennec | FennecAndroid | Firefox | FirefoxReality | Focus | GeckoViewExample | ReferenceBrowser | SeaMonkey | Thunderbird |
|---|---|---|---|---|---|---|---|---|---|---|
| 2020-04-04 | 8259 | 21 | 35813 | 78229 | 26 | 1918 | 0 | 0 | 273 | 19995 |
| 2020-04-05 | 8059 | 10 | 36817 | 77587 | 36 | 1806 | 0 | 0 | 243 | 18496 |
| 2020-04-06 | 8555 | 8 | 36224 | 109945 | 11 | 1989 | 0 | 0 | 302 | 50670 |
| 2020-04-07 | 9231 | 10 | 35393 | 113812 | 20 | 2044 | 0 | 13 | 233 | 48483 |
| 2020-04-08 | 9609 | 13 | 35702 | 111221 | 33 | 2109 | 0 | 0 | 251 | 47321 |
| 2020-04-09 | 10948 | 8 | 34964 | 102698 | 30 | 2008 | 0 | 1 | 275 | 44669 |
| 2020-04-10 | 12331 | 11 | 36093 | 93720 | 35 | 2033 | 0 | 0 | 262 | 36192 |
| 2020-04-11 | 12101 | 14 | 35626 | 77794 | 25 | 1928 | 0 | 3 | 234 | 19380 |
| 2020-04-12 | 11844 | 12 | 36468 | 73830 | 23 | 1917 | 0 | 3 | 233 | 16561 |
| 2020-04-13 | 12100 | 9 | 37862 | 101718 | 21 | 2112 | 0 | 1 | 277 | 34590 |
| 2020-04-14 | 12177 | 4 | 36015 | 112043 | 40 | 2513 | 0 | 3 | 316 | 50639 |
| 2020-04-15 | 14155 | 6 | 35844 | 109062 | 33 | 2437 | 0 | 2 | 354 | 47875 |
| 2020-04-16 | 14069 | 2 | 35196 | 105707 | 32 | 2395 | 0 | 2 | 349 | 47049 |
| 2020-04-17 | 16022 | 10 | 34821 | 103113 | 29 | 2385 | 0 | 0 | 256 | 42944 |
| 2020-04-18 | 17326 | 14 | 35850 | 77277 | 35 | 2471 | 0 | 2 | 238 | 19300 |
| 2020-04-19 | 16419 | 12 | 35623 | 79855 | 41 | 2358 | 0 | 5 | 232 | 18047 |
| 2020-04-20 | 20887 | 7 | 35707 | 113964 | 58 | 2312 | 0 | 4 | 308 | 48999 |
| 2020-04-21 | 36660 | 12 | 35293 | 105699 | 151 | 2439 | 0 | 4 | 310 | 46486 |
| 2020-04-22 | 49339 | 4 | 34775 | 107905 | 63 | 2306 | 0 | 0 | 327 | 47646 |
| 2020-04-23 | 54797 | 11 | 34822 | 103090 | 144 | 2316 | 0 | 0 | 328 | 45790 |
| 2020-04-24 | 67300 | 13 | 33646 | 101181 | 175 | 2349 | 0 | 3 | 327 | 43571 |
| 2020-04-25 | 93088 | 8 | 34755 | 76605 | 160 | 2167 | 1 | 1 | 263 | 18464 |
| 2020-04-26 | 104930 | 8 | 34980 | 74319 | 148 | 2353 | 1 | 2 | 294 | 17843 |
| 2020-04-27 | 114474 | 17 | 33101 | 107444 | 120 | 2190 | 0 | 5 | 301 | 49195 |
| 2020-04-28 | 121653 | 12 | 34139 | 106058 | 165 | 2350 | 0 | 4 | 308 | 48321 |
| 2020-04-29 | 118677 | 10 | 33884 | 107850 | 136 | 2238 | 0 | 2 | 314 | 46651 |
| 2020-04-30 | 111292 | 6 | 33307 | 107102 | 180 | 2316 | 0 | 5 | 340 | 46910 |
| 2020-05-01 | 91442 | 8 | 34350 | 84696 | 239 | 2338 | 0 | 1 | 295 | 26367 |
| 2020-05-02 | 54798 | 7 | 33244 | 73302 | 155 | 2107 | 0 | 2 | 261 | 19463 |
| 2020-05-03 | 41636 | 6 | 34102 | 71861 | 128 | 2223 | 0 | 2 | 252 | 18386 |
| 2020-05-04 | 36904 | 6 | 33263 | 103017 | 170 | 2247 | 0 | 1 | 324 | 47256 |
| 2020-05-05 | 18551 | 8 | 15880 | 52720 | 77 | 1143 | 0 | 13 | 149 | 23683 |
During the date range we're curious about, Fenix sees a dramatic increase in processed crash reports. Fenix crash reports won't have a minidump if they're crashes in Java-land, and that means minidump-stackwalk isn't run. Running minidump-stackwalk is the bulk of processing time, so a spike of crash reports that skip it will drop the mean significantly.
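To make the reasoning above concrete, here is a rough back-of-the-envelope sketch. The daily counts are copied from the table; everything else is an assumption made for illustration: the per-report times (6.0s with minidump-stackwalk, 0.5s without) are made up rather than measured, and the sketch pretends every Fenix report lacks a minidump and every other report has one, which is certainly not exactly true.

```python
# Rows copied from the table above, in column order:
# Fenix, Fennec, FennecAndroid, Firefox, FirefoxReality, Focus,
# GeckoViewExample, ReferenceBrowser, SeaMonkey, Thunderbird
ROWS = {
    "2020-04-14": (12177, 4, 36015, 112043, 40, 2513, 0, 3, 316, 50639),
    "2020-04-28": (121653, 12, 34139, 106058, 165, 2350, 0, 4, 308, 48321),
}

# Hypothetical per-report processing times in seconds (NOT measured).
T_WITH_STACKWALK = 6.0
T_WITHOUT_STACKWALK = 0.5


def mixed_mean(n_without, n_with,
               t_without=T_WITHOUT_STACKWALK, t_with=T_WITH_STACKWALK):
    """Mean processing time for a mix of two kinds of crash reports."""
    total = n_without + n_with
    return (n_without * t_without + n_with * t_with) / total


def estimated_mean(day):
    """Estimate the mean for one day, treating Fenix as 'no minidump'."""
    row = ROWS[day]
    fenix = row[0]           # simplification: all Fenix reports skip stackwalk
    others = sum(row[1:])    # simplification: all other reports run stackwalk
    return mixed_mean(fenix, others)


if __name__ == "__main__":
    for day in sorted(ROWS):
        print(day, round(estimated_mean(day), 2))
```

With these made-up times, the estimated mean goes from roughly 5.7s on April 14th to roughly 3.9s on April 28th. The exact numbers don't mean much, but a shift in the mix of that size is enough to explain a dip of the size seen in the graph without anything changing in the processor itself.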
Ergo, I think Socorro is fine here.
The one thing I think I want to do is split the process_crash time somehow, so we're looking at process_crash time for processing that runs minidump-stackwalk separately from process_crash time for processing that doesn't run minidump-stackwalk.
I'm going to defer thinking about that until something like this happens again.
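For whenever this gets picked up again, the split described above could look something like the sketch below. This is not Socorro code: `SplitTimer`, the two metric key names, and the sample durations are all hypothetical, and a real version would emit to the existing metrics pipeline rather than keep durations in memory.

```python
from collections import defaultdict
from statistics import mean


class SplitTimer:
    """In-memory stand-in for a metrics client (hypothetical)."""

    # Hypothetical metric names; the real ones would need to be decided.
    WITH_KEY = "process_crash.with_stackwalk"
    WITHOUT_KEY = "process_crash.without_stackwalk"

    def __init__(self):
        self._durations = defaultdict(list)

    def record(self, ran_stackwalk, seconds):
        """Record one process_crash duration under the matching key."""
        key = self.WITH_KEY if ran_stackwalk else self.WITHOUT_KEY
        self._durations[key].append(seconds)

    def means(self):
        """Return the mean duration per key, for keys that have data."""
        return {key: mean(values) for key, values in self._durations.items()}


if __name__ == "__main__":
    timer = SplitTimer()
    # Made-up durations, just to show the shape of the output.
    timer.record(True, 6.0)
    timer.record(True, 6.2)
    timer.record(False, 0.4)
    print(timer.means())
```

With two separate means, a change in the mix of crash reports shows up as a change in volume on each series instead of a misleading dip in a single combined mean.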
Marking this as FIXED.