Reduce the delay between batches of crashes that we reprocess when scraping system symbols
Categories
(Toolkit :: Crash Reporting, enhancement)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox137 | --- | fixed |
People
(Reporter: gsvelto, Assigned: gsvelto)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
Bug 1943243 highlighted the fact that we can attempt to reprocess an enormous number of crashes. This only happens in uncommon scenarios, typically when old symbols for very common Windows libraries are purged from the symbol servers and then re-scraped. We should cap the maximum number of crashes we reprocess at 50k, or use a shorter time window (like 3 days instead of a week), or both. Either way, we shouldn't let the script reprocess an arbitrarily large number of crashes.
Comment 1 (Assignee) • 8 days ago
I was about to land a fix for this, but then I remembered bug 1903945 (where we removed all caps) and in particular bug 1903945 comment 4. Will, an alternative way of dealing with this would be to reduce the waiting time between batches of crashes sent for reprocessing (right now it's 5 seconds), as that would make handling a large number of crashes quicker. Would that be a sensible choice, or do you think it would put too much strain on Socorro's backend? We're talking about 70-80k crashes, but this only seems to happen once in a blue moon, and my current solution would be to cap them at 50k.
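For illustration, the batching pattern being discussed can be sketched roughly as follows. This is not the actual scraping script: the function names, the `send_batch` callback, and the batch size of 500 are all hypothetical; only the 5-second delay comes from this bug.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batched(ids: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size crash IDs."""
    for start in range(0, len(ids), batch_size):
        yield list(ids[start:start + batch_size])


def reprocess_in_batches(
    crash_ids: Sequence[str],
    send_batch: Callable[[List[str]], None],  # hypothetical: submits one batch for reprocessing
    batch_size: int = 500,                    # assumed value, not taken from the real script
    delay_seconds: float = 5.0,               # the current delay mentioned in this bug
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send crash IDs in batches, pausing between consecutive batches.

    Returns the number of batches sent.
    """
    batches_sent = 0
    for batch in batched(crash_ids, batch_size):
        if batches_sent > 0:
            # Only wait between batches, not before the first one.
            sleep(delay_seconds)
        send_batch(batch)
        batches_sent += 1
    return batches_sent
```

With this shape, lowering `delay_seconds` shortens the total run without changing how many crashes are submitted, whereas a cap would drop crashes outright.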
Comment 2 • 4 days ago
We (Observability Team) talked about this a bit. We think we should adjust how things are handled on the server side of things so we don't require wonky buffering of crash ids to be reprocessed by the user and instead move it to the server. I wrote up bug #1945653 to cover the changes we should make.
In the meantime, we don't think you should cap the number of crash reports being reprocessed. Further, we think you should drop the sleep time between reprocessing requests to a smaller number. We can see how that goes.
Comment 3 (Assignee) • 3 days ago
Thank you very much, Will! I'll drop the delay between batches from 5 to 3 seconds then; it should be more than enough to avoid a timeout like the one in bug 1943243.
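As a rough back-of-the-envelope check, the time spent purely waiting between batches can be estimated like this. The batch size of 500 is an assumption made for the sake of the arithmetic (the real batch size is not stated in this bug), and the helper name is hypothetical.

```python
import math


def total_delay_seconds(crash_count: int, batch_size: int, delay_seconds: float) -> float:
    """Total time spent sleeping between batches (excludes request time)."""
    if crash_count <= 0:
        return 0.0
    batches = math.ceil(crash_count / batch_size)
    # There is one pause between each pair of consecutive batches.
    return (batches - 1) * delay_seconds


# Assumed batch size of 500; 80k crashes is the upper figure from comment 1.
before = total_delay_seconds(80_000, 500, 5.0)
after = total_delay_seconds(80_000, 500, 3.0)
print(f"waiting time: {before:.0f}s at 5s per batch, {after:.0f}s at 3s per batch")
```

Whatever the real batch size is, the waiting time scales linearly with the delay, so going from 5 to 3 seconds removes 40% of it.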