Closed Bug 1943465 Opened 15 days ago Closed 23 hours ago

Reduce the delay between batches of crashes that we reprocess when scraping system symbols

Categories

(Toolkit :: Crash Reporting, enhancement)

Tracking

RESOLVED FIXED
137 Branch
Tracking Status
firefox137 --- fixed

People

(Reporter: gsvelto, Assigned: gsvelto)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Bug 1943243 highlighted the fact that we can attempt to reprocess an enormous number of crashes. This only happens in uncommon scenarios, typically when old symbols for very common Windows libraries are purged from the symbol servers and then re-scraped. We should cap the number of crashes we reprocess at 50k, or use a shorter time window (like 3 days instead of a week). Or both. Either way, we shouldn't let the script reprocess an arbitrarily large number of crashes.

I was about to land a fix for this, but then I remembered bug 1903945 (where we removed all caps) and in particular bug 1903945 comment 4. Will, an alternative way of dealing with this would be to reduce the waiting time between batches of crashes sent for reprocessing (right now it's 5 seconds), as that would make handling a large number of crashes quicker. Would that be a sensible choice, or do you think it would put too much strain on Socorro's backend? We're talking about 70-80k crashes, but this only seems to happen once in a blue moon, and my current solution would be to cap them at 50k.
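For readers unfamiliar with the pattern under discussion, here is a minimal sketch of sending crash IDs for reprocessing in batches with a pause between batches. It is not the actual script from this bug: the function names, the `send_batch` callback, and the batch size of 500 are all hypothetical, and only the 5-second delay comes from the comment above.

```python
import time
from typing import Callable, Iterator, Sequence


def batches(crash_ids: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size crash IDs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(crash_ids), batch_size):
        yield crash_ids[start:start + batch_size]


def reprocess_in_batches(
    crash_ids: Sequence[str],
    send_batch: Callable[[Sequence[str]], None],  # hypothetical: submits one batch
    batch_size: int = 500,        # hypothetical value; not stated in this bug
    delay_seconds: float = 5.0,   # the current delay mentioned in the comment
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send every crash ID for reprocessing, pausing between batches.

    Returns the number of batches sent.
    """
    sent = 0
    for batch in batches(crash_ids, batch_size):
        if sent > 0:
            sleep(delay_seconds)  # pause only between batches, not before the first
        send_batch(batch)
        sent += 1
    return sent
```

With this shape, no crash is dropped however long the list is; the delay only controls how quickly the backend receives the work.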

Flags: needinfo?(willkg)

We (Observability Team) talked about this a bit. We think we should change how this is handled on the server side, so that users don't have to do wonky buffering of the crash ids they want reprocessed; the buffering should move to the server instead. I wrote up bug #1945653 to cover the changes we should make.

In the meantime, we don't think you should cap the number of crash reports being reprocessed. Further, we think you should drop the sleep time between reprocessing requests to a smaller number. We can see how that goes.

Flags: needinfo?(willkg)

Thank you very much, Will! I'll drop the delay from 5 to 3 seconds then; it should be more than enough to avoid a timeout like the one in bug 1943243.
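As a rough illustration of what the change from 5 to 3 seconds buys, the sketch below estimates the total time spent waiting between batches. The batch size of 500 is a hypothetical value (the bug does not state one), and the calculation assumes the script pauses only between batches, not after the last one.

```python
import math


def total_pause_seconds(num_crashes: int, batch_size: int, delay_seconds: float) -> float:
    """Total time spent sleeping between batches (no pause after the last one)."""
    if num_crashes <= 0:
        return 0.0
    num_batches = math.ceil(num_crashes / batch_size)
    return (num_batches - 1) * delay_seconds


# 80,000 crashes at a hypothetical 500 per batch is 160 batches, so 159 pauses.
print(total_pause_seconds(80_000, 500, 5.0))  # 795.0
print(total_pause_seconds(80_000, 500, 3.0))  # 477.0
```

Whatever the real batch size is, the time spent waiting scales linearly with the delay, so going from 5 to 3 seconds cuts it by 40%.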

Summary: Cap the maximum number of crashes that we reprocess when scraping system symbols → Reduce the delay between batches of crashes that we reprocess when scraping system symbols
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
Pushed by gsvelto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e4ffb353bed9 Reduce the delay between batches of crashes that we reprocess when scraping system symbols r=gerard-majax
Status: ASSIGNED → RESOLVED
Closed: 23 hours ago
Resolution: --- → FIXED
Target Milestone: --- → 137 Branch