Closed Bug 1848831 Opened 1 year ago Closed 1 year ago

"EMPTY: no frame data available; unknown error" frequently used as signature because stackwalker takes too long to process report

Categories

(Socorro :: Processor, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aryx, Assigned: willkg)

References

Details

related crash reports

Last week "EMPTY: no frame data available; unknown error" became a frequent crash signature. Will explained that this happens when the stackwalker takes too long to process a crash report. He kicked off reprocessing of ~5k crashes.

At the moment, we have ~1.1k crashes (see link above) with the signature, and they are distributed across versions and builds. The frequency might have increased today compared to yesterday: Firefox 117.0b7 was released <20h ago and already has 34 crash reports with that signature, while the previous beta from 3 days ago has only 9.

That's definitely an increase over time:

date                 --  Fenix  Firefox  Focus  Thunderbird  total
2023-08-01 00:00:00   0      0        0      0            0      0
2023-08-02 00:00:00   0      0        0      0            0      0
2023-08-03 00:00:00   0      0        0      0            0      0
2023-08-04 00:00:00   0      0        0      0            0      0
2023-08-05 00:00:00   0      0        0      0            0      0
2023-08-06 00:00:00   0      0        0      0            0      0
2023-08-07 00:00:00   0      0        0      0            0      0
2023-08-08 00:00:00   0      0        0      0            0      0
2023-08-09 00:00:00   0      0        0      0            0      0
2023-08-10 00:00:00   0      0        0      0            0      0
2023-08-11 00:00:00   0    451      269     11           24    755
2023-08-12 00:00:00   0     28       15      0            0     43
2023-08-13 00:00:00   0      0        0      0            0      0
2023-08-14 00:00:00   0     79       28      3            3    113
2023-08-15 00:00:00   0   1953     1004     54          138   3149  (partial day)

We've had an increased volume of incoming crash reports, and that's caused performance problems because the processors were getting overloaded and not handling it well. That's been an ongoing problem for about a year and is covered in bug #1795017. I did some more work on that with another theory and did a prod deploy.

Then I went through and reprocessed all the crash reports with EMPTY: no frame data available; unknown error in the signature. There were about 5k of them again. I don't see any more:

https://crash-stats.mozilla.org/signature/?signature=EMPTY%3A%20no%20frame%20data%20available%3B%20unknown%20error&date=%3E%3D2023-08-01T23%3A41%3A00.000Z&date=%3C2023-08-15T23%3A41%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#reports
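For anyone who wants to run the same check from a script instead of the web UI, here's a rough sketch that builds an equivalent query URL for the Crash Stats SuperSearch API. It assumes the public SuperSearch endpoint and its `=` operator prefix for exact matches; `build_query` and `page_size` are names made up for this example. It only constructs the URL and doesn't fetch anything.

```python
from urllib.parse import urlencode

# Assumed public SuperSearch API endpoint on Crash Stats.
API_URL = "https://crash-stats.mozilla.org/api/SuperSearch/"
SIGNATURE = "EMPTY: no frame data available; unknown error"


def build_query(start, end, page_size=100):
    """Build a SuperSearch URL for reports with the EMPTY signature.

    `start` and `end` are ISO 8601 timestamps. `build_query` and
    `page_size` are hypothetical names used only in this sketch.
    """
    params = [
        ("signature", "=" + SIGNATURE),  # "=" prefix: exact match (assumption)
        ("date", ">=" + start),
        ("date", "<" + end),
        ("_columns", "uuid"),            # only the crash ids are needed
        ("_results_number", str(page_size)),
    ]
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query("2023-08-01T23:41:00.000Z", "2023-08-15T23:41:00.000Z"))
```

If the query comes back with no hits for the window, that matches what the signature report above shows after reprocessing.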

I'm going to mark this as FIXED, but I'll keep an eye on whether more of these pop up or bug #1795017 manifests again.

Assignee: nobody → willkg
Status: NEW → RESOLVED
Type: task → defect
Closed: 1 year ago
Priority: -- → P2
Resolution: --- → FIXED

The processor cluster is having issues again and we have a bunch more.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

I reprocessed 5,720 crash reports from the last couple of days where the stackwalker timed out.

However, we're still having problems with the processing cluster, so these are still happening. I'm going to keep this open and reprocess them as they come up for the next couple of weeks until we're out of a code freeze and I can do something more useful about it.

Could this be from increased load after users on Windows 7 and 8.1 as well as OS X 10.12-10.14 got migrated to ESR, where every crash report gets processed by default instead of every tenth?
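To put rough numbers on that theory, here's a small sketch of the arithmetic. The only figure taken from this bug is the "every tenth" versus "every report" sampling; the daily submission count is purely hypothetical and `processed_reports` is a name made up for illustration.

```python
def processed_reports(submitted: int, keep_one_in: int) -> int:
    """Reports the processor handles when one in `keep_one_in` is kept."""
    return submitted // keep_one_in


# Hypothetical number of crash reports submitted per day by the migrated users.
SUBMITTED_PER_DAY = 50_000

before = processed_reports(SUBMITTED_PER_DAY, keep_one_in=10)  # every tenth report
after = processed_reports(SUBMITTED_PER_DAY, keep_one_in=1)    # every report (ESR)

if __name__ == "__main__":
    print(f"processed before migration: {before}")
    print(f"processed after migration:  {after}")
    print(f"load multiplier for this group: {after // before}x")
```

So even if those users submit the same number of crash reports as before, the processor would see about ten times as many from them after the move to ESR.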

I plotted some stuff. You're right on! There's an increase in Firefox Windows ESR users that mirrors the increase in overall crash volume:

https://github.com/willkg/socorro-jupyter/blob/main/notebooks/bug_1795017_volume_20230816.ipynb

I wrote up bug #1849352 to consider throttling that grouping.

See Also: → 1795017

We've gotten a handle on the Socorro processor problems we were having. I haven't seen another instance of this issue in the last week, so I'm going to close this out.

Status: REOPENED → RESOLVED
Closed: 1 year ago → 1 year ago
Resolution: --- → FIXED