Closed Bug 1667997 Opened 4 years ago Closed 1 year ago

split crash reports into crashes, shutdown hangs, and content-process hangs

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Attachments

(1 file)

pr 6434: bug 1667997: report_type field, processor rule, and adjustment to Top Crasher report, 1 year ago, Will Kahn-Greene [:willkg] ET needinfo? me, 53 bytes, text/x-github-pull-request

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Description

4 years ago

Socorro collects "crash reports" for a variety of projects. However, not all crash reports are actually crashes. The crash reporting mechanism (generating a report, what it contains, and getting it to Mozilla so we can look into it) is a convenient way to get data about problems with processes in general, so it gets used for more than crashes.

Currently, we're getting:

crash reports -- a process has crashed, the client generated a minidump, and it gets sent along
shutdownhangs -- a process hung and another process generated a minidump and sent it along
content-process hangs -- I'm not sure what this is, but Gabriele said it's a thing
warnings -- at one point, Fenix was sending warnings which weren't crashes, but were things they were keeping tabs on; I think we're not getting these anymore

This becomes problematic for analyzing and investigating the crash report data because everything is jumbled up together.

For example, the TopCrashers report for Firefox is full of ShutDownKill signatures. Shutdown hangs are a problem, but it's unhelpful to have them overwhelm the TopCrashers report and hide crash issues. (bug #1624946)

This tracker bug covers looking into the problem, figuring out a plan, shopping it around to stakeholders, and then executing on it.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 1

4 years ago

Making this a P3 for now. I don't know when I'll get to it.

cc:ing Gabriele because we were talking about this recently.

Some thoughts:

maybe we set up a processor rule that groups the reports into report types: crash, shutdownhang, etc
maybe we adjust the TopCrashers report to filter on report type--it currently filters on platform and process type
maybe we default everything to look at crash report type and you have to explicitly choose other types
maybe we put the type in the crash signature

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Updated

4 years ago

Priority: -- → P3

Gabriele Svelto [:gsvelto]

Comment 2

4 years ago

I'll write some super-search queries to show what I hope will be the end result and post them here as examples.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 3

4 years ago

I have some notes on this and I want to capture them in the comments rather than have notes in multiple places.

We should add a processor rule that adds a "report_type" field (or some similar name) that specifies the report type. A processor rule is flexible: if we get it wrong, we can tweak the rule and reprocess.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 4

4 years ago

Bumping this up to P2 because this could help a bunch.

Priority: P3 → P2

Gabriele Svelto [:gsvelto]

Comment 5

2 years ago

A quick refresh of how we'd like to categorize crashes (or rather reports, given that not all reports are crashes):

1. Regular crashes
2. Hangs, this would cover
   a. Browser shutdown hangs (example query)
   b. Content process shutdown hangs (example query)
   c. Non-shutdown hangs we'll explicitly flag (see bug 1826703)
   d. Content-process hangs which I don't remember if/how we deal with right now

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Updated

1 year ago

Assignee: nobody → willkg

Status: NEW → ASSIGNED

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 6

1 year ago

I want to add a report_type field that will take keywords. We'll start it off with:

hang -- different kinds of hangs
crash -- regular crashes and anything that didn't get categorized as something else; i.e. the field defaults to "crash"
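As a rough sketch of what such a rule could look like (this is not the code from the PR): the function names, the `HANG_SIGNATURE_MARKERS` tuple, and the idea of classifying by signature text are all assumptions made for illustration. The only thing taken from this bug is the pair of keywords and the default of "crash".

```python
# Hypothetical markers for illustration only; the real rule may look at
# entirely different fields of the crash report.
HANG_SIGNATURE_MARKERS = ("shutdownhang", "ShutDownKill")


def classify_report_type(processed_crash):
    """Return a report_type keyword for a processed crash report (a dict)."""
    signature = processed_crash.get("signature") or ""
    if any(marker in signature for marker in HANG_SIGNATURE_MARKERS):
        return "hang"
    # Anything not categorized as something else defaults to "crash".
    return "crash"


def apply_report_type_rule(processed_crash):
    """Set report_type on the report in place, the way a processor rule would."""
    processed_crash["report_type"] = classify_report_type(processed_crash)
    return processed_crash
```

Because the classification lives in one small function, changing the heuristic and reprocessing is enough to re-label existing reports.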

We'll make this a public field. We'll index it as a keyword so it's available in search and aggregations.

Gabriele: Does that work for you? I can't tell from comment #5 if you wanted to further break it down by different hang types or not.

Flags: needinfo?(gsvelto)

Gabriele Svelto [:gsvelto]

Comment 7

1 year ago

Yes, it's fine. There will be different types of hangs, but we don't need to tell them apart at that level; we can always facet on the hang type later (like we do with the crash reason for crashes). What we care about is that users can clearly tell hangs apart from crashes.
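For illustration, a query that keeps only hangs and then facets on another field could be built like the sketch below. It assumes the field ends up exposed in Super Search under the name `report_type`; the helper `build_report_type_query` and the choice of `signature` as the facet are hypothetical, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode

SUPERSEARCH_URL = "https://crash-stats.mozilla.org/api/SuperSearch/"


def build_report_type_query(report_type, product="Firefox", facets=("signature",)):
    """Build a Super Search URL filtered on report_type (assumed field name)."""
    params = [
        ("product", product),
        ("report_type", report_type),
        # Ask for aggregations only, no individual hits.
        ("_results_number", 0),
    ]
    # One _facets parameter per field to aggregate on.
    params.extend(("_facets", facet) for facet in facets)
    return SUPERSEARCH_URL + "?" + urlencode(params)


print(build_report_type_query("hang"))
```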

Flags: needinfo?(gsvelto)

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Updated

1 year ago

Summary: [tracker] split crash reports into crashes, shutdown hangs, and content-process hangs → split crash reports into crashes, shutdown hangs, and content-process hangs

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Updated

1 year ago

Blocks: 1826703

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 8

1 year ago

I've finished up most of the changes we need to do here.

The only thing left to figure out is how to transition from the old system (no report_type field indexed) to the new system (report_type field indexed). I think we have to treat all crash reports with no report_type as a "crash". That means the TopCrashers report for report type "crash" will still include hangs whenever old data is in the range.
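To make that trade-off concrete, here is a minimal sketch of the "missing means crash" approach. The helper names and the sample reports are made up for illustration; this is not how the TopCrashers report is actually implemented.

```python
def effective_report_type(report):
    """Treat a report with no report_type as a "crash" (old, unclassified data)."""
    return report.get("report_type") or "crash"


def filter_by_report_type(reports, wanted):
    """Keep the reports whose effective report type matches `wanted`."""
    return [r for r in reports if effective_report_type(r) == wanted]


reports = [
    # Old data: processed before the report_type field existed.
    {"signature": "shutdownhang | example"},
    # New data: classified by the processor rule.
    {"signature": "shutdownhang | example", "report_type": "hang"},
    {"signature": "OOM | small", "report_type": "crash"},
]

crashes = filter_by_report_type(reports, "crash")
hangs = filter_by_report_type(reports, "hang")
```

Here `crashes` contains the old shutdown hang as well as the real crash, which is the confusing overlap described above.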

If we think this is going to be confusing to everyone, the alternative is to land the processor and indexing changes now, let data with a report_type accumulate, and then land the TopCrashers changes after we have 4 weeks of data, since TopCrashers has Days values of 1, 7, 14, and 28.

I'll finish this up the week of July 17th.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 9

1 year ago

Attached file pr 6434: bug 1667997: report_type field, processor rule, and adjustment to Top Crasher report

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 10

1 year ago

This needs testing on stage. I'm pretty sure I got the handling for old data in the Top Crashers report correct.

Further, before it gets deployed to production, I need to send an email to the stability, crash-reporting-wg, and firefox-dev mailing lists about the changes to the Top Crashers report and how the old data is handled.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 11

1 year ago

willkg merged PR #6434: "bug 1667997: report_type field, processor rule, and adjustment to Top Crasher report" in c081da0.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 12

1 year ago

That will auto-deploy to the staging environment.

I need to verify the following on stage:

does the topcrashers report work when crash reports don't have a report_type field (it's not in the index mapping)?
(next week) does the topcrashers report work when some crash reports have a report_type field (index has the field) and some don't (last week's index doesn't have the field)?

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 13

1 year ago

I looked at stage and because there's no report_type data and the Top Crashers report defaults to report_type="crash", the Top Crashers report is empty. That will be terrifying to anyone looking at it for the first few weeks after it goes to production.

I'm going to change the default to report_type="any". After this goes to production, we can write up a bug for changing the default at some point in the future. Maybe in a month.
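The difference between the two defaults can be sketched as follows. This is a simplified model with a hypothetical `top_crashers_filter` function, assuming the filter is an exact match on the indexed field; it is not the actual report code.

```python
def top_crashers_filter(reports, report_type="any"):
    """Filter reports by exact report_type; "any" disables the filter."""
    if report_type == "any":
        return list(reports)
    # Exact match: reports without the field never match a specific type.
    return [r for r in reports if r.get("report_type") == report_type]


# Data indexed before the report_type field existed.
old_reports = [
    {"signature": "OOM | small"},
    {"signature": "shutdownhang | example"},
]

with_crash_default = top_crashers_filter(old_reports, report_type="crash")
with_any_default = top_crashers_filter(old_reports)
```

With only old data in range, `with_crash_default` is empty while `with_any_default` still has both reports.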

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 14

1 year ago

willkg merged PR #6440: "bug 1667997: change report_type default to "any"" in 5b9daf9.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 15

1 year ago

I checked stage this morning and the filter appears to be working as expected.

I sent an email to the stability and crash-reporting-wg mailing lists about the upcoming change.

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 16

1 year ago

All the changes so far were deployed to prod just now in bug #1843869.

There won't be any report_type data in the search index until a new index is created this weekend, so the "crash" and "hang" filters won't have any results until next week.

I'll keep this open until next week after I verify everything is working as expected and write up a bug to change the report_type default from "any" to "crash".

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Updated

1 year ago

Blocks: 1844174

Will Kahn-Greene [:willkg] ET needinfo? me

Assignee

Comment 17

1 year ago

I wrote up a bug for changing the default. I think we're good here.

Status: ASSIGNED → RESOLVED

Closed: 1 year ago

Resolution: --- → FIXED
