Closed Bug 1701357 Opened 3 years ago Closed 3 years ago

Make parent process crashes on crash-stats more obvious

Categories

(Socorro :: Webapp, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mccr8, Assigned: willkg)

References

Details

Attachments

(7 files)

Crash reports for the parent process don't have a crash type, which can be confusing. A developer in the developers channel was confused about it just now. It would be nice if there could be an annotation like parent or something.

Component: DOM: Content Processes → Crash Reporting
Product: Core → Toolkit
Summary: Add a processType annotation in for the parent process → Add a processType annotation for the parent process

I know this is going to sound silly but fixing this in a reasonable way is a lot more complicated than it appears. For starters Socorro implies that if a crash doesn't have the ProcessType annotation then it's a main process crash, so we'd have to change that first. To make things worse we already have an official "name" for the main process type and it's not what you'd expect. It's called default. What's worse is that value is part of our telemetry IIRC so we can't change it.

So yeah, technically we can add the ProcessType annotation to main process crashes but either we call it "default" - which is going to be confusing - or we change it but then it will be different between telemetry and Socorro (and we'd have to special-case it too in the crash reporting flow). Also we'd have to do the change server-side first.

If there was a "processtype: default" on parent process crashes I would find that much less confusing then having no process type listed at all on the crashes.

(In reply to Timothy Nikkel (:tnikkel) from comment #2)

If there was a "processtype: default" on parent process crashes I would find that much less confusing then having no process type listed at all on the crashes.

Oh, you mean on crash-stats! I misunderstood the request, I thought this was about having the annotation in the (raw) crash report. On crash-stats it's a different story. Parent process crashes can be searched for explicitly by using process_type=browser there so I think it should be possible to make "Process Type" string appear in the details tab.

Will, how hard would it be add a line with the process type to parent (AKA browser) process crashes? This way all crashes would have a visible process type.

Flags: needinfo?(willkg)

Thanks, Timothy. Yeah, sorry I didn't explain what I wanted very well.

Summary: Add a processType annotation for the parent process → Make parent process crashes on crash-stats more obvious

Um... I feel like this came up at some point and there was a reason it never got done.

I'll have to look into it. Keeping the needinfo until I do.

Also, given this is a Socorro problem, I'm going to move it.

Component: Crash Reporting → Webapp
Product: Toolkit → Socorro
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Flags: needinfo?(willkg)
Priority: -- → P2
Flags: needinfo?(willkg)

"browser" is a hard-coded option:

https://github.com/mozilla-services/socorro/blob/2842db4f30468e44d12bbd6016bc4b39c2ae9b29/webapp-django/crashstats/settings/base.py#L289-L298

It's also hard-coded here:

https://github.com/mozilla-services/socorro/blob/2842db4f30468e44d12bbd6016bc4b39c2ae9b29/socorro/external/es/super_search_fields.py#L3010

That also has this comment:

What type of process the crash happened in. When the main process crashes, this will
not be present. But when a plugin or content process crashes, this will be
'plugin' or 'content'.

It's implemented in SuperSearch as a "if the user specified browser, then switch it to does-not-exist":

https://github.com/mozilla-services/socorro/blob/2842db4f30468e44d12bbd6016bc4b39c2ae9b29/socorro/lib/search_common.py#L291-L327

If you do a facet on process types, "browser" doesn't show up:

https://crash-stats.mozilla.org/search/?date=%3E%3D2021-03-02T19%3A37%3A00.000Z&date=%3C2021-04-02T19%3A37%3A00.000Z&_facets=signature&_facets=process_type&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-process_type

rank type count percentage
1 content 1353119 25.05 %
2 gpu 7878 0.15 %
3 plugin 7537 0.14 %
4 rdd 2085 0.04 %
5 socket 197 0.00 %
6 vr

Fun.

Looking at specific crash reports, this crash report has "Process type: content" in the crash annotations and shows a "Process Type: content" in the Details tab of the report view:

https://crash-stats.mozilla.org/report/index/fc8594ba-7cfa-47d1-b5be-155990210402

This crash report has no "Process type" annotation and so it shows nothing in the Details tab:

https://crash-stats.mozilla.org/report/index/117042d9-e0a9-4d85-bcac-b59410210402

So I think we've got a handful of issues and complexities:

  1. Crash reports with no ProcessType annotation don't have a "Process type" line in the Crash Annotations tab of report view on Crash Stats which is confusing.
  2. Crash reports with no ProcessType annotation aren't easily searchable in SuperSearch--there's this whole "browser" option, but that gets converted to "does not exist" and these crash reports won't show up in a "browser" facet. I remember hitting this a while back when looking at the Email data and I had to do gymnastics to work around it.
  3. If we changed this in the crash reporting client, the value would be "default" which would be confusing and doesn't match anything in Socorro.
  4. If we changed this in the crash reporting client or added a normalization step to crash report processing that fixed the ProcessType field, it'd cause skew in values between crash pings and crash reports

Given all that, I think I want to create a new field in the processed crash data that's a normalized ProcessType value. Then we can get rid of all the special-casing and hard-coded stuff and the inconsistent weirdness. Maybe call it normalized_process_type or something like that.

Does the lack of a ProcessType annotation in the crash report always mean that the process is the browser for all products? Is this just for Firefox or some subset?

If we were to normalize it to something and because it's a Socorro value, we don't have to make it match crash pings, is "browser" the right thing to use? What about "main" or "parent"? What's the least confusing?

Flags: needinfo?(willkg)
Flags: needinfo?(gsvelto)
Flags: needinfo?(continuation)

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #6)

Does the lack of a ProcessType annotation in the crash report always mean that the process is the browser for all products? Is this just for Firefox or some subset?

Yes, if ProcessType is missing then it's a browser/parent/main process crash.

If we were to normalize it to something and because it's a Socorro value, we don't have to make it match crash pings, is "browser" the right thing to use? What about "main" or "parent"? What's the least confusing?

It is my understanding that in Fenix the browser (as in the UI) is a separate process compared to the main process running the code in libxul so maybe browser is not the best option. We use both "main process" and "parent process" in our code but the latter is much more common. Additionally code that checks for the process type in Gecko uses XRE_IsParentProcess(). Given that I'd go for "parent" unless there's a strong reason not to which I might not be aware of.

Flags: needinfo?(gsvelto)

Yeah, I think parent is reasonable.

Flags: needinfo?(continuation)

This is super helpful!

I looked at the code. We've already got a processor rule that creates a process_type field, so I'm going to reuse that. It'll do:

  1. if there's a ProcessType annotation, then use that
  2. if there's no ProcessType annotation, then use "parent"

Thus the changes for this bug will be:

  1. change the processor rules so we have one that creates a process_type field in the processed crash that defaults to "parent" -- this will cause it to show up in the Details tab going forward
  2. add "parent" to all the hard-coded places we have process types
  3. write up a new bug to remove the hard-coded process types and all the "browser" means "does not exist" nonsense and schedule that work for 3 months from now

willkg merged PR #5737: "bug 1701357: change default process_type to parent" in d3bd000.

I was tossing around reprocessing crash reports that don't have a ProcessType annotation, but there are a lot of them, so I don't think we should do that after this gets deployed to prod unless it's very compelling. If there are subsets of crash reports that we can reprocess that make things easier, I'm game for doing that. Otherwise, we can wait and all things will be better in time.

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #11)

I was tossing around reprocessing crash reports that don't have a ProcessType annotation, but there are a lot of them, so I don't think we should do that after this gets deployed to prod unless it's very compelling. If there are subsets of crash reports that we can reprocess that make things easier, I'm game for doing that. Otherwise, we can wait and all things will be better in time.

No need to reprocess IMHO, we've lived with this for over a decade we can live with it for a little longer.

I deployed this to prod just now in bug #1703541.

However it's showing up weird in the Details tab. For crash reports in the parent process, it's showing "parent (web)".

Example: bp-1a08ef23-c6fe-4e89-ac87-5ee960210407

I need to look at why that's happening.

Also, the tooltip help text is wrong. I'll have to track down where that's coming from and update it.

In content processes, the stuff in parens is based on the remoteType, if that helps.

Yeah, further it looks like the template code shows the RemoteType or "web" if no RemoteType is specified. That's wrong--it should only show that if it's a content process crash report.

I'll fix that.

Andrew, Gabriele: I reworked that field so it's less weird and it's clearer what it's doing. Do the above clips look ok?

Flags: needinfo?(gsvelto)
Flags: needinfo?(continuation)

Yes, very nice!

Flags: needinfo?(gsvelto)
Flags: needinfo?(continuation)

jcristau and ryanvm pointed out that the TopCrashers report now has two process type filters "parent" and "browser" that should be the same thing (all parent process crash reports) but have non-overlapping sets of crash reports.

That will work itself out over time as the data processed in the wold way expires, but it's really inconvenient now.

I'm going to change the TopCrashers process type filters and get rid of "browser". I'm going to fix Supersearch process_type filter handling so that "parent" and "browser" both mean "process_type=parent OR process_type does not exist". That'll be less disruptive for existing workflows particularly for Stability analysis.

Looks good, thanks.

I deployed this in bug #1703900 a couple of hours ago. Marking as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: