Closed Bug 1612569 Opened 4 years ago Closed 4 years ago

Improve IPCError-browser | ShutDownKill signatures

Categories

(Socorro :: Signature, enhancement, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gsvelto, Assigned: willkg)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

Currently, crashes with an IPCError-browser | ShutDownKill signature are all lumped together, making it extremely hard to tell the real issues apart from the cases where Firefox is just being slow. Bug 1279293 points to a large collection of bugs and more are being filed all the time.

I propose changing the current signature from IPCError-browser | ShutDownKill to IPCError-browser | ShutDownKill @ <regular signature>. This way we should be able to put crashes into buckets that make sense, and the "noisy" signatures caused by general slowness should fall off the radar.

This is the signature generation rule that's causing problems:

https://github.com/mozilla-services/socorro/blob/b403877ce52c150723adb8b04a072dccf9557cdf/socorro/signature/rules.py#L831

We have pretty good tooling for testing signature generation changes. I'll test this out this week.

I changed the rule to just prepend the IPCError rather than stomp on the signature. I didn't work on this rule originally, so I'm not sure why it was stomping on the signature. Does this look better?

app@socorro:/app$ socorro-cmd fetch_crashids --signature=ShutDownKill --num=5 | socorro-cmd signature
Crash id: 9e4d772e-719e-4711-a688-122e80200204
Original: IPCError-browser | ShutDownKill
New:      IPCError-browser | ShutDownKill | gfxHarfBuzzShaper::SetGlyphsFromRun
Same?:    False
Notes:    (1)
          SignatureIPCChannelError: IPC Channel Error prepended
Crash id: 2aa9b6f1-8727-4a38-a873-6c5220200204
Original: IPCError-browser | ShutDownKill
New:      IPCError-browser | ShutDownKill | js::jit::MaybeEnterJit
Same?:    False
Notes:    (1)
          SignatureIPCChannelError: IPC Channel Error prepended
Crash id: 5db1f204-d40a-4682-8292-9246d0200204
Original: IPCError-browser | ShutDownKill
New:      IPCError-browser | ShutDownKill | ntdll.dll | MessageLoop::PostTask_Helper | mozilla::ipc::ProcessLink::SendClose
Same?:    False
Notes:    (1)
          SignatureIPCChannelError: IPC Channel Error prepended
Crash id: d96363f0-ceb1-424c-a68e-177d70200204
Original: IPCError-browser | ShutDownKill
New:      IPCError-browser | ShutDownKill | mozilla::ContentPrincipal::GetURI
Same?:    False
Notes:    (1)
          SignatureIPCChannelError: IPC Channel Error prepended
Crash id: f6086af3-c3ba-459f-92c3-5152b0200204
Original: IPCError-browser | ShutDownKill
New:      IPCError-browser | ShutDownKill | __poll
Same?:    False
Notes:    (1)
          SignatureIPCChannelError: IPC Channel Error prepended
Flags: needinfo?(gsvelto)
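
To illustrate the difference between stomping on the signature and prepending to it, here is a minimal sketch in Python. It is not Socorro's actual rule code: the function names, the `SEPARATOR` constant and the hard-coded `IPCError-browser` prefix are assumptions made for this example, chosen only to reproduce the "Original" and "New" strings shown in the output above.

```python
# Hypothetical sketch, not the real Socorro rule implementation.
SEPARATOR = " | "
PREFIX = "IPCError-browser"  # assumed fixed prefix for this illustration


def stomp_signature(ipc_error: str, signature: str) -> str:
    """Old behaviour: the stack-derived signature is discarded."""
    return f"{PREFIX}{SEPARATOR}{ipc_error}"


def prepend_signature(ipc_error: str, signature: str) -> str:
    """New behaviour: the IPC error is prepended and the signature is kept."""
    prefix = f"{PREFIX}{SEPARATOR}{ipc_error}"
    if not signature:
        # Nothing to keep, so fall back to the bare IPC error signature.
        return prefix
    return f"{prefix}{SEPARATOR}{signature}"


if __name__ == "__main__":
    print(stomp_signature("ShutDownKill", "js::jit::MaybeEnterJit"))
    print(prepend_signature("ShutDownKill", "js::jit::MaybeEnterJit"))
```

With the old behaviour every ShutDownKill crash collapses into the same bucket; with the new one, crashes are grouped by the frames that follow the IPC error.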

Yes, this is much better.

Flags: needinfo?(gsvelto)

NI? myself to manually double-check some of the crashes.

Flags: needinfo?(gsvelto)

I did some analysis of the crashes and the new signatures are indeed very interesting. For example, crash f6086af3-c3ba-459f-92c3-5152b0200204 shows a content process that is waiting; it didn't even initiate shutdown, so either it's stuck (didn't get the message?) or the machine was so slow that the process wasn't even woken up before being killed. Crash d96363f0-ceb1-424c-a68e-177d70200204, on the other hand, shows work going on in the content process during the shutdown procedure. It seems to be gathering some sort of stats and it might be slow, so we could work on that to make it faster. Either way, this will make these bugs actionable, whereas right now they're a huge faceless blob.

Flags: needinfo?(gsvelto)

Awesome! I'll make the changes and land them. Once they're in prod, I can reprocess existing ShutDownKill crash reports for the last month.

I'll make sure to reply to the email on the stability list when I'm done with that.

Assignee: nobody → willkg
Status: NEW → ASSIGNED
Priority: -- → P2

This went to prod just now with deploy bug #1613695. I'm reprocessing the last week of ShutDownKill crashes now.

Thanks Will, this is excellent!

I reprocessed the last week and emailed stability.

Marking this as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

Because I only reprocessed the last week of them, there's been some confusion around the new signatures: people are drawing conclusions from when the signatures first show up, but that timing is entirely due to the fact that I only reprocessed a week of crash reports.

There are 660k left. Reopening to work on reprocessing the rest.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

We reprocessed all the crash reports that had the signature IPCError-browser | ShutDownKill. Marking as FIXED.

Status: REOPENED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Regressions: 1617918
See Also: → 1727149