Closed Bug 1295934 Opened 8 years ago Closed 8 years ago

Add a crash report annotation when we hit DoneStartingUp

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox51 --- fixed

People

(Reporter: ted, Assigned: n.nethercote)

References

Details

Attachments

(1 file, 1 obsolete file)

We don't have a good annotation for "startup crash" currently. We do have a piece of code that defines when we're "done starting up", however: https://dxr.mozilla.org/mozilla-central/rev/fe895421dfbe1f1f8f1fc6a39bb20774423a6d74/toolkit/xre/nsAppRunner.cpp#4255 We should add a crash report annotation there to use for distinguishing startup crashes from other crashes.
It would be easy to add an annotation (e.g. "DoneStartingUp") to the crash report once we hit this point. The harder part is dealing with it on the Socorro side. Ideally we'd want a "this is a startup crash" indicator rather than a "this is not a startup crash indicator". I guess it wouldn't be hard to turn !DoneStartingUp into StartupCrash during crash processing. AFAICT, Socorro currently decides if a crash signature is a startup crash if the uptime is 0 in more than 50% of the crash reports: https://github.com/mozilla/socorro/blob/fb3c8dcb1f2a66cfc2a9ee08b07038c2f5c28c21/webapp-django/crashstats/topcrashers/views.py#L109-L113. (peterbe, is that right?) That does feel quite unscientific, and using DoneStartingUp instead seems like a clear improvement. If we change the meaning of "startup crash" in crash-stats we should give people plenty of warning.
Flags: needinfo?(peterbe)
Here's an attempt at it. I have a couple of things I'm unsure about marked with "njn" comments. I'm also not sure if it works. I tried triggering some deliberate crashes, well after start-up, such as this one: https://crash-stats.mozilla.com/report/index/b6732e8e-e7f3-4150-8a10-e71232160822 but there is no DoneStartingUp field visible in the metadata tab or the raw dump field. Am I doing something wrong?
Attachment #8783412 - Flags: feedback?(ted)
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Comment on attachment 8783412 [details] [diff] [review] Add a crash report annotation when we hit DoneStartingUp Review of attachment 8783412 [details] [diff] [review]: ----------------------------------------------------------------- This looks like the patch I would have written. I don't know why it doesn't show up in the Socorro UI, but if you load the raw JSON you can see it's present: https://crash-stats.mozilla.com/rawdumps/b6732e8e-e7f3-4150-8a10-e71232160822.json ::: toolkit/crashreporter/nsExceptionHandler.cpp @@ +177,5 @@ > static xpstring *defaultMemoryReportPath = nullptr; > > // A whitelist of crash annotations which do not contain sensitive data > // and are saved in the crash record and sent with Firefox Health Report. > +// njn: need to add DoneStartingUp here, commented out or not? Adding DoneStartingUp here seems fine. @@ +331,5 @@ > #endif // MOZ_CRASHREPORTER_INJECTOR > > // Crashreporter annotations that we don't send along in subprocess > // reports > +// njn: need DoneStartingUp here? This is an interesting point--we probably want it listed here so we aren't sending along the chrome process "DoneStartingUp" for content process crashes, but do we want to define a similar startup point for content processes?
Attachment #8783412 - Flags: feedback?(ted) → feedback+
Oh, and I just looked and now it's present in the "Raw Metadata", so maybe we either overlooked it or the Socorro UI has some glitch?
I now see the field in the Metadata tab. I swear it wasn't there before. Hmm.
A thought while at the gym--should we annotate this to "0" immediately when we initialize the crash reporter, so that we always have a value for it?
New version. Things of note: - I took your suggestion of adding an annotation as soon as the crash reporter starts up. I'm not sure if I chose the right place for this. SetExceptionHandler() is another candidate. - I renamed the field "StartupCrash", and flipped its meaning. That way we won't have to do any processing on the Socorro side, which is nice. - I think *not* doing this in content processes is the right thing. At least, my personal interpretation of "start-up crash" is that it's the chrome process. - I discovered why I sometimes couldn't see the "DoneStartingUp" field on crash-stats -- it depends on whether or not I'm logged in! When logged in I can see it, otherwise I can't. I did a test with "StartupCrash" and saw the same behaviour: https://crash-stats.mozilla.com/report/index/97c6a62a-3e5c-492c-badc-766132160823 When logged in I see 35 fields in the Metadata tab; when not logged in I see 29. The difference is these fields: dump_checksums Email StartupCrash TelemetryEnvironment type_tag uuid I think it might be the API_WHITELIST variable in Socorro that's the cause.
Attachment #8783778 - Flags: review?(ted)
Attachment #8783412 - Attachment is obsolete: true
> https://crash-stats.mozilla.com/report/index/97c6a62a-3e5c-492c-badc- > 766132160823 > When logged in I see 35 fields in the Metadata tab; when not logged in I > see > 29. [...] > I think it might be the API_WHITELIST variable in Socorro that's the cause. Kan-Ru explained this to me: it needs to be added to the Super Search fields, similar to bug 1264244.
Yep, the Metadata tab loops over all keys in the raw crash, as stored in S3, and if you're not signed in to view PII, it filters by API_WHITELIST: https://github.com/mozilla/socorro/blob/f6a02e45015b4d502d31c2fb20d34d7941071200/webapp-django/crashstats/crashstats/views.py#L1445-L1452 Just file away on the new fields. Perhaps even better would be if you file AND implement them yourselves. Perhaps we should make you a superuser :) The reason why the socorro team has been filling in the new SuperSearch fields was for deliberate inertia (i.e. a second pair of eyes). * Email should obviously never be added. * Can TelemetryEnvironment ever contain PII? * Why isn't uuid already part of the report index page? In the SuperSearch fields admin page, there's a special report (https://crash-stats.mozilla.com/admin/supersearch-fields/missing/) that compares ALL available fields noticed in the last 3 weeks that don't exist as a SuperSearch field. The list is huge and mostly crazy Unicode encoding problems. In it, I see there's `raw_crash.StartupCrash`. (and there's `raw_crash.StartupTimePluginVersion` too by the way)
Flags: needinfo?(peterbe)
Comment on attachment 8783778 [details] [diff] [review] Add a "StartupCrash" crash report annotation Review of attachment 8783778 [details] [diff] [review]: ----------------------------------------------------------------- This makes more sense this way, thanks!
Attachment #8783778 - Flags: review?(ted) → review+
Comment on attachment 8783778 [details] [diff] [review] Add a "StartupCrash" crash report annotation bsmedberg, does this change seem ok to you? The idea is to have a crash annotation that clearly indicates if we are starting up or not. It is only used in the chrome process. The idea is that once this patch lands we would modify Socorro to use it to identify startup crashes, rather than the existing "do more than half the crashes with this signature have an uptime of 0?" heuristic. (As well as having a clearer meaning, this also means we can tell if an individual crash is a startup crash, whereas the current heuristic only applies to groups of crashes.)
Attachment #8783778 - Flags: feedback?(benjamin)
Comment on attachment 8783778 [details] [diff] [review] Add a "StartupCrash" crash report annotation I believe that annotations from the main process also end up in subprocess crash reports, so this is likely to show up (but be meaningless) in content and plugin crash reports.
Attachment #8783778 - Flags: feedback?(benjamin) → feedback+
(In reply to Benjamin Smedberg [:bsmedberg] from comment #12) > > I believe that annotations from the main process also end up in subprocess > crash reports, so this is likely to show up (but be meaningless) in content > and plugin crash reports. I added some debugging printf statements and AFAICT those annotation points (within XRE_mainInit() and XRE_mainRun()) aren't hit in content processes. Even if that's wrong, adding "StartupCrash" to kSubprocessBlacklist[] should prevent them being submitted. At least, that's my understanding!
You can easily test this by submitting a content process crash using https://addons.mozilla.org/en-US/firefox/addon/crash-me-now-simple/ (crash content process button is available in the customize dialog)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #14) > You can easily test this by submitting a content process crash using > https://addons.mozilla.org/en-US/firefox/addon/crash-me-now-simple/ (crash > content process button is available in the customize dialog) Thank you for the tip. Very useful! I did some experimenting. My hypothesis about this annotation not being attached to content process crash reports was wrong. When I removed "StartupCrash" from kSubprocessBlacklist[] I got a content crash with "StartupCrash" in it: https://crash-stats.mozilla.com/report/index/9c485aad-96aa-47cd-b480-b93f02160825 Then I put "StartupCrash" back into kSubprocessBlacklist[] and now I'm not getting it for content crashes, which is good: https://crash-stats.mozilla.com/report/index/66099a52-a336-4779-a722-d9c752160825 I am getting it for chrome crashes though, as intended: https://crash-stats.mozilla.com/report/index/a526b2ca-4c9b-4db6-8bd4-3696c2160825 I'm a little confused about kSubprocessBlacklist[] -- it appears to work for StartupCrash, FramePoisonBase, FramePoisonSize. But it doesn't work for URL and StartupTime -- I get them in content crashes. I don't know why. Anyway, the patch as reviewed appears to work in the desired way.
Blocks: 1297966
I filed bug 1297966 for fully hooking up this new annotation in Socorro.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Depends on: 1298701
Depends on: 1361704
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: