Closed
Bug 1135543
Opened 7 years ago
Closed 7 years ago
Report OOM crashes in FHR
Categories
(Toolkit :: Telemetry, defect)
Tracking
()
RESOLVED
FIXED
mozilla39
Tracking | Status | |
---|---|---|
firefox39 | --- | fixed |
People
(Reporter: away, Unassigned)
Details
(Whiteboard: [MemShrink:P2])
Attachments
(3 files, 3 obsolete files)
6.23 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
9.54 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
7.58 KB,
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
This will tell us whether users are leaving due to OOMs.
Attachment #8567746 -
Flags: review?(benjamin)
![]() |
||
Updated•7 years ago
|
Whiteboard: [MemShrink] → [MemShrink:P2]
Comment 2•7 years ago
|
||
Comment on attachment 8567746 [details] [diff] [review] Report OOM crashes in telemetry I apologize it's taken me a while to get to this. I'm concerned about the overall data design and the related lack of a useful commit message or comments. My understanding of what this patch does is if we have an OOM allocation size, it will write a marker file to disk (where exactly? It appears to not be in the profile but outside somewhere). Then on startup, we check for the existence of the marker and record a single CRASH_OUT_OF_MEMORY flag in the first session of the new launch. I'm concerned about this for the following combination of reasons: * We need to have a system for recording app crashes in general in telemetry (bug 1121013 covers this). I expect that we should coordinate those. * We already have the CrashManager system for recording crash events and processing them near startup. Rather than using a separate marker, it sounds better to me to use that existing system to add the additional OOM marker. See http://hg.mozilla.org/mozilla-central/annotate/a27dd304348d/toolkit/crashreporter/nsExceptionHandler.cpp#l704 for where we write the crash event, and http://hg.mozilla.org/mozilla-central/annotate/a27dd304348d/toolkit/components/crashes/CrashManager.jsm#l515 for where we read it. I suspect that what we want to do is centralize both this recording and the recording from bug 1121013 in the crash manager. This will avoid additional main-thread I/O and avoid creating yet another system.
Attachment #8567746 -
Flags: review?(benjamin) → review-
I was trying to follow the existing pattern by putting the OOM marker in the same directory as the "LastCrash" and "LastCrashFileName" files. I will look into the CrashManager event code. At first glance it does look like it could be a better approach. ni?: It's not clear to me what you are suggesting regarding bug 1121013. (I don't know any of its backstory) Must that bug be resolved before we can proceed here?
Flags: needinfo?(benjamin)
Comment 4•7 years ago
|
||
Bug 1121013 is a necessary part of the FHR/telemetry unification program and our team needs to figure it out in the short term. It doesn't necessarily block this one, but having a unified design for it seem prudent given the close relationship. If you want to take it, I'd be happy, but I was going to probably take it myself.
Flags: needinfo?(benjamin)
Is this at all on the right track? I really don't know what I'm doing here and I don't want to waste time on the wrong thing.
Attachment #8567746 -
Attachment is obsolete: true
Attachment #8574466 -
Flags: feedback?(benjamin)
Comment 7•7 years ago
|
||
Comment on attachment 8574466 [details] [diff] [review] crash events WIP This is along the right track, but I think you should approach the crash manager differently. The fact that a particular crash is an OOM doesn't mean that it's not a crash: it's metadata that we associate with a crash. Also in the near future, I expect that we'll want to store all sorts of metadata about a crash, not just the OOM state. So instead of hard-coding all the possibilities, I think we should have an optional set of key/value metadata about each crash record. So the nsICrashManager API would be something like: addCrash(processType, crashType, id, metadata); The only real problem with my plan is that this isn't just a JS API, it's also an XPIDL API, and XPIDL doesn't have a dictionary datatype. So we'll have to figure out something with the nsICrashManager.addCrash method. That way, you can record {"OOMAllocationSize": "23546"} in the metadata, and then use that to drive the final data collection. Soon I'd like that to include most of the metadata that we collect for crashes except the personal data (URLs). What this means technically for this patch is that the format of the crash event file should be something like this: crash.main.2 <crashid> Name=Value Name2=Value2 ... And you can use the same block of code to process crash.main.1 and crash.main.2 to avoid duplication. You won't need the CRASH_TYPE_OOM constant at all. Other questions you have in the code: You don't need to modify CrashReporterParent yet, unless you need to record the OOM state for subprocesses (content/plugin/gmp). I don't think we can do that yet, but it might be worth a followup. // xxx where does this call go?! I'm pretty sure it ends up here: http://mxr.mozilla.org/mozilla-central/source/toolkit/components/crashes/CrashManager.jsm#1106 You should always change IIDs. Not because of binary-compat concerns, but because changing the IID is what triggers our build system to rebuild IDL files.
Attachment #8574466 -
Flags: feedback?(benjamin)
So there has been some massive confusion on my end, and this was actually supposed to be FHR. (Fortunately it doesn't make *too* much of a difference from a code perspective; I still plan to use the Crash Manager)
Summary: Report OOM crashes in telemetry → Report OOM crashes in FHR
![]() |
Reporter | |
Comment 10•7 years ago
|
||
Attachment #8574466 -
Attachment is obsolete: true
Attachment #8576414 -
Flags: review?(benjamin)
![]() |
Reporter | |
Comment 11•7 years ago
|
||
Attachment #8576416 -
Flags: review?(benjamin)
![]() |
Reporter | |
Comment 12•7 years ago
|
||
Attachment #8576417 -
Flags: review?(benjamin)
![]() |
Reporter | |
Comment 13•7 years ago
|
||
Attachment #8576418 -
Flags: review?(benjamin)
Comment 14•7 years ago
|
||
Comment on attachment 8576414 [details] [diff] [review] Part 1: Add a |metadata| parameter to nsICrashService::addCrash In Mozilla-COM You don't need to/shouldn't revise the contractID or CID. So please undo the crashservice;2 change as well as the new {0cb49ca4-8c96-4f31-b22a-09f3082d250d} CID. I wish we had a better datatype for the metadata, but I suppose the hacky way is good enough for now. Are you sure we need to make this change at all?
Attachment #8576414 -
Flags: review?(benjamin) → review+
Comment 15•7 years ago
|
||
Comment on attachment 8576416 [details] [diff] [review] Part 2: Add a |metadata| parameter to CrashManager.addCrash It feels like you ought to have tests which exercize the keyvalue parsing mechanism, if we actually need it. My impression is that we don't need it (just skip part 1 entirely). So I'm going to mark r+ on this patch but the parsing bits shouldn't land without a separately reviewed test.
Attachment #8576416 -
Flags: review?(benjamin) → review+
Comment 16•7 years ago
|
||
Comment on attachment 8576417 [details] [diff] [review] Part 3: Create events file format crash.main.2 which contains metadata +There should be ``UUID.dmp`` and ``UUID.extra`` files on disk, saved by +Breakpad. I don't think this comment belongs here. If the crash has already been submitted by the user, the .dmp/extra files will already have been deleted. r=me on the rest.
Attachment #8576417 -
Flags: review?(benjamin) → review+
Updated•7 years ago
|
Attachment #8576418 -
Flags: review?(benjamin) → review+
![]() |
Reporter | |
Comment 17•7 years ago
|
||
> Are you sure we need to make this change at all? Nope! > My impression is that we don't need it (just skip part 1 entirely). No objection here.
![]() |
Reporter | |
Comment 18•7 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #15) Regarding the tests, I assume it's a moot point if we decide not to add that parsing code? So just to confirm, if I: * Drop part 1 * Remove CrashService.js from part 2 * Clean up .rst comments in part 3 ...then this stack is ready to land?
Flags: needinfo?(benjamin)
Attachment #8576414 -
Attachment is obsolete: true
![]() |
Reporter | |
Comment 20•7 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/2beffa5eb078 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/21341fc677aa remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/ec3e39670ac9
Comment 21•7 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/2beffa5eb078 https://hg.mozilla.org/mozilla-central/rev/21341fc677aa https://hg.mozilla.org/mozilla-central/rev/ec3e39670ac9
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox39:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
![]() |
Reporter | |
Comment 22•7 years ago
|
||
Is this ineffective on 39 due to the unification? Should it go up to 38?
Flags: needinfo?(vdjeric)
Comment 23•7 years ago
|
||
(In reply to David Major [:dmajor] (UTC+13) from comment #22) > Is this ineffective on 39 due to the unification? Should it go up to 38? We do need this patch on 39. We still have the old FHR enabled on 39, and we won't turn it off until we've confirmed that unified Telemetry is reporting the same numbers as the old FHR.
Flags: needinfo?(vdjeric)
You need to log in
before you can comment on or make changes to this bug.
Description
•