Closed
Bug 1407693
Opened 7 years ago
Closed 7 years ago
Don't attempt to create a file on crash in content process
Categories
(Toolkit :: Crash Reporting, enhancement)
Toolkit
Crash Reporting
Tracking
()
RESOLVED
FIXED
mozilla60
People
(Reporter: Alex_Gaynor, Assigned: Alex_Gaynor)
References
Details
Attachments
(2 files, 1 obsolete file)
Right now on a crash, our callback to the breakpad ExceptionHandler creates a new file in the content temp directory: https://dxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#1351-1390
We'd like to drop write permissions from this directory, and I believe this is the last use of that!
A fairly straight forward resolution of this would be to open the file in |SetRemoteExceptionHandler|, and pass it as the callback context.
I'll work up a patch which does this shortly.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 4•7 years ago
|
||
mozreview-review |
Comment on attachment 8932235 [details]
Bug 1407693 - Part 1 - Expose method for sharing a HANDLE to a child process in the sandboxing API;
https://reviewboard.mozilla.org/r/203266/#review208836
::: security/sandbox/win/src/sandboxbroker/sandboxBroker.h:60
(Diff revision 1)
> - // Exposes AddTargetPeer from broker services, so that none sandboxed
> + // Exposes AddTargetPeer from broker services, so that non-sandboxed
> // processes can be added as handle duplication targets.
> bool AddTargetPeer(HANDLE aPeerProcess);
>
> + void AddHandleToShare(HANDLE aHandle);
Oh dear, thanks for fixing.
While we're changing it, can we use JavaDoc-style comments, which I didn't realise were in the style guide at the time.
Also, can you add one to AddHandleToShare to explain that this handle will be inherited by the child.
Attachment #8932235 -
Flags: review?(bobowencode) → review+
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 8•7 years ago
|
||
mozreview-review |
Comment on attachment 8932236 [details]
Bug 1407693 - Part 2 - when a child process crashes, write extra annotation data to a pre-opened file descriptor instead of creating a new file;
https://reviewboard.mozilla.org/r/203268/#review210132
This looks alright with a nit addressed: you need to add stubs for the functions you've added to nsExceptionHandler.h in nsDummyExceptionHandler.cpp so that things will build even with crash reporting disabled. This is required because I've landed bug 1402519 in the meantime.
::: toolkit/crashreporter/nsExceptionHandler.h:178
(Diff revision 2)
> // indicating which remote process crashed first.
> bool TakeMinidumpForChild(uint32_t childPid,
> nsIFile** dump,
> uint32_t* aSequence = nullptr);
>
> +// TODO: can we use base::ProcessId?
This is rather ugly but I assume there must have been a reason why we've avoided including chromium headers here...
Attachment #8932236 -
Flags: review?(gsvelto) → review+
Comment 9•7 years ago
|
||
I'm looking into the tests now. I've been bothered about the way they get .extra/.dmp files for a while so this might be the right time to fix that.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 13•7 years ago
|
||
While trying to address the issues with the broken tests I've noticed that there's another problem with your patch that I hadn't spotted during review. If you try to run the toolkit/crashreporter/test/unit_ipc/test_content_oom_annotation_windows.js xpcshell test, you'll notice that the annotations that are added in the exception handler callback do not end up in the .extra file. I'm not sure if the issue is because the test isn't getting the correct .extra file or if the annotations are not written correctly. Thing is, the test does find an .extra file, but it's only partially populated.
Assignee | ||
Comment 14•7 years ago
|
||
Mmm, when I run this test I'm seeing a ton of unrelated error tracebacks, including "Failed to load module resource:///modules/AttributionCode.jsm" and a Python KeyError in mozlog.
Do these match what you're seeing? -- I suspect there's some underlying error in this output, but it's a bit hard to find :-)
Assignee | ||
Comment 15•7 years ago
|
||
Ok, disregard that, I managed to muddle through by ignoring the errors and debugging manually from your description :-) I'm debugging now.
Comment 16•7 years ago
|
||
Yeah, those errors are sadly all expected... The AttributionCode.jsm one is particularly annoying and was added recently IIRC.
Assignee | ||
Comment 17•7 years ago
|
||
Gabriele, should we expect this data to be coming from the WriteGlobaMemoryStatus call in PrepareChildExceptionTimeAnnotations or the call in MinidumpCallback?
Comment 18•7 years ago
|
||
status-firefox59:
--- → ?
Comment 19•7 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #17)
> Gabriele, should we expect this data to be coming from the
> WriteGlobaMemoryStatus call in PrepareChildExceptionTimeAnnotations or the
> call in MinidumpCallback?
Yes, and from my testing it seems to be called correctly. I haven't had time to check the handle that gets passed to the callback though. Either way it seems that some annotation end up in an .extra file, and some others (like those from PrepareChildExceptionTimeAnnotations) end up in another. Or at least that what it looks like.
Assignee | ||
Comment 20•7 years ago
|
||
So it doesn't appear that MinidumpCallback is ever invoked, at least in this test. (Verified by sticking a MOZ_CRASH() at the top of it and seeing if it crashed, which hopefully wouldn't get silently eaten). Is that expected?
I'm not sure I understand the full flow of when these callbacks are used, and when different ones are supposed to be called. Is it a bug that MinidumpCallback isn't called?
If not, the only other called of WriteGlobalMemoryStatus is PrepareChildExceptionTimeAnnotations, which writes it to the main annotation FD (the one this whole patchset is about :-)). If this is the call it is supposed to originate, I need to figure out how it's supposed to migrate to the .extra.
Are there any good docs to read to get more comfortable with this code?
Comment 21•7 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #20)
> So it doesn't appear that MinidumpCallback is ever invoked, at least in this
> test. (Verified by sticking a MOZ_CRASH() at the top of it and seeing if it
> crashed, which hopefully wouldn't get silently eaten). Is that expected?
Yes, MinidumpCallback is invoked only for main process crashes. This test is causing a content process crash so it shouldn't be called.
> If not, the only other called of WriteGlobalMemoryStatus is
> PrepareChildExceptionTimeAnnotations, which writes it to the main annotation
> FD (the one this whole patchset is about :-)). If this is the call it is
> supposed to originate, I need to figure out how it's supposed to migrate to
> the .extra.
Yes, that's the call that should do the annotations. When we hit a content process crash the main process is notified and it asks for a dump of the process that crashed. This should trigger OnChildProcessDumpRequested() callback:
https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/toolkit/crashreporter/nsExceptionHandler.cpp#3234
This in turn will attempt to write an .extra file for the associated minidump by calling WriteExtraForMinidump():
https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/toolkit/crashreporter/nsExceptionHandler.cpp#3269
This will retrieve the .extra file generated by the child process via GetExtraFileForChildPid():
https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/toolkit/crashreporter/nsExceptionHandler.cpp#3060
In your patch you've replaced this part by retrieving the file descriptor from the processToCrashFd which looks sane since it's the file that you've passed to the content process in the first place.
There's just a thing that might be relevant and which I hadn't noticed during review: you deregister the file descriptor and delete the file in GeckoProcessHost::~GeckoProcessHost(), but this destructor isn't called on the main thread, it's called on the I/O thread:
https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/ipc/glue/GeckoChildProcessHost.cpp#107
I wonder if we're racing here and deleting the content process' .extra file before we get the chance to retrieve the annotations. Note that if the file isn't present WriteExtraForMinidump() will return false, but OnChildProcessDumpRequested() won't do a thing, so the error might not be apparent.
> Are there any good docs to read to get more comfortable with this code?
Unfortunately not, we have a high-level overview of the whole process but it doesn't got into all the details and quirks of the implementation:
https://firefox-source-docs.mozilla.org/toolkit/crashreporter/crashreporter/index.html
Assignee | ||
Comment 22•7 years ago
|
||
Disabling the de-registration code doesn't cause this test to pass, so I don't think it's that (or at least, not _only_ that).
I'm hitting something that is completely baffling. If I run the test with --debugger="C:\path\to\windbg.exe" the test passes, even when I do nothing but run "g" (continue) when it starts. Running without a debugger reliably fails.
My goal was to put a DebugBreak() in WriteExtraForMinidump, however when running with the debugger that DebugBreak() doesn't fire, even though if I put an fprintf in there, it fires.
Are there issues running the crashreporter tests under a debugger?
Assignee | ||
Comment 23•7 years ago
|
||
With a bunch of fprintf debugging, what I've found is that it doesn't bail out of WriteExtraForMinidump anywhere, it makes it all the way through; the FD is present in the map, it writes it, etc. Next step is to add more prints until I know exactly what's getting written to the .extra
Assignee | ||
Comment 24•7 years ago
|
||
Ok, spent basically the entire day with printfs and a debugger:
- We definitely have the same HANDLE value in both the parent and the child.
- We write stuff to the child HANDLE, and then we close it.
- The parent's position on the HANDLE is at 0 as soon as we receive it -- this was surprising to me, but I think sharing the position between two processes might be a unix-ism.
- When we go to read from the HANDLE in the parent, we read 0 bytes from it.
Aaaaand, as I was writing this I came up with one additional idea and confirmed it: WriteFile in the child is returning FALSE, indicating it did not succeed.
I think this indicates that somehow I am not passing the HANDLE to the child correctly. I will resume investigating this tomorrow.
Comment 25•7 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #24)
> Ok, spent basically the entire day with printfs and a debugger:
>
> - We definitely have the same HANDLE value in both the parent and the child.
I wonder if this is the problem.
I know we can't use DuplicateHandle in the parent, because we don't know the target process (because it doesn't exist yet).
However, maybe we have to use DuplicateHandle in the child with the child process as the source and target process, to get a handle we can use.
Just grasping at straws really.
Comment 26•7 years ago
|
||
If we could get hang of someone more familiar with Windows internals it would help... However searching around the internet seems to point to the fact that one cannot share a file handle on Windows between processes without calling DuplicateHandle() on it first. Our IPC code seems to follow this guideline, see this:
https://searchfox.org/mozilla-central/rev/b7e3ec2468d42fa59d86c03ec7afeb209813f1d4/ipc/glue/FileDescriptor.cpp#119
Updated•7 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 27•7 years ago
|
||
*lights computer on fire*
HOKAY, SO! To share a HANDLE with a new process, you need to include it in PROC_THREAD_ATTRIBUTE_HANDLE_LIST, which is accomplished by the AddHandleToShare API.
AddHandleToShare is on the SandboxBroker.
We have two process spawning APIs, depending on whether the sandbox is enabled.
xpcshells disable the sandbox.
PROC_THREAD_ATTRIBUTE_HANDLE_LIST is therefore only set when the sandbox is running, meaning the HANDLE isn't available in the child process, causing these tests to fail.
It seems there's two ways to go here:
a) Use PROC_THREAD_ATTRIBUTE_HANDLE_LIST in the non-sandboxed spawn path
b) Consolidate the process spawning code paths.
Bob, I'm not sure what our plans are here, is (b) already on the roadmap?
Flags: needinfo?(bobowencode)
Comment 28•7 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #27)
...
> xpcshells disable the sandbox.
Isn't this kind of by accident?
> a) Use PROC_THREAD_ATTRIBUTE_HANDLE_LIST in the non-sandboxed spawn path
> b) Consolidate the process spawning code paths.
>
> Bob, I'm not sure what our plans are here, is (b) already on the roadmap?
Well roadmap might be a bit strong, it should be fairly straight forward once we can unconditionally compile the sandbox code.
I think the only thing blocking that is the MinGW build for tor on which tjr is actively working.
I guess for the purposes of our tests I could consolidate when MOZ_SANDBOX is defined.
Then we could cut out the other stuff once we always have the sandbox.
I have a WIP patch somewhere, let me see how far I got.
Flags: needinfo?(bobowencode)
Assignee | ||
Comment 29•7 years ago
|
||
Accident is also kind of a strong word :-) It's intentional in the sense that we intentionally wrote code to do it; there's a bunch of xpcshell tests that do things like spawn TCP servers, and we didn't want refactoring all of those to block rolling the sandbox out.
Comment 30•7 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #27)
> It seems there's two ways to go here:
>
> a) Use PROC_THREAD_ATTRIBUTE_HANDLE_LIST in the non-sandboxed spawn path
> b) Consolidate the process spawning code paths.
Upstream Chromium seems to do (a), for what it's worth: https://codesearch.chromium.org/chromium/src/base/process/launch_win.cc?l=231&rcl=329d49213d27cd7eaf30f52a7fbab6118a7c5c37
But also, so do we:
https://searchfox.org/mozilla-central/rev/e3cba77cee3f/ipc/chromium/src/base/process_util_win.cc#251
…kind of; it's currently only stdout and stderr:
https://searchfox.org/mozilla-central/rev/e3cba77cee3f/ipc/chromium/src/base/process_util_win.cc#363-366
However, it looks like it wouldn't be too hard to add something to LaunchOptions and wire it up in LaunchApp.
Comment 31•7 years ago
|
||
(In reply to Jed Davis [:jld] (⏰UTC-7) from comment #30)
...
> However, it looks like it wouldn't be too hard to add something to
> LaunchOptions and wire it up in LaunchApp.
This sounds like the simplest option then.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 35•7 years ago
|
||
Went with that option, and indeed |test_content_oom_annotation_windows.js| passes cleanly on my machine now. Kicked off a try run and will review where we're at after lunch :-)
Assignee | ||
Comment 36•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=59db81ed4739ce4167762a88f004b3badb3d39c1&group_state=expanded
Looks like the failing tests are:
- dom/plugins/test/mochitest/test_busy_hang.xul
- dom/plugins/test/mochitest/test_CrashService_hang.html
- browser/components/extensions/test/xpcshell/test_ext_browsingData.js (and several others from this directory)
I believe these correspond to the TODO comments I put in part 3. IIRC, you were looking at changes to these tests Gabriele, did you make any progress on those, or should I dive in?
Comment 37•7 years ago
|
||
I hadn't looked at those yet. Hangs are nasty because they use a slightly different codepath to generate the minidump. I'll have a look later today to see if this is something trivial or not.
Assignee | ||
Comment 38•7 years ago
|
||
I'm reasonably certain the comment in hang_test.js I added is wrong -- that code is running in the parent, so it's acting on the final location of the minidump, not where the content process wrote to. I'm digging in to figure out where the actual bug is.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 42•7 years ago
|
||
Ok, fixed the hang tests! https://treeherder.mozilla.org/#/jobs?repo=try&revision=afeab45e919a2aa33f0682e574b8483ead6be8fa&group_state=expanded
Down to just the failures which I believe are caused by the deletion issue, will dive into those tomorrow.
Assignee | ||
Comment 43•7 years ago
|
||
Pretty sure the solution here is to just use an anonymous pipe, instead of a disk file. Will implement today.
Assignee | ||
Comment 44•7 years ago
|
||
Gabriele, one of the limitations of a pipe is that there's a max write size before it starts to block -- obviously that'd be really bad in the crashreporter :-). Do you have any sense of how large the payloads we're typically writing for crash time annotations is?
Flags: needinfo?(gsvelto)
Comment 45•7 years ago
|
||
It depends on the annotations, most are small, usually a few tens of bytes. The TelemetryEnvironment annotation on the other hand is usually much bigger as it's a JSON object holding a significant amount of data (think kilobytes). Are you considering removing the content process .extra file altogether?
Flags: needinfo?(gsvelto)
Assignee | ||
Comment 46•7 years ago
|
||
The idea was to inherit a pipe HANDLE, instead of inheriting a disk file HANDLE. Is the TelemetryEnvironment annotation one of the ones that is sent as part of ExceptionTimeAnnotations?
Comment 47•7 years ago
|
||
I did some digging to be sure of what is annotated when. The only annotation we write to the content process .extra files are the ones in PrepareChildExceptionTimeAnnotations(). So OOMAllocationSize, MozCrashReason, ThreadIdNameMapping and additionally only on Windows those in WriteGlobalMemoryStatus(). Those are all small (tens of bytes) with the only exception of ThreadIdNameMapping which can be a few hundred KiBs. If you can push those over a pipe then we can get rid of the content process .extra file. This doesn't need to be perfect, we just have to check that if something gets truncated on the parent side it doesn't generate invalid annotations.
[1] https://searchfox.org/mozilla-central/rev/eeb7190f9ad6f1a846cd6df09986325b3f2c3117/toolkit/crashreporter/nsExceptionHandler.cpp#1264
Assignee | ||
Comment 48•7 years ago
|
||
Ooof. I misread your message as "few hundred bytes" and started porting to pipes, but you wrote hundreds of _KiB_. Pipe buffer sizes are max at like 65KiB, so that's not going to work. Going to dive into these tests and why they want to delete the temp dir.
Comment 49•7 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #48)
> Ooof. I misread your message as "few hundred bytes" and started porting to
> pipes, but you wrote hundreds of _KiB_.
No, that's a typo! I meant a few hundred bytes, the pipe idea is the best approach and it should work.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8932237 -
Attachment is obsolete: true
Assignee | ||
Comment 52•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f01da56df3e7bfe4e08bd6d207c9829532d4100&group_state=expanded
tests look green on Windows with a pipe! (oranges are unrelated) -- going to kick off a build with Linux and macOS now.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 55•7 years ago
|
||
Took a bit to get the tests green on Linux, but we're good! https://treeherder.mozilla.org/#/jobs?repo=try&revision=38877abc835b6dab5d47c999f903c3e2c1d45942&group_state=expanded
Going to re-r? since the patch has diverged significantly since the original reviews!
Assignee | ||
Updated•7 years ago
|
Attachment #8932235 -
Flags: review+ → review?(bobowencode)
Assignee | ||
Updated•7 years ago
|
Attachment #8932236 -
Flags: review+ → review?(gsvelto)
Comment 56•7 years ago
|
||
mozreview-review |
Comment on attachment 8932235 [details]
Bug 1407693 - Part 1 - Expose method for sharing a HANDLE to a child process in the sandboxing API;
https://reviewboard.mozilla.org/r/203266/#review224134
I'm not an IPC peer, but given jld's comment 30, I think that's fine.
Attachment #8932235 -
Flags: review?(bobowencode) → review+
Comment 57•7 years ago
|
||
mozreview-review |
Comment on attachment 8932236 [details]
Bug 1407693 - Part 2 - when a child process crashes, write extra annotation data to a pre-opened file descriptor instead of creating a new file;
https://reviewboard.mozilla.org/r/203268/#review224238
Looking good, thanks a lot for the time you spent fixing this, sorry it's so messy :(
::: toolkit/crashreporter/nsExceptionHandler.h:178
(Diff revision 7)
> // indicating which remote process crashed first.
> bool TakeMinidumpForChild(uint32_t childPid,
> nsIFile** dump,
> uint32_t* aSequence = nullptr);
>
> +// TODO: can we use base::ProcessId?
There was a reason why we didn't use base:: here though I can't remember which.
Attachment #8932236 -
Flags: review?(gsvelto) → review+
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 62•7 years ago
|
||
Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f18e1e557cf6
Part 1 - Expose method for sharing a HANDLE to a child process in the sandboxing API; r=bobowen
https://hg.mozilla.org/integration/autoland/rev/9c3346021c21
Part 2 - when a child process crashes, write extra annotation data to a pre-opened file descriptor instead of creating a new file; r=gsvelto
Keywords: checkin-needed
Comment 63•7 years ago
|
||
Backed out for windows mingw32 bustages at /builds/worker/workspace/build/src/ipc/glue/GeckoChildProcessHost.cpp:1032
Push that caused the failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=9c3346021c21eec994fc6b5dc6b68c00d05ea7f4
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=160900661&repo=autoland&lineNumber=13122
Backout: https://hg.mozilla.org/integration/autoland/rev/20ec5beec3715aff1ff658ea95a9806e03a2046e
Flags: needinfo?(agaynor)
Assignee | ||
Comment 64•7 years ago
|
||
We actually backed it out for breaking Android, not for breaking mingw. Nonetheless, working on it :-)
Flags: needinfo?(agaynor)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 67•7 years ago
|
||
Ok, updated and green on Android! Gabriele, are you the right person to review the Android changes?
Flags: needinfo?(gsvelto)
Comment 68•7 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #67)
> Ok, updated and green on Android! Gabriele, are you the right person to
> review the Android changes?
Not really, I did some crash-related hacking on Android but I'm not a peer there. You could ask :jchen or :snorp instead.
Flags: needinfo?(gsvelto)
Assignee | ||
Updated•7 years ago
|
Attachment #8932236 -
Flags: review?(snorp)
Attachment #8932236 -
Flags: review?(snorp) → review?(rbarker)
Seems ok to me, but rbarker wrote this code so I'll defer to him.
Comment 70•7 years ago
|
||
mozreview-review |
Comment on attachment 8932236 [details]
Bug 1407693 - Part 2 - when a child process crashes, write extra annotation data to a pre-opened file descriptor instead of creating a new file;
https://reviewboard.mozilla.org/r/203268/#review224960
Attachment #8932236 -
Flags: review?(rbarker) → review+
Comment 72•7 years ago
|
||
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/202bc739dda8
Part 1 - Expose method for sharing a HANDLE to a child process in the sandboxing API; r=bobowen
https://hg.mozilla.org/integration/autoland/rev/bcc0a91dd43f
Part 2 - when a child process crashes, write extra annotation data to a pre-opened file descriptor instead of creating a new file; r=gsvelto,rbarker
Keywords: checkin-needed
Comment 73•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/202bc739dda8
https://hg.mozilla.org/mozilla-central/rev/bcc0a91dd43f
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox60:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Updated•7 years ago
|
Updated•5 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•