Don't attempt to create a file on crash in content process

RESOLVED FIXED in Firefox 60

Status

()

enhancement
RESOLVED FIXED
2 years ago
6 months ago

People

(Reporter: Alex_Gaynor, Assigned: Alex_Gaynor)

Tracking

(Depends on 1 bug)

Trunk
mozilla60
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox58 wontfix, firefox59 wontfix, firefox60 fixed)

Details

Attachments

(2 attachments, 1 obsolete attachment)

Assignee

Description

2 years ago
Right now on a crash, our callback to the breakpad ExceptionHandler creates a new file in the content temp directory: https://dxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#1351-1390

We'd like to drop write permissions from this directory, and I believe this is the last use of that!

A fairly straight forward resolution of this would be to open the file in |SetRemoteExceptionHandler|, and pass it as the callback context.

I'll work up a patch which does this shortly.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 4

2 years ago
mozreview-review
Comment on attachment 8932235 [details]
Bug 1407693 - Part 1 - Expose method for sharing a HANDLE to a child process in the sandboxing API;

https://reviewboard.mozilla.org/r/203266/#review208836

::: security/sandbox/win/src/sandboxbroker/sandboxBroker.h:60
(Diff revision 1)
> -  // Exposes AddTargetPeer from broker services, so that none sandboxed
> +  // Exposes AddTargetPeer from broker services, so that non-sandboxed
>    // processes can be added as handle duplication targets.
>    bool AddTargetPeer(HANDLE aPeerProcess);
>  
> +  void AddHandleToShare(HANDLE aHandle);

Oh dear, thanks for fixing.
While we're changing it, can we use JavaDoc-style comments, which I didn't realise were in the style guide at the time.

Also, can you add one to AddHandleToShare to explain that this handle will be inherited by the child.
Attachment #8932235 - Flags: review?(bobowencode) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 8

2 years ago
mozreview-review
Comment on attachment 8932236 [details]
Bug 1407693 - Part 2 - when a child process crashes, write extra annotation data to a pre-opened file descriptor instead of creating a new file;

https://reviewboard.mozilla.org/r/203268/#review210132

This looks alright with a nit addressed: you need to add stubs for the functions you've added to nsExceptionHandler.h in nsDummyExceptionHandler.cpp so that things will build even with crash reporting disabled. This is required because I've landed bug 1402519 in the meantime.

::: toolkit/crashreporter/nsExceptionHandler.h:178
(Diff revision 2)
>  // indicating which remote process crashed first.
>  bool TakeMinidumpForChild(uint32_t childPid,
>                            nsIFile** dump,
>                            uint32_t* aSequence = nullptr);
>  
> +// TODO: can we use base::ProcessId?

This is rather ugly but I assume there must have been a reason why we've avoided including chromium headers here...
Attachment #8932236 - Flags: review?(gsvelto) → review+
I'm looking into the tests now. I've been bothered about the way they get .extra/.dmp files for a while so this might be the right time to fix that.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
While trying to address the issues with the broken tests I've noticed that there's another problem with your patch that I hadn't spotted during review. If you try to run the toolkit/crashreporter/test/unit_ipc/test_content_oom_annotation_windows.js xpcshell test, you'll notice that the annotations that are added in the exception handler callback do not end up in the .extra file. I'm not sure if the issue is because the test isn't getting the correct .extra file or if the annotations are not written correctly. Thing is, the test does find an .extra file, but it's only partially populated.
Assignee

Comment 14

a year ago
Mmm, when I run this test I'm seeing a ton of unrelated error tracebacks, including "Failed to load module resource:///modules/AttributionCode.jsm" and a Python KeyError in mozlog.

Do these match what you're seeing? -- I suspect there's some underlying error in this output, but it's a bit hard to find :-)
Assignee

Comment 15

a year ago
Ok, disregard that, I managed to muddle through by ignoring the errors and debugging manually from your description :-) I'm debugging now.
Yeah, those errors are sadly all expected... The AttributionCode.jsm one is particularly annoying and was added recently IIRC.
Assignee

Comment 17

a year ago
Gabriele, should we expect this data to be coming from the WriteGlobaMemoryStatus call in PrepareChildExceptionTimeAnnotations or the call in MinidumpCallback?
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #17)
> Gabriele, should we expect this data to be coming from the
> WriteGlobaMemoryStatus call in PrepareChildExceptionTimeAnnotations or the
> call in MinidumpCallback?

Yes, and from my testing it seems to be called correctly. I haven't had time to check the handle that gets passed to the callback though. Either way it seems that some annotation end up in an .extra file, and some others (like those from PrepareChildExceptionTimeAnnotations) end up in another. Or at least that what it looks like.
Assignee

Comment 20

a year ago
So it doesn't appear that MinidumpCallback is ever invoked, at least in this test. (Verified by sticking a MOZ_CRASH() at the top of it and seeing if it crashed, which hopefully wouldn't get silently eaten). Is that expected?

I'm not sure I understand the full flow of when these callbacks are used, and when different ones are supposed to be called. Is it a bug that MinidumpCallback isn't called?

If not, the only other called of WriteGlobalMemoryStatus is PrepareChildExceptionTimeAnnotations, which writes it to the main annotation FD (the one this whole patchset is about :-)). If this is the call it is supposed to originate, I need to figure out how it's supposed to migrate to the .extra.

Are there any good docs to read to get more comfortable with this code?
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #20)
> So it doesn't appear that MinidumpCallback is ever invoked, at least in this
> test. (Verified by sticking a MOZ_CRASH() at the top of it and seeing if it
> crashed, which hopefully wouldn't get silently eaten). Is that expected?

Yes, MinidumpCallback is invoked only for main process crashes. This test is causing a content process crash so it shouldn't be called.

> If not, the only other called of WriteGlobalMemoryStatus is
> PrepareChildExceptionTimeAnnotations, which writes it to the main annotation
> FD (the one this whole patchset is about :-)). If this is the call it is
> supposed to originate, I need to figure out how it's supposed to migrate to
> the .extra.

Yes, that's the call that should do the annotations. When we hit a content process crash the main process is notified and it asks for a dump of the process that crashed. This should trigger OnChildProcessDumpRequested() callback:

https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/toolkit/crashreporter/nsExceptionHandler.cpp#3234

This in turn will attempt to write an .extra file for the associated minidump by calling WriteExtraForMinidump():

https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/toolkit/crashreporter/nsExceptionHandler.cpp#3269

This will retrieve the .extra file generated by the child process via GetExtraFileForChildPid():

https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/toolkit/crashreporter/nsExceptionHandler.cpp#3060

In your patch you've replaced this part by retrieving the file descriptor from the processToCrashFd which looks sane since it's the file that you've passed to the content process in the first place.

There's just a thing that might be relevant and which I hadn't noticed during review: you deregister the file descriptor and delete the file in GeckoProcessHost::~GeckoProcessHost(), but this destructor isn't called on the main thread, it's called on the I/O thread:

https://searchfox.org/mozilla-central/rev/1f9b4f0f0653b129b4db1b9fc5e9068054afaac0/ipc/glue/GeckoChildProcessHost.cpp#107

I wonder if we're racing here and deleting the content process' .extra file before we get the chance to retrieve the annotations. Note that if the file isn't present WriteExtraForMinidump() will return false, but OnChildProcessDumpRequested() won't do a thing, so the error might not be apparent.

> Are there any good docs to read to get more comfortable with this code?

Unfortunately not, we have a high-level overview of the whole process but it doesn't got into all the details and quirks of the implementation:

https://firefox-source-docs.mozilla.org/toolkit/crashreporter/crashreporter/index.html
Assignee

Comment 22

a year ago
Disabling the de-registration code doesn't cause this test to pass, so I don't think it's that (or at least, not _only_ that).

I'm hitting something that is completely baffling. If I run the test with --debugger="C:\path\to\windbg.exe" the test passes, even when I do nothing but run "g" (continue) when it starts. Running without a debugger reliably fails.

My goal was to put a DebugBreak() in WriteExtraForMinidump, however when running with the debugger that DebugBreak() doesn't fire, even though if I put an fprintf in there, it fires.

Are there issues running the crashreporter tests under a debugger?
Assignee

Comment 23

a year ago
With a bunch of fprintf debugging, what I've found is that it doesn't bail out of WriteExtraForMinidump anywhere, it makes it all the way through; the FD is present in the map, it writes it, etc. Next step is to add more prints until I know exactly what's getting written to the .extra
Assignee

Comment 24

a year ago
Ok, spent basically the entire day with printfs and a debugger:

- We definitely have the same HANDLE value in both the parent and the child.
- We write stuff to the child HANDLE, and then we close it.
- The parent's position on the HANDLE is at 0 as soon as we receive it -- this was surprising to me, but I think sharing the position between two processes might be a unix-ism.
- When we go to read from the HANDLE in the parent, we read 0 bytes from it.

Aaaaand, as I was writing this I came up with one additional idea and confirmed it: WriteFile in the child is returning FALSE, indicating it did not succeed.

I think this indicates that somehow I am not passing the HANDLE to the child correctly. I will resume investigating this tomorrow.
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #24)
> Ok, spent basically the entire day with printfs and a debugger:
> 
> - We definitely have the same HANDLE value in both the parent and the child.

I wonder if this is the problem.

I know we can't use DuplicateHandle in the parent, because we don't know the target process (because it doesn't exist yet).
However, maybe we have to use DuplicateHandle in the child with the child process as the source and target process, to get a handle we can use.
Just grasping at straws really.
If we could get hang of someone more familiar with Windows internals it would help... However searching around the internet seems to point to the fact that one cannot share a file handle on Windows between processes without calling DuplicateHandle() on it first. Our IPC code seems to follow this guideline, see this:

https://searchfox.org/mozilla-central/rev/b7e3ec2468d42fa59d86c03ec7afeb209813f1d4/ipc/glue/FileDescriptor.cpp#119
Status: NEW → ASSIGNED
Assignee

Comment 27

a year ago
*lights computer on fire*

HOKAY, SO! To share a HANDLE with a new process, you need to include it in PROC_THREAD_ATTRIBUTE_HANDLE_LIST, which is accomplished by the AddHandleToShare API.

AddHandleToShare is on the SandboxBroker.

We have two process spawning APIs, depending on whether the sandbox is enabled.

xpcshells disable the sandbox.

PROC_THREAD_ATTRIBUTE_HANDLE_LIST is therefore only set when the sandbox is running, meaning the HANDLE isn't available in the child process, causing these tests to fail.

It seems there's two ways to go here:

a) Use PROC_THREAD_ATTRIBUTE_HANDLE_LIST in the non-sandboxed spawn path
b) Consolidate the process spawning code paths.

Bob, I'm not sure what our plans are here, is (b) already on the roadmap?
Flags: needinfo?(bobowencode)
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #27)
...
> xpcshells disable the sandbox.

Isn't this kind of by accident?
 
> a) Use PROC_THREAD_ATTRIBUTE_HANDLE_LIST in the non-sandboxed spawn path
> b) Consolidate the process spawning code paths.
> 
> Bob, I'm not sure what our plans are here, is (b) already on the roadmap?

Well roadmap might be a bit strong, it should be fairly straight forward once we can unconditionally compile the sandbox code.
I think the only thing blocking that is the MinGW build for tor on which tjr is actively working.

I guess for the purposes of our tests I could consolidate when MOZ_SANDBOX is defined.
Then we could cut out the other stuff once we always have the sandbox.

I have a WIP patch somewhere, let me see how far I got.
Flags: needinfo?(bobowencode)
Assignee

Comment 29

a year ago
Accident is also kind of a strong word :-) It's intentional in the sense that we intentionally wrote code to do it; there's a bunch of xpcshell tests that do things like spawn TCP servers, and we didn't want refactoring all of those to block rolling the sandbox out.
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #27)
> It seems there's two ways to go here:
> 
> a) Use PROC_THREAD_ATTRIBUTE_HANDLE_LIST in the non-sandboxed spawn path
> b) Consolidate the process spawning code paths.

Upstream Chromium seems to do (a), for what it's worth: https://codesearch.chromium.org/chromium/src/base/process/launch_win.cc?l=231&rcl=329d49213d27cd7eaf30f52a7fbab6118a7c5c37

But also, so do we:
https://searchfox.org/mozilla-central/rev/e3cba77cee3f/ipc/chromium/src/base/process_util_win.cc#251

…kind of; it's currently only stdout and stderr:
https://searchfox.org/mozilla-central/rev/e3cba77cee3f/ipc/chromium/src/base/process_util_win.cc#363-366

However, it looks like it wouldn't be too hard to add something to LaunchOptions and wire it up in LaunchApp.
(In reply to Jed Davis [:jld] (⏰UTC-7) from comment #30)
...
> However, it looks like it wouldn't be too hard to add something to
> LaunchOptions and wire it up in LaunchApp.

This sounds like the simplest option then.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Comment 35

a year ago
Went with that option, and indeed |test_content_oom_annotation_windows.js| passes cleanly on my machine now. Kicked off a try run and will review where we're at after lunch :-)
Assignee

Comment 36

a year ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=59db81ed4739ce4167762a88f004b3badb3d39c1&group_state=expanded

Looks like the failing tests are:

- dom/plugins/test/mochitest/test_busy_hang.xul
- dom/plugins/test/mochitest/test_CrashService_hang.html
- browser/components/extensions/test/xpcshell/test_ext_browsingData.js  (and several others from this directory)

I believe these correspond to the TODO comments I put in part 3. IIRC, you were looking at changes to these tests Gabriele, did you make any progress on those, or should I dive in?
I hadn't looked at those yet. Hangs are nasty because they use a slightly different codepath to generate the minidump. I'll have a look later today to see if this is something trivial or not.
Assignee

Comment 38

a year ago
I'm reasonably certain the comment in hang_test.js I added is wrong -- that code is running in the parent, so it's acting on the final location of the minidump, not where the content process wrote to. I'm digging in to figure out where the actual bug is.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Comment 42

a year ago
Ok, fixed the hang tests! https://treeherder.mozilla.org/#/jobs?repo=try&revision=afeab45e919a2aa33f0682e574b8483ead6be8fa&group_state=expanded

Down to just the failures which I believe are caused by the deletion issue, will dive into those tomorrow.
Assignee

Comment 43

a year ago
Pretty sure the solution here is to just use an anonymous pipe, instead of a disk file. Will implement today.
Assignee

Comment 44

a year ago
Gabriele, one of the limitations of a pipe is that there's a max write size before it starts to block -- obviously that'd be really bad in the crashreporter :-). Do you have any sense of how large the payloads we're typically writing for crash time annotations is?
Flags: needinfo?(gsvelto)
It depends on the annotations, most are small, usually a few tens of bytes. The TelemetryEnvironment annotation on the other hand is usually much bigger as it's a JSON object holding a significant amount of data (think kilobytes). Are you considering removing the content process .extra file altogether?
Flags: needinfo?(gsvelto)
Assignee

Comment 46

a year ago
The idea was to inherit a pipe HANDLE, instead of inheriting a disk file HANDLE. Is the TelemetryEnvironment annotation one of the ones that is sent as part of ExceptionTimeAnnotations?
I did some digging to be sure of what is annotated when. The only annotation we write to the content process .extra files are the ones in PrepareChildExceptionTimeAnnotations(). So OOMAllocationSize, MozCrashReason, ThreadIdNameMapping and additionally only on Windows those in WriteGlobalMemoryStatus(). Those are all small (tens of bytes) with the only exception of ThreadIdNameMapping which can be a few hundred KiBs. If you can push those over a pipe then we can get rid of the content process .extra file. This doesn't need to be perfect, we just have to check that if something gets truncated on the parent side it doesn't generate invalid annotations.

[1] https://searchfox.org/mozilla-central/rev/eeb7190f9ad6f1a846cd6df09986325b3f2c3117/toolkit/crashreporter/nsExceptionHandler.cpp#1264
Assignee

Comment 48

a year ago
Ooof. I misread your message as "few hundred bytes" and started porting to pipes, but you wrote hundreds of _KiB_. Pipe buffer sizes are max at like 65KiB, so that's not going to work. Going to dive into these tests and why they want to delete the temp dir.
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #48)
> Ooof. I misread your message as "few hundred bytes" and started porting to
> pipes, but you wrote hundreds of _KiB_.

No, that's a typo! I meant a few hundred bytes, the pipe idea is the best approach and it should work.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Updated

a year ago
Attachment #8932237 - Attachment is obsolete: true
Assignee

Comment 52

a year ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f01da56df3e7bfe4e08bd6d207c9829532d4100&group_state=expanded

tests look green on Windows with a pipe! (oranges are unrelated) -- going to kick off a build with Linux and macOS now.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Comment 55

a year ago
Took a bit to get the tests green on Linux, but we're good! https://treeherder.mozilla.org/#/jobs?repo=try&revision=38877abc835b6dab5d47c999f903c3e2c1d45942&group_state=expanded

Going to re-r? since the patch has diverged significantly since the original reviews!
Assignee

Updated

a year ago
Attachment #8932235 - Flags: review+ → review?(bobowencode)
Assignee

Updated

a year ago
Attachment #8932236 - Flags: review+ → review?(gsvelto)

Comment 56

a year ago
mozreview-review
Comment on attachment 8932235 [details]
Bug 1407693 - Part 1 - Expose method for sharing a HANDLE to a child process in the sandboxing API;

https://reviewboard.mozilla.org/r/203266/#review224134

I'm not an IPC peer, but given jld's comment 30, I think that's fine.
Attachment #8932235 - Flags: review?(bobowencode) → review+

Comment 57

a year ago
mozreview-review
Comment on attachment 8932236 [details]
Bug 1407693 - Part 2 - when a child process crashes, write extra annotation data to a pre-opened file descriptor instead of creating a new file;

https://reviewboard.mozilla.org/r/203268/#review224238

Looking good, thanks a lot for the time you spent fixing this, sorry it's so messy :(

::: toolkit/crashreporter/nsExceptionHandler.h:178
(Diff revision 7)
>  // indicating which remote process crashed first.
>  bool TakeMinidumpForChild(uint32_t childPid,
>                            nsIFile** dump,
>                            uint32_t* aSequence = nullptr);
>  
> +// TODO: can we use base::ProcessId?

There was a reason why we didn't use base:: here though I can't remember which.
Attachment #8932236 - Flags: review?(gsvelto) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Comment 61

a year ago
Thanks!
Keywords: checkin-needed

Comment 62

a year ago
Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f18e1e557cf6
Part 1 - Expose method for sharing a HANDLE to a child process in the sandboxing API; r=bobowen
https://hg.mozilla.org/integration/autoland/rev/9c3346021c21
Part 2 - when a child process crashes, write extra annotation data to a pre-opened file descriptor instead of creating a new file; r=gsvelto
Keywords: checkin-needed
Backed out for windows mingw32 bustages at /builds/worker/workspace/build/src/ipc/glue/GeckoChildProcessHost.cpp:1032

Push that caused the failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=9c3346021c21eec994fc6b5dc6b68c00d05ea7f4

Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=160900661&repo=autoland&lineNumber=13122

Backout: https://hg.mozilla.org/integration/autoland/rev/20ec5beec3715aff1ff658ea95a9806e03a2046e
Flags: needinfo?(agaynor)
Assignee

Comment 64

a year ago
We actually backed it out for breaking Android, not for breaking mingw. Nonetheless, working on it :-)
Flags: needinfo?(agaynor)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Assignee

Comment 67

a year ago
Ok, updated and green on Android! Gabriele, are you the right person to review the Android changes?
Flags: needinfo?(gsvelto)
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #67)
> Ok, updated and green on Android! Gabriele, are you the right person to
> review the Android changes?

Not really, I did some crash-related hacking on Android but I'm not a peer there. You could ask :jchen or :snorp instead.
Flags: needinfo?(gsvelto)
Assignee

Updated

a year ago
Attachment #8932236 - Flags: review?(snorp)
Attachment #8932236 - Flags: review?(snorp) → review?(rbarker)
Seems ok to me, but rbarker wrote this code so I'll defer to him.

Comment 70

a year ago
mozreview-review
Comment on attachment 8932236 [details]
Bug 1407693 - Part 2 - when a child process crashes, write extra annotation data to a pre-opened file descriptor instead of creating a new file;

https://reviewboard.mozilla.org/r/203268/#review224960
Attachment #8932236 - Flags: review?(rbarker) → review+
Assignee

Comment 71

a year ago
Thanks for all the reviews!
Keywords: checkin-needed

Comment 72

a year ago
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/202bc739dda8
Part 1 - Expose method for sharing a HANDLE to a child process in the sandboxing API; r=bobowen
https://hg.mozilla.org/integration/autoland/rev/bcc0a91dd43f
Part 2 - when a child process crashes, write extra annotation data to a pre-opened file descriptor instead of creating a new file; r=gsvelto,rbarker
Keywords: checkin-needed

Comment 73

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/202bc739dda8
https://hg.mozilla.org/mozilla-central/rev/bcc0a91dd43f
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Depends on: 1438106
Depends on: 1438406
No longer depends on: 1438406
Depends on: 1438209
Depends on: 1438219
You need to log in before you can comment on or make changes to this bug.