Closed Bug 1253575 Opened 4 years ago Closed 2 years ago

crash in mozalloc_abort | NS_DebugBreak | mozilla::ipc::GeckoChildProcessHost::OpenPrivilegedHandle

Categories

(Core :: IPC, defect, P2, critical)

44 Branch
x86
Windows NT
defect

Tracking

()

RESOLVED FIXED
Tracking Status
e10s + ---

People

(Reporter: cyu, Assigned: cyu)

References

Details

(Keywords: crash, Whiteboard: btpp-active)

Crash Data

Attachments

(3 files, 3 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-23a96e02-f185-4f45-be67-deec82160304.
=============================================================

This is also seen on older releases. For some unknown reason, ::OpenProcess() fails after ::CreateProcess(). Looks like we shouldn't assert ::OpenProcess() always succeeds and abort the chrome process when it doesn't.
Dave, any thoughts here?
Flags: needinfo?(dvander)
Whiteboard: btpp-followup-2016-03-11
I've never seen this code, but...

Given the stack, this is E10S opening up a content process. Since E10S is not shipping to 100% of users in the near future we should probably propagate the error instead, and fallback to single-process behavior. I don't know whether that's feasible. Either way, we'd want either Telemetry/crash-report annotations for the GetLastError() result of OpenProcess() when it fails. (And if the content process is crashing super fast, maybe there's a report for the same session that can be found.)
Flags: needinfo?(dvander)
Also note that this happens not only to e10s but also to plugins like this one:
https://crash-stats.mozilla.com/report/index/d24bf20b-2aa5-4a4c-a84d-ea3042160307#allthreads
and is affecting firefox 40 to 44.

I tried to kill the process right before OpenPrivlegedHandle() but still didn't crash the browser (OpenProcess() still succeeded). Trying to exit() early in child process startup also fails to reproduce. We'll need GetLastError() annotated in the crash report.
Assignee: nobody → cyu
Attachment #8727784 - Flags: feedback?(wmccloskey)
This patch allows us to know why ::OpenProcess() on Windows fails and we may need to decide whether it's legitimate to handle the error(s) (but I guess we may eventually need to handle the error instead of just aborting the browser).

BTW, the tests I had in comment #3 makes the browser unstable. Failure to start a plugin process (like an early crash) hangs the browser because PluginModuleParent blocks the main thread waiting for the process to start. Failure to start a content process doesn't block the UI, but the browser is in an unusable state. The user has to restart the browser.
Comment on attachment 8727784 [details] [diff] [review]
Annotate the crash report on failure in open the process handle

Review of attachment 8727784 [details] [diff] [review]:
-----------------------------------------------------------------

I think it would be better for OpenPrivilegedProcessHandle to return an error code rather than use a separate function.

Otherwise this looks fine.
Attachment #8727784 - Flags: feedback?(wmccloskey) → feedback+
Attachment #8727784 - Attachment is obsolete: true
Attachment #8728305 - Flags: review?(wmccloskey)
Comment on attachment 8728305 [details] [diff] [review]
Annotate the crash report on failure in open the process handle

Review of attachment 8728305 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/chromium/src/base/process_util.h
@@ +116,5 @@
>  // You have to close returned handle using CloseProcessHandle. Returns true
>  // on success.
> +bool OpenPrivilegedProcessHandle(ProcessId pid,
> +                                 ProcessHandle* handle,
> +                                 int64_t* error_code);

Let's allow someone to pass nullptr for error_code and make that the default for this parameter. Then you don't need to update the crash reporting callsites.

@@ +227,5 @@
>  // a different manner on POSIX.
>  bool DidProcessCrash(bool* child_exited, ProcessHandle handle);
>  
> +// Get the last system-level error code with process operations.
> +int64_t GetLastErrorCode();

Please remove.

::: ipc/chromium/src/base/process_util_posix.cc
@@ +346,5 @@
>  
>    return cpu;
>  }
>  
> +int64_t GetLastErrorCode() {

Please remove.
Attachment #8728305 - Flags: review?(wmccloskey) → review+
tracking-e10s: --- → ?
https://hg.mozilla.org/integration/mozilla-inbound/rev/670268bd39c9898036fc1aaf17560d7d79c6b5ff
Bug 1253575 - Annotate the crash report on failure in opening the process handle for plugin or content process. r=billm
The version that is pushed to inbound.
Attachment #8728305 - Attachment is obsolete: true
Attachment #8728834 - Flags: review+
Keywords: leave-open
impacts e10s and oopp, doesn't happen very often. tracking e10s but doesn't block. if this starts to show up more it'll get dealt with through top crash triage.
Priority: -- → P3
Was this a top 10 crasher from the e10s beta experiment?
Whiteboard: btpp-followup-2016-03-11 → btpp-active
Priority: P3 → P2
https://crash-stats.mozilla.com/report/index/b18920e3-6a80-4910-a8ec-fcec72160317 shows that GetLastError() returns 5, which is ACCESS_DENIED. I still don't know how such access can be denied by the system, but it looks like some condition that should be handled.
After taking a closer look, I think it may be safe to remove PROCESS_VM_READ privilege when opening a plugin/content/GMP process. We don't use the process handle in GeckoChildProcessHost for calling ::ReadProcessMemory().

Jim, do you have any concern if we take PROCESS_VM_READ out and only open non-privileged handle to the child process?
Flags: needinfo?(jmathies)
Depends on: 1258663
Please note that the implementation of GetSystemError() is in the patch for bug 1258633.
Attachment #8734695 - Flags: review?(gkrizsanits)
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #16)
> After taking a closer look, I think it may be safe to remove PROCESS_VM_READ
> privilege when opening a plugin/content/GMP process. We don't use the
> process handle in GeckoChildProcessHost for calling ::ReadProcessMemory().
> 
> Jim, do you have any concern if we take PROCESS_VM_READ out and only open
> non-privileged handle to the child process?

AFAICT I agree we don't use that handle for much. Do you think this will help solve the access denied errors?
Flags: needinfo?(jmathies)
(In reply to Jim Mathies [:jimm] from comment #18)
> AFAICT I agree we don't use that handle for much. Do you think this will
> help solve the access denied errors?

It may help, but I am not sure if it can really eliminate this error. It's never reproduced locally, and we've seen only one annotated error on nightly. There are more crashes on release and beta channels, and we still don't know if ::OpenProcess() failed for denial of access. Anyway, the ultimate fix would be handling failure and retrying the call, or notifying the user about the problem.
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #0)
> report bp-23a96e02-f185-4f45-be67-deec82160304.

Not that there is a Samsung DRM dll loaded in the process...

(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #15)
> https://crash-stats.mozilla.com/report/index/b18920e3-6a80-4910-a8ec-
> fcec72160317 shows that GetLastError() returns 5, which is ACCESS_DENIED. I
> still don't know how such access can be denied by the system, but it looks
> like some condition that should be handled.

And here we have a privman64.dll which is some 3rd party security magic:
http://www.beyondtrust.com/Solutions/ManagingWindowsDesktops/

I've looked into this problem and it's quite surprising that we start up a child process with CreateProcess, which should then inherit the parent processes privileges and despite that we cannot open it's handle with ACCESS_DENIED.

I can imagine two cases where this happens after some thorough research. If the parent process is a Protected Process (used from DRM in Vista) or maybe a system protected process (Windows 8.1 and up) and these processes do not allow their handles to be duplicated. So opening the handle with PROCESS_DUP_HANDLE would give this error. Problem is that I cannot imaging Firefox running in this mode...

The other option is that another application is injecting a dll into the parent process and hooks into the OpenProcess function hijacking it and opening the child process with different privileges than the parent (either for DRM or for some security reasons or for whatever stupid reason). This can be the case in the two crashes above but I've seen crash with this signature and without any suspicious dll loaded...

Now if we want to know what's going on I think additionally to GetLastError we should get more information about the current processes security settings and maybe also retry to reopen the handle with PROCESS_QUERY_LIMITED_INFORMATION and get as much info about the child as we can.

(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #16)
> After taking a closer look, I think it may be safe to remove PROCESS_VM_READ
> privilege when opening a plugin/content/GMP process. We don't use the
> process handle in GeckoChildProcessHost for calling ::ReadProcessMemory().
> 
> Jim, do you have any concern if we take PROCESS_VM_READ out and only open
> non-privileged handle to the child process?

I don't think it would help a lot, PROCESS_DUP_HANDLE will fail sooner anyway.
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #20)
> (In reply to Cervantes Yu [:cyu] [:cervantes] from comment #0)
> > report bp-23a96e02-f185-4f45-be67-deec82160304.
> 
> Not that there is a Samsung DRM dll loaded in the process...
> 
> And here we have a privman64.dll which is some 3rd party security magic:
> http://www.beyondtrust.com/Solutions/ManagingWindowsDesktops/
> 
There are also Kingsoft Web Shield, Symantec intrusion detection, Taobao Protect, just to name a few.

> Now if we want to know what's going on I think additionally to GetLastError
> we should get more information about the current processes security settings
> and maybe also retry to reopen the handle with
> PROCESS_QUERY_LIMITED_INFORMATION and get as much info about the child as we
> can.
>

What info are we expecting to see, like security descriptor? With PROCESS_QUERY_LIMITED_INFORMATION, there are GetExitCodeProcess, GetPriorityClass, IsProcessInJob, QueryFullProcessImageName, but they don't seem to help in getting the information we want.

Actually we already have a process handle from base::LaunchApp(). I wonder if it's necessary that we open another privileged handle given we already have one.
Comment on attachment 8734695 [details] [diff] [review]
Use nonprivileged process handle in GeckoChildProcessHost

Review of attachment 8734695 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not sure about this patch, it looks to me that for the crash reporter we use minidumps so there this could cause trouble: http://hg.mozilla.org/mozilla-central/annotate/d5d53a3b4e50/toolkit/crashreporter/google-breakpad/src/client/windows/crash_generation/minidump_generator.cc#l365

Are you absolutely sure that this is not the case?

Also, is there a reason why you use "LastError" for annotation in the previous patch but not in this one and the one in bug 1258663?
Attachment #8734695 - Flags: review?(gkrizsanits)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #22)
> Comment on attachment 8734695 [details] [diff] [review]
> Use nonprivileged process handle in GeckoChildProcessHost
> 
> Review of attachment 8734695 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm not sure about this patch, it looks to me that for the crash reporter we
> use minidumps so there this could cause trouble:
> http://hg.mozilla.org/mozilla-central/annotate/d5d53a3b4e50/toolkit/
> crashreporter/google-breakpad/src/client/windows/crash_generation/
> minidump_generator.cc#l365
> 
> Are you absolutely sure that this is not the case?

I think this handle is from https://dxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/client/windows/crash_generation/client_info.cc#62 not from GeckoChildProcessHost. But I agree that removing PROCESS_VM_READ may not help since we've seen that in many crashes 3rd party dlls are involved. If these dlls are responsible for the crashes, using a non-privileged handle may not get us much farther and the browser could still crash soon.

> 
> Also, is there a reason why you use "LastError" for annotation in the
> previous patch but not in this one and the one in bug 1258663?

I think SystemError is clearer than LastError in that it indicates that this is a system-level error, not application level one like an nsresult. Since we'll need to annotate IPC errors with system calls, I'd like to use a clear an neutral name for it.
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #21)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #20)
> > Now if we want to know what's going on I think additionally to GetLastError
> > we should get more information about the current processes security settings
> > and maybe also retry to reopen the handle with
> > PROCESS_QUERY_LIMITED_INFORMATION and get as much info about the child as we
> > can.
> >
> 
> What info are we expecting to see, like security descriptor? With
> PROCESS_QUERY_LIMITED_INFORMATION, there are GetExitCodeProcess,
> GetPriorityClass, IsProcessInJob, QueryFullProcessImageName, but they don't
> seem to help in getting the information we want.
> 

Yeah that's really not much info, I would prefer focusing on the current, opening process instead
and get all the info about it using something like GetSecurityInfo. In particular I'm curios if this crash happens only when running ff in special, probably restricted mode or not. Unless we already have that info somewhere available...

> Actually we already have a process handle from base::LaunchApp(). I wonder
> if it's necessary that we open another privileged handle given we already
> have one.

That's a very good point. It makes zero sense to me to try to open that handle, it was invented in bug 774131, but CreateProcess should return the process handle with full available access right... That being said we probably will get in the same trouble with that handle later for the same reason why the open fails. But still it really makes no sense to me to open another handle. Especially since I don't see any code closing the handle base::LaunchApp returns... I'm kind of leaning toward not opening a new handle tbh, do you want to give it a try?

> > 
> > Also, is there a reason why you use "LastError" for annotation in the
> > previous patch but not in this one and the one in bug 1258663?
> 
> I think SystemError is clearer than LastError in that it indicates that this
> is a system-level error, not application level one like an nsresult. Since
> we'll need to annotate IPC errors with system calls, I'd like to use a clear
> an neutral name for it.

I think SystemError is the right label, I just was wondering if you have any reason
to keep LastError for http://hg.mozilla.org/mozilla-central/annotate/494289c72ba3/ipc/glue/GeckoChildProcessHost.cpp#l1118 or can we change that one too to SystemError?
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #24)
> process handle with full available access right... That being said we
> probably will get in the same trouble with that handle later for the same
> reason why the open fails.

Except these dll's make the open call fail but let create process return a valid opened handle, in which case this would totally fix the crashes. So yeah I think we should try to use that original handle instead of trying to open a new one.
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #24)
> That's a very good point. It makes zero sense to me to try to open that
> handle, it was invented in bug 774131, but CreateProcess should return the
> process handle with full available access right... That being said we
> probably will get in the same trouble with that handle later for the same
> reason why the open fails. But still it really makes no sense to me to open
> another handle. Especially since I don't see any code closing the handle
> base::LaunchApp returns... I'm kind of leaning toward not opening a new
> handle tbh, do you want to give it a try?
> 
The handle from base::LaunchApp is set to the base class and will be closed in class ProcessWatcher. And the privileged handle in GeckoChildProcessHost will be closed in its GeckoChildProcessHost::SetAlreadyDead(). The use of 2 process handles could lead to inconsistent views of the child process between the base and derived class. We can consider removing GeckoChildProcessHost::mChildProcessHandle and only use the one from the base class. But care must be taken given the interplay between the base class from chromium and our glue code. I'll give it a try.

> 
> I think SystemError is the right label, I just was wondering if you have any
> reason
> to keep LastError for
> http://hg.mozilla.org/mozilla-central/annotate/494289c72ba3/ipc/glue/
> GeckoChildProcessHost.cpp#l1118 or can we change that one too to SystemError?

We can change that to SystemError for consistency.
This removes the redundant handle, GeckoChildProcessHost::mChildProcessHandle, and cleans up the class by hiding its base class and the process handle.

Jet, do you have any concern with this patch, especially SetAlreadyDead() that handles process termination?
Attachment #8734695 - Attachment is obsolete: true
Attachment #8738969 - Flags: feedback?(jld)
Attachment #8738969 - Flags: feedback?(gkrizsanits)
Comment on attachment 8738969 [details] [diff] [review]
Remove the privileged handle in GeckoChildProcessHost

The change to SetAlreadyDead looks fine.  I don't really understand why the second process handle exists, so I can't say much about the rest of the patch.
Attachment #8738969 - Flags: feedback?(jld) → feedback+
Comment on attachment 8738969 [details] [diff] [review]
Remove the privileged handle in GeckoChildProcessHost

Review of attachment 8738969 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great to me.
Attachment #8738969 - Flags: feedback?(gkrizsanits) → feedback+
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #33)
> Created attachment 8764877 [details]
> Bug 1253575 - Fallback to DuplicateHandle() when
> base::OpenPrivilegedProcessHandle() fails when starting a child process.
> 
This is based on the observation that we never saw intra-process DuplicateHandle() fail in the chrome process. This should bail out when it fails in OpenProcess() for any reason.
Crash Signature: [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::GeckoChildProcessHost::OpenPrivilegedHandle] → [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::GeckoChildProcessHost::OpenPrivilegedHandle] [@ Abort | can't open handle to child process | mozalloc_abort | NS_DebugBreak | mozilla::ipc::GeckoChildProcessHost::OpenPrivilegedHandle]
Comment on attachment 8764877 [details]
Bug 1253575 - Fallback to DuplicateHandle() when base::OpenPrivilegedProcessHandle() fails when starting a child process.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/60512/diff/1-2/
Attachment #8764877 - Flags: review?(gkrizsanits)
Comment on attachment 8764877 [details]
Bug 1253575 - Fallback to DuplicateHandle() when base::OpenPrivilegedProcessHandle() fails when starting a child process.

https://reviewboard.mozilla.org/r/60512/#review65896

Sorry for the lag, I just had to reload this code back into my memory :) This looks great to me, thanks!

::: ipc/glue/GeckoChildProcessHost.cpp:1155
(Diff revision 2)
> -  {
> +#ifdef XP_WIN
> +      // If we failed in opening the process handle, try harder by duplicating
> +      // one.
> +      && !::DuplicateHandle(::GetCurrentProcess(), process,
> +                            ::GetCurrentProcess(), &mChildProcessHandle,
> +                            PROCESS_DUP_HANDLE | PROCESS_TERMINATE |
> +                            PROCESS_QUERY_INFORMATION | PROCESS_VM_READ |
> +                            SYNCHRONIZE,
> +                            FALSE, 0)
> +#endif

Why do we want to restrict this part for XP?
Attachment #8764877 - Flags: review?(gkrizsanits) → review+
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #36)
> Why do we want to restrict this part for XP?
Thanks for the review. Build flag XP_WIN applies to all Windows versions. :)
Pushed by cyu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b214bedbeab6
Fallback to DuplicateHandle() when base::OpenPrivilegedProcessHandle() fails when starting a child process. r=krizsa
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.