Closed Bug 920397 Opened 11 years ago Closed 11 years ago

PContentParent may close a process handle it doesn't own

Categories

(Core :: IPC, defect)

All
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
firefox26 + verified
firefox27 + fixed

People

(Reporter: markh, Assigned: markh)

References

Details

(Whiteboard: [qa-])

Attachments

(3 files, 2 obsolete files)

STR:
* Set browser.tabs.remote=true

* From a "browser" scratchpad, execute:
    gBrowser.selectedTab = gBrowser.addTab();
    gBrowser.removeCurrentTab();

See also bug 919878.

In Windows debug build, the process *may* raise an exception indicating an invalid handle was closed.  I've tracked this down to:

PContentParent::Open() is passed a ProcessHandle from ContentParent::ContentParent().  The handle is "owned" directly by ContentParent and closed in the destructor.  At this point, PContentParent::mOtherProcess is a "borrowed" handle.

When a channel to the child is connected, ContentParent::OnChannelConnected() gets a *new* handle to the process and stores it in mOtherProcess (via SetOtherProcess()) - and the handle originally passed into ::Open() is overwritten with the new handle.  At this point, PContentParent::mOtherProcess is "owned" by PContentParent.

Later, if an error occurs talking to the child process, ContentParent::KillHard() arranges to close mOtherProcess.  This works fine as long as ContentParent::OnChannelConnected() has been called - but if it hasn't, it tries to close the "borrowed" handle passed to ::Open().  Then, when ContentParent::~ContentParent() is called it tries to close the same handle.  On debug windows builds, this causes an "invalid handle" exception.  With the above command executed in the scratchpad, the child process *may* be killed before OnChannelConnected() is called, causing this problem.
This sounds bad. Any idea if this is a problem on non-Windows?
(In reply to ben turner [:bent] (needinfo? encouraged) from comment #1)
> This sounds bad. Any idea if this is a problem on non-Windows?

I believe not - a PID and a "process handle" are synonymous in process_util_posix.cc - closing a handle is a noop and "opening" a process handle from a PID just returns the PID.

That said, there might be a case for debug builds using a simple structure as a process handle on posix, with a flag to indicate if it is currently open and asserting if it is used incorrectly.  Similarly, debug builds on Windows could possibly use GetHandleInformation() to assert the handle is valid (I neglected to mention that the "invalid handle" exception is only raised when a debugger is attached to the process, so this would allow assertions when that isn't true)
Cool. We should fix this, I use FF on Windows in a debugger plenty and I don't want to randomly hit a scary exception if we can easily avoid it.

That being said this won't be a high priority since IPC is most heavily used on bsg (posix) at the moment.
I believe that this is the primary blocker for the Firefox desktop mochitests which use content processes.
Hm... We have Firefox mochitests that launch subprocesses on Windows running already (indexedDB and deviceStorage come to mind first). Are you talking about something else?
This is almost certainly the cause of bug 916757, for example, and is also affecting mhammond standing up mochitests for desktop e10s.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #7)
> This is almost certainly the cause of bug 916757, for example, and is also
> affecting mhammond standing up mochitests for desktop e10s.

FTR, I don't think it is exactly the same as bug 916757 - however, there are other similar bugs I'm trying to track down now - eg, I see the CloseHandle() call in SyncChannel::~SyncChannel() failing with the same exception in obscure cases and have managed to see assertions in w95thread.c too - but not yet in a way that gives me any insight into the problem.
I think I've got to the bottom of this.  After these patches are applied I can no longer see these assertions and the desktop e10s tests no longer crash the parent process.

There are 3 attachments in total, and only the last one is necessary to fix the problem - however, the first 2 perform additional checks which should make any lingering or future similar issues easier to spot.

This change is to ipc/chromium.  CloseProcessHandle will assert if Windows reports the handle couldn't be closed, and a couple of other places which called :CloseHandle() directly now call this function instead to get that additional checking.

Only asking for feedback as I'm not sure how we feel about changes to ipc/chromium, but I think the patch is otherwise ready for review.
Assignee: nobody → mhammond
Attachment #810988 - Flags: feedback?(bent.mozilla)
This patch adds an assertion to SyncChannel::~SyncChannel().  While debugging this, I found that SyncChannel ended up getting the same handle value as the recently closed process handles, which caused the ::CloseHandle() call to fail.  While there was no problem in the SyncChannel code, I figured this additional check was worthwhile as it may point out a problem in some other code.
Attachment #810989 - Flags: review?(bent.mozilla)
This patch has the fix.  It ensures that PContentParent always holds an "owned" reference to a handle and always closes it once.  This involves:

* A new function GetOwnedChildProcessHandle() in GeckoChildProcessHost.h which is called to pass the initially held handle.

* ContentParent::OnChannelConnected() closes the existing handle before setting the new one.  Ideally this would be done directly in PContentParent::SetOtherProcess(), but I've no idea how to make the IPDL code generator do this.  This is the reason I'm asking for feedback rather than review - but apart from that, I think the patch is ready to review (ie, if you think that's OK, please consider this a review request and I'll update the comment accordingly)

* ContentParent::KillHard() zeroes the handle so that the destructor doesn't close the handle again.

Try run at https://tbpl.mozilla.org/?tree=Try&rev=a1bc986cafbe
Attachment #810993 - Flags: feedback?(bent.mozilla)
doh - s/BOOL/bool/ in the cross-platform GetOwnedChildProcessHandle

New try: https://tbpl.mozilla.org/?tree=Try&rev=372eb422c7d8
Attachment #810993 - Attachment is obsolete: true
Attachment #810993 - Flags: feedback?(bent.mozilla)
Attachment #811007 - Flags: feedback?(bent.mozilla)
Attachment #810988 - Flags: feedback?(bent.mozilla) → review+
Attachment #810989 - Flags: review?(bent.mozilla) → review+
Comment on attachment 811007 [details] [diff] [review]
Part 3 - stop closing a handle twice.

I don't understand why GetOwnedChildProcessHandle is the way you made it. Can't we use DuplicateHandle instead of doing handle->pid->handle?

Also the comment about "ideally that would be done in SetOtherProcess" is confusing. Since IPDL currently doesn't manage owning the handle, it doesn't seem that SetOtherProcess ought to be aware of that ownership and so ContentParent::OnChannelConnected is the correct place to manage the ownership. If I'm wrong, we can certainly change the IPDL code here: http://mxr.mozilla.org/mozilla-central/source/ipc/ipdl/ipdl/lower.py#3168
Comment on attachment 811007 [details] [diff] [review]
Part 3 - stop closing a handle twice.

(In reply to Benjamin Smedberg  [:bsmedberg] from comment #13)
> Comment on attachment 811007 [details] [diff] [review]
> Part 3 - stop closing a handle twice.
> 
> I don't understand why GetOwnedChildProcessHandle is the way you made it.
> Can't we use DuplicateHandle instead of doing handle->pid->handle?

You mean an #ifdef XP_WIN DuplicateHandle #else just return the same handle?  That would certainly work.

Another issue is that I should be using OpenPrivilegedHandle() - that is how the handle was initially fetched and what OnChannelConnected calls.  Thoughts on how I should name/express things with that in mind?

> Also the comment about "ideally that would be done in SetOtherProcess" is
> confusing. Since IPDL currently doesn't manage owning the handle, it doesn't

Yes, you're correct about this - the ownership concept doesn't apply in PContentParent.
Attachment #811007 - Flags: feedback?(bent.mozilla)
(In reply to Mark Hammond (:markh) from comment #14)
> Comment on attachment 811007 [details] [diff] [review]
> Another issue is that I should be using OpenPrivilegedHandle() - that is how
> the handle was initially fetched and what OnChannelConnected calls. 
> Thoughts on how I should name/express things with that in mind?

Sorry, it's late :)  I should be using OpenPrivilegedHandle() but that doesn't impact the naming etc - mChildProcess was (and should continue to be) be privilileged.  So please ignore the second part of that.
Updated to remove the comment about PContentParent "owning" the handle and now uses OpenPrivilegedProcessHandle.  Note I didn't make any attempt to use ::DuplicateHandle on Windows as I don't believe this code is particularly hot and it just breaks the ProcessHandle abstraction - but let me know if you want me to change it.
Attachment #811007 - Attachment is obsolete: true
Attachment #811452 - Flags: review?(benjamin)
Attachment #811452 - Flags: review?(benjamin) → review+
Comment on attachment 811452 [details] [diff] [review]
Part 3 - stop closing a handle twice.

In practice this will only be hit in edge cases - but given we have thumbnails and (pref'd off) social using child processes on 26 and it's a simple patch, I think it's worth requesting uplift.  The other 2 patches are additional assertions and aren't necessary for the fix (but wouldn't hurt either!)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A 
User impact if declined: Unlikely, but possible, crash
Testing completed (on m-c, etc.): On m-c
Risk to taking this patch (and alternatives if risky): Relatively low risk - worst case would be a windows handle leak (but that's not considered likely)
String or IDL/UUID changes made by this patch: None
Attachment #811452 - Flags: approval-mozilla-aurora?
My understanding is that there are no content processes on Aurora, since both thumbnails and social have turned off content processes via pref.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #20)
> My understanding is that there are no content processes on Aurora, since
> both thumbnails and social have turned off content processes via pref.

Thumbnails are enabled on Aurora and (IIUC) the intention is this will ride the trains normally.
Carrying over the tracking from bug 916757

Mark, if you dupe a bug with tracking flags set, please re-nominate the bug where the work will continue or we lose it from our tracking for a release.
Attachment #811452 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Keywords: verifyme
I confirm the fix is verified on Latest Aurora 26 on Windows 7 x64. No exception is raised while running the script from Comment 0 in the Scratchpad

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 (20131024004004)
I've tried to reproduce this issue, but without success, with the Nightly debug build from 2013-09-25, installed from ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/09/2013-09-25-mozilla-central-debug/ on a Windows 7 64-bit machine (after previously trying on 2 other machines, where I couldn't launch Nightly from the console).

In the Scratchpad I get the following:

Exception: gBrowser is not defined
@Scratchpad/1:10
WCA_evalWithDebugger@resource://gre/modules/devtools/dbg-server.jsm -> resource://gre/modules/devtools/server/main.js -> resource://gre/modules/devtools/server/actors/webconsole.js:838
WCA_onEvaluateJS@resource://gre/modules/devtools/dbg-server.jsm -> resource://gre/modules/devtools/server/main.js -> resource://gre/modules/devtools/server/actors/webconsole.js:531
DSC_onPacket@resource://gre/modules/devtools/dbg-server.jsm -> resource://gre/modules/devtools/server/main.js:1018
@resource://gre/modules/devtools/dbg-server.jsm -> resource://gre/modules/devtools/server/main.js -> resource://gre/modules/devtools/server/transport.js:255
@resource://gre/modules/devtools/dbg-server.jsm -> resource://gre/modules/devtools/server/main.js -> resource://gre/modules/devtools/DevToolsUtils.js:72


I've also tried with a Nightly debug build after the fix, and in the Scratchpad I get the following:

Exception: gBrowser is not defined
@Scratchpad/1:10

This exception I'm getting is different from the ones mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=919878#c0

Is the issue that I'm seeing the same as the one reported in this bug?
Flags: needinfo?(mhammond)
(In reply to Manuela Muntean [:Manuela] [QA] from comment #26)
> I've tried to reproduce this issue, but without success, with the Nightly
> debug build from 2013-09-25, installed from
> ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/09/2013-09-25-
> mozilla-central-debug/ on a Windows 7 64-bit machine (after previously
> trying on 2 other machines, where I couldn't launch Nightly from the
> console).
> 
> In the Scratchpad I get the following:
> 
> Exception: gBrowser is not defined

It looks like you didn't set the scratchpad to execute in the "browser" environment, via the "Environment" menu on the scratchpad window.  If you don't see that menu, you probably need to set the pref 'devtools.chrome.enabled' to true.  Let me know if you need more help.
Flags: needinfo?(mhammond)
I've managed to set the Scratchpad to execute in the "browser" environment and retried to reproduce the issue with the Nightly debug build from 2013-09-25. 

But all I can see in the console is the warning mentioned in bug 919878: [Child 11280] WARNING: nsWindow::GetNativeData not implemented for this type: file o:/src/mozilla-git/central2/widget/xpwidgets/PuppetWidget.cpp, line 672


Taking the following into consideration: 

> In Windows debug build, the process *may* raise an exception indicating an
> invalid handle was closed.  I've tracked this down to:
> 
> PContentParent::Open() is passed a ProcessHandle from
> ContentParent::ContentParent().  The handle is "owned" directly by
> ContentParent and closed in the destructor.  At this point,
> PContentParent::mOtherProcess is a "borrowed" handle.
> 
> When a channel to the child is connected,
> ContentParent::OnChannelConnected() gets a *new* handle to the process and
> stores it in mOtherProcess (via SetOtherProcess()) - and the handle
> originally passed into ::Open() is overwritten with the new handle.  At this
> point, PContentParent::mOtherProcess is "owned" by PContentParent.
> 
> Later, if an error occurs talking to the child process,
> ContentParent::KillHard() arranges to close mOtherProcess.  This works fine
> as long as ContentParent::OnChannelConnected() has been called - but if it
> hasn't, it tries to close the "borrowed" handle passed to ::Open().  Then,
> when ContentParent::~ContentParent() is called it tries to close the same
> handle.  On debug windows builds, this causes an "invalid handle" exception.
> With the above command executed in the scratchpad, the child process *may*
> be killed before OnChannelConnected() is called, causing this problem.


I don't understand which is the warning I should be seeing in the console, in order to reproduce this bug, bug 920397.  Could you please give more details about that?
Flags: needinfo?(mhammond)
Sorry, I thought I was looking at a different (but similar) bug - bug 919878, and my previous comments reflected that other bug.

This bug wasn't able to be reproduced reliably in the console, so I don't think there is anything QA can do to help verify this.  I can confirm I've not seem the same symptoms since this landed, so I think we can qa- this and set it to verified.
Flags: needinfo?(mhammond)
Keywords: verifyme
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: