Closed Bug 326777 Opened 18 years ago Closed 17 years ago

[FIX]Event queue spinners must push null JSContext if they could be called from JS

Categories

(Core :: Security, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9alpha8

People

(Reporter: bzbarsky, Assigned: bzbarsky)

Details

(Keywords: fixed1.8.0.13, fixed1.8.1.5, Whiteboard: [sg:high?] need r/sr=darin [need testcase])

Attachments

(3 files, 2 obsolete files)

Any code that could conceivably be called while content JS is on the stack must push a null JSContext on the JS context stack before processing events.  Not doing that leads to odd failure modes in some cases and security bugs in others.  It also makes fixing security bugs much more regression prone.

What follows is a list of likely culprits (who spin the event queue without pushing).  I'm ccing relevant module owners and peers (or whoever has CVS blame) as needed, in addition to general interested parties; please file bugs on your module and fix it accordingly!  I think we want this fixed on the 1.8.0 branch too, and possibly on 1.7.x.

I'd like us to set a date (say 2 weeks from now?) by which we'll have this fixed and just do it.  Note that we really do need some module owner action here, since I, for one, have no idea how some of this code _should_ be behaving when there's no JSContext stack or when pushing fails or whatever.  And I certainly don't know for most of this code whether it could run when content JS is on the stack.

Likely culprit list, as far as I can tell (I excluded tests of all sorts; these are the callers of HandleEvent() and ProcessPendingEvents() on event queues):

nsAutoConfig::downloadAutoConfig
nsLDAPSyncQuery::GetQueryResults
nsDOMParser::ParseFromStream
nsXMLHttpRequest::Send
NS_ShutdownXPCOM
nsEventQueueServiceImpl::PopThreadEventQueue
nsXMLDocument::Load
Various stuff in appshell?  Not sure.
Other widget stuff?  Again, no idea what this is doing.
nsMsgAccountManager::cleanupOnExit
nsMsgCopy::DoCopy
nsMapiHook::BlindSendMail
nsWSAUtils::GetOfficialHostName
imgThread::Run
nsSyncStreamListener::WaitForData
nsProxyObject::PostAndWait

Help with this would be much appreciated and help with general sanity of fixing security bugs....
Note also that at least some cases (opening modal window) rely on pushing a non-null JSContext on the stack as far as I can tell.  So just pushing null in the relevant event queue methods doesn't seem like it would work.  :(
Is this something we can make the eventqueue do? I hate having situations where you have to remember to make a second call every time you perform a certain action. Begs for whack-a-mole.
> imgThread::Run

not used and now removed
nsMapiHook::BlindSendMail shouldn't have a js context on the stack, nor should nsAutoConfig::downloadAutoConfig, though it's remotely possible that someone could write js code that caused these functions to be called.

nsMsgCopy::DoCopy does get called from js. I'm not sure about nsMsgAccountManager::cleanupOnExit - it gets called during a shutdown notification.
> Is this something we can make the eventqueue do?

Hmm... Possible, actually.  I thought nsXULWindow::ShowModal pushes non-null, but it does not.  So perhaps we should simply have ProcessPendingEvents do null-pushing.  What about HandleEvent()?  Do we really want to push and pop in a loop for every event there?  

Also, nsXULWindow::ShowModal uses yet a third method of processing events: it calls appShell->GetNativeEvent and appShell->DispatchNativeEvent.  I'm not sure what that actually ends up doing; I'll bet it's different on different platforms or something.

For trunk, we should just improve event queues, of course.  Say have only one true event processing path?
Yeah, I said this elsewhere but it may not have been clear: we shouldn't have an API that leaves everyone guessing whether a cx (null if appropriate) was pushed too, later/elsewhere.

/be
Whiteboard: [sg:high?]
Note that people on the web are discovering this problem -- see bug 370445 comment 41.
marking this as blocking since it's scary and it doesn't seem that hard to fix.
Assignee: dveditz → bzbarsky
Flags: blocking1.9? → blocking1.9+
There's no way I have the cycles to fix this (complete with testing, etc) in time for the next 1.8 security releases (which is where we need to fix it, imo)...  I'm not even sure I have the cycles for 1.9 at this point.... I have a whole bunch of other security stuff on my plate, and rapidly shrinking Mozilla work time.
Flags: blocking1.8.1.3?
Flags: blocking1.8.0.11?
sicking reminded me that we could try to do this in the event queue code on branch.  Not sure what the setup looks like on trunk...  And not sure whether we can make xpcom depend on xpconnect (which we'd need to do if we were to do that).  Plus the possible performance effects for the HandleEvent() callers; not sure whether that's an issue.

In any case, if someone else has the cycles to do this, they really should.  It's looking like April for me at this point, given my other commitments.  :(
Flags: blocking1.8.1.4?
Flags: blocking1.8.1.4+
Flags: blocking1.8.0.12?
Flags: blocking1.8.0.12+
One approach that might work well is to have XPCOM expose an API for a service that it uses (calling WillProcessEvents/DidProcessEvents or something on it) and have XPConnect implement that API and push/pop null.
So one problem with that is that we'd need to push/pop per thread event... that seems unfortunate.  Thing is, NS_ProcessPendingEvents (on branch) calls ProcessNextEvent(), which other consumers also call.

I suppose we could hook ProcessPendingEvents and NS_ProcessNextEvent/NS_ProcessPendingEvents, and change existing C++ consumers to not use ProcessNextEvent() directly.  Still doesn't help with third-party C++.

And of course trunk is a totally separate thing, enough so that it's not really worth doing it before branch somehow.
er... that previous comment mixed up some branch and trunk stuff.

On the _branch_ the only event processing APIs that nsIEventQueue exposes are ProcessPendingEvents and HandleEvent.  And appshell uses ProcessPendingEvents, so we should be good.
Attached patch Branch patchSplinter Review
This seems to do about what we want... Thoughts?

I'm kinda hoping we have decent perf tests on branch, because I'm not sure what the perf impact of this will be.  :(

I'll try to think of a way to handle trunk once we fix branch.
Attachment #261632 - Flags: superreview?(jst)
Attachment #261632 - Flags: review?(darin.moz)
Attachment #261632 - Flags: review?(benjamin)
Comment on attachment 261632 [details] [diff] [review]
Branch patch

Since nsEventQueue is used on background threads, have you verified that nsXPConnect can handle being destroyed on random background threads (in case the ListenerCaller thingy holds the last ref)?  Likewise, is nsXPConnect happy to receive notifications on random threads?
> Likewise, is nsXPConnect happy to receive notifications on random threads?

Yes.  The Push/Pop methods here look up the right stack for the current thread...

As for being destroyed, I'm not sure, to be honest.  :(  I don't understand our shutdown sequence well enough to say what could happen there.
Comment on attachment 261632 [details] [diff] [review]
Branch patch

I can't speak for xpconnect shutdown safety on this, but otherwise it looks good.
Attachment #261632 - Flags: review?(benjamin) → review+
I've been thinking, and I _think_ shutdown should be ok -- the service manager will hold a ref to nsXPConnect up to a point, and before that point we set gXPCOMShuttingDown to PR_TRUE, so after the service manager drops the ref event queues just won't be able to get the service.  So I don't see how the last release can happen off the main thread, unless an off-main-thread event queue is in the middle of processing events when NS_ShutdownXPCOM is called on the main thread.  Could that happen?
If we're worried about that, we could do the GetService before both the push and the pop.  A little slower, perhaps, but should be completely safe.  Let me know if you want me to do that?
Well, it could be, but all non-main-threads are supposed to shut down well before we end up releasing services (during the xpcom-shutdown-threads notification).
But if it was already on the stack... could that happen?  Or would it finish up processing an exit before we release services?
Oh, you mean calling NS_ShutdownXPCOM in the middle of a stack? That doesn't happen.
Whiteboard: [sg:high?] → [sg:high?] need sr=jst, trunk landing, branch landing
jst says this needs trunk baking before considering for branch, so too late for that in 1.8.1.4
Flags: blocking1.8.1.5+
Flags: blocking1.8.1.4+
Flags: blocking1.8.0.13+
Flags: blocking1.8.0.12+
Uh... there's no real way to do trunk baking here, imo.  The event code is completely different on trunk; the trunk and branch patches are completely different.  I don't even know _how_ to write a trunk patch for this yet.  The patch I've attached so far is branch-only, as comment 14 indicates.

If you mean that we need trunk baking of the basic concept, then we're in trouble.  As I said, I have no idea how to fix this in a performant way on the trink with the new APIs, and no time in the foreseeable future to plan out the relevant API changes to make it possible....
If someone else does decide to work on this on trunk, let me know and I'll brain-dump what thoughts I have so far on the APIs.
Target Milestone: --- → mozilla1.9alpha6
Attached patch Trunk patch (obsolete) — Splinter Review
I decided that trying to batch up the pushes/pops just wasn't worth it.  For reference, I did some profiling of loading a web page, clicking some links, etc with this patch.  Of the 263034 hits under nsThread::ProcessNextEvent, we have:

  313 under nsCOMPtr_base::assign_from_gs_contractid
  177 under nsXPConnect::AfterProcessNextEvent
  165 under nsXPConnect::OnProcessNextEvent

Total of about 0.25% spent in the new code.  I think that's acceptable...
Attachment #267916 - Flags: superreview?(darin.moz)
Attachment #267916 - Flags: review?(jst)
Attachment #267916 - Flags: review?(benjamin)
Summary: Event queue spinners must push null JSContext if they could be called from JS → [FIX]Event queue spinners must push null JSContext if they could be called from JS
Comment on attachment 267916 [details] [diff] [review]
Trunk patch

This patch looks OK.  One thought though:  from a performance point-of-view wouldn't it be better to have a function exported from xpcom/threads to set a global thread observer?  That would be a lot more light-weight.
Also, it might be nice for the "global thread observer" to actually be thread local :)
I suppose we could do it that way (setting it in the xpconnect module ctor and unsetting it in the xpconnect module dtor), yeah.

Making it thread-local is a bit of a pain, because we'd have to do it every time a new thread is created, no?
So I looked at exporting things.  Either I export it via nsThreadUtils (and probably make it MOZILLA_INTERNAL_API only so I can touch nsThread::stuff directly) or I need a new header in xpcom/thread, right?

I have to admit I'm not quite up on how exactly the glue, MOZILLA_INTERNAL_API, and other modules interact.  Is any of it documented somewhere?
I wouldn't use nsThreadUtils.h. Assuming you want it to be an internal-only API I don't see why you couldn't hang it on one of the existing private headers (nsThreadManager.h). Use NS_COM, which is for internal-only exports.

But couldn't you just cache the global thread observer service once (as a global or a member of nsThread, for example)? getservice is a little expensive because of the locking and hashtable lookup: the actual virtual function call is pretty insignificant.
> But couldn't you just cache the global thread observer service once

If we're sure it won't leak and that the thread spins up late enough that the GetService would succeed, etc, etc.

nsThreadManager.h is not exported to anywhere where XPConnect could include it, is it?  And I don't think we want it to be...

As for whether this should be internal-only, that's up to the module owner, I guess.  I don't really care either way.  Realistically, as far as I'm concerned this is all a hack to work around lack of built in knowledge of security in XPCOM.
You would have to include nsThreadManager.h directly from $(topsrcdir)/xpcom/threads, correct. I don't care whether you create a new header just for that function if you like. I don't think it should be exported in the external API because as you said it's an xpconnect-specific hack to deal with XPCOM lacking any security knowledge.

I that we can be fairly sure that non-main threads won't spin up until xpconnect is registered.
We especially need this on the main thread.  In fact, while I want to do it for all threads to be safe, chances are we only really need it on the main thread...

I'm not sure xpconnect would be happy with including non-exported stuff, so I guess new header it is.  :(
Attached patch Trunk patch with exported method (obsolete) — Splinter Review
Attachment #267916 - Attachment is obsolete: true
Attachment #267997 - Flags: superreview?(darin.moz)
Attachment #267997 - Flags: review?(jst)
Attachment #267997 - Flags: review?(benjamin)
Attachment #267916 - Flags: superreview?(darin.moz)
Attachment #267916 - Flags: review?(jst)
Attachment #267916 - Flags: review?(benjamin)
Attachment #267997 - Flags: review?(jst) → review+
Comment on attachment 267997 [details] [diff] [review]
Trunk patch with exported method

>+NS_SetGlobalThreadObserver(nsIThreadObserver* aObserver)
>+{
>+  if (aObserver && nsThread::sGlobalObserver) {
>+    return NS_ERROR_NOT_AVAILABLE;
>+  }
>+
>+  if (!NS_IsMainThread()) {
>+    return NS_ERROR_UNEXPECTED;
>+  }

nit, I'd prefer that this had debug spew on failure, with NS_ENSURE_ or NS_ERROR

>Index: js/src/xpconnect/src/nsXPConnect.cpp

>@@ -289,16 +293,20 @@ nsXPConnect::GetXPConnect()
>             delete gSelf;
>             gSelf = nsnull;
>         }
>         else
>         {
>             // Initial extra ref to keep the singleton alive
>             // balanced by explicit call to ReleaseXPConnectSingleton()
>             NS_ADDREF(gSelf);
>+            if (NS_FAILED(NS_SetGlobalThreadObserver(gSelf))) {
>+                NS_RELEASE(gSelf);
>+                // Fall through to returning null

It took me a while to realize that NS_RELEASE sets the var to null, so you might want to call that out explicitly.
Attachment #267997 - Flags: review?(benjamin) → review+
Comment on attachment 261632 [details] [diff] [review]
Branch patch

sr=jst
Attachment #261632 - Flags: superreview?(jst) → superreview+
punting remaining a6 bugs to b1, all of these shipped in a5, so we're at least no worse off by doing so.
Target Milestone: mozilla1.9alpha6 → mozilla1.9beta1
Flags: blocking1.8.1.5+ → blocking1.8.1.6?
Whiteboard: [sg:high?] need sr=jst, trunk landing, branch landing → [sg:high?] need r/sr=darin, trunk landing, branch landing
Comment on attachment 261632 [details] [diff] [review]
Branch patch

I haven't heard from darin in forever... I'm going to go ahead with this, given the reviews it already has.
Attachment #261632 - Flags: approval1.8.1.5?
Attachment #261632 - Flags: approval1.8.0.13?
Checked in the trunk patch.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Whiteboard: [sg:high?] need r/sr=darin, trunk landing, branch landing → [sg:high?] need r/sr=darin, branch landing
I backed this out because it sent tinderbox red:

/builds/tinderbox/Fx-Trunk-Debug/Linux_2.6.9-42.ELsmp_Depend/mozilla/js/src/xpconnect/src/nsXPConnect.cpp:318: undefined reference to `NS_SetGlobalThreadObserver(nsIThreadObserver*)'
nsXPConnect.o(.text+0x517b): In function `nsXPConnect::GetXPConnect()':
/builds/tinderbox/Fx-Trunk-Debug/Linux_2.6.9-42.ELsmp_Depend/mozilla/js/src/xpconnect/src/nsXPConnect.cpp:293: undefined reference to `NS_SetGlobalThreadObserver(nsIThreadObserver*)'
collect2: ld returned 1 exit status

Needless to say I cannot reproduce that over here in a debug build.  I guess it could be a GCC visibility issue or something?  I'm not building with gcc4.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Re-reading the patch, it appears that nsThread.cpp never #includes nsThreadUtilsInternal.h, so it wouldn't notice that the declaration was NS_COM and therefore exported.
Comment on attachment 261632 [details] [diff] [review]
Branch patch

approved for 1.8.1.5 and 1.8.0.13, a=dveditz for release-drivers

(looks like the thing that turned trunk red isn't going to affect the 1.8 patch)
Attachment #261632 - Flags: approval1.8.1.5?
Attachment #261632 - Flags: approval1.8.1.5+
Attachment #261632 - Flags: approval1.8.0.13?
Attachment #261632 - Flags: approval1.8.0.13+
Attachment #267997 - Attachment is obsolete: true
Attachment #271727 - Flags: superreview?(darin.moz)
Attachment #267997 - Flags: superreview?(darin.moz)
Checked in on the branches.
Flags: blocking1.8.1.6?
Checked in on trunk again.
Status: REOPENED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
Whiteboard: [sg:high?] need r/sr=darin, branch landing → [sg:high?] need r/sr=darin
I checked this in to fix tinderbox asserts/crashes.  The problem was that a global thread observer was getting set while an event was being processed, so we had unbalanced calls causing us to pop when the stack was empty...
Attachment #272055 - Flags: review?(benjamin)
Attachment #272055 - Flags: review?(benjamin) → review+
This has been on the Trunk for a while, but does anyone have a testcase QA can use to verify on Trunk and branch?
Whiteboard: [sg:high?] need r/sr=darin → [sg:high?] need r/sr=darin [need testcase]
There isn't one that I know of (though we've had similar bugs in the past with testcases).  If you want to create one, you could make a sync XMLHttpRequest from some JS with expanded privileges, and then see whether the subject principal in some random C++ code that gets called off an event that got posted before the sync request has those privileges.  Not sure what the best C++ code to attack that way would be.
Group: security
Flags: in-testsuite?
Comment on attachment 261632 [details] [diff] [review]
Branch patch

Removing requests.  I doubt review from Darin is that likely at this point, and they keep popping up when I try to figure out which review requests I should follow up.
Attachment #261632 - Flags: review?(darin.moz)
Attachment #271727 - Flags: superreview?(darin.moz)
Boris: What did you want darins help with here. This feels like a too important bug to just give up on landing written patches.
I just wanted a sanity-check from Darin, since he's most familiar with that code.  All the patches are landed, and have been since July.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: