Closed Bug 834732 Opened 12 years ago Closed 12 years ago

Do a better job of maintaining the JS context stack

Categories

(Core :: XPConnect, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla22
Tracking Status
firefox21 --- wontfix
firefox22 --- fixed
firefox-esr17 --- wontfix
b2g18 --- wontfix

People

(Reporter: bholley, Assigned: bholley)

References

Details

(Keywords: sec-moderate, Whiteboard: [adv-main22+])

Attachments

(15 files, 2 obsolete files)

16.73 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
989 bytes, patch
mrbkap
: review+
Details | Diff | Splinter Review
2.42 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
2.81 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
15.04 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
13.08 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
8.11 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
1.09 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
1.21 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
3.91 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
2.96 KB, patch
mrbkap
: review-
Details | Diff | Splinter Review
1.93 KB, patch
mrbkap
: review+
abillings
: sec-approval+
Details | Diff | Splinter Review
2.14 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
2.00 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
56.82 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
The code creates a special JSContext for the sandbox and pushes it. After evaluation, however, it begins to use the caller's cx again while the sandcx is still on the stack. There are calls to JS_Wrap*, which mean that we'll enter the XPConnect wrapper callbacks with |cx != nsContentUtils::GetCurrentJSContext()|, which is bad.

I don't believe this is a security issue, because the sandbox's cx is presumably always less privileged than its caller, so it's not a problem security-wise if we use that instead of the caller. We should nonetheless fix this.
So actually, my assertions seem to be catching a number of other places in the code where we don't properly push a cx. :-(

Given that we depend on the JSContext stack to determine SOW access (see the nsContentUtils::GetCurrentJSContext() assertion in AccessCheck.cpp), I'm marking this s-s for now.
Group: core-security
Summary: xpc_EvalInSandbox cx handling is sloppy → Do a better job of maintaining the JS context stack
Keywords: sec-audit
This is green, modulo one b-c test that hasn't come in yet. Uploading patches and flagging Blake for review.
The goal here is to get rid of this crap entirely, and make nsCxPusher always
push. But that's a scary change, so we do it in chunks. This patch, in particular,
should have zero behavioral change. This means preserving some very wrong behavior.
For instance, currently SafeAutoJSContext never pushes a damn thing, because the
safe JSContext doesn't have an associated nsIScriptContext. We preserve this
behavior, and in fact convert various similarly-buggy consumers to
SafeAutoJSContext, so that we can hoist the behavioral change into a subsequent
patch.
Attachment #708041 - Flags: review?(mrbkap)
Currently it never does, because the SafeJSContext doesn't have an
nsIScriptContext behind it. :-(
Attachment #708042 - Flags: review?(mrbkap)
This gets rid of one of the last consumers of REQUIRE_SCRIPT_CONTEXT.
Attachment #708043 - Flags: review?(mrbkap)
This gets rid of the last use of REQUIRE_SCRIPT_CONTEXT. \o/
Attachment #708044 - Flags: review?(mrbkap)
Now that we only have ALWAYS_PUSH and ASSERT_SCRIPT_CONTEXT, we have uniform
release-mode behavior everywhere. Remove the crap.
Attachment #708045 - Flags: review?(mrbkap)
We leave the nsIDOMEventTarget* versions fallible for now, but this makes the
common case a lot simpler. Note that this means that pushing a null JSContext,
a bug, is no longer handled at runtime. But I think we should just assert
against it, since there are already callers that don't check the return value.
Attachment #708046 - Flags: review?(mrbkap)
Attachment #708050 - Flags: review?(mrbkap)
Attachment #708041 - Flags: review?(mrbkap) → review+
Comment on attachment 708042 [details] [diff] [review]
Part 2 - Make SafeAutoJSContext actually push something. v1

It is mind boggling to me that this code was written like this and has lasted as long as it has.
Attachment #708042 - Flags: review?(mrbkap) → review+
Attachment #708043 - Flags: review?(mrbkap) → review+
Attachment #708044 - Flags: review?(mrbkap) → review+
Attachment #708045 - Flags: review?(mrbkap) → review+
Comment on attachment 708046 [details] [diff] [review]
Part 6 - Make nsCxPusher.Push(JSContext*) infallible. v1

Review of attachment 708046 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/base/nsRefreshDriver.cpp
@@ +556,5 @@
>  
>    nsCxPusher pusher;
> +  pusher.PushNull();
> +  DoTick();
> +  pusher.Pop();

Is there any reason to keep the explicit Pop here?
Attachment #708046 - Flags: review?(mrbkap) → review+
Comment on attachment 708047 [details] [diff] [review]
Part 7 - Implement stricter cx handling in xpc_EvalInSandbox. v1

Review of attachment 708047 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for doing this. As you might have noticed, the changes to this function were done piecemeal and without any overarching architecture. This is much cleaner.

::: js/xpconnect/src/XPCComponents.cpp
@@ +3965,2 @@
>  
> +    // Whew!

Amen.
Attachment #708047 - Flags: review?(mrbkap) → review+
Comment on attachment 708048 [details] [diff] [review]
Part 8 - Push a cx in nsWindowSH::NewResolve. v1

Do we have a bug on getting rid of this? It makes no sense to switch contexts here and I suspect that this code was a wallpaper over some bug that has long been lost to the depths of time and has probably been fixed for real for a while now.
Attachment #708048 - Flags: review?(mrbkap) → review+
Attachment #708049 - Flags: review?(mrbkap) → review+
Comment on attachment 708050 [details] [diff] [review]
Part 10 - Push a cx in AdoptNode. v1

This isn't sufficient because we need to push the cx in GetContextAndScope. For this bug, we can do the easy thing and pass the cx pusher in, but please file a followup in trying to avoid do any context switching here.
Attachment #708050 - Flags: review?(mrbkap) → review-
Attachment #708051 - Flags: review?(mrbkap) → review+
Comment on attachment 708052 [details] [diff] [review]
Part 12 - Push any cx we grab off the object for XPCWrappedJS. v1

This is unnecessary because the XPCCallContext pushes the context for you. That being said, if you want to land the comment you added, please do.
Attachment #708052 - Flags: review?(mrbkap) → review-
Attachment #708053 - Flags: review?(mrbkap) → review+
I'm just going to take a guess at sec-high here.
Keywords: sec-auditsec-high
Blocks: 839467
(In reply to Blake Kaplan (:mrbkap) from comment #23)
> Do we have a bug on getting rid of this? It makes no sense to switch
> contexts here and I suspect that this code was a wallpaper over some bug
> that has long been lost to the depths of time and has probably been fixed
> for real for a while now.

It'll probably go away soonish, either as bz converts stuff to the new bindings or as I remove JSContext stuff.
(In reply to Blake Kaplan (:mrbkap) from comment #24)
> Comment on attachment 708050 [details] [diff] [review]
> Part 10 - Push a cx in AdoptNode. v1
> 
> This isn't sufficient because we need to push the cx in GetContextAndScope.
> For this bug, we can do the easy thing and pass the cx pusher in, but please
> file a followup in trying to avoid do any context switching here.

Filed bug 839467.
Attachment #708050 - Attachment is obsolete: true
Attachment #711795 - Flags: review?(mrbkap)
Attachment #711795 - Attachment is obsolete: true
Attachment #711795 - Flags: review?(mrbkap)
Attachment #711797 - Flags: review?(mrbkap)
Attachment #711797 - Flags: review?(mrbkap) → review+
Comment on attachment 708053 [details] [diff] [review]
Part 13 - Assert proper cx stack handling in WrapperFactory::Rewrap. v1

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

It depends. It's clear that we don't do a good job of pushing JSContexts everywhere, and some of these patches highlight exactly where we don't do that. On the other hand, understanding how to turn that into an exploit is very nontrivial. Indeed, it's not even immediately clear to me how o do that.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

Not really.

Which older supported branches are affected by this flaw?

All of them, pretty much. This is just an issue of us being sloppy.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

They will probably be rather time-consuming, but pretty straightforward.

How likely is this patch to cause regressions; how much testing does it need?

Testing would be good. Most of this patch is just adding an assertion about the cx stack and fixing all the places where we hit that assert with our test suite. However, there may be paths to hit the assertion that we don't test on tinderbox.

I recommend that we check this in to inbound immediately, let it bake for a while, and then look at backporting.
Attachment #708053 - Flags: sec-approval?
This is green on try, so just waiting on sec-approval here.
landing now means it will be bumped to Aurora in a week. Assuming the sec-approval request is for the whole shebang and not just adding an assert in part 13 does that give us enough testing for a patch of this size?

On the other hand if we are going to want this in Firefox 21 anyway then landing now saves you one branch to worry about.
(In reply to Daniel Veditz [:dveditz] from comment #34)
> landing now means it will be bumped to Aurora in a week. Assuming the
> sec-approval request is for the whole shebang and not just adding an assert
> in part 13 does that give us enough testing for a patch of this size?

It's on the safe side for a patch of this size, modulo crashes due to hitting the assertion in ways our test suite didn't catch. Basically, each one of these patches is plugging a hole where we weren't pushing a JSContext but should have been. I'd like to check it in for this cycle.
 
> On the other hand if we are going to want this in Firefox 21 anyway then
> landing now saves you one branch to worry about.

Indeed.
We're too late in the cycle to land now. We're building this week to ship next week. I'd say that this has sec-approval for landing after we fork for the current release and branching.
Actually, forget that. Just land it now!
Attachment #708053 - Flags: sec-approval? → sec-approval+
Backed out for b2g build failures and mochitest-1 crashes on OSX.
https://hg.mozilla.org/integration/mozilla-inbound/rev/a69f329fc7ee

https://tbpl.mozilla.org/php/getParsedLog.php?id=19683369&tree=Mozilla-Inbound

/builds/slave/m-in-ics_a7_g-0000000000000000/build/dom/system/gonk/SystemWorkerManager.cpp: In member function 'nsresult mozilla::dom::gonk::SystemWorkerManager::Init()':
/builds/slave/m-in-ics_a7_g-0000000000000000/build/dom/system/gonk/SystemWorkerManager.cpp:345: error: 'AutoSafeJSContext' was not declared in this scope
/builds/slave/m-in-ics_a7_g-0000000000000000/build/dom/system/gonk/SystemWorkerManager.cpp:345: error: expected ';' before 'cx'
/builds/slave/m-in-ics_a7_g-0000000000000000/build/dom/system/gonk/SystemWorkerManager.cpp:347: error: 'cx' was not declared in this scope
make[7]: Leaving directory `/builds/slave/m-in-ics_a7_g-0000000000000000/build/obj-b2g/dom/system/gonk'

https://tbpl.mozilla.org/php/getParsedLog.php?id=19684857&tree=Mozilla-Inbound

225902 INFO TEST-START | /tests/content/xbl/test/test_bug821850.xhtml
++DOMWINDOW == 45 (0x10c28e138) [serial = 3508] [outer = 0x11c4f5008]
JavaScript error: , line 0: can't access dead object
JavaScript error: , line 0: can't access dead object
JavaScript error: , line 0: can't access dead object
JavaScript error: , line 0: can't access dead object
[Parent 432] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004003: file ../../../../intl/uconv/src/nsCharsetConverterManager.cpp, line 301
225903 INFO TEST-PASS | /tests/content/xbl/test/test_bug821850.xhtml | Set contentVal
[Parent 432] WARNING: NS_ENSURE_TRUE(globalObject) failed: file ../../../../content/base/src/nsDocument.cpp, line 6834
[Parent 432] WARNING: NS_ENSURE_TRUE(globalObject) failed: file ../../../../content/base/src/nsDocument.cpp, line 6834
Assertion failure: XPCJSRuntime::Get()->GetJSContextStack()->Peek() == cx, at ../../../../js/xpconnect/wrappers/WrapperFactory.cpp:346
TEST-UNEXPECTED-FAIL | /tests/content/xbl/test/test_bug821850.xhtml | Exited with code 1 during test run
INFO | automation.py | Application ran for: 0:40:02.951440
INFO | automation.py | Reading PID log: /var/folders/qd/srwd5f710sj0fcl9z464lkj00000gn/T/tmpECKwbLpidlog
PROCESS-CRASH | /tests/content/xbl/test/test_bug821850.xhtml | application crashed [@ xpc::WrapperFactory::Rewrap(JSContext*, JSObject*, JSObject*, JSObject*, JSObject*, unsigned int)]
Crash dump filename: /var/folders/qd/srwd5f710sj0fcl9z464lkj00000gn/T/tmpQ8nX5k/minidumps/07EC9850-7EED-466C-B5EE-DEADB65C8686.dmp
Operating system: Mac OS X
                  10.7.2 11C74
CPU: amd64
     family 6 model 23 stepping 10
     2 CPUs

Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash address: 0x0

Thread 0 (crashed)
 0  XUL!xpc::WrapperFactory::Rewrap(JSContext*, JSObject*, JSObject*, JSObject*, JSObject*, unsigned int) [WrapperFactory.cpp:e650f1bab42b : 338 + 0x0]
    rbx = 0x00007fff761fd630   r12 = 0x000000011c4f55b0
    r13 = 0x000000010f303080   r14 = 0x0000000000000001
    r15 = 0x000000011c4f55b0   rip = 0x000000010225b57b
    rsp = 0x00007fff5fbfb9e0   rbp = 0x00007fff5fbfba90
    Found by: given as instruction pointer in context
 1  XUL!MarkInternal<JSObject> [Root.h:e650f1bab42b : 1018 + 0x4]
    rip = 0x00000001034c1548   rsp = 0x00007fff5fbfba60
    rbp = 0x00007fff5fbfba90
    Found by: stack scanning
 2  XUL!JSCompartment::wrap(JSContext*, JS::MutableHandle<JS::Value>, JS::Handle<JSObject*>) [jscompartment.cpp:e650f1bab42b : 387 + 0x16]
    rip = 0x000000010327058f   rsp = 0x00007fff5fbfbaa0
    rbp = 0x00007fff5fbfbb70
    Found by: stack scanning
 3  libsystem_c.dylib + 0xa11a3
    rip = 0x00007fff90d7d1a4   rsp = 0x00007fff5fbfbae0
    rbp = 0x00007fff5fbfbb70
    Found by: stack scanning
 4  libmozalloc.dylib!moz_xmalloc [mozalloc.cpp:e650f1bab42b : 54 + 0x4]
    rip = 0x00000001000cc96e   rsp = 0x00007fff5fbfbb00
    rbp = 0x00007fff5fbfbb70
    Found by: stack scanning
 5  XUL!JSCompartment::wrap(JSContext*, JSObject**, JSObject*) [jscompartment.cpp:e650f1bab42b : 426 + 0x4]
    rip = 0x0000000103270dbc   rsp = 0x00007fff5fbfbb80
    rbp = 0x00007fff5fbfbbb0
    Found by: stack scanning
 6  XUL!JS_WrapObject(JSContext*, JSObject**) [jsapi.cpp:e650f1bab42b : 1507 + 0x10]
    rip = 0x000000010322c87c   rsp = 0x00007fff5fbfbbc0
Sigh, turns out there's actually a lot more work do be done. Spent the day on this and pushing to try:

https://tbpl.mozilla.org/?tree=Try&rev=cbb490ac7c06
Note - I had to disable an assertion in bug 834697 because of one of these. I'll re-enable it in this patch stack.
It's annoying to add yet another RAII class, but we're solving a different
problem here. In particular, there are lots of callers that grab a cx off
of an nsIScriptContext and use it without pushing. Most of the time this
is ok, since that cx is also the active cx. But it's hard to tell when we
might potentially violate that invariant. What's more, we don't want to just
use an nsCxPusher, because that does expensive things even when the cx matches
the one on the top of the stack. Most of these consumers should just switch
to AutoJSContext, but doing such a change en masse is a pretty risky thing to
do. So let's introduce a class that gives us good performance in the common case
and correctness in the uncommon case.
Attachment #713904 - Flags: review?(mrbkap)
Comment on attachment 713904 [details] [diff] [review]
Part 12.1 - Introduce AutoPushJSContext. v1

Review of attachment 713904 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/public/nsContentUtils.h
@@ +2315,5 @@
> + */
> +class NS_STACK_CLASS AutoPushJSContext {
> +  nsCxPusher mPusher;
> +  JSContext* mCx;
> +  public:

Nit: newline before public. Also, should public be in column 0 to distinguish it from the declarations above?

@@ +2317,5 @@
> +  nsCxPusher mPusher;
> +  JSContext* mCx;
> +  public:
> +    AutoPushJSContext(JSContext* aCx) : mCx(aCx) {
> +      if (mCx && mCx != nsContentUtils::GetCurrentJSContext()) {

IMO this should either assert mCx is non-null or allow pushing a null context. Silently doing nothing is scary.
Attachment #713904 - Flags: review?(mrbkap)
Comment on attachment 713905 [details] [diff] [review]
Part 12.2 - Audit callers of GetNativeContext and use AutoPushJSContext where appropriate. v1

Review of attachment 713905 [details] [diff] [review]:
-----------------------------------------------------------------

I read through this and didn't see anything glaringly obvious. I think I've also managed to convince myself that this can't change behavior in cases that weren't already broken.
Attachment #713905 - Flags: review?(mrbkap) → review+
Comment on attachment 713904 [details] [diff] [review]
Part 12.1 - Introduce AutoPushJSContext. v1

Review of attachment 713904 [details] [diff] [review]:
-----------------------------------------------------------------

bholley pointed out that the uses of this class weren't consistent with my request, so r+. I asked him on IRC to add a comment about null contexts so that future users of the class will at least think about them.
Attachment #713904 - Flags: review+
Gave this a full try push to be extra safe, given how big of a change it is:
https://tbpl.mozilla.org/?tree=Try&rev=bc9876e864ad
All good except for some b2g build bustage I think. Fixed and repushing for b2g builds:

https://tbpl.mozilla.org/?tree=Try&rev=bbdcb0f93d28
Without any known way to trigger the potential vulnerabilities it's safer to not try to back-port this anywhere.
(In reply to Daniel Veditz [:dveditz] from comment #52)
> Without any known way to trigger the potential vulnerabilities it's safer to
> not try to back-port this anywhere.

Ok. moz_bug_r_a4, if you want to look at this, I'd suggest trying to find a way to create a mismatch between the stack-top cx and the cx being used by running code. Then, see if you can get the code to query the subject principal (using the cx stack), and exploit the resulting principal. ;-)
Depends on: 848538
Depends on: 851418
Whiteboard: [adv-main22+]
Depends on: 934563
No longer depends on: 934563
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: