Closed Bug 1072150 Opened 5 years ago Closed 4 years ago

Assert against accessing the subject principal when no AutoJSAPI is on the stack

Categories

(Core :: DOM: Core & HTML, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: bholley, Assigned: bholley)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: sec-other, Whiteboard: [post-critsmash-triage][adv-main44-])

Attachments

(4 files)

Our current architecture for computing the subject principal relies in inspecting the current compartment of JS execution (and assuming System Principal if there is none). This generally works well, but is easy to screw up in the case where code dispatches async runnables to continue its work.

One simple idea I had is for the nsRunnable constructor to detect whether an AutoJSAPI is active at the time the runnable is constructed. If so, the caller is required to either pass the runnable an nsIGlobalObject (whose compartment it will enter before invoking Run()) or explicitly invoke something like AllowDispatchWithoutSecurityToken(). If neither of these things happens before the runnable is dispatched, we assert.

Thoughts? We probably couldn't do this for a few months until we've fully switched everything over to AutoJSAPI, but that shouldn't be too far off.
nsRunnable is often used for cross-thread communication too, so please don't change that.
Perhaps we need nsSameThreadRunnable or such.

But anyhow, do we actually have any case where this is an issue.
Currently runnables behave like "native code running, no js on the stack" and changing that
would break stuff, I'm pretty sure.
(In reply to Olli Pettay [:smaug] from comment #1)
> nsRunnable is often used for cross-thread communication too, so please don't
> change that.

It would still be useful though, right? In the worker/main-thread case, we would pass the nsIGlobalObject (or similar) that would be used on the other thread. And when dispatching to threads where it doesn't make sense, we just use the explicit opt-out.

> But anyhow, do we actually have any case where this is an issue.

This is the biggest complaint I've heard from critics of IsCallerChrome() (i.e. bent and jonas), so I'm assuming that it is. Ben/Jonas, can you confirm?

> Currently runnables behave like "native code running, no js on the stack"
> and changing that
> would break stuff, I'm pretty sure.

Well, this patch would just require us to audit the cases that might be problematic, and make the right decision in each one.
I'd really like to understand the set of problems we're trying to solve.

From my point of view, introspection of the subject principal anywhere other than the very entry point of algorithms seems like a bug to me...
(In reply to Bobby Holley (:bholley) from comment #2)
> It would still be useful though, right? In the worker/main-thread case, we
> would pass the nsIGlobalObject (or similar) that would be used on the other
> thread.
And crash immediately. global objects aren't thread safe, they are something cyclecollactable


> 
> Well, this patch would just require us to audit the cases that might be
> problematic, and make the right decision in each one.
We have tons of nsRunnable usages. (including all the runnable methods etc.)
Assuming there are less stray subject principal checks than nsRunnables, what if instead you asserted, whenever subject principal was checked, that *some* Auto* guard had been entered since the event loop.  The idea being that you are asserting that someone higher up on the callstack had considered the question of what the current principal was.
Well, on the main thread we do http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/nsXPConnect.cpp?rev=e1f3be2c48f6&mark=1066-1066#1057
Everything expects currently that when runnable runs, it is treated like native code/chrome is running.
(In reply to Boris Zbarsky [:bz] from comment #3)
> I'd really like to understand the set of problems we're trying to solve.

Maybe Jonas can weigh in here and describe the scenarios he's concerned about.

> From my point of view, introspection of the subject principal anywhere other
> than the very entry point of algorithms seems like a bug to me...

Does this mean that we should never be inspecting the subject principal when there isn't an active AutoJSAPI on the stack? That seems to contradict what smaug says below:

(In reply to Olli Pettay [:smaug] from comment #6)
> Everything expects currently that when runnable runs, it is treated like
> native code/chrome is running.

Would it in fact break the world if invoking SubjectPrincipal()/IsCallerChrome() without an AutoJSAPI on the stack caused a MOZ_CRASH? If so, then from Boris' perspective we have a lot of bugs.
> Does this mean that we should never be inspecting the subject principal when there isn't 
> an active AutoJSAPI on the stack?

Pretty much, yes.

> If so, then from Boris' perspective we have a lot of bugs.

Yes, I think we do.
(In reply to Boris Zbarsky [:bz] from comment #8)
> > Does this mean that we should never be inspecting the subject principal when there isn't 
> > an active AutoJSAPI on the stack?
> 
> Pretty much, yes.

Ok. Then that's a very straightforward (if perhaps tedious) thing to start fixing.
Basically, we just make SubjectPrincipal() MOZ_CRASH in those situations and fix things until the tree is green. That solve the runnable case and makes our code generally better to boot.
Maybe I should expand on my position, just to make it clear.

The fundamental issue we've run into over and over is that we will have some API, call it Foo(), that is called both from script and for our own internal purposes.  Then we try to tell these two cases apart by checking the subject principal, and requiring "internal purposes" callers to push a null JSContext or use an AutoNoJSAPI or whatever.

But the fundamental bug here, it seems to me, is the lack of differentiation between the "called by someone who may not have permissions to do this, check whether they do" case and the "just do this thing" case.

In the XPCOM world, we were basically forced into this, since there was only one possible API entrypoint: the XPCOM method (getter, setter, whatever).

But in the Web IDL world, we're not forced into doing this.  We can, and imo should, expose different methods for JS and C++ to call into.  We can also hoist checks into bindings code as needed, effectively making the binding method the "JS method that does the security check" and the C++ impl as the "Just do it" method.  We've done his very successfully with [ChromeOnly], for example, which obviates the need for a huge number of IsCallerChrome() checks in the C++.

Maybe I'm missing something, of course, in which case I'd like people to tell me what that is.  But it seems to me like my proposal would eliminate the spooky action at a distance issues we get when you call some code that calls some other code that then happens to do a security check because it can happen to get invoked directly from JS, and suddenly you're screwed.
bz++. Does the action plan in comment 10 sound good?
The plan of comment 10 sounds pretty good, yeah.
Summary: Add stronger enforcement for inheriting security characteristics in async runnables → Assert against accessing the subject principal when no AutoJSAPI is on the stack
(In reply to Boris Zbarsky [:bz] from comment #3)
> From my point of view, introspection of the subject principal anywhere other
> than the very entry point of algorithms seems like a bug to me...

I agree with this. Though it's not clear to me what "algorithms" mean here exactly.

I would say "APIs" rather than "algorithms".

One tricky scenario is when one API/algorithm calls another. For example API/algorithm A might be implementing part of its functionality by calling API/algorithm B.

That means that while it's "the very entry point" of B, it's in the middle of A. Which per bz's comment above would be a bad idea (which I agree with).

So does that mean that A shouldn't be calling B? And that we should break out B's implementation into some function that can be called by both A and B? That way B can do its subject principal check before calling that function. 

This is why I think explicitly passing something representing the subject principal is a good idea, rather than grabbing it from a global. At least we can see in the code that some function that's about to use the subject principal is called. And we can either make sure that the correct subject principal is used, or that the code needs to be refactored so that subject principal checks only do happen at "the very entry point".
> One tricky scenario is when one API/algorithm calls another.

I would argue this sort of thing is a bug, unless a spec explicitly says to do that.

> And that we should break out B's implementation into some function that can be called by
> both A and B?

If we're just using B for some sort of internal stuff when called from A, yes.
I guess posting try pushes is probably pretty annoying given the encrypted bugmail, so I'll stop posting them until I get one that's green.
Blocks: 1150927
Group: core-security → dom-core-security
I can't believe we do this.
Attachment #8665136 - Flags: review?(bzbarsky)
For the curious, I was inspired to finish this up by bug 1201747, which this would have caught.
Comment on attachment 8665136 [details] [diff] [review]
Part 1 - Don't examine the subject principal in CheckSameOrigin. v1

Hmm.  Looks like nowadays the only caller other than XUL stuff is treewalker.

I guess we might not have extensions or chrome changing which document a treewalker is walking.. but watch out for regressions.  :(

r=me
Attachment #8665136 - Flags: review?(bzbarsky) → review+
Comment on attachment 8665137 [details] [diff] [review]
Part 2 - Introduce a transitional legacy API that works like things used to. v1

>+  // (whose access needed to be checked) and internal C++ platform code (whose

s/and internal/or internal/

r=nicecomments
Attachment #8665137 - Flags: review?(bzbarsky) → review+
Comment on attachment 8665138 [details] [diff] [review]
Part 3 - Use the opt-out for various sloppy consumers. v1

r=me, but how do we envision things like DispatchDOMEvent working in the long term?

Also, we can probably remove a bunch of those checks in WindowUtils, in followups.
Attachment #8665138 - Flags: review?(bzbarsky) → review+
Comment on attachment 8665139 [details] [diff] [review]
Part 4 - Crash in SubjectPrincipal if there's no active AutoJSAPI. v1

r=me, but we really should get followups for all that Legacy* stuff.
Attachment #8665139 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky [:bz] from comment #35)
> Comment on attachment 8665138 [details] [diff] [review]
> Part 3 - Use the opt-out for various sloppy consumers. v1
> 
> r=me, but how do we envision things like DispatchDOMEvent working in the
> long term?

Naively, by having a separate API entry point for the scripted case, and doing the security check there. If we need to leave a couple of these in for a while, I don't think it costs us very much.

> Also, we can probably remove a bunch of those checks in WindowUtils, in
> followups.

Yeah, though I'd kinda rather wait until we turn off XPConnect for the web.
Assignee: nobody → bobbyholley
Sounds like a plan on both counts.
I filed bug 1208164 for the followups.
Comment on attachment 8665137 [details] [diff] [review]
Part 2 - Introduce a transitional legacy API that works like things used to. v1

Review of attachment 8665137 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsContentUtils.h
@@ +224,5 @@
> +  // To enforce this and catch bugs, nsContentUtils::SubjectPrincipal will crash
> +  // if it is invoked without script on the stack. To land that transition, it
> +  // was necessary to go through and whitelist a bunch of callers that were
> +  // depending on the old behavior. Those callers should be fixed up, and these
> +  // methods should not be used by new code without review from bholley or bz.

Should this be "a Privilege Manager peer" or something similar rather than mention the two of you by name? ;)
(In reply to Andrew McCreight [:mccr8] from comment #40)
> Comment on attachment 8665137 [details] [diff] [review]
> Part 2 - Introduce a transitional legacy API that works like things used to.
> v1
> 
> Review of attachment 8665137 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/base/nsContentUtils.h
> @@ +224,5 @@
> > +  // To enforce this and catch bugs, nsContentUtils::SubjectPrincipal will crash
> > +  // if it is invoked without script on the stack. To land that transition, it
> > +  // was necessary to go through and whitelist a bunch of callers that were
> > +  // depending on the old behavior. Those callers should be fixed up, and these
> > +  // methods should not be used by new code without review from bholley or bz.
> 
> Should this be "a Privilege Manager peer" or something similar rather than
> mention the two of you by name? ;)

Maybe, but I think in practice this is probably easier for people to figure out who to actually flag than a hypothetical module made up of the people who are paying attention to this issue. ;-)

I'm hoping that this will be a relatively temporary setup, so hopefully it won't be too long-lived.
(In reply to Bobby Holley (:bholley) from comment #41)
> I'm hoping that this will be a relatively temporary setup, so hopefully it
> won't be too long-lived.

Oh, good point. I wasn't think about that aspect of it.
Depends on: 1208769
nsContentUtils::IsCallerChrome() is now rather broken, I guess on purpose, but this bug changed its meaning from what it has meant for years, which has obviously lead to tons of MOZ_CRASH crashes.
I'm about to back https://hg.mozilla.org/mozilla-central/rev/f23c63830ae5 out so that Nightlies will be useable again.
The crashes are rather obvious, so I don't think the MOZ_CRASH is needed there atm.
https://hg.mozilla.org/integration/mozilla-inbound/rev/8bfcfc79f9c4
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1208680
(In reply to Olli Pettay [:smaug] from comment #45)
> nsContentUtils::IsCallerChrome() is now rather broken, I guess on purpose,
> but this bug changed its meaning from what it has meant for years, which has
> obviously lead to tons of MOZ_CRASH crashes.

Yes, that's how we find and annotate the callsites that need to be fixed.

It would have been more helpful to land/merge the crash fixes in the bugs (bug 1208517, bug 1208622, and bug 1208815), but oh well.

When those bugs hit central, I'll repush this patch.
Flags: needinfo?(bobbyholley)
(In reply to Bobby Holley (:bholley) from comment #48)
> It would have been more helpful to land/merge the crash fixes in the bugs
> (bug 1208517, bug 1208622, and bug 1208815), but oh well.
Sure, but I just wanted to get the Nightlies non-crash-y asap and didn't have time to go through
all the IsCallerChrome() callers.
Depends on: 1208758
https://hg.mozilla.org/mozilla-central/rev/8bfcfc79f9c4
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Merging a backout shouldn't re-close the bug... ;)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Depends on: 1210240
Depends on: 1210293
Depends on: 1210294
No longer depends on: 1072144
Flags: needinfo?(bobbyholley)
Flags: needinfo?(bobbyholley)
Flags: needinfo?(bobbyholley)
Flags: needinfo?(bobbyholley)
Depends on: 1212658
With the various fixes landed, Telemetry shows this case being hit in .1% of active runs. I think that's good enough to re-land. I'll  watch crash-stacks and fix any stragglers.
Flags: needinfo?(bobbyholley)
Note that I left the telemetry in and made this crash |#ifndef RELEASE_BUILD|, so that we can avoid any last-minute panics on release-day. I'll let the Telemetry ride one train ahead of the crash.
https://hg.mozilla.org/mozilla-central/rev/429258d9a178
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Depends on: 1214295
Depends on: 1214488
Depends on: 1214491
Depends on: 1214495
Group: dom-core-security → core-security-release
Depends on: 1215398
Depends on: 1216072
Depends on: 1216308
Depends on: 1217614
Depends on: 1219253
Depends on: 1222280
Depends on: 1235411
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main44-]
Group: core-security-release
Depends on: 1328054
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.