Closed Bug 1530854 Opened 5 years ago Closed 5 years ago

Assertion failure: principalCSPCount == argCSPCount (Different PolicyCount for CSP as arg and Principal)

Categories

(Core :: DOM: Security, defect, P1)

67 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox-esr60 --- unaffected
firefox65 --- unaffected
firefox66 --- unaffected
firefox67 --- fixed

People

(Reporter: bc, Assigned: ckerschb)

References

()

Details

(Keywords: assertion, regression, Whiteboard: [3/6] patch waiting for review [domsecurity-active])

Attachments

(2 files)

  1. https://www.balloon-juice.com/

Nightly Debug Windows 10 x86_64.

Assertion failure: principalCSPCount == argCSPCount (Different PolicyCount for CSP as arg and Principal), at z:/build/build/src/docshell/base/nsDocShell.cpp:9969
#01: nsDocShell::LoadHistoryEntry(nsISHEntry *,unsigned int) [docshell/base/nsDocShell.cpp:11619]
#02: nsDocShell::Reload(unsigned int) [docshell/base/nsDocShell.cpp:0]
#03: mozilla::dom::Location::Reload(bool) [dom/base/Location.cpp:792]
#04: static bool mozilla::dom::Location_Binding::reload(struct JSContext *, class JS::Handle<JSObject *>, class mozilla::dom::Location *, const class JSJitMethodCallArgs & const) [s3:gecko-generated-sources:da4b97d94bbf6ba8265263f2034409d5c309fe3abc95da189b68a365b6b5e7c6d1b2465eadb51b8ef0ddd82603fd7393e0527c2ba2e94779d63c3b4f5d5f67f8/dom/bindings/LocationBinding.cpp::1136]
#05: mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy,mozilla::dom::binding_detail::ThrowExceptions>(JSContext *,unsigned int,JS::Value *) [dom/bindings/BindingUtils.cpp:3146]
#06: CallJSNative(JSContext *,bool (*)(JSContext *,unsigned int,JS::Value *),JS::CallArgs const &) [js/src/vm/Interpreter.cpp:440]
#07: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:532]
#08: static bool InternalCall(struct JSContext *, const class js::AnyInvokeArgs & const) [js/src/vm/Interpreter.cpp:587]
#09: static bool Interpret(struct JSContext *, class js::RunState & const) [js/src/vm/Interpreter.cpp:3051]
#10: js::RunScript(JSContext *,js::RunState &) [js/src/vm/Interpreter.cpp:420]
#11: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:560]
#12: static bool InternalCall(struct JSContext *, const class js::AnyInvokeArgs & const) [js/src/vm/Interpreter.cpp:587]
#13: js::Call(JSContext *,JS::Handle<JS::Value>,JS::Handle<JS::Value>,js::AnyInvokeArgs const &,JS::MutableHandle<JS::Value>) [js/src/vm/Interpreter.cpp:603]
#14: js::fun_apply(JSContext *,unsigned int,JS::Value *) [js/src/vm/JSFunction.cpp:1219]
#15: CallJSNative(JSContext *,bool (*)(JSContext *,unsigned int,JS::Value *),JS::CallArgs const &) [js/src/vm/Interpreter.cpp:440]
#16: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:532]
#17: static bool InternalCall(struct JSContext *, const class js::AnyInvokeArgs & const) [js/src/vm/Interpreter.cpp:587]
#18: static bool js::jit::DoCallFallback(struct JSContext *, class js::jit::BaselineFrame *, class js::jit::ICCall_Fallback *, unsigned int, union JS::Value *, class JS::MutableHandle<JS::Value>) [js/src/jit/BaselineIC.cpp:0]
#19: ??? (???:???)

Manually reproduced on Windows. Bughunter also saw this on Ubuntu 18.10 for asan-debug.

Assertion added in bug 1518454

Blocks: 1518454
Keywords: regression

There are known follow ups to Bug 1518454, e.g. see Bug 1524970 - this doesn't need to be hidden.

I can reproduce on my linux machine - taking this bug.

Assignee: nobody → ckerschb
Group: core-security
Status: NEW → ASSIGNED
Priority: -- → P1
See Also: → 1524970
Whiteboard: [domsecurity-active]

It took me quite some time to figure out what's going on within this bug. So here is the deal:

At some point when loading https://www.balloon-juice.com/ we create a history entry [1] for https://videos.culturalconvergencemedia.com/ads/BalloonsJuice/balloonsjuice/balloonsjuice.html. At this point the explicit CSP on the history entry is null and the CSP on the Principal within the history entry is also null.

At some point the script https://cdns.connatix.com/p/1437/min/connatix.renderer.infeed.min_dc.js loads which delivers a meta csp of 'upgrade-insecure-requests' which sets a CSP on that same Principal we store in the history entry. At this point the explicit CSP stored in the history entry and the CSP stored on the Principal within the history entry are out of sync.

We then load that history entry and 'boom' the assertion fires.

Boris, what do you think we should do? Should we create a clone of the Principal at the time the history entry is created?

[1] https://searchfox.org/mozilla-central/source/docshell/shistory/nsSHEntry.cpp#383

Flags: needinfo?(bzbarsky)

Creating a clone of the principal is no good; for example there's no way to clone nullprincipals right now.

What we should probably do is update the CSP in the current history entry (or all the history entries for the current document?) at the point when we update the CSP of the document. Or alternately store a mutable CSP by reference in the history entry and in the document and mutate it when we have new directives or something.

Flags: needinfo?(bzbarsky)

Hey Nika, if I have a page A that creates FrameA, then we first create a nsISHEntry entry for A, then a an nsISHEntry entry for FrameA and then we call ::AddChild() so that A holds a link to it's child FrameA - so far so good.

At a later point in time, I query mOSHE within nsDocshell and would like to iterate it's childre, because I need to find the history entry for FrameA, but at this point mOSHE does not hold any more children. So it seems the links for A's children get removed. I use |int32_t childCount = mOSHE->GetChildCount();| to query the children, is this not what am I supposed to do? Put differently, is there a better way of quering the history entry for FrameA?

Finally, it's worth mentioning that Location::Reload() is called, using the history Entry for FrameA. I am quite confused on how our history entries work - can you shed some light for me please?

Flags: needinfo?(nika)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #4)

Creating a clone of the principal is no good; for example there's no way to clone nullprincipals right now.

FWIW, NullPrincipals do not rely on pointer comparisons anymore, which allows to compare NullPrincipals even when (de)serialized. Changes landed within Bug 1346759.

(In reply to Christoph Kerschbaumer [:ckerschb] from comment #5)

Hey Nika, if I have a page A that creates FrameA, then we first create a nsISHEntry entry for A, then a an nsISHEntry entry for FrameA and then we call ::AddChild() so that A holds a link to it's child FrameA - so far so good.

Nevermind actually, I spend way too much time debugging - meanwhile I came up with a better solution anyway.

Flags: needinfo?(nika)
Whiteboard: [domsecurity-active] → [3/6] patch waiting for review [domsecurity-active]

Boris, unfortunately there was an oversight in the patch you already accepted. Since we now always call Principal->EnsureCSP() the event target for violation events which gets set in SetRequestContext [1] might be the wrong document. In more detail, imagine a document with no CSP (we call EnsureCSP() and set the violation target to that document). That document then creates a data: URI iframe (which inherits the CSP) which then creates a meta CSP. Now CSP violations get dispatched to the including document, not the data: URI iframe. The web-platform tests [2,3] exhibit that problem [try run A].

Then, I wrote a patch where we always set the requesting context within EnsureCSP() (see attached patch) to make sure the Request Context is always the data: URI iframe, in case of our example. But then I realized, that's also not quite right, as the number of wpt tests I had to update in the patch demonstrate [see also try run B].

Now, I think the including context and the data: URI iframe should not share a reference of the CSP. I think what we really should do is create a clone() of the CSP within InheritAndSetCSPOnPrincipalIfNeeded() [4]. To be honest, it's surprising this hasn't caused more problems. Imagine a document with a CSP loading a data: URI iframe - I think if the data: URI iframe adds a meta-csp of default-src 'none' it would block all further loads in the parent document.

I just realized that now, otherwise I would have written that patch in the first place but now it's too late for today. The Clone approach should also be reasonably small in case we need to uplift.

Does that sound reasonable/acceptable?

[1] https://searchfox.org/mozilla-central/source/dom/security/nsCSPContext.cpp#678
[2] /content-security-policy/frame-src/frame-src-self-unique-origin.html
[3] content-security-policy/img-src/img-src-self-unique-origin.html
[4] https://searchfox.org/mozilla-central/source/caps/nsScriptSecurityManager.cpp#247

[try run A] https://treeherder.mozilla.org/#/jobs?repo=try&revision=02537f68eed030ced3ca1f7ef8a4d269e90437b1
[try run B] https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ecaeccdffb661a0d21e80335c32845f63b58350

Flags: needinfo?(bzbarsky)

That document then creates a data: URI iframe (which inherits the CSP) which then creates a meta CSP.

OK, so if the original document did have a CSP this would already be wrong, right?

I think the including context and the data: URI iframe should not share a reference of the CSP

Yes, of course. That's been true all along. I thought we had bugs tracking it, but can't find them right now. It's definitely been brought up as a spec issue, and the spec is very clear about it: https://w3c.github.io/webappsec-csp/#initialize-document-csp says you make a copy of all the policies and create a new CSP from that. Fixing that is one of the major motivations for moving CSP out of principals, no? Because the principal should in fact be shared in various cases (not data:, but some others) where the CSP should not be shared.

Flags: needinfo?(bzbarsky)

FWIW, I just hit this fatal assertion in a debug build, for a tab that was at this URL:
https://calendar.google.com/calendar/b/1/r/week

(This was for my MoCo calendar account.)

In my case, principalCSPCount==1 and argCSPCount==0, and the "argsCSP" pointer is null.

(In reply to Daniel Holbert [:dholbert] from comment #11)

FWIW, I just hit this fatal assertion in a debug build, for a tab that was at this URL:
https://calendar.google.com/calendar/b/1/r/week
In my case, principalCSPCount==1 and argCSPCount==0, and the "argsCSP" pointer is null.

I couldn't reproduce this very one, but it's very likely that we are facing the same problem with history entries as described within comment 0. I am still working out the best way to fix the problem. Once landed, I would ask you to re-test and potentially file a new bug. But again, it's very likely the same problem.

Boris, I uploaded a new version of the patch |https://phabricator.services.mozilla.com/D21919| I think phab will not flag you for review since you already accepted the revision. As discussed, I am now creating a 'clone' of the CSP for data: URIs. So the only change to the patch you already accepted is within nsScriptSecurityManager.cpp.

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #10)

That document then creates a data: URI iframe (which inherits the CSP) which then creates a meta CSP.

OK, so if the original document did have a CSP this would already be wrong, right?

Correct. That would trigger a problem.

I think the including context and the data: URI iframe should not share a reference of the CSP

Yes, of course. That's been true all along. I thought we had bugs tracking it, but can't find them right now. It's definitely been brought up as a spec issue, and the spec is very clear about it: https://w3c.github.io/webappsec-csp/#initialize-document-csp says you make a copy of all the policies and create a new CSP from that. Fixing that is one of the major motivations for moving CSP out of principals, no? Because the principal should in fact be shared in various cases (not data:, but some others) where the CSP should not be shared.

Yes, that is one of the main motivations, and I am really looking forward to once we have moved the CSP into the client. For now, I tried to create a 'clone' of the CSP with the minimal code changes necessary since we have to uplift that change as well. I know it's not the most elegant solution, but at least it works.

Flags: needinfo?(bzbarsky)

Right, but why are data: URIs special? You get the same issues with srcdoc and whatnot too, no?

That is, why are we suddenly trying to solve this longstanding problem now, as part of this change? Do we have specific data: URI tests that ended up failing or something?

Flags: needinfo?(bzbarsky) → needinfo?(ckerschb)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #15)

Right, but why are data: URIs special? You get the same issues with srcdoc and whatnot too, no?

That is, why are we suddenly trying to solve this longstanding problem now, as part of this change? Do we have specific data: URI tests that ended up failing or something?

You are correct, we face the same problem with all schemes that inherit the CSP. It seems that all wpt-tests which use a meta CSP are based on data: URI, hence those are the only ones that are failing.

Whatever we do in that space seems fragile. I am not happy with things stand right now. Nevertheless this seems the best option to resolve the problem in comment 0. Alternatively we could do nothing to fix that particular bug. As things stand right now it's only the assertion that fires, but the underlying code still uses the CSP from the Principal - so no harm done.

We could then worry about such problems after Bug 965637 which at least would take care of the inheritance problem.

Flags: needinfo?(ckerschb) → needinfo?(bzbarsky)

If the assertion is not causing a problem in practice, I would focus on bug 965637, which would allow us to fix this correctly.

That said, I'm also ok with checking in the bandaid, since we have it written already. I'll take a look at the patch soon.

Flags: needinfo?(bzbarsky)

I did 'Preview Landing' in Lando and it showed me 'Landing Queued' but the patch never landed - until I figure out what the problem was I am requesting 'checkin-needed' for this one.

Keywords: checkin-needed
Attachment #9048156 - Attachment description: Summary: Bug 1530854: Always create CSP on Principal so the explicit CSP in the nsISHEntry holds a reference to the potentially dynamically modified CSP in case of a meta CSP. r=bz → Bug 1530854: Always create CSP on Principal so the explicit CSP in the nsISHEntry holds a reference to the potentially dynamically modified CSP in case of a meta CSP. r=bz

You can reload the lando page to see current status. https://lando.services.mozilla.com/D21919/ says that the first line of the commit message didn't contain a bug number, which it did not; it just had "r=bzbarsky" as the first line. Removing the bogus "Summary:" from the front of the mercurial revision title seems to fix that problem; you should just try relanding via lando.

Flags: needinfo?(ckerschb)

Pushed by mozilla@christophkerschbaumer.com:
https://hg.mozilla.org/integration/autoland/rev/30b70cd83063
Always create CSP on Principal so the explicit CSP in the nsISHEntry holds a reference to the potentially dynamically modified CSP in case of a meta CSP. r=bzbarsky

Keywords: checkin-needed

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #19)

You can reload the lando page to see current status. https://lando.services.mozilla.com/D21919/ says that the first line of the commit message didn't contain a bug number, which it did not; it just had "r=bzbarsky" as the first line. Removing the bogus "Summary:" from the front of the mercurial revision title seems to fix that problem; you should just try relanding via lando.

TIL - I guess it works now. Thanks for your help Boris!

Flags: needinfo?(ckerschb)
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: