Open Bug 612921 Opened 14 years ago Updated 2 years ago

CSP content policy is called twice for script (and some other) loads

Categories

(Core :: DOM: Core & HTML, defect, P5)

x86
All
defect

Tracking

()

People

(Reporter: bsterne, Unassigned)

References

()

Details

Place a breakpoint at CSPService::ShouldLoad and load the URL above and see that we get called once for the top level document and then twice for the script resource that the page embeds.  The same is also true for style-src-test.cgi where we get called once for the top level load and then twice each for the remote stylesheet and the remote script file that the test embeds.

However with frame-src-test.cgi, for example, we get called once for top-level, twice for the script resource, but only once for the subdocument load.  So we're called twice for TYPE_SCRIPT and TYPE_STYLESHEET at a minimum... possibly others.  It would obviously be great to eliminate the duplicate calls as there's potential performance gains to be made.
Those calls are from speculative preloads, no?

And in particular, preloads don't have the full context that real loads do (e.g. the DOM element involved, etc), so the calls are not equivalent.
Brandon, you are observing speculative loading at work. The first call is made when the script/stylesheet/image needs to be loaded speculatively. At that point no real context can be passed in, only the document. If the load way allowed and the resource loaded actually gets used there will be a second call - this time with the actual element as context. Some content policies actually care about the context so this needs to be done. Bug 540642 is where this behavior has been introduces for images, there is an earlier bug for scripts as well somewhere.
Okay, thank you for the explanation.  That makes perfect sense.  This bug is probably INVALID, then.  I'll let dveditz confirm before I resolve it.
isn't the bug valid anyway? now a double shouldLoad makes sense, but we still need to fix CSP::shouldLoad to distinguish between speculative and real loads, so that we can apply the policy only for one of them.
Why would you only apply the policy for one of them?  Does the policy not need to prevent the network being hit?
If i deny the speculative request, the real request is issued anyway, right? Is there a way to deny both of them at the same time when handling the speculative request?
Otherwise we could simply cache the policy check result. I am not sure it's worth it for CSP policies, since they seem to be relatively lightweight. However I am trying to extend CSPs with more expensive policies so I am personally interested in applying them only once.
> If i deny the speculative request, the real request is issued anyway, right?

Yes.  Note that the real request might well be to a different URI than the speculative one in any case (though of course it usually isn't).

> Is there a way to deny both of them at the same time when handling the
> speculative request?

No.
Yeah, the policy needs to be applied to both, but we don't want to duplicate reports for what are logically (not technically) the same violation.

Maybe for CSP an option would be to just not send violation reports for speculative loads.  Is there an easy way to identify for sure if a call to ShouldLoad() is for a speculative load?  Maybe we could just skip the reporting part for those.  Would this approach have any adverse effects (i.e., is there ever a speculative load that causes a real load to not happen, though the resource is used)?
Note that I'm not sure that all resources do two loads. The stacks in bug 614718 show that the CSS service calls into the content policy both for the speculative request and the real request, but I'm not sure that for example the script loader does.

If we stop doing reports for speculative requests, we should make sure that there is a content-policy request for the real load. And add tests for it of course.
> but I'm not sure that for example the script loader does.

It does.

Sid, I don't think there's a way to tell for sure that a load is speculative right now....  We could have an API change to add one, of course.
(In reply to comment #11)
> Sid, I don't think there's a way to tell for sure that a load is speculative
> right now....

A script or stylesheet with a document as context is pretty definitely a speculative load. For images things are ambiguous however, background images also pass in the document as context.

Sid, I would guess that the most reliable solution for CSP would be remembering reports that were already sent for this document - and avoid sending duplicate reports. Only problem is doing that in a way that won't leak memory, since you don't know when the "real" load will happen. But IMHO all other approaches would be making too many assumptions.
> A script or stylesheet with a document as context is pretty definitely a
> speculative load.

A stylesheet load with a document as context can be caused by an HTTP Link header.

I think you're probably right about scripts right this second.  Tomorrow, who knows.

There are no speculative loads after DOMContentLoaded, so once that happens you could clear the duplicate report thing.
(In reply to Boris Zbarsky (:bz) from comment #13)
> There are no speculative loads after DOMContentLoaded, so once that happens
> you could clear the duplicate report thing.

Can it be assumed that DOMContentLoaded will always fire? Gecko seems to do it even if window.stop() is called, Chrome on the other hand doesn't fire it in this scenario. I'm not sure what HTML5 standard says on this but I wouldn't bet on Gecko's behavior never changing.
Sure.  I don't know what the spec says on this either, to be honest.

Again, I would have no problem indicating in the API when we're doing a preload...
Garrett, this is quite relevant to what you're seeing with nonce-source..
The speculative preload problem with CSP becomes a more serious issue beyond performance gains or duplicate reports with the addition of nonce-source to the CSP 1.1 spec (Bug 855326). In this case, we need to compare a nonce set as an attribute in a script tag with a nonce provided in the policy to determine whether to accept or reject a script load. The call to ShouldLoad in the speculative preload can never return the correct result here, since it doesn't have access to the DOM element involved as bz explained in Comment 1 and so cannot access the nonce attribute.

This is an even more serious problem (one that I spent most of yesterday untangling) because caching of CSP results was added to improve performance on B2G (http://mxr.mozilla.org/mozilla-central/source/content/base/src/contentSecurityPolicy.js#507). Due to caching, the first call to ShouldLoad during the speculative preload incorrectly returns false, since it does not have access to the nonce. This incorrect result is then cached, preventing the correct behavior from happening the second time ShouldLoad is called (when the script is actually about to be run).

The best solution is probably to pass information in the context about whether this is a speculative preload, so the Content Policies can decide for themselves what the best course of action to take might be. I am currently hacking this in my script-nonce patch using the approach from Comment 12, and inferring that style or source loads that have an nsIDOMHTMLDocument as context (and not an nsIDOMScriptElement or nsIDOMStyleElement) are speculative preloads. I would welcome suggestions of other, more explicit approaches, as I am concerned about this being overly indirect/brittle.

bz, one question - I would expect that if a load was blocked by a content policy in a speculative preload, the same element would not be loaded again. But as I have noticed in my own recent debugging, this happens at least in some cases. Can you better explain why this would happen, and if so, what the purpose of checking content policies during speculative preloading is?
Flags: needinfo?(bzbarsky)
I covered that in comment 7, no?

Specifically:

1)  Success or failure of the preload has nothing to do with the normal load, really.  The normal load just checks for an existing successful preload, and if there isn't one starts a new load.  We could change that around somehow, but once again the context for the preload and the normal load is different so a content policy could block all preloads (just in case, since it can't tell whether they should be blocked because there is no DOM node to work with) but allow through some normal loads.

2)  The point of checking content policies during preloads is because if the preload succeeds there will be no normal load at all.  So if you want to block something you have to block the preload _and_ the normal load.
Flags: needinfo?(bzbarsky)
So that I understand this stuff better, is the following correct

* We only get two calls into ShouldLoad if the first load is denied.
* If the prescanner-load is allowed to go through, then when we go do the "real" load we simply grab the
  existing load and don't call any content policies.
* The reason the second load happens is because if the prescan security checks fail, no channel is
  created and no preload data is set up. We simply act like if the prescanner hadn't found anything. So
  when the real load is about to happen, we don't find a failed load, we find nothing, and so a new load
  is started, which of course will go through security checks.
* If a preload happens due to a server error (like a 500), we don't drop that preload on the floor.
  Instead when the real load is about to happen it simply grabs the existing, failed, preload and we act
  like the real load failed.

If *all* of the above is correct, then I could think of two possible solutions here:

A) If a preload fails the security checks, create a fake channel which simply acts like a network error,
   then start a preload from that channel. When the real load is about to start, it fails with a network
   error.
B) Wait for the "new necko security hooks" architecture since it will solve this problem.
(In reply to Jonas Sicking (:sicking) from comment #20)
> * We only get two calls into ShouldLoad if the first load is denied.

IMHO it's rather the opposite - if the first load is already denied then the second call no longer happens. We get two calls into ShouldLoad if the first load is *allowed*, the second call providing more context information to decide on.
Wow, looking at the scriptloader it does indeed look like it explicitly calls nsIContentPolicy again, even when we're picking up a already-started preload.

http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsScriptLoader.cpp#616

Seems like we should simply stop doing that?
That would break content policy implementations that need the context to make a qualified decision - like Adblock Plus.
Why do you need the context node rather than just the context window/document? Note that the network request has already started by the time the node is created, so we can't prevent it from going out. Though we can (and do) of course prevent it from getting used.
Actually, the decision itself isn't dependent on the context node - but Adblock Plus still needs to know the node, e.g. to allow the user to block that request in future via context menu.
Could you please describe that in more detail?
In more detail: Adblock Plus associates document nodes with the requests that they triggered. This allows (among other things) to display a "Block image" context menu item when an image is right-clicked - Adblock Plus knows all the parameters of the request associated with this image and can suggest a filter matching them. Adblock Plus can also show a list of all requests for a document and show the nodes corresponding to each of them. As far as Adblock Plus is concerned, for preloaded requests the first request is important to block things but the second is required to correctly establish this association between requests and nodes.

I now remember that there is also a scenario where Adblock Plus might decide to block the preload request but allow the regular request - it is possible to create signature-based exception rules, but in order to read out the signature of a page Adblock Plus needs to access the root element of the document which might not exist yet when the preload request is made. I think that this is the only scenario where Adblock Plus might make different decisions for preload and the "real" request.
https://bugzilla.mozilla.org/show_bug.cgi?id=1472046

Move all DOM bugs that haven’t been updated in more than 3 years and has no one currently assigned to P5.

If you have questions, please contact :mdaly.
Priority: -- → P5
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.