Closed Bug 737064 Opened 8 years ago Closed 8 years ago

sync CSP source-expression parsing with w3c spec

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla17

People

(Reporter: geekboy, Assigned: mmoutenot)

References

(Blocks 2 open bugs, )

Details

Attachments

(1 file, 13 obsolete files)

The parsing of source expressions in CSPUtils.jsm is out of sync with the W3C's spec draft that suggests a procedure for matching and alludes to how to parse them.  We need to update the CSP parser to properly match and parse source-lists and source-expressions.

http://dvcs.w3.org/hg/content-security-policy/raw-file/tip/csp-specification.dev.html#source-list
Blocks: csp-w3c-1.0
Just getting a revision up here. Not by any means finalized, but I would love to get feedback. Am I even doing this correctly?
Assignee: nobody → mmoutenot
Attached patch Patch 2 (obsolete) — Splinter Review
Fixed a lot of parsing problems and corrected a few more tests.
Attachment #634966 - Attachment is obsolete: true
Attachment #635149 - Flags: feedback?(sstamm)
Sid, do you mind taking a look at the tests I am unsure about?

As a side note: please ignore the dumps. The patch is in an early stage =)
From spec: "If the source expression does not contain a port and port is not the default port for scheme, then return does not match."

Need to change the source parsing to assign default port for scheme if no port explicit.
Attached patch Third iteration for patch (obsolete) — Splinter Review
Removed dumps and some extraneous comments. Fixed some problems in the permits that were due to the spec changed and got the default ports for schemes without nasty enumeration.
Attachment #635149 - Attachment is obsolete: true
Attachment #635149 - Flags: feedback?(sstamm)
Attachment #635828 - Flags: feedback?(sstamm)
Attached patch Patch 4 (obsolete) — Splinter Review
Attachment #635828 - Attachment is obsolete: true
Attachment #635828 - Flags: feedback?(sstamm)
Attachment #636899 - Flags: feedback?(sstamm)
Comment on attachment 636899 [details] [diff] [review]
Patch 4

Review of attachment 636899 [details] [diff] [review]:
-----------------------------------------------------------------

This is starting to look really good, Marshall!

::: content/base/src/CSPUtils.jsm
@@ +585,5 @@
>    //    <host-dir-value> ::= <source-list>
>    //                       | "'none'"
>    //    <source-list>    ::= <source>
>    //                       | <source-list>" "<source>
> +  // These regexps represent the concrete syntax on the w3 spec

Please put the ABNF syntax in a comment here so that it's easy to match up to the spec and to match up to the below syntax.

@@ +617,5 @@
>    }
>  
>    var tokens = aStr.split(/\s+/);
>    for (var i in tokens) {
> +    if (sourceExp.test(tokens[i])){

How about checking the opposite?  If it doesn't match, warn and skip.  That helps avoid a little more nested blocks.

@@ +622,1 @@
>      var src = CSPSource.create(tokens[i], self, enforceSelfChecks);

Check indentation unless you change the if statement above.  Should be using two spaces (not tabs) per indentation.

@@ +623,5 @@
>      if (!src) {
>        CSPWarning("Failed to parse unrecoginzied source " + tokens[i]);
>        continue;
>      }
> +      // if a source is a *, then we can permit all sources

What does the spec say about what to do if there's a * in the source list and other source expressions too?  We should make sure we are supposed to process the other tokens.  (Please add a test for this, or verify one of our tests already checks this.)

@@ +931,5 @@
>   * @returns
>   *        an instance of CSPSource 
>   */
> +
> +

Please remove these extra newline/whitespace changes.

@@ +973,5 @@
> +  var extHostSrc = new RegExp ("^" + hostSrc.source + "\\/[:print:]+$", 'i');
> +  var keywordSrc = new RegExp ("^('self'|'unsafe-inline'|'unsafe-eval')$", 'i');
> +  var sourceExp  = new RegExp (schemeSrc.source + "|" +
> +                                 hostSrc.source + "|" +
> +                              keywordSrc.source,  'i');

If you will be defining these twice, perhaps it makes sense to make them constants or define them somewhere else?

@@ +997,5 @@
> +    if (!hostMatch){
> +      CSPError("Couldn't parse the host of invalid source " + aStr);
> +      return null;
> +    }
> +    sObj._host = CSPHost.fromString(hostMatch[0]);

What happens if the regex matches multiple instances?  Will it?  Please add a test that makes sure "http://foo.com:bar.com:23" doesn't parse.

@@ +1002,5 @@
> +    var portMatch = port.exec(aStr);
> +    if (!portMatch) {
> +      // gets the default port for the given scheme
> +      defPort = Services.io.getProtocolHandler(sObj._scheme).defaultPort;
> +      if (!defPort) {

Ports don't make sense for all schemes (data:, jar:, etc). For those without a default port, we should just assume port doesn't make sense (and no port should be used).  Please add a test (or verify we have one) to ensure "data:" is a valid source.

@@ +1094,5 @@
>    toString:
>    function() {
>      if (this._isSelf) 
> +      return "'self'";
> +      // return this._self.toString();

I'm torn between whether we want 'self' here or the actual value... lets stick with 'self' here as long as our unit tests pass, but if the spec says anything about it, we should follow the spec.

@@ +1152,5 @@
>      // aSource's host.
>      if (this.host && !this.host.permits(aSource.host))
>        return false;
>  
> +

Please remove extra newline.

@@ +1288,5 @@
>  CSPHost.fromString = function(aStr) {
>    if (!aStr) return null;
>  
>    // host string must be LDH with dots and stars.
> +  // TODO:[Marshall] I don't think we need to check this via previous regex'ing

The unit tests test this thing as a whole.  If we can guarantee that all callers will pass in valid strings, we can skip it (but document that requirement in the comments here).  Otherwise, just change the regex check below to use your regex from above.

::: content/base/test/unit/test_bug558431.js
@@ +68,2 @@
>        let cspr = CSPRep.fromString(cspr_str, mkuri(DOCUMENT_URI));
> +      dump("done creating CSPRep\n");

You'll have to remove all the dumps before landing.

::: content/base/test/unit/test_csputils.js
@@ +207,5 @@
>        //"src should inherit scheme 'https'"
> +
> +      // [TODO:MARSHALL] should this be permitted? seems like the below url has *
> +      // ports permitted, while the 'self' only has one port (443)
> +      //do_check_true(src.permits("https://foobar.com"));

I think you're right.  Please fix this test!

@@ +299,5 @@
>        // policy a /\ policy b intersects policies, not context (where 'self'
>        // values come into play)
>        var nullSourceList = CSPSourceList.fromString("'none'");
> +      //need to add a self to give port and scheme
> +      var simpleSourceList = CSPSourceList.fromString("a.com", "http://google.com");

Please just change to "http://a.com" instead of providing a self argument.  We should have other tests that check intersection when using self... if we don't, we should write some.
Attachment #636899 - Flags: feedback?(sstamm)
Attached patch Patch 5 (obsolete) — Splinter Review
Changes follow Sid's feedback. Thanks!
Attachment #636899 - Attachment is obsolete: true
Attachment #638875 - Flags: feedback?(sstamm)
Have we considered just using nsIURI to parse the source expressions? If/when we port this to C++, porting all the regexes might not be fun.

Also, the regex doesn't seem to allow for IDN (puny-encoded or otherwise). Am I missing something in the regex?
Dev: this bug addresses moving our implementation into compliance with the spec.  Reading the spec, you can see it is very specific about composition of hostnames and because the syntax is not the same as a URI (not compatible due to optional presence and wildcarding of host and scheme), we cannot use nsIURI parser for hosts.

If you don't agree with the spec's coverage of IDN or parsing of sources, you should bring it up in the working group, not here.  This bug is intended to comply with and not make changes to that spec.

If we do decide to port the parser to C++, we will use the usual ABNF parsing techniques (not regex) based on the ABNF in the spec and I don't think it'll be that hard to write.
(In reply to Sid Stamm [:geekboy] from comment #10)
> we cannot use nsIURI parser for hosts.

And of course by this I meant we cannot use the nsIURI parser for *CSP Sources*.
Attached patch Patch 6 (obsolete) — Splinter Review
Not sure if this is the ideal way to have the spec in the code, but I think having it along side of the regex is really readable and clear.
Attachment #638875 - Attachment is obsolete: true
Attachment #638875 - Flags: feedback?(sstamm)
Attachment #639395 - Flags: feedback?(sstamm)
Comment on attachment 639395 [details] [diff] [review]
Patch 6

Review of attachment 639395 [details] [diff] [review]:
-----------------------------------------------------------------

Almost there!

A few of my comments from last time still apply to this new patch.  Please address them (or tell me I'm doing it wrong).  ;)

One thing I am curious about from your last patch, but you removed the comment so it's not in this patch:

::: content/base/test/unit/test_csputils.js
@@ +207,5 @@
>        //"src should inherit scheme 'https'"
> +
> +      // [TODO:MARSHALL] should this be permitted? seems like the below url has *
> +      // ports permitted, while the 'self' only has one port (443)
> +      //do_check_true(src.permits("https://foobar.com"));

I think you're right.  Did you fix the test?

::: content/base/src/CSPUtils.jsm
@@ -559,5 @@
>   */
>  function CSPSourceList() {
>    this._sources = [];
> -  this._permitAllSources = false;
> -

If you remove this, please also remove it from CSPSourceList.fromString and then change permits() in the sources and source list to be sure any "*" in the list will be honored.

What does the spec say about what to do if there's a * in the source list and other source expressions too?  We should make sure we are supposed to process the other tokens.

@@ +645,5 @@
>        continue;
>      }
> +    // if a source is a *, then we can permit all sources
> +    if (src.permitAll){
> +      slObj._permitAllSources = true;

You deleted _permitAllSources from the constructor... is it needed here (or there, or both)?

@@ +651,2 @@
>      slObj._sources.push(src);
>    }

Nit: Indentation is messy here.  Please fix it.

@@ +986,5 @@
>  
> +  // check for host-source or ext-host-source match
> +  if (R_HOSTSRC.test(aStr) || R_EXTHOSTSRC.test(aStr)){
> +    var schemeMatch = R_GETSCHEME.exec(aStr);
> +    if (!schemeMatch) sObj._scheme = self.scheme;

Please fix indentation.

@@ +996,5 @@
> +    if (!hostMatch){
> +      CSPError("Couldn't parse the host of invalid source " + aStr);
> +      return null;
> +    }
> +    sObj._host = CSPHost.fromString(hostMatch[0]);

What happens if the regex matches multiple instances?  Will it?  Please add a test that makes sure "http://foo.com:bar.com:23" doesn't parse.

@@ +1003,5 @@
> +      // gets the default port for the given scheme
> +      defPort = Services.io.getProtocolHandler(sObj._scheme).defaultPort;
> +      if (!defPort) {
> +        CSPError("Invalid scheme given for source " + aStr);
> +        return null;

Ports don't make sense for all schemes (data:, jar:, etc). For those without a default port, we should just assume port doesn't make sense (and no port should be used).  Please add a test (or verify we have one) to ensure "data:" is a valid source.

@@ +1020,5 @@
>      if (!self) {
>        CSPError("self keyword used, but no self data specified");
>        return null;
>      }
> +    // host and port as the resource's URI

What's this comment mean?

::: content/base/test/unit/test_csputils.js
@@ +295,5 @@
>        //"* does not permit a long host with no port"
>        do_check_true( allSourceList.permits("http://a.b.c.d.e.f.g.h.i.j.k.l.x.com"));
>  
> +      //* short circuts parsing
> +      do_check_true(allAndMoreSourceList.permits("http://a.com"));

What happens if * appears after 'none'?

@@ +305,5 @@
>        // policy a /\ policy b intersects policies, not context (where 'self'
>        // values come into play)
>        var nullSourceList = CSPSourceList.fromString("'none'");
> +      //need to add a self to give port and scheme
> +      var simpleSourceList = CSPSourceList.fromString("http://a.com");

You're not adding self anymore, just hard-coding the scheme.  Maybe delete the comment.
Attachment #639395 - Flags: feedback?(sstamm)
>> One thing I am curious about from your last...

After looking at it again, I actually think that it is fine. Since 443 is the default port for https (right?) http://foobar.com should automatically get the port 443.

I guess I'm not totally up on the permits code. Does it have to be extra explicit?
agh *https://foobar.com* in the example above.
You're right, Marshall.  The thing being tested on line 208 is a URL and not a source, so omitting the port suggests we should use the default port (443).  You can probably leave that test alone.  Thanks for thinking on it.
>>What happens if * appears after 'none'?

In the source list: "'none' *" 'none' is not a valid item unless it is standalone, so it will be ignored and * will be honored.
Attached patch Patch 7 (obsolete) — Splinter Review
Attachment #639395 - Attachment is obsolete: true
Attachment #639818 - Flags: feedback?(sstamm)
Attached patch Patch 7 (correct one I hope) (obsolete) — Splinter Review
Attachment #639818 - Attachment is obsolete: true
Attachment #639818 - Flags: feedback?(sstamm)
Attachment #640693 - Flags: feedback?(sstamm)
Comment on attachment 640693 [details] [diff] [review]
Patch 7 (correct one I hope)

Review of attachment 640693 [details] [diff] [review]:
-----------------------------------------------------------------

This is great.  Please rebase your patch (see comments below), then flag me for a quick review.  After you rebase the patch, we can push it to try and make sure it's safe before landing.

::: content/base/src/CSPUtils.jsm
@@ +11,5 @@
>   */
>  
> +// Need for scheme's default port
> +Components.utils.import("resource://gre/modules/Services.jsm");
> +

Please use "Cu" as a shortcut for Components.utils.  In mozilla-central, the current revision also includes another jsm file that adds that as a global.

@@ +637,5 @@
>  
>    var tokens = aStr.split(/\s+/);
>    for (var i in tokens) {
> +    if (!R_SOURCEEXP.test(tokens[i])){
> +      CSPWarning("Invalid source expression " + tokens[i]);

The string warnings have been made localizable on trunk, which means this looks different.  Please rebase your patch to use the CSPLocalizer introduced by the fix for bug 766569.

@@ +655,1 @@
>      slObj._sources.push(src);

Nit: indent and spacing is funny here.  If statements should be formatted like this:

if (foo) {
  bar;
} else {
  junk;
}

Please update your if statements accordingly.

@@ +953,5 @@
>   * @returns
>   *        an instance of CSPSource 
>   */
> +
> +

Please don't add these extra newlines.

@@ +1036,2 @@
>        CSPError("Couldn't parse invalid source " + aStr);
>        return null;

Check indentation on these lines.
Attachment #640693 - Flags: feedback?(sstamm) → feedback+
Attached patch Rebase (8) (obsolete) — Splinter Review
Attachment #640693 - Attachment is obsolete: true
Attachment #640815 - Attachment is patch: true
Attached patch Patch for Review (obsolete) — Splinter Review
I could have messed something up manually editing out the whitespace changes... Let me know!
Attachment #640815 - Attachment is obsolete: true
Attachment #641204 - Flags: review?(sstamm)
Attached patch Whoops! Patch for review round 2 (obsolete) — Splinter Review
Patch got all messed up. :(
Attachment #641204 - Attachment is obsolete: true
Attachment #641204 - Flags: review?(sstamm)
Attachment #643125 - Flags: review?(sstamm)
Attachment #643125 - Attachment is patch: true
Comment on attachment 643125 [details] [diff] [review]
Whoops! Patch for review round 2

Review of attachment 643125 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  Please fix my nits, then run it through try first.

This time I get nit-picky.  :)

::: content/base/src/CSPUtils.jsm
@@ +1006,5 @@
> +      sObj._scheme = schemeMatch[0];
> +    }
> +
> +    var hostMatch = R_HOST.exec(aStr);
> +    if (!hostMatch){

Nit: space between ) and {

@@ +1007,5 @@
> +    }
> +
> +    var hostMatch = R_HOST.exec(aStr);
> +    if (!hostMatch){
> +      CSPError(CSPLocalizer.getFormatStr("couldntParseInvalidSource",[aStr]));

Nit: space between , and [

@@ +1016,5 @@
> +    if (!portMatch) {
> +      // gets the default port for the given scheme
> +      defPort = Services.io.getProtocolHandler(sObj._scheme).defaultPort;
> +      if (!defPort) {
> +        CSPError(CSPLocalizer.getFormatStr("couldntParseInvalidSource",[aStr]));

Same as previous nit.
Attachment #643125 - Flags: review?(sstamm) → review+
Attached patch Final with Nits (obsolete) — Splinter Review
Needs to be added to try.
Attachment #643125 - Attachment is obsolete: true
Keywords: checkin-needed
Keywords: checkin-needed
Attached patch final patchSplinter Review
Carrying over Sid's r+

Updated patch with author name and final commit message

Pushed to try : https://tbpl.mozilla.org/?tree=Try&rev=45cbf50d6df5

jst says he's ok with us setting checkin-needed on this, so if try looks good, we can go ahead and do that.
Attachment #651831 - Attachment is obsolete: true
Attachment #652265 - Flags: review+
Flags: in-testsuite+
Try looks good.

Pushed to mozilla-inbound: 
https://hg.mozilla.org/integration/mozilla-inbound/rev/d5fab9ef16a4

(Not holding back for the other csp-1.0 changes since the source-expression syntax is identical to our previous implementation and this is just a parser change.)
Target Milestone: --- → mozilla17
https://hg.mozilla.org/mozilla-central/rev/d5fab9ef16a4
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Attachment #652265 - Attachment is obsolete: true
Attachment #653563 - Flags: feedback?(sstamm)
Blocks: 784315
Marshall: can you move your patch (attachment 653563 [details] [diff] [review]) for the single-token host over to bug 784315?
Moved patch to bug 784315
Attachment #652265 - Attachment is obsolete: false
Comment on attachment 653563 [details] [diff] [review]
fixed bug where host returned scheme instead

this patch moved to another bug, so it's not useful here.
Attachment #653563 - Attachment is obsolete: true
Attachment #653563 - Flags: feedback?(sstamm)
Depends on: 779918
Depends on: 832193
Depends on: 832558
You need to log in before you can comment on or make changes to this bug.