Last Comment Bug 737064 - sync CSP source-expression parsing with w3c spec
: sync CSP source-expression parsing with w3c spec
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: DOM: Core & HTML (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla17
Assigned To: Marshall Moutenot
:
Mentors:
http://dvcs.w3.org/hg/content-securit...
Depends on: 779918 832193 832558
Blocks: CSP csp-w3c-1.0 784315
  Show dependency treegraph
 
Reported: 2012-03-19 09:44 PDT by Sid Stamm [:geekboy or :sstamm]
Modified: 2013-01-18 17:06 PST (History)
4 users (show)
ian.melven: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Regexing the parser to match the spec *exactly* (14.54 KB, patch)
2012-06-20 10:15 PDT, Marshall Moutenot
no flags Details | Diff | Review
Patch 2 (21.32 KB, patch)
2012-06-20 17:39 PDT, Marshall Moutenot
no flags Details | Diff | Review
Third iteration for patch (18.04 KB, patch)
2012-06-22 11:25 PDT, Marshall Moutenot
no flags Details | Diff | Review
Patch 4 (21.31 KB, patch)
2012-06-26 15:18 PDT, Marshall Moutenot
no flags Details | Diff | Review
Patch 5 (14.32 KB, patch)
2012-07-03 14:59 PDT, Marshall Moutenot
no flags Details | Diff | Review
Patch 6 (17.73 KB, patch)
2012-07-05 10:44 PDT, Marshall Moutenot
no flags Details | Diff | Review
Patch 7 (15.61 KB, patch)
2012-07-06 14:58 PDT, Marshall Moutenot
no flags Details | Diff | Review
Patch 7 (correct one I hope) (17.55 KB, patch)
2012-07-10 11:23 PDT, Marshall Moutenot
mozbugs: feedback+
Details | Diff | Review
Rebase (8) (17.14 KB, patch)
2012-07-10 15:16 PDT, Marshall Moutenot
no flags Details | Diff | Review
Patch for Review (20.76 KB, patch)
2012-07-11 14:33 PDT, Marshall Moutenot
no flags Details | Diff | Review
Whoops! Patch for review round 2 (20.94 KB, patch)
2012-07-17 13:46 PDT, Marshall Moutenot
mozbugs: review+
Details | Diff | Review
Final with Nits (20.95 KB, patch)
2012-08-14 11:00 PDT, Marshall Moutenot
no flags Details | Diff | Review
final patch (18.24 KB, patch)
2012-08-15 16:02 PDT, Ian Melven :imelven
ian.melven: review+
Details | Diff | Review
fixed bug where host returned scheme instead (20.39 KB, patch)
2012-08-20 15:49 PDT, Marshall Moutenot
no flags Details | Diff | Review

Description Sid Stamm [:geekboy or :sstamm] 2012-03-19 09:44:26 PDT
The parsing of source expressions in CSPUtils.jsm is out of sync with the W3C's spec draft that suggests a procedure for matching and alludes to how to parse them.  We need to update the CSP parser to properly match and parse source-lists and source-expressions.

http://dvcs.w3.org/hg/content-security-policy/raw-file/tip/csp-specification.dev.html#source-list
Comment 1 Marshall Moutenot 2012-06-20 10:15:58 PDT
Created attachment 634966 [details] [diff] [review]
Regexing the parser to match the spec *exactly*

Just getting a revision up here. Not by any means finalized, but I would love to get feedback. Am I even doing this correctly?
Comment 2 Marshall Moutenot 2012-06-20 17:39:58 PDT
Created attachment 635149 [details] [diff] [review]
Patch 2

Fixed a lot of parsing problems and corrected a few more tests.
Comment 3 Marshall Moutenot 2012-06-20 17:40:56 PDT
Sid, do you mind taking a look at the tests I am unsure about?

As a side note: please ignore the dumps. The patch is in an early stage =)
Comment 4 Marshall Moutenot 2012-06-21 10:59:07 PDT
From spec: "If the source expression does not contain a port and port is not the default port for scheme, then return does not match."

Need to change the source parsing to assign default port for scheme if no port explicit.
Comment 5 Marshall Moutenot 2012-06-22 11:25:18 PDT
Created attachment 635828 [details] [diff] [review]
Third iteration for patch

Removed dumps and some extraneous comments. Fixed some problems in the permits that were due to the spec changed and got the default ports for schemes without nasty enumeration.
Comment 6 Marshall Moutenot 2012-06-26 15:18:09 PDT
Created attachment 636899 [details] [diff] [review]
Patch 4
Comment 7 Sid Stamm [:geekboy or :sstamm] 2012-07-02 15:10:02 PDT
Comment on attachment 636899 [details] [diff] [review]
Patch 4

Review of attachment 636899 [details] [diff] [review]:
-----------------------------------------------------------------

This is starting to look really good, Marshall!

::: content/base/src/CSPUtils.jsm
@@ +585,5 @@
>    //    <host-dir-value> ::= <source-list>
>    //                       | "'none'"
>    //    <source-list>    ::= <source>
>    //                       | <source-list>" "<source>
> +  // These regexps represent the concrete syntax on the w3 spec

Please put the ABNF syntax in a comment here so that it's easy to match up to the spec and to match up to the below syntax.

@@ +617,5 @@
>    }
>  
>    var tokens = aStr.split(/\s+/);
>    for (var i in tokens) {
> +    if (sourceExp.test(tokens[i])){

How about checking the opposite?  If it doesn't match, warn and skip.  That helps avoid a little more nested blocks.

@@ +622,1 @@
>      var src = CSPSource.create(tokens[i], self, enforceSelfChecks);

Check indentation unless you change the if statement above.  Should be using two spaces (not tabs) per indentation.

@@ +623,5 @@
>      if (!src) {
>        CSPWarning("Failed to parse unrecoginzied source " + tokens[i]);
>        continue;
>      }
> +      // if a source is a *, then we can permit all sources

What does the spec say about what to do if there's a * in the source list and other source expressions too?  We should make sure we are supposed to process the other tokens.  (Please add a test for this, or verify one of our tests already checks this.)

@@ +931,5 @@
>   * @returns
>   *        an instance of CSPSource 
>   */
> +
> +

Please remove these extra newline/whitespace changes.

@@ +973,5 @@
> +  var extHostSrc = new RegExp ("^" + hostSrc.source + "\\/[:print:]+$", 'i');
> +  var keywordSrc = new RegExp ("^('self'|'unsafe-inline'|'unsafe-eval')$", 'i');
> +  var sourceExp  = new RegExp (schemeSrc.source + "|" +
> +                                 hostSrc.source + "|" +
> +                              keywordSrc.source,  'i');

If you will be defining these twice, perhaps it makes sense to make them constants or define them somewhere else?

@@ +997,5 @@
> +    if (!hostMatch){
> +      CSPError("Couldn't parse the host of invalid source " + aStr);
> +      return null;
> +    }
> +    sObj._host = CSPHost.fromString(hostMatch[0]);

What happens if the regex matches multiple instances?  Will it?  Please add a test that makes sure "http://foo.com:bar.com:23" doesn't parse.

@@ +1002,5 @@
> +    var portMatch = port.exec(aStr);
> +    if (!portMatch) {
> +      // gets the default port for the given scheme
> +      defPort = Services.io.getProtocolHandler(sObj._scheme).defaultPort;
> +      if (!defPort) {

Ports don't make sense for all schemes (data:, jar:, etc). For those without a default port, we should just assume port doesn't make sense (and no port should be used).  Please add a test (or verify we have one) to ensure "data:" is a valid source.

@@ +1094,5 @@
>    toString:
>    function() {
>      if (this._isSelf) 
> +      return "'self'";
> +      // return this._self.toString();

I'm torn between whether we want 'self' here or the actual value... lets stick with 'self' here as long as our unit tests pass, but if the spec says anything about it, we should follow the spec.

@@ +1152,5 @@
>      // aSource's host.
>      if (this.host && !this.host.permits(aSource.host))
>        return false;
>  
> +

Please remove extra newline.

@@ +1288,5 @@
>  CSPHost.fromString = function(aStr) {
>    if (!aStr) return null;
>  
>    // host string must be LDH with dots and stars.
> +  // TODO:[Marshall] I don't think we need to check this via previous regex'ing

The unit tests test this thing as a whole.  If we can guarantee that all callers will pass in valid strings, we can skip it (but document that requirement in the comments here).  Otherwise, just change the regex check below to use your regex from above.

::: content/base/test/unit/test_bug558431.js
@@ +68,2 @@
>        let cspr = CSPRep.fromString(cspr_str, mkuri(DOCUMENT_URI));
> +      dump("done creating CSPRep\n");

You'll have to remove all the dumps before landing.

::: content/base/test/unit/test_csputils.js
@@ +207,5 @@
>        //"src should inherit scheme 'https'"
> +
> +      // [TODO:MARSHALL] should this be permitted? seems like the below url has *
> +      // ports permitted, while the 'self' only has one port (443)
> +      //do_check_true(src.permits("https://foobar.com"));

I think you're right.  Please fix this test!

@@ +299,5 @@
>        // policy a /\ policy b intersects policies, not context (where 'self'
>        // values come into play)
>        var nullSourceList = CSPSourceList.fromString("'none'");
> +      //need to add a self to give port and scheme
> +      var simpleSourceList = CSPSourceList.fromString("a.com", "http://google.com");

Please just change to "http://a.com" instead of providing a self argument.  We should have other tests that check intersection when using self... if we don't, we should write some.
Comment 8 Marshall Moutenot 2012-07-03 14:59:51 PDT
Created attachment 638875 [details] [diff] [review]
Patch 5

Changes follow Sid's feedback. Thanks!
Comment 9 Devdatta Akhawe [:devd] 2012-07-04 20:01:04 PDT
Have we considered just using nsIURI to parse the source expressions? If/when we port this to C++, porting all the regexes might not be fun.

Also, the regex doesn't seem to allow for IDN (puny-encoded or otherwise). Am I missing something in the regex?
Comment 10 Sid Stamm [:geekboy or :sstamm] 2012-07-04 22:13:31 PDT
Dev: this bug addresses moving our implementation into compliance with the spec.  Reading the spec, you can see it is very specific about composition of hostnames and because the syntax is not the same as a URI (not compatible due to optional presence and wildcarding of host and scheme), we cannot use nsIURI parser for hosts.

If you don't agree with the spec's coverage of IDN or parsing of sources, you should bring it up in the working group, not here.  This bug is intended to comply with and not make changes to that spec.

If we do decide to port the parser to C++, we will use the usual ABNF parsing techniques (not regex) based on the ABNF in the spec and I don't think it'll be that hard to write.
Comment 11 Sid Stamm [:geekboy or :sstamm] 2012-07-04 22:20:40 PDT
(In reply to Sid Stamm [:geekboy] from comment #10)
> we cannot use nsIURI parser for hosts.

And of course by this I meant we cannot use the nsIURI parser for *CSP Sources*.
Comment 12 Marshall Moutenot 2012-07-05 10:44:39 PDT
Created attachment 639395 [details] [diff] [review]
Patch 6

Not sure if this is the ideal way to have the spec in the code, but I think having it along side of the regex is really readable and clear.
Comment 13 Sid Stamm [:geekboy or :sstamm] 2012-07-05 17:18:59 PDT
Comment on attachment 639395 [details] [diff] [review]
Patch 6

Review of attachment 639395 [details] [diff] [review]:
-----------------------------------------------------------------

Almost there!

A few of my comments from last time still apply to this new patch.  Please address them (or tell me I'm doing it wrong).  ;)

One thing I am curious about from your last patch, but you removed the comment so it's not in this patch:

::: content/base/test/unit/test_csputils.js
@@ +207,5 @@
>        //"src should inherit scheme 'https'"
> +
> +      // [TODO:MARSHALL] should this be permitted? seems like the below url has *
> +      // ports permitted, while the 'self' only has one port (443)
> +      //do_check_true(src.permits("https://foobar.com"));

I think you're right.  Did you fix the test?

::: content/base/src/CSPUtils.jsm
@@ -559,5 @@
>   */
>  function CSPSourceList() {
>    this._sources = [];
> -  this._permitAllSources = false;
> -

If you remove this, please also remove it from CSPSourceList.fromString and then change permits() in the sources and source list to be sure any "*" in the list will be honored.

What does the spec say about what to do if there's a * in the source list and other source expressions too?  We should make sure we are supposed to process the other tokens.

@@ +645,5 @@
>        continue;
>      }
> +    // if a source is a *, then we can permit all sources
> +    if (src.permitAll){
> +      slObj._permitAllSources = true;

You deleted _permitAllSources from the constructor... is it needed here (or there, or both)?

@@ +651,2 @@
>      slObj._sources.push(src);
>    }

Nit: Indentation is messy here.  Please fix it.

@@ +986,5 @@
>  
> +  // check for host-source or ext-host-source match
> +  if (R_HOSTSRC.test(aStr) || R_EXTHOSTSRC.test(aStr)){
> +    var schemeMatch = R_GETSCHEME.exec(aStr);
> +    if (!schemeMatch) sObj._scheme = self.scheme;

Please fix indentation.

@@ +996,5 @@
> +    if (!hostMatch){
> +      CSPError("Couldn't parse the host of invalid source " + aStr);
> +      return null;
> +    }
> +    sObj._host = CSPHost.fromString(hostMatch[0]);

What happens if the regex matches multiple instances?  Will it?  Please add a test that makes sure "http://foo.com:bar.com:23" doesn't parse.

@@ +1003,5 @@
> +      // gets the default port for the given scheme
> +      defPort = Services.io.getProtocolHandler(sObj._scheme).defaultPort;
> +      if (!defPort) {
> +        CSPError("Invalid scheme given for source " + aStr);
> +        return null;

Ports don't make sense for all schemes (data:, jar:, etc). For those without a default port, we should just assume port doesn't make sense (and no port should be used).  Please add a test (or verify we have one) to ensure "data:" is a valid source.

@@ +1020,5 @@
>      if (!self) {
>        CSPError("self keyword used, but no self data specified");
>        return null;
>      }
> +    // host and port as the resource's URI

What's this comment mean?

::: content/base/test/unit/test_csputils.js
@@ +295,5 @@
>        //"* does not permit a long host with no port"
>        do_check_true( allSourceList.permits("http://a.b.c.d.e.f.g.h.i.j.k.l.x.com"));
>  
> +      //* short circuts parsing
> +      do_check_true(allAndMoreSourceList.permits("http://a.com"));

What happens if * appears after 'none'?

@@ +305,5 @@
>        // policy a /\ policy b intersects policies, not context (where 'self'
>        // values come into play)
>        var nullSourceList = CSPSourceList.fromString("'none'");
> +      //need to add a self to give port and scheme
> +      var simpleSourceList = CSPSourceList.fromString("http://a.com");

You're not adding self anymore, just hard-coding the scheme.  Maybe delete the comment.
Comment 14 Marshall Moutenot 2012-07-06 09:26:44 PDT
>> One thing I am curious about from your last...

After looking at it again, I actually think that it is fine. Since 443 is the default port for https (right?) http://foobar.com should automatically get the port 443.

I guess I'm not totally up on the permits code. Does it have to be extra explicit?
Comment 15 Marshall Moutenot 2012-07-06 09:28:06 PDT
agh *https://foobar.com* in the example above.
Comment 16 Sid Stamm [:geekboy or :sstamm] 2012-07-06 09:48:40 PDT
You're right, Marshall.  The thing being tested on line 208 is a URL and not a source, so omitting the port suggests we should use the default port (443).  You can probably leave that test alone.  Thanks for thinking on it.
Comment 17 Marshall Moutenot 2012-07-06 14:53:43 PDT
>>What happens if * appears after 'none'?

In the source list: "'none' *" 'none' is not a valid item unless it is standalone, so it will be ignored and * will be honored.
Comment 18 Marshall Moutenot 2012-07-06 14:58:51 PDT
Created attachment 639818 [details] [diff] [review]
Patch 7
Comment 19 Marshall Moutenot 2012-07-10 11:23:14 PDT
Created attachment 640693 [details] [diff] [review]
Patch 7 (correct one I hope)
Comment 20 Sid Stamm [:geekboy or :sstamm] 2012-07-10 12:54:13 PDT
Comment on attachment 640693 [details] [diff] [review]
Patch 7 (correct one I hope)

Review of attachment 640693 [details] [diff] [review]:
-----------------------------------------------------------------

This is great.  Please rebase your patch (see comments below), then flag me for a quick review.  After you rebase the patch, we can push it to try and make sure it's safe before landing.

::: content/base/src/CSPUtils.jsm
@@ +11,5 @@
>   */
>  
> +// Need for scheme's default port
> +Components.utils.import("resource://gre/modules/Services.jsm");
> +

Please use "Cu" as a shortcut for Components.utils.  In mozilla-central, the current revision also includes another jsm file that adds that as a global.

@@ +637,5 @@
>  
>    var tokens = aStr.split(/\s+/);
>    for (var i in tokens) {
> +    if (!R_SOURCEEXP.test(tokens[i])){
> +      CSPWarning("Invalid source expression " + tokens[i]);

The string warnings have been made localizable on trunk, which means this looks different.  Please rebase your patch to use the CSPLocalizer introduced by the fix for bug 766569.

@@ +655,1 @@
>      slObj._sources.push(src);

Nit: indent and spacing is funny here.  If statements should be formatted like this:

if (foo) {
  bar;
} else {
  junk;
}

Please update your if statements accordingly.

@@ +953,5 @@
>   * @returns
>   *        an instance of CSPSource 
>   */
> +
> +

Please don't add these extra newlines.

@@ +1036,2 @@
>        CSPError("Couldn't parse invalid source " + aStr);
>        return null;

Check indentation on these lines.
Comment 21 Marshall Moutenot 2012-07-10 15:16:10 PDT
Created attachment 640815 [details] [diff] [review]
Rebase (8)
Comment 22 Marshall Moutenot 2012-07-11 14:33:13 PDT
Created attachment 641204 [details] [diff] [review]
Patch for Review

I could have messed something up manually editing out the whitespace changes... Let me know!
Comment 23 Marshall Moutenot 2012-07-17 13:46:25 PDT
Created attachment 643125 [details] [diff] [review]
Whoops! Patch for review round 2

Patch got all messed up. :(
Comment 24 Sid Stamm [:geekboy or :sstamm] 2012-08-13 17:10:07 PDT
Comment on attachment 643125 [details] [diff] [review]
Whoops! Patch for review round 2

Review of attachment 643125 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  Please fix my nits, then run it through try first.

This time I get nit-picky.  :)

::: content/base/src/CSPUtils.jsm
@@ +1006,5 @@
> +      sObj._scheme = schemeMatch[0];
> +    }
> +
> +    var hostMatch = R_HOST.exec(aStr);
> +    if (!hostMatch){

Nit: space between ) and {

@@ +1007,5 @@
> +    }
> +
> +    var hostMatch = R_HOST.exec(aStr);
> +    if (!hostMatch){
> +      CSPError(CSPLocalizer.getFormatStr("couldntParseInvalidSource",[aStr]));

Nit: space between , and [

@@ +1016,5 @@
> +    if (!portMatch) {
> +      // gets the default port for the given scheme
> +      defPort = Services.io.getProtocolHandler(sObj._scheme).defaultPort;
> +      if (!defPort) {
> +        CSPError(CSPLocalizer.getFormatStr("couldntParseInvalidSource",[aStr]));

Same as previous nit.
Comment 25 Marshall Moutenot 2012-08-14 11:00:51 PDT
Created attachment 651831 [details] [diff] [review]
Final with Nits

Needs to be added to try.
Comment 26 Ian Melven :imelven 2012-08-15 16:02:46 PDT
Created attachment 652265 [details] [diff] [review]
final patch

Carrying over Sid's r+

Updated patch with author name and final commit message

Pushed to try : https://tbpl.mozilla.org/?tree=Try&rev=45cbf50d6df5

jst says he's ok with us setting checkin-needed on this, so if try looks good, we can go ahead and do that.
Comment 27 Sid Stamm [:geekboy or :sstamm] 2012-08-16 11:04:33 PDT
Try looks good.

Pushed to mozilla-inbound: 
https://hg.mozilla.org/integration/mozilla-inbound/rev/d5fab9ef16a4

(Not holding back for the other csp-1.0 changes since the source-expression syntax is identical to our previous implementation and this is just a parser change.)
Comment 28 Ryan VanderMeulen [:RyanVM] 2012-08-16 17:57:20 PDT
https://hg.mozilla.org/mozilla-central/rev/d5fab9ef16a4
Comment 29 Marshall Moutenot 2012-08-20 15:49:56 PDT
Created attachment 653563 [details] [diff] [review]
fixed bug where host returned scheme instead
Comment 30 Sid Stamm [:geekboy or :sstamm] 2012-08-21 08:17:49 PDT
Marshall: can you move your patch (attachment 653563 [details] [diff] [review]) for the single-token host over to bug 784315?
Comment 31 Marshall Moutenot 2012-08-21 10:34:53 PDT
Moved patch to bug 784315
Comment 32 Sid Stamm [:geekboy or :sstamm] 2012-08-23 14:07:04 PDT
Comment on attachment 653563 [details] [diff] [review]
fixed bug where host returned scheme instead

this patch moved to another bug, so it's not useful here.

Note You need to log in before you can comment on or make changes to this bug.