Closed Bug 430740 Opened 12 years ago Closed 12 years ago

BOM characters are stripped from javascript before execution

Categories

(Core :: JavaScript Engine, defect)

x86
Windows Vista
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla1.9.1a1

People

(Reporter: bugzilla, Assigned: crowderbt)

References

()

Details

(Keywords: verified1.8.1.17, verified1.9.0.2, verified1.9.1, Whiteboard: [sg:high] XSS risk to sites)

Attachments

(5 files, 4 obsolete files)

User-Agent:       Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506; .NET CLR 3.5.30319; WWTClient2)
Build Identifier: 2.0.0.13

Using control format unicode characters, it is possible for code that will be executed to exist inside what would otherwise appear to be a quoted javascript string. The same technique can be used to escape out of block comments.



Reproducible: Always

Steps to Reproduce:
Execute the following javascript:

function evil() {
   alert('evil');
   return 'evil';
}
alert(eval("(['a\\\u200d', '+evil()])//')]"));
Actual Results:  
alert 'evil' appears, then an aldert "a', evil"

Expected Results:  
only one alert "a\,+evil()])//" should appear.

See url for more.
Assignee: nobody → general
Component: General → JavaScript Engine
Product: Firefox → Core
QA Contact: general → general
Fixed for Firefox 3 by patch for bug 274152 -- see bug 274152 comment 43 for the checkin date (also, not quite perfect hindsight: please search closed bugs first for format control keywords).

/be
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Summary: Control Format unicode characters are stipped from javascript before execution → Control Format unicode characters are stripped from javascript before execution
Is it possible that bug 268516 re-introduced this bug, for BOM characters?
js> alert(eval("(['a\\\ufffe', '+evil()])//')]")); 
evil
a', evil
Search as I may, for 30 minutes, and could not find any bugs related to 'control format' characters. Hindsight only works if you have eyes.
Attached file ecma_3/extensions/regress-430740.js (obsolete) —
Flags: in-testsuite+
Flags: in-litmus-
How did this bug get resolved?  Are comments 2 and 3 not still issues?
I missed the second case by crowder. 1.8.1 fails the first part, 1.9.0 fails the second part.
Attachment #318432 - Attachment is obsolete: true
Need a better fix here.
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---
is the 1.9.0 failure of the second part of the test due to bug 368516 ?
Yes, and that's what I -meant- to type in comment 2, but didn't.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Blocks: 368516
Flags: wanted1.8.1.x+
Flags: blocking1.9.1?
Flags: blocking1.9.0.1?
Flags: blocking1.8.1.16?
Whiteboard: [sg:high?] XSS risk to sites.
brendan:  Some thoughts on this would be very helpful.  What will be the fate of randomly-occurring format characters like this, in ES4?
What's the problem? ES4 does not strip format control characters; it does try to cope with misplaced BOMs. The proposal to do both has been stable for a while.

/be
I guess I'm looking for a better idea of what you mean by "cope", and I'm not sure if I'm reading ecmascript.org correctly, or even looking in the right place.
Dave Reed:  If I substitute the \ufffe in my altered version of your example with just "a", the expression doesn't parse correctly.  Can you show me some more examples?
The write-up in http://www.ecmascript.org/es4/spec/incompatibilities.pdf says:

ES4 therefore specifies that class Cf characters should not be stripped from source text at all, with the exception of byte order marks (BOMs), which must continue to be stripped everywhere.

This bug makes the case that BOM as well as class Cf characters should not be stripped from within string literals. I'll try to get this onto the Ecma TC39 committee's radar for the next face-to-face meeting.

/be
This has been resolved for ES4, IMO.  The resolution is that Cf characters are not stripped at all, but that a BOM is interpreted in the same way as white space between tokens.  (Thus a BOM in a string or regular expression literal has the value of the BOM, not a space value.)  The alternative is to translate BOM to U+0020 also within string and regular expression literals, but IMO that is an inferior solution, as the former solution does not translate at all -- it just gives a compatible-with-ES3 interpretation to BOM in certain contexts.
Flags: blocking1.9.0.1? → blocking1.9.0.1-
Brendan, Brian: is this something we'd fix on the 1.8 branch, too?
Assignee: general → crowder
I'm actually not convinced anymore that this is that serious a bug, and the original poster has not provided an exploitable example, in which the enclosing code does not itself contain a syntax error.  We should probably WONTFIX or morph this bug to be the "make SM compatible with ES4" bug, which means doing what Lars documents in comment #17.
My original example does not result in a syntax error. The code within the supposed string isn't valid by itself, but that would be necessary by an attacker to make the unquoted string form valid code in the context in which it is used. Hence, the string ends in // so that remaining 'actual' code is considered commented out. 
Attached patch the fix (obsolete) — Splinter Review
Actually, yeah.  You're right.  Here's an attempt at a patch; it needs more testing to be ready for review, though.
Comment on attachment 329544 [details] [diff] [review]
the fix

I have -no idea- why mercurial has so much difficulty generating a minimized diff of this, I can't get it to do -b or -w.  I'll have a manually-generated diff shortly.
Attached patch for review purposes (obsolete) — Splinter Review
jsscan.cpp versus the fixed jsscan.cpp generated by |diff -bpU8|
Comment on attachment 329544 [details] [diff] [review]
the fix

This passes js/tests and Mochitests.
Attachment #329544 - Flags: review?(brendan)
Comment on attachment 329544 [details] [diff] [review]
the fix

Passing review to Igor.
Attachment #329544 - Flags: review?(brendan) → review?(igor)
(In reply to comment #25)
> (From update of attachment 329544 [details] [diff] [review])
> Passing review to Igor.
> 

From the first impression the patch replaces BOM by whytespace in regexp literals, doesn't it?
> From the first impression the patch replaces BOM by whytespace in regexp
> literals, doesn't it?

String literals, not regexp literals, unless I did something terribly wrong.
Brian, see comment 17. BOM in string or regexp must be preserved, BOM outside quotes or slashes becomes space.

/be
(In reply to comment #28)
> Brian, see comment 17. BOM in string or regexp must be preserved, BOM outside
> quotes or slashes becomes space.

I interpret that idea that BOM between tokens means whitespace as preserving BOM not only inside string and regexps but also inside numbers and names. That is, 1<bom>e7 should not mean 1 e7 but rather lead to a syntax error. 
(In reply to comment #29)
> I interpret that idea that BOM between tokens means whitespace as preserving
> BOM not only inside string and regexps but also inside numbers and names. That
> is, 1<bom>e7 should not mean 1 e7 but rather lead to a syntax error. 

Here is another example:

-<bom>-a

With the stripping rule that becomes "--a" while with bom-as-whitespace that would be "- -a" or just "+a". This is not good in my book.
Igor: problem is IE set a precedent. Can someone test IE8 beta to confirm it does turn BOM outside of string or regexp into a space?

/be
IE8 and IE7 seem to actually error out on BOMs anywhere except inside regexp and string literals, from my testing.  Both 7 and 8 throw an "invalid character" exception if the BOM occurs anywhere but at the location of the first character in the script (string, in eval), or in a regexp or string literal.  Also, IE throws an invalid character error for non-BOM format control characters (\u200d, as in the example from comment #1) occuring -anywhere- except inside of string/regexp literals.
My comment #32 is wrong.  IE7/8 handle BOMs differently inside strings being eval()d than inside <script> tags or code included from <script> tags.  Does eval() lose the utf-8 encoding attribute, perhaps?
At any rate, what IE does for BOMs inside <script> tags is to convert them to space (rather than eliding them as we currently do).  This means that my "the fix" is relatively close to their handling, except that I must now make eval() break to match them.
Also, I need to add the same exemption for regexp literals.
(In reply to comment #33)
> My comment #32 is wrong.  IE7/8 handle BOMs differently inside strings being
> eval()d than inside <script> tags or code included from <script> tags.  Does
> eval() lose the utf-8 encoding attribute, perhaps?

What would MSIE alerts for the following case:

<script>
    var a = 0;
    -<bom>-a;
    alert(a);
<script>

??
> What would MSIE alerts for the following case:
> 
> <script>
>     var a = 0;
>     -<bom>-a;
>     alert(a);
> <script>

IE alerts 0, not -1
(In reply to comment #37)
> > What would MSIE alerts for the following case:
> > 
> > <script>
> >     var a = 0;
> >     -<bom>-a;
> >     alert(a);
> > <script>
> 
> IE alerts 0, not -1

On more test, what about

<script>
    var a = eval('"<bom>"')
    alert(a.charCodeAt(0));
</script>

That is, does that bom->whitespace replacement happens on HTML parser or script level?




For 0xfeff, this alerts 65279.
(In reply to comment #39)
> For 0xfeff, this alerts 65279.

Ok , the last curiosity attempt:

<script>
function f() {<bom>}

var s = f.toString().replace('\ufeff', 'T');
alert(s);
</script>

I.e. the question is what exactly character MSIE uses to replace the bom? Or perhaps it does not replace it with WS at all but rather treats BOM as a whitespace in the lexer outside of eval context?  
function f() {T}
(In reply to comment #41)
> function f() {T}
> 

So MSIE simply treats BOM as a whitespace for the purpose of parsing. Which suggests to do this in SM to fix the bug: treat BOM as one of Unicode whitespace characters in the scanner avoiding any character skipping or patching.
Is the patch only for the BOM? What about \u200d from the original example, and the other :Cf: characters listed in the URI?
Flags: blocking1.9.0.2?
Flags: blocking1.8.1.17?
Flags: blocking1.8.1.17+
Non BOM Cf characters aren't stripped anymore (note that comment 0 references an Fx2 build-id), so they get treated appropriately (typically, we'll throw an "illegal character" exception).  That has been the case since mrbkap's original fix for bug 274152 landed.  I excepted BOM characters in bug 368516 because of concerns over web-compability, but stripping them out wasn't the Right Thing.  The Right Thing (or close to it, I think, given the situation), is to treat them as whitespace.

The 1.8 patch for this will likely be at least a little different.
Summary: Control Format unicode characters are stripped from javascript before execution → BOM characters are stripped from javascript before execution
Pratap:  I'm hoping you (or another engineer) can shed some light on the difference in behavior for IE7/8's eval() here, as opposed to the handling of <script> tags.
We still don't do the same thing IE does for eval cases (we'll continue to treat BOMs as whitespace inside eval.  This saves us from the exploit, but doesn't precisely emulated IE.
Attachment #329544 - Attachment is obsolete: true
Attachment #329545 - Attachment is obsolete: true
Attachment #329910 - Flags: review?(igor)
Attachment #329544 - Flags: review?(igor)
For convenience in review.
Comment on attachment 329910 [details] [diff] [review]
Here's another swing at emulating IE.

>--- mozilla.4b761ffed330/js/src/jsscan.cpp	2008-07-16 15:11:37.000000000 -0700
>+++ /Users/crowder/mozilla/js/src/jsscan.cpp	2008-07-16 14:56:37.000000000 -0700
>+static JS_INLINE JSBool
>+ScanAsSpace(jschar ch)
>+{
>+    if (JS_ISSPACE(ch) || ch == 0xfffe || ch == 0xfeff)
>+        return JS_TRUE;
>+    return JS_FALSE;
>+}

Nit: comment that 0xfeff and 0xfffe are BOMs treated as whitespace for compatibility.
Attachment #329910 - Flags: review?(igor) → review+
Comment on attachment 329911 [details] [diff] [review]
diff -w of previous

Sorry, drive-by: IsSpace would be canonical name (someone pointed out that BOM code points are Unicode space chars going by attributes), instead of ScanAsSpace. Likewise, c not ch.

/be
Come to think of it, does JS_ISSPACE treat BOMs as space already?

/be
(In reply to comment #50)
> Come to think of it, does JS_ISSPACE treat BOMs as space already?

BOMs are zero-width space characters which is different from the space character attribute.
Includes the ch => c rewrite, but retains ScanAsSpace as the function name, per discussion with brendan on IRC.
Attachment #329910 - Attachment is obsolete: true
Keywords: checkin-needed
Attached patch 1.9 branch patchSplinter Review
Applies without fuzz on the branch, other than the module name (jsscan.cpp => jsscan.c)
Attachment #330105 - Flags: approval1.9.0.2?
I think this bug is a legit XSS bug and rightly deserves the blocking statuses requested for it for both branches.
Whiteboard: [sg:high?] XSS risk to sites. → [sg:high] XSS risk to sites.
Pushed to mozilla-central:
http://hg.mozilla.org/mozilla-central/index.cgi/rev/ddb85d02a8f5
Status: NEW → RESOLVED
Closed: 12 years ago12 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.1a1
OK, now it's really there (there were some issues before)
Is the "branch patch" for both branches, or is that only the 1.9 "branch"?
Blocks: xss
(In reply to comment #57)
> Is the "branch patch" for both branches, or is that only the 1.9 "branch"?

Only for 1.9, yeah.  I'll roll a 1.8 patch next.
Attachment #330105 - Attachment description: branch patch → 1.9 branch patch
Attached patch 1.8 branch patchSplinter Review
Fx2 is vulnerable to -all- Cf characters, not just BOMs.  This fixes that.  Asking for re-review, since the patch didn't apply trivially.  If 1.8.1.17 is not the branch I should be seeking approval for, please fix.
Attachment #330318 - Flags: review?(igor)
Attachment #330318 - Flags: approval1.8.1.17?
Duplicate of this bug: 446112
Attachment #330318 - Flags: review?(igor) → review+
Flags: blocking1.9.0.2? → blocking1.9.0.2+
Whiteboard: [sg:high] XSS risk to sites. → [sg:high] XSS risk to sites. [needs baking]
verified m-c:1.9.1
Status: RESOLVED → VERIFIED
Sorry for the trivial question but I'm new to bugzilla - is there a nightly build (Windows preferably) with the fix included that I can grab to run my test cases against?
The builds here should have the latest and greatest, including the patch landed on the 17th.

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/
Comment on attachment 330105 [details] [diff] [review]
1.9 branch patch

Approved for 1.9.0.2. Please land in CVS. a=ss
Attachment #330105 - Flags: approval1.9.0.2? → approval1.9.0.2+
Comment on attachment 330318 [details] [diff] [review]
1.8 branch patch

Approved for 1.8.1.17, a=dveditz for release-drivers
Attachment #330318 - Flags: approval1.8.1.17? → approval1.8.1.17+
Keywords: fixed1.9.0.2
Keywords: fixed1.8.1.17
Did this cause ecma_3/extensions/regress-368516.js to regress or is the test no longer valid?
Yes, that test is invalid now; BOMs aren't stripped anymore, but are treated as whitespace in parsing.
(In reply to comment #66)
> Did this cause ecma_3/extensions/regress-368516.js to regress or is the test no
> longer valid?
> 

The test contains:

eval("hi" + bomchars[i] + "there = 'howdie';");

With the patch this equivalent to eval("hi there = 'howdie';") which is not a valid code. So the test is no longer applicable. Ideally it should be changed to verify that boms are treated precizely as a whitespace, like in 

eval("var"+bom+"x;")

/cvsroot/mozilla/js/tests/ecma_3/extensions/regress-368516.js,v  <--  regress-368516.js
new revision: 1.2; previous revision: 1.1

mozilla-central: 16398:4da7cad4b133
verified fixed 1.8.1, 1.9.0, 1.9.1 linux/mac/win shell/browser
Comment on attachment 330318 [details] [diff] [review]
1.8 branch patch

a=asac for 1.8.0.15

taking for distros
Attachment #330318 - Flags: approval1.8.0.15+
Flags: blocking1.8.0.15+
Group: core-security
/cvsroot/mozilla/js/tests/ecma_3/extensions/regress-430740.js,v  <--  regress-430740.js
initial revision: 1.1

http://hg.mozilla.org/mozilla-central/rev/2d7d00ef041e
Flags: blocking1.9.1? → blocking1.9.1+
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/ddb85d02a8f5
Keywords: fixed1.9.1
Whiteboard: [sg:high] XSS risk to sites. [needs baking] → [sg:high] XSS risk to sites
v 1.9.1
You need to log in before you can comment on or make changes to this bug.