Closed Bug 368516 Opened 18 years ago Closed 17 years ago

UTF-8 encoded scripts that contain a BOM result in an "illegal character" error (breaks .Mac web galleries)

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

VERIFIED FIXED

People

(Reporter: fullmetaljacket.xp+bugmail, Assigned: crowderbt)

References

Details

(Keywords: qawanted, regression, testcase)

Attachments

(4 files)

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9a2pre) Gecko/20070128 Minefield/3.0a2pre ID:2007012815 [cairo]

cannot load yahoo mail beta

error console:

Error: illegal character
Source file: http://us.js2.yimg.com/us.js.yimg.com/lib/pim/r/dclient/e/js/us/351464c1680509c1d55a45699cd168be_1.js
Line: 1, Column: 120
Source code:
:"Edit Error",Tu:"Errors were returned from the Contacts XML server",T:"Yahoo! Messenger ID"};strings_done_loading=null;?var gEncoding_alert1 = "To view a message in a different language encoding, click\n\"More Actions\" -> \"Set Encodings\", and choose y


Error: na is not defined
Source file: http://us.js2.yimg.com/us.js.yimg.com/lib/pim/r/dclient/e/js/core/d4f85716d971f0d78b30c44f9b0ab3d0_1.js
Line: 38


Error: na is not defined
Source file: http://us.js2.yimg.com/us.js.yimg.com/lib/pim/r/dclient/e/js/launch/e734b8275fceaf68c09632aae35ab8e9_1.js
Line: 23


Error: BrowserHistoryManager is not defined
Source file: http://us.js2.yimg.com/us.js.yimg.com/lib/pim/r/dclient/e/js/launch/e734b8275fceaf68c09632aae35ab8e9_1.js
Line: 55
btw, same error with safe mode
OS: Windows NT → Windows Vista
regression range:

20070126 1306pst WFM

20070126 1424pst BROKEN

http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&date=explicit&mindate=1169845560&maxdate=1169850239

bug# 274152 ? 
Blocks: 274152
Status: UNCONFIRMED → NEW
Component: General → JavaScript Engine
Ever confirmed: true
Do remember to change the assignee/QA contact when moving a bug between components -- it's not likely to get noticed if you don't.  (I only found out about this via dependencies.)  Thanks!
Assignee: nobody → general
QA Contact: general → general
Using Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9a2pre) Gecko/20070201 Minefield/3.0a2pre, I don't get the same errors as the reporter I am unable to get past the "Logging In" point.
using same build as yours i can still reproduce this problem.
-using safe mode
-using new profile

Its not only a Vista Problem, same Problem is also on XP Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9a2pre) Gecko/20070203 Minefield/3.0a2pre ID:2007020304 [cairo]

WFM on Linux Fedora FC 6 with latest trunk.
So Yahoo! is relying on ECMA-262's rule that Format Control characters must be stripped before lexical analysis. Funny, but IE and Opera do not strip, so how is this content working in those browsers.

Could someone reduce the testcase from the source file cited in the "Illegal character" error shown in comment 0 and test it in all the major browsers?

/be
Attached file minimal testcase
Removing the |charset="utf-8"| from the script tag makes this an error in all builds. So at a first glance, it looks like the patch for bug 274152 somehow broke the use of that attribute.
OS: Windows Vista → All
Hardware: PC → All
(In reply to comment #9)
> Removing the |charset="utf-8"| from the script tag makes this an error in all
> builds. So at a first glance, it looks like the patch for bug 274152 somehow
> broke the use of that attribute.

Bad logic on my part: removing charset="utf-8" just makes the content always invalid, since the logic in nsScriptLoader.cpp falls back to ISO-8859-1.
A modified version of attachment 253922 [details] that doesn't rely on data: URIs passes in both IE 7 and Opera 9.02, on Windows.
attachment 253922 [details] yields a "pass" in Safari also.
Summary: cannot load yahoo mail beta → cannot load yahoo mail beta due to illegal character error
i cant no longer reproduce this problem from comment 0 but attachment 253922 [details] still yields as "FAILED".

using both gran paradiso 2 and 20070214 trunk

Yahoo has apparently removed the BOM from the deployed code, so this issue doesn't affect Yahoo Mail Beta anymore. The bug is still present, however.
if that so, should we change the bug summary now?



Summary: cannot load yahoo mail beta due to illegal character error → UTF-8 encoded scripts that contain a BOM result in an "illegal character" error
Flags: blocking1.9?
attachment 253922 [details] still returns "FAIL" in 3.0a6 pre (june 14, 2007 nightly trunk)
(In reply to comment #17)
> attachment 253922 [details] still returns "FAIL" in 3.0a6 pre (june 14, 2007 
> nightly trunk)

David, no need to tell everyone this bug is not fixed. That is what the status NEW means. 

Summary: UTF-8 encoded scripts that contain a BOM result in an "illegal character" error → UTF-8 encoded scripts that contain a BOM result in an "illegal character" error (breaks .Mac web galleries)
Has someone sent an email to Apple about gallery.mac.com?
Seems like more of an evangelism bug than a blocker, to me.
Flags: blocking1.9?
(In reply to comment #23)
> Seems like more of an evangelism bug than a blocker, to me.

Why? We have evidence of at least two major sites where it's been an issue, and I don't think it's unlikely that there are many more. It's also a regression from 1.8.
Flags: blocking1.9?
Brendan:  please weigh in on this?
This breaks also few Finnish sites, for example www.igglo.fi's calculator
This seems like something we could fairly trivially address; I'll be glad to hack on it, if someone could oblige with a hint on where to start.  Brendan?
moving to blocking since it appears to be a webcompat regression from 1.8
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Is this only the case for external script loads?  How do Opera and especially IE behave for inline script that includes a BOM?  What about event handler attribute values?  What about setTimeout string values?

Answering those questions will be a step towards figuring out the exact layer where we want to fix this.

I do think it's pretty weird that they're stripping out a BOM that doesn't come at the beginning of the data...
Keywords: qawanted
This bug also affects ajax'd pages.
try http://www.pagii.com/eric
works great in firefox 2.
Renders only background in FF3.0b2pre
Flags: blocking1.8.1.9?
Flags: blocking1.8.1.9?
Yes, yes.  This might affect all sorts of sites.  What's needed now are answers to the questions in comment 30.  Someone who has access to IE needs to sit down and do some testing...
(In reply to comment #33)
> Yes, yes.  This might affect all sorts of sites.  What's needed now are answers
> to the questions in comment 30.  Someone who has access to IE needs to sit down
> and do some testing...

I did some testing, but don't have the testcases here with me at the moment. IE seems to treat the BOM as it would any other unknown identifier, in that it throws a runtime error (something like " is not defined"). Of the cases you mentioned (inline script, event handler attribute values, setTimeout string values) IE throws a run-time exception, while Firefox currently throws a compile-time exception (syntaxerror). Branch builds also fail in that case, though, so it seems the only behavior change on our part was dealing with external script loads. IE seems to ignore the BOM in that case.

I'll try to get the testcases posted when I can.
So we could either change nsJSContext or we could change the scriptloader, right?  And we need to strip out the BOM everywhere, not just at the beginning of the data, correct?
I virtualize XP an would be happy to do some testing, just dont know what you folks need. She me some steps to aquire the proper data for fixing this and i'd be happy to help.
Brian: you don't need me to tell you that the combination of Postel's Law and the cut-and-paste-programming law mean we must tolerate BOMs in the middle of .js files by ignoring them. This is what the ES4 proposals will do, although the details have not been written down.

Based on comment 34 we don't need to tolerate misplaced BOMs by ignoring them, though. We could do what IE does, but only for BOMs -- other format control chars outside of strings and regexps should still be compile-time errors. But I would favor stripping misplaced BOMs. Write a patch and I'll review it.

/be
Assignee: general → crowder
This undoes the part of the patch from bug 274152 by restoring the do {} while loop there, but checking only for BE and LE BOM markers (not ALL unicode format-control characters).  diff -w coming next.
Attachment #291730 - Flags: review?(mrbkap)
Comment on attachment 291730 [details] [diff] [review]
revert part of the patch from bug 274152

This is conservative. It could lead to slightly odd behavior where random 0xfffe and 0xffef bytes that aren't part of a BOM-like thing are accepted, but I doubt that's an issue for anybody.
Attachment #291730 - Flags: review?(mrbkap) → review+
Comment on attachment 291730 [details] [diff] [review]
revert part of the patch from bug 274152

Brendan, if you can give this a quick once-over, that would be great.  If not, I think it is safe enough to just land.  I'll land tonight, if I don't hear from you.
Attachment #291730 - Flags: review?(brendan)
Attachment #291730 - Flags: approvalM10?
Attachment #291730 - Flags: approvalM10? → approvalM10+
Attachment #291730 - Flags: approval1.9?
Comment on attachment 291730 [details] [diff] [review]
revert part of the patch from bug 274152

Please comment on the magic hex numbers and what is going on (cite this bug even).

Also we'll need to follow up with TG1 to make sure the final spec (or one more final than the "let's strip BOMs anywhere" non-spec that is somewhere on ecmascript.org currently) is implemented for Firefox 3.

/be
Attachment #291730 - Flags: review?(brendan)
Attachment #291730 - Flags: review+
Attachment #291730 - Flags: approval1.9?
Attachment #291730 - Flags: approval1.9+
jsscan.c: 3.143
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
ecma_3/extensions/regress-274152.js now fails with:

Do not ignore unicode format-control characters expected: SyntaxError: illegal character actual:  reason: Expected value 'SyntaxError: illegal character', Actual value ''

should the \uFEFF be changed to another character used or is the test invalid now?
It should be changed to another character; this patch allows only U+FFFE and U+FEFF and forbids the rest.
Checking in ecma_3/extensions/regress-368516.js;
/cvsroot/mozilla/js/tests/ecma_3/extensions/regress-368516.js,v  <--  regress-368516.js
initial revision: 1.1
done
Flags: in-testsuite+
verified fixed 2007-12-09 1.9.0 linux|mac|win
Status: RESOLVED → VERIFIED
False Fixed Status, try http://www.pagii.com/eric
not working at all
still broken
(In reply to comment #49)
> False Fixed Status, try http://www.pagii.com/eric
> not working at all
> still broken

That's bug 407957.
Depends on: 430740
Attached file tests
Here are some tests that I wrote a while ago. They seem to contradict part of my comment 34 (BOMs in inline script and event handler attribute values don't cause errors), but I redid the tests and there are no inconsistencies between browsers (or between Firefox 2 and Firefox 3) as far as I can tell.

These tests are also at http://gavinsharp.com/bug/368516/ for easy loading.
Gavin:  What happens if you use \xff\xfe or \xfe\xff here, instead of \xEF\xBB\xBF ?
IE7 and IE8 both fail the "minimal testcase" test.
(In reply to comment #53)
> IE7 and IE8 both fail the "minimal testcase" test.

That testcase requires data: URI support, which IE7 doesn't have (IE 8 supposedly does, but it might only accept text/html, and the test uses text/javascript).
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: