Bug 671029 (Closed). Opened 13 years ago, closed 13 years ago.

test262 shell test failures due to UTF8+BOM

Status: RESOLVED FIXED

Milestone: mozilla8

People

(Reporter: paul.biggar, Assigned: paul.biggar)

Details

(Whiteboard: js-triage-done)

Attachments

(1 file)

Check for BOM (patch, 1.96 KB), attached 13 years ago by Paul Biggar. Waldo: review+

Paul Biggar

Assignee

Description

13 years ago

When running test262 in the shell (currently requires a patch from bug 669766), most of the ietestcenter tests fail with a syntax error in the first line:

jit-test/tests/ietestcenter/chapter07/7.3/7.3-9.js:1: SyntaxError: illegal character:
jit-test/tests/ietestcenter/chapter07/7.3/7.3-9.js:1: /// Copyright (c) 2009 Microsoft Corporation 
jit-test/tests/ietestcenter/chapter07/7.3/7.3-9.js:1: .^

I don't really know anything about this, but the filetype of the files that fail is:

"UTF-8 Unicode (with BOM) English text, with CRLF line terminators"
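The two properties in that file-type report can be checked directly on a file's raw bytes. This is a minimal sketch, not the tool that produced the report above; the function name `describe_source` is made up for illustration.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def describe_source(data: bytes) -> dict:
    """Report whether raw file contents start with a UTF-8 BOM and use CRLF."""
    return {
        "utf8_bom": data.startswith(UTF8_BOM),
        "crlf": b"\r\n" in data,
    }


# Hypothetical file contents shaped like the failing tests' first line.
sample = UTF8_BOM + b"/// Copyright (c) 2009 Microsoft Corporation\r\n"
report = describe_source(sample)
```

For real files, pass `open(path, "rb").read()` so that no decoding happens before the check.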

Boris Zbarsky [:bzbarsky]

Comment 1

13 years ago

When the JS shell loads files, what does it assume about the character encoding?

David Mandelin [:dmandelin]

Updated

13 years ago

Whiteboard: js-triage-needed

Paul Biggar

Assignee

Comment 2

13 years ago

I'm looking into this. I tried a quick script to convert the files from UTF-8 to ASCII, but they contain non-ASCII characters.

The shell assumes ASCII input and transcodes it to UCS-2 (or UTF-16, I'm not sure which). Firefox does the same, starting from whatever encoding the file was downloaded in.

So the only fixes I can think of are:
- figure out the encoding, and transcode it to UCS-2
- assume UTF-8 instead of assuming ASCII
- ignore the problem, and measure test262 using the browser

I think I like option 2 the best. AIUI, UTF-8 is a strict superset of ASCII, so we break nothing by assuming it. Also, I think UTF-8 is the only encoding we could use that handles all characters while also not breaking stuff for users. Finally, I suspect it can be made to work by setting JS_C_STRINGS_ARE_UTF8, and so involves minimal work.
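The superset claim can be illustrated with a short sketch. It uses Python's built-in codecs, not the shell's own decoding, and the helper names are hypothetical. It shows that pure-ASCII bytes decode identically under both encodings, and that decoding as UTF-8 accepts the BOM bytes but still yields a U+FEFF character at the start of the text.

```python
def decodes_as_ascii(data: bytes) -> bool:
    """Return True if every byte in data is valid 7-bit ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


def first_code_point(data: bytes) -> str:
    """Decode data as UTF-8 and name its first character, e.g. 'U+002F'."""
    return f"U+{ord(data.decode('utf-8')[0]):04X}"


plain = b"/// Copyright (c) 2009 Microsoft Corporation\r\n"
with_bom = b"\xef\xbb\xbf" + plain

# ASCII input decodes to the same text under either encoding.
same_text = plain.decode("ascii") == plain.decode("utf-8")
```

Here `decodes_as_ascii(with_bom)` is false, while `first_code_point(with_bom)` reports the BOM character rather than the `/` that starts the comment.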

(Pretty low priority, but worth a quick look)

Assignee: general → pbiggar

Status: NEW → ASSIGNED

Whiteboard: js-triage-needed → js-triage-done

Paul Biggar

Assignee

Comment 3

13 years ago

No, I think I'm all wrong here. We already support UTF-8 via the -U flag, and that doesn't solve the problem (though if it did, specifying -U would be much preferable to the solutions above).

I'm thinking we can just remove the BOM, either as a one-off for the test262 files, or during parsing. Wikipedia says of the BOM:

``While the Unicode Standard does allow a BOM in UTF-8,[2] it does not require or recommend it.[3] Byte order has no meaning in UTF-8[4] so a BOM serves only to identify a text stream or file as UTF-8.

The reason the BOM is recommended against is that it defeats the ASCII back-compatibility that is part of UTF-8's design. Many existing pieces of software can handle UTF-8 inside the text but not at the start. For instance, the bytes of UTF-8 can be placed between the quotes of string constants in many programming languages, and that language will write the correct UTF-8 to a file or to a display, despite the language not knowing anything about UTF-8. This provides an easy migration path to convert systems to Unicode and to remove all legacy encodings. The unexpected three bytes of the BOM break this however, as they are located where they are certain to be a syntax error.``
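The one-off option could look roughly like the following. This is a sketch under assumptions: it is not a script from this bug, the function name and the `*.js` pattern are illustrative, and it rewrites files in place, so it should only be run on a checkout that can be restored.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def strip_bom_in_tree(root, pattern="*.js"):
    """Rewrite each matching file under root without its leading UTF-8 BOM.

    Files are handled as raw bytes, so line endings and all other content
    are left untouched. Returns the paths that were changed.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        data = path.read_bytes()
        if data.startswith(UTF8_BOM):
            path.write_bytes(data[len(UTF8_BOM):])
            changed.append(path)
    return changed
```

For example, `strip_bom_in_tree("jit-test/tests/ietestcenter")` would cover the directory named in the error output above, assuming the script is run from the directory that contains `jit-test`.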

Paul Biggar

Assignee

Comment 4

13 years ago

Comment 3 is right on. It's easy to remove the BOM, and that fixes the problem. Patch coming.

Paul Biggar

Assignee

Comment 5

13 years ago

Attached patch: Check for BOM

This checks for, and skips, a byte-order mark at the start of UTF-8 files.
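The check-and-skip idea can be sketched as follows. This is an illustration in Python, not the attached patch (which changes the shell itself), and `skip_utf8_bom` is a hypothetical name. It assumes a seekable binary stream.

```python
import io

UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def skip_utf8_bom(stream) -> bool:
    """Consume a leading UTF-8 BOM from a seekable binary stream, if present.

    Returns True if a BOM was skipped. Otherwise the stream is rewound to
    where it started, so no source bytes are lost.
    """
    start = stream.tell()
    if stream.read(len(UTF8_BOM)) == UTF8_BOM:
        return True
    stream.seek(start)
    return False


# Usage: after the call, the stream is positioned at the first real byte.
source = io.BytesIO(UTF8_BOM + b"/// Copyright (c) 2009 Microsoft Corporation\r\n")
skipped = skip_utf8_bom(source)
remaining = source.read()
```

Rewinding on a miss matters because the first three bytes of a file without a BOM are ordinary source text.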

Attachment #545504 - Flags: review?(jwalden+bmo)

Jeff Walden [:Waldo]

Comment 6

13 years ago

Comment on attachment 545504: Check for BOM

Review of attachment 545504:
-----------------------------------------------------------------

This is kinda hackish, but it arguably works and no one's really really going to care, so whatever.

Attachment #545504 - Flags: review?(jwalden+bmo) → review+

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 7

13 years ago

http://hg.mozilla.org/mozilla-central/rev/102481f5e2b9

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla8

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 8

13 years ago

and the followup
http://hg.mozilla.org/mozilla-central/rev/52e36db1e8c7


Categories

(Core :: JavaScript Engine, defect)
