detecting BOM when loading script

VERIFIED FIXED in mozilla1.2beta

Status

()

Core
Internationalization
--
blocker
VERIFIED FIXED
16 years ago
16 years ago

People

(Reporter: David, Assigned: Shanjian Li)

Tracking

({intl})

Trunk
mozilla1.2beta
x86
Windows 2000
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments)

(Reporter)

Description

16 years ago
When a unicode javascript file is included, i have the following error in the
javascript console :

Error: illegal character
Source File: /login.js
Line: 1
Source Code:
��/
Is the Javascript file being served with the right character set from the server?

Comment 2

16 years ago
A unicode decoder problem? Not sure if this is Parser or International,
but I don't think this is JS Engine. Reassigning to International. 

swann@cqs.dyndns.org: is there a URL we can go to that shows the problem?
Without a testcase or a URL, we won't be able to work on this; thanks -
Assignee: rogerl → yokoyama
Component: JavaScript Engine → Internationalization
QA Contact: pschwartau → ruixu
Summary: include unicode javascript files don't work → Included Unicode JavaScript files don't work

Updated

16 years ago
Keywords: intl
QA Contact: ruixu → teruko
(Reporter)

Comment 3

16 years ago
unfortunatly the file is inside the intranet .
actually all the js files in my application are included and stored in unicode..
the main page which include them is encoded in ISO-8859-1... the error occurs
for each file inclusion..
to reproduce the bug, one just have to make a simple html page encoded with
ISO-8859-1 that include a javascript file (even empty) stored in unicode.

Comment 4

16 years ago
I tested this in 6-21 branch Win32 build.  I could reproduce this. 
The beggining of the UTF8 js file caused the problem.  

Comment 5

16 years ago
Created attachment 88692 [details]
html file

Comment 6

16 years ago
Created attachment 88693 [details]
UTF8 JS file

Comment 7

16 years ago
Created attachment 88694 [details]
Javascript console

Comment 8

16 years ago
When I run the attached html file (id=88692) which included UTF8 file (id = 88693),
I got the error message in Javascript console (id=88694). 

David, is this what you see? 
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Reporter)

Comment 9

16 years ago
yes teruko. it's exactly the same error i got, due to the fact that js file is
stored in UTF-8

Updated

16 years ago
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.2beta

Comment 10

16 years ago
If I manually change the attached html file from <ISO-8859-1> to <UTF-8> from
browser menu; then JS works fine. I guess the decoder is respecting the doc
charset.
Target Milestone: mozilla1.2beta → ---

Updated

16 years ago
Blocks: 157673

Comment 11

16 years ago
what happen is the html is in iso-8859-1 and the js file is UTF-8 file without
any labeling. (http charset)
So the browser assume the js file is in the same encoding of the html and try to
load it with iso-8859-1 converter. Since this utf-8 js file is created by window
notpad, it generate 3 bytes of BOM in utf8 in the beginning. This make the JS
engine think this is not a valid file.

What could we do? I think one thing we could do is in the code which convert JS
into unicode. look at the first several bytes, like what we do in html parser.
And detect UTF16 BOM or UTF-8 BOM

Reassign this to shanjian
nsbeta1+ for m1.2final 
Assignee: yokoyama → shanjian
Status: ASSIGNED → NEW
Keywords: nsbeta1+
Summary: Included Unicode JavaScript files don't work → Included UTF8 JavaScript files from a non UTF8 html don't work
Target Milestone: --- → mozilla1.2beta

Comment 12

16 years ago
setting charset solves the problem:
<script type="text/javascript" charset="UTF-8" src="test.js"></script>

unfortunately the code will still produce error in msie5.
can anyone check if this is also a prob with ns4.x ?

Comment 13

16 years ago
msie5 loads the file fine if the first 3 characters in the
js file are removed.

if the first 3 bytes are removed, but charset is not set, msie5 and
mozilla will load the script but produce (document.write) jibberish
chars in iso-8859-1
(Assignee)

Comment 14

16 years ago
Created attachment 97079 [details] [diff] [review]
patch
(Assignee)

Comment 15

16 years ago
ftang, could you review?
Status: NEW → ASSIGNED
Summary: Included UTF8 JavaScript files from a non UTF8 html don't work → detecting BOM when loading script

Comment 16

16 years ago
Comment on attachment 97079 [details] [diff] [review]
patch

r=ftang
make sure there are space between if and (
Attachment #97079 - Flags: review+
(Assignee)

Comment 17

16 years ago
I will take care of the space format issue before checkin. 
dbaron, could you sr?
Comment on attachment 97079 [details] [diff] [review]
patch

I could, although I'd much prefer if jst did since he's the module owner for
this area of code.  (Other than the bizarre indentation, which should be made
consistent with the rest of the file, it seems fine to me, although the "should
change to" comments seem to have been done already.)
(Assignee)

Comment 19

16 years ago
johny,I try to avoid you this time in order not to keep you too busy. But it
seems like I have to do ask you about this one since you are the best candidate. 
Comment on attachment 97079 [details] [diff] [review]
patch

+
+// This function is copied from nsParser.cpp. It was simplied though,
unnecessary part is removed.

"simlified" is mis-spelled above.

+static PRBool DetectByteOrderMark(const unsigned char* aBytes, PRInt32 aLen,
nsString& oCharset) {

Please put the static keyword and the return type on its own line, and opening
brace on its own line as well.

+ if (aLen < 2)
+   return false;
+
+ switch(aBytes[0])
+	 {
...

Please clean up this indentation mess. No tabs, 2-space indentation.

Other than that (and what dbaron pointed out about the comments), sr=jst
Attachment #97079 - Flags: superreview+
(Assignee)

Comment 21

16 years ago
fix checked in. 
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED
urg. this patch is bad. it uses AssignWithConversion, which should no longer be
used, and uses nsString as function argument.

It should use a more general type as the argument (nsAString seems appropriate).
as for
+        oCharset.AssignWithConversion("UTF-8"); 

it would be better written as:
oCharset.Assign(NS_LITERAL_STRING("UTF-8"));
which can also be faster, if the compiler supports doing the conversion at
compile time.
oh yeah, one more thing:
+ return oCharset.Length() > 0;

can be written as:
return !oCharset.IsEmpty();

which can be faster.

Comment 24

16 years ago
This checkin broke the OS/2 tinderbox on
http://tinderbox.mozilla.org/showbuilds.cgi?tree=SeaMonkey-Ports ... is anyone
looking at this ?

Comment 25

16 years ago
If you have a serious concern with the current fix, please
re-open this bug or file another one. 
reopening bug so that the issues from comment 22 and comment 23 can get addressed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Assignee)

Comment 27

16 years ago
bug 170339 filed to code cleanup. Close this bug.
Status: REOPENED → RESOLVED
Last Resolved: 16 years ago16 years ago
Resolution: --- → FIXED

Comment 28

16 years ago
Changed QA contact to ylong@netscape.com.
QA Contact: teruko → ylong

Comment 29

16 years ago
Verified it's fixed in 10-31 trunk build.  However it's still in 1.0.2 branch build.

I'm marking this as verified, if any one think it's important for branch build,
feel free to nominate it.
Status: RESOLVED → VERIFIED

Updated

16 years ago
Depends on: 180372

Updated

16 years ago
No longer blocks: 157673
You need to log in before you can comment on or make changes to this bug.