Mozilla mangles javascript: URLs with non-ASCII

VERIFIED DUPLICATE of bug 161479

Status

Core
Internationalization
16 years ago
16 years ago

People

(Reporter: Stanislav Malyshev, Assigned: Roy Yokoyama)

Tracking

({intl, testcase})

Trunk
x86
Linux
intl, testcase

Attachments

(1 attachment)

(Reporter)

Description

16 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020826
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020826

When a javascript: URL contains non-ASCII text, the text reaches the function
mangled (probably as UTF-8?). However, when the same call is made from inside
JavaScript, the parameter arrives unmangled. I guess it should be consistent (and
preferably the latter way). See also the example. May be related to bug #51355.

Reproducible: Always

Steps to Reproduce:
See the example. Click the link and then the button, and compare the results.
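
For illustration only, here is a rough sketch of the kind of mangling described above. It assumes (this is a guess, not something taken from the attached testcase) that the URL text was converted to UTF-8 bytes and each byte was then handed to the function as a separate character. The helper names are hypothetical, and the sketch relies on the Node.js Buffer global rather than on anything in Mozilla.

```javascript
// Hypothetical illustration, not Mozilla code: encode the text as UTF-8,
// then misread every resulting byte as one Latin-1 character.
function misreadUtf8AsLatin1(text) {
  return Buffer.from(text, 'utf8').toString('latin1');
}

// Reverse the misreading: turn each character back into a byte,
// then decode the byte sequence as UTF-8.
function undoMisread(mangled) {
  return Buffer.from(mangled, 'latin1').toString('utf8');
}

const original = 'привет';                       // six Cyrillic letters
const mangled = misreadUtf8AsLatin1(original);   // what the function would receive
const restored = undoMisread(mangled);           // equals the original again
```

Each Cyrillic letter takes two bytes in UTF-8, so under this assumption the six-letter string arrives as twelve unrelated characters, while the same string passed from inside a script arrives intact.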
(Reporter)

Comment 1

16 years ago
Created attachment 101054 [details]
Testcase for the bug (see this in KOI8-R charset)

Clicking on button and link gives different results.

Comment 2

16 years ago
wfm using build 2002092808 on Win2k (trunk).
Keywords: testcase

Comment 3

16 years ago
it also works fine here on Linux build 20020927 (trunk).

Comment 4

16 years ago
Well, it doesn't work for me either (1.1, Mac OS X). MacRoman garbage comes
instead of Cyrillic characters. The extremely bad Cyrillic rendering on Mac OS X
may be fixed, but this is really a JavaScript error. Javascript may not contain
unescaped non-ASCII strings unless natively encoded in a Unicode encoding.
Escape the characters as \uXXXX, where XXXX is the Unicode code point for the
character. Or save your page in UTF-8 or UTF-16.
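
A minimal sketch of the \uXXXX escaping suggested above. The helper name escapeNonAscii is made up for this example; it rewrites every non-ASCII UTF-16 code unit as an escape, so the resulting text can sit in a script or javascript: URL regardless of the page charset.

```javascript
// Hypothetical helper: replace each non-ASCII UTF-16 code unit
// with its \uXXXX escape (four uppercase hex digits).
function escapeNonAscii(text) {
  return text.replace(/[^\x00-\x7F]/g, function (ch) {
    const hex = ch.charCodeAt(0).toString(16).toUpperCase();
    return '\\u' + hex.padStart(4, '0');
  });
}

const escaped = escapeNonAscii('Привет');
// escaped is the ASCII-only text \u041F\u0440\u0438\u0432\u0435\u0442
```

Pasting the escaped text into a string literal, for example alert("\u041F\u0440\u0438\u0432\u0435\u0442"), yields the original Cyrillic string at run time.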
(Reporter)

Updated

16 years ago
Attachment #101054 - Attachment description: Testcase for the bug → Testcase for the bug (see this in KOI8-R charset)
(Reporter)

Comment 5

16 years ago
Theoretically maybe it is right (though I don't think it is - tell me one
programmer's editor that can do UTF8 and one shell/grep/wc/sed that can grok
such files). Practically, however, there are thousands of pages containing
non-Latin-1 characters in Javascript. Breaking all these pages is not wise. BTW,
MSIE handles this just fine - so Mozilla could do it too.

Comment 6

16 years ago
BBEdit or Pepper for Mac OS X. grep/wc/cat etc. work in whatever encoding you
prefer, provided it is supported by your OS (Mac OS X supports and actually uses
Unicode natively).

However, the question on what kind of JavaScript Mozilla supports really merits
an official answer. There have been numerous bugs on the subject, and no one
really knows what goes. On one hand, we have the ECMAScript compliance, and on
the other legacy code on the net. How is Mozilla to determine what is what? By
the deprecated language attribute? If the HTML is advertised as standards
compliant (with a valid DTD), the JavaScript is ECMA compliant, otherwise not?
All major browsers have declared ECMA compliance for many years now.

I have found that escaping non-ASCII characters is the best solution to make
scripts work in any browser (except the first JavaScript-enabled beasts),
although nowadays I will use nothing but raw UTF-16 (only Mozilla will handle
this... effectively shutting out all the lame browsers out there :)

Actually, one method that would probably also work is to specify the charset for
the script, e.g. <script charset="koi8-r"> or whatever. This could be done once
and for all in a meta http-equiv tag or specifically for each script tag.

Comment 7

16 years ago
Can you try again with a later build (1.2a) or even the latest nightly build?

Updated

16 years ago
Keywords: intl
QA Contact: ruixu → ylong

Comment 8

16 years ago
It's WFM with the test case.

I think this has nothing to do with JavaScript - the test page doesn't have any
charset info. So it will be displayed garbled if auto-detect is OFF (the default),
but if you set auto-detect to Universal, the page will be displayed properly,
and the charset is marked as Cyrillic (KOI8-R). You can also get correct display by
manually setting the charset to a Cyrillic one.

Comment 9

16 years ago
Stanislav is right; we were treating stuff as UTF8 when we should not have been.
It's been fixed for a while in the nightlies.

*** This bug has been marked as a duplicate of 161479 ***
Status: UNCONFIRMED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → DUPLICATE

Comment 10

16 years ago
Mark as verified.
Status: RESOLVED → VERIFIED