Closed Bug 161479 Opened 22 years ago Closed 22 years ago

[trunk]JavaScript string always converted to UTF-8 inside a Windows-1251 page

Categories

(Core :: Internationalization, defect)

x86
Windows 98
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.2alpha

People

(Reporter: Maniac, Assigned: nhottanscp)

References

Details

(Keywords: intl, regression)

Attachments

(4 files, 2 obsolete files)

In a document written in Windows-1251, strings that appear as JavaScript string
constants are converted to UTF-8, unlike the rest of the page.
Attached file a testcase
1. Save the testcase to local disk (so that <META HTTP-EQUIV> takes effect).
2. Look at the source: the strings in the HREF attribute and between <A></A>
should look identical (though unreadable to some, because they are in
Russian).
3. Open the testcase in the browser and click on the link.
4. Compare the two strings: the one in the alert window looks as if it was
converted to UTF-8.
WFM 2002072204 Linux. The alert looks identical to both the source and the
displayed text in the browser window (less font size differences, of course).
Keywords: intl
QA Contact: ruixu → ylong
It seems to work for me too; I don't see much difference in how the Russian
characters are displayed between the HTML file and the alert window.

Reporter: could you please attach a screen shot for the problem? thanks!
Attached image screenshot
I'm seeing this in Linux build 2002-08-05-08...  both locally and from
bugzilla.
On my system (win98) it looks pretty much like on Boris' screenshot. Sorry, I
forgot to specify my build id: 2002080108
Attached image screenshot on Win98
Hmm, I don't have any problem on my WinXP or on WinME, although on WinME
the characters in the alert window look wider than they do in the HTML file.

I tried it on Linux RH7.2; the result is similar to WinME, except that the
last character "y" is missing in the alert window.

But I didn't see the garbage display shown in the two screenshots above in either case.

I'm going to confirm it now so that it gets more investigation.
Status: UNCONFIRMED → NEW
Ever confirmed: true
On Windows XP, build 1.0 2002053012 doesn't show this problem; the testcase
works fine. On build 2002081409 running on the same computer, the testcase
alert produces garbage.
I saw the garbage display in the 08-15 trunk build but not in the branch build / WinME.
Summary: JavaScript string always converted to UTF-8 inside a Windows-1251 page → [trunk]JavaScript string always converted to UTF-8 inside a Windows-1251 page
Blocks: 162958
Can this be related to the renaming of nsISupportsWString / nsISupportsString
in bug 157624? I had other JavaScript problems with strings after that was
checked in.
Duplicate bugs keep coming in. Nominating as nsbeta1.
Keywords: nsbeta1
*** Bug 166368 has been marked as a duplicate of this bug. ***
The problem is definitely with JavaScript string constants in the href attribute
of the <a> tag. I mean, the problem is simply with the href attribute, but the
only way to run into it is by using inline JavaScript functions.
Adding regression. And nominating for 1.2alpha.
Keywords: regression
BTW, on Win98 2002090208 the testcase now refuses to work :-(. It ignores the
JavaScript: URL in <a href=...> and just reloads the same page instead.

Should I file a new bug or am I missing something?
Target Milestone: --- → mozilla1.2alpha
Oh... Another 'BTW'.
This bug looks very similar to bug 147991, which was filed earlier. Maybe we
should resolve this as a duplicate?
Followup to comment #15:

The testcase malfunction can be traced in the status bar where 

javascript:window.alert('...');

looks like 

javascript:///window.alert('...');

But! This appears only if the window.alert parameter is written in Cyrillic
letters. If I change this to, say,

javascript:window.alert('Test');

then the popup appears instantly on clicking the link.
Because this bug has more comments and a testcase, it is worth marking bug 147991 as the dup.
It seems like there's weird stuff happening in
nsJSProtocolHandler::EnsureUTF8Spec.  I'm seeing it called some of the time
with the following input (using \uNN for nonprintable characters):

Input spec (charset=windows-1251):
JavaScript:window.alert('\uD0\u9F\uD0\uBE\uD1\u87\uD0\uB5\uD0\uBC\uD1\u83 UTF-8?');

which leads it to the first early return.

However, sometimes it's called with the input:

Input spec (charset=windows-1251):
javascript:window.alert('%D0%9F%D0%BE%D1%87%D0%B5%D0%BC%D1%83 UTF-8?');

which leads it to run all the way through and produce the doubly-escaped output:

javascript:window.alert('%D0%A0%D1%9F%D0%A0%D1%95%D0%A1%E2%80%A1%D0%A0%C2%B5%D0%A0%D1%98%D0%A1%D1%93 UTF-8?');

which ends up being used.


From a low-level point of view, the percent-escaped input should have had
|aCharset| as UTF-8, not windows-1251.  That said, all this conversion strikes
me as awfully messy.
Reassign to nhotta.
Assignee: yokoyama → nhotta
> From a low-level point of view, the percent-escaped input should have had
> |aCharset| as UTF-8, not windows-1251.  That said, all this conversion strikes
> me as awfully messy.

This is actually about the non-escaped case. The current code assumes the URI
is UTF-8 if it is not percent-escaped, while many existing documents use raw
8-bit data as the URI without escaping. In order to support them, we assume
UTF-8 only if the string is valid UTF-8, and otherwise use the document
charset. As for the conversion, there is a utility function I can use, so it
can be simplified.
Status: NEW → ASSIGNED
*** Bug 162958 has been marked as a duplicate of this bug. ***
>This is actually about the non-escaped case.
This was actually wrong; the data is escaped, as dbaron mentioned. The current
patch checks whether the URI is UTF-8 regardless of whether it is escaped, so
it fixes the problem.
I will take a look at the patch again tomorrow then ask for reviews.
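The check described in the comments above (treat the URI as UTF-8 only if it really is UTF-8, escaped or not, and otherwise use the document charset) can be sketched as follows. This is an illustrative Python approximation, not the attached patch: decode_spec is a hypothetical name, and unescaping the whole spec up front is a simplification made for the example.

```python
from urllib.parse import unquote_to_bytes

def decode_spec(spec: bytes, doc_charset: str) -> str:
    """Hypothetical helper: decode a URI spec, preferring UTF-8 when it fits."""
    # After unescaping, percent-escaped and raw 8-bit input look the same.
    raw = unquote_to_bytes(spec)
    try:
        return raw.decode("utf-8")        # valid UTF-8: take it as UTF-8
    except UnicodeDecodeError:
        return raw.decode(doc_charset)    # otherwise use the document charset

# The same alert text, once as escaped UTF-8 and once as raw windows-1251 bytes.
text = "javascript:window.alert('\u041f\u043e\u0447\u0435\u043c\u0443 UTF-8?');"
escaped_utf8 = b"javascript:window.alert('%D0%9F%D0%BE%D1%87%D0%B5%D0%BC%D1%83 UTF-8?');"
raw_1251 = text.encode("cp1251")

print(decode_spec(escaped_utf8, "cp1251") == text)
print(decode_spec(raw_1251, "cp1251") == text)
```

Both inputs decode to the same string. The approach is a heuristic: a short 8-bit string in the document charset can happen to be valid UTF-8, in which case it would be misread.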
Drivers are interested in this for 1.2a, so adding it to the 1.2 dependency list.
Blocks: 1.2a
Attachment #98048 - Attachment is obsolete: true
Add "HZ" to the stateful charset set and then mark it as r=shanjian.
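The patch is not quoted in this report, so the following is an assumption about why stateful 7-bit charsets such as HZ need special treatment: their encoded bytes are plain ASCII, so an "is this valid UTF-8?" test always succeeds and would never fall back to the document charset. A minimal Python sketch, with the set contents and the name pick_charset chosen only for illustration:

```python
# Assumed, illustrative set of stateful 7-bit charsets (Python codec names).
STATEFUL_7BIT = {"hz", "iso2022_jp"}

def pick_charset(raw: bytes, doc_charset: str) -> str:
    """Hypothetical helper: choose the charset used to decode raw URI bytes."""
    if doc_charset in STATEFUL_7BIT:
        return doc_charset            # ASCII-only bytes would always pass as UTF-8
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return doc_charset

chinese = "\u4f60\u597d"
hz_bytes = chinese.encode("hz")

print(all(b < 0x80 for b in hz_bytes))               # HZ output is 7-bit only
print(hz_bytes.decode("utf-8") == chinese)           # decodes as UTF-8, but to the wrong text
print(hz_bytes.decode(pick_charset(hz_bytes, "hz")) == chinese)
```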
Attachment #98157 - Attachment is obsolete: true
Comment on attachment 98214
Added HZ as 7bit encoding

r=shanjian
Attachment #98214 - Flags: review+
Comment on attachment 98214
Added HZ as 7bit encoding

sr=jst
Attachment #98214 - Flags: superreview+
Depends on: 166996
This seems reasonable for now... once bug 166996 is fixed, we should revisit
this code, though....
Comment on attachment 98214
Added HZ as 7bit encoding

a=dbaron for trunk checkin
Attachment #98214 - Flags: approval+
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Verified fixed in 09-10 trunk build on windows and linux.
Status: RESOLVED → VERIFIED
*** Bug 171521 has been marked as a duplicate of this bug. ***