leefish.ch not display text as of Moz 1.5

Status: VERIFIED WORKSFORME
Product: Core
Component: DOM
Severity: major
Version: Other Branch
Keywords: intl
Reporter: Olaf Christoffel
Assignee: Unassigned
Attachments: 3

(Reporter)

Description

15 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6a) Gecko/20031030
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6a) Gecko/20031030

this page looks correct from netscape 6 to mozilla 1.4 final. since mozilla 1.5
part of the content (text) is missing.


Reproducible: Always

Expected Results:  
page should be rendered like mozilla 1.4 

the page is made with a cms that uses the ie contenteditable features. sites
generated with the cms work correctly in ie, mozilla < 1.5, opera 7.2 on win32.

very difficult to say what's wrong, but there seems to be something wrong :-(

Comment 1

15 years ago
confirm problem on Moz 1.5 final on Win XP Pro

confirm it works ok on IE 6.0 on WinXP Pro

See screenshots

- severity -> major
- clarify summary
Severity: normal → major
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: since mozilla 1.5 text content of this page is missing → leefish.ch not display text as of Moz 1.5

Comment 2

15 years ago
Created attachment 134847 [details]
Screenshot: Leefish on Moz 1.5 final

Comment 3

15 years ago
Created attachment 134848 [details]
Screenshot: Leefish on IE 6.0

Comment 4

This is a regression from bug 200984, looks like.  I can't find where the site
is calling unescape(), but there is clearly a character-encoding issue here --
if I go back to treating the string to be unescaped as ASCII, things work.  I
have no idea why this works in IE, given that it does NOT treat the string as
ASCII... but the site is doing a lot of UA-sniffing (e.g. passing the appname and
UA string to all subframes in the URL), so I would not be surprised if they send
IE and Mozilla different data...
(Reporter)

Comment 5

15 years ago
the text is unescaped because of javascript issues.

e.g.:
div.innerHTML = 'O'Brien is a Scottish name';
would generate a javascript error, so we need to do the following:
div.innerHTML = unescape('O%27Brien is a Scottish name');

browser sniffing is done one time (ok, reloading the page will do it again). the
content for all browsers is absolutely the same.

Comment 6

Well, here is the problem.  The long string with all sorts of URL-escapes that's
passed to unescape() contains a char that's not expressible as ISO-8859-1 (the
charset of the page).  When converted from Unicode to UTF-8, the byte sequence is:

S T R O N G % 3 E 0xe2 0x80 0xa6 F O R % 2 0 Y O U R ...

So we end up discarding the whole thing, since we can't do the right charset
conversions.  We could probably try to recover and knowingly produce bad data
and hope that it looks "about right", which is probably what IE does...
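
To make the byte sequence above concrete, here is a small sketch in present-day JavaScript (not Mozilla code). It shows that U+2026 HORIZONTAL ELLIPSIS encodes to 0xe2 0x80 0xa6 in UTF-8, and that it lies outside the U+0000-U+00FF range that ISO-8859-1 can express. The helper name `fitsInLatin1` is made up for illustration.

```javascript
// U+2026 HORIZONTAL ELLIPSIS, the character found in the page's string.
const ellipsis = '\u2026';

// UTF-8 encoding of the character, formatted as hex bytes.
const utf8Bytes = Array.from(new TextEncoder().encode(ellipsis),
                             b => '0x' + b.toString(16));

// ISO-8859-1 maps bytes 0x00-0xFF directly to U+0000-U+00FF, so any
// code point above 0xFF has no ISO-8859-1 representation.
function fitsInLatin1(str) {
  for (const ch of str) {
    if (ch.codePointAt(0) > 0xff) return false;
  }
  return true;
}

console.log(utf8Bytes.join(' '));      // 0xe2 0x80 0xa6
console.log(fitsInLatin1('<STRONG>')); // true
console.log(fitsInLatin1(ellipsis));   // false
```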

Comment 7

I don't think IE makes any distinction between ISO-8859-1 and windows-1252,
where the character (HORIZONTAL ELLIPSIS) does appear. If I override the
encoding to windows-1252 in Mozilla, the problem seems to go away.

Olaf, are you the page author or webmaster? If so, can you change the charset
declaration of the page to windows-1252?
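
The difference between the two charsets can be sketched as follows. This is only an illustration: the table is deliberately partial, and the function name `encodeCodePoint` is an assumption, not an API of any browser. windows-1252 assigns printable characters (including the ellipsis, at 0x85) to the 0x80-0x9F range, where ISO-8859-1 has only control codes.

```javascript
// A few of the printable characters windows-1252 assigns to bytes
// 0x80-0x9F. This table is deliberately partial.
const CP1252_EXTRAS = new Map([
  [0x20ac, 0x80], // EURO SIGN
  [0x2026, 0x85], // HORIZONTAL ELLIPSIS
  [0x2018, 0x91], // LEFT SINGLE QUOTATION MARK
  [0x2019, 0x92], // RIGHT SINGLE QUOTATION MARK
  [0x201c, 0x93], // LEFT DOUBLE QUOTATION MARK
  [0x201d, 0x94], // RIGHT DOUBLE QUOTATION MARK
]);

// Returns the single byte for a code point, or null if the charset
// (as modelled here) cannot express it.
function encodeCodePoint(cp, charset) {
  // Both charsets agree on 0x00-0x7F and 0xA0-0xFF.
  if (cp <= 0x7f || (cp >= 0xa0 && cp <= 0xff)) return cp;
  if (charset === 'iso-8859-1') {
    // 0x80-0x9F are C1 control codes; nothing above 0xFF exists.
    return cp <= 0xff ? cp : null;
  }
  if (charset === 'windows-1252') {
    return CP1252_EXTRAS.has(cp) ? CP1252_EXTRAS.get(cp) : null;
  }
  throw new Error('unsupported charset: ' + charset);
}

console.log(encodeCodePoint(0x2026, 'iso-8859-1'));   // null
console.log(encodeCodePoint(0x2026, 'windows-1252')); // 133 (0x85)
console.log(encodeCodePoint(0x30d7, 'windows-1252')); // null (Japanese)
```

The last line matches the reporter's later observation that Japanese text cannot be carried by windows-1252 either.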

(Reporter)

Comment 8

15 years ago
thanks Simon for your answer, we did some testing against the charsets.

ISO-8859-1:
Mozilla 1.1b to Mozilla 1.4 works
Mozilla 1.5 or greater does not work

UTF-8:
Mozilla 1.1b to Mozilla 1.4 has problems with ä,ü,ö and so on. Text is not
displaying.
Mozilla 1.5 or greater works.

Windows-1252:
All Mozilla versions work... but as you may have seen, leefish.ch also sells
fish in the Japanese market. Japanese characters are saved in escaped form
in the database and displayed by unescaping the characters. This does not
work with Windows-1252.

At the moment there is no test page for this online, but we will provide you
with a test case tomorrow.

With IE 5.1-6.0 the pages work, no matter what charset we use.

Tests were done on win2000 and winXP.

Comment 9

15 years ago
This may be irrelevant, but the page works fine in Opera 7 too.

Comment 10

> Japanese characters are saved in escaped form
> in the database and displayed through unescaping the characters.

Olaf, the problem is that unescaping produces _bytes_, not characters.  Those
unescaped bytes can only be interpreted as characters encoded in some encoding. 
At the moment, Mozilla assumes that encoding is the page encoding (since that
makes the most sense).  But the page encoding here is ISO-8859-1, and Japanese
characters can't possibly be encoded in that... so what encoding _are_ you using
exactly?  And how are we supposed to know that?

By the way, the problem you are seeing with UTF-8 in old Mozilla builds is
precisely what bug 200984 was about.
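
The point that percent-escapes denote bytes, and that the encoding decides which characters come out, can be seen in any current JavaScript engine. The sketch below is not a reproduction of Mozilla 1.5 (which used the page encoding); it only contrasts two fixed interpretations of the same three bytes. The `hex` helper is made up for display purposes.

```javascript
// The same three percent-escaped bytes, read two ways.
const escaped = '%E2%80%A6';

// decodeURIComponent() treats the bytes as UTF-8: one character, U+2026.
const asUtf8 = decodeURIComponent(escaped);

// unescape(), as specified in ECMAScript today, maps each %XX straight
// to the code unit with that value, which amounts to reading the bytes
// as ISO-8859-1: three characters.
const asLatin1 = unescape(escaped);

// Display helper: lists the code points of a string.
const hex = s => Array.from(
  s, c => 'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
).join(' ');

console.log(hex(asUtf8));   // U+2026
console.log(hex(asLatin1)); // U+00E2 U+0080 U+00A6
```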

Comment 11

15 years ago
>Comment #6
>So we end up discarding the whole thing, since we can't do the right charset
>conversions.  We could probably try to recover and knowingly produce bad data
>and hope that it looks "about right", which is probably what IE does...

Please understand that I want to use an encoding like iso-8859-1 as the
general encoding. All characters that do not fit in this
character set are available as Unicode escape sequences,
e.g. \u30D7\u30ED\u30B8\u30A7\u30AF\u30C8 (some Japanese characters).
Unescaping such a sequence and applying it as
innerHTML or as the value of an HTML input element produces strange results with
different versions of Mozilla builds.
Can anybody tell me why the behaviour changes from version to version?
This is a nightmare!


>Comment #10
>But the page encoding here is ISO-8859-1, and Japanese
>characters can't possibly be encoded in that... so what encoding _are_ you 
>using exactly?  And how are we supposed to know that?

We are using iso-8859-1 right now. 
I wonder, why windows-1252 and iso-8859-1 produce different results 
on ver. 1.1-1.4???

>By the way, the problem you are seeing with UTF-8 in old Mozilla builds is
>precisely what bug 200984 was about.

Using UTF-8 works fine with the latest builds (1.5+), but is not really an option
for us, since we are not working in a laboratory environment but dealing
with the real world, and a lot of users are using versions before 1.5!
All these people would be really upset if we changed to UTF-8.

Any suggestions that work for all versions 1.2 through 1.5+?

Comment 12

> e.g. \u30D7\u30ED\u30B8\u30A7\u30AF\u30C8 (some japanese characters)
> unescaping such a sequence

There is nothing there to unescape, is there?  That's not a URL-escaped string...

> can anybody tell me why the behaviour changes from version to version?

Because the pre-1.5 behavior was buggy with non-western (and possibly with just
non-ascii) chars.

> I wonder, why windows-1252 and iso-8859-1 produce different results
> on ver. 1.1-1.4???

Because they are different encodings?

> any suggestions that work for all version 1.2 through 1.5+?

You have to tell me the constraints for me to be able to answer this question...
as far as I can tell, you have the following constraints:

1)  The string you are passing to unescape() cannot always be represented as
    ISO-8859-1 (thus causing problems in 1.5).
2)  Something about Japanese characters.

Please clearly explain _exactly_ what issue #2 is (what exact Unicode string you
pass to unescape() and what exact unicode string you expect out and why).

Comment 13

15 years ago
Created attachment 134950 [details]
escape/unescape test case with characters in Java notation(\uxxxx)

Try the page with many different character encodings. The page is 'compatible'
with any ASCII-preserving encoding.

Characters in Java notation are converted to Unicode characters internally, so
escaping and unescaping them doesn't work if the current document
encoding can't represent them.

What MSIE does might be either
 A. not converting '\uxxxx' to Unicode characters internally until it
    is printed out
 or
 B. having escape/unescape convert characters unrepresentable in the current
    document encoding to Java notation (or some other representation)

If Mozilla 1.1-1.4 compatibility were not an issue, I would suggest using
UTF-8.

We (Mozilla) might add 'Java-style notation' to nsISaveAsCharset (if it's not
there yet) and use it in escape/unescape, but it wouldn't help Mozilla 1.1-1.4
users.

Comment 14

jshin, on a separate note, do you think we should handle conversion errors as I
mention in comment 6 (fill in 0xFFFD and go on or something)?  That's not so bad
going native-to-unicode, but the problem here is going unicode-to-native...

Comment 15

15 years ago
bz, when going from unicode to native (in GlobalWindowImpl::Unescape), we're
calling  ConvertCharset which returns null on coming across an unrepresentable
char. By using nsISaveAsCharset (with Java notation) instead of ConvertCharset,
we can avoid that. nsUnescape would leave \uxxxx alone. Then, after converting
back to Unicode, we have to replace \uxxxx with a PRUnichar corresponding to
\uxxxx. The last step is not pretty, but we can copy code from JS engine or call
it if it's public... 

What if there's literal '\uxxxx'......

Comment 16

nsISaveAsCharset needs to deal with that anyway, no?  Preferably by escaping '\'
as \uxxxx if that's the escaping method chosen...
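
The approach discussed in comments 15 and 16 can be sketched for an ISO-8859-1 page as below. This is a hedged illustration in JavaScript, not the Mozilla C++ code: the function names are invented, and the byte-to-character step is trivial only because ISO-8859-1 is assumed. Writing a literal backslash as \u005C is what keeps a literal '\uxxxx' in the input from being mistaken for an escape.

```javascript
// Formats a code unit in Java notation (\uXXXX, uppercase hex).
const toEscape = cu =>
  '\\u' + cu.toString(16).toUpperCase().padStart(4, '0');

// Step 1: Unicode -> "native" string. Code units that ISO-8859-1 cannot
// express become \uXXXX. A literal backslash is written as \u005C too,
// so every backslash left in the output starts a real escape.
function toNative(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const cu = str.charCodeAt(i);
    out += (cu > 0xff || cu === 0x5c) ? toEscape(cu) : str[i];
  }
  return out;
}

// Step 2: resolve %XX escapes. A decoded backslash is again written as
// \u005C to preserve the invariant from step 1.
function percentDecode(native) {
  return native.replace(/%([0-9A-Fa-f]{2})/g, (_, h) => {
    const byte = parseInt(h, 16);
    return byte === 0x5c ? toEscape(byte) : String.fromCharCode(byte);
  });
}

// Steps 3 and 4: for ISO-8859-1 the byte-to-character step is the
// identity, so only the \uXXXX escapes remain to be turned back.
function fromNative(native) {
  return native.replace(/\\u([0-9A-F]{4})/g,
                        (_, h) => String.fromCharCode(parseInt(h, 16)));
}

function unescapeSketch(str) {
  return fromNative(percentDecode(toNative(str)));
}

// The string from comment 6 now survives instead of being discarded.
console.log(unescapeSketch('%3CSTRONG%3E\u2026FOR%20YOU'));
// <STRONG>…FOR YOU
```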

Comment 17

15 years ago
OK to make things clearer, I prepared some test pages:
three encodings are available:
utf-8, windows-1252, and iso-8859-1

http://dev.leefish.ch/utftest.jsp
http://dev.leefish.ch/1252test.jsp
http://dev.leefish.ch/iso88591test.jsp

12 tests per page: (div and input)
the first six fields are not escaped, the second six are

in IE and Opera 7.02 on Win32 all three encodings display the correct characters
for the last six (unescaped) fields

Comment 18

15 years ago
Addendum for Comment 17:

Sorry folks:
Please use the test pages without dev

http://leefish.ch/utftest.jsp
http://leefish.ch/1252test.jsp
http://leefish.ch/iso88591test.jsp

Comment 19

Ok, so Jungshik's guess was right.  The problem is that \uxxxx escapes are
converted into unicode chars at JS compile time, so unescape('\uxxxx') can fail
in Mozilla 1.5 if the unicode char in question cannot be represented in the page
encoding...  It would succeed in Mozilla 1.4 or earlier, but at the cost of
corrupting the char (since it would "convert" the Unicode chars to bytes by
simply casting each 16-bit unsigned int to an 8-bit signed int, then
unescape %-escapes in the resulting bytes and convert the bytes back into chars
using the page encoding).
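
The corruption described for pre-1.5 builds can be imitated with a few lines of JavaScript. This is only a model of the narrowing step, using the sample string from comment 11; `truncateTo8Bits` is a made-up name.

```javascript
// By the time any function sees this string, the \uXXXX escapes have
// already become six UTF-16 code units.
const text = '\u30D7\u30ED\u30B8\u30A7\u30AF\u30C8';

// Emulates the lossy narrowing: keep only the low 8 bits of every
// 16-bit code unit.
function truncateTo8Bits(str) {
  return Array.from({ length: str.length },
                    (_, i) => str.charCodeAt(i) & 0xff);
}

const bytes = truncateTo8Bits(text);
console.log(text.length);                              // 6
console.log(bytes.map(b => b.toString(16)).join(' ')); // d7 ed b8 a7 af c8

// Read back as ISO-8859-1, the bytes are unrelated Latin-1 characters.
const garbage = String.fromCharCode(...bytes);
console.log(garbage === '\u00d7\u00ed\u00b8\u00a7\u00af\u00c8'); // true
```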

In short, doing Japanese this way through unescape() in Mozilla before 1.5
simply doesn't work (notice the garbage displayed in the last two fields of the
iso88591 and 1252 test by Mozilla 1.4).  It works in Mozilla 1.5 if the page
encoding can encode those chars (due to the fix for bug 200984).  If we make the
changes Jungshik proposes, we can make it work even if the page does not support
those chars.  Jungshik, could we do this in the 1.6b timeframe?  How much change
to the nsISaveAsCharset code would be needed?  It seems to already support
\uxxxx escapes (see
http://lxr.mozilla.org/seamonkey/source/intl/unicharutil/src/nsSaveAsCharset.cpp#335),
but has the problem you mentioned with literal \\uxxxx being present in the
string that would need addressing...

Thomas, I'm afraid I cannot offer you a solution that works in 1.2-1.5+ Mozilla
builds.  This is largely because there is simply no way to make the Japanese
chars work in pre-1.5 builds without using an encoding in which all the Japanese
chars involved are single-byte (such do not exist to my knowledge....).

You basically have two options:

1) Use UTF-8.  Then both Japanese and western non-ascii chars work in Mozilla
   1.5 and neither really works in pre-1.5 builds.
2) Use Windows-1252.  Then Japanese chars fail in all Mozilla builds and western
   chars work in all Mozilla builds, as you observed.

There is also "option three", which is to sniff the browser version and use
UTF-8 for 1.5+ and windows-1252 for pre-1.5, which just gives you broken
Japanese in pre-1.5 builds....  I don't know how your various markets compare
and hence which decision makes the most sense for you.  With any luck, things
should work correctly in 1.6 even if you choose option 2 above.

There is also the question of whether we want to try to land bug 200984 on the
1.4 branch.  I doubt that would really improve the situation, though.
(Reporter)

Comment 20

15 years ago
Many thanks for the information.

We can live with the windows-1252 charset for now, but it would be really
helpful if Mozilla rendered the Unicode chars correctly in the future.

I will change the bug to resolved as soon as Mozilla handles these chars correctly.

thomas
olaf

Comment 21

15 years ago
bz, I can't promise it will be done in the 1.6b timeframe. I'll give it a try but I
have to solve real-life issues as well :-).

As for nsISaveAsCharset, yes, it has '\uxxxx' already, but I guess it doesn't
escape literal '\' because it's only for output and doesn't care about
converting back. I wish nsISaveAsCharset::Convert had an out parameter
indicating whether any character was escaped, or how many chars were
escaped.

[somewhat OT, but related]
BTW, we might have to fix bug 44272 at the same time. If MS IE does the right
thing (with regard to the ECMAScript standard) and hasn't caused compatibility
problems, we should be able to do it without much worry. In that case, the last
step in comment #15 has to deal with '%uxxxx' as well as '\uxxxx'. An
alternative would be to add '%uxxxx' escaping to nsISaveAsCharset and deal with only
'%uxxxx' in GlobalWindowImpl::Unescape. This addition could also be used by
GlobalWindowImpl::Escape (to fix bug 44272).
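
For reference, the '%uxxxx' form is what the ECMAScript specification describes for escape() and unescape(), and current engines implement it. The lines below show that behaviour in a present-day engine; they say nothing about how the Mozilla patch was written. Note that a literal backslash comes out as %5C, so the literal-'\uxxxx' ambiguity does not arise with this notation.

```javascript
// escape() writes code units above 0xFF as %uXXXX ...
console.log(escape('\u30D7'));                 // %u30D7
// ... and unescape() turns them back into the character.
console.log(unescape('%u30D7') === '\u30D7');  // true

// A literal backslash is escaped as %5C, so text that merely looks
// like a \uXXXX escape survives the round trip unchanged.
const literal = '\\u30D7';                     // six characters: \ u 3 0 D 7
console.log(escape(literal));                  // %5Cu30D7
console.log(unescape(escape(literal)) === literal); // true
```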

Comment 22

15 years ago
I'm gonna fix this eventually. As for the leefish site, I came up with a solution
that should work across versions.

1. Use UTF-8

2. Instead of mixing '%xx' (for characters like the single quotation mark in O'Brien)
and '\uxxxx' for Japanese, always use '\uxxxx' for the ASCII characters you
currently url-escape as well as for Japanese characters.

Actually, if you use UTF-8, you can put literal Japanese characters. You have to
use '\uxxxx' notation only for url-unsafe ASCII characters. It should be very
easy to write a Java function for this on the server side, shouldn't it?

3. With that, you don't need to call |unescape| in your ECMAScript (JavaScript),
so you don't have to worry about the version/browser dependency of 'escape()'.
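
A minimal sketch of the escaping function suggested in point 2 follows. The comment proposes writing it in Java on the server; it is shown in JavaScript here only to stay with the language used elsewhere in this bug. The set of characters treated as unsafe and the name `toJsStringBody` are assumptions for illustration.

```javascript
// Characters that are awkward inside a quoted JS string embedded in
// HTML. The exact set is an assumption; adjust it to the template.
const UNSAFE = /[\\'"<>&\u0000-\u001f\u2028\u2029]/g;

// Hypothetical helper: returns text that can be pasted between quotes
// in generated script, with unsafe characters written as \uXXXX.
function toJsStringBody(text) {
  return text.replace(UNSAFE, ch =>
    '\\u' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const body = toJsStringBody("O'Brien is a Scottish name");
console.log(body); // O\u0027Brien is a Scottish name

// The generated statement would then read:
//   div.innerHTML = 'O\u0027Brien is a Scottish name';
// Evaluating that literal gives back the original text, with no
// call to unescape() involved.
const roundTrip = new Function("return '" + body + "';")();
console.log(roundTrip === "O'Brien is a Scottish name"); // true
```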
 
(Reporter)

Comment 23

15 years ago
> I'm gonna fix this eventually. As for leefish site, I came up with a solution
> that should work across versions.

Please do, as soon as you have dealt with the real-life issues :-)

> 1. Use UTF-8

Our customer does not have a UTF-8 database.

> 2. Instead of mixing '%xx' (for characters like single quotation as in O'Brien)
> and '\uxxxx' for Japanese, always use '\uxxxx' for ASCII characters you
> currently url-escape as well as for Japanese characters.

OK, that would be easy to do for the JavaScript char problems.

> Actually, if you use UTF-8, you can put literal Japanese characters. You have to
> use '\uxxxx' notation only for url-unsafe ASCII characters. It should be very
> easy to write a Java function for this on the server side, shouldn't it?

That's right :-)

> 3. With that, you don't need to call |unescape| in your ECMAscript(Javascript)
> so that you don't have to worry about the version/browser dependency of 'escape()'

We need to use a default charset (like windows-1252). If we escaped every
single non-ASCII char, the byte stream would be 6 times larger than it is now.

There is also an extranet behind the page which needs sorting and search
capabilities according to the region (e.g. regions/countries use different
sorting).

Mainly we use a default region charset (e.g. western style for Switzerland,
Shift-JIS for Japan) for the customer. Chars which cannot be displayed by this
charset are saved in escaped form (and then unescaped for display).

For now the customer (leefish) does not need Japanese characters (maybe in
3 months), so we can live with windows-1252 for now.

About 99% of the visitors use IE 5-6 and these browsers are doing OK, so Mozilla
is not mission critical.

We think that because IE and Opera handle this problem/issue/bug, Mozilla
should too. Maybe there are other people/companies around
the globe who would be happy if Mozilla could display it.


Comment 24

15 years ago
I fixed the problem by fixing bug 44272. I'm gonna upload a patch there in a minute.

Here is some clarification.

> mainly we use a default region charset (e.g western style for switzerland,
> shift-jis for japan) for the customer.

  Well, I'm afraid this is not so good an idea. I would use UTF-8 everywhere
from start to end. If your customer has a DB in a legacy character encoding,
that would be the only point at which you have to deal with legacy character
encodings. Once the data is in your system, why bother to deal with those things
of the past, especially considering that virtually all modern browsers have no
problem dealing with UTF-8? (That is, site visitors would never notice the
difference if you use 'lang' and 'xml:lang' to specify the language of the
content correctly.)
You may wish to visit http://www.w3.org/international where there are a couple
of FAQ items as to why use Unicode.


> Chars which cannot displayed by this
> charset are saved in escaped form (and then unescaping them for display)

You have a problem because you use 'unescape()'. If you don't use 'unescape()',
there's no problem. You do NOT have to use it as long as you use '\uxxxx' (Java
notation) for a small subset of ASCII characters.

> we need to use a default charset (like windows-1252) if we would escape every
> single non-ASCII char, the bytestream would be 6 times larger than it is now.

As for the size bloat, note that you do NOT have to use \uxxxx for Japanese
characters (or letters with diacritic marks for Western European languages) in pages
encoded in Shift_JIS/EUC-JP (respectively Windows-1252/ISO-8859-1) or UTF-8. \uxxxx
notation is only necessary in two cases: 1) to 'escape' characters that should not
appear in a JS string literal directly (a _small_ subset of ASCII characters); 2) to
represent characters OUTSIDE the character repertoire of the current page
encoding (that is, Japanese characters in Windows-1252/ISO-8859-1 or Latin
letters with diacritic marks in Shift_JIS/EUC-JP).
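
The two cases can be combined in one routine, sketched below under stated assumptions: the unsafe set is a minimal guess, the page encoding is modelled by a caller-supplied predicate, and `inLatin1` covers ISO-8859-1 only. All names are invented for illustration. Comparing the two outputs shows why escaping only what the page encoding lacks keeps the size bloat small.

```javascript
// Case 1: ASCII characters that must not appear raw in the literal
// (an assumed, minimal set).
const JS_UNSAFE = new Set(["'", '"', '\\', '\n', '\r']);

// Case 2 is delegated to a predicate that says whether the page
// encoding can express a given code unit.
const inLatin1 = cu => cu <= 0xff; // ISO-8859-1 repertoire
const inAscii = cu => cu <= 0x7f;  // i.e. "escape every non-ASCII char"

function escapeForPage(text, representable) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const cu = text.charCodeAt(i);
    const raw = !JS_UNSAFE.has(text[i]) && representable(cu);
    out += raw
      ? text[i]
      : '\\u' + cu.toString(16).toUpperCase().padStart(4, '0');
  }
  return out;
}

// "Zürich" followed by two katakana characters.
const sample = 'Z\u00FCrich \u30D7\u30ED';
const forLatin1Page = escapeForPage(sample, inLatin1);
const everythingEscaped = escapeForPage(sample, inAscii);

console.log(forLatin1Page);     // Zürich \u30D7\u30ED
console.log(everythingEscaped); // Z\u00FCrich \u30D7\u30ED
console.log(forLatin1Page.length, everythingEscaped.length); // 19 24
```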
Component: Layout → DOM Other
Depends on: 44272
Keywords: intl
OS: Windows XP → All
Hardware: PC → All
(Reporter)

Comment 25

15 years ago
> I fixed the problem by fixing bug 44272. I'm gonna upload a patch there in a minute.

I will test it tomorrow. Today I had a f... hard working day at a customer
and I'm too tired now for testing.

> Well, I'm afraid this is not so good an idea. I would use UTF-8 everywhere
> from the start to the end. If your customer has DB in legacy character encoding,
> that would be the only point at which you have to deal with legacy character
> encodings. Once into your system, why bother to deal with those things of the past
> especially considering that virtually all modern browsers have no problem
> dealing with UTF-8 (that is, site visitors would never notice the difference if
> you use 'lang' and 'xml:lang'  to specify the language of the content correctly)
> You may wish to visit http://www.w3.org/international where there are a couple
> of FAQ items as to why use Unicode.

You're right, but sometimes you do not have the choice. I will visit the site and
study charsets and encodings more. I must admit, I'm not a guru with
encodings and charsets.

Many thanks! The help from you and the Mozilla group blasts away any support
hotline from a normal company!!!

Comment 26

15 years ago
Thanks for your kind words and your willingness to support Mozilla. [OT] I wish
Korean web admins/designers were like you. They don't have a single bit of
interest in interoperability, platform/device independence, standards compliance
and universal accessibility (all at the very heart of the web and internet).

On bug 44272 (the fix for which has landed on the trunk), I am gonna 'lobby'
for applying the patch to the 1.4 branch.

Comment 27

15 years ago
Works now on the 1.6 trunk thanks to the patch for bug 44272.
It's not fixed on 1.5... are we gonna have a 1.5.1?
Maybe not. Not sure what to do here.
Version: Trunk → Other Branch
(Reporter)

Comment 28

15 years ago
with firebird nightly build 20031114 all test cases work for me :-)

http://leefish.ch/utftest.jsp
http://leefish.ch/1252test.jsp
http://leefish.ch/iso88591test.jsp

many thanks! 

will change to fixed when the bug is resolved in an "official" 1.x release

Status: NEW → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → WORKSFORME
(Reporter)

Comment 29

15 years ago
with moz 1.6b it works as it should :-)
many thanks
olaf
Status: RESOLVED → VERIFIED