Closed Bug 122455 Opened 23 years ago Closed 22 years ago

incorrect characters with ISO-8859-15 character coding

Categories

(Core :: Internationalization, defect)

defect
Not set
normal

Tracking

()

VERIFIED INVALID

People

(Reporter: ollittm, Assigned: ftang)

Details

(Keywords: intl)

Attachments

(1 file)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.7+)
Gecko/20020116
BuildID:    2002011604

I started using the 8859-15 codepage when we went over to the Euro. I noticed
that sometimes single quotes are wrong. Changing the codepage to -1 corrects the
problem, but then the euro doesn't display properly. In the slashdot article,
"we didn't violate russian law" demonstrates the problem with codepage -15.

Reproducible: Always
Steps to Reproduce:
1.View->Character coding->Western (ISO-8859-15)
2.Load URL
3.Enjoy

Actual Results:  boxes and question marks replace single quotes

Expected Results:  single quotes
QA Contact: ruixu → ylong
I do see that the single quotes are shown as "?" on both windows and linux when
the charset is set to iso-8859-15.

But I don't see the euro sign on this page.  So it looks like the page will
display fine when the charset is iso-8859-1.
Well... the slashdot url is using quotes that are encoded in iso-8859-1, no?
As far as I understand how things work, codepage -15 *is* codepage -1 plus the
euro character. At least that's what the linux installer said.

Anyways, check this european central bank URL:
http://www.euro.ecb.int/en/section.html

The paragraph starting "The new coins.." displays "minus" characters incorrectly
as well as the euro sign before 664 billion when you use codepage -15. If you
use -1, all's well.
URL: http://www.euro.ecb.int/en/section.html
with iso-8859-1 will display the euro sign fine and "The new coins -" (I cannot
tell whether it should be ".." instead of "-") on all platforms.

Does charset iso-8859-15 replace some iso-8859-1 characters or just add some
more special characters on top of iso-8859-1?  I'm confirming it to get the
engineers' input.


Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: intl
OS: Windows 2000 → All
Hardware: PC → All
Comparing the two character sets you can find in:
http://www.kostis.net/charsets/iso8859.1.htm
http://www.kostis.net/charsets/iso8859.15.htm

..It turns out -15 actually substitutes accented S, Z and OE characters into the
latin-1 set, as well as the euro sign. For whatever reason, mozilla cannot
display the accented chars properly with the -15 encoding, although it doesn't
have a problem with kostis' page.

However, I should not see the single quotes even in the ideal situation; it
looks like the Redhat installation docs were less than complete about the
differences between latin-9 and latin-1 ... I guess I'll go on using the
Microsoft-hacked latin-1 character set.
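
The substitutions described in the comment above can be checked with a short
sketch. It assumes Python's built-in `iso-8859-1` and `iso-8859-15` codecs as
a stand-in for the two standards; it says nothing about Mozilla's own
converter tables, and the names `latin1_vs_latin9` and `DIFFS` are made up
for this illustration.

```python
# Compare how Python's built-in codecs decode every byte value under
# ISO-8859-1 and ISO-8859-15. These are Python's tables, not Mozilla's.

def latin1_vs_latin9():
    """Return {byte: (latin-1 char, latin-9 char)} for bytes that differ."""
    diffs = {}
    for b in range(0x100):
        raw = bytes([b])
        c1 = raw.decode("iso-8859-1")
        c15 = raw.decode("iso-8859-15")
        if c1 != c15:
            diffs[b] = (c1, c15)
    return diffs


DIFFS = latin1_vs_latin9()

if __name__ == "__main__":
    for b, (c1, c15) in sorted(DIFFS.items()):
        print(f"0x{b:02X}: latin-1 U+{ord(c1):04X} -> latin-9 U+{ord(c15):04X}")
```

With these codecs only eight positions differ (0xA4, 0xA6, 0xA8, 0xB4, 0xB8,
0xBC, 0xBD, 0xBE), all of them above 0x9F. Neither set puts a curly single
quote anywhere, which fits the observation that the quotes should not show up
under either encoding.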
I cannot see any problem on the 2002012806 build. Can someone attach a
screenshot and tell me where the problem is?
Give to ftang.
Assignee: yokoyama → ftang
Seeing currency signs everywhere with codepage -15 is not a bug.

As for seeing question marks instead of the extra currency chars, there's either
a problem in the font or maybe with mozilla font rendering? I did try a few w2k
fonts, both serif and sans serif, but I couldn't see the yen-char etc. that
codepage -15 says I should see. Just extra question marks.

See the sentence "we didn't break russian law" at the slashdot url.
OK, the problem is not that we support ISO-8859-15 incorrectly; the problem is
how we handle code points in the range 0x80-0x9f, which ISO-8859-15 does NOT
define. The "-" characters are encoded as either 0x96 or 0x97, which are defined
in cp1252 but in neither ISO-8859-1 nor ISO-8859-15. For ISO-8859-1, we use the
cp1252 definition. But for ISO-8859-15, we treat them as undefined characters.
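
A minimal sketch of the distinction drawn in the comment above, again using
Python's built-in codecs as an assumption rather than Mozilla's converters.
Note that Python's strict `iso-8859-1` codec does not apply the cp1252
fallback the comment describes, so the `cp1252` line stands in for what the
browser does with pages labelled ISO-8859-1. The helper `code_points` is
hypothetical.

```python
# Bytes 0x92, 0x96 and 0x97 decoded by three of Python's codecs.
SAMPLE = bytes([0x92, 0x96, 0x97])


def code_points(raw, encoding):
    """Decode raw bytes and list the resulting code points as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in raw.decode(encoding)]


# cp1252 assigns printable punctuation to these bytes.
CP1252 = code_points(SAMPLE, "cp1252")
# The ISO sets leave 0x80-0x9F to C1 control characters.
LATIN1 = code_points(SAMPLE, "iso-8859-1")
LATIN9 = code_points(SAMPLE, "iso-8859-15")

if __name__ == "__main__":
    print("cp1252     ", CP1252)
    print("iso-8859-1 ", LATIN1)
    print("iso-8859-15", LATIN9)
```

Under cp1252 the three bytes come out as U+2019 (right single quote), U+2013
and U+2014 (the two dashes); under both ISO sets they stay at U+0092, U+0096
and U+0097, which have no glyph and so tend to render as boxes or question
marks.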
Status: NEW → ASSIGNED
I've also found that when I use ISO-8859-15 I can't type accented letters (such
as "á"); I have to type them somewhere else and then copy&paste. It doesn't
happen when I use any other charset. However, I can type "ñ", so it's not my
keyboard settings, I suppose.
> rinzewind@wanadoo.es
which platform are you running?
I'm using Linux (Debian SID)
Forget about what I said... it was a misconfiguration in my font server...
nothing to worry. Sorry :-(
Maybe the codepoints 0x80 to 0x9f should be handled as UNDEFINED for all
ISO-8859 character sets. I simply don't like using "poisoned variants" of those
charsets. If people want to use those Windows charsets, they should declare them
accordingly. Likewise, the HTML numeric character references &#128; to &#159;
are control characters, not curly quotes or euro signs.
I'd like to second Marc's request. Please leave the ISO character sets as they
are. If you're inventing a mix of Windows-specific and ISO-characters, users
might get confused by unexpected characters (as seen here), and you're
supporting the wrong setting of the Content-Type header (iso-.. instead of
windows-..). After all, that's what the Content-Type header is meant for:
defining which characters to display. You're trying to fix wrong content types
at the wrong end - a bit like Microsoft does.

In addition to that, you're getting a problem with forms: Although windows
characters are displayed in ISO pages, you cannot enter them in a text field,
because at least some of them get converted to Unicode things. A 0x92 character
(a single quote in Windows), for example, suddenly turns up as ’ (U+2019) in the
posted data, although the form was in ISO format and thus should never have
contained such a character (see
http://bugzilla.mozilla.org/show_bug.cgi?id=139328 ).
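
The form problem above comes down to U+2019 having no byte in the ISO sets. The
following is a rough sketch of that, assuming Python's codecs; it is not a
claim about how Mozilla encodes form submissions, and the numeric-reference
fallback shown is only one possible way an encoder might cope. The names
`QUOTE`, `encodable`, `AS_CP1252`, `IN_LATIN9` and `FALLBACK` are
illustrative.

```python
# U+2019 is what the cp1252 byte 0x92 decodes to.
QUOTE = "\u2019"


def encodable(text, encoding):
    """Return True if every character of text has a byte in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Round-trips cleanly when the page really is windows-1252.
AS_CP1252 = QUOTE.encode("cp1252")
# ISO-8859-15 has no slot for the character at all.
IN_LATIN9 = encodable(QUOTE, "iso-8859-15")
# One possible fallback: emit a numeric character reference instead.
FALLBACK = QUOTE.encode("iso-8859-15", errors="xmlcharrefreplace")

if __name__ == "__main__":
    print(AS_CP1252, IN_LATIN9, FALLBACK)
```

So once a 0x92 byte has been displayed as a curly quote, sending it back in an
ISO-labelled form means either an error, a substitution, or some escaped
representation; it cannot go back out as a plain ISO-8859-15 byte.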

Even if I'm wrong about this internal conversion and its relation to the mixed
display of Windows/ISO-characters discussed here, this at least shows that the
way you've gone only causes confusion, displaying things where they shouldn't be.

Better do it the right way from the beginning on :)
>------- Additional Comment #16 From Sönke Tesch  2002-04-23 04:48 -------
>
>I'd like to second Marc's request. Please leave the ISO character sets as they
>are. If you're inventing a mix of Windows-specific and ISO-characters users
>might get confused by unexpected characters (as seen here),
Sorry, it's too late to say that.
>Better do it the right way from the beginning on :)
Yes, but this is not the "beginning". The "beginning" was in 1994. We are now 8
years past the "beginning".

mark this bug as invalid
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → INVALID
Mark as verified according to above comments.
Status: RESOLVED → VERIFIED