incorrect characters with ISO-8859-15 character coding

VERIFIED INVALID

Status

Core
Internationalization
VERIFIED INVALID
17 years ago
16 years ago

People

(Reporter: Olli Männistö, Assigned: Frank Tang)

Tracking

({intl})

Trunk
Details

(URL)

Attachments

(1 attachment)

(Reporter)

Description

17 years ago
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.7+)
Gecko/20020116
BuildID:    2002011604

I started using the 8859-15 codepage when we went over to the Euro. I noticed that
sometimes single quotes are wrong. Changing the codepage to -1 corrects the problem, but
then the euro doesn't display properly. In the slashdot article, "we didn't
violate russian law" demonstrates the problem with codepage -15.

Reproducible: Always
Steps to Reproduce:
1.View->Character coding->Western (ISO-8859-15)
2.Load URL
3.Enjoy

Actual Results:  boxes and question marks replace single quotes

Expected Results:  single quotes

Updated

17 years ago
QA Contact: ruixu → ylong

Comment 1

17 years ago
I do see the single quotes shown as "?" on both Windows and Linux when
the charset is set to iso-8859-15.

But I don't see the euro sign in this page though.  So, it looks like the page will
be displayed fine when the charset is iso-8859-1. 
Well... the slashdot url is using quotes that are encoded in iso-8859-1, no?
(Reporter)

Comment 3

17 years ago
As far as I understand how things work, codepage -15 *is* codepage -1 plus the
euro character. At least that's what the linux installer said.

Anyway, check this European Central Bank URL:
http://www.euro.ecb.int/en/section.html

The paragraph starting "The new coins.." displays "minus" characters incorrectly
as well as the euro sign before 664 billion when you use codepage -15. If you
use -1, all's well.

Comment 4

17 years ago
URL: http://www.euro.ecb.int/en/section.html
with iso-8859-1 will display the euro sign fine and "The new coins -" (I cannot
tell whether it should be ".." instead of "-") on all platforms.  

Does charset iso-8859-15 replace some iso-8859-1 characters, or does it just add
more special characters on top of iso-8859-1?  I'm confirming it to get the engineers'
input.


Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: intl
OS: Windows 2000 → All
Hardware: PC → All
(Reporter)

Comment 6

17 years ago
Comparing the two character sets you can find in:
http://www.kostis.net/charsets/iso8859.1.htm
http://www.kostis.net/charsets/iso8859.15.htm

..It turns out -15 actually substitutes accented S, Z and OE characters into the
latin-1 set, as well as the euro sign. For whatever reason, mozilla cannot
display the accented chars properly with the -15 encoding, although it doesn't
have a problem with kostis' page.

However, I should not see the single quotes even in an ideal situation; it looks
like the Redhat installation docs were less than complete about the differences between
latin-9 and latin-1 ... I guess I'll go on using the Microsoft-hacked latin-1
character set.
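
To make the comparison of the two tables concrete, here is a small Python sketch. It assumes Python's built-in `iso-8859-1` and `iso-8859-15` codecs match the tables on the pages linked above; the function name `latin1_vs_latin9` is made up for illustration.

```python
def latin1_vs_latin9():
    """Map each byte value to (latin-1 char, latin-9 char) where the two differ."""
    diffs = {}
    for value in range(0x100):
        raw = bytes([value])
        as_latin1 = raw.decode("iso-8859-1")
        as_latin9 = raw.decode("iso-8859-15")
        if as_latin1 != as_latin9:
            diffs[value] = (as_latin1, as_latin9)
    return diffs


DIFFS = latin1_vs_latin9()

# Print code points rather than the characters themselves, so the output
# does not depend on the console encoding.
for value, (old, new) in sorted(DIFFS.items()):
    print(f"0x{value:02X}: U+{ord(old):04X} -> U+{ord(new):04X}")
```

With these codecs only eight byte values differ, all in the range 0xA0-0xBF; the single-quote bytes are not among them, which fits the observation that -15 is not where the curly quotes come from.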
(Assignee)

Comment 7

16 years ago
I cannot see any problem on the 2002012806 build. Can someone attach a screenshot
and tell me where the problem is?
(Assignee)

Comment 8

16 years ago
give to ftang
Assignee: yokoyama → ftang
(Reporter)

Comment 9

16 years ago
For seeing currency signs everywhere with codepage -15, it's a not-a-bug. 

For seeing question marks instead of extra currency chars, there's either a
problem in the font or maybe with mozilla font rendering? I did try a few w2k
fonts, both serif and sans serif, but I couldn't see the yen-char etc. that
codepage -15 says I should see. Just extra question marks.

See the slashdot url sentence "we didn't break russian law"
(Assignee)

Comment 10

16 years ago
OK, the problem is not that we support ISO-8859-15 incorrectly; the problem is how we
handle code points in the range 0x80-0x9f, which ISO-8859-15 does NOT define.
The "-" is encoded as either 0x96 or 0x97, which are defined in cp1252 but in neither
ISO-8859-1 nor ISO-8859-15. For ISO-8859-1, we use the cp1252 definition.
But for ISO-8859-15, we treat them as undefined characters. 
Status: NEW → ASSIGNED
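
The difference between the two interpretations of 0x80-0x9f can be sketched in Python. This is only an illustration using Python's own codecs, not Mozilla's decoder; the sample bytes are an assumption about how the slashdot sentence was encoded (0x92 for the apostrophe, plus the 0x96 and 0x97 dashes), and `describe` is a hypothetical helper.

```python
import unicodedata

# Assumed encoding of the sample sentence: 0x92, 0x96 and 0x97 are
# printable characters in cp1252 only.
SAMPLE = b"we didn\x92t violate russian law \x96 \x97"


def describe(data, charset):
    """Return (code point, Unicode category) for every non-ASCII character."""
    text = data.decode(charset)
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch))
        for ch in text
        if ord(ch) > 0x7F
    ]


AS_CP1252 = describe(SAMPLE, "cp1252")
AS_LATIN9 = describe(SAMPLE, "iso-8859-15")

print(AS_CP1252)  # quote and dashes as punctuation
print(AS_LATIN9)  # the same bytes as C1 control characters (category "Cc")
```

Under cp1252 the three bytes become U+2019, U+2013 and U+2014. Under Python's ISO-8859-15 codec they become the C1 controls U+0092, U+0096 and U+0097, which have no glyph, so a renderer has nothing sensible to draw for them.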

Comment 11

16 years ago
I've also found that when I use ISO-8859-15 I can't write accented letters (such as
"á"); I have to write them somewhere else and then copy&paste. It doesn't happen
when I use any other charset. However, I can write "ñ", so it's not my keyboard
settings, I suppose
(Assignee)

Comment 12

16 years ago
> rinzewind@wanadoo.es
which platform are you running?

Comment 13

16 years ago
I'm using Linux (Debian SID)

Comment 14

16 years ago
Forget about what I said... it was a misconfiguration in my font server...
nothing to worry about. Sorry :-(

Comment 15

16 years ago
Maybe the codepoints 0x80 to 0x9f should be handled as UNDEFINED for all
ISO-8859 character sets. I simply don't like using "poisoned variants" of those
charsets. If people want to use those Windows charsets, they should declare so
accordingly. Just as HTML character entity references &#128; to &#159; are
control characters, not curly quotes or euro signs.
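
The two readings of a numeric reference in the 128 to 159 range can be compared in Python. This is a sketch only: `chr()` gives the literal code point, while the standard library's `html.unescape` follows the HTML 5 rules, which remap this range to the cp1252 characters. It shows both behaviours side by side and does not say which one a browser of that era used.

```python
import html
import unicodedata

REFERENCE = "&#128;"

# Literal reading: code point 128 is U+0080, a C1 control character.
STRICT = chr(128)

# HTML 5 reading as implemented by html.unescape: remapped via cp1252.
LENIENT = html.unescape(REFERENCE)

print(f"strict:  U+{ord(STRICT):04X} ({unicodedata.category(STRICT)})")
print(f"lenient: U+{ord(LENIENT):04X} ({unicodedata.category(LENIENT)})")
```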

Comment 16

16 years ago
I'd like to second Marc's request. Please leave the ISO character sets as they
are. If you're inventing a mix of Windows-specific and ISO characters, users
might get confused by unexpected characters (as seen here), and you're
supporting the wrong setting of the Content-Type header (iso-.. instead of
windows-..). After all, that's what the Content-Type header is meant for:
defining which characters to display. You're trying to fix wrong content types
at the wrong end - a bit like Microsoft does.

In addition to that, you're getting a problem with forms: Although Windows
characters are displayed in ISO pages, you cannot enter them in a text field,
because at least some of them get converted to Unicode things. A 0x92 (a single
quote in Windows), for example, suddenly turns up as &#8217; in the posted data,
although the form was in ISO format and thus should never have contained such a
character (see http://bugzilla.mozilla.org/show_bug.cgi?id=139328 ).
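
The form symptom can be reproduced outside the browser. The sketch below assumes the curly quote is held internally as U+2019 and that characters the form's charset cannot represent are written as numeric character references; whether Mozilla does exactly this is an assumption, and `encode_for_form` is a hypothetical helper.

```python
CURLY_QUOTE = "\u2019"  # the character cp1252 stores at byte 0x92


def encode_for_form(text, charset):
    """Encode text, writing unrepresentable characters as &#NNNN; references."""
    return text.encode(charset, errors="xmlcharrefreplace")


TEXT = "didn" + CURLY_QUOTE + "t"

POSTED_LATIN9 = encode_for_form(TEXT, "iso-8859-15")
POSTED_CP1252 = encode_for_form(TEXT, "cp1252")

print(POSTED_LATIN9)
print(POSTED_CP1252)
```

ISO-8859-15 has no slot for U+2019, so the encoder falls back to a character reference, while cp1252 round-trips it to the single byte 0x92.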

Even if I'm wrong about this internal conversion and its relation to the mixed
display of Windows/ISO characters discussed here, this at least shows that the
way you've gone only causes confusion, displaying things where they shouldn't be.

Better do it the right way from the beginning on :)
(Assignee)

Comment 17

16 years ago
>------- Additional Comment #16 From Sönke Tesch  2002-04-23 04:48 -------
>
>I'd like to second Marc's request. Please leave the ISO character sets as they
>are. If you're inventing a mix of Windows-specific and ISO-characters users
>might get confused by unexpected characters (as seen here),
Sorry, too late to say that.
>Better do it the right way from the beginning on :)
Yeah, but this is not a "beginning". The "beginning" was in 1994. We are 8 years
past the "beginning". 

mark this bug as invalid
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → INVALID

Comment 18

16 years ago
Mark as verified according to above comments.
Status: RESOLVED → VERIFIED