122455 - incorrect characters with ISO-8859-15 character coding

Reporter

Description

•

23 years ago

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.7+)
Gecko/20020116
BuildID:    2002011604

I started using the ISO-8859-15 codepage when we went over to the euro. I noticed that
single quotes are sometimes wrong. Changing the codepage to -1 corrects the problem, but
then the euro doesn't display properly. In the Slashdot article, the phrase "we didn't
violate russian law" demonstrates the problem with codepage -15.

Reproducible: Always
Steps to Reproduce:
1. View -> Character coding -> Western (ISO-8859-15)
2. Load URL
3. Enjoy

Actual Results:  boxes and question marks replace single quotes

Expected Results:  single quotes

Rui Xu

Updated

•

23 years ago

QA Contact: ruixu → ylong

Yuying Long

Comment 1

•

23 years ago

I do see the single quotes rendered as "?" on both Windows and Linux when the
charset is set to iso-8859-15.

But I don't see a euro sign on this page. So it looks like the page will
display fine when the charset is iso-8859-1.

Boris Zbarsky [:bzbarsky]

Comment 2

•

23 years ago

Well... the slashdot url is using quotes that are encoded in iso-8859-1, no?

Olli Männistö

Reporter

Comment 3

•

23 years ago

As far as I understand how things work, codepage -15 *is* codepage -1 plus the
euro character. At least that's what the linux installer said.

Anyway, check this European Central Bank URL:
http://www.euro.ecb.int/en/section.html

The paragraph starting "The new coins.." displays "minus" characters incorrectly
as well as the euro sign before 664 billion when you use codepage -15. If you
use -1, all's well.

Yuying Long

Comment 4

•

23 years ago

URL: http://www.euro.ecb.int/en/section.html
With iso-8859-1, the page displays the euro sign fine and "The new coins -" (I cannot
tell whether it should be ".." instead of "-") on all platforms.

Does charset iso-8859-15 replace some iso-8859-1 characters, or does it just add some
more special characters on top of iso-8859-1? I'm confirming this bug to get the
engineers' input.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Keywords: intl

OS: Windows 2000 → All

Hardware: PC → All

Yuying Long

Comment 5

•

23 years ago

Attached file: saved page http://www.euro.ecb.int/en/section.html

Olli Männistö

Reporter

Comment 6

•

23 years ago

Comparing the two character sets, which you can find at:
http://www.kostis.net/charsets/iso8859.1.htm
http://www.kostis.net/charsets/iso8859.15.htm

It turns out -15 actually substitutes accented S, Z and OE characters, as well as
the euro sign, into the latin-1 set. For whatever reason, Mozilla cannot
display the accented chars properly with the -15 encoding, although it doesn't
have a problem with kostis' page.

However, I should not see the single quotes even in the ideal situation. It looks
like the Red Hat installation docs were less than complete about the differences between
latin-9 and latin-1 ... I guess I'll go on using the Microsoft-hacked latin-1
character set.
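
The difference between the two sets described above can be checked mechanically. The following is a minimal sketch that assumes Python's built-in "iso-8859-1" and "iso-8859-15" codecs match the published tables; the helper name differing_bytes is made up for illustration and is not part of Mozilla.

```python
import unicodedata


def differing_bytes(a="iso-8859-1", b="iso-8859-15"):
    """Return {byte: (char_in_a, char_in_b)} for bytes the two charsets decode differently.

    Only the range 0xA0-0xFF is scanned, since both ISO charsets define
    printable characters only there and share ASCII below 0x80.
    """
    diffs = {}
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        char_a, char_b = raw.decode(a), raw.decode(b)
        if char_a != char_b:
            diffs[value] = (char_a, char_b)
    return diffs


if __name__ == "__main__":
    # Print Unicode names rather than the characters themselves so the
    # output is readable on any terminal encoding.
    for value, (latin1, latin9) in differing_bytes().items():
        print(f"0x{value:02X}: {unicodedata.name(latin1)} -> {unicodedata.name(latin9)}")
```

Under that assumption, the script lists every position where latin-9 replaces a latin-1 character, so the claim "-15 is -1 plus the euro" can be compared against the actual tables.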

Frank Tang

Assignee

Comment 7

•

23 years ago

I cannot see any problem on the 2002012806 build. Can someone attach a screenshot
and tell me where the problem is?

Frank Tang

Assignee

Comment 8

•

23 years ago

give to ftang

Assignee: yokoyama → ftang

Olli Männistö

Reporter

Comment 9

•

23 years ago

Seeing currency signs everywhere with codepage -15 is not a bug.

As for seeing question marks instead of the extra currency chars, there's either a
problem in the font or maybe with Mozilla's font rendering? I did try a few W2K
fonts, both serif and sans serif, but I couldn't see the yen char etc. that
codepage -15 says I should see. Just extra question marks.

See the sentence "we didn't break russian law" at the Slashdot URL.

Frank Tang

Assignee

Comment 10

•

23 years ago

OK, the problem is not that we support ISO-8859-15 incorrectly. The problem is how we
handle code points in the range 0x80-0x9f, which ISO-8859-15 does NOT define.
The "-" characters are encoded as either 0x96 or 0x97, which are defined in cp1252 but in
neither ISO-8859-1 nor ISO-8859-15. For ISO-8859-1, we use the cp1252 definition.
But for ISO-8859-15, we treat them as undefined characters.
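
The two behaviours described above can be sketched as follows. This is only an illustration under stated assumptions, not Mozilla's actual decoder: the function name decode_with_c1_fallback and its fallback parameter are hypothetical, and the code only handles single-byte charsets.

```python
from typing import Optional


def decode_with_c1_fallback(data: bytes, charset: str,
                            fallback: Optional[str] = "cp1252") -> str:
    """Decode single-byte text, treating bytes 0x80-0x9F specially.

    With a fallback charset, those bytes are decoded using it (the
    behaviour described for ISO-8859-1). With fallback=None they are
    treated as undefined and shown as "?" (the behaviour described
    for ISO-8859-15).
    """
    out = []
    for value in data:
        raw = bytes([value])
        if 0x80 <= value <= 0x9F:
            if fallback is None:
                out.append("?")
            else:
                # cp1252 leaves a few bytes in this range unassigned;
                # errors="replace" turns those into U+FFFD.
                out.append(raw.decode(fallback, errors="replace"))
        else:
            out.append(raw.decode(charset))
    return "".join(out)


if __name__ == "__main__":
    sample = b"we didn\x92t violate russian law"
    print(ascii(decode_with_c1_fallback(sample, "iso-8859-15", fallback=None)))
    print(ascii(decode_with_c1_fallback(sample, "iso-8859-15")))
```

With fallback=None the 0x92 byte comes out as "?", matching the reported symptom; with the cp1252 fallback it becomes U+2019, the right single quotation mark.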

Status: NEW → ASSIGNED

rinzewind

Comment 11

•

23 years ago

I've also found that when I use ISO-8859-15 I can't type accented letters (such as
"á"); I have to write them somewhere else and then copy & paste. It doesn't happen
when I use any other charset. However, I can type "ñ", so it's not my keyboard
settings, I suppose.

Frank Tang

Assignee

Comment 12

•

23 years ago

> rinzewind@wanadoo.es
which platform are you running?

rinzewind

Comment 13

•

23 years ago

I'm using Linux (Debian SID)

rinzewind

Comment 14

•

22 years ago

Forget about what I said... it was a misconfiguration in my font server...
nothing to worry about. Sorry :-(

Account Deleted

Comment 15

•

22 years ago

Maybe the codepoints 0x80 to 0x9f should be handled as UNDEFINED for all
ISO-8859 character sets. I simply don't like using "poisoned variants" of those
charsets. If people want to use those Windows charsets, they should declare so
accordingly, just as the HTML numeric character references &#128; to &#159; refer to
control characters, not curly quotes or euro signs.

Sönke Tesch

Comment 16

•

22 years ago

I'd like to second Marc's request. Please leave the ISO character sets as they
are. If you're inventing a mix of Windows-specific and ISO-characters users
might get confused by unexpected characters (as seen here), and you're
supporting the wrong setting of the Content-Type header (iso-.. instead of
windows-..). After all, that's what the Content-Type header is meant for,
defining which characters to display. You're trying to fix wrong content types
at the wrong end - a bit like Microsoft does.

In addition to that, you're getting a problem with forms: although Windows
characters are displayed in ISO pages, you cannot enter them in a text field,
because at least some of them get converted to Unicode characters. A 0x92 (a single
quote in Windows), for example, suddenly turns up as &#8217; in the posted data,
although the form was in ISO format and thus should never have contained such a
character (see http://bugzilla.mozilla.org/show_bug.cgi?id=139328 ).
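
A rough sketch of why &#8217; can show up in posted data: once a 0x92 byte has been displayed as the Windows single quote, it is the Unicode character U+2019, which ISO-8859-1 cannot represent. The example below assumes the browser falls back to a numeric character reference in that case, as the comment describes; the helper name encode_form_value is hypothetical and this is not Mozilla's form submission code.

```python
def encode_form_value(text: str, charset: str) -> bytes:
    """Encode a form value in the page's charset.

    Characters the charset cannot represent are written as numeric
    character references (&#NNNN;) instead of raising an error.
    """
    return text.encode(charset, errors="xmlcharrefreplace")


if __name__ == "__main__":
    # 0x92 displayed with the cp1252 interpretation becomes U+2019.
    displayed = b"didn\x92t".decode("cp1252")

    # ISO-8859-1 has no U+2019, so a reference appears in the posted bytes.
    print(encode_form_value(displayed, "iso-8859-1"))

    # A page declared as windows-1252 round-trips the original byte.
    print(encode_form_value(displayed, "cp1252"))
```

If the page were declared as windows-1252 in the first place, the same value would round-trip to the original 0x92 byte, which is the point the comment makes about declaring the right Content-Type.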

Even if I'm wrong about this internal conversion and its relation to the mixed
display of Windows/ISO characters discussed here, this at least shows that the
way you've gone only causes confusion, displaying things where they shouldn't be.

Better do it the right way from the beginning on :)

Frank Tang

Assignee

Comment 17

•

22 years ago

>------- Additional Comment #16 From Sönke Tesch  2002-04-23 04:48 -------
>
>I'd like to second Marc's request. Please leave the ISO character sets as they
>are. If you're inventing a mix of Windows-specific and ISO-characters users
>might get confused by unexpected characters (as seen here),
Sorry, it's too late to say that.
>Better do it the right way from the beginning on :)
Yeah, but this is not the "beginning". The "beginning" was in 1994. We are 8 years
past the "beginning".

Marking this bug as invalid.

Status: ASSIGNED → RESOLVED

Closed: 22 years ago

Resolution: --- → INVALID

Yuying Long

Comment 18

•

22 years ago

Marking as verified according to the above comments.

Status: RESOLVED → VERIFIED