Closed Bug 372325 Opened 18 years ago Closed 18 years ago

&#151; under UTF-8 is not a &mdash;

Status:

RESOLVED WONTFIX

People

(Reporter: davygrvy, Unassigned)

David Gravereaux

Reporter

Description

18 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a3pre) Gecko/20070228 Minefield/3.0a3pre
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a3pre) Gecko/20070228 Minefield/3.0a3pre

The character represented by the entity &#151; is not \u2014 (—) when the claimed encoding is UTF-8. There is no glyph there. I think it is nice that FF is "fixing" Windows-1252 misrepresentations, but by doing so, aren't you perpetuating the problem? Please give me the empty square box instead so I can -RFC police- the offenders :)

Reproducible: Always

Steps to Reproduce:
1. Go to http://news.yahoo.com/
2. Look at almost any news article for the 'em dash' glyph.
3. Open the source and see it referenced as &#151;, not &mdash;.
4. Scratch head trying to find why 'Windows-1252' leaked into UTF-8.

Actual Results:
There's no glyph for &#151; under UTF-8, yet an em dash is displayed.

Expected Results:
Show me an empty square box there. Mozilla should be accurate. Be the RFC police! Do it well.
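The report hinges on two readings of the same reference. A minimal Python sketch, assuming only decimal references matter: the strict reading treats the number as a Unicode code point (151 is U+0097, a C1 control with no glyph), while the lenient reading reinterprets 128-159 as Windows-1252 bytes (0x97 becomes U+2014 EM DASH). The function names are hypothetical and not taken from Mozilla's code.

```python
import re

# Matches a decimal numeric character reference such as "&#151;".
DECIMAL_REF = re.compile(r"&#([0-9]+);")


def _code_point(n: int) -> str:
    """Return the character for n, or U+FFFD if n is outside the Unicode range."""
    return chr(n) if n <= 0x10FFFF else "\ufffd"


def decode_strict(text: str) -> str:
    """Strict reading: the number is always a Unicode code point."""
    return DECIMAL_REF.sub(lambda m: _code_point(int(m.group(1))), text)


def decode_cp1252_remap(text: str) -> str:
    """Lenient reading: numbers 128-159 are reinterpreted as Windows-1252 bytes."""

    def repl(m):
        n = int(m.group(1))
        if 0x80 <= n <= 0x9F:
            try:
                return bytes([n]).decode("cp1252")
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D have no Windows-1252 mapping
                return chr(n)
        return _code_point(n)

    return DECIMAL_REF.sub(repl, text)


print(f"U+{ord(decode_strict('&#151;')):04X}")        # U+0097
print(f"U+{ord(decode_cp1252_remap('&#151;')):04X}")  # U+2014
```

The strict version is what the reporter is asking for; the remapping version mirrors the behaviour he observes.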

David Gravereaux

Reporter

Comment 1

18 years ago

The web authorship issue of this is described @ http://www.cs.tut.fi/~jkorpela/www/windows-chars.html
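On the authoring side, the usual repair is to reference the character that was actually intended. A minimal Python sketch, assuming the author meant Windows-1252 whenever a decimal reference falls in 128-159; `fix_cp1252_refs` is a hypothetical helper, not part of any tool mentioned in this bug.

```python
import re

# Matches a decimal numeric character reference such as "&#151;".
DECIMAL_REF = re.compile(r"&#([0-9]+);")


def fix_cp1252_refs(html_text: str) -> str:
    """Rewrite &#128;..&#159; as references to the Windows-1252 characters they stand for."""

    def repl(m):
        n = int(m.group(1))
        if 0x80 <= n <= 0x9F:
            try:
                intended = bytes([n]).decode("cp1252")
            except UnicodeDecodeError:
                return m.group(0)  # unassigned in Windows-1252: leave it untouched
            return f"&#{ord(intended)};"
        return m.group(0)  # already a proper Unicode reference

    return DECIMAL_REF.sub(repl, html_text)


print(fix_cp1252_refs("news&#151;today"))  # news&#8212;today
```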

David Gravereaux

Reporter

Comment 2

18 years ago

Odd Windows-1252 leakage into ISO-8859-1 is shown at http://old.no/charmap/iso-8859-1.html. I'm seeing glyphs in the forbidden zones. Yes, — is displayed along with others. Notice I added the offending char (&#151;) to this form just to see the behavior.
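The "forbidden zones" are easiest to see byte by byte. A minimal Python sketch using the standard `latin-1` and `cp1252` codecs: ISO-8859-1 maps every byte in 0x80-0x9F to the C1 control with the same value, while Windows-1252 puts printable characters there (and leaves five bytes unassigned). `c1_table` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple


def c1_table() -> Dict[int, Tuple[int, Optional[int]]]:
    """Map each byte 0x80-0x9F to (ISO-8859-1 code point, Windows-1252 code point or None)."""
    table = {}
    for b in range(0x80, 0xA0):
        raw = bytes([b])
        # ISO-8859-1 maps every byte to the code point with the same value (a C1 control here)
        latin1 = ord(raw.decode("latin-1"))
        try:
            cp1252 = ord(raw.decode("cp1252"))  # e.g. 0x97 -> U+2014 EM DASH
        except UnicodeDecodeError:
            cp1252 = None  # byte is unassigned in Windows-1252
        table[b] = (latin1, cp1252)
    return table


for byte, (latin1, cp1252) in c1_table().items():
    shown = "unassigned" if cp1252 is None else f"U+{cp1252:04X}"
    print(f"0x{byte:02X}: ISO-8859-1 U+{latin1:04X}, Windows-1252 {shown}")
```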

Justin Kerk

Comment 3

18 years ago

Comment #2 is bug 288904, and a separate issue.

David Gravereaux

Reporter

Comment 4

18 years ago

The good thing, anyway, is that it comes back properly as \u2014 on this page, because the form entry interpreted it correctly as cp1252. I hear from other people that the problem might be related to fonts under Windows: Win32 might be going underneath Mozilla and implicitly adding those glyphs as a "favor", which would require the code to be strict and exclude that range. Umm... add code to remove them, thanks.
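A strict renderer would leave those code points alone and make them visible instead of quietly substituting a dash. A minimal Python sketch, assuming "visible" means replacing each C1 control (U+0080-U+009F) with U+FFFD REPLACEMENT CHARACTER and reporting where it occurred; `flag_c1_controls` is hypothetical and says nothing about how Mozilla draws missing glyphs.

```python
from typing import List, Tuple


def flag_c1_controls(text: str) -> Tuple[str, List[int]]:
    """Replace C1 controls (U+0080..U+009F) with U+FFFD and report their offsets."""
    offsets = [i for i, ch in enumerate(text) if 0x80 <= ord(ch) <= 0x9F]
    cleaned = "".join("\ufffd" if 0x80 <= ord(ch) <= 0x9F else ch for ch in text)
    return cleaned, offsets


# A strictly decoded &#151; (U+0097) is flagged; a real em dash (U+2014) is untouched.
print(flag_c1_controls("a\x97b"))
print(flag_c1_controls("a\u2014b"))
```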

Simon Montagu :smontagu

Comment 5

18 years ago

(In reply to comment #0)
> The character represented by the entity &#151; is not \u2014 (—) when the
> claimed encoding is UTF-8. There is no glyph there.

This is true but doesn't go far enough: "when the claimed encoding is UTF-8" is redundant because numeric entities should *always* represent Unicode code points, whatever the encoding of the document they appear in.

However, I fear we are stuck with this incorrect behaviour for the sake of backward compatibility. I really don't think we want to start evangelizing http://news.yahoo.com and all the other authors who (ab)use entities in this way.
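Why the document encoding is irrelevant can be checked directly: the reference is plain ASCII, so its bytes are identical in UTF-8 and ISO-8859-1, and only the number inside it names a character. As for the compatibility behaviour, the HTML5 parsing rules later codified the same Windows-1252 remapping for references in 128-159, and Python's `html.unescape`, which follows those rules, reproduces it. A minimal sketch:

```python
import html

ref = "&#151;"

# The reference is pure ASCII, so it encodes to the same bytes in either document encoding.
assert ref.encode("utf-8") == ref.encode("iso-8859-1")

strict = chr(151)            # the code point the number names: U+0097
compat = html.unescape(ref)  # HTML5-style parsing remaps 128-159 through Windows-1252

print(f"U+{ord(strict):04X} U+{ord(compat):04X}")  # U+0097 U+2014
```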

Status: UNCONFIRMED → RESOLVED

Closed: 18 years ago

Resolution: --- → WONTFIX

David Gravereaux

Reporter

Comment 6

18 years ago

> This is true but doesn't go far enough: "when the claimed encoding is UTF-8" is
> redundant because numeric entities should *always* represent Unicode codepoints
> whatever the encoding of the document they appear in.

Yes, I looked it up earlier tonight and found that numeric entities always refer to Unicode code points.

> However I fear we are stuck with this incorrect behavior for the sake of
> backward compatibility.

I say make a statement for the better and fix the cruft.

David Gravereaux

Reporter

Comment 7

18 years ago

Is there an about:config setting I can add so I get strict ISO-8859-1 decoding instead of the current broken one?

Simon Montagu :smontagu

Comment 8

18 years ago

No, but if you file a bug asking for one I will consider adding it ;-)

David Gravereaux

Reporter

Comment 9

18 years ago

you rock :)

Categories

(Firefox :: General, defect)

Security

(public)
