Closed Bug 333292 Opened 19 years ago Closed 19 years ago

renders invalid latin1 characters as if from windows character set

Categories

(Firefox :: General, defect)

Other
Linux
defect
Not set
normal

RESOLVED DUPLICATE of bug 288904

People

(Reporter: Matijs.van.Zuijlen, Unassigned)

Details

User-Agent:       Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.8.0.1) Gecko/20060313 Debian/1.5.dfsg+1.5.0.1-4 Firefox/1.5.0.1
Build Identifier: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.8.0.1) Gecko/20060313 Debian/1.5.dfsg+1.5.0.1-4 Firefox/1.5.0.1

Upon loading the example page, the character set is identified as "Western (ISO 8859-1)". However, the page contains the characters (hex) 92, 93, 94 and 97, which are invalid in that encoding. They are nevertheless rendered as single and double quotes and em-dashes, as in the Windows-1252 encoding.
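
To make the difference concrete, here is a minimal Python sketch that decodes the four bytes named above under both encodings. It relies only on Python's standard `iso-8859-1` and `cp1252` codecs; the variable names are illustrative, and the byte sequence is assembled by hand rather than taken from the actual page.

```python
# The four byte values mentioned in the report, assembled by hand.
raw = bytes([0x92, 0x93, 0x94, 0x97])

# Python's iso-8859-1 codec maps each byte to the code point of the same
# value, so these come out as C1 control characters, not punctuation.
as_latin1 = raw.decode("iso-8859-1")

# The cp1252 (Windows-1252) codec maps the same bytes to printable
# punctuation: curly quotes and an em dash.
as_cp1252 = raw.decode("cp1252")

print([f"U+{ord(c):04X}" for c in as_latin1])
print([f"U+{ord(c):04X}" for c in as_cp1252])
```

The second list contains U+2019, U+201C, U+201D and U+2014, which are the quotes and dash the browser displays even though it reports the page as ISO 8859-1.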

Reproducible: Always

Steps to Reproduce:
1. Load example page and examine the reported character set
Actual Results:  
Character set is identified as ISO-8859-1, invalid characters display as if valid.

Expected Results:  
Either the character set is 'corrected' to Windows-1252, or the characters are identified as invalid (using "?").
The site won't resolve for me. Are you sure that's the right URL?

*** This bug has been marked as a duplicate of 288904 ***
Status: UNCONFIRMED → RESOLVED
Closed: 19 years ago
Resolution: --- → DUPLICATE
(In reply to comment #1)
> The site won't resolve for me. Are you sure that's the right URL?

Yes, it still resolves fine here.
(In reply to comment #2)
> *** This bug has been marked as a duplicate of 288904 ***

But that one has been marked RESOLVED INVALID because no-one verified that it was still valid. I'm loath to reopen 333292 immediately, however. Maybe someone with the power can reopen 288904 instead?
The page loads for me now (must've been a temporary blockage somewhere), but its character set is UTF-8, apparently. It renders with the question marks referred to in your expected results.
(In reply to comment #5)
> The page loads for me now (must've been a temporary blockage somewhere), but
> its character set is UTF-8, apparently. It renders with the question marks
> referred to in your expected results.

Could it be that UTF-8 is your default character set?
It was marked invalid because the auto-resolve message made the reporter (probably correctly) believe that nobody who is capable of "fixing" it wants to. For every person who wants to see Windows-1252 characters not display in something that's otherwise ISO-8859-1, there are several thousand who do not, all of whom filed duplicate bugs about Euro signs not displaying years ago. Now that we have several hundred million more users, they're probably even less likely to want to fix it. And with the only workable fix, make Windows-1252 the default for Western European versions, you would still see this page without "?" unless you had changed your default encoding.
(In reply to comment #7)
> And with the only workable fix, make Windows-1252 the
> default for Western European versions, you would still see this page without
> "?" unless you had changed your default encoding.

I'm not looking to break those pages and make millions of people unhappy. It is the combination of reporting ISO 8859-1 _and_ showing the 'offending' characters that I object to. Indeed, in my original report, I put 'correct the character set to Windows-1252' as one of the possible expected results. Would that not be a better and equally workable fix? It's like an autodetect that's on by default.
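
As a rough illustration of the 'correct the character set' idea, the following Python sketch picks the label to report for a page. The function name `reported_charset` and the list of label aliases are hypothetical, chosen for this example only; this is not how Firefox or any other browser actually implements charset handling.

```python
def reported_charset(declared: str, body: bytes) -> str:
    """Return the charset label to show the user (hypothetical helper)."""
    # Illustrative, incomplete set of aliases for ISO 8859-1.
    latin1_labels = {"iso-8859-1", "iso8859-1", "latin1"}
    if declared.strip().lower() not in latin1_labels:
        return declared
    # Bytes 0x80-0x9F are where ISO 8859-1 and Windows-1252 differ.
    # If any are present, report the encoding actually used for rendering.
    if any(0x80 <= b <= 0x9F for b in body):
        return "windows-1252"
    return declared


print(reported_charset("ISO-8859-1", b"it\x92s"))   # contains byte 0x92
print(reported_charset("ISO-8859-1", b"caf\xe9"))   # plain Latin-1 text
```

Under this assumption, pages that stay within real ISO 8859-1 keep their declared label, and only pages containing bytes in the 0x80-0x9F range are relabelled, so nothing that renders today would break.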

Something like the solution suggested in https://bugzilla.mozilla.org/show_bug.cgi?id=288904#c10 seems like a good one as well.

Note that I'm trying to find a good solution to this problem, so this can also be implemented in browsers like Epiphany, which currently shows garbage where the quotes and dashes should be. Call it 'encoding quirks best practice'.