846936 - Non-ASCII characters not displayed correctly in encoding IBM850 (and possibly others?)

Reporter

Description

12 years ago

Attached image URL above displays correctly in an older Mozilla

A friend reported that Firefox does not correctly display non-ASCII characters in some pages. I checked in Firefox 18.0.2 and everything was OK. Firefox then updated itself to v19 and began to fail as reported. To reproduce, try any page you know is encoded with charset IBM850. I haven't checked other encodings yet.
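A minimal sketch of the symptom, not a description of Firefox internals. It assumes that a browser which does not recognize the IBM850 label falls back to windows-1252 (that fallback is an assumption made for illustration). Python's `cp850` and `cp1252` codecs stand in for the two decoders, and the sample string is arbitrary.

```python
# Sketch only: Python codecs stand in for the browser's decoders.
# Assumption: an unrecognized IBM850 label leads to a windows-1252 fallback.
text = "Fernández Díaz"

raw = text.encode("cp850")      # bytes as an IBM850-encoded page would carry them
expected = raw.decode("cp850")  # decoded with the declared encoding
garbled = raw.decode("cp1252")  # decoded with the assumed fallback encoding

print(repr(expected))
print(repr(garbled))
```

In this sample the ASCII letters survive, while the two accented letters turn into a no-break space and an inverted exclamation mark, which is consistent with only non-ASCII characters being affected.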

Alfredo Fernández Díaz

Reporter

Updated

12 years ago

Component: General → HTML: Parser

Product: Firefox → Core

Target Milestone: Firefox 19 → ---

Loic

Comment 1

12 years ago

Regression range:
m-c
good=2012-11-08
bad=2012-11-09
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=36e99ea02c05&tochange=90cea19e27e2

Suspected bug:
Bug 801402 - Use EncodingUtils::FindEncodingForLabel instead of nsCharsetAlias::GetPreferred from HTML5 parser and DOM APIs

Blocks: 801402

Status: UNCONFIRMED → NEW

status-firefox19: --- → affected

tracking-firefox20: --- → ?

tracking-firefox21: --- → ?

tracking-firefox22: --- → ?

Ever confirmed: true

Keywords: regression

Masatoshi Kimura [:emk]

Comment 2

12 years ago

Chrome doesn't display the URL correctly, either.

Alfredo Fernández Díaz

Reporter

Comment 3

12 years ago

(In reply to Masatoshi Kimura [:emk] from comment #2)
> Chrome doesn't display the URL correctly, either.

Yes, I knew about that all along. I started to mention it but it got lost when switching from basic to advanced bug reporting and then I forgot -- after all this isn't «BugChromilla», is it?

Henri Sivonen (:hsivonen)

Comment 4

12 years ago

(In reply to Alfredo Fernández Díaz from comment #3)
> (In reply to Masatoshi Kimura [:emk] from comment #2)
> > Chrome doesn't display the URL correctly, either.
>
> Yes, I knew about that all along. I started to mention it but it got lost
> when switching from basic to advanced bug reporting and then I forgot --
> after all this isn't «BugChromilla», is it?

The relevance of the page not working in Chrome is that the page was already not working across browsers. In other words, the page was already broken regardless of Firefox removing support for IBM850.

The removal of IBM850 was intentional. Since Chrome has had market success without support, it was inferred that the Web doesn't depend on that particular encoding. Clearly, there's now one counter example.

Note that IBM850 was a legacy code page even when the Web was introduced. Microsoft was already replacing it with windows-1252 at that time. It's highly unusual for a Web site to use IBM850.

Reporter, is the site maintained by you?

Henri Sivonen (:hsivonen)

Comment 5

12 years ago

In Safari, the encoding is supported but not listed in the menu.

Henri Sivonen (:hsivonen)

Comment 6

12 years ago

For reference, we dropped these:
armscii-8
IBM850
IBM852
IBM855
IBM857
IBM862
IBM864
ISO-2022-CN
ISO-8859-12
ISO-IR-111
T.61-8bit
VISCII
x-euc-tw
x-johab
x-mac-arabic
x-mac-ce
x-mac-croatian
x-mac-devanagari
x-mac-farsi
x-mac-greek
x-mac-gujarati
x-mac-gurmukhi
x-mac-hebrew
x-mac-icelandic
x-mac-romanian
x-mac-turkish
x-viet-tcvn5712
x-viet-vps

Lukas Blakk [:lsblakk] use ?needinfo

Comment 7

12 years ago

Triage comment: will wait on response to the question in comment 4 - it's not clear how many people/sites would be impacted here but this looks like it is a tech evangelism and not a release blocking issue.

Alfredo Fernández Díaz

Reporter

Comment 8

12 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #4)
...
> The relevance of the page not working in Chrome is that the page was already
> not working across browsers. In other words, the page was already broken
> regardless of Firefox removing support for IBM850.
>
> The removal of IBM850 was intentional. Since Chrome has had market success
> without support, it was inferred that the Web doesn't depend on that
> particular encoding.

In other words, Chrome trusts its own momentum not to support anything they don't feel like and they've gotten away with it so far, so we thought we'd do the same.

If a page isn't tweaked and re-tweaked so it looks the same across browsers, then it is broken? I don't think so, and I remember a time when people at Mozilla would have agreed. A page not being properly rendered when the right encoding is explicitly set doesn't mean the page is broken, it means *the browser is broken*, no matter what it's called or how you spin it.

As for how relevant a particular page is, the question misses the point -- removing support for something and not even mentioning it in the release notes (a tiny but key detail) is a matter of principles.

> Clearly, there's now one counter example.
>
> Note that IBM850 was a legacy code page even when the Web was introduced.
> Microsoft was already replacing it with windows-1252 at that time. It's
> highly unusual for a Web site to use IBM850.

IBM850 and 437 are the defaults for pretty much any Windows text mode session.

> Reporter, is the site maintained by you?

Yes, so it's not a problem for me to re-encode the stuff there. However, it will be the first time in history I've tweaked a website to accommodate 'standards-compliant' browsers and not Internet Explorer. The customer and owner gave me an irate call complaining how it was possible that a 'standards advocate' had made something that only looked right in IE. Good for laughs :)

Alfredo Fernández Díaz

Reporter

Comment 9

12 years ago

(In reply to Lukas Blakk [:lsblakk] from comment #7)
> Triage comment: will wait on response to the question in comment 4 - it's
> not clear how many people/sites would be impacted here but this looks like
> it is a tech evangelism and not a release blocking issue.

Certainly not, especially if we keep in mind the issue arose post-release. No impact as far as I am concerned. Still, from here, it seems a bad move to drop previously working code -- but as I said I think it's a matter of principles, so if I were you I wouldn't wait for me either.

I'm not sure who's preaching what Gospel, but you're right making a biblical reference, for I could say scales just fell from my eyes, and now I can see :)

P.S. I forgot to thank Henri for the reference of newly unsupported encodings.

Lukas Blakk [:lsblakk] use ?needinfo

Comment 10

12 years ago

Since the site in question can work around this and we have not seen much impact on other sites, not tracking this for release. If those circumstances change we can revisit.

Henri Sivonen (:hsivonen)

Comment 11

12 years ago

(In reply to Alfredo Fernández Díaz from comment #8)
> Yes, so it's not a problem for me to re-encode the stuff there.
...
> No impact as far as I am concerned.

Thank you.

> Still, from here, it seems a bad move to
> drop previously working code

That depends on how much the code is in actual use. It is possible that you are the only (or almost only) person in the world who authored a Web page in an old DOS encoding and knew how to declare it. I did a large number of Bugzilla searches for words like IBM, DOS, charset, character, encoding, code page, accented, umlaut, arabic, turkish, croatian, gujarati, gurmukhi, devanagari, hindi, indian, hebrew, VISCII, vietnamese, serbian, macedonian, bulgarian, french, german, spanish, armscii, armenian, etc. and found no other bugs filed about desupporting the encodings listed in comment 6.

> The customer and owner gave me an irate call complaining how it was possible
> that a 'standards advocate' had made something that only looked right in IE.
> Good for laughs :)

I'm sorry that this caused a problem with your customer. For standards advocacy, I recommend advocating the use of UTF-8.
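For anyone in the same position, the re-encoding workaround discussed above can be sketched roughly as follows. This is an illustration under stated assumptions, not a tool anyone in this bug used: the helper name `reencode_page` is hypothetical, the page is assumed to declare its encoding with a `charset=IBM850` label somewhere in the markup, and Python's `cp850` codec is used as the IBM850 decoder.

```python
import re


def reencode_page(data: bytes) -> bytes:
    """Convert an IBM850-encoded page to UTF-8 (hypothetical helper)."""
    # Decode the legacy bytes using Python's cp850 codec.
    text = data.decode("cp850")
    # Assumption: the encoding is declared as charset=IBM850 in the markup.
    # Rewrite that label so the declaration matches the new bytes.
    text = re.sub(
        r"""(charset\s*=\s*["']?)ibm850""",
        r"\1utf-8",
        text,
        flags=re.IGNORECASE,
    )
    return text.encode("utf-8")


# Made-up sample page; 0xA0 and 0xA1 are "á" and "í" in IBM850.
sample = b'<meta charset="IBM850"><p>Fern\xa0ndez D\xa1az</p>'
converted = reencode_page(sample)
print(converted.decode("utf-8"))
```

If the encoding is declared in an HTTP `Content-Type` header rather than in the markup, that header would need the same change on the server side.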

Anne (:annevk)

Comment 12

12 years ago

Alfredo, for what it's worth, the standard Gecko is trying to follow here is http://encoding.spec.whatwg.org/ which is indeed not completely compatible with previous deployments of Gecko, but we believe following it will be better for the health of the web long term. Now granted, in developing that document not all the trade offs might have been correct so any feedback you have is definitely appreciated.

Alfredo Fernández Díaz

Reporter

Comment 13

12 years ago

(In reply to Anne (:annevk) from comment #12)
> Alfredo, for what it's worth, the standard Gecko is trying to follow here is
> http://encoding.spec.whatwg.org/ which is indeed not completely compatible
> with previous deployments of Gecko, but we believe following it will be
> better for the health of the web long term.
>
> Now granted, in developing that document not all the trade offs might have
> been correct so any feedback you have is definitely appreciated.

Anne Van Kesteren?!?! Wow! And you're a Mozillian now. Double wow! :)

Unfortunately I don't have that much to add. I understand not every encoding can be supported, especially if they have not much of a presence in the web. But given we're talking encodings that were previously supported, I wonder what the removal really aims at. I have been in positions where keeping support for stuff was causing headaches and it was far simpler and easier to just remove it (so there was a reason), but in such cases I always directed a warning to end users and set a reasonable phase-out period. Doing otherwise --and I know it for a fact-- would have only caused major trouble.

Now I may have missed both the reasons and such an announcement in this case (I'll be glad to be pointed to them, but I found none following the regular users' path), but my point is, if you do something like this without warning you'll be lucky if all you get is my reaction (as a tech-type I may be quicker finding who's to blame but that's it). Believe me, I really hope you're that lucky ;)

(In reply to Henri Sivonen (:hsivonen) from comment #11)
...
> It is possible that you
> are the only (or almost only) person in the world who authored a Web page in
> an old DOS encoding and knew how to declare it.

Oh, I know of at least two others, but of course they're mates of mine. It is possible we are (almost) the only people in the world who can do a lot of stuff right and still get punished for it. So sad...

...
> I'm sorry that this caused a problem with your customer.

Oh, it was just a call, and I quickly pointed out the real cause, so they're not blaming me. As I said, good for laughs.

> For standards advocacy, I recommend advocating the use of UTF-8.

While I'd agree for the most part, I prefer not to waste space using multiple-byte or variable-length encodings whenever possible.
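To put a rough number on the space argument, here is a small sketch comparing encoded sizes for one arbitrary sample string. The sample is an assumption chosen for illustration; the overhead for a real page depends on how much of its text is non-ASCII.

```python
# Sketch: compare the encoded size of one sample string in both encodings.
text = "Fernández Díaz"

size_ibm850 = len(text.encode("cp850"))  # single-byte encoding: one byte per character
size_utf8 = len(text.encode("utf-8"))    # variable-length: "á" and "í" take two bytes each

print(size_ibm850, size_utf8)
```

For this sample the difference is two bytes, one extra byte per accented letter; ASCII characters, including markup, take one byte in both encodings.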

Alex Keybl [:akeybl]

Updated

12 years ago

tracking-firefox20: ? → -

tracking-firefox21: ? → -

tracking-firefox22: ? → -

Yuhong Bao

Comment 14

12 years ago

> IBM850 and 437 are the defaults for pretty much any Windows text mode session.

FYI, each default system locale on Windows has an OEMCP and an ACP. With an ACP of 1252, yes, an OEMCP of 437 or 850 is typical. Other ACPs typically have different OEMCPs. DBCS locales typically have ACP == OEMCP.

Alfredo Fernández Díaz

Reporter

Comment 15

12 years ago

(In reply to Yuhong Bao from comment #14)
> FYI, each default system locale on Windows has an OEMCP and an ACP. With a
...
> have different OEMCPs. DBCS locales typically have ACP == OEMCP.

Your point being *not all* Windows text boxes are set to CP850/437, or...? Sure, CJK systems had standard CPs set long ago by industry and not as many backwards compatibility issues with DOS apps, so just for this once Microsoft spared us from coming up with yet even more code pages... thank God for that.

Henri Sivonen (:hsivonen)

Comment 16

12 years ago

I searched for duplicates again and didn't find any. Looks like we can get away with not exposing these legacy encodings to the Web.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → WONTFIX

dgimeno

Comment 17

11 years ago

I don't understand why it is so important to remove support for any codepage. Is it too difficult to leave them in Firefox? In fact, the codepages now removed are no more than a subset of UTF. There are few pages in ibm850, but there are still some, and I think it would be nice to keep supporting them. I repeat, it isn't that difficult.
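The "subset" remark can be read as saying that every IBM850 character has a Unicode counterpart. A minimal sketch of checking that reading, assuming Python's `cp850` codec is a faithful stand-in for IBM850:

```python
# Sketch: check that every IBM850 byte value maps to a distinct Unicode character.
all_bytes = bytes(range(256))

decoded = all_bytes.decode("cp850")   # would raise UnicodeDecodeError on an unmapped byte
round_trip = decoded.encode("cp850")  # encode back to confirm the mapping is reversible
non_ascii = [ch for ch in decoded if ord(ch) > 0x7F]

print(len(set(decoded)), round_trip == all_bytes, len(non_ascii))
```

A complete Unicode mapping means such pages can be converted to UTF-8 without loss; it does not by itself mean a browser ships a decoder for the legacy label.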

Anne (:annevk)

Comment 18

11 years ago

dgimeno, the idea is to get all browsers aligned on a common standard. That standard does not include ibm850. If you have a list of pages this breaks that would help.

Alfredo Fernández Díaz

Reporter

Comment 19

11 years ago

That's sugar-coating things a bit. The idea seems to be forcing all web pages to use one of a reduced set of encodings. Nothing against that per se, but it would have been nice to read about it in the release notes at the time (a year ago). Apparently it's never too late to document changes.

Anne (:annevk)

Comment 20

11 years ago

If that happened I'm sorry. It must have slipped through. We definitely want to document this. teoli, any idea with regards to documentation around encoding support?

Keywords: dev-doc-needed

Jean-Yves Perrier [:teoli]

Updated

11 years ago

Flags: needinfo?(jypenator)

Alfredo Fernández Díaz

Reporter

Comment 21

11 years ago

What further information is needed? Support for several character encodings was removed in the interim between Firefox v18.0.2 and v19, and it wasn't documented in a way visible to normal users (https://www.mozilla.org/en-US/firefox/19.0/releasenotes/), if at all. Maybe I didn't spell it out like this, but I mentioned it anyway a year ago -- comments #8 and #13. If people need to read anything besides https://www.mozilla.org/en-US/firefox/no./releasenotes/ it would be nice to know. If I had read 'removed support for encodings X Y Z' there at the time I may not have liked it, but I wouldn't have filed a bug.

dgimeno

Comment 22

11 years ago

annevk, I think the common standard does include ibm850. http://www.iana.org/assignments/character-sets/character-sets.xhtml This page is coded in ibm850. Beyond this, I repeat, I think it is really odd to remove something already made. Simply that.

dgimeno

Comment 23

11 years ago

Sorry, the page I point is this one: http://sima.cat/vcatcont.php

Jean-Yves Perrier [:teoli]

Updated

8 years ago

Flags: needinfo?(jypenator)