Closed Bug 711101 Opened 9 years ago Closed 6 years ago

Remove IBM encodings

Categories

(Core :: Internationalization, defect)

x86
macOS
defect
Not set
normal

RESOLVED DUPLICATE of bug 997124

People

(Reporter: annevk, Assigned: smontagu)

References

(Blocks 1 open bug)

Details

As http://annevankesteren.nl/2010/8-bit-labels shows, IBM encodings are not consistently implemented, and each of them is unsupported by at least one browser. I think it would be better if they were removed from the platform so they can be declared unsupported by http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html
That table is rather confusing.  Is the idea that the headings in bold are the names for an encoding that are recognized in all browsers -- unless there's a "Not recognized in:" right before the next bold heading?  And then the other things right under the heading are alternate names recognized in only some browsers?

The OS/2 and/or AIX ports may still want internal support for some of these... but that doesn't mean we'd even need to ship them for other platforms (and if we care, those platforms probably shouldn't recognize them for Web content, though it may not really matter).

For reference, back in 2003 tv.yahoo.com was publishing their US television schedule listings as unlabeled IBM850.  Eventually they got their act together, though.
Your interpretation of the table is correct.  I created an updated one today: http://dvcs.w3.org/hg/encoding/raw-file/tip/single-octet-research.html I will look into making it clearer tomorrow. Thanks!
(In reply to David Baron [:dbaron] from comment #1)
> That table is rather confusing.

Indeed.

> The OS/2 and/or AIX ports may still want internal support for some of
> these...

Uli, does the AIX port need IBM encodings internally for the clipboard or for gfx?

Who is in charge of the OS/2 port these days?

> (and if we care, those platforms probably shouldn't recognize them
> for Web content, though it may not really matter).

In principle, I think it's terrible to expose different encodings or encoding defaults depending on platform. However, I agree that the behavior on AIX or OS/2 won't have much of an impact. (When I discovered that we shipped a different "Shift_JIS" on OS/2, I didn't have the energy to try to change it to the same "Shift_JIS" we ship elsewhere, even though in principle it's a terrible idea that Shift_JIS pages get different Unicode data in the DOM on OS/2.)

In general, I'm in favor of dropping support for encodings that we can drop given market realities. Too many of our encodings were added just because the encodings existed or just because IBM contributed code (as opposed to adding only the encodings that were truly needed to support existing *Web* content).
(In reply to Henri Sivonen (:hsivonen) from comment #3)

> > The OS/2 and/or AIX ports may still want internal support for some of
> > these...
> 
> > Uli, does the AIX port need IBM encodings internally for the clipboard or
> for gfx?

For AIX:
Firefox 6.0 is far too burdensome to port to AIX given how few users there are. IBM still only offers 3.5.12 as their "recent" release, and I have the mozilla-1.9.2 branch in both 32- and 64-bit. The fast major release schedule kills such ports.
So I have nothing against removing exotic encodings.
On AIX the GTK2 toolkit locally needs/uses UTF-8, since GTK1 was desupported long ago. If someone needs rendering of historically encoded web content, one could install and use Netscape 4.79 for such content :)
Blocks: encoding
The OS/2 port needs IBM code pages because nsIPlatformCharset depends on them, but Windows and Mac don't reference them.  UNIX (except Android) may still reference them for locale.all.Ar_AA and locale.all.Sv_SE when built without HAVE_LANGINFO_CODESET.

Simon, should we keep these encodings for all platforms?
I'm against removing encodings that (at least) IE supports. I don't think we have enough data to determine that they're not needed or used.
(In reply to Simon Montagu from comment #6)
> I'm against removing encodings that (at least) IE supports. I don't think we
> have enough data to determine that they're not needed or used.

It seems that we do various things in the internationalization space out of prudence without proper data about necessity.  Can we use something like CommonCrawl to get data?
(In reply to Henri Sivonen (:hsivonen) from comment #7)
> (In reply to Simon Montagu from comment #6)
> > I'm against removing encodings that (at least) IE supports. I don't think we
> > have enough data to determine that they're not needed or used.
> 
> It seems that we do various things in the internationalization space out of
> prudence without proper data about necessity.  Can we use something like
> CommonCrawl to get data?

Actually, couldn't we use telemetry for this?  That way, we'd be measuring the proportion of pages using IBM encodings relative to the pages that are actually visited by Firefox users.
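A rough sketch of the kind of measurement being discussed, whether fed from crawl data or from telemetry: count how many pages declare an IBM-prefixed charset label. This is only an illustration. The function names and the sample data are hypothetical, it looks only at Content-Type values, and it deliberately ignores aliases such as "cp850" and unlabeled pages.

```python
import re
from collections import Counter

# Matches the charset parameter of a Content-Type value, quoted or not.
_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([^\"';\s]+)", re.IGNORECASE)


def charset_label(content_type):
    """Return the lowercased charset label, or None if none is declared."""
    match = _CHARSET_RE.search(content_type or "")
    return match.group(1).lower() if match else None


def ibm_share(content_types):
    """Fraction of pages whose declared label starts with "ibm".

    Assumption: a simple prefix check stands in for a real alias table.
    """
    labels = Counter(charset_label(ct) for ct in content_types)
    total = sum(labels.values())
    ibm = sum(n for label, n in labels.items()
              if label is not None and label.startswith("ibm"))
    return ibm / total if total else 0.0


# Hypothetical sample, not real measurements.
sample = [
    "text/html; charset=IBM850",
    "text/html; charset=utf-8",
    "text/html",
    'text/html; charset="ibm866"',
]
share = ibm_share(sample)
```

Weighting each page by how often it is actually visited, rather than counting each page once, would be the telemetry variant of the same calculation.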
(In reply to Makoto Kato from comment #5)
> UNIX (except Android) may still
> reference this for locale.all.Ar_AA and locale.all.Sv_SE without
> HAVE_LANGINFO_CODESET.

Why would Swedish be different from Finnish, Danish and Norwegian here?
(In reply to Henri Sivonen (:hsivonen) from comment #9)
> (In reply to Makoto Kato from comment #5)
> > UNIX (except Android) may still
> > reference this for locale.all.Ar_AA and locale.all.Sv_SE without
> > HAVE_LANGINFO_CODESET.
> 
> Why would Swedish be different from Finnish, Danish and Norwegian here?

I don't know.  The CVS log (http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=unixcharset.properties&branch=1.25&root=/cvsroot&subdir=mozilla/intl/uconv/src&command=DIFF_FRAMESET&rev1=1.6&rev2=1.7) has no information on this, but I believe it may have been needed for old versions of AIX.  Since the old gfx and widget code depended on the current code page for rendering, the font system needed it.

Also, since current AIX supports nl_langinfo, this alias may not be used. (nl_langinfo may return an IBM code page if LANG isn't xxx.UTF8...)
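To make the lookup order concrete, here is a minimal sketch: ask nl_langinfo for the codeset when it is available, and fall back to a static alias table otherwise. This is not the Gecko implementation. The table contents, the ISO-8859-1 default, and the function names are assumptions for illustration; only the locale.all.Ar_AA and locale.all.Sv_SE keys come from this bug.

```python
import locale

# Hypothetical fallback table, modelled on the locale.all.* keys above.
# The values are assumptions based on the discussion in this bug.
FALLBACK_CHARSETS = {
    "locale.all.Ar_AA": "IBM-1046",
    "locale.all.Sv_SE": "IBM-850",
}


def fallback_charset(locale_name, table=FALLBACK_CHARSETS,
                     default="ISO-8859-1"):
    """Static lookup used when the platform cannot report its codeset."""
    return table.get("locale.all." + locale_name, default)


def platform_charset(locale_name):
    """Prefer nl_langinfo(CODESET); use the alias table only as a fallback."""
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        codeset = locale.nl_langinfo(locale.CODESET)
        if codeset:
            return codeset
    return fallback_charset(locale_name)
```

On a platform where nl_langinfo works, the alias table is never consulted, which is why the Sv_SE and Ar_AA entries may be dead weight there.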
As far as I can tell from bug 210629, we can remove IBM-1046 and its alias.  Our font rendering engine changed in Gecko 1.9, and we now read and render TrueType/OpenType fonts directly.
When I looked into the Sv_SE issue on the unicode.org mailing list, I found IBM saying that Sv_SE on AIX is IBM-850 (although sv_SE on AIX is ISO8859-1).

The AIX list is here:
http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML012/0589.html
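As a small illustration of why the Sv_SE versus sv_SE distinction matters, the same Swedish letter is stored as a different byte in each of the two encodings, so picking the wrong one corrupts the text. The sketch below uses Python's built-in "cp850" and "iso8859-1" codecs as stand-ins for IBM-850 and ISO8859-1; the helper name is hypothetical.

```python
def encodes_differently(text, first="cp850", second="iso8859-1"):
    """Return True if the two encodings produce different bytes for text."""
    return text.encode(first) != text.encode(second)


# "ä" is byte 0x84 in IBM-850 but byte 0xE4 in ISO8859-1.
ibm850_bytes = "ä".encode("cp850")
latin1_bytes = "ä".encode("iso8859-1")

# Plain ASCII is identical in both, so only non-ASCII text is affected.
ascii_differs = encodes_differently("abc")
umlaut_differs = encodes_differently("ä")
```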
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 997124