Closed Bug 1036837 Opened 10 years ago Closed 10 years ago

Can't customize the Character Encoding menu to put MS-DOS encodings there

Categories

(Firefox :: Menus, defect)

30 Branch
defect
Not set
normal

RESOLVED WONTFIX

People

(Reporter: davian818+1, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0 (Beta/Release)
Build ID: 20140605174243

Steps to reproduce:

The View/Encoding menu has a fixed set of entries and does not include my encodings. It totally ignores customization via the intl.charsetmenu.static setting.
Henri, I believe removing this functionality was intentional, can you confirm?

Urmas, for what encodings do you need this functionality?
Component: Untriaged → Menus
Flags: needinfo?(hsivonen)
Flags: needinfo?(davian818+1)
OS: Windows 7 → All
Hardware: x86_64 → All
Primarily IBM852, IBM850, although other ones came in handy.
Flags: needinfo?(davian818+1)
(In reply to :Gijs Kruitbosch from comment #1)
> Henri, I believe removing this functionality was intentional, can you
> confirm?

This is by design. Since the menu is now flat, there is no need for a feature to pin items to the top level of the menu--the top level is all there is.

(In reply to Urmas from comment #2)
> Primarily IBM852, IBM850, although other ones came in handy.

(For others reading this bug report: IBM850 is DOS Western European and IBM852 is DOS Central European.)

We believe IBM852 and IBM850 not to be necessary for the Web, since Presto-Opera got away with not supporting them and Chrome continues to get away with not supporting them. We no longer expose these encodings to the Web and are about to remove the remaining dead code related to these encodings. We haven't allowed sites to declare them since Firefox 19. We haven't allowed them to be manually selected from the menu since Firefox 28.

How exactly do these encodings "come in handy"? What Web sites serve you content with these encodings?

Since Firefox 28 shipped on March 18 and you are filing this bug almost 4 months later, frankly, it doesn't look like these DOS encodings have been a day-to-day mission-critical feature.
Flags: needinfo?(hsivonen)
Summary: Encoding menu ignores encoding selections → Can't customize the Character Encoding menu to put MS-DOS encodings there
Urmas, can you, please, elaborate on these questions:
How exactly do these encodings "come in handy"? What Web sites serve you content with these encodings?
Flags: needinfo?(davian818+1)
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
Why should I advocate myself? It was there. I used it. It worked. Now it doesn't.
Status: RESOLVED → UNCONFIRMED
Resolution: INCOMPLETE → ---
You should provide justification since without justification we will assume we made the correct choice. There's a lot of variables in play here.
(In reply to Anne (:annevk) from comment #6)
> You should provide justification since without justification we will assume
> we made the correct choice. There's a lot of variables in play here.

I personally would like to use it again when viewing files on http://textfiles.com/ and other similar sites that host old files using those codepages. Since firefox will display the text files in-browser, it's useful to have it let you choose the right codepage.

If you know of an extension that replicates the functionality instead, I'd love to see it though!
(In reply to fishmech from comment #7)
> I personally would like to use it again when viewing files on
> http://textfiles.com/ and other similar sites that host old files using
> those codepages. Since firefox will display the text files in-browser, it's
> useful to have it let you choose the right codepage.

Did this use case really work in the past?

Obviously, I didn't examine all files on textfiles.com, but the files I did examine fell into 3 categories:
 1) Pure ASCII, displayed fine.
 2) Windows-1252, displayed fine if you have a windows-1252-affiliated Firefox locale.
 3) Files that had control bytes that caused Firefox to offer to save the file to disk thinking the user had tried to download a binary that was mislabeled as text/plain.

Since the logic for #3 has always run before the HTML parser gets a chance to turn text/plain into a displayable DOM, I'd expect files in case #3 to be unviewable in the browser even back when DOS encodings were in the menu.

If there exists a fourth case of DOS-encoded files that Firefox doesn't treat as binary downloads, I didn't stumble upon any and that case certainly isn't representative of all of textfiles.com.

Considering that textfiles.com carries text files that Firefox doesn't consider to be "text" at all, I think it makes sense to contact the admin about making a different tradeoff between historical byte accuracy and browser-viewability, since bringing back the DOS encodings wouldn't actually make all of current textfiles.com viewable in Firefox.
(In reply to Henri Sivonen (:hsivonen) from comment #8)
>  3) Files that had control bytes that caused Firefox to offer to save the
> file to disk thinking the user had tried to download a binary that was
> mislabeled as text/plain.

Note that we can't really remove Firefox's second-guessing the server-declared type when the server claims "text/plain" and there are non-textish bytes, because Apache had bad defaults for such a long time that there are many files out there that are meant to be binary downloads and that are labeled text/plain.
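As a rough illustration of this kind of second-guessing, the following Python sketch flags a `text/plain` body as binary when it contains certain control bytes. It is not Firefox's actual code: the byte set is the "binary data byte" definition from the WHATWG MIME Sniffing Standard, and the function name `looks_binary` is hypothetical. Firefox's own heuristic may differ in detail.

```python
# "Binary data bytes" as defined by the WHATWG MIME Sniffing Standard:
# 0x00-0x08, 0x0B, 0x0E-0x1A, and 0x1C-0x1F. This is an assumption used
# for illustration; it is not taken from Firefox's source.
BINARY_DATA_BYTES = frozenset(
    list(range(0x00, 0x09)) + [0x0B] + list(range(0x0E, 0x1B)) + list(range(0x1C, 0x20))
)

def looks_binary(prefix: bytes) -> bool:
    """Return True if any byte in the sniffed prefix is a binary data byte."""
    return any(b in BINARY_DATA_BYTES for b in prefix)
```

Under this definition, bytes in the 0x80-0xFF range (where DOS box drawing characters live) never count as binary on their own; only low control bytes do.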
(In reply to Henri Sivonen (:hsivonen) from comment #8)
> Considering that textfiles.com carries text files that Firefox doesn't
> consider to be "text" at all, I think it makes sense to contact the admin
> about making a different tradeoff between historical byte accuracy and
> browser-viewability, since bringing back the DOS encodings wouldn't actually
> make all of current textfiles.com viewable in Firefox.

https://twitter.com/textfiles/status/534323744186302464
(In reply to Henri Sivonen (:hsivonen) from comment #8)
> (In reply to fishmech from comment #7)
> > I personally would like to use it again when viewing files on
> > http://textfiles.com/ and other similar sites that host old files using
> > those codepages. Since firefox will display the text files in-browser, it's
> > useful to have it let you choose the right codepage.
> 
> Did this use case really work in the past?
> 
> Obviously, I didn't examine all files on textfiles.com, but the files I did
> examine fell into 3 categories:
>  1) Pure ASCII, displayed fine.
>  2) Windows-1252, displayed fine if you have a windows-1252-affiliated
> Firefox locale.
>  3) Files that had control bytes that caused Firefox to offer to save the
> file to disk thinking the user had tried to download a binary that was
> mislabeled as text/plain.
> 
> Since the logic for #3 has always run before the HTML parser gets a chance
> to turn text/plain into a displayable DOM, I'd expect files in case #3 to be
> unviewable in the browser even back when DOS encodings were in the menu.
> 
> If there exists a fourth case of DOS-encoded files that Firefox doesn't
> treat as binary downloads, I didn't stumble upon any and that case certainly
> isn't representative of all of textfiles.com.
> 
> Considering that textfiles.com carries text files that Firefox doesn't
> consider to be "text" at all, I think it makes sense to contact the admin
> about making a different tradeoff between historical byte accuracy and
> browser-viewability, since bringing back the DOS encodings wouldn't actually
> make all of current textfiles.com viewable in Firefox.

It did work in the past, in older versions that allowed choosing a "Western (DOS)" or similarly named encoding.

Here is an example of a file that firefox happily opens, but won't look right by default:
http://textfiles.com/anarchy/CARDING/aicard.txt

As you can see, the first few lines use DOS line-drawing art, which the auto-detected default Western encoding displays as accented characters instead. Windows-1252 does not contain the box drawing characters at all.

Here is an image of how firefox autodetects on my system: http://fishmech.net/test1.png
Here is an image of how it's supposed to look: http://fishmech.net/test2.png

I have found that the "Cyrillic (DOS)" character encoding makes most of them work, since it seems to be the only character encoding Firefox currently supports that has the DOS box drawing characters in the right positions. Occasionally, though, this results in Cyrillic characters showing up where they probably weren't intended, since the Cyrillic (DOS) character encoding replaces a bank of accented characters, currency symbols, Greek letters used in formulas, and minor punctuation with the Cyrillic alphabet.


Now I do say again, I don't particularly care if code page 437 direct support is added back into Firefox core functionality, although if it was that'd be great. I'd just like to have an extension or something that will cover selecting code page 437, especially if we're gonna have support for an obscure Cyrillic DOS variant (code page 866 it seems).

Again, if any of you know of such an extension I'd be happy to just install it, much like I have the "overbiteff" extension installed to mess around with the gopher protocol ever since that was removed from firefox proper.
DOS code page 437 is still used to this day as the standard encoding for the NFO text files accompanying pirate scene releases (which web sites generally resort to displaying as images from lack of browser support) and is present all over the web in legacy file collections which nobody is going to mass-convert to Unicode any time soon.

Some examples (note the range of dates):
http://www.vcdq.com/files/nfos/18-2014/172873/1166722206.nfo (which the site also provides as http://www.vcdq.com/files/nfos/18-2014/172873/1629007614.png )
http://www.gamefaqs.com/pc/564960-master-of-magic/faqs/2059
http://ftp.sunet.se/pub/databases/relational/msql/Incoming/flt-mec3.nfo
ftp://ftp.scene.org/pub/parties/1998/interjam98/misc/interjam.txt
http://archive.org/download/1stCanadian/1stCanadian.cdr/GAMES%2F1KEENV3%2FREADALL.TXT

They're all over the place if you search the web for strings like "ÛÛÛÛÛ". I don't think Bugzilla reports are an especially good way to gauge usage.
OK, so textfiles.com has a fourth category of files: files that use DOS box drawing characters but don't trigger the binary detector. I don't think it makes a whole lot of sense to fix category 4 of textfiles.com content while leaving category 3 broken.

However, the distribution of historical text files with box drawing characters from code page 437 seems to warrant a more careful examination:

(In reply to fishmech from comment #11)
> Now I do say again, I don't particularly care if code page 437 direct
> support is added back into Firefox core functionality, although if it was
> that'd be great.

Firefox has never supported code page 437 (the U.S. English DOS code page), so the notion of adding it "back" does not make sense. Firefox did support a subset of European and Middle-Eastern DOS code pages. Note that code page 850 (Western European) does not have all the box drawing characters that code page 437 has.

Is it the case that content that's supposed to be code page 437 just happens to use only the box drawing character subset that code page 850 also has? In fact, since the DOS code page that we still do support, code page 866, has the full set of box drawing characters from code page 437 (and retains the ASCII range as usual), using code page 866 to view code page 437 content that uses ASCII plus box drawing actually makes more sense than using code page 850!

So:

 * Since Firefox has never supported code page 437, there is no code page 437 support to "restore".

 * For the purpose of viewing code page 437 content that uses ASCII plus box drawing, the DOS code page that we do retain, code page 866, actually works *better* than the removed code pages--in particular the "Western European" one, which users might naïvely choose as the closest approximation for "U.S. English".

Since code page 866 is in the menu, I think you don't need an extension. :-) (Also, since the relevant content doesn't tend to actually declare code page 437, it's not like the lack of support for the actual code page 437 is what necessitates the manual override.)
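The comparison between the code pages can be checked with a short Python sketch. It relies on Python's bundled `cp437`, `cp850`, and `cp866` codecs, and treats "box drawing" as anything in the Unicode Box Drawing and Block Elements blocks (U+2500 to U+259F); the helper name `box_bytes` is hypothetical.

```python
def box_bytes(codec: str) -> frozenset:
    """Bytes 0x80-0xFF that decode to Box Drawing or Block Elements characters."""
    return frozenset(
        b for b in range(0x80, 0x100)
        if 0x2500 <= ord(bytes([b]).decode(codec)) <= 0x259F
    )

cp437, cp850, cp866 = (box_bytes(c) for c in ("cp437", "cp850", "cp866"))

# Byte positions where code page 437 has box drawing but code page 850 does not.
missing_in_850 = sorted(cp437 - cp850)

# True if code page 866 decodes every cp437 box drawing byte to the same character.
same_in_866 = all(
    bytes([b]).decode("cp866") == bytes([b]).decode("cp437") for b in cp437
)
```

Here `same_in_866` comes out true, while `missing_in_850` is non-empty, which matches the point that code page 866 approximates code page 437 box drawing better than code page 850 does.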

> especially if we're gonna have support for an
> obscure Cyrillic DOS variant (code page 866 it seems). 

According to telemetry, code page 866 is the least obscure of them. It's unclear if code page 866 is truly used significantly on the Web or if our Russian and Ukrainian detectors just misfire and detect a lot of non-Cyrillic content as code page 866. However, code page 866 is the only DOS code page that Chrome and Presto-Opera support, which strongly suggests that it truly is used more commonly on the Web than other DOS code pages.

I think this is WONTFIX at this point, since the dominant use case turned out to be reading content encoded in a code page we've never supported and, as far as approximating it with other code pages goes, the DOS code page we do retain is actually a better approximation than the removed ones.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX