http-index-format directory listings should send 301: encoding line

Status: UNCONFIRMED
Assignee: Unassigned
Priority: P5
Severity: normal
Reported: 7 years ago
Modified: a year ago
Reporter: info
Blocks: 1 bug
Version: Trunk
Points: ---
Firefox Tracking Flags: Not tracked
Whiteboard: [necko-would-take]

(Reporter)

Description

7 years ago
Numerous bugs have been filed about character encodings in directory listings (ftp, jar, etc.).  It seems many of these have been "fixed" by letting the user change View > Character Encoding, change intl.charset.default, or rely on auto-detect.

But where Mozilla knows the encoding of a directory listing, it should declare it and avoid these problems. Mozilla produces directory listings in a textual http-index-format (https://developer.mozilla.org/En/Application%2F%2Fhttp-index-format_specification), and then transforms this into HTML. The format includes a
  301: <encoding>
line (see http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsDirIndexParser.cpp#477), but it is undocumented and seems unused.
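To make the role of the 301 line concrete, here is a minimal Python sketch that scans raw http-index-format output for a declared encoding. It is not Mozilla's parser: the function name `declared_charset` and the sample listing are hypothetical, and the only assumption about the format is that each line starts with a numeric code followed by a colon.

```python
from typing import Optional


def declared_charset(raw: bytes) -> Optional[str]:
    """Return the encoding named on a '301:' line, or None if there is none.

    Illustrative sketch only: it assumes every line has the shape
    '<code>: <payload>' and ignores all codes other than 301.
    """
    for line in raw.splitlines():
        code, sep, rest = line.partition(b":")
        if sep and code.strip() == b"301":
            name = rest.strip().decode("ascii", errors="replace")
            return name or None
    return None


# Hypothetical listing; the host and file name are made up for illustration.
SAMPLE_WITH_301 = (
    b"300: ftp://example.org/pub/\n"
    b"301: UTF-8\n"
    b"200: filename content-length last-modified file-type\n"
    b"201: d%C3%A9j%C3%A0.txt 12 Mon,%2001%20Jan%202007%2000:00:00%20GMT FILE\n"
)
# The same listing with the 301 line removed.
SAMPLE_WITHOUT_301 = SAMPLE_WITH_301.replace(b"301: UTF-8\n", b"")
```

With these samples, `declared_charset(SAMPLE_WITH_301)` yields the declared name and `declared_charset(SAMPLE_WITHOUT_301)` yields `None`, which is the case where a consumer has to guess.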

One way to test this is to connect to an FTP server and browse a directory full of files with non-ASCII characters in their names. With intl.charset.default left at its default of ISO-8859-1, the names appear wrong: all FTP servers are supposed to issue UTF-8 directory listings, but Mozilla's http-index-format representation of the FTP listing does not declare UTF-8, so the names get garbled.
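The garbling can be reproduced outside the browser. The Python sketch below assumes that a file name arrives percent-escaped as UTF-8 bytes and is then decoded with whichever charset the consumer picks; `render_name` is a hypothetical helper, not a Mozilla function.

```python
from urllib.parse import unquote_to_bytes


def render_name(escaped: str, charset: str) -> str:
    """Unescape a percent-encoded file name and decode it with `charset`.

    Hypothetical helper for illustration: it mimics a consumer that has
    the raw bytes of a name and must choose an encoding to display them.
    """
    return unquote_to_bytes(escaped).decode(charset, errors="replace")


# UTF-8 bytes for a name with two accented letters, percent-escaped.
ESCAPED = "d%C3%A9j%C3%A0"

correct = render_name(ESCAPED, "utf-8")       # decoded as the server intended
garbled = render_name(ESCAPED, "iso-8859-1")  # each UTF-8 byte becomes its own character
```

Decoding as ISO-8859-1 never fails, because every byte maps to some character, so the error shows up only as wrong glyphs rather than as a decoding exception.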

You can see garbled characters at ftp://mozilla:mozilla@annexia.org/ (bug 26767).

I captured some http-index-format output, added the 301 charset line, and configured my server to serve this.  Compare
  http://www.skierpage.com/moz_bugs/ftp_listing.diri?x
with the added 301 line:
  http://www.skierpage.com/moz_bugs/ftp_listing_extra.diri?x

Similarly, compare browsing the contents of the jar file
  jar:http://www.skierpage.com/moz_bugs/d%C3%A9j%C3%A0%E6%97%A5%E6%9C%AC%E5%9B%BD.jar!/
with the added 301 line:
  http://www.skierpage.com/moz_bugs/dejajar_contents_extra.diri?x
(The ?x query string on these URLs avoids bug 367076 where http-index-format adds a / and becomes a 404 when you view source, reload, or change character encoding.)
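The edit applied to the captured listings can be sketched in Python as follows. Where exactly the 301 line was placed in the test files is not stated above, so inserting it directly after the 300: line is an assumption, and `add_charset_line` is a hypothetical name.

```python
def add_charset_line(raw: bytes, charset: str = "UTF-8") -> bytes:
    """Return the listing with a '301: <charset>' line added if it has none.

    Sketch under assumptions: the line is placed right after the first
    '300:' line, or at the very top when no '300:' line exists.
    """
    lines = raw.splitlines(keepends=True)
    if any(line.lstrip().startswith(b"301:") for line in lines):
        return raw  # an encoding is already declared; leave it alone

    new_line = b"301: " + charset.encode("ascii") + b"\n"
    for i, line in enumerate(lines):
        if line.lstrip().startswith(b"300:"):
            if not line.endswith(b"\n"):
                lines[i] = line + b"\n"  # keep the new line on its own row
            return b"".join(lines[: i + 1] + [new_line] + lines[i + 1 :])
    return new_line + raw


# Hypothetical captured output; host and file name are made up.
CAPTURED = b"300: ftp://example.org/\n200: filename\n201: a.txt\n"
PATCHED = add_charset_line(CAPTURED)
```

Running the function a second time on `PATCHED` returns it unchanged, so it is safe to apply to a file that may already carry the line.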

Adding a "301: UTF-8" line fixes accented characters in listings, but Asian glyphs seem to remain problematic.  Maybe this can be part of the solution to bug 26767, bug 502540 and others.
(Reporter)

Updated

7 years ago
Blocks: 502540
Whiteboard: [necko-would-take]