Open Bug 715801 Opened 13 years ago Updated 3 years ago

http-index-format directory listings should send 301: encoding line

Categories: Core :: Networking, defect, P5

Status: UNCONFIRMED

People: Reporter: info, Unassigned

References: Blocks 1 open bug

Details: Whiteboard: [necko-would-take]

Numerous bugs have been filed about character encodings in directory listings (ftp, jar, etc.). It seems many of these have been "fixed" by letting the user change View > Character Encoding, change intl.charset.default, or rely on auto-detect. But in those cases where Mozilla knows the encoding of the directory listing, it should specify it and avoid any problems.
Mozilla produces directory listings using a textual http-index-format (https://developer.mozilla.org/En/Application%2F%2Fhttp-index-format_specification), and then transforms this into HTML. This format includes a 301: <encoding> line, see http://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsDirIndexParser.cpp#477, but it is undocumented and seems unused.
One way to test this is to connect to an FTP server and browse a directory full of files with special glyphs in their names. They appear wrong if you leave your intl.charset.default at the default of ISO-8859-1: all FTP servers are supposed to issue UTF-8 directory listings, but Mozilla's http-index-format representation of the FTP listing doesn't declare UTF-8, so the names get garbled. You can see garbled characters at ftp://mozilla:mozilla@annexia.org/ (bug 26767).
I captured some http-index-format output, added the 301 charset line, and configured my server to serve this. Compare http://www.skierpage.com/moz_bugs/ftp_listing.diri?x with the version that has the added 301 line: http://www.skierpage.com/moz_bugs/ftp_listing_extra.diri?x Similarly, compare browsing the contents of the jar file jar:http://www.skierpage.com/moz_bugs/d%C3%A9j%C3%A0%E6%97%A5%E6%9C%AC%E5%9B%BD.jar!/ with the version that has the added 301 line: http://www.skierpage.com/moz_bugs/dejajar_contents_extra.diri?x
(The ?x query string on these URLs avoids bug 367076, where http-index-format adds a / and becomes a 404 when you view source, reload, or change character encoding.)
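To illustrate the garbling described for FTP listings, here is a minimal Python sketch of what happens when UTF-8 bytes are decoded as ISO-8859-1. The helper name misdecode is made up for this illustration and is not part of Mozilla's code; it only models the charset mismatch, not the actual listing pipeline.

```python
def misdecode(name: str, actual: str = "utf-8", assumed: str = "iso-8859-1") -> str:
    """Encode `name` in the charset the server really uses, then decode it
    with the charset the client wrongly assumes (hypothetical helper)."""
    return name.encode(actual).decode(assumed)


# Each accented letter is two bytes in UTF-8, so it shows up as two
# ISO-8859-1 characters instead of one.
garbled = misdecode("déjà")
print(repr(garbled))

# Decoding with the correct charset round-trips the name unchanged.
correct = misdecode("déjà", assumed="utf-8")
print(correct)
```

Every byte value is valid in ISO-8859-1, so the wrong decode never fails; it silently yields the wrong characters, which is why the listing looks garbled rather than producing an error.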
Adding a "301: UTF-8" line fixes accented characters in listings, but Asian glyphs seem to remain problematic. Maybe this can be part of the solution to bug 26767, bug 502540 and others.
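As a rough sketch of what a listing with the charset line might look like, the Python below builds http-index-format text with an optional 301 line. The function name build_listing, the reduced set of 200: fields, the FILE type token, and the position of the 301 line right after the 300 line are all assumptions made for illustration; they are not taken from Mozilla's generator, and the captured .diri files linked above remain the authoritative examples.

```python
from urllib.parse import quote


def build_listing(base_url, entries, charset=None):
    """Build http-index-format text (hypothetical helper, simplified fields).

    entries is an iterable of (name, size, kind) tuples.
    """
    lines = [f"300: {base_url}"]
    if charset:
        # The line this bug asks for: declare the listing's encoding.
        lines.append(f"301: {charset}")
    # Assumed, reduced field list; real listings carry more columns.
    lines.append("200: filename content-length file-type")
    for name, size, kind in entries:
        # Percent-escape the UTF-8 bytes of the file name.
        lines.append(f"201: {quote(name, safe='')} {size} {kind}")
    return "\n".join(lines) + "\n"


listing = build_listing(
    "ftp://example.invalid/pub/",  # placeholder URL, not a real server
    [("déjà.txt", 12, "FILE")],
    charset="UTF-8",
)
print(listing)
```

Without the charset argument the 301 line is simply omitted, which corresponds to the listings Mozilla produces today according to this report.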
Blocks: 502540
Whiteboard: [necko-would-take]
Priority: -- → P5
Severity: normal → S3