Closed Bug 1422889 Opened 7 years ago Closed 7 years ago

HTML file starting with UTF-8 BOM not recognized as HTML

Categories

(Core :: DOM: Navigation, defect)

57 Branch
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: vincent-moz, Unassigned)

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Build ID: 20171114221957

Steps to reproduce:

1. Create an HTML file "bom" (without a filename extension) starting with a UTF-8 BOM (see attached file as an example).
2. Open this file from a shell with "firefox bom".
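
For anyone without the attachment, a test file like the one in step 1 could be generated with a short script. This is only a sketch: the function name write_bom_html and the HTML body are made up for illustration and are not the content of the attached file.

```python
import codecs
from pathlib import Path


def write_bom_html(path: Path) -> bytes:
    """Write a small HTML document preceded by the UTF-8 BOM (EF BB BF).

    The markup below is a placeholder, not the attachment from this bug.
    """
    body = (
        "<!DOCTYPE html>\n"
        "<html><head><title>BOM test</title></head>"
        "<body><p>test</p></body></html>\n"
    )
    data = codecs.BOM_UTF8 + body.encode("utf-8")
    path.write_bytes(data)
    return data


# Example: write_bom_html(Path("bom")), then run "firefox bom" from a shell.
```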


Actual results:

The file is opened as text/plain.


Expected results:

The file should be opened as text/html. This is what happens if the BOM is removed.
Note: This is reproducible with both Firefox 52.5.0 and Firefox 57.0.1.
This appears to be in compliance with https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm .
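
To illustrate why the BOM changes the outcome, here is a rough sketch of the relevant part of the sniffing rules as I read them. It is not Gecko's code and it covers only a small subset of the specification's pattern table; the names sniff_unknown, HTML_PATTERNS and HTML_WHITESPACE are hypothetical. The idea is that the HTML patterns may be preceded only by whitespace bytes, and a leading UTF-8 BOM is matched by a separate rule that yields text/plain.

```python
from typing import Optional

# Bytes that may precede an HTML pattern (tab, LF, FF, CR, space).
HTML_WHITESPACE = b"\t\n\x0c\r "
# Only a few of the HTML patterns; the specification lists more.
HTML_PATTERNS = (b"<!DOCTYPE HTML", b"<HTML", b"<HEAD", b"<BODY")
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_unknown(data: bytes) -> Optional[str]:
    """Simplified sketch of sniffing a resource with no known MIME type."""
    # Leading whitespace is skipped, but BOM bytes are not whitespace.
    stripped = data.lstrip(HTML_WHITESPACE)
    upper = stripped.upper()  # patterns are matched case-insensitively
    for pattern in HTML_PATTERNS:
        # The pattern must be followed by a tag-terminating byte.
        following = stripped[len(pattern):len(pattern) + 1]
        if upper.startswith(pattern) and following in (b" ", b">"):
            return "text/html"
    # Simplification: the specification's BOM pattern is four bytes long
    # with the last byte masked out; three bytes are checked here.
    if data.startswith(UTF8_BOM):
        return "text/plain"
    return None  # not decided by this subset of the rules
```

With these assumptions, sniff_unknown(b"<!DOCTYPE html><html>") gives "text/html", while the same bytes prefixed with EF BB BF give "text/plain", which matches the behavior reported above.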

While the network-oriented legacy rules aren't ideal for file: URLs, it's probably not worthwhile to introduce file: URL-specific additional rules. Changing the rules that apply to the network has security implications.

In general, it's best to have the names of HTML files end in .html if they are to be browsed via file: URLs.

CCing GPHemsley and annevk in case they disagree.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Component: HTML: Parser → Document Navigation
Resolution: --- → WONTFIX
(In reply to Henri Sivonen (:hsivonen) from comment #2)
> This appears to be in compliance with
> https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm .

But this is in contradiction with <https://www.w3.org/International/questions/qa-byte-order-mark>, which says "The UTF-8 BOM offers reliable encoding detection, since it is extremely short and stable, works in XML and HTML, and works whether your page is read over the network or not (unlike HTTP declarations)."

I wonder why the mimesniff algorithm doesn't take the possible BOM into account for HTML and XML types.

> In general, it's best to have the names of HTML files end in .html if they
> are to be browsed via file: URLs.

Here, the data came from a pipe and were stored in a temporary file (though I can certainly improve the script since the MIME type was actually provided at some point).
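
For the pipe-to-temporary-file case, one possible workaround (a sketch only, not the actual script mentioned above) is to give the temporary file a .html suffix, as suggested in comment 2. The helper name store_as_html and the strip_bom option are hypothetical; stripping the BOM is offered as an alternative because the file is detected as HTML once the BOM is removed.

```python
import codecs
import tempfile


def store_as_html(data: bytes, strip_bom: bool = False) -> str:
    """Store data in a temporary file whose name ends in .html.

    Returns the path of the file. The caller is responsible for
    deleting it afterwards (delete=False keeps it on disk).
    """
    if strip_bom and data.startswith(codecs.BOM_UTF8):
        # Optionally drop the leading EF BB BF bytes.
        data = data[len(codecs.BOM_UTF8):]
    with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as tmp:
        tmp.write(data)
        return tmp.name


# Example: path = store_as_html(sys.stdin.buffer.read()) (needs "import sys"),
# then pass path to the browser.
```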
(In reply to Henri Sivonen (:hsivonen) from comment #2)
> This appears to be in compliance with
> https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm .
> 
> While the network-oriented legacy rules aren't ideal for file: URLs, it's
> probably not worthwhile to introduce file: URL-specific additional rules.
> Changing the rules that apply to the network has security implications.
> 
> In general, it's best to have the names of HTML files end in .html if they
> are to be browsed via file: URLs.
> 
> CCing GPHemsley and annevk in case they disagree.

I'll defer to Anne, as he knows more about these things than I do.
To be perfectly clear, I agree with Henri (comment 2).