Bug 1422889 (closed): opened 7 years ago, closed 7 years ago

HTML file starting with UTF-8 BOM not recognized as HTML

Status:

RESOLVED WONTFIX

People

(Reporter: vincent-moz, Unassigned)

Attachments

(1 file)

HTML file starting with a BOM (219 bytes, text/html), attached 7 years ago by Vincent Lefevre

Vincent Lefevre

Reporter

Description

7 years ago

Attached file: HTML file starting with a BOM

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Build ID: 20171114221957

Steps to reproduce:

1. Create an HTML file "bom" (without a filename extension) starting with a BOM in UTF-8 (see attached file as an example).
2. Open this file from a shell with "firefox bom".


Actual results:

The file is opened as text/plain.


Expected results:

The file should be opened as text/html. This is what happens if the BOM is removed.
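
Step 1 can be scripted. The following is a minimal Python sketch, assuming an arbitrary small HTML body (the content of the attached 219-byte file is not reproduced in this report); the helper name `write_bom_html` and the body text are illustrative only.

```python
import codecs
from pathlib import Path

# Hypothetical HTML body; the original attachment's content is not shown in this report.
HTML_BODY = (
    b"<!DOCTYPE html>\n"
    b"<html><head><title>BOM test</title></head>"
    b"<body><p>Hello</p></body></html>\n"
)


def write_bom_html(path: str, with_bom: bool = True) -> bytes:
    """Write HTML_BODY to `path`, optionally prefixed with the UTF-8 BOM (EF BB BF)."""
    data = (codecs.BOM_UTF8 if with_bom else b"") + HTML_BODY
    Path(path).write_bytes(data)
    return data
```

Calling `write_bom_html("bom")` and then running `firefox bom` should correspond to the steps above; passing `with_bom=False` produces the comparison file without the BOM.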

Vincent Lefevre

Reporter

Comment 1

7 years ago

Note: This is reproducible with both Firefox 52.5.0 and Firefox 57.0.1.

Henri Sivonen (:hsivonen)

Comment 2

7 years ago

This appears to be in compliance with https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm .

While the network-oriented legacy rules aren't ideal for file: URLs, it's probably not worthwhile to introduce file: URL-specific additional rules. Changing the rules that apply to the network has security implications.

In general, it's best to have the names of HTML files end in .html if they are to be browsed via file: URLs.

CCing GPHemsley and annevk in case they disagree.
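
For illustration, here is a heavily simplified Python sketch of why the sniffing rules would produce this result. It assumes, based on a reading of the mimesniff spec, that the HTML signatures are matched only after skipping leading whitespace bytes (a BOM is not one of them) and that a leading BOM is itself a signature for text/plain. Only a subset of the tag signatures is listed, the function name `sniff` is hypothetical, and the later steps of the algorithm are omitted; the spec remains the authoritative source.

```python
# Bytes treated as skippable leading whitespace (tab, LF, FF, CR, space).
WHITESPACE = b"\t\n\x0c\r "

# A subset of the HTML signatures, matched case-insensitively.
HTML_TAGS = [b"<!DOCTYPE HTML", b"<HTML", b"<HEAD", b"<BODY", b"<TITLE", b"<P", b"<!--"]

# UTF-16BE, UTF-16LE and UTF-8 byte order marks.
BOMS = [b"\xfe\xff", b"\xff\xfe", b"\xef\xbb\xbf"]


def sniff(data: bytes) -> str:
    """Return a sniffed type for `data`, or "unknown" if no rule in this sketch applies."""
    body = data.lstrip(WHITESPACE)  # a BOM is not whitespace, so it is not skipped
    upper = body.upper()            # only ASCII letters are affected
    for tag in HTML_TAGS:
        if upper.startswith(tag):
            if tag == b"<!--":
                return "text/html"
            # Tag names must be followed by a space or ">" to count as a match.
            if body[len(tag):len(tag) + 1] in (b" ", b">"):
                return "text/html"
    for bom in BOMS:
        if data.startswith(bom):
            return "text/plain"
    return "unknown"  # the remaining steps of the real algorithm are omitted here
```

Under these assumptions, `sniff(b"<!DOCTYPE html>")` yields text/html, while the same bytes preceded by `EF BB BF` fall through to the BOM rule and yield text/plain, which matches the behaviour reported above.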

Status: UNCONFIRMED → RESOLVED

Closed: 7 years ago

Component: HTML: Parser → Document Navigation

Resolution: --- → WONTFIX

Vincent Lefevre

Reporter

Comment 3

7 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #2)
> This appears to be in compliance with
> https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm .

But this is in contradiction with <https://www.w3.org/International/questions/qa-byte-order-mark>, which says "The UTF-8 BOM offers reliable encoding detection, since it is extremely short and stable, works in XML and HTML, and works whether your page is read over the network or not (unlike HTTP declarations)."

I wonder why the mimesniff algorithm doesn't take the possible BOM into account for HTML and XML types.
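
As a point of comparison, the encoding detection that the W3C text describes can be sketched as below. This is an illustrative Python helper (the name `encoding_from_bom` is hypothetical) that covers only the UTF-8 and UTF-16 BOMs; note that it answers which encoding the bytes use, not which content type they have.

```python
import codecs


def encoding_from_bom(data: bytes):
    """Return (encoding, bom_length) if `data` starts with a known BOM, else (None, 0)."""
    candidates = (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```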

> In general, it's best to have the names of HTML files end in .html if they
> are to be browsed via file: URLs.

Here, the data came from a pipe and were stored in a temporary file (though I can certainly improve the script since the MIME type was actually provided at some point).
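
One possible shape for such a script improvement, sketched in Python: store the piped data in a temporary file whose name ends in .html, following the advice in comment 2. The function name `save_html_stream` is hypothetical, and the sketch assumes the caller already knows the data is HTML.

```python
import tempfile


def save_html_stream(stream, suffix: str = ".html") -> str:
    """Copy a binary stream to a temporary file whose name ends in `suffix`; return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so the browser can still open it; the caller is responsible for cleanup.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(stream.read())
        return tmp.name
```

With data arriving on a pipe, `save_html_stream(sys.stdin.buffer)` would return a path that can then be passed to `firefox`; the BOM, if any, is written through unchanged.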

Gordon P. Hemsley [:GPHemsley]

Comment 4

7 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #2)
> This appears to be in compliance with
> https://mimesniff.spec.whatwg.org/#mime-type-sniffing-algorithm .
> 
> While the network-oriented legacy rules aren't ideal for file: URLs, it's
> probably not worthwhile to introduce file: URL-specific additional rules.
> Changing the rules that apply to the network has security implications.
> 
> In general, it's best to have the names of HTML files end in .html if they
> are to be browsed via file: URLs.
> 
> CCing GPHemsley and annevk in case they disagree.

I'll defer to Anne, as he knows more about these things than I do.

Anne (:annevk)

Comment 5

7 years ago

To be perfectly clear, I agree with Henri (comment 2).


Categories

(Core :: DOM: Navigation, defect)
