Closed Bug 1625258 Opened 5 years ago Closed 4 years ago

BOMless UTF-16LE not autodetected if the first 1024 bytes contain non-Latin1 characters

Status:

RESOLVED WONTFIX

Tracking Flags:

	Tracking	Status
firefox74	---	affected
firefox75	---	affected
firefox76	---	affected

People

(Reporter: didec3662, Unassigned, NeedInfo)

Attachments

(2 files)

file.zip 5 years ago didec3662 53.61 KB, application/x-zip-compressed		Details
PartOfTheExportedFileNew.htm 5 years ago Alphan Chen [:alchen] 724 bytes, text/html		Details

didec3662

Reporter

Description

•

5 years ago

Attached file file.zip — Details

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0

Actual results:

The exported HTML file is displayed like sample text (ignore the HTML syntax). Google Chrome opens the file the same way as Firefox (incorrectly). But freaking Microsoft Edge shows the file correctly. The attached ZIP file contains part of the file and PNG previews from the different browsers.

Clara Guerrero ( Need Info Brindusa Tot Please)

Comment 1

•

5 years ago

Hi,

Thanks for the details. I was able to reproduce this on Windows 10 Pro with the following versions:

Release 74.0 (64-bit)
Beta 75.0b11 (64-bit)
Firefox Nightly 76.0a1 (2020-03-31) (64-bit)

I will move this over to a component so developers can take a look at it. If it is not the correct component, please feel free to change it to an appropriate one.

Thanks for the report.

Best regards, Clara.

Component: Untriaged → DOM: Core & HTML

Product: Firefox → Core

Clara Guerrero ( Need Info Brindusa Tot Please)

Updated

•

5 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Clara Guerrero ( Need Info Brindusa Tot Please)

Updated

•

5 years ago

status-firefox74: --- → affected

status-firefox75: --- → affected

status-firefox76: --- → affected

Alphan Chen [:alchen]

Comment 2

•

5 years ago

Attached file PartOfTheExportedFileNew.htm — Details

Removing some special characters from the original test htm file.

Alphan Chen [:alchen]

Comment 3

•

5 years ago

(In reply to Alphan Chen [:alchen] from comment #2)

Created attachment 9138557 [details]
PartOfTheExportedFileNew.htm

Removing some special characters from the original test htm file.

I think the problem is related to encoding.
If I remove some special characters (attachment 9138557 [details]), the file can be viewed normally.
e.g. ř, ž, ů, ě
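
For illustration only (this is not taken from either attachment): a small Python sketch that prints the code point and UTF-16LE bytes of the four characters listed above. The helper name utf16le_units is made up for this example. Each of these characters has a code point at or above U+0100, so its UTF-16LE encoding has a non-zero high byte.

```python
# The four characters mentioned above, written as escapes:
# U+0159 (r with caron), U+017E (z with caron),
# U+016F (u with ring above), U+011B (e with caron).
SPECIAL_CHARS = "\u0159\u017e\u016f\u011b"


def utf16le_units(text):
    """Return (character, code point, UTF-16LE bytes) for each character.

    Hypothetical helper for this illustration only.
    """
    return [(ch, ord(ch), ch.encode("utf-16-le")) for ch in text]


for ch, cp, raw in utf16le_units(SPECIAL_CHARS):
    # raw[0] is the low byte and raw[1] the high byte in little-endian order.
    print(f"U+{cp:04X} low=0x{raw[0]:02X} high=0x{raw[1]:02X}")
```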

Hi Henri, could you leave some comments on this?

Flags: needinfo?(hsivonen)

Priority: -- → P3

Henri Sivonen (:hsivonen)

Comment 4

•

5 years ago

The file is a BOMless UTF-16LE document. We detect this case only if the code points in the first 1024 bytes are all below U+0100. (The function name suggests below U+0080, but it looks like the name doesn't match what the function does.) The detection was added in bug 631751.
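
A minimal sketch of the check as described in this comment, for readers who want to see why the attached file fails it. This is not Gecko's actual code: the function name first_units_below_u0100 and the handling of empty or odd-length input are assumptions made for the example. Only the 1024-byte window and the U+0100 threshold come from the comment.

```python
def first_units_below_u0100(data, window=1024):
    """Return True if every UTF-16LE code unit in the first `window` bytes
    is below U+0100. Hypothetical name; not the function used in Gecko."""
    head = data[:window]
    head = head[: len(head) - (len(head) % 2)]  # keep whole 2-byte units only
    if not head:
        return False  # assumption: nothing to inspect means no detection
    units = (
        int.from_bytes(head[i : i + 2], "little")
        for i in range(0, len(head), 2)
    )
    return all(unit < 0x100 for unit in units)


# Markup-only content stays below U+0100, so the check passes.
ascii_doc = "<html><body>Hello</body></html>".encode("utf-16-le")
# One U+0159 inside the window is enough to make the check fail.
czech_doc = "<html><body>\u0159</body></html>".encode("utf-16-le")
# The same character after the first 1024 bytes does not affect the result.
late_doc = ("a" * 512 + "\u0159").encode("utf-16-le")

print(first_units_below_u0100(ascii_doc))
print(first_units_below_u0100(czech_doc))
print(first_units_below_u0100(late_doc))
```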

Detecting BOMless UTF-16[LE|BE] in the general case is problematic. See https://en.wikipedia.org/wiki/Bush_hid_the_facts
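
To illustrate the problem the linked article describes, here is a short Python sketch (an illustration, not code from Firefox): the 18 ASCII bytes of "Bush hid the facts" also decode without error as nine UTF-16LE code units, all of which fall in the CJK Unified Ideographs block. A detector that only looks at the bytes has no error to go on when choosing between the two readings.

```python
# The same 18 bytes, read two ways.
raw = b"Bush hid the facts"

as_ascii = raw.decode("ascii")
as_utf16le = raw.decode("utf-16-le")  # succeeds: even length, no surrogates

# Range of the CJK Unified Ideographs block.
CJK_START, CJK_END = 0x4E00, 0x9FFF

print(as_ascii)
print([f"U+{ord(ch):04X}" for ch in as_utf16le])
print(all(CJK_START <= ord(ch) <= CJK_END for ch in as_utf16le))
```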

Detecting it assuming the content has HTML tags is less problematic, but I'm still inclined to treat this as WONTFIX unless there's a very good reason to do otherwise.

Reporter, where did the file come from?

Flags: needinfo?(hsivonen) → needinfo?(didec3662)

Summary: Incorrect view of the exported HTML file → BOMless UTF-16LE not autodetected if the first 1024 bytes contain non-Latin1 characters

Henri Sivonen (:hsivonen)

Updated

•

5 years ago

Updated

•

5 years ago

Component: DOM: Core & HTML → DOM: HTML Parser

BugBot [:suhaib / :marco/ :calixte]

Comment 5

•

5 years ago

The component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit auto_nag documentation.

Priority: P3 → --

Henri Sivonen (:hsivonen)

Comment 6

•

5 years ago

Setting back to P3 for now, although this is most likely WONTFIX.

Priority: -- → P3

Firefox Bug Husbandry Bot

Comment 7

•

5 years ago

Because this bug's Severity has not been changed from the default since it was filed, and its Priority is P3 (Backlog), indicating it has been triaged, the bug's Severity is being updated to S3 (normal).

Severity: normal → S3

Henri Sivonen (:hsivonen)

Comment 8

•

4 years ago

But freaking Microsoft Edge the file shows correct.

New Edge shows it like Chrome.

Categories

(Core :: DOM: HTML Parser, defect, P3)
