BOMless UTF-16LE not autodetected if the first 1024 bytes contain non-Latin1 characters
Categories: Core :: DOM: HTML Parser, defect, P3
People: Reporter: didec3662, Unassigned, NeedInfo
Attachments: 2 files
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0
Actual results:
The exported HTML file is shown as text (the HTML syntax is ignored). Google Chrome opens the file the same way as Firefox (incorrectly). But freaking Microsoft Edge shows the file correctly. Attached is a ZIP file with part of the file and PNG previews from the different browsers.
Comment 1•5 years ago
Hi,
Thanks for the details. I was able to reproduce this on Windows 10 Pro with the following versions:
Release 74.0 (64-bit)
Beta 75.0b11 (64-bit)
Firefox Nightly 76.0a1 (2020-03-31) (64-bit)
I will move this over to a component so developers can take a look at it. If it is not the correct component, please feel free to change it to a more appropriate one.
Thanks for the report.
Best regards, Clara.
Comment 2•5 years ago
Removing some special characters from the original test htm file.
Comment 3•5 years ago
(In reply to Alphan Chen [:alchen] from comment #2)
Created attachment 9138557 [details]
PartOfTheExportedFileNew.htm
Removing some special characters from the original test htm file.
I think the problem is related to encoding.
If I remove some special characters (attachment 9138557 [details]), it can be viewed normally.
e.g. ř, ž, ů, ě
Hi Henri, could you leave some comments on this?
Comment 4•5 years ago
The file is a BOMless UTF-16LE document. We detect this case only if the code points in the first 1024 bytes are all below U+0100. (The function name suggests below U+0080, but it looks like the name doesn't match what the function does.) The detection was added in bug 631751.
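As a rough illustration of the rule described above (not the actual Gecko code), here is a minimal Python sketch. The function name `all_units_below_u0100`, the handling of empty or odd-length input, and the sample documents are assumptions made for this example; only the 1024-byte window and the U+0100 threshold come from the comment.

```python
def all_units_below_u0100(data: bytes, limit: int = 1024) -> bool:
    """Hypothetical check: read the first `limit` bytes as UTF-16LE code
    units and report whether every unit is below U+0100."""
    prefix = data[:limit]
    # Drop a trailing odd byte so the prefix splits into whole 16-bit units.
    prefix = prefix[: len(prefix) - (len(prefix) % 2)]
    if not prefix:
        return False  # assumption: nothing to inspect means no detection
    return all(
        int.from_bytes(prefix[i:i + 2], "little") < 0x100
        for i in range(0, len(prefix), 2)
    )


# Sample documents (invented for illustration), encoded without a BOM.
latin1_only = "<html><body>café</body></html>".encode("utf-16-le")
with_czech = "<html><body>řžůě</body></html>".encode("utf-16-le")
# Same Czech letters, but pushed past the 1024-byte window.
czech_late = ("a" * 600 + "řžůě").encode("utf-16-le")

print(all_units_below_u0100(latin1_only))  # é is U+00E9, below U+0100
print(all_units_below_u0100(with_czech))   # ř is U+0159, above U+0100
print(all_units_below_u0100(czech_late))   # only the first 1024 bytes count
```

Under this sketch, the characters mentioned in comment 3 (ř U+0159, ž U+017E, ů U+016F, ě U+011B) all fall outside the accepted range, which would be consistent with the file rendering normally once they are removed.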
Detecting BOMless UTF-16[LE|BE] in the general case is problematic. See https://en.wikipedia.org/wiki/Bush_hid_the_facts
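To make the ambiguity concrete, the following small Python sketch (an illustration only, with variable names chosen for this example) reads the ASCII bytes of the phrase from the linked article as if they were BOMless UTF-16LE. Every byte pair happens to form a valid code unit, so the bytes alone do not say which reading is intended.

```python
# The same 18 bytes are valid both as ASCII and as UTF-16LE.
ascii_bytes = b"Bush hid the facts"

as_ascii = ascii_bytes.decode("ascii")
misread = ascii_bytes.decode("utf-16-le")  # succeeds without error

print(as_ascii)
print(len(misread))  # 18 bytes become 9 UTF-16 code units
print([f"U+{ord(ch):04X}" for ch in misread])

# Each unit lands in the CJK Unified Ideographs block (U+4E00..U+9FFF),
# so the misreading looks like plausible text rather than obvious garbage.
print(all(0x4E00 <= ord(ch) <= 0x9FFF for ch in misread))
```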
Detecting it assuming the content has HTML tags is less problematic, but I'm still inclined to treat this as WONTFIX unless there's a very good reason to do otherwise.
Reporter, where did the file come from?
Comment 5•5 years ago
The component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit auto_nag documentation.
Comment 6•5 years ago
Setting back to P3 for now, although this is most likely WONTFIX.
Comment 7•5 years ago
Because this bug's Severity has not been changed from the default since it was filed, and its Priority is P3 (Backlog), indicating it has been triaged, the bug's Severity is being updated to S3 (normal).
Comment 8•4 years ago
> But freaking Microsoft Edge the file shows correct.
New Edge shows it like Chrome.
See also bug 1727491.