Closed Bug 1534006 Opened 6 years ago Closed 6 years ago

Does not recognize Polish special characters in some mixed UTF-8 text

Categories

(Core :: Internationalization, defect)

65 Branch
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1071816
Tracking Status
firefox-esr60 --- wontfix
firefox65 --- wontfix
firefox66 --- unaffected
firefox67 --- unaffected

People

(Reporter: adjoint, Unassigned)

Details

Attachments

(2 files)

<div>Ą Ć Ę Ł Ń Ó Ś Ź Ż<br></div> will recognize correctly but not <div>Ą Ć Ę Ł Ń Ó Ś Ź Ż<br></div><div><br></div><div class="protonmail_signature_block"><div class="protonmail_signature_block-user protonmail_signature_block-empty"><br></div><div class="protonmail_signature_block-proton">Sent with <a href="https://protonmail.com" target="_blank">ProtonMail</a> Secure Email.<br></div></div><div><br></div>

You are reporting the bug for which software and for which version ?
What are the steps to reproduce ?
Please attach a testcase using the "attach File" link.

Component: Translation → Untriaged
Flags: needinfo?(adjoint)
Attached file bad.html
Attachment #9049757 - Attachment mime type: text/html → application/zip

Gingerbread, how could I upload 2 (seemingly identical) HTML texts without zipping them?
I cannot understand why Firefox displays them differently, but I do know that Firefox tries to display the language of local shops contrary to your declared preferences. So, when I set 'language for displaying pages' to (En, Pl), Firefox will display mixed content (ASCII and Polish characters) in Polish when I am in Poland and in English when I am in the USA or UK.
I think it is high time to boycott all that commercial shit that has spoiled the Internet.
Another thought: couldn't Firefox try to make itself leaner rather than disturbing users with its idiotic updates?

Managed to reproduce this issue on Windows 10/7, Ubuntu 16.04 and macOS, on Firefox 65.0.2 and Firefox 60.5.2esr. Could not reproduce it on Nightly 67.0a1 (2019-03-10) or Firefox 66.0b14.

Status: UNCONFIRMED → NEW
Component: Untriaged → Internationalization
Ever confirmed: true
OS: Windows 7 → All
Product: Firefox → Core
Hardware: x86_64 → All
Version: 67 Branch → 65 Branch

David Olah's finding confirms my suspicion that adding spying to final versions tilts the language encoding preference.
But I am still curious why those identical files are read differently. Can there be a machine-code difference, below what is shown in Notepad, to which spying gains access?

functor:
I do not understand your comment #1 at all. Gingerbread just corrected the wrong content-type of the uploaded file.

What you are missing in your bug description is that you are loading the testcase via the file: protocol and not with http/https.
Loading the attachment in comment #3 works as expected (because it's https), but I do see the problem when loading the testcase with the file: protocol in the current Release.

This has been fixed in the meantime, since it works in Nightly.

C:>mozregression --bad=2018-11-01 --find-fix --arg="file:///C:/Users/Matti/Downloads/pl-characters/bad.html"

3:03.82 INFO: No more inbound revisions, bisection finished.
3:03.83 INFO: First good revision: 5a6f372f62c150b9263a1a59377a352284b35114
3:03.83 INFO: Last bad revision: 8917a7a6d54fee4a13cc9229380e148615b22779
3:03.83 INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8917a7a6d54fee4a13cc9229380e148615b22779&tochange=5a6f372f62c150b9263a1a59377a352284b35114

Henri Sivonen — Bug 1071816 - Support loading unlabeled/BOMless UTF-8 text/html and text/plain files from file: URLs. r=emk.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE

Note that the language preference setting is unrelated to encodings and is about telling Web sites what natural languages you prefer to read.

If this bug was about local HTML files, it indeed is a duplicate as marked.

If this bug was about what the Protonmail server serves online, you'd need to report the bug to Protonmail to get them to declare the encoding correctly in protocol metadata.
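To illustrate why an undeclared encoding matters, here is a minimal Python sketch that decodes the same UTF-8 bytes twice: once as UTF-8 and once with a legacy fallback. The choice of ISO-8859-2 as the fallback is an assumption made for illustration only; the fallback an affected Firefox version actually picks depends on its locale and settings. The names `TEXT`, `ASSUMED_FALLBACK` and `decode_both_ways` are hypothetical.

```python
# Illustration only: ASSUMED_FALLBACK is an assumption, not necessarily the
# encoding any particular Firefox version or locale would fall back to.
TEXT = "Ą Ć Ę Ł Ń Ó Ś Ź Ż"
ASSUMED_FALLBACK = "iso-8859-2"


def decode_both_ways(raw: bytes) -> tuple[str, str]:
    """Decode the same bytes as UTF-8 and as the assumed legacy fallback."""
    return raw.decode("utf-8"), raw.decode(ASSUMED_FALLBACK)


# The bytes on disk are identical in both cases; only the decoder differs.
raw_bytes = TEXT.encode("utf-8")
as_utf8, as_fallback = decode_both_ways(raw_bytes)

print("decoded as UTF-8:   ", as_utf8)
# repr() is used because the misdecoded text contains control characters.
print("decoded as fallback:", repr(as_fallback))
```

Each Polish letter occupies two bytes in UTF-8, so the single-byte fallback turns every letter into two unrelated characters, which is the kind of garbling the report describes.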

I expected to hear an answer to my question, how these two UTF-8 Notepad-identical files differ, not noise.

Look with a hex viewer and you should see the difference between those files.
You can't use Notepad to compare files; use for example https://hexed.it/ if you do not want to install a local hex viewer.

The "good" file contains a byte order mark (EF BB BF) and the "bad" file does not.
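For anyone without a hex viewer, the same check can be sketched in a few lines of Python. This is only an illustration: the helper names `has_utf8_bom` and `file_has_utf8_bom` are hypothetical, and the sample bytes are built in memory rather than read from the attached files.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def file_has_utf8_bom(path: str) -> bool:
    """Read only the first three bytes of a file and check them for the BOM."""
    with open(path, "rb") as handle:
        return has_utf8_bom(handle.read(3))


# In-memory stand-ins for the two attachments (not the real files).
body = "<div>Ą Ć Ę Ł Ń Ó Ś Ź Ż</div>".encode("utf-8")
with_bom = codecs.BOM_UTF8 + body

print(with_bom[:3].hex(" "))   # ef bb bf
print(has_utf8_bom(with_bom))  # True
print(has_utf8_bom(body))      # False
```

To check real files, call `file_has_utf8_bom` with the path to each extracted attachment. Text editors such as Notepad hide the BOM, which is why the two files look identical there.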

Thank you very much, Mr. Versen. I shouldn't have started my journey in programming from high-level languages. They are like a river that you see, while you drink water drawn from under its bed.
That is, I guess, how browsers read cursor position and speed, and that is how Microsoft writes something to the motherboard BIOS. Yes, I would bet my head that the Windows 10 installation destroyed my MSI BIOS!
