Does not recognize Polish special characters in some mixed UTF-8 text
Categories
(Core :: Internationalization, defect)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox-esr60 | --- | wontfix |
| firefox65 | --- | wontfix |
| firefox66 | --- | unaffected |
| firefox67 | --- | unaffected |
People
(Reporter: adjoint, Unassigned)
Details
Attachments
(2 files)
<div>Ą Ć Ę Ł Ń Ó Ś Ź Ż<br></div> will recognize correctly but not <div>Ą Ć Ę Ł Ń Ó Ś Ź Ż<br></div><div><br></div><div class="protonmail_signature_block"><div class="protonmail_signature_block-user protonmail_signature_block-empty"><br></div><div class="protonmail_signature_block-proton">Sent with <a href="https://protonmail.com" target="_blank">ProtonMail</a> Secure Email.<br></div></div><div><br></div>
Comment 1•6 years ago
Which software and which version are you reporting this bug against?
What are the steps to reproduce?
Please attach a testcase using the "Attach File" link.
Updated•6 years ago
Gingerbread, how could I upload 2 (seemingly identical) HTML files without zipping them?
I cannot understand why Firefox displays them differently, but I do know that Firefox tries to display the language of local shops contrary to your declared preferences. So, when I set 'language for displaying pages' to (En, Pl), Firefox will display mixed content (ASCII and Polish characters) in Polish when I am in Poland and in English when I am in the USA or UK.
I think it is high time to boycott all that commercial shit that has spoiled the Internet.
Another thought: couldn't Firefox try to make itself leaner rather than disturbing users with its idiotic updates?
Comment 5•6 years ago
Managed to reproduce this issue on Windows 10/7, Ubuntu 16.04 and macOS, on Firefox 65.0.2 and Firefox 60.5.2esr. Could not reproduce it on Nightly 67.0a1 (2019-03-10) or Firefox 66.0b14.
David Olah's finding confirms my suspicion that adding spying to final versions tilts the language encoding preference.
But I am still curious why those identical files are read differently. Can there be a machine-level difference, below what is shown in Notepad, to which the spying gains access?
Comment 7•6 years ago
functor:
I do not understand your comment #1 at all. Gingerbread just corrected the wrong content-type of the uploaded file.
What you are missing in your bug description is that you are loading the testcase via the file: protocol and not via http/https.
Loading the attachment in comment #3 works as expected (because it's https), but I do see the problem when loading the testcase with the file: protocol in the current Release.
This has been fixed in the meantime, because it works in Nightly.
C:>mozregression --bad=2018-11-01 --find-fix --arg="file:///C:/Users/Matti/Downloads/pl-characters/bad.html"
3:03.82 INFO: No more inbound revisions, bisection finished.
3:03.83 INFO: First good revision: 5a6f372f62c150b9263a1a59377a352284b35114
3:03.83 INFO: Last bad revision: 8917a7a6d54fee4a13cc9229380e148615b22779
3:03.83 INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8917a7a6d54fee4a13cc9229380e148615b22779&tochange=5a6f372f62c150b9263a1a59377a352284b35114
Henri Sivonen — Bug 1071816 - Support loading unlabeled/BOMless UTF-8 text/html and text/plain files from file: URLs. r=emk.
Note that the language preference setting is unrelated to encodings and is about telling Web sites what natural languages you prefer to read.
If this bug was about local HTML files, it indeed is a duplicate as marked.
If this bug was about what Protonmail serves online, you'd need to report the bug to Protonmail to get them to declare the encoding correctly in protocol metadata.
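To illustrate why the encoding matters here and the language preference does not, the following is a minimal sketch, not taken from this bug. It decodes the same UTF-8 bytes once as UTF-8 and once with a legacy single-byte encoding. The choice of windows-1252 (cp1252) is an assumption made for illustration only; the thread does not say which fallback encoding Firefox actually applied to the unlabeled file.

```python
# The Polish capital letters from the testcase.
text = "Ą Ć Ę Ł Ń Ó Ś Ź Ż"

# The bytes as they would sit on disk in a BOM-less UTF-8 file.
raw = text.encode("utf-8")

# Decoded with the right encoding, the text round-trips unchanged.
as_utf8 = raw.decode("utf-8")

# Assumption: cp1252 stands in for "some legacy fallback encoding".
# errors="replace" is needed because a few byte values are unassigned in cp1252.
as_legacy = raw.decode("cp1252", errors="replace")

print(as_utf8)
print(as_legacy)
```

The second line printed is mojibake: every Polish letter occupies two bytes in UTF-8, and a single-byte decoder turns each of those bytes into a separate, unrelated character.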
I expected an answer to my question, namely how these two UTF-8 files that look identical in Notepad differ, not noise.
Comment 10•6 years ago
Look with a hex viewer and you should see the difference between those files.
You can't use Notepad to compare files; use for example https://hexed.it/ if you do not want to install a local hex viewer.
The "good" file contains a byte order mark (EF BB BF) and the "bad" file does not.
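For anyone who prefers a script to a hex viewer, the check described above can be sketched in a few lines of Python. This is only an illustration: the helper name has_utf8_bom and the two temporary files are hypothetical stand-ins for the attachments, created here so that the snippet runs on its own.

```python
import codecs
import tempfile
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


# Hypothetical stand-ins for the two attachments: same text, with and without a BOM.
text = "<div>Ą Ć Ę Ł Ń Ó Ś Ź Ż<br></div>"
tmp = Path(tempfile.mkdtemp())
good = tmp / "good.html"
bad = tmp / "bad.html"
good.write_bytes(codecs.BOM_UTF8 + text.encode("utf-8"))
bad.write_bytes(text.encode("utf-8"))

# Print the BOM check result and the first three bytes of each file in hex.
for path in (good, bad):
    print(path.name, has_utf8_bom(path), path.read_bytes()[:3].hex(" "))
```

Apart from the three leading bytes, the two files are byte-for-byte the same, which is why an editor that hides the BOM shows them as identical.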
| Reporter | ||
Comment 11•6 years ago
Thank you very much, Mr. Versen. I shouldn't have started my journey in programming with high-level languages. They are like a river that you see while drinking water drawn from under its bed.
That is, I guess, how browsers read cursor position and speed, and that is how Microsoft writes something to the motherboard BIOS. Yes, I would bet my head that the Windows 10 installation destroyed my MSI BIOS!
Updated•6 years ago