Closed Bug 1534006 Opened 6 years ago Closed 6 years ago

Does not recognize Polish special characters in some mixed UTF-8 text

Tracking

()

Status:

RESOLVED DUPLICATE of bug 1071816

Tracking Flags:

Tracking

Status

firefox-esr60

---

wontfix

firefox65

---

wontfix

firefox66

---

unaffected

firefox67

---

unaffected

People

(Reporter: adjoint, Unassigned)

Details

Attachments

(2 files)

2 identical UTF-8 sets of Polish characters that Firefox 65 displays differently 6 years ago functor 369 bytes, application/zip		Details
bad.html 6 years ago functor 43 bytes, text/html		Details

functor

Reporter

Description

•

6 years ago

<div>Ą Ć Ę Ł Ń Ó Ś Ź Ż<br></div> will recognize correctly but not <div>Ą Ć Ę Ł Ń Ó Ś Ź Ż<br></div><div><br></div><div class="protonmail_signature_block"><div class="protonmail_signature_block-user protonmail_signature_block-empty"><br></div><div class="protonmail_signature_block-proton">Sent with <a href="https://protonmail.com" target="_blank">ProtonMail</a> Secure Email.<br></div></div><div><br></div>

Matthias Versen [:Matti]

Comment 1

•

6 years ago

Which software and which version are you reporting this bug for?
What are the steps to reproduce?
Please attach a testcase using the "Attach File" link.

Component: Translation → Untriaged

Flags: needinfo?(adjoint)

functor

Reporter

Comment 2

•

6 years ago

Attached file 2 identical UTF-8 sets of Polish characters that Firefox 65 displays differently — Details

Flags: needinfo?(adjoint)

functor

Reporter

Comment 3

•

6 years ago

Attached file bad.html — Details

Gingerbread Man

Updated

•

6 years ago

Attachment #9049757 - Attachment mime type: text/html → application/zip

functor

Reporter

Comment 4

•

6 years ago

Gingerbread, how could I upload 2 (seemingly identical) HTML texts without zipping them?
I cannot understand why Firefox displays them differently, but I do know that Firefox tries to display the language of local shops contrary to your declared preferences. So, when I set 'language for displaying pages' to (En, Pl), Firefox will display mixed content (ASCII and Polish characters) in Polish when I am in Poland and in English when I am in the USA or UK.
I think it is high time to boycott all that commercial shit that has spoiled the Internet.
Another thought: couldn't Firefox try to make itself leaner rather than disturbing users with its idiotic updates?

David Olah

Comment 5

•

6 years ago

Managed to reproduce this issue on Windows 10/7, Ubuntu 16.04 and macOS, on Firefox 65.0.2 and Firefox 60.5.2esr. Could not reproduce it on Nightly 67.0a1 (2019-03-10) or Firefox 66.0b14.

Status: UNCONFIRMED → NEW

status-firefox65: --- → affected

status-firefox66: --- → unaffected

status-firefox67: affected → unaffected

status-firefox-esr60: --- → affected

Component: Untriaged → Internationalization

Ever confirmed: true

OS: Windows 7 → All

Product: Firefox → Core

Hardware: x86_64 → All

Version: 67 Branch → 65 Branch

functor

Reporter

Comment 6

•

6 years ago

David Olah's finding confirms my suspicion that adding spying to final versions tilts the language encoding preference.
But I am still curious why those identical files are read differently. Could there be a machine-level difference, below what Notepad shows, that the spying gains access to?

Matthias Versen [:Matti]

Comment 7

•

6 years ago

functor:
I do not understand your comment #1 at all. Gingerbread just corrected the wrong content-type of the uploaded file.

What you are missing in your bug description is that you are loading the testcase via the file: protocol and not with http/https.
Loading the attachment in comment #3 works as expected (because it's https), but I do see the problem when loading the testcase with the file: protocol in the current Release.

This got fixed in the meantime because it works in the nightly.

C:>mozregression --bad=2018-11-01 --find-fix --arg="file:///C:/Users/Matti/Downloads/pl-characters/bad.html"

3:03.82 INFO: No more inbound revisions, bisection finished.
3:03.83 INFO: First good revision: 5a6f372f62c150b9263a1a59377a352284b35114
3:03.83 INFO: Last bad revision: 8917a7a6d54fee4a13cc9229380e148615b22779
3:03.83 INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8917a7a6d54fee4a13cc9229380e148615b22779&tochange=5a6f372f62c150b9263a1a59377a352284b35114

Henri Sivonen — Bug 1071816 - Support loading unlabeled/BOMless UTF-8 text/html and text/plain files from file: URLs. r=emk.
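To illustrate why a BOMless file can render as garbage while the same text with a BOM renders correctly, here is a minimal Python sketch. It is not Firefox's actual implementation: the function name `decode_unlabeled` and the choice of windows-1252 as the legacy fallback are assumptions made for illustration only. It contrasts a decoder that always applies a legacy fallback to unlabeled bytes with one that first checks whether the bytes are valid UTF-8.

```python
# Hypothetical sketch, not Firefox code: decode unlabeled bytes, preferring UTF-8.
UTF8_BOM = b"\xef\xbb\xbf"

def decode_unlabeled(data: bytes, fallback: str = "windows-1252") -> str:
    """Decode bytes that carry no encoding label.

    The fallback encoding is an assumption for this example.
    """
    if data.startswith(UTF8_BOM):
        # A BOM is an unambiguous UTF-8 signal; strip it and decode.
        return data[len(UTF8_BOM):].decode("utf-8")
    try:
        # No BOM: accept the bytes as UTF-8 only if they are valid UTF-8.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Otherwise fall back to the legacy encoding.
        return data.decode(fallback, errors="replace")

text = "Ą Ć Ę Ł Ń Ó Ś Ź Ż"
bad = text.encode("utf-8")  # BOMless, like bad.html

# Decoding with the legacy fallback only yields mojibake:
legacy = bad.decode("windows-1252", errors="replace")
print(legacy)
# Checking for valid UTF-8 first recovers the Polish characters:
print(decode_unlabeled(bad))
```

With this sketch, the BOMless bytes come out intact from `decode_unlabeled`, while the legacy-only path produces unrelated Latin characters in place of the Polish letters.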

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → DUPLICATE

Henri Sivonen (:hsivonen) (away from Bugzilla until 2025-12-29)

Comment 8

•

6 years ago

Note that the language preference setting is unrelated to encodings and is about telling Web sites what natural languages you prefer to read.

If this bug was about local HTML files, it indeed is a duplicate as marked.

If this bug was about what Protonmail serves online, you'd need to report the bug to Protonmail to get them to declare the encoding correctly in protocol metadata.
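As a rough illustration of what "declaring the encoding in protocol metadata" means for HTTP, the encoding is carried in the `charset` parameter of the `Content-Type` response header. The sketch below uses Python's standard `email.message.Message` class to read that parameter; the helper name `charset_from_content_type` and the sample header values are hypothetical and not taken from any real server response.

```python
from email.message import Message
from typing import Optional

def charset_from_content_type(value: str) -> Optional[str]:
    """Return the charset declared in a Content-Type header value, if any."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the lowercased charset parameter,
    # or None when the header does not declare one.
    return msg.get_content_charset()

# A response that labels its encoding:
print(charset_from_content_type("text/html; charset=UTF-8"))
# An unlabeled response, which leaves the browser to guess:
print(charset_from_content_type("text/html"))
```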

functor

Reporter

Comment 9

•

6 years ago

I expected to hear an answer to my question: how do these two UTF-8, Notepad-identical files differ? Not noise.

Matthias Versen [:Matti]

Comment 10

•

6 years ago

Look with a hex viewer and you should see the difference between those files.
You can't use Notepad to compare files; use for example https://hexed.it/ if you do not want to install a local hex viewer.

The "good" file contains a byte order mark (EF BB BF) and the "bad" file does not.

https://en.wikipedia.org/wiki/Byte_order_mark
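For readers without a hex viewer, the same check can be sketched in a few lines of Python. The helper name `has_utf8_bom` is made up for this example, and the two byte strings are constructed in memory to stand in for the "good" and "bad" attachments rather than read from the actual files.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

text = "Ą Ć Ę Ł Ń Ó Ś Ź Ż"
good = codecs.BOM_UTF8 + text.encode("utf-8")  # stands in for the "good" file
bad = text.encode("utf-8")                     # stands in for the "bad" file

# Show the BOM check and the first bytes of each, as a hex viewer would.
print(has_utf8_bom(good), good[:5].hex(" "))
print(has_utf8_bom(bad), bad[:5].hex(" "))

# Apart from the BOM, both decode to the same text, which is why
# they look identical in a text editor.
print(good.decode("utf-8-sig") == bad.decode("utf-8-sig"))
```

To check real files, the bytes can be read with `open(path, "rb").read()` and passed to the same helper.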

functor

Reporter

Comment 11

•

6 years ago

Thank you very much, Mr. Versen. I shouldn't have started my journey in programming from high-level languages. They are like a river that you see, while you drink water drawn from under its bed.
That is, I guess, how browsers read cursor position and speed, and that is how Microsoft writes something to the motherboard BIOS. Yes, I would bet my head that the Windows 10 installation destroyed my MSI BIOS!

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox65: affected → wontfix

status-firefox-esr60: affected → wontfix

Categories

(Core :: Internationalization, defect)
