Closed Bug 1731482 Opened 3 years ago Closed 2 years ago

Repair Text Encoding: doesn't detect lone windows-1252 copyright sign

Tracking

Status:

VERIFIED FIXED

Milestone:

97 Branch

Tracking Flags:

            Tracking   Status
firefox97   ---        verified
firefox98   ---        verified

People

(Reporter: grahamperrin, Assigned: hsivonen)

References

Details

Attachments

(4 files)

Screenshot: firefox-esr-78.14.0,1 on FreeBSD 13.0-RELEASE-p4 (3 years ago, Graham Perrin, 237.19 KB, image/png)
Screenshot: pre-repair, the � U+FFFD REPLACEMENT CHARACTER in Firefox 92 (3 years ago, Graham Perrin, 18.97 KB, image/png)
Bug 1731482 - Detect lone windows-1252 copyright sign as window-1252. (2 years ago, Henri Sivonen (:hsivonen), 48 bytes, text/x-phabricator-request)
Test case (2 years ago, Henri Sivonen (:hsivonen), 26 bytes, text/plain)

Graham Perrin

Reporter

Description

•

3 years ago

User Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:92.0) Gecko/20100101 Firefox/92.0

Steps to reproduce:

visit http://gost.isi.edu/publications/kerberos-neuman-tso.html
observe:

Copyright 1994 …

View menu ▶ Repair Text Encoding

Actual results:

an inaccurate repair:

Copyright Š 1994

Expected results:

an accurate repair:

Copyright © 1994 …

This is achievable in 78.14.0esr by switching the text encoding from Unicode to Western.
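The three renderings above can be reproduced from a single byte. The following is a minimal sketch, assuming the page's only non-ASCII byte is 0xA9 (the windows-1252 copyright sign, per the bug summary). The sample bytes and the `decode_candidates` helper are hypothetical, and ISO-8859-2 is used only as one example of a legacy encoding that maps 0xA9 to Š; the bug does not state which encoding the detector actually picked.

```python
# Hypothetical sample: ASCII text with a single non-ASCII byte, 0xA9.
RAW = b"Copyright \xa9 1994"


def decode_candidates(raw: bytes) -> dict:
    """Decode the same bytes under three encodings for comparison."""
    return {
        # 0xA9 alone is not valid UTF-8, so it becomes U+FFFD.
        "utf-8": raw.decode("utf-8", errors="replace"),
        # In ISO-8859-2, 0xA9 is U+0160 (S with caron).
        "iso-8859-2": raw.decode("iso-8859-2"),
        # In windows-1252, 0xA9 is U+00A9 (copyright sign).
        "windows-1252": raw.decode("windows-1252"),
    }


if __name__ == "__main__":
    for label, text in decode_candidates(RAW).items():
        print(f"{label:>12}: {text}")
```

Because every other byte is ASCII, the three decodings differ only at that one position, which is why a detector has so little evidence to work with.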

Graham Perrin

Reporter

Comment 1

•

3 years ago

Attached image Screenshot: firefox-esr-78.14.0,1 on FreeBSD 13.0-RELEASE-p4

The http://gost.isi.edu/publications/kerberos-neuman-tso.html example arose at/above https://forums.FreeBSD.org/threads/82121/post-532272.

https://old.reddit.com/r/firefox/comments/o3yg6f/-/h2ekzwn/?context=2 suggests marking this type of bug as a regression of bug 1687635.

BugBot [:suhaib / :marco/ :calixte]

Comment 2

•

3 years ago

The Bugbug bot thinks this bug should belong to the 'Core::Internationalization' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Internationalization

Product: Firefox → Core

Graham Perrin

Reporter

Comment 3

•

3 years ago

Attached image Screenshot: pre-repair, the � U+FFFD REPLACEMENT CHARACTER in Firefox 92

(In reply to Graham Perrin from comment #0)

observe:

Sorry, I just realised, Bugzilla does not truly represent what I pasted ;-/

Instead, here's a screenshot of the REPLACEMENT CHARACTER

Greg Tatum [:gregtatum]

Comment 4

•

3 years ago

Changing severity to S3 because this doesn't appear to affect overall text encoding. Do you have any ideas Henri? This seems to be a space you've been looking into recently.

Severity: -- → S2

Flags: needinfo?(hsivonen)

Priority: -- → P2

doomtay

Comment 5

•

2 years ago

Here's another example of a failed encoding repair: https://web.archive.org/web/20000302175152id_/http://www.asus.com.tw/Products/Motherboard/Pentiumpro/P2b-d/index.html

Henri Sivonen (:hsivonen)

Assignee

Comment 6

•

2 years ago

The page from comment 0 has a single non-ASCII byte, so it's understandable that guessing from one byte is unlikely to work.

However, I'll add a special case for space, windows-1252 copyright sign, space.
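A special case of this kind could look roughly like the sketch below. It is not the actual patch: the function name `guess_lone_copyright` is hypothetical, and the requirement that 0xA9 be the only non-ASCII byte in the input is an assumption drawn from the word "lone" in the bug summary.

```python
from typing import Optional

COPYRIGHT_1252 = 0xA9  # copyright sign in windows-1252
SPACE = 0x20


def guess_lone_copyright(raw: bytes) -> Optional[str]:
    """Return "windows-1252" if the only non-ASCII byte is 0xA9
    with an ASCII space on each side; otherwise return None."""
    non_ascii = [i for i, b in enumerate(raw) if b >= 0x80]
    if len(non_ascii) != 1:
        return None  # zero or several non-ASCII bytes: not this case
    i = non_ascii[0]
    if raw[i] != COPYRIGHT_1252:
        return None
    if i == 0 or i + 1 >= len(raw):
        return None  # no room for a space on both sides
    if raw[i - 1] == SPACE and raw[i + 1] == SPACE:
        return "windows-1252"
    return None
```

Returning `None` in every other situation leaves the decision to the general detector, so the special case only overrides the guess for this one narrow pattern.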

Assignee: nobody → hsivonen

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Assignee

Comment 7

•

2 years ago

(In reply to doomtay from comment #5)

Here's another example of a failed encoding repair: https://web.archive.org/web/20000302175152id_/http://www.asus.com.tw/Products/Motherboard/Pentiumpro/P2b-d/index.html

Again, one non-ASCII byte to work with. Not worth fixing. However, filed bug 1746924.

Henri Sivonen (:hsivonen)

Assignee

Updated

•

2 years ago

Summary: Repair Text Encoding: page(s) not properly repaired (compared to e.g. Firefox ESR) → Repair Text Encoding: doesn't detect lone windows-1252 copyright sign

Henri Sivonen (:hsivonen)

Assignee

Comment 8

•

2 years ago

Attached file Bug 1731482 - Detect lone windows-1252 copyright sign as window-1252.

Henri Sivonen (:hsivonen)

Assignee

Comment 9

•

2 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8636dd772e314e79a62783cc9f66acfd9c16b01c

Pulsebot

Comment 10

•

2 years ago

Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8052a44096da
Detect lone windows-1252 copyright sign as window-1252. r=dminor

Web Platform Test Sync Bot (Matrix: #interop:mozilla.org)

Comment 11

•

2 years ago

Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/32183 for changes under testing/web-platform/tests

Henri Sivonen (:hsivonen)

Assignee

Updated

•

2 years ago

Blocks: 1747394

Andreea Pavel [:apavel]

Comment 12

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8052a44096da

Status: ASSIGNED → RESOLVED

Closed: 2 years ago

status-firefox97: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 97 Branch

Web Platform Test Sync Bot (Matrix: #interop:mozilla.org)

Comment 13

•

2 years ago

Upstream PR merged by moz-wptsync-bot

Giorgia Nichita, Release Desktop QA

Updated

•

2 years ago

Flags: qe-verify+

Monica Chiorean

Comment 14

•

2 years ago

Hi Henri,
I tried to reproduce the issue, but the link from the description is not loading (the server at gost.isi.edu is taking too long to respond), and for the link from comment 5 the 'Repair Text Encoding' option is not enabled.
Can you please give me more details so that I can manually verify the fix? Thank you.

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Assignee

Comment 15

•

2 years ago

Attached file Test case

Henri Sivonen (:hsivonen)

Assignee

Comment 16

•

2 years ago

(In reply to Monica Chiorean from comment #14)

Can you please give me more details in order to manually verify the fix? Thank you.

You can manually test this by downloading the new attachment and verifying that it shows the copyright sign when loaded from a file: URL or when loaded from a server that doesn't set the HTTP charset (e.g. by serving a local directory with python -m SimpleHTTPServer, or python3 -m http.server on Python 3). Unfortunately, I don't know how to make the Bugzilla attachment system serve the file this way directly. Just loading the file from Bugzilla isn't a valid test, because Bugzilla already did server-side detection.

In this scenario, the Repair Text Encoding menu items will be disabled, because the detection already happened.
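For anyone scripting this check, here is a minimal sketch using Python 3's standard `http.server`, which sends `Content-Type: text/plain` for a `.txt` file without a charset parameter. The helper name `serve_once_and_fetch` and the sample bytes are hypothetical; the real attachment's contents are not reproduced here.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request


def serve_once_and_fetch(body: bytes, name: str = "test.txt"):
    """Serve `body` from a temporary directory on localhost and
    return (Content-Type header, response bytes)."""
    with tempfile.TemporaryDirectory() as tmp:
        (pathlib.Path(tmp) / name).write_bytes(body)
        handler = functools.partial(
            http.server.SimpleHTTPRequestHandler, directory=tmp)
        # Port 0 asks the OS for any free port.
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            url = f"http://127.0.0.1:{server.server_address[1]}/{name}"
            # Bypass any proxy configured in the environment.
            opener = urllib.request.build_opener(
                urllib.request.ProxyHandler({}))
            with opener.open(url) as resp:
                return resp.headers.get("Content-Type"), resp.read()
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    ctype, data = serve_once_and_fetch(b"Copyright \xa9 1994")
    print(ctype, data)
```

For a manual browser test, running `python3 -m http.server` in the directory that holds the downloaded attachment serves it the same way, leaving the encoding decision to the browser.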

Flags: needinfo?(hsivonen)

Monica Chiorean

Comment 17

•

2 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #16)

That was really helpful. With your attachment and build 94.0a1 (20210919212908) on Win10x64, I was able to reproduce the issue: the repair option is available when I open the attachment, and on a build without the fix the copyright sign is broken after choosing it.
Verified as fixed on Win10, Ubuntu 20.04, and macOS 10.13 with builds 97.0b8 (20220125201015) and 98.0a1 (20220125190421): opening the attachment offers the option to repair text encoding, and the copyright sign is correct on the fixed builds.

Status: RESOLVED → VERIFIED

status-firefox97: fixed → verified

status-firefox98: --- → verified

Flags: qe-verify+
