Closed Bug 1731482 Opened 3 years ago Closed 2 years ago

Repair Text Encoding: doesn't detect lone windows-1252 copyright sign

Categories

(Core :: Internationalization, defect, P2)

Firefox 92
defect

Tracking

VERIFIED FIXED
97 Branch
Tracking Status
firefox97 --- verified
firefox98 --- verified

People

(Reporter: grahamperrin, Assigned: hsivonen)

Attachments

(4 files)

User Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:92.0) Gecko/20100101 Firefox/92.0

Steps to reproduce:

  1. visit http://gost.isi.edu/publications/kerberos-neuman-tso.html

  2. observe:

Copyright 1994 …

  3. View menu ▶ Repair Text Encoding

Actual results:

  1. an inaccurate repair:

Copyright Š 1994

Expected results:

  1. an accurate repair:

Copyright © 1994 …

This is achievable in 78.14.0esr by switching the text encoding from Unicode to Western.

The Bugbug bot thinks this bug should belong to the 'Core::Internationalization' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Internationalization
Product: Firefox → Core

(In reply to Graham Perrin from comment #0)

  2. observe:

Sorry, I just realised, Bugzilla does not truly represent what I pasted ;-/

Instead, here's a screenshot of the REPLACEMENT CHARACTER

Changing severity to S3 because this doesn't appear to affect overall text encoding. Do you have any ideas Henri? This seems to be a space you've been looking into recently.

Severity: -- → S2
Flags: needinfo?(hsivonen)
Priority: -- → P2

The page from comment 0 has a single non-ASCII byte, so it's understandable that guessing from one byte is unlikely to work.

However, I'll add a special case for space, windows-1252 copyright sign, space.
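To illustrate why a single byte is so ambiguous, here is a minimal Python sketch of how the lone byte 0xA9 reads under three encodings. The "Š" from comment 0 is consistent with the byte being interpreted as ISO-8859-2, though this thread does not say which encoding the detector actually picked. The helper `is_lone_copyright_sign` is a hypothetical illustration of the "space, copyright sign, space" idea and is not the code from the actual patch.

```python
# Illustration only: a page that is pure ASCII except for one 0xA9 byte.
page = b"Copyright \xa9 1994"

# As UTF-8, a lone 0xA9 is invalid and becomes U+FFFD REPLACEMENT CHARACTER.
as_utf8 = page.decode("utf-8", errors="replace")
# As windows-1252, 0xA9 is the copyright sign.
as_1252 = page.decode("cp1252")
# As ISO-8859-2, 0xA9 is S with caron.
as_8859_2 = page.decode("iso8859_2")


def is_lone_copyright_sign(data: bytes) -> bool:
    """Hypothetical helper, NOT the logic of the real fix.

    Returns True when the only non-ASCII byte in `data` is 0xA9
    and it sits directly between two ASCII spaces.
    """
    non_ascii = [i for i, b in enumerate(data) if b >= 0x80]
    if len(non_ascii) != 1:
        return False
    i = non_ascii[0]
    if data[i] != 0xA9 or i == 0 or i == len(data) - 1:
        return False
    return data[i - 1] == 0x20 and data[i + 1] == 0x20


print(as_utf8)    # Copyright \ufffd 1994 (shown as a replacement character)
print(as_1252)    # Copyright © 1994
print(as_8859_2)  # Copyright Š 1994
print(is_lone_copyright_sign(page))  # True
```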

Assignee: nobody → hsivonen
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(hsivonen)

(In reply to doomtay from comment #5)

Here's another example of a failed encoding repair: https://web.archive.org/web/20000302175152id_/http://www.asus.com.tw/Products/Motherboard/Pentiumpro/P2b-d/index.html

Again, one non-ASCII byte to work with. Not worth fixing. However, filed bug 1746924.

Summary: Repair Text Encoding: page(s) not properly repaired (compared to e.g. Firefox ESR) → Repair Text Encoding: doesn't detect lone windows-1252 copyright sign
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8052a44096da
Detect lone windows-1252 copyright sign as window-1252. r=dminor
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/32183 for changes under testing/web-platform/tests
Blocks: 1747394
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 97 Branch
Upstream PR merged by moz-wptsync-bot
Flags: qe-verify+

Hi Henri,
I tried to reproduce the issue, but the link from the description is not loading (The server at gost.isi.edu is taking too long to respond), and for the link from comment 5 the 'Repair Text Encoding' option is not enabled.
Can you please give me more details in order to manually verify the fix? Thank you.

Flags: needinfo?(hsivonen)
Attached file Test case

(In reply to Monica Chiorean from comment #14)

Can you please give me more details in order to manually verify the fix? Thank you.

You can manually test this by downloading the new attachment and verifying that it shows the copyright sign when loaded from a file: URL or when loaded from a server that doesn't set the HTTP charset. (E.g. serving a local directory with python -m SimpleHTTPServer. Unfortunately, I don't know how to make the Bugzilla attachment system serve the file this way directly. Just loading the file from Bugzilla isn't a valid test, because Bugzilla already did server-side detection.)

In this scenario, the Repair Text Encoding menu items will be disabled, because the detection already happened.
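For anyone on Python 3, where the SimpleHTTPServer module was renamed to http.server, the following is a rough sketch of a local server that sends text/html without a charset parameter. The page bytes in `TEST_PAGE` are a hypothetical stand-in and not the contents of the attachment on this bug; `NoCharsetHandler` and `make_server` are likewise names invented for this sketch.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical test page (not the real attachment): ASCII markup plus a
# single windows-1252 copyright byte (0xA9) between two spaces, with no
# <meta charset> declaration.
TEST_PAGE = b"<!DOCTYPE html><title>Test</title><p>Copyright \xa9 1994</p>"


class NoCharsetHandler(BaseHTTPRequestHandler):
    """Serves TEST_PAGE for every GET, deliberately without a charset."""

    def do_GET(self):
        self.send_response(200)
        # No "; charset=..." here, so the browser has to detect the encoding.
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(TEST_PAGE)))
        self.end_headers()
        self.wfile.write(TEST_PAGE)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def make_server(port: int = 8000) -> HTTPServer:
    """Create (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), NoCharsetHandler)
```

Calling `make_server().serve_forever()` and opening http://127.0.0.1:8000/ in the build under test should then exercise the detector rather than a server-declared encoding.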

Flags: needinfo?(hsivonen)

(In reply to Henri Sivonen (:hsivonen) from comment #16)

That was really helpful: with your attachment and build 94.0a1 (20210919212908) on Win10x64 I was able to reproduce the issue (on a build without the fix, the repair option is available when I open the attachment, and after choosing it the copyright sign is broken).
Verified as fixed on Win10 / Ubuntu 20.04 / Mac 10.13 with builds 97.0b8 (20220125201015) and 98.0a1 (20220125190421): when I open the attachment, the option to repair text encoding is available and the copyright sign is correct on the fixed builds.

Status: RESOLVED → VERIFIED
Flags: qe-verify+