Repair Text Encoding: doesn't detect lone windows-1252 copyright sign
Categories: Core :: Internationalization, defect, P2
People: Reporter: grahamperrin, Assigned: hsivonen
Attachments: 4 files
User Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:92.0) Gecko/20100101 Firefox/92.0
Steps to reproduce:
- visit http://gost.isi.edu/publications/kerberos-neuman-tso.html
- observe: Copyright 1994 …
- View menu ▶ Repair Text Encoding
Actual results:
- an inaccurate repair: Copyright Š 1994
Expected results:
- an accurate repair: Copyright © 1994 …
- achievable with 78.14.0esr by switching from Unicode to Western.
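To illustrate the three renderings described above, here is a minimal Python sketch that decodes the same byte sequence three ways. The sample bytes are a hypothetical stand-in for the page content, and the choice of ISO-8859-2 as the encoding behind the "Š" result is an assumption: it is one legacy encoding that maps byte 0xA9 to Š, but the report does not say which encoding Firefox actually guessed.

```python
# Hypothetical sample: ASCII text with a single non-ASCII byte, 0xA9.
raw = b"Copyright \xa9 1994"

# As UTF-8, a lone 0xA9 is invalid and becomes U+FFFD REPLACEMENT CHARACTER.
as_utf8 = raw.decode("utf-8", errors="replace")

# As ISO-8859-2 (assumed here for illustration), 0xA9 is U+0160 (S with caron).
as_latin2 = raw.decode("iso-8859-2")

# As windows-1252, 0xA9 is U+00A9 COPYRIGHT SIGN, the expected result.
as_1252 = raw.decode("windows-1252")

for label, text in [("utf-8", as_utf8), ("iso-8859-2", as_latin2), ("windows-1252", as_1252)]:
    print(f"{label:>12}: {text}")
```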
Comment 1•3 years ago (Reporter)
The http://gost.isi.edu/publications/kerberos-neuman-tso.html example arose at/above https://forums.FreeBSD.org/threads/82121/post-532272.
https://old.reddit.com/r/firefox/comments/o3yg6f/-/h2ekzwn/?context=2 suggests marking this type of bug as a regression of bug 1687635.
Comment 2•3 years ago
The Bugbug bot thinks this bug should belong to the 'Core::Internationalization' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.
Comment 3•3 years ago (Reporter)
(In reply to Graham Perrin from comment #0)
- observe:
Sorry, I just realised, Bugzilla does not truly represent what I pasted ;-/
Instead, here's a screenshot of the REPLACEMENT CHARACTER
Comment 4•3 years ago
Changing severity to S3 because this doesn't appear to affect overall text encoding. Do you have any ideas, Henri? This seems to be a space you've been looking into recently.
Comment 5 (doomtay)
Here's another example of a failed encoding repair: https://web.archive.org/web/20000302175152id_/http://www.asus.com.tw/Products/Motherboard/Pentiumpro/P2b-d/index.html
Comment 6•2 years ago (Assignee)
The page from comment 0 has a single non-ASCII byte, so it's understandable that guessing from one byte is unlikely to work.
However, I'll add a special case for space, windows-1252 copyright sign, space.
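The special case described above could be sketched roughly as follows. This is not the actual detector code; the function name and the exact rule (exactly one non-ASCII byte, equal to 0xA9, with an ASCII space on each side) are assumptions made only to illustrate the idea.

```python
COPYRIGHT_1252 = 0xA9  # windows-1252 byte for the copyright sign


def looks_like_lone_copyright(data: bytes) -> bool:
    """Hypothetical check: the only non-ASCII byte is 0xA9 flanked by spaces."""
    non_ascii = [i for i, b in enumerate(data) if b >= 0x80]
    if len(non_ascii) != 1:
        return False  # zero or several non-ASCII bytes: not the special case
    i = non_ascii[0]
    if data[i] != COPYRIGHT_1252:
        return False
    # Require "space, 0xA9, space" around the single non-ASCII byte.
    return i >= 1 and data[i - 1:i + 2] == b" \xa9 "


print(looks_like_lone_copyright(b"Copyright \xa9 1994"))
print(looks_like_lone_copyright(b"Copyright\xa9 1994"))
```

With a rule like this, a page whose only evidence is a single byte can still be classified as windows-1252, while inputs with more non-ASCII bytes would fall through to ordinary detection.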
Comment 7•2 years ago (Assignee)
(In reply to doomtay from comment #5)
Here's another example of a failed encoding repair: https://web.archive.org/web/20000302175152id_/http://www.asus.com.tw/Products/Motherboard/Pentiumpro/P2b-d/index.html
Again, one non-ASCII byte to work with. Not worth fixing. However, filed bug 1746924.
Comment 8•2 years ago (Assignee)
Comment 9•2 years ago (Assignee)
Comment 10•2 years ago
Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8052a44096da
Detect lone windows-1252 copyright sign as window-1252. r=dminor
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/32183 for changes under testing/web-platform/tests
Comment 12•2 years ago
bugherder
Upstream PR merged by moz-wptsync-bot
Comment 14•2 years ago
Hi Henri,
I tried to reproduce the issue, but the link from the description is not loading (the server at gost.isi.edu is taking too long to respond), and for the link from comment 5 the 'Repair Text Encoding' option is not enabled.
Can you please give me more details in order to manually verify the fix? Thank you.
Comment 15•2 years ago (Assignee)
Comment 16•2 years ago (Assignee)
(In reply to Monica Chiorean from comment #14)
Can you please give me more details in order to manually verify the fix? Thank you.
You can manually test this by downloading the new attachment and verifying that it shows the copyright sign when loaded from a file: URL or when loaded from a server that doesn't set the HTTP charset. (E.g. serving a local directory with python -m SimpleHTTPServer. Unfortunately, I don't know how to make the Bugzilla attachment system serve the file this way directly. Just loading the file from Bugzilla isn't a valid test, because Bugzilla already did server-side detection.)
In this scenario, the Repair Text Encoding menu items will be disabled, because the detection already happened.
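A possible way to set up such a local test with Python 3 is sketched below. `SimpleHTTPServer` is the Python 2 module name; the Python 3 command-line equivalent is `python3 -m http.server`. The file name, its contents, and the helper names are hypothetical stand-ins (the real test should use the bug's attachment). The sketch relies on `http.server` sending a plain `text/html` Content-Type for `.html` files, with no charset parameter.

```python
import functools
import http.server
import pathlib
import tempfile


def write_test_page(directory):
    """Write a hypothetical test page whose only non-ASCII byte is 0xA9."""
    path = pathlib.Path(directory) / "copyright.html"
    # No <meta charset> on purpose, so the browser has to guess the encoding.
    path.write_bytes(b"<!DOCTYPE html><title>test</title><p>Copyright \xa9 1994</p>\n")
    return path


def make_server(directory, port=0):
    """Build a local server for `directory`; port 0 picks a free port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


test_dir = tempfile.mkdtemp()
page = write_test_page(test_dir)
print("wrote", page)
```

To serve the page, call `make_server(test_dir, 8000).serve_forever()` and open `http://127.0.0.1:8000/copyright.html` in the build under test.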
Comment 17•2 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #16)
That was really helpful. With your attachment and build 94.0a1 (20210919212908) on Win10x64, I was able to reproduce the issue: on a build without the fix, the repair option is available when I open the attachment, and the copyright sign is broken after I choose it.
Verified as fixed on Win10 / Ubuntu 20.04 / Mac 10.13 with builds 97.0b8 (20220125201015) and 98.0a1 (20220125190421): when I open the attachment, the option to repair text encoding is available and the copyright sign is correct on fixed builds.