Closed
Bug 462590
Opened 17 years ago
Closed 5 years ago
Even one high byte prevents autodetection of ISO-2022-* encodings
Categories
(Core :: Internationalization, defect)
RESOLVED
WORKSFORME
People
(Reporter: emk, Assigned: smontagu)
Attachments (1 file)
6.78 KB, text/html
Steps to reproduce:
1. Select View > Character Encoding > Auto-Detect > Japanese
2. Navigate to the URL.
Actual result:
Western (Windows-1252) is selected.
Expected result:
Japanese (ISO-2022-JP) is selected.
Reproducible: Always
nsUniversalDetector::HandleData will not invoke EscCharSetProber if even a single high byte is found in the buffer.
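To make the reported behavior concrete, here is a rough model of the gating described above. It is a simplified sketch, not the actual Gecko code: the function name and the returned labels are hypothetical, and the real detector keeps more state than this.

```python
ESC = 0x1B  # every ISO-2022 designator sequence starts with this byte


def simplified_detector_path(buf: bytes) -> str:
    """Return which prober family this simplified model would consult.

    Hypothetical model of the behavior described in this report: a single
    byte with the high bit set anywhere in the buffer wins over any escape
    sequence, so the escape prober is never consulted.
    """
    saw_escape = False
    for b in buf:
        if b >= 0x80:
            # One high byte is enough to rule out the escape prober.
            return "high-byte probers"
        if b == ESC:
            saw_escape = True
    return "escape prober" if saw_escape else "pure ASCII"
```

Under this model, a page that is entirely valid ISO-2022-JP except for one stray byte such as 0xA0 never reaches the escape prober, which matches the Windows-1252 result above.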
Comment 1 • 17 years ago (Assignee)
I assume this isn't a recent regression. High bytes are illegal in ISO-2022, of course, so in theory this is INVALID, but I guess you're saying that we could be tolerant of pages like this one which are in ISO-2022-JP with a small number of errors (if I select ISO-2022-JP manually the high bytes appear as invalid character symbols).
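The point that high bytes are illegal in ISO-2022-JP can be checked with any strict decoder. The sketch below uses Python's built-in iso2022_jp codec, which is a different implementation from Gecko's decoder and is only meant as an illustration; the stray 0xA0 byte is a made-up stand-in for the errors in the page.

```python
# Valid ISO-2022-JP is 7-bit only: escape sequences plus ASCII-range bytes.
clean = "こんにちは".encode("iso2022_jp")
all_seven_bit = all(b < 0x80 for b in clean)

# Hypothetical damage: append one stray high byte.
damaged = clean + b"\xa0"

try:
    damaged.decode("iso2022_jp")
    strict_ok = True
except UnicodeDecodeError:
    # A strict decoder rejects the high byte outright.
    strict_ok = False

# A lenient decode keeps the valid text and substitutes U+FFFD for the
# offending byte, much like the invalid character symbols mentioned above.
lenient = damaged.decode("iso2022_jp", errors="replace")
```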
OS: Windows Vista → All
Hardware: PC → All
Comment 2 • 17 years ago (Reporter)
Fx2 (that is, the legacy detector) detected it "correctly" because ISO2022JPVerifier said "it's me" before encountering a high byte.
On the other hand, the universal detector won't call EscCharSetProber in the first place. IMO we should be a little more tolerant (at least when autodetect-Japanese is selected).
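One way to picture the kind of tolerance being asked for is a check that accepts a buffer as ISO-2022-JP when it contains a JIS X 0208 designator and only a few high bytes. This is purely an illustrative sketch: the function name and the max_high_bytes threshold are hypothetical, and neither the legacy nor the universal detector is known to work this way.

```python
# Escape sequences that switch ISO-2022-JP into two-byte JIS X 0208 mode.
JIS_X_0208_DESIGNATORS = (b"\x1b$@", b"\x1b$B")


def looks_like_iso_2022_jp(buf: bytes, max_high_bytes: int = 2) -> bool:
    """Hypothetical tolerant check.

    Accept the buffer if it contains at least one JIS X 0208 designator
    and no more than max_high_bytes bytes with the high bit set.
    """
    has_designator = any(seq in buf for seq in JIS_X_0208_DESIGNATORS)
    high_bytes = sum(1 for b in buf if b >= 0x80)
    return has_designator and high_bytes <= max_high_bytes
```

With the default threshold, a mostly valid page with one stray high byte would still be accepted, while a buffer dominated by high bytes would not.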
Comment 4 • 5 years ago (Reporter)
Attached the problematic file because the original page is dead.
I could still reproduce the problem locally, but if this page is on a .jp domain (as the original page was), the encoding may be detected correctly, because encoding detection depends on the TLD.
Flags: needinfo?(VYV03354)
Comment 5 • 5 years ago
This is presently by design, on the logic that with enough ingenuity ISO-2022-JP could probably be used for XSS, so it's safer not to detect it when there is an obvious non-ISO-2022 signal (i.e. a byte with the high bit set).
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Comment 6 • 5 years ago (Reporter)
I confirmed that the encoding detection works on .jp domains.
Resolution: WONTFIX → WORKSFORME