Closed Bug 462590 Opened 17 years ago Closed 5 years ago

Even one high byte prevents autodetection of ISO-2022-* encodings

Categories

(Core :: Internationalization, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED WORKSFORME

People

(Reporter: emk, Assigned: smontagu)

Attachments

(1 file)

Steps to reproduce:
1. Select View > Character Encoding > Auto-Detect > Japanese.
2. Navigate to the URL.

Actual result: Western (Windows-1252) is selected.
Expected result: Japanese (ISO-2022-JP) is selected.
Reproducible: Always

nsUniversalDetector::HandleData will not invoke EscCharSetProber if even a single high byte is found in the buffer.
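To make the reported behavior concrete, here is a toy model of the input-state tracking described above. It is only a sketch based on this report, not the actual nsUniversalDetector code: the names `InputState`, `classify_buffer`, and `escape_prober_runs` are hypothetical, and the real detector has more states and probers.

```python
from enum import Enum


class InputState(Enum):
    # Hypothetical state names, chosen for this illustration only.
    PURE_ASCII = 1
    ESC_ASCII = 2
    HIGH_BYTE = 3


def classify_buffer(buf: bytes) -> InputState:
    """Scan the whole buffer; a single high byte wins over any ESC seen."""
    state = InputState.PURE_ASCII
    for byte in buf:
        if byte & 0x80:
            # Once a high byte is seen, the buffer is treated as non-ISO-2022.
            state = InputState.HIGH_BYTE
        elif state is InputState.PURE_ASCII and byte == 0x1B:
            # ESC in otherwise pure ASCII hints at an ISO-2022-* encoding.
            state = InputState.ESC_ASCII
    return state


def escape_prober_runs(buf: bytes) -> bool:
    """In this model, the escape prober is consulted only in ESC_ASCII state."""
    return classify_buffer(buf) is InputState.ESC_ASCII


clean = "日本語".encode("iso2022_jp")   # valid ISO-2022-JP, 7-bit only
dirty = clean + b"\xa0"                # same text plus one stray high byte

print(escape_prober_runs(clean))
print(escape_prober_runs(dirty))
```

In this model the stray byte alone is enough to keep the escape prober from ever seeing the buffer, which matches the symptom in the report.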
I assume this isn't a recent regression. High bytes are illegal in ISO-2022, of course, so in theory this is INVALID, but I guess you're saying that we could be tolerant of pages like this one which are in ISO-2022-JP with a small number of errors (if I select ISO-2022-JP manually the high bytes appear as invalid character symbols).
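The point that high bytes are illegal in ISO-2022-JP can be illustrated outside the browser. The sketch below uses Python's built-in `iso2022_jp` codec as a stand-in (it is not Gecko's decoder, and the sample bytes are made up for illustration): strict decoding rejects the stray byte, while lenient decoding substitutes U+FFFD, roughly like the invalid character symbols seen when ISO-2022-JP is selected manually.

```python
# Sample data is an assumption: valid ISO-2022-JP text plus one stray high byte.
data = "日本語".encode("iso2022_jp") + b"\xa0"

try:
    data.decode("iso2022_jp")          # strict mode: the high byte is an error
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient mode: the offending byte becomes U+FFFD (REPLACEMENT CHARACTER).
lenient = data.decode("iso2022_jp", errors="replace")

print(strict_ok)
print(lenient)
```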
OS: Windows Vista → All
Hardware: PC → All
Fx2 (that is, the legacy detector) detected it "correctly" because ISO2022JPVerifier said "it's me" before encountering a high byte. The universal detector, on the other hand, won't call EscCharsetProber in the first place. IMO we should be a little more tolerant (at least when autodetect-Japanese is selected).
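A rough sketch of the order-dependent behavior attributed to the legacy detector: decide as soon as an ISO-2022-JP escape sequence appears, and give up only if a high byte comes first. This is a hypothetical illustration, not the ISO2022JPVerifier implementation; the function name and the choice of escape sequences are assumptions.

```python
from typing import Optional

# Escape sequences that designate Japanese character sets in ISO-2022-JP:
# ESC $ @ (JIS C 6226-1978), ESC $ B (JIS X 0208-1983), ESC ( J (JIS X 0201 Roman).
ISO_2022_JP_ESCAPES = (b"\x1b$@", b"\x1b$B", b"\x1b(J")


def legacy_style_guess(buf: bytes) -> Optional[str]:
    """Return "ISO-2022-JP" if a known escape appears before any high byte."""
    for i, byte in enumerate(buf):
        if byte & 0x80:
            return None  # a high byte came first: no early decision
        if byte == 0x1B and buf.startswith(ISO_2022_JP_ESCAPES, i):
            return "ISO-2022-JP"  # decided before looking at later bytes
    return None


page = "日本語".encode("iso2022_jp") + b"\xa0"
print(legacy_style_guess(page))
```

Deciding early like this is what makes the result tolerant of later errors, but it also means the rest of the buffer is never checked, which is relevant to the security concern raised later in this bug.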

still valid?

Flags: needinfo?(VYV03354)
Attached file genkan.html

Attached the problematic file because the original page is dead.

I could still reproduce the problem locally, but if this page is on a .jp domain (as the original page is), it may detect the encoding correctly because encoding detection will depend on TLD.

Flags: needinfo?(VYV03354)

This is presently by design on the logic that with enough ingenuity ISO-2022-JP could probably be used for XSS, so it's safer not to detect it when there is an obvious non-ISO-2022 signal (i.e. high bit set).

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX

I confirmed that the encoding detection works on .jp domains.

Resolution: WONTFIX → WORKSFORME