UTF-8 documents without charset information autodetected as Japanese

RESOLVED DUPLICATE of bug 306272

Status

Core
Internationalization
12 years ago
12 years ago

People

(Reporter: Simon Bünzli, Assigned: smontagu)

Tracking

Trunk

(Reporter)

Description

12 years ago
This regularly happens on lxr/bonsai pages, which do not seem to include charset information. It seems that a document containing fewer than five ü or ä characters triggers EUC-JP detection (not sure about other characters).

Steps to reproduce:
1. Visit data:text/html,%3Ctitle%3EEUC-JP%20instead%20of%20UTF-8%3C%2Ftitle%3E%0A%C3%BC%20%C3%A4%20%C3%BC%20%C3%A4
2. Visit data:text/html,%3Ctitle%3EUTF-8%20correctly%3C%2Ftitle%3E%0A%C3%BC%20%C3%A4%20%C3%BC%20%C3%A4%20%C3%BC
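
To make the two test cases easier to inspect, here is a minimal Python sketch that percent-decodes the payload of each data: URL above, confirms that it is valid UTF-8, and counts the non-ASCII characters. The helper names (`payload_bytes`, `describe`, `decodes_as`) are made up for illustration. This is not the browser's detection algorithm; it only shows what bytes the detector is given.

```python
from urllib.parse import unquote_to_bytes

# The two data: URLs from the steps to reproduce, copied verbatim.
URLS = [
    "data:text/html,%3Ctitle%3EEUC-JP%20instead%20of%20UTF-8%3C%2Ftitle%3E"
    "%0A%C3%BC%20%C3%A4%20%C3%BC%20%C3%A4",
    "data:text/html,%3Ctitle%3EUTF-8%20correctly%3C%2Ftitle%3E"
    "%0A%C3%BC%20%C3%A4%20%C3%BC%20%C3%A4%20%C3%BC",
]


def payload_bytes(data_url):
    """Return the raw bytes after the first comma of a data: URL."""
    _, _, body = data_url.partition(",")
    return unquote_to_bytes(body)


def describe(data_url):
    """Decode the payload strictly as UTF-8 and count non-ASCII characters."""
    text = payload_bytes(data_url).decode("utf-8")  # raises if not valid UTF-8
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return text, non_ascii


def decodes_as(raw, encoding):
    """Report whether the raw bytes also form a valid sequence in `encoding`."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for url in URLS:
        text, count = describe(url)
        raw = payload_bytes(url)
        print(repr(text), "non-ASCII characters:", count)
        print("  also decodable as EUC-JP:", decodes_as(raw, "euc_jp"))
```

Both payloads decode cleanly as UTF-8; the first contains four non-ASCII characters and the second five, which matches the threshold described above. The `decodes_as` check is there to see whether the same bytes are also a structurally valid EUC-JP sequence, which would leave a detector with only statistical evidence to choose between the two encodings.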

Actual result:
The first document is auto-detected as EUC-JP, the second as UTF-8.

Expected result:
Both documents are auto-detected as UTF-8.

For comparison: IE6 behaves as expected whereas Opera 9 renders both documents as ISO-8859-1.
Related to bug 306272, if not a dupe.
(Assignee)

Comment 2

12 years ago

*** This bug has been marked as a duplicate of 306272 ***
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → DUPLICATE