Closed
Bug 279225
Opened 20 years ago
Closed 20 years ago
Cannot display UTF-16 encoded webpage correctly.
Categories
(SeaMonkey :: General, defect)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: mika_adler, Assigned: smontagu)
Attachments
(1 file)
880 bytes, patch (Biesinger: review+, dbaron: superreview+)
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041231
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041231

My webpage is edited and saved with the Quanta editor in Gentoo Linux, using UTF-16 encoding, and it has a proper (I think) META tag to tell the browser that I'm using UTF-16. Mozilla's encoding auto-detect feature only tries iso-8859-1, no matter how I configure the browser, and the result is that the webpage looks like garbage. It does work if I manually tell Mozilla to use UTF-16, but then the HTML <B> tag only works with UTF-16 little endian, not with UTF-16.

Reproducible: Always

Steps to Reproduce:
1. Load my homepage into the browser

Actual Results:
Characters decoded with iso-8859-1 = garbage.

Expected Results:
Read the HTML META tag and make an intelligent decision that this page is encoded with UTF-16; then the auto-detect feature for character encoding would probably work _much_ better :-)
Comment 1•20 years ago
Your server is sending a content type of text/html; charset=ISO-8859-1 and it seems to me that it would be difficult to locate a UTF-16 encoded meta tag in such a document.
Comment 2•20 years ago
server headers override everything else, this is INVALID. fix your server.
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → INVALID
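The precedence described in comments 1 and 2 can be sketched roughly as follows. This is only an illustration of the rule "the HTTP header wins over the META tag", not Mozilla's actual code; the function name `effective_charset` and the `iso-8859-1` fallback are assumptions made for the example.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset()


def effective_charset(content_type_header: Optional[str],
                      meta_charset: Optional[str],
                      default: str = "iso-8859-1") -> str:
    """Hypothetical helper: the server header overrides the META declaration."""
    from_header = charset_from_content_type(content_type_header)
    if from_header:
        return from_header
    if meta_charset:
        return meta_charset.lower()
    return default  # assumed fallback, for illustration only
```

With the header originally reported here, `effective_charset("text/html; charset=ISO-8859-1", "UTF-16")` yields `iso-8859-1`, so the META tag is never consulted.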
Comment 3•20 years ago
hm...
10 Content-Type: text/html; charset=UTF-16
it seems the server (now) sends the correct headers. and indeed, it works for me in mozilla.
Assignee
Comment 4•20 years ago
(In reply to comment #3)
> hm...
> 10 Content-Type: text/html; charset=UTF-16
>
> it seems the server (now) sends the correct headers. and indeed, it works for me
> in mozilla.

Not completely (for both parts of this statement). UTF-16 implies BE, and the page is in fact little-endian; and there is still the issue mentioned in comment 0 --

> the HTML <B>-tag does only work with UTF-16 little endian, not with UTF-16.

I don't understand why, if I change View | Character Encoding between UTF-16, UTF-16-BE and UTF-16-LE, each one displays slightly differently. I would have expected at least one to display garbage.
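For reference, the usual way to resolve a bare "UTF-16" label is to look for a byte order mark and fall back to big-endian when there is none (the default RFC 2781 recommends). A minimal sketch, assuming a hypothetical helper name `resolve_utf16_label`; it does not describe what the Mozilla decoder actually does:

```python
import codecs


def resolve_utf16_label(data: bytes) -> str:
    """Pick a concrete byte order for content labelled plain 'UTF-16'."""
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    # No BOM: big-endian is the default for the bare label.
    return "utf-16-be"


# A page that starts with FF FE, like the one in this bug, is little-endian.
page_start = b"\xff\xfe<\x00H\x00T\x00M\x00"
label = resolve_utf16_label(page_start)
text = page_start[2:].decode(label)  # skip the 2-byte BOM before decoding
```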
Comment 5•20 years ago
ah - reopening for those issues, then
Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Assignee
Comment 6•20 years ago
Thanks to biesi for suggesting that the problem was that UTF-16 isn't recognized as being in the "x-unicode" langGroup. This is actually a regression from bug 68738, which removed the aliasing of UTF-16 to UTF-16BE, without adding a new entry for it in charsetData.properties.
Assignee: general → smontagu
Status: UNCONFIRMED → ASSIGNED
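The regression described in comment 6 can be pictured with a toy model. Everything below is hypothetical: the dictionaries merely stand in for the alias table and for charsetData.properties, and the key names are not copied from the real files. The point is only that dropping an alias without adding a direct entry makes the langGroup lookup miss.

```python
from typing import Dict, Optional


def lang_group(charset: str,
               aliases: Dict[str, str],
               lang_groups: Dict[str, str]) -> Optional[str]:
    """Resolve aliases first, then look up the language group (toy model)."""
    name = charset.lower()
    canonical = aliases.get(name, name)
    return lang_groups.get(canonical)


# Hypothetical state before bug 68738: UTF-16 is an alias of UTF-16BE.
aliases_before = {"utf-16": "utf-16be"}
groups_before = {"utf-16be": "x-unicode", "utf-16le": "x-unicode"}

# Hypothetical state after bug 68738: alias removed, no new entry added.
aliases_after: Dict[str, str] = {}

# Hypothetical state with the patch: UTF-16 gets its own entry.
groups_fixed = dict(groups_before, **{"utf-16": "x-unicode"})

before = lang_group("UTF-16", aliases_before, groups_before)    # found
regressed = lang_group("UTF-16", aliases_after, groups_before)  # missing
fixed = lang_group("UTF-16", aliases_after, groups_fixed)       # found again
```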
Assignee
Updated•20 years ago
Attachment #172170 - Flags: superreview?(dbaron)
Attachment #172170 - Flags: review?(cbiesinger)
Updated•20 years ago
Attachment #172170 - Flags: review?(cbiesinger) → review+
Comment on attachment 172170: Patch for the font issue

sr=dbaron

utf-32 isn't aliased to itself in charsetalias.properties. Does it need to be? Do we support BOM detection on it?
Attachment #172170 - Flags: superreview?(dbaron) → superreview+
Reporter
Updated•20 years ago
Severity: major → minor
Status: ASSIGNED → RESOLVED
Closed: 20 years ago → 20 years ago
Resolution: --- → INVALID
Reporter
Comment 8•20 years ago
True, my server was sending the wrong data, but the font issue is still a problem (a small one, though).
Assignee
Comment 9•20 years ago
> utf-32 isn't aliased to itself in charsetalias.properties. Does it need to be?
> Do we support BOM detection on it?

All the HTML utf-32 testcases at http://jshin.net/i18n/utftest/ seem to work (with autodetection turned off), but maybe we need it for UTF-32 stylesheets (parallel to bug 235090)?
Assignee
Comment 10•20 years ago
Comment on attachment 172170: Patch for the font issue

Checked in.
Comment 11•20 years ago
(In reply to comment #4)
> Not completely (for both parts of this statement). UTF-16 implies BE, and the
> page is in fact little-endian; and there is still the issue mentioned in comment
> 0 --

That's because the 'UTF-16' decoder does a 'sort of' endian detection at the beginning instead of treating the input as 'UTF-16BE'.

> > the HTML <B>-tag does only work with UTF-16 little endian, not with UTF-16.
>
> I don't understand why if I change View | Character Encoding between UTF-16,
> UTF-16-BE and UTF-16-LE, each one displays slightly differently. I would have
> expected at least one to display garbage.

It would break in a more spectacular manner if it included a lot more characters beyond U+0100. The page is mostly made of characters below U+0100, and it begins with 0xFF 0xFE 0x3c 0x00 0x48 0x00 0x54 0x00 0x4d 0x00. What's happening is that the UTF-16BE decoder interprets 0xFF as invalid ('?') and the rest (0xFE 0x3c 0x00 0x48 0x00 0x54 0x00 0x4d 0x00) as 'U+FE3C U+0048 U+0054 U+004D'. This would work 'perfectly' if the page didn't have any characters above U+0100. However, it has a few Japanese characters, which break this interpretation of UTF-16LE as UTF-16BE with a 'one byte offset'.

As for stylesheets in UTF-32, indeed we need to do something like what we did in bug 235090 for CSS stylesheets in UTF-16.
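The 'one byte offset' effect can be reproduced with a short sketch. It assumes a simplified model in which the leading 0xFF is simply dropped (rather than shown as '?') and the remaining bytes are read as big-endian pairs; `decode_le_as_be_with_offset` is a name made up for this illustration.

```python
def decode_le_as_be_with_offset(data: bytes) -> str:
    """Mimic the misreading: drop the first byte, then read big-endian pairs."""
    shifted = data[1:]
    if len(shifted) % 2:
        # An odd trailing byte cannot form a 16-bit unit; discard it.
        shifted = shifted[:-1]
    return shifted.decode("utf-16-be", errors="replace")


BOM_LE = b"\xff\xfe"

# Same leading bytes as the page in this bug: FF FE 3C 00 48 00 54 00 4D 00.
ascii_page = BOM_LE + "<HTM".encode("utf-16-le")

# A tiny page with one Japanese character (U+3042) between ASCII characters.
japanese_page = BOM_LE + "<あ>".encode("utf-16-le")

ascii_result = decode_le_as_be_with_offset(ascii_page)
japanese_result = decode_le_as_be_with_offset(japanese_page)
```

In this model `ascii_result` is U+FE3C followed by `HTM`, matching the sequence quoted above: only the first character is damaged. In `japanese_result` the high byte of U+3042 bleeds into the following character, so the Japanese character is lost and its neighbour is corrupted.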