Closed
Bug 545478
Opened 14 years ago
Closed 5 years ago
UTF-8 containing astral plane glyphs wraps in the wrong place
Categories
(Thunderbird :: Untriaged, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: dg, Unassigned)
Details
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/533.1 (KHTML, like Gecko) Chrome/5.0.323.0 Safari/533.1
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.7) Gecko/20100111 Thunderbird/3.0.1

If I send a plain-text message using the UTF-8 encoding, with a line of text containing astral plane glyphs (that is, code points above U+FFFF), then the line is wrapped in the wrong place. I've filed this as General because I don't know which component contains the line-wrapping code...

Reproducible: Always

Steps to Reproduce:
1. Start composing a plain-text message in UTF-8.
2. Paste in '
Reporter
Comment 1•14 years ago
Oh, joy. Bugzilla has utterly mangled my bug. Apparently it doesn't support astral plane Unicode properly. Maybe I should file a bug on it...

My test text is the phrase "Ph'nglui mglw'nafh Cthulhu R'lnyeh wgah'nagl fhtagn.", but using letters from the Mathematical Alphanumeric Symbols Unicode range up in U+1D400 instead of plain text. I tried to post a copy to pastebin but that doesn't support Unicode either; you can find a copy of the string on Twitter, here: http://twitter.com/hjalfi/statuses/8690602802

To reproduce:
1. Start composing a plain-text message in UTF-8.
2. Paste in the string above (the exotic Unicode form).
3. Send it to yourself.

What you see:
Ph'nglui mglw'nafh Cthulhu R'lnyeh wgah'nagl fhtagn.

What you should see:
Ph'nglui mglw'nafh Cthulhu R'lnyeh wgah'nagl fhtagn.

This is a classic symptom of a particular problem: one astral plane code point is represented as two UTF-16 values, using surrogates. An awful lot of code makes the erroneous assumption that one UTF-16 value represents one glyph. If you encode the above message in UTF-16, the line has been wrapped 60 half-words (16-bit code units) in, even though it's only 33 code points. This leads me to suspect that the line-wrapping code is treating each UTF-16 value as if it's a character, rather than actually parsing the Unicode properly.
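To make the suspected failure mode concrete, here is a minimal Python sketch. It is not Thunderbird's actual wrapping code; the helper names (`to_math_bold`, `utf16_length`, `fits`) and the 20-column limit are assumptions chosen for illustration. It builds a short astral-plane sample from the Mathematical Bold letters starting at U+1D400 and compares how many characters fit on a line when the width is counted in code points versus UTF-16 code units.

```python
BOLD_CAPITAL_A = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_SMALL_A = 0x1D41A    # MATHEMATICAL BOLD SMALL A


def to_math_bold(text):
    """Map ASCII letters to Mathematical Bold letters; leave other characters alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_CAPITAL_A + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_SMALL_A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def utf16_length(text):
    """Number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def fits(text, limit, measure):
    """Count how many leading code points fit within `limit` columns,
    where `measure(ch)` gives the width charged for one code point."""
    used = 0
    for i, ch in enumerate(text):
        used += measure(ch)
        if used > limit:
            return i
    return len(text)


phrase = to_math_bold("Ph'nglui mglw'nafh Cthulhu")

# Python strings are sequences of code points, so len() counts code points.
print(len(phrase), utf16_length(phrase))

# Charging one column per code point versus one column per UTF-16 code unit.
print(fits(phrase, 20, lambda ch: 1))
print(fits(phrase, 20, utf16_length))
```

A wrapper that charges each astral-plane letter two columns runs out of room roughly halfway through the text that should fit, which matches the early wrap described above.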
Comment 2•14 years ago
Is this also true when using Firefox?
Reporter
Comment 3•14 years ago
HTML containing the test phrase does seem to wrap correctly. Also, the test phrase gets miswrapped only when the message is *sent*, not when it's in the compose window. Does that help?

Also, it would appear that Twitter's astral plane support is dodgy too, and the above link to the test phrase no longer works (it now links to an empty message!). In the interests of having at least *something* to test with, there's an entity-encoded version here: http://pastebin.com/f11edd070

But you'll need to convert that to UTF-8 somehow before pasting it into a compose window...
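One possible way to do that conversion, sketched in Python: `html.unescape` from the standard library decodes numeric character references, including those above U+FFFF. The sample entities below are an assumption (the bold letters "P" and "h" from the U+1D400 range), not the actual contents of the pastebin.

```python
import html

# Hypothetical sample: decimal character references for
# MATHEMATICAL BOLD CAPITAL P (U+1D40F) and MATHEMATICAL BOLD SMALL H (U+1D421).
entity_text = "&#119823;&#119841;'nglui"

# Decode the references into real code points.
decoded = html.unescape(entity_text)

# Encode as UTF-8; each astral-plane code point takes four bytes.
utf8_bytes = decoded.encode("utf-8")

print(decoded)
print(len(decoded), len(utf8_bytes))
```

The decoded string can then be written to a UTF-8 file or copied from a UTF-8 terminal and pasted into the compose window.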
Reporter
Comment 4•14 years ago
(Sorry, HTML containing the test phrase seems to wrap correctly *in Firefox*.)
Comment 5•12 years ago
David, can you reproduce this using a current version of Thunderbird? If you are unable to reproduce, please close by setting status to RESOLVED, and resolution to WORKSFORME or another appropriate setting. If you are able to reproduce, add new details, and a testcase if one does not already exist in the bug report.
Version: unspecified → 3.0
Reporter
Comment 6•12 years ago
No, this is still extant in 10.0.2 and the supplied test case still verifies it.
Updated•11 years ago
Component: General → Untriaged
Reporter
Comment 7•11 years ago
Having just rechecked, it looks like this is fixed in 17.0.8.
Comment 8•5 years ago
Then I think it is safe to close this bug...
Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME