Last Comment Bug 752780 - Convert invalid UTF-16 to UTF-8 in websocket send()
: Convert invalid UTF-16 to UTF-8 in websocket send()
Product: Core
Classification: Components
Component: Networking: WebSockets (show other bugs)
: unspecified
: All All
-- normal (vote)
: mozilla16
Assigned To: Jason Duell [:jduell] (needinfo me)
: Patrick McManus [:mcmanus]
Depends on:
  Show dependency treegraph
Reported: 2012-05-07 19:10 PDT by Jason Duell [:jduell] (needinfo me)
Modified: 2012-07-10 15:48 PDT (History)
2 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

v1 fix (9.48 KB, patch)
2012-07-04 01:14 PDT, Jason Duell [:jduell] (needinfo me)
jonas: review+
Details | Diff | Splinter Review

Description User image Jason Duell [:jduell] (needinfo me) 2012-05-07 19:10:51 PDT
The W3C Candidate Recommendation 08 December 2011 says we should throw an exception if text with unpaired surrogates is passed to send().  The latest editor's draft has changed that to say that we should convert the incoming argument to Unicode if it's not already (I'm an ignoramus about Unicode: apparently this involves inserting replacement characters).

Not sure when we should ship a fix:  now, or wait until the Editor's draft becomes another W3C recommendation?
Comment 1 User image Jonas Sicking (:sicking) No longer reading bugmail consistently 2012-05-07 22:34:09 PDT
Consensus in the working group was to take this change (though it came down to a close vote if we should make the change now or wait until v2 of the protocol). So I feel pretty confident that this change will stick, so I see no reason to wait.
Comment 2 User image Jason Duell [:jduell] (needinfo me) 2012-07-04 01:14:08 PDT
Created attachment 638996 [details] [diff] [review]
v1 fix

Why do I ever think little things like this will take "just 15 minutes"? 

Anyway:  here's what seems to be a working fix.  It turns out that the existing, custom conversion code (nsWebSocket::ConvertTextToUTF8) doesn't insert replacement characters correctly (instead of replacing with '0xef 0xbf 0xbd', aka '\ufffd', it replaced with '0xed 0xa0 0x80'.  Which seems odd, because the code explicitly tells the converter to use UTF_8_REPLACEMENT_CHAR.   But I didn't spent too much time worrying about it, because it seems like we're fine just using CopyUTF16toUTF8(), which is what we're already using everywhere else in the logic.
Comment 3 User image Jason Duell [:jduell] (needinfo me) 2012-07-09 17:31:51 PDT
Comment 4 User image Ryan VanderMeulen [:RyanVM] 2012-07-10 15:48:56 PDT

Note You need to log in before you can comment on or make changes to this bug.