Closed Bug 686312 Opened 13 years ago Closed 13 years ago

Firefox/WebSockets denies to receive certain valid UTF-8 sequences

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla9

People

(Reporter: tobias.oberstein, Assigned: mcmanus)

Details

(Keywords: dev-doc-complete, Whiteboard: [inbound])

Attachments

(1 file, 1 obsolete file)

patch v1 13 years ago Patrick McManus [:mcmanus] 7.55 KB, patch		Details \| Diff \| Splinter Review
patch v2 13 years ago Patrick McManus [:mcmanus] 7.69 KB, patch	dbaron : review+	Details \| Diff \| Splinter Review

Tobias Oberstein

Reporter

Description

•

13 years ago

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0a2) Gecko/20110911 Firefox/8.0a2
Build ID: 20110911042006

Steps to reproduce:

I am running a WebSockets protocol test suite which covers UTF-8 by fuzzing an implementation with valid and invalid UTF-8 sequences.
The test suite sends different WS text messages. The exact sequences sent, incl. wire logs can be found in the failed cases
6.9.3, 6.9.4, 6.11.4, 6.22.1, 6.22.2
under the following link
http://www.tavendo.de/autobahn/testsuite/report/clients/index.html


Actual results:

The WS connection is failed by Firefox.


Expected results:

The text message is processed and provided in the JS event handler as text payload.

Tobias Oberstein

Reporter

Comment 1

•

13 years ago

The behavior is identical for Firefox 7, 8 and 9 (tested, but only on Win64).

At least some of the sequences not processed show up correctly in HTML (i.e. have a look at http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt => 5.3.1  U+FFFE). So I don't know if it's really WS specific or not.

In any case, I filed the bug without "security on", since FF denies more than it should, not less.

Patrick McManus [:mcmanus]

Assignee

Comment 2

•

13 years ago

The test I U+FFFE and U+FFFF are non-characters. That's why they are being rejected during the utf-8 validation.

I'm going to have to ask around to figure out if that's the behavior we want or not, but at least at this point I can tell you that's why it is happening.

Tobias Oberstein

Reporter

Comment 3

•

13 years ago

Ok.
Those 2 are valid UTF-8 by RFC3629 and hence by WS draft spec at least ..
What about the other 3 sequences?

Patrick McManus [:mcmanus]

Assignee

Comment 4

•

13 years ago

Hi Tobias,

You report 5 test cases of interest, but I only see 3 different byte sequences being tested.

6.9.3 and 6.22.2 both seem to test ef bf bf, and 6.9.4 and 6.11.4 both seem to test f4 8f bf bf. The other sequence is from test 6.22.1 (ef bf be).

Do I have that correct?

Tobias Oberstein

Reporter

Comment 5

•

13 years ago

Hello Patrick,

sorry, yes, you are right, it's only 3 sequences - rechecked:

6.9.4 / 6.11.4 => f48fbfbf

6.22.1 => efbfbe

6.9.3 / 6.22.2 => efbfbf

I missed that, since I somehow "blindly" constructed most of the cases by adapting the cases from the Markus Kuhn page above. Well ..

Tobias Oberstein

Reporter

Comment 6

•

13 years ago

Ah, I know why I missed it: at the time Markus wrote the page, UTF-8 was on the one hand not yet limited to U+10FFFF and on the other hand U+FFFE/U+FFFF were probably really illegal code positions, not just non-characters. So I had to adjust the cases, and by that introduced the overlap.

--

Don't know if thats relevant: a couple of cases from section 6.4. are deactivated for FF7, since it crashes. FF8 and higher don't crash and pass (non-strict). It also crashes in section 9.3.

https://github.com/oberstet/Autobahn/blob/master/lib/python/autobahn/fuzzing.py#L197

Patrick McManus [:mcmanus]

Assignee

Comment 7

•

13 years ago

(In reply to Tobias Oberstein from comment #5)

> 6.9.4 / 6.11.4 => f48fbfbf
> 
> 6.22.1 => efbfbe
> 
> 6.9.3 / 6.22.2 => efbfbf

ok, those are all non-characters. Brian Smith, David Baron, Jason Duell and I talked face to face and decided that it is appropriate to let WS pass strings with non characters included - so I'll make a patch.

Patrick McManus [:mcmanus]

Assignee

Comment 8

•

13 years ago

Attached patch patch v1 (obsolete) — Details — Splinter Review

Assignee: nobody → mcmanus

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Attachment #561253 - Flags: review?(jduell.mcbugs)

Patrick McManus [:mcmanus]

Assignee

Comment 9

•

13 years ago

Attached patch patch v2 — Details — Splinter Review

Attached the wrong (the test was still full of debug) patch the first time. doh!

Attachment #561253 - Attachment is obsolete: true

Attachment #561253 - Flags: review?(jduell.mcbugs)

Attachment #561257 - Flags: review?(jduell.mcbugs)

Patrick McManus [:mcmanus]

Assignee

Comment 10

•

13 years ago

Comment on attachment 561257 [details] [diff] [review]
patch v2

Actually, David probably has the best sense of this change. The tweak is small and there is a test that exercises it.

Attachment #561257 - Flags: review?(jduell.mcbugs) → review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 11

•

13 years ago

Comment on attachment 561257 [details] [diff] [review]
patch v2

>diff --git a/xpcom/string/public/nsReadableUtils.h b/xpcom/string/public/nsReadableUtils.h
>--- a/xpcom/string/public/nsReadableUtils.h
>+++ b/xpcom/string/public/nsReadableUtils.h
>@@ -236,26 +236,26 @@ PRBool IsASCII( const nsACString& aStrin
>    * Returns |PR_TRUE| if |aString| is a valid UTF-8 string.
>    * XXX This is not bullet-proof and nor an all-purpose UTF-8 validator. 
>    * It is mainly written to replace and roughly equivalent to
>    *
>    *    str.Equals(NS_ConvertUTF16toUTF8(NS_ConvertUTF8toUTF16(str)))
>    *
>    * (see bug 191541)
>    * As such,  it does not check for non-UTF-8 7bit encodings such as 
>-   * ISO-2022-JP and HZ. However, it filters out  UTF-8 representation
>+   * ISO-2022-JP and HZ. However, by default it filters out UTF-8 representation
>    * of surrogate codepoints and non-characters ( 0xFFFE and 0xFFFF
>    * in planes 0 through 16.) as well as overlong UTF-8 sequences. 
>    * Also note that it regards UTF-8 sequences corresponding to 
>    * codepoints above 0x10FFFF as invalid in accordance with 
>    * http://www.ietf.org/internet-drafts/draft-yergeau-rfc2279bis-04.txt

This comment change isn't sufficient to describe your change; you need to be much more specific about what it filters when.  In particular, I'd suggest replacing from "However, by default it filters out UTF-8 representation" to the end of what I quoted with:

>It rejects sequences with the following errors:
> * byte sequences that cannot be decoded into characters according to UTF-8's rules (including cases where the input is part of a valid UTF-8 sequence but starts or ends mid-character)
> * overlong sequences (i.e., cases where a character was encoded non-canonically by using more bytes than necessary)
> * surrogate codepoints (i.e., the codepoints reserved for representing astral characters in UTF-16)
> * codepoints above the unicode range (i.e., outside the first 17 planes; higher than U+10FFFF), in accordance with http://tools.ietf.org/html/rfc3629
> * when aRejectNonChar is true (the default), any codepoint whose low 16 bits are 0xFFFE or 0xFFFF

This suggested change:
 * categorizes the things that are rejected more clearly
 * clearly marks which one is affected by aRejectNonChar
 * updates the RFC2279bis reference to RFC3629.

r=dbaron with that fixed

Attachment #561257 - Flags: review?(dbaron) → review+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 12

•

13 years ago

(Please wrap my comment appropriately, too; I just didn't want to try wrapping it to the right length in the bugzilla input.)

Patrick McManus [:mcmanus]

Assignee

Updated

•

13 years ago

Whiteboard: [inbound]

Target Milestone: --- → mozilla9

Patrick McManus [:mcmanus]

Assignee

Comment 13

•

13 years ago

  In particular, I'd suggest
> replacing from "However, by default it filters out UTF-8 representation" to
> the end of what I quoted with:
>

Thanks!

Mozilla RelEng Bot

Comment 14

•

13 years ago

Try run for 4fc81a7e251e is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=4fc81a7e251e
Results (out of 59 total builds):
    success: 56
    warnings: 3
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mcmanus@ducksong.com-4fc81a7e251e

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 15

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/f98f144896ad

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Christopher Blizzard (:blizzard)

Updated

•

13 years ago

Keywords: dev-doc-needed

Eric Shepherd [:sheppy]

Comment 16

•

13 years ago

I've just added a note to Firefox 9 for developers for this, as well as a brief discussion here:

https://developer.mozilla.org/en/WebSockets/Writing_WebSocket_client_applications#Text_data_format

Keywords: dev-doc-needed → dev-doc-complete

You need to log in before you can comment on or make changes to this bug.