Closed Bug 317216 Opened 19 years ago Closed 15 years ago

UTF16 and UTF32 decoders allow invalid UTF16 into the core

Tracking

()

Status:

VERIFIED FIXED

People

(Reporter: bzbarsky, Assigned: smontagu)

References

(Blocks 1 open bug)

Details

(Keywords: fixed1.9.1, verified1.8.1.22, verified1.9.0.9, Whiteboard: [sg:want P1])

Attachments

(4 files, 1 obsolete file)

UTF-32 patch 19 years ago Jungshik Shin 3.21 KB, patch	smontagu : review+ bzbarsky : superreview+	Details \| Diff \| Splinter Review
UTF-16 patch 17 years ago Simon Montagu :smontagu 13.83 KB, patch	jshin1987 : review+ dveditz : superreview+	Details \| Diff \| Splinter Review
UTF-16 patch v.3 15 years ago Simon Montagu :smontagu 15.09 KB, patch	jag+mozilla : review+ dveditz : approval1.9.0.9+	Details \| Diff \| Splinter Review
Patch for 1.8 branch 15 years ago Simon Montagu :smontagu 14.84 KB, patch	dveditz : approval1.8.1.next+ smontagu : approval1.8.0.next?	Details \| Diff \| Splinter Review
Testcase 15 years ago Simon Montagu :smontagu 6.58 KB, text/html		Details

Boris Zbarsky [:bzbarsky]

Reporter

Description

•

19 years ago

See bug 316338.  The relevant parts are:

> There's also the following comment in our "UCS216ToUnicode" converter:

> // XXX : illegal surrogate code points are just passed through !!

and

> and also in UTF32ToUnicode:

> // XXX Do we have to convert surrogate code points to the replacement
> // character (0xfffd)? 

The answer to the latter question is "Yes, please!", of course.

Simon Montagu :smontagu

Assignee

Comment 1

•

19 years ago

(In reply to comment #0)
> > // XXX Do we have to convert surrogate code points to the replacement
> > // character (0xfffd)? 
> 
> The answer to the latter question is "Yes, please!", of course.

It was discussed in bug 184120 comment 15:

> > Should you check that the input isn't in the surrogate ranges here (and > replace
> > with 0xfffd, perhaps?), since failing to check could allow an 
> > incorrectly
> > encoded thing that looks like a surrogate into the data?  
> 
>   It's not very clear what to do in such a case. (UAX #19 doesn't
> say anything about what to do when converting UTF-32 to other
> encoding forms). I'll ask around. In the meantime, I added a comment.
> 
> > Or do all our
> > routines that handle UTF-16 correctly handle such errors?
> 
>   They're supposed to check for unpaired surrogate code points.
> In all cases I'm aware of, our routines do the 'right thing'.
> Therefore, I guess we can get away with just passing
> thru all codepoints below 0x10000 for the time being. 

I think we've learned from bug 316338 and its offshoots that that's not the case. The definitions and conformance requirements have also been clarified in the current version of Unicode[1]:

D31: Because surrogate code points are not included in the set of Unicode scalar values, UTF-32 code units in the range 0000D80016..0000DFFF16 are ill-formed.

D35: Because surrogate code points are not Unicode scalar values, isolated UTF-16 code units in the range D80016..DFFF16 are ill-formed.

C12a When a process interprets a code unit sequence which purports to be in a Unicode character encoding form, it shall treat ill-formed code unit sequences as an error condition, and shall not interpret such sequences as characters.

[1] http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf

Status: NEW → ASSIGNED

Boris Zbarsky [:bzbarsky]

Reporter

Comment 2

•

19 years ago

>   They're supposed to check for unpaired surrogate code points.
> In all cases I'm aware of, our routines do the 'right thing'.

Er..  "our routines" would be the whole friggin' codebase, plus any system APIs that we pass UTF16 to.  There's no way they could all possibly be doing "the right thing", and imposing such a burden on all of it is unreasonable.

Jungshik Shin

Comment 3

•

19 years ago

Attached patch UTF-32 patch — Details — Splinter Review

As I'm to blame for that XXX  comments and oversight, I'm taking it. This is the easier of two cases.

Assignee: smontagu → jshin1987

Attachment #203790 - Flags: superreview?(bzbarsky)

Attachment #203790 - Flags: review?(smontagu)

Simon Montagu :smontagu

Assignee

Comment 4

•

19 years ago

Comment on attachment 203790 [details] [diff] [review]
UTF-32 patch

The same code is repeated later in the source file. Make the same changes there and r=me.

Attachment #203790 - Flags: review?(smontagu) → review+

Boris Zbarsky [:bzbarsky]

Reporter

Comment 5

•

19 years ago

Comment on attachment 203790 [details] [diff] [review]
UTF-32 patch

Excellent!

Attachment #203790 - Flags: superreview?(bzbarsky) → superreview+

Jungshik Shin

Comment 6

•

19 years ago

Thanks for your review and reference.
UTF-32 patch checked into the trunk a couple of days ago. As for UTF-16, I quickly made a patch, but that's not perfect in that it doesn't handle cases where a surrogate pair is splitted into two separate chunks. Btw, Simon, do you think we just have to replace an isolated surrogate by the replacement character? That's what's done in UTF-32 patch..

Simon Montagu :smontagu

Assignee

Comment 7

•

19 years ago

(In reply to comment #6)
> Btw, Simon, do you
> think we just have to replace an isolated surrogate by the replacement
> character? That's what's done in UTF-32 patch..

Yes, I think so, based on D35 that I quoted in comment 1.

Jesse Ruderman

Updated

•

17 years ago

Flags: blocking1.9?

Damon Sicore (:damons)

Comment 8

•

17 years ago

If this has been checked in and is really fixed/resolved, please mark as so.  This is old, and I'm not sure why this should block 1.9.  Please re-nom if necessary.

Flags: blocking1.9? → blocking1.9-

Jesse Ruderman

Comment 9

•

17 years ago

I believe it was nominated because invalid UTF-16 in core has caused memory safety bugs in the past.

Flags: blocking1.9- → blocking1.9?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 10

•

17 years ago

What Jesse says.  This bug was part of a series of bugs filed once we realized we were running into security issues with this...

We still need to fix the UTF16 decoder.

Damon Sicore (:damons)

Comment 11

•

17 years ago

OK.  I'll plus this, set the priority to P3.  Please reset the priority if need be.

Flags: blocking1.9? → blocking1.9+

Priority: -- → P3

Simon Montagu :smontagu

Assignee

Comment 12

•

17 years ago

It shouldn't be hard to allow for surrogate pairs split into two separate chunks as mentioned in comment 6 -- we handle the same situation in other multi-byte encodings.

Simon Montagu :smontagu

Assignee

Comment 13

•

17 years ago

Attached patch UTF-16 patch (obsolete) — Details — Splinter Review

This turned out to be a little trickier than I expected, because of the possibility of either odd bytes or odd surrogates between buffers, combined with the possibility of swapping endianness. I moved some things around to make the new code less tangled.

Assignee: jshin1987 → smontagu

Attachment #288642 - Flags: superreview?(bzbarsky)

Attachment #288642 - Flags: review?

Simon Montagu :smontagu

Assignee

Updated

•

17 years ago

Attachment #288642 - Flags: review? → review?(jshin1987)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 14

•

17 years ago

Simon, could you find someone else to sr this, please?

Simon Montagu :smontagu

Assignee

Updated

•

17 years ago

Attachment #288642 - Flags: superreview?(bzbarsky) → superreview?(cbiesinger)

Jungshik Shin

Updated

•

17 years ago

Attachment #288642 - Flags: review?(jshin1987) → review+

Mike Beltzner [:beltzner, not reading bugmail]

Comment 15

•

16 years ago

bz: moving off the hard blocker list for this release, renom if you think it MUST be fixed before we ship Firefox 3.

We'll take a reviewed patch, of course.

Flags: tracking1.9+ → wanted-next+

Boris Zbarsky [:bzbarsky]

Reporter

Comment 16

•

16 years ago

I think this should be at least wanted1.9.0.x....

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

16 years ago

Flags: wanted1.9.0.x?

Simon Montagu :smontagu

Assignee

Comment 17

•

16 years ago

biesi: ping?

Whiteboard: need sr=biesi

Samuel Sidler (old account; do not CC)

Updated

•

16 years ago

Flags: wanted1.9.0.x? → wanted1.9.0.x+

QA Contact: amyy → i18n

Daniel Veditz [:dveditz]

Comment 18

•

16 years ago

This got bumped out of 3.0 in favor of a later 3.0.x release and then died. All it needs is an sr= -- does it really have to be biesi?

Flags: blocking1.9.1?

Daniel Veditz [:dveditz]

Updated

•

16 years ago

Attachment #288642 - Flags: superreview?(cbiesinger) → superreview?(dveditz)

Benjamin Smedberg

Updated

•

16 years ago

Flags: blocking1.9.1? → blocking1.9.1+

Priority: P3 → P2

Simon Montagu :smontagu

Assignee

Updated

•

15 years ago

Attachment #288642 - Flags: superreview?(dveditz) → superreview?(jag)

Simon Montagu :smontagu

Assignee

Updated

•

15 years ago

Whiteboard: need sr=biesi → need sr

Daniel Veditz [:dveditz]

Comment 19

•

15 years ago

Comment on attachment 288642 [details] [diff] [review]
UTF-16 patch

sr=dveditz

Attachment #288642 - Flags: superreview?(jag) → superreview+

Daniel Veditz [:dveditz]

Updated

•

15 years ago

Flags: blocking1.9.0.8?

Whiteboard: need sr

Simon Montagu :smontagu

Assignee

Comment 20

•

15 years ago

Attached patch UTF-16 patch v.3 — Details — Splinter Review

This is a co-production from jag and me. r=me for his parts.

Attachment #288642 - Attachment is obsolete: true

Attachment #363113 - Flags: review?(jag)

Daniel Veditz [:dveditz]

Updated

•

15 years ago

Flags: blocking1.9.0.8? → blocking1.9.0.8+

Whiteboard: [sg:want P1]

jag (Peter Annema)

Comment 21

•

15 years ago

Comment on attachment 363113 [details] [diff] [review]
UTF-16 patch v.3

r=jag on the tests and the changes outside UTF16ConvertToUnicode.

You could create the |IOService| and |ConverterInputStream| constructors and |ios| outside testCase, and use |const| instead of |var|.

Attachment #363113 - Flags: review?(jag) → review+

Simon Montagu :smontagu

Assignee

Comment 22

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/1abd98b9df00

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Flags: in-testsuite+

Resolution: --- → FIXED

Simon Montagu :smontagu

Assignee

Comment 23

•

15 years ago

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/bd90eb7a73ee

Keywords: fixed1.9.1

Simon Montagu :smontagu

Assignee

Comment 24

•

15 years ago

Comment on attachment 363113 [details] [diff] [review]
UTF-16 patch v.3

Requesting branch approval after a week baking on trunk. We have good test coverage of UTF-16 decoding with the tests from bug 335531, bug 340714 and bug 396637, plus the new test in this patch, so I am confident that it shouldn't cause any regressions.

Attachment #363113 - Flags: approval1.9.0.8?

Daniel Veditz [:dveditz]

Updated

•

15 years ago

Attachment #363113 - Flags: approval1.9.0.8? → approval1.9.0.8+

Daniel Veditz [:dveditz]

Comment 25

•

15 years ago

Comment on attachment 363113 [details] [diff] [review]
UTF-16 patch v.3

Approved for 1.9.0.8, a=dveditz for release-drivers

Simon Montagu :smontagu

Assignee

Updated

•

15 years ago

Keywords: fixed1.9.0.8

Daniel Veditz [:dveditz]

Comment 26

•

15 years ago

This will probably apply to the 1.8 branch, right?

Flags: wanted1.8.1.x?

Flags: blocking1.8.1.next?

Simon Montagu :smontagu

Assignee

Comment 27

•

15 years ago

Probably not, actually, unless we combine it with the patch from bug 340714.

Daniel Veditz [:dveditz]

Comment 28

•

15 years ago

Which we probably want to do, sorry. We're still supporting Thunderbird 2 and it can potentially get a lot of bad data pushed down to it.

Flags: wanted1.8.1.x?

Flags: wanted1.8.1.x+

Flags: blocking1.8.1.next?

Flags: blocking1.8.1.next+

Al Billings [:abillings - ex-MoCo]

Comment 29

•

15 years ago

Verified for 1.9.0.9 based on checked in tests with the patch.

Keywords: fixed1.9.0.9 → verified1.9.0.9

Simon Montagu :smontagu

Assignee

Comment 30

•

15 years ago

Attached patch Patch for 1.8 branch — Details — Splinter Review

This includes both patches from this bug, and also the patches from bug 396637, bug 340714 and bug 376636.

Attachment #371199 - Flags: approval1.8.0.next?

Samuel Sidler (old account; do not CC)

Updated

•

15 years ago

Attachment #371199 - Flags: approval1.8.1.next?

Daniel Veditz [:dveditz]

Comment 31

•

15 years ago

Comment on attachment 371199 [details] [diff] [review]
Patch for 1.8 branch

Approved for 1.8.1.22, a=dveditz for release-drivers

Attachment #371199 - Flags: approval1.8.1.next? → approval1.8.1.next+

Simon Montagu :smontagu

Assignee

Updated

•

15 years ago

Keywords: fixed1.8.1.22

Simon Montagu :smontagu

Assignee

Comment 32

•

15 years ago

Attached file Testcase — Details

Stand-alone version of the unit test for testing on 1.8 branch

Simon Montagu :smontagu

Assignee

Comment 33

•

15 years ago

N.B. attachment 371409 [details] request enhanced privileges, so will need to be saved and run locally (or with signed.applets.codebase_principal_support set to true)

Al Billings [:abillings - ex-MoCo]

Comment 34

•

15 years ago

Verified for 1.8.1.22 using Seamonkey and stand-alone testcase: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.22pre) Gecko/20090604 SeaMonkey/1.1.17pre. 

Verified for Trunk with Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090604 Minefield/3.6a1pre.

Status: RESOLVED → VERIFIED

Keywords: fixed1.8.1.22 → verified1.8.1.22

You need to log in before you can comment on or make changes to this bug.