Closed Bug 132583 Opened 22 years ago Closed 22 years ago

Anchors that contain non-UTF-8 characters may hang Mozilla

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla1.0

People

(Reporter: emaijala+moz, Assigned: jag+mozilla)

References

Details

(Keywords: intl, Whiteboard: [adt2])

Attachments

(2 files, 2 obsolete files)

Patch to avoid trying the conversion if the string isn't UTF-8 22 years ago Ere Maijala (slow) 4.03 KB, patch		Details \| Diff \| Splinter Review
Anchor Test Page (hangs browser when clicking link) 22 years ago Ere Maijala (slow) 161 bytes, text/html		Details
Prevent string iterator from looping infinitely 22 years ago Ere Maijala (slow) 2.61 KB, patch		Details \| Diff \| Splinter Review
Make NS_ConvertUTF8toUCS2 deal better with faulty input 22 years ago jag (Peter Annema) 2.92 KB, patch	scc : review+ darin.moz : superreview+ chofmann : approval+	Details \| Diff \| Splinter Review

Ere Maijala (slow)

Reporter

Description

•

22 years ago

Clicking anchor that contains scandinavian characters (anchor is "Origo
käytössä") on a page with Windows-1252 character set hangs Mozilla. This is
because the conversion from UTF8 to USC2 goes to an infinite loop if given wrong
input. 

I'll attach a patch that checks the validity of the string before passing it to
the conversion. 

I hope that the component is correct.

Ere Maijala (slow)

Reporter

Comment 1

•

22 years ago

Attached patch Patch to avoid trying the conversion if the string isn't UTF-8 (obsolete) — Details — Splinter Review

I added a small function to string/obsolete/nsString2.h. It is used in
docshell/base/nsDocShell.cpp to check if the string is valid UTF-8 before
passing it to NS_ConvertUTF8toUCS2().

Now I'm not sure if this is the best way to do it or if the functions are in
correct places. Input requested.

Ere Maijala (slow)

Reporter

Comment 2

•

22 years ago

Attached file Anchor Test Page (hangs browser when clicking link) — Details

Don't try this in the middle of something important :)

Boris Zbarsky [:bzbarsky]

Comment 3

•

22 years ago

isn't the real issue that NS_ConvertUTF8toUCS2 will loop on invalid input?

Summary: Clicking anchor that contains scandinavian characters may hang Mozilla → Clicking anchor that contains scandinavian characters may hang Mozilla

Ere Maijala (slow)

Reporter

Comment 4

•

22 years ago

Well, in the conversion code there are lines such as this: 
NS_ERROR("Not a UTF-8 string. This code should only be used for converting from
known UTF-8 strings.");

That's why I assumed it's better to avoid calling it..

Ere Maijala (slow)

Reporter

Comment 5

•

22 years ago

Oh, I forgot to say that it would be a simple fix to make it not loop forever.
It will already fail an assertion in the loop. Anyway, I'm not sure if it's
proper, because then the result might be unexpected.

Sorry for the spam.

Ere Maijala (slow)

Reporter

Comment 6

•

22 years ago

After investigating more I've found that there are more places where this can
happen (nsHTMLContentSink, nsXMLContentSink at least). Changing summary to
reflect this better.

Assignee: adamlock → jaggernaut

Component: Embedding: Docshell → String

QA Contact: adamlock → scc

Summary: Clicking anchor that contains scandinavian characters may hang Mozilla → Anchors that contain non-UTF-8 characters may hang Mozilla

Ere Maijala (slow)

Reporter

Comment 7

•

22 years ago

Comment on attachment 75387 [details] [diff] [review]
Patch to avoid trying the conversion if the string isn't UTF-8

Does not fix all situations. Need to find better place for isValidUTF8() also.

Attachment #75387 - Attachment is obsolete: true

Ere Maijala (slow)

Reporter

Comment 8

•

22 years ago

Attached patch Prevent string iterator from looping infinitely (obsolete) — Details — Splinter Review

If we don't get anything else into 1.0, I suggest we use this. This will
prevent the string iterator from staying in the infinite loop. Practically I've
just done what there was in comments (breaking out if !one_hop).

nhottanscp

Comment 9

•

22 years ago

*** Bug 134714 has been marked as a duplicate of this bug. ***

nhottanscp

Comment 10

•

22 years ago

adding a test case from bug 134714 and nominating for nsbeta1

const PRUnichar * str = NS_ConvertUTF8toUCS2("\xE0\xE0\xE0").get();

Keywords: nsbeta1

Samir Gehani

Comment 11

•

22 years ago

Nav triage team: nsbeta1+, adt2

Keywords: nsbeta1 → nsbeta1+

Whiteboard: [adt2]

jag (Peter Annema)

Assignee

Comment 12

•

22 years ago

-> 1.0

Target Milestone: --- → mozilla1.0

Peter Trudelle

Updated

•

22 years ago

Priority: -- → P4

nhottanscp

Comment 13

•

22 years ago

This has to be fixed soon. Bug 139842, bug 142930 are caused by this.

(not reading, please use seth@sspitzer.org instead)

Updated

•

22 years ago

Blocks: 142930

nhottanscp

Comment 14

•

22 years ago

cc to waterson, I think NS_ConvertUTF8toUCS2 should validate the input and
detect illegal UTF-8.

nhottanscp

Comment 15

•

22 years ago

This is not limited to the corrupted data outside from Mozilla.
It can be caused by defrating PRUnichar to char then calling
NS_ConvertUTF8toUCS2 later.

jag (Peter Annema)

Assignee

Comment 16

•

22 years ago

Attached patch Make NS_ConvertUTF8toUCS2 deal better with faulty input — Details — Splinter Review

This will set the string to an empty string if the input isn't UTF8.

Attachment #77060 - Attachment is obsolete: true

nhottanscp

Comment 17

•

22 years ago

Can the code put 0xFFFD for invalid bytes until it sees any valid bytes?
That way, the user can see something like "???abc" instead of a blank string.

nhottanscp

Comment 18

•

22 years ago

> Can the code put 0xFFFD for invalid bytes until it sees any valid bytes?
> That way, the user can see something like "???abc" instead of a blank string.
> 
see the patch of bug 128896
http://bugzilla.mozilla.org/attachment.cgi?id=76351&action=view

jag (Peter Annema)

Assignee

Comment 19

•

22 years ago

Yes, I see your solution there, but there is (or was) code out there that went
something like:

NS_ConvertUTF8toUCS2 temp(someUnknownEncodingStr);
if (temp.IsEmpty()) {
  // Looks like we're not UTF8, try something else ...
  ...
}

Which the old version of this code supported and I broke when I refactored parts
of it.

What I really want (and that's why I've put NS_ERRORs in place) is that people
only call this code when they know that it's a UTF8 string, and to use the intl
converter code when they don't know what they're dealing with.

Darin Fisher

Comment 20

•

22 years ago

Comment on attachment 82858 [details] [diff] [review]
Make NS_ConvertUTF8toUCS2 deal better with faulty input

>Index: nsString2.cpp

>-    SetCapacity(count);
>+    SetLength(count);
>       // |SetCapacity| normally doesn't guarantee the use we are putting it to here (see its interface comment in nsAString.h),
>       //  we can only use it since our local implementation, |nsString::SetCapacity|, is known to do what we want

seems like this comment no longer applies.

otherwise, the patch looks good to me.	r/sr=darin

Attachment #82858 - Flags: superreview+

Scott Collins

Comment 21

•

22 years ago

Comment on attachment 82858 [details] [diff] [review]
Make NS_ConvertUTF8toUCS2 deal better with faulty input

r=scc.	It looks reasonable to me.

Attachment #82858 - Flags: review+

jag (Peter Annema)

Assignee

Comment 22

•

22 years ago

Checked in on trunk. Will check in on branch when I get the appropriate approvals.

Status: NEW → RESOLVED

Closed: 22 years ago

Keywords: adt1.0.0

Resolution: --- → FIXED

Frank Tang

Updated

•

22 years ago

Blocks: 141008

Keywords: intl

scottputterman

Comment 23

•

22 years ago

Let's get this verified on the trunk first.

Ere Maijala (slow)

Reporter

Comment 24

•

22 years ago

Verified. Very good :)

Status: RESOLVED → VERIFIED

scottputterman

Comment 25

•

22 years ago

adding adt1.0.0+.  Please get drivers approval and then checkin to the 1.0 Branch.

Keywords: adt1.0.0 → adt1.0.0+

chris hofmann

Comment 26

•

22 years ago

Comment on attachment 82858 [details] [diff] [review]
Make NS_ConvertUTF8toUCS2 deal better with faulty input

a=chofmann,dbaron,valeski for the 1.0 branch

Attachment #82858 - Flags: approval+

jag (Peter Annema)

Assignee

Comment 27

•

22 years ago

Checked into the branch.

Keywords: fixed1.0.0

Boris Zbarsky [:bzbarsky]

Comment 28

•

22 years ago

*** Bug 145479 has been marked as a duplicate of this bug. ***

Frank Tang

Updated

•

22 years ago

Blocks: 146292
No longer blocks: 141008

Nobody; OK to take it and work on it

Updated

•

3 years ago

Component: String → XPCOM

You need to log in before you can comment on or make changes to this bug.