If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Anchors that contain non-UTF-8 characters may hang Mozilla

VERIFIED FIXED in mozilla1.0

Status

()

Core
String
P4
critical
VERIFIED FIXED
16 years ago
16 years ago

People

(Reporter: Ere Maijala (slow), Assigned: jag (Peter Annema))

Tracking

({intl})

Trunk
mozilla1.0
x86
Windows 2000
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [adt2])

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

16 years ago
Clicking anchor that contains scandinavian characters (anchor is "Origo
käytössä") on a page with Windows-1252 character set hangs Mozilla. This is
because the conversion from UTF8 to USC2 goes to an infinite loop if given wrong
input. 

I'll attach a patch that checks the validity of the string before passing it to
the conversion. 

I hope that the component is correct.
(Reporter)

Comment 1

16 years ago
Created attachment 75387 [details] [diff] [review]
Patch to avoid trying the conversion if the string isn't UTF-8

I added a small function to string/obsolete/nsString2.h. It is used in
docshell/base/nsDocShell.cpp to check if the string is valid UTF-8 before
passing it to NS_ConvertUTF8toUCS2().

Now I'm not sure if this is the best way to do it or if the functions are in
correct places. Input requested.
(Reporter)

Comment 2

16 years ago
Created attachment 75388 [details]
Anchor Test Page (hangs browser when clicking link)

Don't try this in the middle of something important :)
isn't the real issue that NS_ConvertUTF8toUCS2 will loop on invalid input?
Summary: Clicking anchor that contains scandinavian characters may hang Mozilla → Clicking anchor that contains scandinavian characters may hang Mozilla
(Reporter)

Comment 4

16 years ago
Well, in the conversion code there are lines such as this: 
NS_ERROR("Not a UTF-8 string. This code should only be used for converting from
known UTF-8 strings.");

That's why I assumed it's better to avoid calling it..
(Reporter)

Comment 5

16 years ago
Oh, I forgot to say that it would be a simple fix to make it not loop forever.
It will already fail an assertion in the loop. Anyway, I'm not sure if it's
proper, because then the result might be unexpected.

Sorry for the spam. 
(Reporter)

Comment 6

16 years ago
After investigating more I've found that there are more places where this can
happen (nsHTMLContentSink, nsXMLContentSink at least). Changing summary to
reflect this better.
Assignee: adamlock → jaggernaut
Component: Embedding: Docshell → String
QA Contact: adamlock → scc
Summary: Clicking anchor that contains scandinavian characters may hang Mozilla → Anchors that contain non-UTF-8 characters may hang Mozilla
(Reporter)

Comment 7

16 years ago
Comment on attachment 75387 [details] [diff] [review]
Patch to avoid trying the conversion if the string isn't UTF-8

Does not fix all situations. Need to find better place for isValidUTF8() also.
Attachment #75387 - Attachment is obsolete: true
(Reporter)

Comment 8

16 years ago
Created attachment 77060 [details] [diff] [review]
Prevent string iterator from looping infinitely

If we don't get anything else into 1.0, I suggest we use this. This will
prevent the string iterator from staying in the infinite loop. Practically I've
just done what there was in comments (breaking out if !one_hop).

Comment 9

16 years ago
*** Bug 134714 has been marked as a duplicate of this bug. ***

Comment 10

16 years ago
adding a test case from bug 134714 and nominating for nsbeta1

const PRUnichar * str = NS_ConvertUTF8toUCS2("\xE0\xE0\xE0").get();

Keywords: nsbeta1

Comment 11

16 years ago
Nav triage team: nsbeta1+, adt2
Keywords: nsbeta1 → nsbeta1+
Whiteboard: [adt2]
(Assignee)

Comment 12

16 years ago
-> 1.0
Target Milestone: --- → mozilla1.0

Updated

16 years ago
Priority: -- → P4

Comment 13

16 years ago
This has to be fixed soon. Bug 139842, bug 142930 are caused by this.
Blocks: 142930

Comment 14

16 years ago
cc to waterson, I think NS_ConvertUTF8toUCS2 should validate the input and
detect illegal UTF-8.

Comment 15

16 years ago
This is not limited to the corrupted data outside from Mozilla.
It can be caused by defrating PRUnichar to char then calling
NS_ConvertUTF8toUCS2 later.
(Assignee)

Comment 16

16 years ago
Created attachment 82858 [details] [diff] [review]
Make NS_ConvertUTF8toUCS2 deal better with faulty input

This will set the string to an empty string if the input isn't UTF8.
Attachment #77060 - Attachment is obsolete: true

Comment 17

16 years ago
Can the code put 0xFFFD for invalid bytes until it sees any valid bytes?
That way, the user can see something like "???abc" instead of a blank string.

Comment 18

16 years ago
> Can the code put 0xFFFD for invalid bytes until it sees any valid bytes?
> That way, the user can see something like "???abc" instead of a blank string.
> 
see the patch of bug 128896
http://bugzilla.mozilla.org/attachment.cgi?id=76351&action=view
(Assignee)

Comment 19

16 years ago
Yes, I see your solution there, but there is (or was) code out there that went
something like:

NS_ConvertUTF8toUCS2 temp(someUnknownEncodingStr);
if (temp.IsEmpty()) {
  // Looks like we're not UTF8, try something else ...
  ...
}

Which the old version of this code supported and I broke when I refactored parts
of it.

What I really want (and that's why I've put NS_ERRORs in place) is that people
only call this code when they know that it's a UTF8 string, and to use the intl
converter code when they don't know what they're dealing with.

Comment 20

16 years ago
Comment on attachment 82858 [details] [diff] [review]
Make NS_ConvertUTF8toUCS2 deal better with faulty input

>Index: nsString2.cpp

>-    SetCapacity(count);
>+    SetLength(count);
>       // |SetCapacity| normally doesn't guarantee the use we are putting it to here (see its interface comment in nsAString.h),
>       //  we can only use it since our local implementation, |nsString::SetCapacity|, is known to do what we want

seems like this comment no longer applies.

otherwise, the patch looks good to me.	r/sr=darin
Attachment #82858 - Flags: superreview+

Comment 21

16 years ago
Comment on attachment 82858 [details] [diff] [review]
Make NS_ConvertUTF8toUCS2 deal better with faulty input

r=scc.	It looks reasonable to me.
Attachment #82858 - Flags: review+
(Assignee)

Comment 22

16 years ago
Checked in on trunk. Will check in on branch when I get the appropriate approvals.
Status: NEW → RESOLVED
Last Resolved: 16 years ago
Keywords: adt1.0.0
Resolution: --- → FIXED

Updated

16 years ago
Blocks: 141008
Keywords: intl

Comment 23

16 years ago
Let's get this verified on the trunk first.
(Reporter)

Comment 24

16 years ago
Verified. Very good :)
Status: RESOLVED → VERIFIED

Comment 25

16 years ago
adding adt1.0.0+.  Please get drivers approval and then checkin to the 1.0 Branch.
Keywords: adt1.0.0 → adt1.0.0+

Comment 26

16 years ago
Comment on attachment 82858 [details] [diff] [review]
Make NS_ConvertUTF8toUCS2 deal better with faulty input

a=chofmann,dbaron,valeski for the 1.0 branch
Attachment #82858 - Flags: approval+
(Assignee)

Comment 27

16 years ago
Checked into the branch.
Keywords: fixed1.0.0
*** Bug 145479 has been marked as a duplicate of this bug. ***

Updated

16 years ago
Blocks: 146292
No longer blocks: 141008
You need to log in before you can comment on or make changes to this bug.