Closed Bug 317334 (onload=window()) Opened 19 years ago Closed 18 years ago

hang when long wrappable string is passed to prompt() [e.g. as used in the exploit for IE's <body onload=window()> bug]

Categories

(Core :: Layout, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.8.1beta2

People

(Reporter: bernd_mozilla, Assigned: martijn.martijn)

References

(Depends on 1 open bug, )

Details

(Keywords: fixed1.8.1, hang, perf, Whiteboard: [reflow-refactor])

Attachments

(10 files, 4 obsolete files)

load the testcase and hit one selection it hanges ff

see http://www.computerterrorism.com/research/ie/ct21-11-2005 for the details 
advertised at http://www.heise.de/newsticker/meldung/66480
It happens because this page is opened in an iframe in the new popup:
http://www.computerterrorism.com/research/ie/fillmem.htm
Specifically, it's this part of the script at that page:
//
// Address called by the bug (also serves as slide code)
//
for (spearson=1 ; spearson <=500 ; spearson++)
{ 
eip = eip + unescape("%u7030%u4300")
//eip = eip + unescape("%u4300")
}


//
// Create a large chunk for memory saturation
//
for (spearson=1 ; spearson <=200; spearson++) 
{ 
fillmem = fillmem + eip
}

This creates an enormous string, which makes Mozilla hang and apparently makes IE6 exploitable.
I can confrim that Firefox hangs up at:

http://www.computerterrorism.com/research/ie/poc.htm
and 
http://www.computerterrorism.com/research/ie/fillmem.htm

but not with the heise testcases.

Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8) Gecko/20051107 Firefox/1.5
*** Bug 317353 has been marked as a duplicate of this bug. ***
(In reply to comment #3)
> I can confrim that Firefox hangs up at:

NB: it's not a permanent hang, we do eventually recover (couple of minutes for me on WinXP) and display the prompt dialog.
if you interrupt the busy state in a debugger we're busy in layout trying to display the prompt(). Usually in some form of Reflow(), sometimes in font stuff, sometimes in Bidi (nsBidiPresUtils::RemoveBidiContinuation?).

*** Bug 317358 has been marked as a duplicate of this bug. ***
Note that with a GTK1 build (which is what my profiling build is) all our time is just spent messing with fonts...  I doubt that's the issue on Windows, though.
OK, with a GTK2 build I get something more like:

Total hit count: 340722
Count %Total  Function Name
151410   44.4     nsBlockFrame::DoRemoveFrame(nsIFrame*, int)
13795   4.0     SearchTable
7655   2.2     FcCharSetDestroy
7410   2.2     nsBlockFrame::TryAllLines(nsLineList_iterator*, nsLineList_iterator*, int*)
4656   1.4     PL_DHashTableOperate

with the calls to DoRemoveFrame coming mostly from nsBidiPresUtils::RemoveBidiContinuation, called from nsBidiPresUtils::Resolve.  About 40% of total time is spent under Resolve (most in DoRemoveFrame).

I broke in Resolve, and the 3rd or 4th time it's called I get:

200       mSuccess = InitLogicalArray(aPresContext, aFirstChild, nsnull, PR_TRUE);
(gdb) next
p202      if (text->mUnicodeBidi == NS_STYLE_UNICODE_BIDI_OVERRIDE) {
(gdb) p mLogicalFrames.Count()
$26 = 66709

Going forward some more:

238       mSuccess = mBidiEngine->CountRuns(&runCount);
(gdb) next
239       if (NS_FAILED(mSuccess) ) {
(gdb) p runCount
$27 = 5

So I'm guessing we have that many frames because of line-breaking or something?  Seems like a lot for a 200111-char string (which is what we have).  In any case, I bet that whatever order we're removing them in makes all this O(N^2) in the number of frames or something like that...
For what it's worth, when we call nsBidiPresUtils::RemoveBidiContinuation, we call it as:

Breakpoint 3, nsBidiPresUtils::RemoveBidiContinuation (this=0x8abcc28, 
    aPresContext=0x88b2a80, aFrame=0xa1c6934, aFirstIndex=4, aLastIndex=66708, 
    aOffset=@0xbfffb084) at ../../../mozilla/layout/base/nsBidiPresUtils.cpp:774

Note aLastIndex and aFirstIndex.  Then we proceed to remove those frames from the block, starting at the end (why at the end?).  That's the Really Slow way to do it given the linked-list storage for frames, no?  Is there a reason not to change the iteration order?
So yeah, switching the iteration order makes the profile look like:

Total hit count: 220757
Count %Total  Function Name
17914   8.1     SearchTable
8595   3.9     FcCharSetDestroy
5983   2.7     PL_DHashTableOperate
5227   2.4     nsStyleContext::GetStyleData(nsStyleStructID)
4858   2.2     nsTextFrame::MeasureText(nsPresContext*, nsHTMLReflowState const&, nsTextTransformer&, nsTextStyle&, nsTextFrame::TextReflowData&)

Too bad I crash at some point before that dialog can come up... ;)  But that may be because I screwed up the order-reversing, of course.
*** Bug 317373 has been marked as a duplicate of this bug. ***
*** Bug 317394 has been marked as a duplicate of this bug. ***
BTW: On a related note, in earlier builds it only took ~20 seconds to display the prompt, now it takes ~90 seconds here (also the memory usage is much higher, 30MB vs. 70MB). I narrowed that one down to the time range 2005-10-15 to 2005-10-16. Bonsai link would be http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=MozillaTinderboxAll&branch=HEAD&branchtype=match&filetype=match&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2005-01-15+04%3A00%3A00&maxdate=2005-01-16+07%3A00%3A00&cvsroot=%2Fcvsroot  might be related here (i mean 20 seconds to display this would still be much better than 90 seconds ;).
I meant 2005-01-15 and 2005-01-16 of course (it's correct in the bonsai link).
I'll bet on the SizeToContent fix in that range; if you want to test by local backout, please do.

That said, 

1) The real thing to do is to fix the perf issues
2) Displaying a 200,000-character string is never going to get particularly
   snappy.
Alias: onload=window()
Summary: hang with IE exploit → hang with IE exploit [body onload=window()]
(In reply to comment #11)
> Too bad I crash at some point before that dialog can come up... ;)  But that
> may be because I screwed up the order-reversing, of course.

As the code stands, just switching the iteration order is going to crash because RemoveFrame() deletes all of the frame's next-in-flows, so as we go on iterating through mLogicalFrames we get pointers to frames that have already been deleted.
OK.  So what's actually going on here?  I assume not all the frames in mLogicalFrames are in-flows of each other, but some are?  Can we just leave things that are not first in flow out of mLogicalFrames here?  Or filter them out, or something?

Alternately, why are we ending up with 60000-some frames here anyway?  Should we be?
I'll look into this some more tomorrow. As far as I can tell so far, bidi resolving isn't creating the 6000+ frames, but I'm not sure who is. Line breaking?
Line breaking is possible, yes...  But why are we line-breaking every 3 chars, then?
Attached image 1 char
so we reflow the text once it has these chars to death, already one char is 6 times reflown if it needs to break it reflows the whole thing multiple times
so it looks that the range of the unichar defines whether we wrap the text and start that "insane" linebreaking.
(In reply to comment #16)
> I'll bet on the SizeToContent fix in that range; if you want to test by local
> backout, please do.

Yeah, that was it. Without that patch, it takes 15 seconds.
Without that fix I bet we never do that second reflow (the one where we have 60000 frames).
FF 1.5 OSX eats all available CPU and hangs with a Spinning Beachball of Doom.
Severity: normal → major
OS: Windows XP → All
Hardware: PC → All
Severity: major → normal
Hardware: All → PC
the MEW is differently computed for both testcases, so we think we can wrap the text.

 area 0320D108 r=0 a=0,UC c=0,0 cnt=6246
  text 0320D1AC r=0 a=UC,UC c=UC,UC cnt=6247
  text 0320D1AC d=150000,195 me=75 <------- causes hang
  text 0320D1AC r=2 a=0,UC c=UC,UC cnt=6248

 area 0323FFE8 r=0 a=0,UC c=0,0 cnt=6182
  text 0324008C r=0 a=UC,UC c=UC,UC cnt=6183
  text 0324008C d=300000,195 me=300000
  text 0324008C r=2 a=0,UC c=UC,UC cnt=6184
  text 0324008C d=300000,195 me=300000
 area 0323FFE8 d=300000,0 me=300000 m=300000 o=(0,0) 300000 x 195

 
this testcases causes as much reflows as the first one
I think we hit the usual xul reflow is inefficient bug (bug 260191 for instance) in order to get the prefsize we reflow like http://lxr.mozilla.org/seamonkey/source/layout/xul/base/src/nsBoxFrame.cpp#1004
with a zero width as the reflow log shows to get the minimum width. Then the text needs to wrap. Somehow the initial text seems to be wrappable, but the attachment 203965 [details] shows that, if the text is definetely wrappable we will wrap the text at every possible place to come as close as possible to the 0 available width.

Looks like [reflow refactor].
or in other words:
we hit http://lxr.mozilla.org/seamonkey/source/layout/generic/nsFrame.cpp#4748
// 3) Minimum size. This is a toughy. We can pass the block a flag asking for the max element
// size. That would give us the width. Unfortunately you can only ask for a maxElementSize
// during an incremental reflow. So on other reflows we will just have to use 0.
// The min height on the other hand is fairly easy we need to get the largest
// line height. This can be done with the line iterator.

I believe this is wrong
So perhaps we should start filing separate bugs blocking this?  I'd still like to see what we can do about the bidi stuff here, for example...
Whiteboard: [reflow-refactor]
Keywords: perf
Summary: hang with IE exploit [body onload=window()] → hang when long wrappable string is passed to prompt() [e.g. as used in the exploit for IE's <body onload=window()> bug]
(In reply to comment #31)
> Somehow the initial text seems to be wrappable

I'm not sure if there is a doubtful tone here, but the initial text is legitimately wrappable between any pair of ideographic characters, i.e. just about everywhere.
(In reply to comment #18)
> Can we just leave
> things that are not first in flow out of mLogicalFrames here?  Or filter them
> out, or something?

It's possible that parent->RemoveFrame() isn't the ideal way to do this anyway. We might be better off writing something more purpose-built that just goes through reconnecting the bidi continuations.
 
> Alternately, why are we ending up with 60000-some frames here anyway?  Should
> we be?
 
It would also be interesting to try a similar testcase that creates a huge string of alternate left-to-right and right-to-left characters.
*** Bug 317573 has been marked as a duplicate of this bug. ***
*** Bug 317578 has been marked as a duplicate of this bug. ***
*** Bug 317587 has been marked as a duplicate of this bug. ***
Attached patch versuchSplinter Review
That patch improved it by 32% :) (before it was ~95 seconds, with the patch it's ~65 seconds).
*** Bug 317799 has been marked as a duplicate of this bug. ***
*** Bug 318109 has been marked as a duplicate of this bug. ***
*** Bug 318147 has been marked as a duplicate of this bug. ***
*** Bug 318144 has been marked as a duplicate of this bug. ***
*** Bug 318668 has been marked as a duplicate of this bug. ***
Has this dropped through the cracks? Exploits based on the CounterTerrorism PoC have shown up in the wild (on warez sites and the like).
Flags: blocking1.8.0.1?
I can't say what is the plan for the bidi stuff, but the analysis, that I have done points to refactoring of the xul reflow. This is exactly what the reflow branch is supposed to accomplish. This is what the status whiteboards means. It also defines a timetable which does not match 1.8.0.1. 
My interpretation is that xul reflow triggers repeatedly a weakness of bidiframes. This can be fixed at two levels: xul reflow or bidi frames. XUL reflow is close to an f-word for me, I don't understand what all the voodoo here does but from the reflow logs its easy to see that it is bad. So while I prefer the small tweaks, I can't say what is right and wrong for xul. Neither do I know how to test xul reflow properly other than to hork the UI and get feedback. This is not an option for 1.8.0.1.
*** Bug 320760 has been marked as a duplicate of this bug. ***
*** Bug 320348 has been marked as a duplicate of this bug. ***
Daniel, this bug is basically about 1.9-type performance work.  It's not a security bug as far as I can see; just another performance issue (no more or less serious than the existing performance issues we have).

I see no reason this should block 1.8.anything, nor do I see a way to "fix" any of this that would be acceptable for the 1.8 branch.

Of course see comment 33.  If we break this down into subproblems and come up with safe fixes to some of them, great.
(In reply to comment #50)
> Daniel, this bug is basically about 1.9-type performance work.

Maybe now. It started with a particular IE exploit that "hangs" Firefox/Mozilla (temporarily) and which is now found in the wild. Because real people are stumbling across actual instances of this any partial fix that helps would be welcome. But I'll bow to reality and remove the 1.8.0.1 nomination.
Flags: blocking1.8.0.1? → blocking1.8.1?
Depends on: 319930
OK, but 319930 covers the Bidi performance issues.  bernd, do you want to file a bug blocking this one on the wrapping and min-width issue?
Depends on: 329878
Depends on: 330350
Would limiting the amount of text we allow in a prompt() call to something reasonable make the performance acceptable?  Is enough text such that the prompt is unusable needed to trigger the bad performance problem?
So something like this? This creates a limit of 10000 chars for alerts, prompts, etc.
Let's try to get Martijn's patch reviewed, 10k chars in a prompt is a sane limit.
Assignee: nobody → martijn.martijn
Flags: blocking1.8.1? → blocking1.8.1+
Comment on attachment 226446 [details] [diff] [review]
limit of 10000 chars

Martijn, can you land this ASAP on trunk?
Attachment #226446 - Flags: review+
Checking in toolkit/content/commonDialog.xul;
/cvsroot/mozilla/toolkit/content/commonDialog.xul,v  <--  commonDialog.xul
new revision: 1.8; previous revision: 1.7
done
Checking in toolkit/content/xul.css;
/cvsroot/mozilla/toolkit/content/xul.css,v  <--  xul.css
new revision: 1.77; previous revision: 1.76
done

Ok, I've checked the "limit of 10000 chars" in the trunk. But I'm not marking this bug fixed, because it's more a workaround for this bug than anything else.
Comment on attachment 226446 [details] [diff] [review]
limit of 10000 chars

Checked in on trunk since June 28 with no regressions pointing to it. I think we want wider (read: beta) testing on this one ...
Attachment #226446 - Flags: approval1.8.1?
Target Milestone: --- → mozilla1.8.1beta1
The patch contains the following change:

-<?xml version="1.0"?> 
+<?xml version="1.0"?> 

What's that all about?
Whiteboard: [reflow-refactor] → [reflow-refactor][checkin needed]
Comment on attachment 226446 [details] [diff] [review]
limit of 10000 chars

a=darin on behalf of drivers (sans-garbage in patch)
Attachment #226446 - Flags: approval1.8.1? → approval1.8.1+
Argh!
I can see that garbage in the trunk too.
It needs to be removed.
Sorry about that.
Checking in commonDialog.js;
/cvsroot/mozilla/toolkit/content/commonDialog.js,v  <--  commonDialog.js
new revision: 1.9.2.4; previous revision: 1.9.2.3
done
Checking in commonDialog.xul;
/cvsroot/mozilla/toolkit/content/commonDialog.xul,v  <--  commonDialog.xul
new revision: 1.6.2.2; previous revision: 1.6.2.1
done
Checking in xul.css;
/cvsroot/mozilla/toolkit/content/xul.css,v  <--  xul.css
new revision: 1.61.2.14; previous revision: 1.61.2.13
done

Checked "limit of 10000 chars" patch in on 1.8.1 branch (without the garbage).
And I removed the garbage from trunk.
Keywords: fixed1.8.1
Whiteboard: [reflow-refactor][checkin needed] → [reflow-refactor]
The checkins in 1.8_Branch leads to a regression:

Bug 344045
No longer blocks: 344045
Depends on: 344045
No longer depends on: 344045
I've backed out my patch on the 1.8.1 branch, so bug 344021 and bug 344045 should be fixed now. There is clearly an issue with patch, which wasn't happening with the trunk build, because I somehow forgot to check in the commonDialog.js part of the patch into trunk somehow :( See comment 58.
Keywords: fixed1.8.1
Ok, this still needs to be checked in on trunk to have the workaround working. Instead of adding the rule in xul.css, maybe it is better to add the css rule as an inline style in commonDialog.xul?
Attachment #228621 - Flags: review?(mconnor)
Gavin and I just agreed that it is better to use an inline style here.
Attachment #228622 - Flags: review?(mconnor)
And now without the unnecessary garbage that I keep getting for some reason. Sorry for all the spam.
Attachment #228621 - Attachment is obsolete: true
Attachment #228622 - Attachment is obsolete: true
Attachment #228629 - Flags: review?(mconnor)
Attachment #228621 - Flags: review?(mconnor)
Attachment #228622 - Flags: review?(mconnor)
Target Milestone: mozilla1.8.1beta1 → mozilla1.8.1beta2
Actually the perf issue has already been fixed on the reflow branch...
Ok, this patch has the following comment added:
+  // XXX the substr(0, 10000) part is a workaround for bug 317334
So it makes it clear that that part could be backed out when this bug is fixed.
Attachment #228629 - Attachment is obsolete: true
Attachment #229797 - Flags: review?(mconnor)
Attachment #228629 - Flags: review?(mconnor)
Attachment #229797 - Flags: review?(mconnor) → review+
"rest of patch that needs to be on trunk" checked in on trunk.
Attached patch branch patch (obsolete) — Splinter Review
This is equivalent of what went into trunk.
Attachment #232113 - Flags: review?(mconnor)
Comment on attachment 232113 [details] [diff] [review]
branch patch

>+<?xml version="1.0"?> 

Yick, this won't work out so well. Can you fix the patch to get rid of this goo?
Attached patch branch patchSplinter Review
> Yick, this won't work out so well. Can you fix the patch to get rid of this
> goo?

Sorry about that :(
Attachment #232113 - Attachment is obsolete: true
Attachment #233979 - Flags: review?(mconnor)
Attachment #232113 - Flags: review?(mconnor)
Comment on attachment 233979 [details] [diff] [review]
branch patch

Looks OK
Attachment #233979 - Flags: review?(mconnor) → review+
Attachment #233979 - Flags: approval1.8.1?
Comment on attachment 233979 [details] [diff] [review]
branch patch

a=beltzner on behalf of drivers, for landing on the MOZILLA_1_8_BRANCH. Please mark fixed1.8.1 when it lands.
Attachment #233979 - Flags: approval1.8.1? → approval1.8.1+
Whiteboard: [reflow-refactor] → [reflow-refactor][checkin needed (1.8.1 branch)]
Checked in to 1.8 branch on behalf of author.
Status: NEW → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Whiteboard: [reflow-refactor][checkin needed (1.8.1 branch)] → [reflow-refactor]
Depends on: 361334
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: