Closed Bug 507991 Opened 10 years ago Closed 10 years ago

Crash [@ nsSprocketLayout::PopulateBoxSizes] with object and position:absolute;overflow:scroll in binding


(Core :: Layout, defect, P2, critical)




Tracking Status
status1.9.2 --- beta2-fixed
blocking1.9.1 --- .8+
status1.9.1 --- .8-fixed


(Reporter: martijn.martijn, Assigned: tnikkel)



(4 keywords, Whiteboard: [sg:dos])

Crash Data


(5 files, 1 obsolete file)

Attached file testcase
I guess this might be related to bug 397596, no idea.
But since this has a different regression range, I'm filing this as a new bug.

See testcase, which crashes current trunk build (after hanging for a short while).
This seems to have regressed between 2009-05-19 and 2009-05-20:
0  	xul.dll  	nsSprocketLayout::PopulateBoxSizes  	 layout/xul/base/src/nsSprocketLayout.cpp:928
1 	xul.dll 	nsSprocketLayout::Layout 	layout/xul/base/src/nsSprocketLayout.cpp:249
2 	xul.dll 	nsBoxFrame::DoLayout 	layout/xul/base/src/nsBoxFrame.cpp:936
3 	xul.dll 	nsIFrame::Layout 	layout/xul/base/src/nsBox.cpp:543
4 	xul.dll 	nsSprocketLayout::Layout 	layout/xul/base/src/nsSprocketLayout.cpp:523
5 	xul.dll 	nsBoxFrame::DoLayout 	layout/xul/base/src/nsBoxFrame.cpp:936
6 	xul.dll 	nsBoxFrame::LayoutChildAt 	layout/xul/base/src/nsBoxFrame.cpp:2075
7 	xul.dll 	LayoutAndInvalidate 	layout/generic/nsGfxScrollFrame.cpp:2474
8 	xul.dll 	nsGfxScrollFrameInner::LayoutScrollbars 	layout/generic/nsGfxScrollFrame.cpp:2527
9 	xul.dll 	nsHTMLScrollFrame::Reflow 	layout/generic/nsGfxScrollFrame.cpp:871
10 	xul.dll 	xul.dll@0x3bb6fe
Group: core-security
Doesn't crash on OSX nor on Linux (debug).
Oh, sorry, it doesn't seem to crash online (I thought it did), probably because of bugzilla wackiness.
What I get is a loop:
#23325 0x184e2c13 in PresShell::FlushPendingNotifications (this=0xe41600, aType=Flush_InterruptibleLayout) at /Users/smaug/mozilla/hg/mozilla/layout/base/nsPresShell.cpp:4880
#23326 0x184c78bb in PresShell::DidDoReflow (this=0xe41600, aInterruptible=1) at /Users/smaug/mozilla/hg/mozilla/layout/base/nsPresShell.cpp:4778
#23327 0x184e233e in PresShell::ProcessReflowCommands (this=0xe41600, aInterruptible=1) at /Users/smaug/mozilla/hg/mozilla/layout/base/nsPresShell.cpp:7319
#23328 0x184e2c13 in PresShell::FlushPendingNotifications (this=0xe41600, aType=Flush_InterruptibleLayout) at /Users/smaug/mozilla/hg/mozilla/layout/base/nsPresShell.cpp:4880
#23329 0x184c78bb in PresShell::DidDoReflow (this=0xe41600, aInterruptible=1) at /Users/smaug/mozilla/hg/mozilla/layout/base/nsPresShell.cpp:4778
#23330 0x184e233e in PresShell::ProcessReflowCommands (this=0xe41600, aInterruptible=1) at /Users/smaug/mozilla/hg/mozilla/layout/base/nsPresShell.cpp:7319
#23331 0x184e2c13 in PresShell::FlushPendingNotifications (this=0xe41600, aType=Flush_InterruptibleLayout) at /Users/smaug/mozilla/hg/mozilla/layout/base/nsPresShell.cpp:4880

The stack isn't quite right, but DidDoReflow may call FlushPendingNotifications
via HandlePostedReflowCallbacks
I guess this is also related to bug 492978.
Attached patch patch (obsolete) — Splinter Review
The problem lies with this code in nsXBLService::LoadBindings

  // See if the URIs match.
  nsIURI* uri = styleBinding->PrototypeBinding()->BindingURI();
  PRBool equal;
  if (NS_SUCCEEDED(uri->Equals(aURL, &equal)) && equal)
    return NS_OK;

The testcase takes advantage using no fragment id refering to the first binding (that ability to do this was added in bug 366770), so when this code is executed it compares 'filename.html#a' with 'filename.html' and concludes they are different when it should conclude they are the same because 'a' is the id of the first binding. A similar issue but with recursive bindings using no fragment id was fixed in bug 432813 by fixing the issue in IsAncestorBinding in nsXBLService.cpp.

In this particular testcase we get into infinite loop because when the object gets bound to the tree in nsHTMLObjectElement::BindToTree a script runner is added that calls StartObjectLoad (this was changed from being a synchronous call to StartObjectLoad in bug 491063, which explains the regression range). The div with overflow: scroll gives us a scrollbar which has a reflow callback and that gets us into FlushPendingNotifications. (If you get rid of the scrollbars we don't get into the infinite loop of reflow and it doesn't hang; instead it just never stops loading.) When a script blocker in FlushPendingNotifications exits we run StartObjectLoad, when the object is loaded it triggers a ContentStatesChanged, and the FrameConstructor handles that by posting a frame reconstruct on the object content. That triggers a ReframeContainingBlock, which is the body. When we're reconstructing the frame for the span we call LoadBindings, and as I explained above, it thinks the binding URI is a new one so it recreates the object and div content and we start all over again.

I added a GetPrototypeBinding to nsIXBLDocumentInfo and then used this to determine if the current binding is the first binding, and compared the URI of the first binding minus the fragment id to the URI in question.

In addition to fixing the instance in nsXBLService::LoadBindings above, I also looked for any other place where we might compare binding URIs using equals that might cause problems. This found nsBindingManager::RemoveLayeredBinding and nsXBLService::GetBinding. nsXBLStreamListener::HasRequest does the same thing, but I think it should be ok, just not optimal performance if someone uses an empty fragment id.

In nsXBLService::GetBinding I disabled empty fragment id implying the first binding behaviour for bindings that extend other bindings because allowing it would have required a lot more code for something that I think no one really uses. Do we need to update the developer docs for this?

Do I need to bump the IID of nsIXBLService.h for just changing the name of one parameter?

Having the aDefaultReturn parameter to CompareBindingURIsWithPossiblyEmptyRef is a bit conservative. I'm ok with removing it.

This also fixes the testcases in bug 397596, bug 451198, and bug 492978 where I could see a problem (hang, crash, or never finishes loading).

Boris, are you the correct reviewer here?
Assignee: nobody → tnikkel
Attachment #393476 - Flags: review?(bzbarsky)
Blocks: 397596, 451198, 492978
Martijn and Jesse, are there any other testcases that use an empty fragment id to refer to the first binding besides this one, bug 397596, bug 451198, bug 492978, and bug 444490 (which I could not reproduce)?
I usually only use the first-binding trick for data: URIs, where it's required.  When I put the binding in the same XML file as the rest of the testcase, or in a separate file, I refer to it by ID.
Attachment #393476 - Flags: review?(bzbarsky) → review?(jonas)
Comment on attachment 393476 [details] [diff] [review]

Jonas can review this --- let's spread the load a bit :-)
> Do I need to bump the IID of nsIXBLService.h for just changing the name of one
> parameter?

No, since it doesn't change the ABI or the API (the meaning of the arguments, in this case).
Blocks: 507628
Attached patch patch v2Splinter Review
Whoops, forgot an important null check.

Also got rid of the unneeded IID change.
Attachment #393476 - Attachment is obsolete: true
Attachment #393729 - Flags: review?(jonas)
Attachment #393476 - Flags: review?(jonas)
Blocks: 366770
OS: Windows XP → All
Hardware: x86 → All
blocking1.9.1: --- → needed
blocking1.9.1: needed → ---
This bug seems to have stalled -- is it ready to land?
blocking1.9.1: --- → ?
Flags: wanted1.9.2?
It's waiting for review.
Marking blocking because it seems to knock off a bunch of similar crashes.
blocking1.9.1: ? → .5+
Flags: wanted1.9.2? → blocking1.9.2?
Priority: -- → P2
Whiteboard: [needs review] → [sg:dos][needs review]
Flags: blocking1.9.2? → blocking1.9.2+
What the hell is the story here?  Jonas, can we get this reviewed, please?
Anything I can do to help move this along?
Ugh, this stuff is a mess. I *think* the patch should work, but I can't say I'm confident that is the case.

If I understand this correctly, a lot of the complexity comes from the fact that prototypebinding->BindingURI() returns a URI different from one that can be used to link to the binding. I.e. BindingURI() can return a uri with a #foo reference, but you can link to the binding using an empty reference.

It seems like we could fix this much simpler/safer if we could maintain that  prototypebinding->BindingURI() always return the URI that is used to link to the binding. We might be able to do that by simply ensuring two things:

1. You can *only* link to a binding using a ref-less URI if the first binding in the document does *not* have an id.
2. When a binding does not have an id, make sure that prototypebinding->BindingURI() returns a uri without a ref.

This seems like a much safer change code-wise. I guess technically it could break content, but we're talking about content that uses XBL, and a new, and obscure, feature in XBL at that.

Timothy, does this make sense? I'd really prefer to go this safer route (if possible) given how complex and fragile this XBL code is already.

/ Jonas
Attached patch alternate patchSplinter Review
Implement the approach of comment 17.

The nsXBLService.cpp change is just reverting bug 432813 which is now unneeded.
Attachment #409302 - Flags: review?(jonas)
Attachment #393729 - Flags: review?(jonas)
Bug 451198 is actually a separate issue, although the previous patched fixed the testcase, the problem would have still been there by changing the id of the binding URI inside the <style> in the binding document.
No longer blocks: 451198
Ugh, so we didn't used to set mFirstBinding unless the binding had an id? That means that all current users must be specifying an id and thus will break with this approach?

Might not be a huge deal still, but you'll definitely need to fix a few mochitests to avoid turning the tree orange.

Or am I missing something?
I think you are correct.

A third approach that is not as ugly as my first approach and won't break anybody: have another URI member in nsXBLPrototypeBinding, call it the alternate URI, that is only non-null for the first binding and stores the equivalent empty ref URI of the binding. And add a member function to nsXBLPrototypeBinding that checks if a passed in URI will bind to this prototype binding (by comparing to both stored URIs). So we just need to replace uri->Equals(otheruri) calls in a few places with this new function, and add both URIs to the aDontExtendURIs array in nsXBLService::GetBinding.
But won't that keep a lot of the complexity of the first patch since it'll mean that there's two URIs that potentially could refer to a binding. I guess it's not clear to me what the patch would look like.

I actually think that the patch you've attached is fine. I'm farily certain that it'll just break a few mochitests and possibly a few local testcases that some of our testers have. Both of these can be easily fixed. The mochitests just needs to be fixed as part of the patch to keep the tree green.

I strongly doubt that it'll break any webpages.

So if you attach another patch that just fixes the automated tests we have in the tree I think we can land that along with the current patch.
Whiteboard: [sg:dos][needs review] → [sg:dos]
Implement the approach of comment 21.

This patch is a little bigger, but most of it is code removal or simplification.
Attachment #409433 - Flags: review?(jonas)
Whiteboard: [sg:dos] → [sg:dos][needs review]
Jonas, you granted review to two mutually exclusive approaches, are you indicating both are acceptable?
So either of these approaches should work. I feel slightly more confident that "alternate patch" won't have unintended consequences, and the only thing it breaks are things I'm happy to break.

My recommendation would be to land that patch together with a patch that fixes the various mochitests/reftests that depend on the behavior it changes.
I asked bz for his opinion, he said the 'alternate approach 3 patch' is "perhaps slightly nicer", so that is what I will land as that is the approach I prefer as well.
Whiteboard: [sg:dos][needs review] → [sg:dos]
Whiteboard: [sg:dos] → [sg:dos][needs landing]
Closed: 10 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Whiteboard: [sg:dos][needs landing] → [sg:dos]
Target Milestone: --- → mozilla1.9.3a1
Attachment #409433 - Flags: approval1.9.1.6?
Attachment #409433 - Flags: superreview?(dveditz)
Comment on attachment 409433 [details] [diff] [review]
alternate approach 3 patch

>+PRBool nsXBLPrototypeBinding::CompareBindingURI(nsIURI* aURI) const
>+  PRBool equal;
>+  mBindingURI->Equals(aURI, &equal);
>+  if (!equal && mAlternateBindingURI) {
>+    mAlternateBindingURI->Equals(aURI, &equal);
>+  }
>+  return equal;

This replaces code that checks the return value of Equals(), and on the face of it there's a chance you'll use an uninitialized equal. It looks like all our current implementations of nsIURI::Equals() do make sure the return value is set even in the failure case, but I'm uncomfortable assuming that every URI scheme some add-on developers invents in the future will be as careful.

Please either initialize equal to PR_FALSE or check the return value from Equals().
Attachment #409433 - Flags: superreview?(dveditz) → superreview-
Ok, I'll make that change to m-c and 1.9.2.

I think we should take this off the radar and bump to the next dot release, I only realized it was even on the 1.9.1 radar when I landed this on 1.9.2 two days ago.
blocking1.9.1: .6+ → .7+
Comment on attachment 409433 [details] [diff] [review]
alternate approach 3 patch

Need a new patch anyway, please request approval1.9.1.7? on that one.
Attachment #409433 - Flags: approval1.9.1.6?
1.9.1 version with comment 29 issue fixed, if we want to take this on 1.9.1.
Timothy: This is definitely wanted. Please request approval on your patch if you think it's ready. It looks like it'll need superreview from dveditz too.
Attachment #417062 - Flags: superreview?(dveditz)
Attachment #417062 - Flags: approval1.9.1.7?
Comment on attachment 417062 [details] [diff] [review]
1.9.1 version of 'alternate approach 3 patch'

Approved for, a=dveditz for release-drivers
Attachment #417062 - Flags: superreview?(dveditz)
Attachment #417062 - Flags: superreview+
Attachment #417062 - Flags: approval1.9.1.7?
Attachment #417062 - Flags: approval1.9.1.7+
Verified using attached testcase for 1.9.1. Crashes on XP reliably. Does not crash the 1.9.1 nightly (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20100115 Shiretoko/3.5.8pre (.NET CLR 3.5.30729)).
Keywords: verified1.9.1
Target Milestone: mozilla1.9.2 → mozilla1.9.3a1
Group: core-security
Landed crashtest
Flags: in-testsuite? → in-testsuite+
Crash Signature: [@ nsSprocketLayout::PopulateBoxSizes]
You need to log in before you can comment on or make changes to this bug.