Last Comment Bug 750002 - this site causes a hang in all versions (release, beta, aurora...) http://weeklyworldnews.com/topstory/47323/strange-takeout/
: this site causes a hang in all versions (release, beta, aurora...) http://wee...
Status: RESOLVED WORKSFORME
: regression
Product: Core
Classification: Components
Component: DOM (show other bugs)
: Trunk
: x86 All
: -- normal (vote)
: ---
Assigned To: :Ms2ger
:
Mentors:
http://weeklyworldnews.com/topstory/4...
Depends on:
Blocks: 707576
  Show dependency treegraph
 
Reported: 2012-04-28 18:58 PDT by Phil
Modified: 2012-05-31 21:44 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Testcase for comment 16 (932 bytes, text/html)
2012-05-13 18:39 PDT, Boris Zbarsky [:bz]
no flags Details

Description Phil 2012-04-28 18:58:42 PDT
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:15.0) Gecko/20120427 Firefox/15.0a1
Build ID: 20120427040203
Comment 1 Phil 2012-04-28 19:00:05 PDT
spinning beach ball of death.
Comment 2 Phil 2012-04-28 19:08:01 PDT
seems to work fine in chrome.
Comment 3 Phil 2012-04-28 19:09:28 PDT
hangs in safe mode
Comment 4 Alice0775 White 2012-04-29 07:17:09 PDT
Confirmed in 
http://hg.mozilla.org/mozilla-central/rev/b5c0d6abf69b
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120428 Firefox/15.0a1 ID:20120428030503


Unresponsive script dialog appears as follows:
A script on this page may be busy, or it may have stopped responding. You can stop the script now, or you can continue to see if the script will complete.
Script: http://w.sharethis.com/widget/?wp=MU&publisher=0bc81f68-2ca2-430f-a34f-5e54635a2cd9:1

Regression window:
Works:
http://hg.mozilla.org/mozilla-central/rev/a5e63e00db27
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0a1) Gecko/20111218 Firefox/11.0a1 ID:20111218031140
Unresponsive script dialog appears:
http://hg.mozilla.org/mozilla-central/rev/543af61eee05
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0a1) Gecko/20111218 Firefox/11.0a1 ID:20111218021643
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a5e63e00db27&tochange=543af61eee05

Last good: c3525cd1ce44
First bad: f9f6f9ed788a

Triggered by:
f9f6f9ed788a	Ms2ger — Bug 707576 - Remove nsIDOMNSElement; r=smaug
Comment 5 Boris Zbarsky [:bz] 2012-04-29 18:12:09 PDT
Ms2ger, this is the same issue as the hang from the Big Picture site that stopped being reproducible: it's using the same library (ShareThis).  Can you please dig into this ASAP?  Please let me know if I can help; I have some bandwidth available this week.
Comment 6 Boris Zbarsky [:bz] 2012-04-29 21:20:30 PDT
So I think we should go ahead and back out bug 707576 on m-c and aurora (but _not_ beta) and work on figuring out why that patch causes this problem....  If we can figure that out soon enough, and the fix is safe enough, maybe we can get that fix into beta?
Comment 7 Boris Zbarsky [:bz] 2012-04-29 21:23:05 PDT
akeybl was the originator of the suggestion in comment 6, btw.  So we should do that.  Ms2ger, want me to put the patches together?
Comment 8 Boris Zbarsky [:bz] 2012-04-29 21:34:16 PDT
As far as what's going on: my current best guess is that the script is sensitive to the precise order in which properties appear on elements and that bug 707576 changed that order...
Comment 9 Phil 2012-05-04 16:15:19 PDT
Works fine in beta 13.0b2 5/4/12. By accident or intent?
Comment 10 Boris Zbarsky [:bz] 2012-05-04 16:33:36 PDT
Beta doesn't have the patch bug 707576 in it.
Comment 11 Johnny Stenback (:jst, jst@mozilla.com) 2012-05-09 18:15:15 PDT
I have a patch locally (in a tree from just before the fix for 707576 landed) that moves everything from nsIDOMNSElement to nsIDOMElement except for the children attribute. No apparent problems with that patch. If I then additionally move the children property to nsIDOMElement then the problem shows up. If I however change nothing other than moving the children property from nsIDOMNSElement to nsIDOMElement, I also don't see any problems.

Digging deeper...
Comment 12 Johnny Stenback (:jst, jst@mozilla.com) 2012-05-12 23:48:48 PDT
Ok, this was no fun to debug. Here's what's going on here...

What happens on trunk here is that this code ends up calling a function that does this:

    function SHARETHIS_merge() {
        var e = {};
        for (var d = 0, a = arguments.length; d < a; d++) {
            var b = arguments[d];
            if (SHARETHIS_typeof(b) != "object") {
                continue
            }
            for (var c in b) {
                var g = b[c],
                    f = e[c];
                e[c] = (f && SHARETHIS_typeof(g) == "object" && SHARETHIS_typeof(f) == "object") ? SHARETHIS_merge(f, g) : SHARETHIS_unlink(g)
            }
        }
        return e
    }

Due to the fix for bug 707576 there's a difference in property order on DOM nodes, as bz suspected, and that leads to this code hitting a bunch of memory heavy property getters in this recursive function, .innerHTML and .outerHTML among others, and that ends up allocating gigs of memory and we start paging etc and we may or may not get to show the slow script dialog in a reasonable amount of time.

Before the fix for bug 707576 this code simply ilooped hitting only fast property getters, eventually hitting the stack limit and being stopped in its track, so the site "worked", with some definition of "worked".

I don't think there's anything we can do here, attempting to arbitrarily re-order our properties doesn't seem like a winning game here...
Comment 13 Johnny Stenback (:jst, jst@mozilla.com) 2012-05-12 23:49:56 PDT
Oh, and that busted code comes from sharethis.com, which splatters share buttons of all kinds on sites that use it.
Comment 14 Boris Zbarsky [:bz] 2012-05-13 18:25:20 PDT
Arbitrary reordering would in fact not be winning, but reordering in a particular spec-required order might be....  But:

Looks like XPConnect puts the "extra" interfaces that are not on the direct proto chain on the prototype first.  So we used to have nsIDOMNodeSelector, then nsIDOMNSElement, then nsIDOMEventTarget, then nsIDOMElementCSSInlineStyle, then nsIDOMHTMLElement, then nsIDOMElement, then nsIDOMNode, etc.  Modulo the fact that quickstubbed stuff always comes before non-quickstubbed stuff or something.  So previousElementSibling and nextElementSibling came before innerHTML and guaranteed that we'd blow the stack before we got to innerHTML.

In any case, per WebIDL the HTMLElement properties (including innerHTML) would be enumerated before the Element properties (like previousElementSibling), so things would still break.  In particular, with nextElementSibling/previousElementSibling moved to nsIDOMElement things break even with XPConnect, as far as I can see.

I looked at some other browsers.  Opera lists the properties alphabetically.  innerHTML comes after firstChild and firstElementChild but before things that might take us back up or to the left in the tree, so I would expect it to be hitting that innerHTML call.  WebKit has some weird sort order but still has innerHTML coming before the tree-traversal stuff.

I checked in Chrome, and it does throw a stack exhaustion exception here, in SHARETHIS_unlink.  But it happens pretty quickly.  I'll experiment a bit to see why.

Opera throws the exception as well, and the site is unresponsive for a good long while there.

I'm going to contact ShareThis about this.

In the meantime, we should, imo, back out bug 707576 on trunk again for now, until they fix this.  And maybe on Aurora, if it's not too late to rev all the IIDs there.  Johnny, are you willing to make sure that either you or Ms2ger get to that?
Comment 15 Boris Zbarsky [:bz] 2012-05-13 18:27:43 PDT
Oh, and for the record, the SHARETHIS_unlink function, which starts like this:

    function SHARETHIS_unlink(c) {
        var a;
        switch (SHARETHIS_typeof(c)) {
        case "object":
            a = {};
            for (var e in c) {
                a[e] = SHARETHIS_unlink(c[e])
            }
            break;

is the recursive culprit here.
Comment 16 Boris Zbarsky [:bz] 2012-05-13 18:38:26 PDT
Ah, the Chrome thing is simple.  On a trivial testcase that involves a function like SHARETHIS_unlink, Chrome runs out of stack after about 7000 recursive calls.  We run out of stack after about 150000 in Firefox 11, or after about 70000 in Firefox 10 (JS data structures got smaller in 11).  Opera runs out after about 112000.  

So Chrome obviously runs for a lot less time before running out of stack.  In fact, in my testing, they run for about 1/5 as long as we do, while only doing about 1/20 as much work...
Comment 17 Boris Zbarsky [:bz] 2012-05-13 18:39:28 PDT
Created attachment 623546 [details]
Testcase for comment 16

I guess we could also set a lower cap on our recursion depth, but that would break other things, I suspect.
Comment 18 Boris Zbarsky [:bz] 2012-05-14 10:41:43 PDT
Ah, the recursion counts I was doing were off because they counted function-valued props.  Chrome, of course, puts those higher up the proto chain, so never even gets to them.

It looks like we get to about 23k deep recursion before throwing; Chrome gets to 7k deep.  we walk a lot more properties per level...
Comment 19 Phil 2012-05-21 06:37:44 PDT
Looks like WWN changed their website since it works on TFF 12 when it didn't previously?
Comment 20 Boris Zbarsky [:bz] 2012-05-21 06:53:30 PDT
Yes, something changed there.  For what it's worth, Johnny and I contacted ShareThis and they're fixing this on their end too.
Comment 21 Boris Zbarsky [:bz] 2012-05-31 21:44:17 PDT
Worksforme, I guess....

Note You need to log in before you can comment on or make changes to this bug.