Closed Bug 750002 Opened 13 years ago Closed 12 years ago

this site causes a hang in all versions (release, beta, aurora...) http://weeklyworldnews.com/topstory/47323/strange-takeout/

Categories

(Core :: DOM: Core & HTML, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: pjpreilly, Assigned: Ms2ger)

References

()

Details

(Keywords: regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:15.0) Gecko/20120427 Firefox/15.0a1 Build ID: 20120427040203
spinning beach ball of death.
seems to work fine in chrome.
hangs in safe mode
Confirmed in http://hg.mozilla.org/mozilla-central/rev/b5c0d6abf69b Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120428 Firefox/15.0a1 ID:20120428030503 Unresponsive script dialog appears as follows: A script on this page may be busy, or it may have stopped responding. You can stop the script now, or you can continue to see if the script will complete. Script: http://w.sharethis.com/widget/?wp=MU&publisher=0bc81f68-2ca2-430f-a34f-5e54635a2cd9:1 Regression window: Works: http://hg.mozilla.org/mozilla-central/rev/a5e63e00db27 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0a1) Gecko/20111218 Firefox/11.0a1 ID:20111218031140 Unresponsive script dialog appears: http://hg.mozilla.org/mozilla-central/rev/543af61eee05 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0a1) Gecko/20111218 Firefox/11.0a1 ID:20111218021643 Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a5e63e00db27&tochange=543af61eee05 Last good: c3525cd1ce44 First bad: f9f6f9ed788a Triggered by: f9f6f9ed788a Ms2ger — Bug 707576 - Remove nsIDOMNSElement; r=smaug
Blocks: 707576
Status: UNCONFIRMED → NEW
Component: Untriaged → DOM
Ever confirmed: true
Keywords: regression
Product: Firefox → Core
QA Contact: untriaged → general
Version: 15 Branch → Trunk
OS: Mac OS X → All
Ms2ger, this is the same issue as the hang from the Big Picture site that stopped being reproducible: it's using the same library (ShareThis). Can you please dig into this ASAP? Please let me know if I can help; I have some bandwidth available this week.
Assignee: nobody → Ms2ger
So I think we should go ahead and back out bug 707576 on m-c and aurora (but _not_ beta) and work on figuring out why that patch causes this problem.... If we can figure that out soon enough, and the fix is safe enough, maybe we can get that fix into beta?
akeybl was the originator of the suggestion in comment 6, btw. So we should do that. Ms2ger, want me to put the patches together?
As far as what's going on: my current best guess is that the script is sensitive to the precise order in which properties appear on elements and that bug 707576 changed that order...
Works fine in beta 13.0b2 5/4/12. By accident or intent?
Beta doesn't have the patch bug 707576 in it.
I have a patch locally (in a tree from just before the fix for 707576 landed) that moves everything from nsIDOMNSElement to nsIDOMElement except for the children attribute. No apparent problems with that patch. If I then additionally move the children property to nsIDOMElement then the problem shows up. If I however change nothing other than moving the children property from nsIDOMNSElement to nsIDOMElement, I also don't see any problems. Digging deeper...
Ok, this was no fun to debug. Here's what's going on here... What happens on trunk here is that this code ends up calling a function that does this: function SHARETHIS_merge() { var e = {}; for (var d = 0, a = arguments.length; d < a; d++) { var b = arguments[d]; if (SHARETHIS_typeof(b) != "object") { continue } for (var c in b) { var g = b[c], f = e[c]; e[c] = (f && SHARETHIS_typeof(g) == "object" && SHARETHIS_typeof(f) == "object") ? SHARETHIS_merge(f, g) : SHARETHIS_unlink(g) } } return e } Due to the fix for bug 707576 there's a difference in property order on DOM nodes, as bz suspected, and that leads to this code hitting a bunch of memory heavy property getters in this recursive function, .innerHTML and .outerHTML among others, and that ends up allocating gigs of memory and we start paging etc and we may or may not get to show the slow script dialog in a reasonable amount of time. Before the fix for bug 707576 this code simply ilooped hitting only fast property getters, eventually hitting the stack limit and being stopped in its track, so the site "worked", with some definition of "worked". I don't think there's anything we can do here, attempting to arbitrarily re-order our properties doesn't seem like a winning game here...
Oh, and that busted code comes from sharethis.com, which splatters share buttons of all kinds on sites that use it.
Arbitrary reordering would in fact not be winning, but reordering in a particular spec-required order might be.... But: Looks like XPConnect puts the "extra" interfaces that are not on the direct proto chain on the prototype first. So we used to have nsIDOMNodeSelector, then nsIDOMNSElement, then nsIDOMEventTarget, then nsIDOMElementCSSInlineStyle, then nsIDOMHTMLElement, then nsIDOMElement, then nsIDOMNode, etc. Modulo the fact that quickstubbed stuff always comes before non-quickstubbed stuff or something. So previousElementSibling and nextElementSibling came before innerHTML and guaranteed that we'd blow the stack before we got to innerHTML. In any case, per WebIDL the HTMLElement properties (including innerHTML) would be enumerated before the Element properties (like previousElementSibling), so things would still break. In particular, with nextElementSibling/previousElementSibling moved to nsIDOMElement things break even with XPConnect, as far as I can see. I looked at some other browsers. Opera lists the properties alphabetically. innerHTML comes after firstChild and firstElementChild but before things that might take us back up or to the left in the tree, so I would expect it to be hitting that innerHTML call. WebKit has some weird sort order but still has innerHTML coming before the tree-traversal stuff. I checked in Chrome, and it does throw a stack exhaustion exception here, in SHARETHIS_unlink. But it happens pretty quickly. I'll experiment a bit to see why. Opera throws the exception as well, and the site is unresponsive for a good long while there. I'm going to contact ShareThis about this. In the meantime, we should, imo, back out bug 707576 on trunk again for now, until they fix this. And maybe on Aurora, if it's not too late to rev all the IIDs there. Johnny, are you willing to make sure that either you or Ms2ger get to that?
Oh, and for the record, the SHARETHIS_unlink function, which starts like this: function SHARETHIS_unlink(c) { var a; switch (SHARETHIS_typeof(c)) { case "object": a = {}; for (var e in c) { a[e] = SHARETHIS_unlink(c[e]) } break; is the recursive culprit here.
Ah, the Chrome thing is simple. On a trivial testcase that involves a function like SHARETHIS_unlink, Chrome runs out of stack after about 7000 recursive calls. We run out of stack after about 150000 in Firefox 11, or after about 70000 in Firefox 10 (JS data structures got smaller in 11). Opera runs out after about 112000. So Chrome obviously runs for a lot less time before running out of stack. In fact, in my testing, they run for about 1/5 as long as we do, while only doing about 1/20 as much work...
I guess we could also set a lower cap on our recursion depth, but that would break other things, I suspect.
Ah, the recursion counts I was doing were off because they counted function-valued props. Chrome, of course, puts those higher up the proto chain, so never even gets to them. It looks like we get to about 23k deep recursion before throwing; Chrome gets to 7k deep. we walk a lot more properties per level...
Looks like WWN changed their website since it works on TFF 12 when it didn't previously?
Yes, something changed there. For what it's worth, Johnny and I contacted ShareThis and they're fixing this on their end too.
Worksforme, I guess....
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: