Last Comment Bug 570435 - quadratic behavior in JSScope::trace()/emptyShapeChange(), causing long gc pauses
: quadratic behavior in JSScope::trace()/emptyShapeChange(), causing long gc pa...
Status: RESOLVED FIXED
fixed-in-tracemonkey
: perf
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
P2 critical with 1 vote (vote)
: mozilla2.0b7
Assigned To: Brendan Eich [:brendan]
:
: Steven DeTar [:sdetar]
Mentors:
Depends on: 558451
Blocks:
  Show dependency treegraph
 
Reported: 2010-06-06 16:47 PDT by Timothy Nikkel (:tnikkel)
Modified: 2010-09-12 14:28 PDT (History)
18 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
final+


Attachments
windbg (170.13 KB, text/plain)
2010-06-06 16:47 PDT, Timothy Nikkel (:tnikkel)
no flags Details
empty scope class counting instrumentation support, #ifdef DEBUG (3.51 KB, patch)
2010-07-04 00:18 PDT, Brendan Eich [:brendan]
no flags Details | Diff | Splinter Review
empty scope class counting instrumentation support, #ifdef DEBUG (3.56 KB, patch)
2010-07-04 00:45 PDT, Brendan Eich [:brendan]
no flags Details | Diff | Splinter Review
empty scope class counting instrumentation support, #ifdef DEBUG (3.67 KB, patch)
2010-07-04 01:38 PDT, Brendan Eich [:brendan]
no flags Details | Diff | Splinter Review
empty scope class counting instrumentation support, #ifdef DEBUG (4.48 KB, patch)
2010-07-18 15:28 PDT, David Baron :dbaron: 🏴󠁵󠁳󠁣󠁡󠁿 ⌚UTC-7 (if account gets disabled due to email bounces, ask a bugzilla admin to reenable it)
no flags Details | Diff | Splinter Review
gdb log (25.96 KB, text/plain)
2010-07-22 16:08 PDT, David Baron :dbaron: 🏴󠁵󠁳󠁣󠁡󠁿 ⌚UTC-7 (if account gets disabled due to email bounces, ask a bugzilla admin to reenable it)
no flags Details

Description User image Timothy Nikkel (:tnikkel) 2010-06-06 16:47:22 PDT
Created attachment 449563 [details]
windbg

I run m-c nightlies and for the past few months I've been experiencing 20+ second GC pauses a few times a day. I did not used to experience this problem. I do not have any better idea of a regression range. My firefox.exe process usually uses between 500mb and 800mb, large, but not unreasonable. I've attached a windbg log with three backtraces during one such gc hang.

If it would be helpful I could try to bisect this problem using old nightlies, but it would be a long process as I only experience a few (about 1 to 5) of these pauses a day.
Comment 1 User image Andreas Gal :gal 2010-06-06 18:26:06 PDT
   0  Id: 534.7b4 Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr  
0012c62c 002ce207 mozjs!js::PropertyTree::emptyShapeChange(unsigned long oldEmptyShape = 0x12d2d0, unsigned long newEmptyShape = 0xc7b02e0)+0x2e [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\js\src\jspropertytree.cpp @ 407]
0012c668 0024175d mozjs!js_TraceObject+0x8cfa7

void
PropertyTree::emptyShapeChange(uint32 oldEmptyShape, uint32 newEmptyShape)
{
    if (oldEmptyShape == newEmptyShape)
        return;

    PropertyRootEntry *rent = (PropertyRootEntry *) hash.entryStore;
    PropertyRootEntry *rend = rent + JS_DHASH_TABLE_SIZE(&hash);

    while (rent < rend) {
        if (rent->emptyShape == oldEmptyShape)
            rent->newEmptyShape = newEmptyShape;
        rent++;
    }

    ++emptyShapeChanges;
}

Thanks Timothy. This is a very good bug report. Looks like we are running in circles in that function.
Comment 2 User image Brendan Eich [:brendan] 2010-06-07 07:13:32 PDT
Timothy, could you please try a tracemonkey nightly -- this may be a dup of bug 569391. The underlying pathological case will be removed by the patchwork for bug 558451. Thanks,

/be
Comment 3 User image Timothy Nikkel (:tnikkel) 2010-06-07 12:34:11 PDT
Ok, running on tracemonkey nightlies now.
Comment 4 User image Timothy Nikkel (:tnikkel) 2010-06-07 15:39:48 PDT
I just got another long pause using tracemonkey nightly 2010-06-07 (http://hg.mozilla.org/tracemonkey/rev/044852a34c7b).
Comment 5 User image Brendan Eich [:brendan] 2010-06-07 17:21:56 PDT
Timothy, can you describe what you have in the way of add-ons, loaded pages, and things like GreaseMonkey scripts? It would be great to reproduce this.

I'll fix bug 558451 this week, but now I'm concerned that won't make some deeper bug go away. It will get rid of the need to renumber empty scopes' shapes.

/be
Comment 6 User image Timothy Nikkel (:tnikkel) 2010-06-07 17:49:30 PDT
For addons I have Download Timestamp, FaviconizeTab, DOM Inspector, Uppity, Nightly Tester Tools, Modify Headers, Java Quick Starter (disabled), and Adblock Plus.

For pages loaded I have gmail, a bunch of bugzilla bug pages, google groups threads, google searches, wikipedia pages, and a bunch of random articles/blog posts that I will probably never get to. I can send a list of all my open tabs, but I obviously do not want to post that publicly.
Comment 7 User image Brendan Eich [:brendan] 2010-06-07 17:56:38 PDT
Timothy, if you are game could you try running for a while with add-ons disabled? If you can't live without some, disable half (group them in a list so we can try bisecting if there's a "wolf in the fold"). Thanks again,

/be
Comment 8 User image Timothy Nikkel (:tnikkel) 2010-06-07 18:09:34 PDT
Ok, I kept the three that I need most: Download Timestamp, FaviconizeTab, and Adblock Plus. If I get another pause I'll disable them individually to see if I can pin it on one of them.
Comment 9 User image Timothy Nikkel (:tnikkel) 2010-06-10 10:29:08 PDT
Still running the same tracemonkey build (2010-06-07 nightly) I've gotten the long pauses with Download Timestamp, FaviconizeTab, and Adblock Plus enabled. And I've gotten long pauses with each one disabled individually. So it doesn't look to be caused by any specific addon.

Let me know if there is anything else you'd like me to try.
Comment 10 User image Brendan Eich [:brendan] 2010-06-11 10:59:37 PDT
Timothy -- thanks, no need to do more. Ball's in my court.

/be
Comment 11 User image Andreas Gal :gal 2010-07-03 16:54:58 PDT
dbaron confirmed that this is the likely cause for his 30s+ GC delays as well.

This is a must fix for FF4 IMO. Please block on this.
Comment 12 User image Brendan Eich [:brendan] 2010-07-04 00:18:19 PDT
Created attachment 455927 [details] [diff] [review]
empty scope class counting instrumentation support, #ifdef DEBUG

This is a patch against m-c tip (applies to tm tip right now with --fuzz=7 needed for jscntxt.h).

David, anyone willing to apply and run with JS_EMPTYCLASS_DUMPFILE=/tmp/ec or some such, please attach non-empty /tmp/ec.N (N is the GC run number) files that seem associated with overlong GC runtime here.

I'll build and run m-c and try to do the same.

/be
Comment 13 User image Brendan Eich [:brendan] 2010-07-04 00:45:01 PDT
Created attachment 455929 [details] [diff] [review]
empty scope class counting instrumentation support, #ifdef DEBUG
Comment 14 User image Brendan Eich [:brendan] 2010-07-04 01:38:14 PDT
Created attachment 455932 [details] [diff] [review]
empty scope class counting instrumentation support, #ifdef DEBUG

Don't generate a dumpfile if the emptyScopeClasses hashmap is empty.

Previous patch reversed dumpfile field on line order to put count first, for easier sort -nr, etc.

/be
Comment 15 User image David Baron :dbaron: 🏴󠁵󠁳󠁣󠁡󠁿 ⌚UTC-7 (if account gets disabled due to email bounces, ask a bugzilla admin to reenable it) 2010-07-18 15:28:53 PDT
Created attachment 458219 [details] [diff] [review]
empty scope class counting instrumentation support, #ifdef DEBUG

The previous patch was missing initialization of the hashtable.
Comment 16 User image David Baron :dbaron: 🏴󠁵󠁳󠁣󠁡󠁿 ⌚UTC-7 (if account gets disabled due to email bounces, ask a bugzilla admin to reenable it) 2010-07-22 16:08:11 PDT
Created attachment 459625 [details]
gdb log

The revised patch still crashed, brendan says because of dangling JSClass pointers.

However, we enumerated a part of the hashtable (which was quite large) using gdb after the crash.

The live classes in the table were mostly the result of XBL.  nsXBLBinding::DoInitJSClass puts the address of the prototype in the JSClass, so there are a large number of unique classes.

Here's the full gdb output of our debugging session.
Comment 17 User image Brendan Eich [:brendan] 2010-07-23 13:05:28 PDT
This is all fixed in my patch queue now for bug 558451. The code in question is gone, and good riddance!

/be
Comment 18 User image Brendan Eich [:brendan] 2010-08-30 12:31:53 PDT
Patch stack for bug 558451 landed on tm.

/be
Comment 19 User image Ben Bucksch (:BenB) 2010-08-30 12:36:13 PDT
be: is the intention to merge that tracemonkey branch change and fix this bug for 2.0?
Comment 20 User image Brendan Eich [:brendan] 2010-08-30 12:41:22 PDT
(In reply to comment #19)
> be: is the intention to merge that tracemonkey branch change and fix this bug
> for 2.0?

Of course -- lots of stuff still to do for Mozilla 2, why wouldn't we fix this (and all of bug 558451)? Merge from tm to m-c may happen this week, even.

/be
Comment 21 User image abittner 2010-09-07 09:07:08 PDT
coming from my other gui-stall bugreport 

https://bugzilla.mozilla.org/show_bug.cgi?id=579653#c11

i was pointed here regarding my comment #11 over there, as that seems to belong rather here someone more experienced said.


my original bugreport was also about complete gui freezes in firefox4 betas every now and then and huge loads of various internal xul.dll functions and whatnot.


if it helps please look at the comment #11
https://bugzilla.mozilla.org/show_bug.cgi?id=579653#c11
for further information.

this happens with both beta4 and beta5candidatebuild1.
thanks and regards.
Comment 22 User image Brendan Eich [:brendan] 2010-09-12 14:28:38 PDT
See bug 558451 comment 166.

/be

Note You need to log in before you can comment on or make changes to this bug.