quadratic behavior in JSScope::trace()/emptyShapeChange(), causing long gc pauses

RESOLVED FIXED in mozilla2.0b7

Status

()

defect
P2
critical
RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: tnikkel, Assigned: brendan)

Tracking

({perf})

Trunk
mozilla2.0b7
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 final+)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(3 attachments, 3 obsolete attachments)

Posted file windbg
I run m-c nightlies and for the past few months I've been experiencing 20+ second GC pauses a few times a day. I did not used to experience this problem. I do not have any better idea of a regression range. My firefox.exe process usually uses between 500mb and 800mb, large, but not unreasonable. I've attached a windbg log with three backtraces during one such gc hang.

If it would be helpful I could try to bisect this problem using old nightlies, but it would be a long process as I only experience a few (about 1 to 5) of these pauses a day.
Severity: normal → major
Priority: -- → P2
   0  Id: 534.7b4 Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr  
0012c62c 002ce207 mozjs!js::PropertyTree::emptyShapeChange(unsigned long oldEmptyShape = 0x12d2d0, unsigned long newEmptyShape = 0xc7b02e0)+0x2e [e:\builds\moz2_slave\mozilla-central-win32-nightly\build\js\src\jspropertytree.cpp @ 407]
0012c668 0024175d mozjs!js_TraceObject+0x8cfa7

void
PropertyTree::emptyShapeChange(uint32 oldEmptyShape, uint32 newEmptyShape)
{
    if (oldEmptyShape == newEmptyShape)
        return;

    PropertyRootEntry *rent = (PropertyRootEntry *) hash.entryStore;
    PropertyRootEntry *rend = rent + JS_DHASH_TABLE_SIZE(&hash);

    while (rent < rend) {
        if (rent->emptyShape == oldEmptyShape)
            rent->newEmptyShape = newEmptyShape;
        rent++;
    }

    ++emptyShapeChanges;
}

Thanks Timothy. This is a very good bug report. Looks like we are running in circles in that function.
Attachment #449563 - Attachment mime type: application/octet-stream → text/plaon
Attachment #449563 - Attachment mime type: text/plaon → text/plain
blocking2.0: --- → ?
Timothy, could you please try a tracemonkey nightly -- this may be a dup of bug 569391. The underlying pathological case will be removed by the patchwork for bug 558451. Thanks,

/be
Ok, running on tracemonkey nightlies now.
I just got another long pause using tracemonkey nightly 2010-06-07 (http://hg.mozilla.org/tracemonkey/rev/044852a34c7b).
Timothy, can you describe what you have in the way of add-ons, loaded pages, and things like GreaseMonkey scripts? It would be great to reproduce this.

I'll fix bug 558451 this week, but now I'm concerned that won't make some deeper bug go away. It will get rid of the need to renumber empty scopes' shapes.

/be
For addons I have Download Timestamp, FaviconizeTab, DOM Inspector, Uppity, Nightly Tester Tools, Modify Headers, Java Quick Starter (disabled), and Adblock Plus.

For pages loaded I have gmail, a bunch of bugzilla bug pages, google groups threads, google searches, wikipedia pages, and a bunch of random articles/blog posts that I will probably never get to. I can send a list of all my open tabs, but I obviously do not want to post that publicly.
Timothy, if you are game could you try running for a while with add-ons disabled? If you can't live without some, disable half (group them in a list so we can try bisecting if there's a "wolf in the fold"). Thanks again,

/be
Assignee: general → brendan
Depends on: 558451
Ok, I kept the three that I need most: Download Timestamp, FaviconizeTab, and Adblock Plus. If I get another pause I'll disable them individually to see if I can pin it on one of them.
Still running the same tracemonkey build (2010-06-07 nightly) I've gotten the long pauses with Download Timestamp, FaviconizeTab, and Adblock Plus enabled. And I've gotten long pauses with each one disabled individually. So it doesn't look to be caused by any specific addon.

Let me know if there is anything else you'd like me to try.
Timothy -- thanks, no need to do more. Ball's in my court.

/be
dbaron confirmed that this is the likely cause for his 30s+ GC delays as well.

This is a must fix for FF4 IMO. Please block on this.
Severity: major → blocker
OS: Windows 2000 → All
Hardware: x86 → All
Summary: long (20+ seconds) gc pauses → quadratic behavior in JSScope::trace()/emptyShapeChange(), causing long gc pauses
This is a patch against m-c tip (applies to tm tip right now with --fuzz=7 needed for jscntxt.h).

David, anyone willing to apply and run with JS_EMPTYCLASS_DUMPFILE=/tmp/ec or some such, please attach non-empty /tmp/ec.N (N is the GC run number) files that seem associated with overlong GC runtime here.

I'll build and run m-c and try to do the same.

/be
Don't generate a dumpfile if the emptyScopeClasses hashmap is empty.

Previous patch reversed dumpfile field on line order to put count first, for easier sort -nr, etc.

/be
Attachment #455929 - Attachment is obsolete: true
blocking2.0: ? → final+
The previous patch was missing initialization of the hashtable.
Attachment #455932 - Attachment is obsolete: true
Keywords: perf
Posted file gdb log
The revised patch still crashed, brendan says because of dangling JSClass pointers.

However, we enumerated a part of the hashtable (which was quite large) using gdb after the crash.

The live classes in the table were mostly the result of XBL.  nsXBLBinding::DoInitJSClass puts the address of the prototype in the JSClass, so there are a large number of unique classes.

Here's the full gdb output of our debugging session.
This is all fixed in my patch queue now for bug 558451. The code in question is gone, and good riddance!

/be
Severity: blocker → critical
Patch stack for bug 558451 landed on tm.

/be
Whiteboard: fixed-in-tracemonkey
be: is the intention to merge that tracemonkey branch change and fix this bug for 2.0?
(In reply to comment #19)
> be: is the intention to merge that tracemonkey branch change and fix this bug
> for 2.0?

Of course -- lots of stuff still to do for Mozilla 2, why wouldn't we fix this (and all of bug 558451)? Merge from tm to m-c may happen this week, even.

/be
coming from my other gui-stall bugreport 

https://bugzilla.mozilla.org/show_bug.cgi?id=579653#c11

i was pointed here regarding my comment #11 over there, as that seems to belong rather here someone more experienced said.


my original bugreport was also about complete gui freezes in firefox4 betas every now and then and huge loads of various internal xul.dll functions and whatnot.


if it helps please look at the comment #11
https://bugzilla.mozilla.org/show_bug.cgi?id=579653#c11
for further information.

this happens with both beta4 and beta5candidatebuild1.
thanks and regards.
See bug 558451 comment 166.

/be
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b6
You need to log in before you can comment on or make changes to this bug.