Closed Bug 667250 Opened 13 years ago Closed 13 years ago

Poor response to text input with large profile, gets far worse over time

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set
major

Tracking

()

RESOLVED DUPLICATE of bug 707063

People

(Reporter: jesup, Unassigned)

References

Details

(Keywords: perf)

Attachments

(3 files)

(This may end up in DOM or layout or editor one more analysis is done.)

Very poor response to typed characters - at times it's immediate, at other times it can take 5-20 seconds for the character to echo or delete to be done.

The problem starts with small pauses right after load of a large profile (500 tabs), but only ~15 tabs are loaded iniitally (using the pref for blocking loading of unfocused tabs: browser.sessionstore.max_concurrent_tabs=0) - poor-mans version of BarTab.  After the profile has been loaded for a while (29 hours), with maybe 25 or 30 loaded tabs, it gets very, very bad.  I've seen similar behavior in 4.0, 4.01, and now in 5.0 on WinXP.

Trunk build from yesterday, with jprof enabled.  Athlon X2 ~3GHz, 3GB ram, VSS ~1.4GB, RSS ~1.0GB.  Display is via remote X.  Ubuntu 8.04.

Jprof to be attached of me just typing into a bugzilla Comment field.  Overall, around 70% appears to be in handling stuff invoked by JS, and ~22% in the CycleCollector.

~43% is used in nsCSSFrameConstructor::ProcessPendingRestyles(); drilling down from there 33% is used in SelectorMatchesTree(), with 8.6% directly in that function, and it calls SelectorMatches() which is 25% (15.6% directly in SelectorMatches()).

ProcessPendingRestyles() appears to be invoked via JaegerShot eventually to nsIDOMNSHTMLElement_GetOffsetHeight(), which flushes notifications on the document.

Another 22% of top-level usage is in CycleCollector(), 11% of that is JS_GetCompartmentPrivate().
Keywords: perf
(In reply to comment #0)
> ~43% is used in nsCSSFrameConstructor::ProcessPendingRestyles(); drilling
> down from there 33% is used in SelectorMatchesTree(), with 8.6% directly in
> that function, and it calls SelectorMatches() which is 25% (15.6% directly
> in SelectorMatches()).
> 
> ProcessPendingRestyles() appears to be invoked via JaegerShot eventually to
> nsIDOMNSHTMLElement_GetOffsetHeight(), which flushes notifications on the
> document.

What's doing that, sessionstore?
Good question - seems quite possible that it would be invoked by typing (to save form data), though why would it cause restyling?  What's the easiest way to see what JS code is getting run (and how long it takes)?

There's another recent bug (bug 665997) by someone else indicating problems with pausing that seem similar (in his case, on an encrypted filesystem - that could cause extra overhead re-writing session store info).

And why would it get so much worse after we're running for a while?  Ok, maybe fragmentation/leaks could cause the GC to take longer, but we're only 22% GC in this trace.
I'll note that I do run the SessionManager extension.  Itf you think that could be involved, I can try disabling it and see what happens (though it's tough since I need to run for a long time and use it a fair bit).
I thought you could look at the JS stack to see what's getting offsetHeight.
Testing without SessionManager is a good idea, yes.

It sounds like at least the following are going on:

1)  Something is triggering restyles near the root of the document (what?) and
    flushing layout, which triggers those style reresolves.
2)  Those style reresolves involve rematching, not just reparenting; why?  That
    depends on what's triggering the restyles!
3)  Lots of cycle collection time; could indicate leaks.

I'd start by attaching gdb to the build and seeing where that offsetHeight get is coming from.  In a debug build you can use this macro:

  def jsstack
     call DumpJSStack()
  end

In an opt build, you would need to do that work yourself in gdb.  Luckily, you only really want the top stack frame.

> What's the easiest way to see what JS code is getting run (and how long it
> takes)?

Firebug?  But that needs a restart, whereas attaching GDB does not.
It wasn't even getting called, so I loaded about 10 tabs, and now it's getting called regularly.  Pauses are still small.

Not sure what tabs these are coming from yet.  Tired, turning in for now.

Traps seem to mostly be in http://cdn.gumgum.com/javascripts/ggv2.js, but I've seen some elsewhere, such as  http://static.ak.fbcdn.net/rsrc.php/v1/yB/r/IIeFDzJo603.js
Those seem to kick off a lot of GetOffsetHeight()s - a lot.  I set ignore to 100000 in gdb, and ate breakfast, and came back and it had triggered.

I suspect once I analyze the JS code we'll find that it likes to wake up a lot and do "something".  This is some advertising framework, it looks like, so it may be used by other sites as well.

Note I have not verified this is the problem I reported; this is tested on a fairly "fresh" unused profile that hasn't gotten to the point of LONG delays.  And I don't know that these are the cause of the shorter 1.5-2 sec delays I see every 5-10 seconds while typing, just that it's getting called, and that they're what all the cycles were going to in my jprof.  Also note that even in the jprof, while there's a strong likelihood that these calls are involved in the long delays I saw, it's not guaranteed - if it's something blocking as opposed to using cycles, that could cause the delays without it getting flagged in jprof.
This gets called every 1200ms or 600ms (perhaps it depending on if the tab is focused), and seems to kick off the action that I've caught.

(gdb) call DumpJSStack()
0 anonymous(E = [object HTMLImageElement]) ["http://cdn.gumgum.com/javascripts/ggv2.js":150]
    D = undefined>
    C = 34
    this = [object Object]
1 anonymous(3248) ["http://cdn.gumgum.com/javascripts/ggv2.js":200]
    G = [function]
    F = "16|http://www.babble.com/global/images/share.gif"
    E = [object HTMLImageElement]
    H = [object HTMLCollection]
    D = 16
    this = [object Window]
2 <TOP LEVEL> ["<unknown>":0]
    <failed to get 'this' value>


Line 150-152 of ggv.js (Ugh - there's more, you'd really need to look at it.)

GGGV.initAT=function(B){var A={zoff:B.zo||false,safe:B.sf,jit:B.jit,pscan:B.ps||false,pvid:B.pvid?B.pvid:"0-0",minHeight:~~B.mh||150,minWidth:~~B.mw||150,scanLapse:3000,UUID:0,UUIDPrefix:"ggat"+(+new Date()),handShaked:false,requestUrl:GGGV.baseUrl+"/assets/new",container:document.createElement("div"),restartTO:false,sizeFilter:function(E){var C=E.offsetWidth,D=E.offsetHeight;
return((!isNaN(C)&&!isNaN(D))&&(C>A.minWidth&&D>A.minHeight)&&!(C==160&&D==600)&&!(C==180&&D==150)&&!(C==200&&D==200)&&!(C==250&&D==250)&&!(C==300&&D==250)&&!(C==336&&D==280))
},getHashIndex:function(C){if("jquery" in C){C=C.get(0)
}return(C.nodeType===1&&C.tagName.toLowerCase()==="img")?[A.uniqueID(C),GGGV.normalizeURL(C.src)].join("|"):null

... etc ...

The only references to 600 and 1200 are in lines 1-5 of the script:

if(typeof window.ggv2id!=="undefined"&&window.ggv2id!==null&&typeof window.GGGV==="undefined"){var GGGV={rev:"$LastChangedRevision: 9035 $".match(new RegExp("[0-9]+"))[0],baseUrl:"http://g2.gumgum.com",jSource:"http://c.gumgum.com/javascripts/ggjquery-v4-min.js",imgExt:!(/MSIE 6/.test(navigator.userAgent))?"png":"gif",euc:function(A){return window.encodeURIComponent(A)
},pubId:ggv2id,domready:false,container:document.createElement("div"),modules:{},services:{},domQueue:[],foundAssets:{},loops:{render:{to:null,stack:[],fx:function renderLoop(){GGGV.forEachIn(GGGV.loops.render.stack,"render")
},tm:600},update:{to:null,stack:[],fx:function updateLoop(){GGGV.forEachIn(GGGV.loops.update.stack,"update")
},tm:1200}},errors:{}};

Sheesh, what a mess due to stripping.
I did another profile of it after it's been running a while (though not a huge amount of use).  Still showing the 1-3 second pauses (perhaps sometimes 5).

Though a shorter test, this profile looks exactly like the first one I posted.  ~42% in nsIDOMNSHTMLElement_GetOffsetHeight(), 31% in CycleCollector().

Still not sure it's the cause of the pauses, but it's certainly the cause of a fair bit of CPU use.  This could either be a bad JS app (oh well), or it could be an unexpected result of a (semi-?) reasonable JS app.
When the pauses get long enough, can you try "gdb attach <pid>" during a pause to get a backtrace?
Now that they've gotten longer, I tried breaking it in GDB during a pause, and I think I found it.  It's not babble, if this is right (and I'll try this a few more times to see if it's consistent).

I'm breaking in nsIDOMNSHTMLElement_GetOffsetHeight(), but invoked from http://s.huffpost.com/assets/js.php?f=hp_scrollbar.js%2Ctwitter_clientside.js&v49251 line 93 (called form line 95 I think).

I had 3 Huffpost tabs open; they always open a twitter display for the article.  I assume there aren't new messages on these older articles, but my guess is that they do something on a timer, and they may have a leak (or cause a leak).  Of course, I now froze all of cygwin and my remote X session with it, so I'll need to try again.
Ok, it looks like we have multiple causes:

a) the cause of the "big slowdown", which produces the nsIDOMNSHTMLElement_GetOffsetHeight() calls (See Huffpost, Babble, and I think other JS scripts are triggering that as well).

b) SessionStore is also causing pauses - I closed the only HuffPost tab loaded (15 windows; 15 or 16 tabs loaded, ~500 tabs total), and caught it in a pause I believe.  I'll attach the backtrace, but the last DumpJSStack() entry is "file:///home/jesup/src/mozilla/head/obj-i686-pc-linux-gnu-opt/dist/bin/components/nsSessionStore.js":2230

Next step may be to use the js triggers for jprof to turn on and off jprof at entry and exit of the sessionstore code, to figure out where it's spending it's time.  More generally, I suspect (haven't checked) the SessionStore code runs on the main thread to completion (and perhaps even writes synchronously), and thus pauses everything while it's running, which is a problem, especially as tabs and windows increase.
Spun off bug 669034 for the sessionstore issue, which in turn has spun off bug 669603 on Google bloating sessionstore data
In bug 705597 I describe how I have a 6MB sessionstore.js file that I reduce down to 880k simply by removing 'about:blank' entries. Is that related?
(In reply to Chris AtLee [:catlee] from comment #17)
> In bug 705597 I describe how I have a 6MB sessionstore.js file that I reduce
> down to 880k simply by removing 'about:blank' entries. Is that related?

That's definitely related and causes long pauses with certain sites.
(In reply to Randell Jesup [:jesup] from comment #13)
> I'm breaking in nsIDOMNSHTMLElement_GetOffsetHeight(), but invoked from
> http://s.huffpost.com/assets/js.php?f=hp_scrollbar.js%2Ctwitter_clientside.
> js&v49251 line 93 (called form line 95 I think).
> 
> I had 3 Huffpost tabs open; they always open a twitter display for the
> article.  I assume there aren't new messages on these older articles, but my
> guess is that they do something on a timer, and they may have a leak (or
> cause a leak).

Ok, while investigating other issues, I've proven that one of the problem is definitely huffington posts's twitter feed.  Leave it open for a day on a 'busy' page, and performance will degrade horribly.  File bug 707063 for that issue.

As bug 669034 and bug 707063 cover the basic stuff I initially found here, I'm going to close this bug as a dup of 707063.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: