Closed Bug 373462 Opened 14 years ago Closed 14 years ago

After destroying windows, Firefox pegs cpu and is unusable for a short while

Categories

(Core :: DOM: Core & HTML, defect)

x86
Windows XP
defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla1.9beta1

People

(Reporter: martijn.martijn, Assigned: smaug)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression, testcase)

Attachments

(5 files, 13 obsolete files)

591 bytes, text/html
Details
279 bytes, text/html
Details
7.03 KB, patch
Details | Diff | Splinter Review
6.94 KB, patch
Details | Diff | Splinter Review
24.73 KB, patch
jst
: review+
Details | Diff | Splinter Review
Attached file testcase
See testcase, which current trunk builds I get as result appr. 3700ms, while I normally would get (and expect) slightly more than 2000ms.
For a correct testcase, you need to load the testcase and then reload it.

With a branch build I get as result: appr. 2050ms (as expected)
With a 2007-02-10 trunk build I get as result: appr. 2100ms
With a 2007-02-11 trunk build I get as result: appr. 3000ms
Current trunk builds seem to be even slower with the testcase.
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-01-10+04&maxdate=2007-01-11+09&cvsroot=%2Fcvsroot
I think this is a regression from bug 366393, somehow.
Attached file testcase2
I'm seeing this also when clicking away a lot of alerts.
This testcase generates 30 alerts. Try clicking them away. I notice in current trunk builds, that somewhere, with 10 alerts to go or so, I get stuck and Firefox chews up all cpu power and is unresponsive for a few seconds. After that, I am able to continue to click the rest of the alerts away.
Should we try to delay GC/CC if user is somehow interacting
with the UI. That should help at least with the 2nd testcase when
clicking OK very fast.
(The patch is just quick hack to test the idea. )
Isn't HTML Content sink doing something similar...
Sorry, in comment 0 I meant to write the 2007-01-10 and 2007-01-11 trunk builds. The bonsai link is correct (points to the 2007-01-10/2007-01-11) regression range.
Flags: blocking1.9?
Blocks: 373540
There is some discussion possibly related to this in mozilla.dev.performance
bug 371200 might help this somewhat, though that's more of a bug about window closing times after firefox has been running for several days.
Flags: blocking1.9? → blocking1.9-
Whiteboard: [wanted-1.9]
This is now worksforme, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a4pre) Gecko/20070417 Minefield/3.0a4pre

Olli, you described your patch as " Quick hack for interaction delay", so it's not really something that you want in the tree, right?
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
That code isn't quite what I'd like to get, but something similar
might be good. IMO we shouldn't run CC when user is interacting
somehow, unless we get CC really fast.
This started happening again, so reopening :(
New regression range, between 2007-06-26 and 2006-06-28:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-06-26+04&maxdate=2007-06-28+09&cvsroot=%2Fcvsroot
Regression from bug 381199 or bug 385548?
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Flags: blocking1.9- → blocking1.9?
Dbaron: Do you know if we have any bugs that could cause the increase in time spent in CC? Or do you think this is simply how long it takes to run it?

In any event, it doesn't sound like a bad idea to add the key-or-mouse check to where we're currently running CC
Assignee: general → Olli.Pettay
Status: REOPENED → NEW
Flags: blocking1.9? → blocking1.9+
Well, suspecting everything we need to suspect (bug 385548) (and also the patch to suspect all native wrappers, which landed pretty soon after that window) probably slowed things down a bit.  Bug 381199 also would have made the effect happen the first 9 times, whereas previously it would have been delayed until the 10th.
Attachment #273119 - Attachment is obsolete: true
Attachment #273121 - Attachment is patch: true
Attachment #273121 - Attachment mime type: text/x-patch → text/plain
The patch seem to help with the testcase2 by delaying cc until user stops 
clicking "OK" button. That ofc means that there is probably more to
collect after all the alerts and the CC will possibly take more time.
So I'm not so sure if that is the behavior we want.
For user it might be better to not to have NS_INTERACTION_BREAK_LIMIT and 
NS_MAX_RESCHEDULING, but I think those are needed to ensure that cc is called
at least occasionally.
Any comments what we want to do here? Speeding up CC is one thing to do, but
delaying CC when user is interacting with the browser may make some sense, at 
least in some cases.
Martijn, you tried the patch? Did it make things to work any better, or was it
just so that cpu peek was moved to happen later, but so that it was still
as annoying as before?
Yeah, I tried the patch. It seemed to help for the testcase, but I still could see some cpu peek afterwards (and the browser also was not responding during that short period). So it wasn't as annoying as before.

But that being said, it seems I don't see this bug occuring anymore that strong (but also not even in the older builds, which is strange).
I guess it might be easily noticeable with slower computers, though (I'm using a 1.7GHz one).
It also might have happened, because I used a different, more polluted profile back then, I'm not sure.
Sorry for bringing such a mixed message :/
What i'd really like to see is an nsIObserver notification going out whenever user interaction starts and whenever it seems to have ended. Then the cyclecollecting code could hook up to those
Or alternatively create a service that you could ask if the user was active or not.
Attached patch inactive/active notifications (obsolete) — Splinter Review
This patch adds user-interaction-active/inactive notifications which are
posted every 5 secs. CC then calls collect if inactive or if there has been
many enough active notifications.
This does help with bug 385322 too, at least to some extent.
Attached patch v2 (obsolete) — Splinter Review
Jst, or anyone, what you think about this?
Notifications might be useful also elsewhere, not only for CC.
Attachment #275875 - Attachment is obsolete: true
Attachment #275949 - Flags: review?(jst)
Blocks: 385322
Status: NEW → ASSIGNED
Attached patch v2.0.1 (obsolete) — Splinter Review
Don't increase mDelayedCollectCount twice always when there is 
user-interaction-active, only to initialize delaying.
Attachment #275949 - Attachment is obsolete: true
Attachment #276982 - Flags: review?(jst)
Attachment #275949 - Flags: review?(jst)
Attached patch v3 (obsolete) — Splinter Review
Working better with page loads.
Attachment #276982 - Attachment is obsolete: true
Attachment #277128 - Flags: review?(jst)
Attachment #276982 - Flags: review?(jst)
(In reply to comment #21)
> Created an attachment (id=277128) [details]
> v3
I've been using this now for few days and so far working great, at least
on my quite fast machines.

I think as far as the code goes this patch looks good, but there's a couple of changes I'd like to see in how we schedule our GCs. As the patch is written, we've got a 10s timer that's used to check whether the user is active or inactive. If the user is inactive, we'll do a GC every time that timer fires, assuming we don't have a GC timer pending from a document load or we avoided 10 of the previous inactivity notifications (which means we're guaranteed to GC every 100s no matter what).

I think doing a GC every 10s when the user is inactive is too much. I also think user inactivity should be checked for at a faster interval, say 5s.

Maybe a good approach would be to separate the user inactivity notifications from being the direct trigger for GCs. In stead, we could have code that knows when the last GC ran, and checks at every user inactivity notification whether we should GC again. The time between GCs due to user inactivity could either be a constant time that's reasonably long (a minute, more?), or it could adapt to what the user is doing. I.e. if the user is away from the computer, lengthen the time between GCs as there's less likely to be anything worth collecting, and if the user is active, GC more often, but try to time things such that the GCs happen in between the user interacting with firefox.

It'll take some time to find the best algorithm for this, and if others have thoughts please speak up, but I think something like the above, combined with triggering GCs from other places where we're likely to generate a significant amount of garbage (i.e. XMLHttpRequest usage) would work pretty good.

And also, this *will* have an impact on how the browser behaves wrt paging etc, right now (w/o this change) it's theoretically possible to have the browser grow  (to a gig or what not) and get paged out if it's not used, but with a change like this we're likely to keep touching a lot of the memory the browser is using (due to leaks or bad memory ownership management most likely) and prevent the OS from being able to page the browser out to disk.

Thoughts anyone?
(In reply to comment #23)

> I think doing a GC every 10s when the user is inactive is too much. I also
> think user inactivity should be checked for at a faster interval, say 5s.

I agree

> or it could adapt to
> what the user is doing. I.e. if the user is away from the computer, lengthen
> the time between GCs as there's less likely to be anything worth collecting,
> and if the user is active, GC more often, but try to time things such that the
> GCs happen in between the user interacting with firefox.

Perhaps we could check whether CC found some cycles, and if it did, then
collect next time quite soon (20s when inactive?, longer when active?) 
but otherwise the timeout could increase up to few minutes.

> 
> It'll take some time to find the best algorithm for this, and if others have
> thoughts please speak up, but I think something like the above, combined with
> triggering GCs from other places where we're likely to generate a significant
> amount of garbage (i.e. XMLHttpRequest usage) would work pretty good.

Could we have some notification for may-cause-garbage operations. XHR for example could post notification after page load. Then after x number of
notifications cc could be called, even if user is active.
Hmm, perhaps some sort limit counter for cc calls (maybe a service), then if 
the limit is reached, always call cc. XHR load could increase the counter just
a bit, user activity notifications could have quite small impact too but
page load could increase it more.


> 
> And also, this *will* have an impact on how the browser behaves wrt paging 

Is there anyway we could avoid this, or reduce the impact. Though, I used
the patch for sometime and didn't see any dramatical changes in paging, I think.
(In reply to comment #24)
> (In reply to comment #23)
[...]
> Perhaps we could check whether CC found some cycles, and if it did, then
> collect next time quite soon (20s when inactive?, longer when active?) 
> but otherwise the timeout could increase up to few minutes.

That may be worth looking into, but I'm not sure we need that right away to get some testing out of this.

> > It'll take some time to find the best algorithm for this, and if others have
> > thoughts please speak up, but I think something like the above, combined with
> > triggering GCs from other places where we're likely to generate a significant
> > amount of garbage (i.e. XMLHttpRequest usage) would work pretty good.
> 
> Could we have some notification for may-cause-garbage operations. XHR for
> example could post notification after page load. Then after x number of
> notifications cc could be called, even if user is active.
> Hmm, perhaps some sort limit counter for cc calls (maybe a service), then if 
> the limit is reached, always call cc. XHR load could increase the counter just
> a bit, user activity notifications could have quite small impact too but
> page load could increase it more.

XHR could trigger the same thing that loading a page triggers, i.e. fire the current GC timer, followed by firing another timer if that timer happened to fall in the middle of a page load, etc. I.e. if we make XHR call nsJSContext::GC() (I have a patch for this, and to make that call static so no context is needed etc). That part could land separately from this broader GC scheduling change if need be.

> > And also, this *will* have an impact on how the browser behaves wrt paging 
> 
> Is there anyway we could avoid this, or reduce the impact. Though, I used
> the patch for sometime and didn't see any dramatical changes in paging, I
> think.

Short of doing what the Sun JVM's GC does now (if I remember what I read somewhere correctly) this is not trivially fixable. IIRC, they have code where they can find out whether a memory page is paged in or not and they avoid walking those pages when GC:ing. I'm sure they do walk them at some point, but not as frequently as they do normal GCs. This is something that MMgc might be able to do at some point, maybe, but it's not something we'd want the JS GC and the cycle collector to get involved in I don't think.

And really, this is not really a problem until things go really wrong and we leak a good portion of the available memory on a system, at which point we've already lost, really.
Attached patch v4 (obsolete) — Splinter Review
Keeping the patch still simple, but adding "higherProbability" parameter to
MaybeCC, so that after some user-interaction-xxx notification (which are posted 
every 5s) CC is called more likely. Higher probability is used also with page 
loads, lower probability is used with XHR.
CC is still more likely when user is *inactive*, we don't want to disturb user
too often and we can hope that user is idle every now and then.
user-interactivity counter is reset after ccollect, so if page load triggers
cc, next cc won't happen immediately, even if user is not active.
Attachment #277128 - Attachment is obsolete: true
Attachment #279777 - Flags: review?(jst)
Attachment #277128 - Flags: review?(jst)
Attached patch v5 (obsolete) — Splinter Review
Add still one limit; never call CC more often than NS_MIN_CC_INTERVAL.
Otherwise same as v4
Attachment #279777 - Attachment is obsolete: true
Attachment #280493 - Flags: review?(jst)
Attachment #279777 - Flags: review?(jst)
Comment on attachment 280493 [details] [diff] [review]
v5

I think this is worth getting in to see how it performs in the wild. r=jst
Attachment #280493 - Flags: review?(jst) → review+
Attachment #280493 - Flags: superreview?(jonas)
This is looking a bit weird to me. So we are always calling MaybeCC every 5 seconds and just adjusting the HigherProbability argument depending on user interaction during the last 30-60 seconds. Was that really intended? Also the fact that we can multiple times multiply sDelayedCCollectCount seems strange.

Shouldn't we do a CC as soon as the user goes inactive and then every X seconds if we feel it's needed? For some definition of "feel it's needed".

It would be good with some documentation of the intended algorithm to verify the code against.
(In reply to comment #29)
> This is looking a bit weird to me. So we are always calling MaybeCC every 5
> seconds and just adjusting the HigherProbability argument depending on user

just? MaybeCC is then called.

> interaction during the last 30-60 seconds. Was that really intended?

Yes, that was the intention. That is why the name "SOFT_LIMIT"

> Also the
> fact that we can multiple times multiply sDelayedCCollectCount seems strange.
> 

Multiplying CC several times increases the probability that CC is called, 
so if there are multiple aHigherProbability=PR_TRUE calls, CC certainly 
should be called quite soon.

> Shouldn't we do a CC as soon as the user goes inactive
That is one possible optimization.

> and then every X seconds
> if we feel it's needed? For some definition of "feel it's needed".
That is what the patch is trying to do, sort of, assuming CC is needed at least
every X seconds, but if there are pageloads or XHR, CC may happen more often.

Hmm.. I'd at least like to see a description, preferably as a comment in the code, of what the algorithm is doing. If you'd like you can put one here first so that I can review the patch against the description.
Comment for nsUserActivityObserver:
nsUserActivityObserver observes user-interaction-active and user-interaction-inactive notifications. It counts the number of notifications and if the
number is bigger than NS_CC_SOFT_LIMIT_ACTIVE (in case the current notification
is user-interaction-active) or NS_CC_SOFT_LIMIT_INACTIVE (current notification is
user-interaction-inactive) MaybeCC is called with aHigherParameter set to PR_TRUE,
otherwise PR_FALSE.

Comment for MaybeCC:
MaybeCC calls cycle collector if certain conditions are fulfilled.
The conditions are:
 - The timer related to page load (sGCTimer) must not be active.
 - At least NS_MIN_CC_INTERVAL milliseconds must have elapsed since the previous
   cycle collector call.
 - Certain number of MaybeCC calls have occurred.
   The number of needed MaybeCC calls depends on the aHigherProbability
   parameter. If the parameter is true, probability for calling cycle
   collector rises increasingly. If the parameter is all the time false, 
   at least NS_MAX_DELAYED_CCOLLECT MaybeCC calls are needed.

Is that enough?
Actually, what i'm more interested in is the desired results of all these calls, stuff like the 30-60 second wait, and what's the intended result after that wait. When we expect CC to happen etc.


One problem that I can see with the current code, if I'm reading it right, is that there isn't actually anything preventing CC from happening while the user is active, it's just making it half(?) as likely. If the user is inactive for 50 seconds and then starts mousing around, he'll still get a cycle collection after just a few seconds.

Hmm.. actually.. on second thought, would we in that case cycle collect after 30 seconds of inactivity? And a minimum of 30 seconds of activity is always required in order for a cycle collection to happen during user activity?

These are the types of questions i'd like for the description to explain. Anyhow, i'll review the patch as is.


My suggestion for when we should cycle collect, though this is of course not needed right away:
Cycle collect right away when "we think it's needed" if the user is currently inactive. If the user stays active for more than 60 seconds, force a cycle collection anyway. If the user goes inactive at any point during this period, cycle collect right away.
"we think it's needed" is for now defined as every 5 minutes, plus anytime after a pageload.  Anytime we do a CC we reset the 5 min counter. We could add additional points, such as after an XHR load.
Even better than the 5 min limit would be to do 2 minutes, but increase the limit if very few objects are collected.
(In reply to comment #33)
> One problem that I can see with the current code, if I'm reading it right, is
> that there isn't actually anything preventing CC from happening while the user
> is active
Right. We can't prevent cc totally even if user is active. CC *must* run every 
now and then, whether user is active or inactive, otherwise there will be OOM.
But better try to make it happen more likely when user is inactive.

>  We could add
> additional points, such as after an XHR load.
XHR calls MaybeCC, but sure we could add new points if we just find some 
useful cases. DOMParser maybe? x number of nsIContent::Unbind calls?

> Even better than the 5 min limit would be to do 2 minutes, but increase the
> limit if very few objects are collected.
Ofc that is just a guess that scripts won't start creating more garbage after
2 minutes. 

If we could somehow track memory usage and run after every XXMb allocations...
but again not when page is loading and hopefully not when user is active etc.

But, at least for now we could IMO use something simple, like the patch. 
For Moz2 and MMgc something else is perhaps needed.
I'm not super happy with the current patch. It looks to me like we will still in 75% of the cases do a CC right after finishing a load, whether the user is active or not.

With the patch the the sDelayedCCollectCount will basically have a random value between 0 and 30 when the load finishes. We'll then call MaybeCC(PR_TRUE) which means that any value over 7 will cause a CC. This means that about 75% of the time this patch will still cause the same CC pause described in this bug.

Unless I'm reading the code wrong :) Let me know if so.


What I suggest we do is instead is something like this:
1. Keep track of what the latest user-interaction-active/inactive notification was, i.e. if the user is currently active or not.
2. When a pageload finishes, call MaybeCC.
3. Let MaybeCC check if the latest notification was 'inactive', if so do a CC right away. If the latest was 'active' set a flag and start a timer A.
4. When we get an 'inactive' notification and the flag is set, do a CC right away, clear the flag and cancel timer A.
5. When timer A fires force a CC and cancel A even though the user is active.
6. When doing a CC restart a timer B. When B fires call MaybeCC.

I'd remove the aHigherProbablilty argument from MaybeCC as it's behavior is so different depending on what the value of sDelayedCCollectCount is, and its value is relatively random at any given point in time.


The timings for A and B is of course debatable. Setting them both to 1 minute would make the behavior similar to the current patch. I think the timing for timer B could probably be as high as 5 minutes. You could probably reuse the same nsITimer object for both A and B, like how jst did for the document loading. Dunno if that makes the code simpler or not.
Another thing with the current patch is that it takes 5 to 10 seconds to detect when the user goes inactive. I wonder if it'd be worth making that gap smaller. Ideal would be if we change the timer to fire every second when the user goes active, and then wait for 5 firings of the timer without key/mouse events before considering the user as inactive.

With the above algorithm we don't rely on notifications during inactive/active every 5 seconds, so you could just halt this timer as soon as an inactive notification has been sent which cause us to be swapped in less.
Some sort of counter is probably needed. If XHR is used often (like new XHR per 10msec), loaded documents may create quite a lot garbage. In that case
calling cc every 1 minute is too rarely. And I think we don't want to handle
XHR the same way as normal pageloads. XHR documents are (I think) usually 
smaller and are pretty much just data (no script etc).

Btw, the 75% estimate doesn't take account the min 10sec cc interval.
I added that interval to make sure that if XHR is used a lot, cc doesn't get 
called too often (just often enough, I hope).

Bug 385322 has a testcase, which shows why cc must be called quite often, 
also bug 383692 is relevant.
Yeah, a counter rather than a simple flag might not be a bad idea.

We could use a counter rather than the flag such that XHR loads increase the counter by 1 and normal page loads by 5. Whenever the counter is more than 3 we'll treat that as 'flag is set'. If the counter reaches 10 we'll do a CC right away.

The numbers above are just suggestions of course.
In which case the algorithm would start to work quite a bit like the current patch.
Would it? It still fixes the IMHO bad problem of that about 75% of the time the cycle collection still happens at undesirable times. I agree it would be less if the user spends less than 10 seconds on each page, but I'm not convinced that is very likely.

With my suggested approach we should relatively rarely cycle collect while the user is active. Only if the user stays active for more than a minute without pausing for 5 seconds we would.

Or if the user rapidly browses from page to page. But that could be adjusted by saying that we don't force a cycle collection until the counter reaches, say, 20. Then the user would have to browse to 4 different pages, without pausing at any of them for more than 5 seconds.
Attached patch v6 (obsolete) — Splinter Review
Modified the previous patch. CC is called when going to inactive state,
aHigherProbability not so aggressive (NS_PROBABILITY_MULTIPLIER 4 -> 3),
fewer CC calls because of NS_MAX_DELAYED_CCOLLECT 30 -> 45. 
Lowers the "75%" (not sure where that number exactly came up).

But also changed nsCycleCollector_collect so that it returns true if some nodes 
were collected. The information can be used to increase the probability for CC, 
and that is *really* useful to keep mem usage somewhat lower level with 
testcases like the ones in bug 385322. Fortunately at least usually when 
reading/scrolling web pages, CC calls don't collect anything; tested this for 
example with pages like bbc, gmail, gmaps (where some nodes are collected occasionally), devmo, yahoo, google, msn... So no need to call CC too often, 
but cases like bug 385322 really force us to call cc every now and then, 
and even quite often.

(This all is something we really must design properly for Moz2, or perhaps
MMgc does concurrent collecting - have to check that out.)
Attachment #280493 - Attachment is obsolete: true
Attachment #281194 - Flags: superreview?(jonas)
Attachment #280493 - Flags: superreview?(jonas)
Comment on attachment 281194 [details] [diff] [review]
v6

um, wrong patch. Right one coming in a minute.
Attachment #281194 - Flags: superreview?(jonas)
Attached patch fixed one comment (obsolete) — Splinter Review
Attachment #281194 - Attachment is obsolete: true
Attachment #281198 - Flags: superreview?(jonas)
Comment on attachment 281198 [details] [diff] [review]
fixed one comment

The 75% is described poorly in comment 35. Here's another try (with the old numbers):

When the user is interacting with a web page the sDelayedCCollectCount counter slowly increases from 0 until it hits 30, at which point we CC and reset it to 0. This means that a given random point in time sDelayedCCollectCount is a random value between 0 and 30.

When the user loads a new page (which we'll assume happens at random intervals) we'll call MaybeCC(PR_TRUE). If sDelayedCCollectCount is then higher than 7 we'll end up doing a CC. Assuming sDelayedCCollectCount is random between 0 and 30, that will happen 75% of the time.

With the new patch it'll happen 66% of the time. With the above logic  NS_MAX_DELAYED_CCOLLECT define doesn't actually affect the percentage, but only NS_PROBABILITY_MULTIPLIER.

Now, this math isn't entirely accurate. Due to the timer occasionally calling MaybeCC(PR_TRUE) we'll more often be in the lower range of 0-30 (0-45).

So my main concern of the patch still remains, that we're not checking for user activity when calling MaybeCC after a pageload. But I'm fine with checking this in and seeing what people think. If we keep getting complaints about freezing randomly we can adjust the algorithm then.


I'm more worried about the new forced CC every time the user goes from active->inactive. I'd rather rely on the regular CCs running often enough and non-disturbingly enough. So ideally I would like to see it simply removed.

But sr=me either way.
Attachment #281198 - Flags: superreview?(jonas) → superreview+
Attachment #281198 - Flags: approval1.9?
Target Milestone: --- → mozilla1.9 M9
urm, the patch caused terrible tp/tp2 regression on Linux bl-bldlnx03 :(
But no regression at all on MacOSX bm-xserve08 (there all the numbers went 
actually down a bit).
Ok, I think I have to change page load to cause CC the same way as before,
so not call ::MaybeCC there, but ::CC. That way cycle collector gets called
a lot more often during tp2, but each call takes shorter time.
I'm currently testing that approach, and CC seems to get called exactly as many
times as without the patch at all.
I'll re-land with that change, perhaps during weekend if not today.
Attached patch v7 (obsolete) — Splinter Review
Changed the 3 page load related ::MaybeCC calls to ::CC.
No other changes.
At least on my machine no ::MaybeCC initiated ::CC calls occur during 
tp2.
Attachment #281198 - Attachment is obsolete: true
(In reply to comment #46)
> a lot more often during tp2, but each call takes shorter time.

In real workloads, the time will be pretty close to constant, since most of the cost is tracing.  Doing two cycle collections in a row should take about twice as much time as doing just one.
hmm, well the patch reduced cc calls from ~60 to ~12 and tp/tp2 regressed badly.
And anyway, with the new patch page load related cc call happens exactly when
it happens without the patch, so I do think that should fix the regression.
Or at least I hope so.

Is CC really constant (or O(n))? In some cases when many (say 30) window objects 
get collected, CollectWhite and deletion of the objects takes lots of time.
That is though a case which happen usually only when closing a window with lots
of tabs or using "Close other tabs". (At some point I was thinking to divide
unlinking to smaller parts if there were thousands of white nodes. )
Hmm, unfortunately this bug may not get fixed with ::MaybeCC -> ::CC change,
but bug 385322 would be.
The case where the performance is the biggest problem is, I think, where the user has a lot of windows open and closes just one (or a few) of them.  In that case, the time is basically constant, since the largest term is O(N) in the (constant) number of existing windows.

There are cases (like Close Other Tabs) where the collection phase does take a bit of time.  And it's true that that part of the time occupied is not constant.  In the Tp test, which only has a single window open, that may be a significant part.
Note that for the tp tests it matters a lot *when* the CC happens. We only time the actual page loads, but then leave a pause between the loads. If CC happens during pageload that will affect tp, if it happens during the pause it will not.

So it sounds to me like the patch cause CC to happen during pageload, rather than during the pause, which is not what we want.

I'll write up a patch with my approach and see how that affects tp.
MaybeCC never runs when there is a page load timer active.
Some of the GlobalWindow objects seem to get deleted after cc has run, not 
while it is running. Probably some timer (nsCloseEvent doesn't seem to be 
responsible for these, at least not all of these).
If CC is delayed a lot in tp/tp2, there are occasionally quite many these
delayed deletions. 
Something like this:
(copy-pasted debug output, cc called right after (*), but before (**) )

Will run cycle collector (*)
--WEBSHELL 0x1c9a600 == 4
--WEBSHELL 0x13d06b0 == 3
--DOMWINDOW == 34
--DOMWINDOW == 33
--DOMWINDOW == 32
--DOMWINDOW == 31
--DOMWINDOW == 30
--DOMWINDOW == 29
Cycle collector did collect nodes (**)
--DOMWINDOW == 28
--DOMWINDOW == 27
--DOMWINDOW == 26
... (continues till 7)

Perhaps these delayed deletions happen while a new page is loading, that
would disturb page load performance quite badly.
Delayed releases happen when JS_GC runs, for example:
#0  0x00002aaab10a4f50 in ~nsGlobalWindow (this=0xc12dc0)
    at /home/smaug/mozilla/mozilla_cvs/mozilla/dom/src/base/nsGlobalWindow.cpp:528
#1  0x00002aaab10bca32 in ~nsGlobalChromeWindow (this=0xc12dc0)
    at /home/smaug/mozilla/mozilla_cvs/mozilla/dom/src/base/nsGlobalWindow.h:721
#2  0x00002aaab109191b in nsGlobalWindow::Release (this=0xc12dc0)
    at /home/smaug/mozilla/mozilla_cvs/mozilla/dom/src/base/nsGlobalWindow.cpp:757
#3  0x00002aaab10919aa in nsGlobalChromeWindow::Release (this=0x0)
    at /home/smaug/mozilla/mozilla_cvs/mozilla/dom/src/base/nsGlobalWindow.cpp:7925
#4  0x00002aaab06c7836 in XPCJSRuntime::GCCallback (cx=0x17a9870, status=JSGC_END)
    at /home/smaug/mozilla/mozilla_cvs/mozilla/js/src/xpconnect/src/xpcjsruntime.cpp:604
#5  0x00002aaab107f5ef in DOMGCCallback (cx=0x0, status=JSGC_END)
Running either CC (which calls GC) or just GC after page load.
This way JS doesn't keep windows alive too long, but
time consuming full CC doesn't occur too often for example with
testcase 2.
Attachment #281837 - Attachment is obsolete: true
Comment on attachment 282010 [details] [diff] [review]
v8, call at least JS_GC when page has loaded.

Still regress tp and txul a bit. But not tp2. And the regression is much smaller.
Firing notifications should be moved to ESM, because of embedding.
Attachment #282010 - Attachment is obsolete: true
Target Milestone: mozilla1.9 M9 → ---
Targeting M9 as this bug will block beta.
Target Milestone: --- → mozilla1.9 M9
Attached patch v9 (obsolete) — Splinter Review
Moved firing user-interaction-*** notifications to ESM, so that it always works 
also in embedded envs. (Didn't change the code from previous patches, just 
moved it from nsWindowRoot to ESM)
Changed NS_MIN_CC_INTERVAL from 10 seconds to 5 seconds so that CC isn't 
prevented too often. This changes the behavior to be a bit closer to the 
current always-call-cc-after-page-load-timer, but using the
testcase2, I don't see too many (or not at all) cc calls.
If CC isn't called after page load timer, GC is. That reduces GCs during page load.
This patch is, IMO, worth to try. v8 was already quite close.
Attachment #282409 - Flags: superreview?(jst)
Attachment #282409 - Flags: review?(jst)
Comment on attachment 282409 [details] [diff] [review]
v9

Yeah, let's give this a try. We can always tweak more once this is in etc...
Attachment #282409 - Flags: superreview?(jst)
Attachment #282409 - Flags: superreview+
Attachment #282409 - Flags: review?(jst)
Attachment #282409 - Flags: review+
Duplicate of this bug: 385322
Still regressed tp/txul
Blocks: 384323
I might want to retry landing this after Peterv's Bug 379718.
Whiteboard: [wanted-1.9]
I tried re-landing this with the change that cc was called twice. That almost
removed the tp/txul regressions on bl-bldlnx03. However talos/tp still shows the
regression.
Would it be bad to fix only bug 385322 part of this bug for m9.
That means basically replacing ::MaybeCCOrGC() calls with ::CC().
Actually, if Bug 379718 makes CC a bit faster, that might fix the rest of this
bug.
Attached patch Always call CC after page load (obsolete) — Splinter Review
To fix bug 385322 part of this bug.
Attached patch v 11Splinter Review
NS_MOUSE_ENTER/EXIT events should not change the state to
user-interaction-active, because those are dispatched also when page is loading
and mouse is over the page. mousemove and click etc. will still change the state.
Call CC() after page load if user is inactive, otherwise MaybeCC().
So this patch should keep the old behavior during page load if user is inactive.
Attachment #282409 - Attachment is obsolete: true
Attachment #285666 - Attachment is obsolete: true
Attachment #285684 - Flags: superreview?(jst)
Attachment #285684 - Flags: review?(jst)
Comment on attachment 285684 [details] [diff] [review]
v 11

Yeah, let's give this one a try. r+sr=jst
Attachment #285684 - Flags: superreview?(jst)
Attachment #285684 - Flags: superreview+
Attachment #285684 - Flags: review?(jst)
Attachment #285684 - Flags: review+
Calling CC twice caused small tdhtml regression, so I removed that.
Now tp/tdhtml/txul/talos look ok, marking this fixed.
We may want to change the #defines to optimize some usecases,
but perhaps in a different bug.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
verified fixed using Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9a9pre) Gecko/2007102605 Minefield/3.0a9pre

On testcase 1 i get "This took 2125ms" as result and not 3700 ms. Also testcase 2 is also working fine and i not stuck/hang on this testcase

-> Verified
Status: RESOLVED → VERIFIED
Duplicate of this bug: 383692
Flags: in-testsuite?
Blocks: 348156
Investigation of the bug 474586 showed that xpcshell does not initialize nsJSRuntime properly. But the nsJSRuntime::MaybeCC method (and probably other methods) is called without any check that nsJSRuntime is initialized. The simple fix can include checking that nsJSRuntime is initilized. But are there any other possibilities to fix the bug 474586? For example, is it possible to guarantee that nsJSRuntime is always initialized?
Depends on: 474586
Depends on: 508518
Olli, could you check if the description of user-interaction-* events at https://developer.mozilla.org/en/Observer_Notifications#Documents is correct?
The descriptions aren't correct anymore. user-interaction-inactive  is sent only once, not every x seconds.
Thanks, that's what I suspected. I took a guess as to what the description should say instead, but feel free to fix it in the wiki.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.