565217 - make cycle collector interruptible

Reporter

Description

•

15 years ago

We have discussed this a couple times before. The cycle collector should be interruptible. Currently it is not because its intertwined with the JS garbage collector. We will break that dependency and allow the CC to be aborted at any time (i.e. when a mouse/keyboard/network event arrives). This will hide the latency, and at the same time allow us to CC more often.

Andreas Gal :gal

Reporter

Updated

•

15 years ago

Depends on: 549806

Andreas Gal :gal

Reporter

Comment 1

•

15 years ago

We should really do this for 1.9.3, but we can probably do it past the first beta.

Priority: -- → P2

Target Milestone: --- → mozilla1.9.3

Wayne Mery (:wsmwk)

Updated

•

15 years ago

Keywords: meta, perf

Andreas Gal :gal

Reporter

Updated

•

15 years ago

Depends on: 565229

Andreas Gal :gal

Reporter

Updated

•

15 years ago

Blocks: 490122

Andreas Gal :gal

Reporter

Comment 2

•

15 years ago

The GC has to leave behind gray bits for the CC. bug 565229 will make sure the bitmap survives the GC phase.

Andreas Gal :gal

Reporter

Comment 3

•

15 years ago

I fixed bug 565229. Its available on tracemonkey and will be merged into m-c shortly.

Andreas Gal :gal

Reporter

Updated

•

15 years ago

Hardware: x86 → All

Andreas Gal :gal

Reporter

Comment 4

•

14 years ago

After talking to a few people, here is my design for a partially concurrent, interruptible cycle collector: - Add the ability to observe when the main thread is about to block on the native event queue: 1) Add 2 new methods to nsIThreadObserver, beforeWait() and afterWait(). 2) Change nsBaseAppShell to loop with DoProcessNativeEvent(mayWait = FALSE) until we are out of events, then call beforeWait(), call DoProcessNativeEvent(mayWait = TRUE), and then afterWait(). - Move the cycle collector into its own thread. It blocks on a condvar until its triggered, and notifies a condvar when its done. Hook up the current CC trigger point to do just this. This will get the CC out of the main thread's cache on multi core machines and is a win in itself. 3) Write up beforeWait() and afterWait() to call into the cycle collector controller. beforeWait() starts a CC on the other thread if some time has passed. The CC runs concurrently to us blocking on the native event queue. When afterWait() is called, 2 cases are possible: * We are still building the CC model. Increment an atomic counter in afterWait, which the CC thread samples. It will abort what it does and return to its waiting stage. * We are done building the CC model. The CC continues, finds cycles, and builds up a bunch of nsIRunnables that contain a list of dead objects to be Unlink()ed. Send the list to the main thread for disposal when done and resume the waiting stage. I think I can do all of the above tasks, but most of this stuff is parallelizable and I am new to the gecko code base, so if I can get help with some of these tasks, that would be great.

Boris Zbarsky [:bzbarsky]

Comment 5

•

14 years ago

Question. Given a webpage with a 10ms interval timer on it, won't the main thread wake up every 10ms? And will that mean that only CC runs that take less than 10ms will be able to complete while that web page is loaded?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 6

•

14 years ago

How do you plan to deal with all cycle-collected objects being main-thread-only? Did you consider using the concurrent CC algorithm in the Bacon & Ranjan paper? Also, do you happen to have any performance benchmarks for cycle collection pause performance? I'd sort of like to figure out at some point if my work on bug 549793 actually helps.

Andreas Gal :gal

Reporter

Comment 7

•

14 years ago

dbaron, the cycle collector thread merely is a vessel to get the cycle collector out of the main thread's cache. It only runs concurrently to the main thread _waiting_. We can also let the actual cycle detection run concurrently since it only walks the internal model. Disposing of xpcom objects happens on the main thread. I would like to get this into FF4, so I am trying to do the minimum surgery for that. The B&R concurrent algorithm is after FF4 I think. And yeah, bug 549793 can certainly not hurt, in particular because I was hoping to run the CC a lot more often with this patch, since we can stop it at any time. That way individual CCs will hopefully be faster, too.

Andreas Gal :gal

Reporter

Comment 8

•

14 years ago

#5: Yes, the CC can be starved, forcing us eventually to eventually do a blocking CC. Better than doing a blocking one every time :)

Andreas Gal :gal

Reporter

Comment 9

•

14 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Add beforeIdle/afterIdle to nsBaseAppShell. This doesn't work on macosx because macosx doesn't run our event loop. Need mac help.

Boris Zbarsky [:bzbarsky]

Comment 10

•

14 years ago

ccing folks who might know mac appshell.

Steven Michaud [:smichaud] (Retired)

Comment 11

•

14 years ago

> macosx doesn't run our event loop This isn't true ... at least not literally. (Note that I haven't yet studied the patch, and may understand the issues better after I've done that.)

Andreas Gal :gal

Reporter

Comment 12

•

14 years ago

Basically what I need is for nsBaseAppShell to be told when the main thread blocks for native events, and the moment it comes back (beginIdle, endIdle). I couldn't find a way to make that happen.

Steven Michaud [:smichaud] (Retired)

Comment 13

•

14 years ago

I'll look into that tomorrow.

Andreas Gal :gal

Reporter

Comment 14

•

14 years ago

Cool, thanks a lot.

Steven Michaud [:smichaud] (Retired)

Comment 15

•

14 years ago

Attached patch Testing patch that works on OS X (obsolete) — Details — Splinter Review

With your (Andreas') patch from comment #9, BeforeIdle() and AfterIdle() are called on OS X while pages are being loaded or reloaded, but not (apparently) at other times. I assume this is what you meant by saying that it "doesn't work" on OS X. Turns out the reason is that ProcessNextNativeEvent() isn't the only place native events are processed. Many (possibly most) of them get processed during the monolithic call to [NSApp run] in nsAppShell::Run(). This isn't easy to work around. [NSApp run] is called once when the app starts up, and stays running until it terminates. But it basically just calls [NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] in a tight loop. So at first I thought I could replace the call to [NSApp run] with something like what I do in nsAppShell::ProcessNextNativeEvent(). But this led to strange problems (for example not being able to quit the app), from which I conclude that the call to [NSApp run] performs special, undocumented magic -- which makes it difficult or impossible to replace. The only other solution I can think of is to hook calls to [NSApplication sendEvent:]. Every native event goes through this handler/method. But it's tricky to find out where it's been called from (nsAppShell::Run() or nsAppShell::ProcessNextNativeEvent()), so I thought it best to have platform-specific calls to BeforeIdle() and AfterIdle() entirely replace cross-platform calls (i.e. those made from nsBaseAppShell::SpinNativeEventLoop()). This is what my current patch does. A (partially or wholly) platform-specific solution to this problem is required on OS X. But allowing other platforms the option to support BeforeIdle() and AfterIdle() "natively" may give finer-grained control over when they're called -- which may turn out to be useful. I changed BeforeIdle() and AfterIdle() to be static methods -- otherwise I'd have needed to create a global pointer to the current appshell. I suspect this may actually be more desireable. But if not my patch will need to be changed. Please try my patch out, and let me know what you think.

Andreas Gal :gal

Reporter

Comment 16

•

14 years ago

Awesome. Thanks a lot. Giving a try now.

Andreas Gal :gal

Reporter

Comment 17

•

14 years ago

Steven, this doesn't seem to work right yet. BeforeIdle 119 AfterIdle BeforeIdle 120 cc: Starting nsCycleCollector::Collect(1) Breakpoint 1, nsCycleCollector_beginCollection () at ../../../xpcom/base/nsCycleCollector.cpp:3236 3236 return sCollector ? sCollector->BeginCollection() : PR_FALSE; (gdb) bt #0 nsCycleCollector_beginCollection () at ../../../xpcom/base/nsCycleCollector.cpp:3236 #1 0x0000000100cf16ce in XPCCycleCollectGCCallback (cx=0x11a773300, status=JSGC_MARK_END) at ../../../../../js/src/xpconnect/src/nsXPConnect.cpp:361 #2 0x0000000104188e2b in GC (cx=0x11a773300) at ../../../js/src/jsgc.cpp:3337 #3 0x0000000104189181 in GCUntilDone (cx=0x11a773300, gckind=GC_NORMAL) at ../../../js/src/jsgc.cpp:3705 #4 0x00000001041892cb in js_GC (cx=0x11a773300, gckind=GC_NORMAL) at ../../../js/src/jsgc.cpp:3759 #5 0x000000010411d5ef in JS_GC (cx=0x11a773300) at ../../../js/src/jsapi.cpp:2410 #6 0x0000000100cf0cb1 in nsXPConnect::Collect (this=0x106417d00) at ../../../../../js/src/xpconnect/src/nsXPConnect.cpp:449 #7 0x0000000101399f06 in nsCycleCollector::Collect (this=0x1050a9600, aTryCollections=1) at ../../../xpcom/base/nsCycleCollector.cpp:2527 #8 0x000000010139a071 in nsCycleCollector_collect () at ../../../xpcom/base/nsCycleCollector.cpp:3224 #9 0x0000000100891def in nsJSContext::CC () at ../../../dom/base/nsJSEnvironment.cpp:3582 #10 0x0000000100891e4c in nsJSContext::IntervalCC () at ../../../dom/base/nsJSEnvironment.cpp:3670 #11 0x0000000100892214 in nsUserActivityObserver::Observe (this=0x11a772e20, aSubject=0x0, aTopic=0x1017b5cee "user-interaction-inactive", aData=0x0) at ../../../dom/base/nsJSEnvironment.cpp:270 #12 0x000000010132ce7e in nsObserverList::NotifyObservers (this=0x10529d708, aSubject=0x0, aTopic=0x1017b5cee "user-interaction-inactive", someData=0x0) at ../../../xpcom/ds/nsObserverList.cpp:130 #13 0x000000010132df16 in nsObserverService::NotifyObservers (this=0x104b1d6b0, aSubject=0x0, aTopic=0x1017b5cee "user-interaction-inactive", someData=0x0) at ../../../xpcom/ds/nsObserverService.cpp:182 #14 0x00000001006a3854 in nsUITimerCallback::Notify (this=0x1187842f0, aTimer=0x104bc5640) at ../../../../content/events/src/nsEventStateManager.cpp:282 #15 0x000000010138a5e2 in nsTimerImpl::Fire (this=0x104bc5640) at ../../../xpcom/threads/nsTimerImpl.cpp:430 #16 0x000000010138a7e6 in nsTimerEvent::Run (this=0x11af2e0b0) at ../../../xpcom/threads/nsTimerImpl.cpp:519 #17 0x0000000101383846 in nsThread::ProcessNextEvent (this=0x104b19620, mayWait=0, result=0x7fff5fbfcf04) at ../../../xpcom/threads/nsThread.cpp:547 #18 0x0000000101311d6d in NS_ProcessPendingEvents_P (thread=0x104b19620, timeout=20) at nsThreadUtils.cpp:200 #19 0x000000010119f4a8 in nsBaseAppShell::NativeEventCallback (this=0x118719100) at ../../../../widget/src/xpwidgets/nsBaseAppShell.cpp:127 #20 0x0000000101153cd2 in nsAppShell::ProcessGeckoEvents (aInfo=0x118719100) at ../../../../widget/src/cocoa/nsAppShell.mm:398 #21 0x00007fff87a84f21 in __CFRunLoopDoSources0 () #22 0x00007fff87a83119 in __CFRunLoopRun () #23 0x00007fff87a828df in CFRunLoopRunSpecific () #24 0x00007fff85cc2ada in RunCurrentEventLoopInMode () #25 0x00007fff85cc28df in ReceiveNextEventCommon () #26 0x00007fff85cc2798 in BlockUntilNextEventMatchingListInMode () #27 0x00007fff86bb6a4a in _DPSNextEvent () #28 0x00007fff86bb6399 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] () #29 0x00007fff86b7c06f in -[NSApplication run] () #30 0x0000000101153571 in nsAppShell::Run (this=0x118719100) at ../../../../widget/src/cocoa/nsAppShell.mm:763 #31 0x0000000100ede930 in nsAppStartup::Run (this=0x118830c80) at ../../../../../toolkit/components/startup/src/nsAppStartup.cpp:192 #32 0x000000010002824d in XRE_main (argc=1, argv=0x7fff5fbfeaa0, aAppData=0x104b16200) at ../../../toolkit/xre/nsAppRunner.cpp:3628 #33 0x00000001000011f9 in main (argc=1, argv=0x7fff5fbfeaa0) at ../../../browser/app/nsBrowserApp.cpp:158 (gdb) You can see here that cycle collection is hit while we are supposed to be in idle. The AfterIdle notification happens after the cycle collection is done. I think timer events go to the app shell, so we have to coordinate the code you wrote with the app shell code somehow.

Andreas Gal :gal

Reporter

Comment 18

•

14 years ago

A couple people ask today about the distribution of collection times. This is a mostly idle browser: This is the part of CC that will run while the browse idles: cc: Starting nsCycleCollector::Collect(1) cc: mRuntimes[*]->BeginCycleCollection() took 0ms cc: SelectPurple() took 2ms This is the part that will run on a background thread, assuming we manage to get the above done while we idle (2ms, doable!). cc: MarkRoots() took 19ms cc: ScanRoots() took 4ms cc: RootWhite() took 0ms [not sure this can run concurrently, investigating] This is the part that will happen back on the main thread: cc: CollectWhite() took 0ms The last part of Collect() is the JS GC running to completion. With my patch to disentangle the CC from JS GC this won't happen here at all, and we have separate bugs to make this time much less (individual compartments get GCed). cc: Collect() took 112ms I also checked the idle times. At least on MacOSX we get very decent idle times, even with content like cnn or gmail open (seconds of idle as long the mouse isn't moved). Last but not least, here is a more realistic CC (a couple windows open): cc: Starting nsCycleCollector::Collect(1) cc: mRuntimes[*]->BeginCycleCollection() took 1ms cc: SelectPurple() took 6ms Note again that the actual Bacon & Rajan algorithm takes a long time, not building the CC model. Also, the GC part of the model building won't happen here any more. cc: MarkRoots() took 43ms cc: ScanRoots() took 7ms cc: RootWhite() took 0ms cc: CollectWhite() took 0ms

Steven Michaud [:smichaud] (Retired)

Comment 19

•

14 years ago

(In reply to comment #17) > so we have to coordinate the code you wrote with the app shell code > somehow I take this to mean that we're only really "idle" when neither native events nor Gecko events are being processed. (Your stack shows that timer "events" are Gecko events.) New patch coming up.

Andreas Gal :gal

Reporter

Updated

•

14 years ago

Depends on: 573060

Steven Michaud [:smichaud] (Retired)

Comment 20

•

14 years ago

Attached patch OS X testing patch rev2 (obsolete) — Details — Splinter Review

Here's a revision of my patch that suspends the operation of its native idle processing whenever Gecko processes an event (and therefore "coordinates" its platform-specific idle handling with your original patch's cross-platform idle handling). It should guarantee that the app is never considered idle while either a Gecko event or a native event is being processed. But I'm afraid you're not going to like it. It appears that the Gecko event queue wakes up every small fraction of a second to process events (presumably timer events). So the app is never idle for long. (I've left in the option to choose entirely platform-specific idle handling. I still think it may turn out to be useful.)

Steven Michaud [:smichaud] (Retired)

Updated

•

14 years ago

Attachment #452111 - Attachment is obsolete: true

Andreas Gal :gal

Reporter

Comment 21

•

14 years ago

Do you have a ballpark number handy? On my laptop with a few windows open I can get past the critical region of the cycle collector in less than 20ms or so. After that its ok if gecko wakes up. The cycle collector will collect to completion in the background and we don't have to throw away the heap model (if the wakeup comes earlier, we have to stop the collector and re-try later).

Steven Michaud [:smichaud] (Retired)

Comment 22

•

14 years ago

> Do you have a ballpark number handy? No, I don't. I just watched the printf statements flow by in the console log :-)

Steven Michaud [:smichaud] (Retired)

Comment 23

•

14 years ago

> It appears that the Gecko event queue wakes up every small fraction > of a second to process events (presumably timer events). So the app > is never idle for long. It occurs to me that some (most?, almost all?) of these events may be dummy events created in nsBaseAppShell::OnProcessNextEvent(). If so, they deserve special handling -- the app should still be considered idle while the Gecko event loop is processing one of these events (if it's the only event on the stack). I'll look into this.

Boris Zbarsky [:bzbarsky]

Comment 24

•

14 years ago

Hmm. Why do we have those events? Are those what keeps mac builds at near-constant ~10% cpu usage? (Separate bug, I know.)

Steven Michaud [:smichaud] (Retired)

Comment 25

•

14 years ago

> Are those what keeps mac builds at near-constant ~10% cpu usage? No. They're used on all platforms.

Steven Michaud [:smichaud] (Retired)

Comment 26

•

14 years ago

(By the way (and this is off-topic), I got some ideas for possibly speeding up performance of nsAppShell::ProcessNextNativeEvent() while working on this bug. I'm currently working like a dog on a new version of the JEP. But at some point in the next few weeks, when (hopefully) I have more time, I can start looking into them. When I do, I'll need your help, Boris.)

Boris Zbarsky [:bzbarsky]

Comment 27

•

14 years ago

Sounds good. I've seen that come up a lot in profiles....

Steven Michaud [:smichaud] (Retired)

Comment 28

•

14 years ago

(Following up comment #23) No, I was wrong -- they really are almost all nsTimerEvent events. I haven't been able to figure out what kind, exactly.

Boris Zbarsky [:bzbarsky]

Comment 29

•

14 years ago

Might be good to file a followup bug on that? And looking up the relevant nsTimerImpl mCallbackType types would be a good start....

Brendan Eich [:brendan]

Comment 30

•

14 years ago

Yeah, no mysteries -- we hate 'em. Please file, we can dup if needed. /be

Shawn Wilsher :sdwilsh

Comment 31

•

14 years ago

(In reply to comment #28) > No, I was wrong -- they really are almost all nsTimerEvent events. > > I haven't been able to figure out what kind, exactly. Note that images can spawn a bunch of them. See bug 502694.

Steven Michaud [:smichaud] (Retired)

Comment 32

•

14 years ago

I've done all I can here -- I really need to get back to work on the JEP.

Andreas Gal :gal

Reporter

Comment 33

•

14 years ago

Thanks for the help Steven! Testing your patch now.

Andreas Gal :gal

Reporter

Updated

•

14 years ago

Depends on: 573175

Steven Michaud [:smichaud] (Retired)

Comment 34

•

14 years ago

Attached patch Timer/event logging patch (obsolete) — Details — Splinter Review

> Thanks for the help Steven! You're most welcome! > And looking up the relevant nsTimerImpl mCallbackType types would be > a good start.... Given this hint, I found it isn't too difficult to log when every C/C++ timer on the trunk fires. So here's one last contribution. It also logs the class-name of all Gecko events as they run. Since it uses rtti, you need to add the following to your mozconfig: ac_add_options --enable-cpp-rtti With it, I was able to discover that almost all of the troublesome timer events' target is nsGlobalWindow::TimerCallback(). So it/they are probably installed from JavaScript code. And so's there's still a lot of work left to be done. Sigh. We really need to make timers easier to debug. At a minimum, there should be a debug flag that makes all timers (C/C++ and JavaScript ones) log identifying information every time they fire.

Steven Michaud [:smichaud] (Retired)

Comment 35

•

14 years ago

Attached patch Timer/event logging patch rev1 (crack nsGlobalWindow::TimerCallback()) (obsolete) — Details — Splinter Review

I'm a real glutton for punishment -- I've figured out what happens during the calls to nsGlobalWindow::TimerCallback(). And here's the new version of my patch that I used to do it. Turns out most (really almost all) of these calls are to the function nextStep() in the "button-periodic-redraw" binding of toolkit/content/widgets/button.xml.

Attachment #452399 - Attachment is obsolete: true

Steven Michaud [:smichaud] (Retired)

Comment 36

•

14 years ago

> Turns out most (really almost all) of these calls are to the > function nextStep() in the "button-periodic-redraw" binding of > toolkit/content/widgets/button.xml. Interestingly, these calls only happen on OS X. They were added (on the trunk) by the patch for bug 206636 (http://hg.mozilla.org/mozilla-central/rev/d1555870e786). The calls are on 100ms timers. But (as best I can tell) each "default button" gets its own timer.

Markus Stange [:mstange]

Comment 37

•

14 years ago

Where did you make these measurements, what windows did you have open? This timer is what drives the pulse animation for blue default buttons.

Steven Michaud [:smichaud] (Retired)

Comment 38

•

14 years ago

I used my patch from comment #35 in all my tests. I didn't actually make any time measurements -- I noticed that the nextStep() function with a "delay" of '100'. And by the way, I'm not at all sure that these timers are a problem. I just CCed you so that you could answer questions about them (more from others than myself, presumably).

Steven Michaud [:smichaud] (Retired)

Comment 39

•

14 years ago

I had just one window open (to the Minefield default home page). The calls to nextStep() continued even when the page was blank.

Steven Michaud [:smichaud] (Retired)

Comment 40

•

14 years ago

> I noticed that the nextStep() function with a "delay" of '100' I noticed that the nextStep() function is passed to setInterval() with a "delay" of '100'.

Steven Michaud [:smichaud] (Retired)

Comment 41

•

14 years ago

> But (as best I can tell) each "default button" gets its own timer. Is this correct, Markus?

Markus Stange [:mstange]

Comment 42

•

14 years ago

That's correct, yes. Our future plans for these kinds of animations are in bug 546837 comment 4.

Boris Zbarsky [:bzbarsky]

Comment 43

•

14 years ago

There's one non-anonymous buttond[default] in the Firefox UI itself somewhere. Not sure whether there might be anonymous ones too. But that shouldn't get you "event queue wakes up every small fraction of a second" unless "small fraction" is 0.1.

Steven Michaud [:smichaud] (Retired)

Comment 44

•

14 years ago

Boris, do you have any idea what the behavior would/should be if you had (say) 10 default buttons visible at once? Would/should this decrease the length of the longest possible idle? (I'm sure Andreas will do the measurements. I just wonder if there are any established ideas about how this *should* work.)

Brendan Eich [:brendan]

Comment 45

•

14 years ago

setTimeout and setInterval should cost at most one pending XPCOM timer. That's how they used to work, and that's how any minimal implementation would work. Are we really getting multiple nsTimerImpl instances, or just lots of closely spaced timer firings? /be

Steven Michaud [:smichaud] (Retired)

Comment 46

•

14 years ago

A bit of both, really. Each default button has a separate timer that fires once every 100ms. The question (or at least my question) is about the spacing of the timer firings as the number of these timers goes up.

Johnny Stenback (:jst)

Comment 47

•

14 years ago

Seems to me that the only sane way to deal with this is to have a single timer for these buttons and when that timer fires we update every button that needs updating etc. Even if we're able to synchronize these timers such that they'd all fire after one and other it'd be a waste of timer thread activity to do that over just having one IMO.

Boris Zbarsky [:bzbarsky]

Comment 48

•

14 years ago

> Would/should this decrease the length of the longest possible idle? With our current code it likely would; basically the firing times would be randomly distributed. Whether it should is a separate question.

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

14 years ago

Target Milestone: mozilla1.9.3 → mozilla2.0

Boris Zbarsky [:bzbarsky]

Comment 49

•

14 years ago

We should get a separate bug filed on that button thing. Can someone with a working computer (i.e. not me) please do that?

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

14 years ago

Depends on: 546837

Markus Stange [:mstange]

Comment 50

•

14 years ago

I'm not sure what the button problem is. That it uses timers at all? About the OS X implementation: The CFRunLoopObserver documentation says that you can be notified when the native event loop enters/exits the idle state by observing kCFRunLoopBefore/AfterWaiting. That sounds like a good place to call Before/AfterIdle().

Boris Zbarsky [:bzbarsky]

Comment 51

•

14 years ago

> That it uses timers at all? Yes, and especially one timer per button.

Steven Michaud [:smichaud] (Retired)

Comment 52

•

14 years ago

(In reply to comment #50) > About the OS X implementation: > The CFRunLoopObserver documentation says that you can be notified > when the native event loop enters/exits the idle state by observing > kCFRunLoopBefore/AfterWaiting. That sounds like a good place to call > Before/AfterIdle(). You mean as an alternative to hooking [NSApplication sendEvent:]? Yes, that might work. But we'll still need to also call AfterIdle() when we start processing Gecko events. In other words, we're only truly idle when we're processing neither native events nor Gecko events, and kCFRunLoopBefore/AfterWaiting is only about native events.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 53

•

14 years ago

If we use a single timer for all animated widgets, that would limit the wakeups to a reasonable interval. But it seems to me that for people with lots of tabs open (the people who get really hurt by cycle collection currently), there are likely to be various timers, both internal and JS, firing often enough that we probably won't see a useful interval of idle time ... unless we aggressively throttle things for background tabs. Which we should do anyway. Is there work happening on that? Seems like we need experiments with 50+ tabs open at least, including offenders like GMail, Zimbra, and other common apps.

Boris Zbarsky [:bzbarsky]

Comment 54

•

14 years ago

> Is there work happening on that? There is for throttling the refresh driver, but not for js timeouts...

Markus Stange [:mstange]

Comment 55

•

14 years ago

(In reply to comment #52) > But we'll still need to also call AfterIdle() when we start processing > Gecko events. I don't think so. We can't process Gecko events while the native event loop is waiting for a new native event. In order to get back into the Gecko event loop we need to leave the native waiting state first, and that's when AfterIdle() would fire.

Markus Stange [:mstange]

Updated

•

14 years ago

Depends on: 586216

Markus Stange [:mstange]

Comment 56

•

14 years ago

(In reply to comment #51) > > That it uses timers at all? > > Yes, and especially one timer per button. I filed bug 586216.

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

14 years ago

No longer depends on: 546837

Steven Michaud [:smichaud] (Retired)

Comment 57

•

14 years ago

(In reply to comment #55) > In order to get back into the Gecko event loop we need to leave the > native waiting state first, and that's when AfterIdle() would fire. No. AfterIdle() should be called when we're no longer idle, which is just when we start processing a Gecko event. It doesn't matter whether or not we continue in a "native idle state". My current patch calls AfterIdle() more than once between calls to BeforeIdle(). I'll try to fix that in a future patch ... but it shouldn't be hard to write AfterIdle() in such a way that this doesn't matter.

Markus Stange [:mstange]

Comment 58

•

14 years ago

(In reply to comment #57) > (In reply to comment #55) > > > In order to get back into the Gecko event loop we need to leave the > > native waiting state first, and that's when AfterIdle() would fire. > > No. AfterIdle() should be called when we're no longer idle, which is > just when we start processing a Gecko event. It doesn't matter > whether or not we continue in a "native idle state". I don't understand how this contradicts my suggestion. I'll write up a patch of what I have in mind so that we can be sure we're talking about the same thing.

Markus Stange [:mstange]

Comment 59

•

14 years ago

Attached patch demo patch using kCFRunLoopBefore/AfterWaiting observer (obsolete) — Details — Splinter Review

This patch dumps a message to the console whenever we enter/exit the native run loop waiting state and whenever we process a native event or a Gecko event. It also reduces the animation timer of the invisible progressbar in the main browser window to once per second. If we called nsBaseAppShell::BeforeIdle() and AfterIdle() at exactly those times where I now print "entering idle state" and "woke up!", we'd satisfy Andreas' requirements, wouldn't we?

Steven Michaud [:smichaud] (Retired)

Comment 60

•

14 years ago

> If we called nsBaseAppShell::BeforeIdle() and AfterIdle() at exactly > those times where I now print "entering idle state" and "woke up!", > we'd satisfy Andreas' requirements, wouldn't we? I don't think so. I think we'd only ever "wake up" when processing a native event -- not when processing a Gecko event. But I haven't yet tried your patch -- I need to do that before I'm sure.

Markus Stange [:mstange]

Comment 61

•

14 years ago

(In reply to comment #60) > > If we called nsBaseAppShell::BeforeIdle() and AfterIdle() at exactly > > those times where I now print "entering idle state" and "woke up!", > > we'd satisfy Andreas' requirements, wouldn't we? > > I don't think so. I think we'd only ever "wake up" when processing a > native event -- not when processing a Gecko event. We do wake up when processing a Gecko event. Gecko events are processed from our CFRunLoopSource which we schedule in ScheduleNativeEventCallback(). In order to notify run loop sources the run loop has to leave the waiting state first. See also http://developer.apple.com/mac/library/documentation/CoreFoundation/Reference/CFRunLoopObserverRef/Reference/reference.html#//apple_ref/doc/c_ref/CFRunLoopActivity

Steven Michaud [:smichaud] (Retired)

Comment 62

•

14 years ago

(In reply to comment #61) > We do wake up when processing a Gecko event. Gecko events are > processed from our CFRunLoopSource which we schedule in > ScheduleNativeEventCallback(). I've now built your patch, played with it (in gdb), and re-read the relevant code (in nsAppShell.mm and nsBaseAppShell.cpp). I was wrong -- your patch does always "wake up" whenever a Gecko event is processed. And in fact things are (with your patch) exactly the opposite of what I first feared -- it doesn't wake up for native events (at least not directly). But I now think this doesn't matter: As best I can tell, what Andreas wants to do requires only that Gecko be idle (whether or not native events are being processed should make no difference). (I'll talk more about this in my next comment.) So, though your patch is incomplete, I now think future patches should be based upon it -- or upon a combination of Andreas' original patch and your patch, borrowing only mHandleIdleNatively from my patch (since, on OS X, we'll want your patch to entirely override what Andreas' patch does on other platforms). In a bit I'll post a revision to show what I mean.

Steven Michaud [:smichaud] (Retired)

Comment 63

•

14 years ago

Attached patch Timer/event logging patch rev2 (incorporate Markus' ideas) (obsolete) — Details — Splinter Review

Here's an updated (and somewhat simplified) version of my debugging patch from comment #35, incorporating most of Markus' debugging patch. (Following up comment #62) > And in fact things are (with your patch) exactly the opposite of > what I first feared -- it doesn't wake up for native events (at > least not directly). I was wrong about this, too. With my debug patch I can see (from the NSLog() timestamps being the same) that Markus' RunLoopObserverCallback() does "wake up" at the same time pretty much every mouse or keyboard event is processed. mCFRunLoop (returned by +[NSRunLoop currentRunLoop]) is "the NSRunLoop object for the current thread" (i.e. the main thread). So it makes sense that it would also process native events (in addition to the Gecko events that we make it process by adding mCFRunLoopSource to it). By the way, using Markus' strategy we only call BeforeIdle() and AfterIdle() on the main thread. I assume this is correct. Is it?

Attachment #452430 - Attachment is obsolete: true

Steven Michaud [:smichaud] (Retired)

Updated

•

14 years ago

Attachment #465408 - Attachment description: Timer/event logging patch rev1 (incorporate Markus' ideas) → Timer/event logging patch rev2 (incorporate Markus' ideas)

Steven Michaud [:smichaud] (Retired)

Comment 64

•

14 years ago

Attached patch OS X testing patch rev3 (incorporate Markus' ideas) (obsolete) — Details — Splinter Review

Attachment #452298 - Attachment is obsolete: true

Markus Stange [:mstange]

Comment 65

•

14 years ago

(In reply to comment #63) > I can see (from the > NSLog() timestamps being the same) that Markus' > RunLoopObserverCallback() does "wake up" at the same time pretty much > every mouse or keyboard event is processed. As far as I can tell, being in the run loop waiting state is defined by the top three items in the call stack being mach_msg_trap | mach_msg | __CFRunLoopRun. If that's correct, everything else that happens on the main thread by definition doesn't happen during the run loop waiting state. Your patch looks good to me.

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

14 years ago

Depends on: 587876

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 66

•

14 years ago

(In reply to comment #63) > By the way, using Markus' strategy we only call BeforeIdle() and > AfterIdle() on the main thread. I assume this is correct. Is it? Yes, it's exactly what we're looking for! Thanks for these patches guys. My work is moving along nicely.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 67

•

14 years ago

Attached patch Rollup patch, v1 (obsolete) — Details — Splinter Review

This patch incorporates Andreas' original patch and Steven/Markus' patch, then makes CC interruptable and hooks it all together. Thanks for the help guys!

Assignee: nobody → bent.mozilla

Attachment #451734 - Attachment is obsolete: true

Attachment #464983 - Attachment is obsolete: true

Attachment #465435 - Attachment is obsolete: true

Status: NEW → ASSIGNED

Attachment #467482 - Flags: review?(peterv)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 68

•

14 years ago

I should note that we don't make it through a full CC very often before the event loop wakes back up (though we do get complete runs every now and then). We need to get the dependent bugs fixed to give us longer.

Andreas Gal :gal

Reporter

Comment 69

•

14 years ago

IsCollectionCancelled() looks unnecessarily slow. You could use a simple flag that you set from the main thread. We have a similar mechanism in the JS engine (the operation callback flag, js_TriggerOperationCallback()). Otherwise this looks really awesome. ++bent

Brendan Eich [:brendan]

Comment 70

•

14 years ago

Please spell Canceled like an 'murkin, so it doesn't look like the stress goes on the last syllable. /be

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 71

•

14 years ago

(In reply to comment #70) > Please spell Canceled like an 'murkin, so it doesn't look like the stress goes > on the last syllable. It is so in the patch, Andreas just betrayed his own prejudices ;)

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

14 years ago

Keywords: meta

Summary: make cycle collector interruptible [meta] → make cycle collector interruptible

Johnny Stenback (:jst)

Comment 72

•

14 years ago

Blocking on this as this is a large part of what we need to do to cut down on gc pauses for fx4.

blocking2.0: --- → betaN+

Matheus Kerschbaum

Updated

•

14 years ago

Blocks: 595574

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 73

•

14 years ago

Attached patch Rollup patch, v1.1 (obsolete) — Details — Splinter Review

Merged to t-m tip.

Attachment #467482 - Attachment is obsolete: true

Attachment #488604 - Flags: review?(peterv)

Attachment #467482 - Flags: review?(peterv)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 74

•

14 years ago

Ben, have you pushed this to tryserver? Does the patch affect to performance test (dromaeo, tp4) results? Changing the time when CC runs usually affects to the tests.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 75

•

14 years ago

Attached patch Patch, v2 (obsolete) — Details — Splinter Review

Ok, this patch is way better. The old one leaked intermittently on try server due to the fact that we were removing things from the purple buffer during incremental CC but then maybe not allowing CC to finish if the main thread woke up again. Fixed that by moving the purple buffer modifications to the end of the traverse process. Also this uses mozilla::TimeStamp as the old interval code could have overflowed and made life miserable. Peter, please ignore the |fallible_t| changes for the moment as I'm having a hard time getting them to compile on windows. They're not strictly necessary for this patch, but we realized that the move to infallible malloc renders the cycle collector's OOM handling code moot so we need to fix that somehow. Smaug, I haven't done talos tests yet, any pointers on how to compare times here?

Attachment #488604 - Attachment is obsolete: true

Attachment #492421 - Flags: review?(peterv)

Attachment #488604 - Flags: review?(peterv)

Benjamin Smedberg

Comment 76

•

14 years ago

At this point, I don't think we can block on this, we need to ship.

blocking2.0: betaN+ → -

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 77

•

14 years ago

(In reply to comment #75) > Smaug, I haven't done talos tests yet, any pointers on how to compare times > here? Well, just compare the numbers of tp4 at least.

Andreas Gal :gal

Reporter

Comment 78

•

14 years ago

We have a tested patch ready to land, waiting for review for too long :( I will ask for approval when the review is done.

Andreas Gal :gal

Reporter

Comment 79

•

14 years ago

bent, can we bounce the review to someone else than peter?

Boris Zbarsky [:bzbarsky]

Comment 80

•

14 years ago

> I will ask for approval when the review is done. Right. This is definitely wanted and we should do it; we just won't block on it.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 81

•

14 years ago

Comment on attachment 492421 [details] [diff] [review] Patch, v2 dbaron, maybe you'd like to look over this one? ;)

Attachment #492421 - Flags: review?(dbaron)

Peter Van der Beken [:peterv]

Comment 82

•

14 years ago

I started reviewing this.

Andreas Gal :gal

Reporter

Comment 83

•

14 years ago

We really desperately need this. Thanks peter.

Peter Van der Beken [:peterv]

Comment 84

•

14 years ago

Comment on attachment 492421 [details] [diff] [review] Patch, v2 >diff --git a/dom/base/nsJSEnvironment.cpp b/dom/base/nsJSEnvironment.cpp >+void RunLoopObserverCallback(CFRunLoopObserverRef observer, >+ CFRunLoopActivity activity, >+ void *info) >+{ >+ if (activity == kCFRunLoopBeforeWaiting) { >+ nsAppShell::BeforeGeckoIdle(); >+ } else { Maybe assert that activity is kCFRunLoopAfterWaiting here? >+ nsAppShell::AfterGeckoIdle(); Why not call nsBaseAppShell::Before/AfterGeckoIdle directly here? >@@ -306,16 +323,31 @@ nsAppShell::Init() >+ CFRunLoopObserverContext observerContext; >+ bzero(&observerContext, sizeof(observerContext)); >+ // observerContext.version = 0; >+ observerContext.info = this; We never seem to use info, just pass NULL as the context instead? >+ >+ mCFRunLoopObserver = >+ ::CFRunLoopObserverCreate(kCFAllocatorDefault, >+ kCFRunLoopBeforeWaiting | kCFRunLoopAfterWaiting, >+ true, 0, RunLoopObserverCallback, >+ &observerContext); >+ NS_ENSURE_STATE(mCFRunLoopObserver); >+ >+ ::CFRunLoopAddObserver(mCFRunLoop, mCFRunLoopObserver, kCFRunLoopCommonModes); Why not use kCFRunLoopDefaultMode? >+nsBaseAppShell::AfterIdle() >+{ >+ if (gIdleCCRunning) { >+ // This will block until cycle collection finishes. >+ if (nsCycleCollector_stopCollecting()) { >+ // Reset the idle time if we actually collected something as we don't want >+ // to run again immediately. But we do if we didn't collect anything? Not sure I understand why we treat those conditions differently. Also, if we collected something then running it again immediately might collect more? >diff --git a/xpcom/base/nsCycleCollector.cpp b/xpcom/base/nsCycleCollector.cpp >@@ -2624,30 +2716,42 @@ nsCycleCollector::BeginCollection(PRBool >+ if (IsCollectionCanceled()) { >+ mScanInProgress = PR_FALSE; >+ FinishTraverse(); Would it make sense to have some sort of stack object to reset mScanInProgress (and maybe even calls FinishTraverse)? >@@ -2856,16 +2966,22 @@ nsCycleCollector::ExplainLiveExpectedGar >+ if (IsCollectionCanceled()) { >+ printf("nsCycleCollector: not explaining expected garbage since\n" >+ " cycle collection was interrupted\n"); >+ return; >+ } Doesn't that mean that we never do this for non-blocking collections? >@@ -2887,16 +3003,18 @@ nsCycleCollector::ExplainLiveExpectedGar > // traversal deficiencies. > mPurpleBuf.NoteAll(builder); > > MarkRoots(builder); > ScanRoots(); > > mScanInProgress = PR_FALSE; > >+ FinishTraverse(); Shouldn't you remove the loop at the end of ExplainLiveExpectedGarbage that does: for (PRUint32 i = 0; i <= nsIProgrammingLanguage::MAX; ++i) { if (mRuntimes[i]) mRuntimes[i]->FinishTraverse(); } ? >+ // We can't hold the lock while running cycle collection. Why exactly? >+ PRUint32 CollectAndBlock(nsICycleCollectorListener *aListener) > { > NS_ASSERTION(NS_IsMainThread(), "Wrong thread!"); >+ >+ nsAutoTPtrArray<PtrInfo, 4000> whiteNodes; >+ >+ NS_ASSERTION(!mListener, "Should have cleared this already!"); >+ mListener = aListener; >+ >+#ifdef DEBUG >+ mMode = Blocking; >+#endif >+ >+ { >+ MutexAutoLock autoLock(mLock); >+ >+ if (!mRunning) >+ return 0; >+ >+ { >+ MutexAutoUnlock autoUnlock(mLock); >+ if (!mCollector->PrepareForCollection(&whiteNodes)) >+ return 0; >+ } >+ >+ mCollectionInProgress = PR_TRUE; >+ >+ mRequest.Notify(); >+ PRUint32 StartCollecting(nsICycleCollectorListener *aListener) > { > NS_ASSERTION(NS_IsMainThread(), "Wrong thread!"); > >+ if (!mNonBlockingWhiteNodes) >+ mNonBlockingWhiteNodes = new nsTPtrArray<PtrInfo>(4000); > > NS_ASSERTION(!mListener, "Should have cleared this already!"); > mListener = aListener; > >+#ifdef DEBUG >+ mMode = NonBlocking; >+#endif >+ >+ MutexAutoLock autoLock(mLock); >+ >+ if (!mRunning) >+ return 0; >+ >+ { >+ MutexAutoUnlock autoUnlock(mLock); >+ if (!mCollector->PrepareForCollection(mNonBlockingWhiteNodes)) >+ return 0; >+ } >+ >+ mCollectionInProgress = PR_TRUE; > mRequest.Notify(); >+ >+ return 0; CollectAndBlock and StartCollecting look very similar, could we merge them and just pass in the mode? This |return 0| would just be an early return if mode == NonBlocking. >+ PRUint32 StopCollecting() >+#ifdef DEBUG >+ if (mCollectionInProgress) >+ NS_ASSERTION(mMode == NonBlocking, "Wrong mode!"); >+#endif NS_ASSERTION(!mCollectionInProgress || mMode == NonBlocking, ...); >+ mCollector->CancelCollection(); This part I find confusing. mCanceled doesn't mean canceled, that's only true if we were still running. Otherwise we've just finished building the graph, no? See also my comment about ExplainLiveExpectedGarbage. Still need to figure out how/when this makes us GC.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 85

•

14 years ago

(In reply to comment #84) > Maybe assert that activity is kCFRunLoopAfterWaiting here? ... > Why not call nsBaseAppShell::Before/AfterGeckoIdle directly here? ... > We never seem to use info, just pass NULL as the context instead? ... > Why not use kCFRunLoopDefaultMode? Steven, would you mind commenting on those questions here? I didn't modify your code. > But we do if we didn't collect anything? Not sure I understand why we treat > those conditions differently. Also, if we collected something then running it > again immediately might collect more? My thought was that if we succeed in collecting something we should wait a while before trying again so as to keep the idle CPU usage down. I'm not sure about these heuristics, smaug would probably be better at figuring out when we should try again. I'm just providing the knobs so others can tweak them :) > Would it make sense to have some sort of stack object to reset mScanInProgress > (and maybe even calls FinishTraverse)? I'm not in love with the idea, but I could do that if you want. > Doesn't that mean that we never do this for non-blocking collections? Yeah, that's a bug. I need to have two state bits I guess. > Shouldn't you remove the loop at the end of ExplainLiveExpectedGarbage that Probably, yes :) > Why exactly? Because then when we try to grab the lock in StopCollecting we will block until cycle collection finishes without setting our cancel flag. > CollectAndBlock and StartCollecting look very similar, could we merge them and > just pass in the mode? This |return 0| would just be an early return if mode == > NonBlocking. Sure. > NS_ASSERTION(!mCollectionInProgress || mMode == NonBlocking, ...); Jonas likes those too... > This part I find confusing. mCanceled doesn't mean canceled, that's only true > if we were still running. Otherwise we've just finished building the graph, no? Yeah, the name may be confusing you. It just means "please stop ASAP". Most of the time that will mean "cancel", but sometimes it will mean nothing.

Steven Michaud [:smichaud] (Retired)

Comment 86

•

14 years ago

>> Maybe assert that activity is kCFRunLoopAfterWaiting here? > ... >> Why not call nsBaseAppShell::Before/AfterGeckoIdle directly here? > ... >> We never seem to use info, just pass NULL as the context instead? > ... >> Why not use kCFRunLoopDefaultMode? > > Steven, would you mind commenting on those questions here? I didn't > modify your code. I'll try to get to these in the next few days.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 87

•

14 years ago

(In reply to comment #86) > I'll try to get to these in the next few days. Thanks!

Steven Michaud [:smichaud] (Retired)

Comment 88

•

14 years ago

>>+void RunLoopObserverCallback(CFRunLoopObserverRef observer, >>+ CFRunLoopActivity activity, >>+ void *info) >>+{ >>+ if (activity == kCFRunLoopBeforeWaiting) { >>+ nsAppShell::BeforeGeckoIdle(); >>+ } else { > > Maybe assert that activity is kCFRunLoopAfterWaiting here? Maybe it's best to change the else clause to } else if (kCFRunLoopAfterWaiting) { nsAppShell::AfterGeckoIdle(); } >>+ nsAppShell::AfterGeckoIdle(); > > Why not call nsBaseAppShell::Before/AfterGeckoIdle directly here? Fine with me. Then we could get rid of nsAppShell::Before/AfterGeckoIdle. >>+ CFRunLoopObserverContext observerContext; >>+ bzero(&observerContext, sizeof(observerContext)); >>+ // observerContext.version = 0; >>+ observerContext.info = this; > > We never seem to use info, just pass NULL as the context instead? Also fine with me. >>+ >>+ mCFRunLoopObserver = >>+ ::CFRunLoopObserverCreate(kCFAllocatorDefault, >>+ kCFRunLoopBeforeWaiting | >>+ kCFRunLoopAfterWaiting, >>+ true, 0, RunLoopObserverCallback, >>+ &observerContext); >>+ NS_ENSURE_STATE(mCFRunLoopObserver); >>+ >>+ ::CFRunLoopAddObserver(mCFRunLoop, mCFRunLoopObserver, >>+ kCFRunLoopCommonModes); > > Why not use kCFRunLoopDefaultMode? That's probably a good idea -- we're probably not properly idle unless the base/default run loop is idle. But this change would require additional testing.

Steven Michaud [:smichaud] (Retired)

Comment 89

•

14 years ago

> we're probably not properly idle unless the base/default run loop is > idle We're probably not properly idle unless the main thread's run loop is idle in the default mode.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 90

•

14 years ago

Comment on attachment 492421 [details] [diff] [review] Patch, v2 bent says my review is no longer needed here now that peterv is reviewing this

Attachment #492421 - Flags: review?(dbaron)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 91

•

14 years ago

Should we take this to Fx5? CC pauses aren't usually very long though, it is GC.

Roman

Comment 92

•

14 years ago

@Olli: pauses are long enough to be a constant drain on my willpower not to smash something... The best way to get Firefox to do long pauses for testing purposes is as follows: - Go to eBay - Open 25 products in background tabs. - Look through them. Close the tabs. - Go to step one. Repeat this for up to 10 cycles and Firefox will start exhibiting the freezes. The easiest way to verify that freezes are present is IMO to press Alt, Down, to drop down the menu, and then just hold the Left arrow. Every now and then a certain menu just stops scrolling. In my current v4.0, with 6 tabs open (+10 more in Tab Panorama), the pauses exceed 1 second (!). That is not so short as to be unnoticeable - in fact, it's almost unusably long. The PC isn't a slow one - a quad-core with 8GB RAM. Now that people will start using Tab Panorama to maintain more open tabs than ever before, this will probably become even more severe than in v3.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 93

•

14 years ago

Roman, are you sure the pauses are caused by CC and not GC? And note, this bug won't make CC pauses go away, just make them interruptible (at least in common cases). I do think we should fix this sooner than later, but we need to understand how the patch behaves in different cases.

Roman

Comment 94

•

14 years ago

Oh, apologies - no I'm not sure about that. It's just that this is the only bug that bug 490122 currently depends on, so it seemed that there was some certainty about this.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 95

•

14 years ago

Roman, if you have exact steps to reproduce the problem you see, could you file a new bug. CC me.

Johnny Stenback (:jst)

Comment 96

•

14 years ago

Roman, one way to check if the pauses you're seeing are because of gc and/or cc is to set javascript.options.mem.log to true in about:config and watch the messages in the Error Console, there you'll then see how much time we spend doing gc and cc. Please include that information in your new bug.

Roman

Comment 97

•

14 years ago

Thanks all, enabled logging and will file a separate report if the exact STR become clearer. Really doubt it though, since the problem is clearly of a random nature. I'll try to get a good profiling capture of the pauses though - can't believe I hadn't thought of Visual Studio's built-in profiler before.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 98

•

13 years ago

Could we get the patch updated and get some tryserver builds for testing.

Andreas Gal :gal

Reporter

Comment 99

•

13 years ago

Andrew, can you take a stab at this? This would help a ton with pause times.

Assignee: bent.mozilla → continuation

Andrew McCreight [:mccr8]

Assignee

Comment 100

•

13 years ago

Sure, I'll work on that this week.

Andrew McCreight [:mccr8]

Assignee

Updated

•

13 years ago

Target Milestone: mozilla2.0 → ---

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 101

•

13 years ago

(In reply to Andreas Gal :gal from comment #99) > This would help a ton with pause times. Well... Yes, if we idled long enough. Last we checked we were almost never idle, maybe that has changed by now?

Andreas Gal :gal

Reporter

Comment 102

•

13 years ago

That really depends on the workload. Time to measure.

Phoenix

Comment 103

•

13 years ago

Examples of pauses due to CC CC timestamp: 1320052537368000, collected: 6 (6 waiting for GC), suspected: 309, duration: 3188 ms. CC timestamp: 1320052524790000, collected: 332 (332 waiting for GC), suspected: 842, duration: 3157 ms. CC timestamp: 1320052500180000, collected: 0 (6 waiting for GC), suspected: 3700, duration: 3203 ms. CC timestamp: 1320052470821000, collected: 6 (6 waiting for GC), suspected: 3639, duration: 3281 ms. CC timestamp: 1320052462493000, collected: 329 (329 waiting for GC), suspected: 3897, duration: 3234 ms. CC timestamp: 1320052438087000, collected: 6 (6 waiting for GC), suspected: 125, duration: 3156 ms.

Andrew McCreight [:mccr8]

Assignee

Comment 104

•

13 years ago

Phoenix, something is going terribly wrong with your cycle collector, but that isn't directly related to this bug. Please file a new bug and CC me on it. It would also be good if you included information in your new bug report about what sites are causing this problem, what addons you have, what version of Firefox you are running, and so forth. Thanks!

Phoenix

Comment 105

•

13 years ago

I already have it, today/tomorrow I'll finish collect new information for update on it and add you in copy

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 106

•

13 years ago

(In reply to Phoenix from comment #103) > Examples of pauses due to CC To clarify, we're interested in the time that the event loop is idle (while the app, or at least the main thread, has nothing to do).

Andrew McCreight [:mccr8]

Assignee

Updated

•

13 years ago

Blocks: 698919

Roman

Updated

•

13 years ago

Blocks: 646941

Phoenix

Comment 107

•

13 years ago

Ben, Andrew, please take a look - https://bugzilla.mozilla.org/show_bug.cgi?id=608954#c45

Andrew McCreight [:mccr8]

Assignee

Comment 108

•

13 years ago

Attached patch Patch v2, rebased WIP — Details — Splinter Review

At least nsCycleCollector.cpp compiles. I haven't tried a full build yet.

Attachment #465408 - Attachment is obsolete: true

Attachment #492421 - Attachment is obsolete: true

Attachment #492421 - Flags: review?(peterv)

Andrew McCreight [:mccr8]

Assignee

Comment 109

•

13 years ago

Got the patch building, was able to browse around for a few minutes. I pushed to try to see what it looks like. Windows probably doesn't build, as I think I forgot to add an argument somewhere. https://tbpl.mozilla.org/?tree=Try&rev=eec1e90ed3b6

Andrew McCreight [:mccr8]

Assignee

Comment 110

•

13 years ago

Looks pretty good. It looks like the leaks are all nsTArrays, probably from sloppy porting of the white node array on my part. Should be easy to fix.

Andreas Gal :gal

Reporter

Comment 111

•

13 years ago

Andrew, can you get numbers on the impact on responsiveness? and how often do we manage to CC without interruption?

Andrew McCreight [:mccr8]

Assignee

Comment 112

•

13 years ago

I put up a Linux debug build on try: https://tbpl.mozilla.org/?tree=Try&rev=1fb6386cf639 I somehow managed to turn an nsAutoPtr into a regular pointer when rebasing. Hopefully this fixes that problem. The Windows build problem was not actually in widget stuff like I suspected. Apparently MSVC doesn't like the fallible allocation the patch adds: Block *b = new (fallible_t()) Block(); results in these errors: e:/builds/moz2_slave/try-w32-dbg/build/xpcom/base/nsCycleCollector.cpp(441) : error C2066: cast to function type is illegal e:/builds/moz2_slave/try-w32-dbg/build/xpcom/base/nsCycleCollector.cpp(441) : error C2440: 'initializing' : cannot convert from 'mozilla::fallible_t (__cdecl **)(void)' to 'EdgePool::Block *' Types pointed to are unrelated; conversion requires reinterpret_cast, C-style cast or function-style cast e:/builds/moz2_slave/try-w32-dbg/build/xpcom/base/nsCycleCollector.cpp(441) : error C2144: syntax error : 'EdgePool::Block' should be preceded by ';' I'll try to find out the right way to do this in MSVC.

Andrew McCreight [:mccr8]

Assignee

Updated

•

13 years ago

Whiteboard: [Snappy]

(dormant account)

Comment 113

•

13 years ago

Andrew can you assign a snappy priority to this?

Andrew McCreight [:mccr8]

Assignee

Comment 114

•

13 years ago

Assigning it P2 because I think it will be really great when we're consistently under, say, 100ms, but until then we should focus on getting there first.

Whiteboard: [Snappy] → [Snappy:P2]

David Zbarsky (:dzbarsky)

Updated

•

13 years ago

No longer blocks: 698919

David Zbarsky (:dzbarsky)

Updated

•

13 years ago

Blocks: 698919

Andrew McCreight [:mccr8]

Assignee

Comment 115

•

12 years ago

Incremental CC seems more useful at this point.

Andrew McCreight [:mccr8]

Assignee

Comment 116

•

12 years ago

(which is bug 850065)

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → WONTFIX

Andrew McCreight [:mccr8]

Assignee

Updated

•

8 years ago

patch 14 years ago Andreas Gal :gal 3.67 KB, patch		Details \| Diff \| Splinter Review
Testing patch that works on OS X 14 years ago Steven Michaud [:smichaud] (Retired) 10.98 KB, patch		Details \| Diff \| Splinter Review
OS X testing patch rev2 14 years ago Steven Michaud [:smichaud] (Retired) 15.12 KB, patch		Details \| Diff \| Splinter Review
Timer/event logging patch 14 years ago Steven Michaud [:smichaud] (Retired) 55.77 KB, patch		Details \| Diff \| Splinter Review
Timer/event logging patch rev1 (crack nsGlobalWindow::TimerCallback()) 14 years ago Steven Michaud [:smichaud] (Retired) 58.94 KB, patch		Details \| Diff \| Splinter Review
demo patch using kCFRunLoopBefore/AfterWaiting observer 14 years ago Markus Stange [:mstange] 6.04 KB, patch		Details \| Diff \| Splinter Review
Timer/event logging patch rev2 (incorporate Markus' ideas) 14 years ago Steven Michaud [:smichaud] (Retired) 21.30 KB, patch		Details \| Diff \| Splinter Review
OS X testing patch rev3 (incorporate Markus' ideas) 14 years ago Steven Michaud [:smichaud] (Retired) 12.00 KB, patch		Details \| Diff \| Splinter Review
Rollup patch, v1 14 years ago Ben Turner (not reading bugmail, use the needinfo flag!) 32.36 KB, patch		Details \| Diff \| Splinter Review
Rollup patch, v1.1 14 years ago Ben Turner (not reading bugmail, use the needinfo flag!) 36.98 KB, patch		Details \| Diff \| Splinter Review
Patch, v2 14 years ago Ben Turner (not reading bugmail, use the needinfo flag!) 46.89 KB, patch		Details \| Diff \| Splinter Review
Patch v2, rebased WIP 13 years ago Andrew McCreight [:mccr8] 42.43 KB, patch		Details \| Diff \| Splinter Review