Closed Bug 937960 Opened 7 years ago Closed 7 years ago

ICC: Slice scheduling

Categories

(Core :: XPCOM, defect)


Tracking


RESOLVED FIXED
mozilla29

People

(Reporter: mccr8, Assigned: mccr8)

References

Details

Attachments

(5 files, 1 obsolete file)

In nsJSEnvironment, we have to actually schedule slices of ICC work.  This requires adding a new timer, dealing with the CC lockout stuff, deciding how much time to have between slices, deciding how much time to spend on each slice, stopping a CC if it is taking too long (e.g. too many slices), etc.  This is also where I'll add the pref to control whether we are incrementally collecting or not.
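A minimal sketch of that scheduling policy (all names and numbers here are hypothetical illustrations, not the actual nsJSEnvironment code): run budgeted slices from a timer callback, and fall back to finishing synchronously if the collection takes too many slices.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the cycle collector's incremental state.
struct FakeCC {
  explicit FakeCC(int64_t aWorkMs) : mRemainingWorkMs(aWorkMs) {}

  bool Done() const { return mRemainingWorkMs <= 0; }

  // Run one slice, doing at most aBudgetMs worth of graph building.
  void RunSlice(int64_t aBudgetMs) {
    int64_t did = aBudgetMs < mRemainingWorkMs ? aBudgetMs : mRemainingWorkMs;
    mRemainingWorkMs -= did;
  }

  int64_t mRemainingWorkMs;
};

// What the ICC timer would drive: run budgeted slices until the CC
// finishes, but if it takes too many slices, finish synchronously
// (the "stop a CC if it is taking too long" case). Returns slice count.
int RunIncrementalCC(FakeCC& aCC, int64_t aSliceBudgetMs, int aMaxSlices) {
  int slices = 0;
  while (!aCC.Done() && slices < aMaxSlices) {
    aCC.RunSlice(aSliceBudgetMs);  // what a timer callback would trigger
    ++slices;
  }
  if (!aCC.Done()) {
    aCC.RunSlice(INT64_MAX);  // fall back to a synchronous finish
    ++slices;
  }
  return slices;
}
```

With 100ms of work and a 40ms budget this finishes in three slices; capping it at two slices forces the synchronous finish as the third.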
Attachment #8346003 - Flags: review?(bugs)
The type of pref callbacks changed.
Attachment #8346003 - Attachment is obsolete: true
Attachment #8346003 - Flags: review?(bugs)
Attachment #8346984 - Flags: review?(bugs)
Comment on attachment 8346984 [details] [diff] [review]
part 1 - Add ICC pref to nsJSEnv.

Why do you need the pref in b2g/app/b2g.js?

That addressed, r+
Attachment #8346984 - Flags: review?(bugs) → review+
I was just cargo culting some other variable.  I guess b2g will just use the value from the other one if no special one is set for b2g?
Yes it should, unless b2g has some really odd setup.
And anyhow the default for the pref in the nsJSEnvironment.cpp is false...
Comment on attachment 8346005 [details] [diff] [review]
part 2 - Add ICC slice timer and scheduling.


>+static void
>+ICCTimerFired(nsITimer *aTimer, void *aClosure)
This is DOM code, so use the sane coding style:
the * goes with the type.


This whole setup isn't too easy to understand.
ICCTimer calls ScheduledCycleCollectNow, but sounds like
ScheduledCycleCollectNow should be RunCycleCollectSlice or some such.
Perhaps a followup to clean up method names.
Attachment #8346005 - Flags: review?(bugs) → review+
Comment on attachment 8346006 [details] [diff] [review]
part 3 - Pass in a small time budget with ICC.


>+// Time budget for an incremental CC slice
>+static const int64_t kICCSliceBudget = 40; // ms
This sounds very high. I thought you were aiming for 5ms or so.
Certainly should be less than 16ms.
10ms perhaps?
Attachment #8346006 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #10)
> This sounds very high. I thought you were aiming for 5ms or so.
> Certainly should be less than 16ms. 10ms perhaps?

The problem is that on mochitests you end up falling behind, and end up finishing things synchronously.  Ideally, I think we need to move to a more adaptive system that does larger slices when there is more work to do, but I'm not really sure how that should work.  With 10ms slices, how much time do you think should be between slices?  I can try it out; maybe if it is small it won't be too bad.

40ms reduces pauses from page teardown, but not much otherwise.  Also keep in mind that right now only graph building is incrementalized.  I think the target slice time for IGC is 40ms, too, so I don't think using 40ms for the CC will make the worst case much worse than if it was lower.

I should also file a followup about doing some kind of NotifyDidPaint thing like IGC.
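A back-of-the-envelope model of the budget/interval tradeoff being debated here (purely illustrative: it assumes a fixed amount of graph-building work and ignores allocation between slices; none of these numbers come from the patches):

```cpp
#include <cassert>
#include <cstdint>

// Wall-clock time to finish a collection if each slice gets aBudgetMs of
// CC work and consecutive slices are aIntervalMs apart.
int64_t WallClockToFinishMs(int64_t aTotalWorkMs, int64_t aBudgetMs,
                            int64_t aIntervalMs) {
  int64_t slices = (aTotalWorkMs + aBudgetMs - 1) / aBudgetMs;  // ceil
  // Each slice costs its budget; between consecutive slices we wait.
  return slices * aBudgetMs + (slices - 1) * aIntervalMs;
}
```

For example (made-up workload), 200ms of work in 40ms slices 100ms apart finishes in 600ms, while 10ms slices 32ms apart take 808ms: shorter slices cut per-slice jank but stretch the collection out, which is why falling behind on mochitests is a concern.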
Could we try 10ms slices and just 32ms interval?

Better would be to do icc right after paint and do only a small slice.
(In reply to Olli Pettay [:smaug] from comment #12)
> Could we try 10ms slices and just 32ms interval?

Sure, I'll try that.  It won't matter for the initial landing, so if it is problematic, I can always tweak it later.

> Better would be to do icc right after paint and do only a small slice.

I filed bug 950791 for some kind of NotifyDidPaint thing.
(In reply to Olli Pettay [:smaug] from comment #9)
> This whole setup isn't too easy to understand.
> ICCTimer calls ScheduledCycleCollectNow, but sounds like
> ScheduledCycleCollectNow should be RunCycleCollectSlice or some such.
> Perhaps a followup to clean up method names.
Good point, the name is bad.  I filed bug 950959 for renaming those things.
Olli pointed out that my previous patch reduces the time between CCs, even
when ICC is disabled, which is bad. I've locally reverted the change to
NS_CC_DELAY in part 2. This patch adjusts things so we reduce the time
between CCs when ICC is active, but should not alter behavior otherwise.
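The shape of that fix might look like the following sketch (the constants are made up, not the real NS_CC_DELAY values): shorten the inter-CC delay only when ICC is enabled, so non-ICC behavior is unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Only shorten the delay between CCs when ICC is enabled.
int64_t InterCCDelayMs(bool aIncrementalCCEnabled) {
  const int64_t kNormalCCDelayMs = 6000;
  // ICC spreads its work over slices, so it can afford to start sooner.
  const int64_t kICCDelayMs = 3000;
  return aIncrementalCCEnabled ? kICCDelayMs : kNormalCCDelayMs;
}
```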
Attachment #8348432 - Flags: review?(bugs)
Attachment #8348432 - Flags: review?(bugs) → review+
backed out by Ryan for Win7 debug XPCshell timeouts like this:
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\xpcshell\tests\modules\libpref\test\unit\test_libPrefs.js | Test timed out
TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\xpcshell\tests\modules\libpref\test\unit\test_libPrefs.js | test failed (with xpcshell return code: 15), see following log:
Return code: 1 

So that's awesome.

https://hg.mozilla.org/integration/mozilla-inbound/rev/a3287212603e
The timeout seems to have persisted across the backout, so if it doesn't show up on try after a few retriggers I'll reland.
  https://tbpl.mozilla.org/?tree=Try&rev=f92fb07aefcf
Keywords: checkin-needed
Backed out for being the likely cause of bug 952053.
https://hg.mozilla.org/integration/mozilla-inbound/rev/e9bab00a241e

Green Try run of the backout:
https://tbpl.mozilla.org/?tree=Try&rev=e8b3e7c612cc

If that bug does persist, feel free to reland.
Depends on: 952053
Yeah, it did seem to go away right after the backout.  Really odd.  Maybe I'm registering the pref in some horrible way.  I'll see if I can reproduce on try, and then see if not doing the pref stuff fixes it.
I removed anything related to prefs and pushed to try and got 2 failures out of 13:
  https://tbpl.mozilla.org/?tree=Try&rev=2ac023aed0f6
On inbound, without any of my patches, I got 1 failure out of 14:
  https://tbpl.mozilla.org/?tree=Try&rev=2ac023aed0f6
So... a similar rate of failure.  There's also been a slew of these failures on Fx-team, without my patches.  I'm going to give it a day or two to let us get more of a baseline failure rate on bug 952053, then I'll try relanding the set of patches that don't touch prefs at all, then I'll go from there.

Against an earlier inbound change set, I was getting about 1/3 to 1/2 failures with my patches ( https://tbpl.mozilla.org/?tree=Try&rev=4edef94420c3 https://tbpl.mozilla.org/?tree=Try&rev=eafad8ca3c51 https://tbpl.mozilla.org/?tree=Try&rev=f127e335893b )... but about the same percentage with none of my failures ( https://tbpl.mozilla.org/?tree=Try&rev=687bf7d70252 ).
> but about the same percentage with none of my failures
That should read "but about the same percentage with none of my patches".
I split out the code to add a pref from part 1 for landing, to try to work around
the pref-related timeout.  Here's the final patch.  Carrying forward smaug's r+.
Attachment #8351012 - Flags: review+
part 5:
https://hg.mozilla.org/integration/mozilla-inbound/rev/150a6fb7381a

I landed this separately to more easily see if it is really causing XPCshell timeouts.
I retriggered Win7 debug XPCShell about 8 times on each of the last two pushes, and it was all green, so this should be good to go.
Whiteboard: [leave open]
Comment on attachment 8346006 [details] [diff] [review]
part 3 - Pass in a small time budget with ICC.

Do I understand this correctly: if you've taken over 2 seconds to do ICC, then your solution is to make the app totally unresponsive by forcing a full CC?
Yes.  According to telemetry, the 95th percentile of CC time is 255ms with ICC*, so I don't expect that will happen much.  The user is still better off than without ICC.  The main concern is that the CC will fall behind, and you will start using more memory, which will make the next CC take even longer, leading to some kind of death spiral.

* http://telemetry.mozilla.org/#nightly/32/CYCLE_COLLECTOR_FULL
(In reply to Andrew McCreight from comment #32)
> Yes.  According to telemetry, the 95th percentile of CC time is 255ms with
> ICC*, so I don't expect that will happen much.  The user is still better off
> than without ICC.  The main concern is that the CC will fall behind, and you
> will start using more memory, which will make the next CC take even longer,
> leading to some kind of death spiral.

Ah yes, I think I might have seen that death spiral - on a 32 bit Windows VM, when my process exceeds about 80% of 2GB (I'm not using /3GB in my boot.ini) its cycle collections start getting slower and slower (most of the time is spent in VirtualAlloc) until eventually it simply OOMs somewhere in operator new().