Clumping wakeups in TimerThread can reduce wakeups

NEW
Unassigned

Status

()

Core
XPCOM
7 years ago
2 years ago

People

(Reporter: azakai, Unassigned)

Tracking

(Blocks: 1 bug)

Trunk
x86
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [battery][Power:P2])

Attachments

(1 attachment, 4 obsolete attachments)

(Reporter)

Description

7 years ago
Created attachment 450502 [details] [diff] [review]
Clumps together timers by firing them at regular intervals

In order to reduce the # of wakeups (good for saving power), we can 'clump' together wakeups that are close. For example, if we are supposed to have timers fire at the following times (in ms):

99,100,101

Then it might make sense to fire all 3 at time 100 (1 wakeup instead of 3).

One simple way to accomplish this is to fire timers at regular intervals, for example at a resolution of 10ms (100fps) we would fire timers at multiples of 10ms. Timers would be called at the time that is a nearest multiple to them, slightly early or slightly late, and consequently timers have a higher probability of firing together. The attached patch implements that (set the env var MOZ_TIME_QUANTUM to the interval time in ms to test).

The disadvantage to this approach is that timers do not fire at precisely the right times. If the time resolution is too big, it may be noticeable by the user. But, at a fast rate - say 60fps - it should be undetectable. In any event this would be less controversial in the case of mobile devices, where such delays are far less noticeable (and this is anyhow the case where we most care about saving power by having fewer wakeups). It might also make sense to adjust this based on the power remaining (save power more aggressively when less remains).

Some data: On www.webhamster.com there are a few hundred animated gifs, each with an animation rate of around 10fps. For different interval values (MOZ_TIME_QUANTUM), the # of wakeups is:

Interval/fps          # of wakeups
=====================================
 500ms /  2fps               8
 100ms / 10fps               22
  50ms / 20fps               40
  33ms / 30fps               47
  25ms / 40fps               48
  20ms / 50fps               54
  17ms / 60fps               53
       --                    --
  Free-running               63

(Free-running is the original behavior - no limit on timers whatsoever.)

(Note: Numbers are on mobile-e10s. In the non-e10s case, the # of wakeups would be lower, as only one process is running.)

So, even 50-60fps gives a 16% improvement. More aggressive values can save even more but can be noticeable (but as mentioned above, might be worthwhile on very low battery).

I don't think this patch should have any bad effects, especially when using a small time interval, as anyhow already there is some unpredictability in when timers fire (due to other stuff happening on the CPU, etc., and the mTimeoutAdjustment factor & its heuristics which tries to correct for that).
(Reporter)

Updated

7 years ago
Blocks: 446418
> But, at a fast rate - say 60fps - it should be undetectable.

That would be _trivially_ detectable on web pages where it would make setTimeout fire at 60Hz instead of 100Hz and therefore slow animations down by a factor of 1.6 or so.

> As anyhow already there is some unpredictability

There's a difference between unpredictability and imposing a new lower bound that's 1.6x higher than the previous one on web page timers.

Not to mention that in some cases we rely on timers set for "0" to really fire ASAP; changing that will regress performance benchmarks.

For animated images, the right solution is to move them to refresh driver (we have a bug on this already), at which point they'll all run off a 50Hz per-page timer.

I would also be ok with explicitly flagging timers as groupable as needed, by the way.  We just couldn't thus flag web page timeout timers.
How much power does this save?  Would be good to establish what the maximal gain is here, too.
(Reporter)

Comment 3

7 years ago
(In reply to comment #1)
> > But, at a fast rate - say 60fps - it should be undetectable.
> 
> That would be _trivially_ detectable on web pages where it would make
> setTimeout fire at 60Hz instead of 100Hz and therefore slow animations down by
> a factor of 1.6 or so.
> [..]
> There's a difference between unpredictability and imposing a new lower bound
> that's 1.6x higher than the previous one on web page timers.

I didn't think about that - good point. In a sense such web pages are written incorrectly if they don't show the animation at the right speed (by skipping frames), as they would break anyhow on slow machines whose setTimeout can't consistently reach 100Hz. But I agree with you, this is an unacceptable problem with the attached patch.

> 
> Not to mention that in some cases we rely on timers set for "0" to really fire
> ASAP; changing that will regress performance benchmarks.
> 

Also very true.

> For animated images, the right solution is to move them to refresh driver (we
> have a bug on this already), at which point they'll all run off a 50Hz per-page
> timer.

Regarding that, it would be a regression to use the current refresh driver, as it would cause 50 wakeups/second even for animations that don't need it.

I guess it would be possible to modify the refresh driver to handle animations at arbitrary rates (also changing ones, for animations with varying frame delays), but then it would be duplicating a sizable part of the code in TimerThread which already does precisely that, I think?

> 
> I would also be ok with explicitly flagging timers as groupable as needed, by
> the way.  We just couldn't thus flag web page timeout timers.

I like that idea. Regarding practical details, how about adding a new timer type, TYPE_REPEATING_SLACK_GROUPABLE, and use that for image animation timers (and the refresh driver)?
> In a sense such web pages are written incorrectly

Sure thing.  They all are, in fact.  ;)

> it would be a regression to use the current refresh driver, as
> it would cause 50 wakeups/second even for animations that don't need it

Hmm.  How common are animated gifs that run at slower than 50Hz?  But yes, this is a valid point.  The key part of using refresh driver is to not animate images in background tabs and the like at all....
(Reporter)

Comment 5

7 years ago
(In reply to comment #2)
> How much power does this save?  Would be good to establish what the maximal
> gain is here, too.

Well, the maximal gain could be high, but would require optimal conditions (not sure how likely they are). But the scenario would be something like this:

10, 20, 30, 40, 50, 60, 70, 80, 90

(intended firing times)

and after grouping, we get

10&20&30, 40&50&60, 70&80&90

So we have 1/3 the wakeups as before. During sleep the CPU takes orders of magnitude less power than when active, so we would be using close to 1/3 the power, assuming the wakeups are very brief. (Of course the 1/3 factor is just in this example, it could be more or less, depending on how much grouping we are actually willing to do.)
Right, but how much power does it save with the patch? What have we measured?
(Reporter)

Comment 7

7 years ago
(In reply to comment #4)
> Hmm.  How common are animated gifs that run at slower than 50Hz?  But yes, this
> is a valid point.  The key part of using refresh driver is to not animate
> images in background tabs and the like at all....

Not sure how common they are in general, but as one relevant example, the 'loading, please wait' animated image that is used on the Fennec startup page is 4Hz. (Due to bug 567339, we therefore have 4 unnecessary persistent wakeups, which is at least better than 50...)
(Reporter)

Comment 8

7 years ago
(In reply to comment #6)
> Right, but how much power does it save with the patch? What have we measured?

I measured wakeups (using powertop) in the data mentioned in this bug. To know how much power is actually saved (it could be less than the % of wakeups we saved, if the wakeups are long, and depending on the CPU's power saving features) we would need to do physical measurements. Apparently there are devices you plug the power cord into that measure how much power is drawn.
Yeah, I guess that's what I'm asking: if we do the best possible job of changing our timer implementation, how much does that help the user?  We should make sure that we're doing the things that will help most, which likely means the power-use equivalent of profiling.
(Reporter)

Comment 10

7 years ago
(In reply to comment #9)
> Yeah, I guess that's what I'm asking: if we do the best possible job of
> changing our timer implementation, how much does that help the user?  We should
> make sure that we're doing the things that will help most, which likely means
> the power-use equivalent of profiling.

Ok, I am looking into how to measure power draw at the hardware level for this.
(Reporter)

Comment 11

7 years ago
It seems that realtime measurement of power usage is hard. Simple consumer tools exist for measuring usage in watts, but for mobile devices we need milliwatts. There are various debates on the Maemo forums about how to do that properly and it doesn't seem simple.

So instead of getting into that, how about the following test:

1. Charge the device fully.
2. Disconnect from power.
3. Run test for 1 hour.
4. Measure how much battery charge remains (on the N900 this works: hal-device bme | grep level.perc) and calculate from that what we need.

I will start doing this soon, please let me know if I am missing something and there is a problem with this test setup.
It's going to be great to get power-usage info, thanks for doing this.  Might want another bug for it, but nonetheless.

Should probably run this both with the screen forced on (simulating active user use) and with it normally timed out off (simulating background web app load), so that we can see what the percentages are like.  If we only draw 10% of the power of the screen, f.e., then we're probably not very relevant to total power draw when the user is actively browsing, but we should look at ways to get down near 0 when we're not the focused app.
(Reporter)

Comment 13

7 years ago
Ok, I have some useful figures. Details are in this blog post:

http://mozakai.blogspot.com/2010/06/power-measuring-saving.html

To summarize, it looks like clumping together timeouts can noticeably reduce power usage. With 200 timeouts clumped into 1 or spread out entirely, the savings can be 8% (with screen on) or 38% (with screen off). As mentioned in the blog post, the numbers shouldn't be taken literally (this is just one setup, hard to say how real-world it is), but I think they do indicate that clumping timeouts is worth implementing (in some appropriate way that still needs to be determined).

(The power measurements were done with this Maemo tool: http://www.mauve.plus.com/bat.tar.gz )
(Reporter)

Comment 14

7 years ago
Created attachment 452346 [details] [diff] [review]
New timer type, TYPE_REPEATING_SLACK_GROUPABLE

Better version of previous patch. Introduces a new timer type, TYPE_REPEATING_SLACK_GROUPABLE, which is used only by animation timers. Will not influence other timers at all (but remains as useful as the original patch, removing 18% of wakeups in plugin-container on webhamster.com).
Attachment #450502 - Attachment is obsolete: true
(Reporter)

Comment 15

7 years ago
Created attachment 452770 [details] [diff] [review]
Per-timer grouping intervals

After more thought, I think this patch is a better approach.

In this patch, each timer has a groupingInterval attribute, so you can set the level of grouping for each timer separately. So for example animation timers can have a resolution of 20ms (=50Hz), and the JS engine GC callback could have a resolution of 1000ms (=1Hz).

The proposed patch currently just affects animation timers. If there is agreement on the approach, then we can consider adding groupingIntervals elsewhere.
Attachment #452346 - Attachment is obsolete: true
(Reporter)

Updated

7 years ago
Attachment #452770 - Flags: feedback?

Updated

7 years ago
Attachment #452770 - Flags: feedback? → feedback?(roc)

Updated

7 years ago
Assignee: nobody → azakai
Comment on attachment 452770 [details] [diff] [review]
Per-timer grouping intervals

Looks good.
Attachment #452770 - Flags: feedback?(roc) → feedback+
roc, next steps?
This patch should be reviewed. Who's doing timer reviews these days?
(Reporter)

Comment 19

7 years ago
Created attachment 458436 [details] [diff] [review]
Patch v2

Added an xpcshell test.
Attachment #452770 - Attachment is obsolete: true
Attachment #458436 - Flags: review?(doug.turner)
Comment on attachment 458436 [details] [diff] [review]
Patch v2

azakai - you need to change the uuid of the nsITimer interface because you added a new attribute.

Benjamin has been reviewing timer things - bouncing to him.
Attachment #458436 - Flags: review?(doug.turner) → review?(benjamin)
(Reporter)

Comment 21

7 years ago
Created attachment 459161 [details] [diff] [review]
Patch v3

Changed uuid as instructed, and marking dougt as reviewer, following offline discussion.
Attachment #458436 - Attachment is obsolete: true
Attachment #459161 - Flags: review?(doug.turner)
Attachment #458436 - Flags: review?(benjamin)
Comment on attachment 459161 [details] [diff] [review]
Patch v3

>diff --git a/layout/base/nsRefreshDriver.cpp b/layout/base/nsRefreshDriver.cpp
>--- a/layout/base/nsRefreshDriver.cpp
>+++ b/layout/base/nsRefreshDriver.cpp
>@@ -120,6 +120,7 @@
>   if (NS_FAILED(rv)) {
>     mTimer = nsnull;
>   }
>+  mTimer->SetGroupingInterval(REFRESH_INTERVAL_MILLISECONDS);

Why this change?  In the imgContainer you mentioned that 20 == 50fps, is this similar?



>+        do_check_true(stddev < timer.grouping / 50); // Deviations must be
>+                                                     // very small

I am somewhat worries that this might cause random oranges on the tests.  Have you seen any failures on your try pushes?
Attachment #459161 - Flags: review?(doug.turner) → review-
(Reporter)

Comment 23

7 years ago
Hmm, meanwhile the time measurement API was overhauled, into TimeStamp and TimeDuration. The new API purposefully does not expose absolute times, which this approach needs. What it does expose are millisecond-resolution times, but in tests I did now, that isn't good enough to actually reduce the # of wakeups. Without a clear benefit, there is no sense in using this patch.

For posterity's sake, here is the rewrite to the new API:

      static TimeStamp reference;
      if (reference.IsNull()) {
        reference = TimeStamp::Now();
      }
      TimeStamp groupedTimeout = mTimeout;
      TimeDuration groupingInterval_2 = TimeDuration::FromMilliseconds(mGroupingInterval/2);
      groupedTimeout += groupingInterval_2;
      PRInt64 absoluteTime = (groupedTimeout - reference).ToMilliseconds();
      absoluteTime -= absoluteTime % mGroupingInterval;
      groupedTimeout = reference + TimeDuration::FromMilliseconds(absoluteTime);
      if (groupedTimeout < TimeStamp::Now())
          groupedTimeout += groupingInterval_2 + groupingInterval_2;
      mTimeout = groupedTimeout;

Another note, while testing this, there were rare errors in one of the reftests,

  modules/libpr0n/test/reftest/apng/delaytest.html?bug411852-1.png

which occurred if too much grouping was done. So future patches should carefully test this reftest in particular.

Instead of this patch, other approaches to grouping timeouts can be tried, such as defining a 'flexibility window' for timers, so that they will fire ahead of time up to that window, *if* another timer is currently firing. This should have more of an effect.
Why do you need absolute times?

TimeStamp/TimeDuration exposes nanosecond resolution on some of our operating systems, and only millisecond resolution on others, but all of our timer stuff was already millisecond-resolution... so I'm not sure what more resolution would be needed for.
(Reporter)

Comment 25

7 years ago
(In reply to comment #24)
> Why do you need absolute times?
> 
> TimeStamp/TimeDuration exposes nanosecond resolution on some of our operating
> systems, and only millisecond resolution on others, but all of our timer stuff
> was already millisecond-resolution... so I'm not sure what more resolution
> would be needed for.

I didn't look much inside the new TimeStamp class, but the values received behave slightly differently than before, enough to mess up the idea of grouping timers by having them fire at absolute times that are multiples of some factor.

In any case, that a code refactor was enough to stop this patch from being beneficial probably means the patch was too 'sensitive' anyhow. I will try the other approach I mentioned earlier with the 'flexibility windows'.
Ok.  Let me know if you have any questions about the new timer setup, alright?  It's really basically the same as before except not sensitive to NTP messing with the system data and with now() having slightly worse resolution on Windows and Mac for the moment (that's subject to change).
(Reporter)

Comment 27

7 years ago
I still believe this is worth doing, but will not have time for it in the near future.
Assignee: azakai → nobody
Whiteboard: [battery]
Whiteboard: [battery] → [battery][Power]
Whiteboard: [battery][Power] → [battery][Power:P2]
You need to log in before you can comment on or make changes to this bug.