Last Comment Bug 720493 - nsIdleService sometimes fails to restart its idle time detection timer
: nsIdleService sometimes fails to restart its idle time detection timer
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Widget (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla13
Assigned To: Mike Kristoffersen (:MikeK)
:
Mentors:
Depends on: 732368 779843
Blocks: 517482
  Show dependency treegraph
 
Reported: 2012-01-23 13:30 PST by Mike Kristoffersen (:MikeK)
Modified: 2012-08-02 07:21 PDT (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
-


Attachments
Work in progress, test, debug and "printf"s included, and a bit of agressive timer usage (33.17 KB, patch)
2012-01-25 16:30 PST, Mike Kristoffersen (:MikeK)
no flags Details | Diff | Splinter Review
Simplified version of the nsIdleService (15.14 KB, patch)
2012-01-26 14:06 PST, Mike Kristoffersen (:MikeK)
no flags Details | Diff | Splinter Review
As good as it gets before review comments... (30.24 KB, patch)
2012-01-27 10:16 PST, Mike Kristoffersen (:MikeK)
kchen: review+
Details | Diff | Splinter Review
Patch 592169 Updated with review comments (31.52 KB, patch)
2012-02-01 13:30 PST, Mike Kristoffersen (:MikeK)
no flags Details | Diff | Splinter Review
Should now be able to compile in other than the mobile configuration (31.53 KB, patch)
2012-02-01 14:26 PST, Mike Kristoffersen (:MikeK)
no flags Details | Diff | Splinter Review
Updated with codingstandard comments (32.42 KB, patch)
2012-02-02 14:09 PST, Mike Kristoffersen (:MikeK)
no flags Details | Diff | Splinter Review
Updated after review (31.47 KB, patch)
2012-02-12 13:35 PST, Mike Kristoffersen (:MikeK)
netzen: review+
Details | Diff | Splinter Review
Added needed info (user, proper description) to the patch (35.59 KB, patch)
2012-02-28 12:30 PST, Mike Kristoffersen (:MikeK)
no flags Details | Diff | Splinter Review
Now it should be updated with only the user and description compared to the R+'ed version :) (31.58 KB, patch)
2012-02-28 13:47 PST, Mike Kristoffersen (:MikeK)
no flags Details | Diff | Splinter Review

Description Mike Kristoffersen (:MikeK) 2012-01-23 13:30:59 PST
While testing the idle service in a monitored environment (read: code had a lot of printfs) in order to find the reason for bug 517482 it was observed several times that it failed to restart it's internal timer used for detecting idle time - this lead to a total failure to detect idle time.
Comment 1 Mike Kristoffersen (:MikeK) 2012-01-23 13:36:35 PST
I found another thing while testing the idle service - at least when running Fennec in an Ubuntu environment the idle service wont actually start until it is explicitly requested or there is some user events (mouse, keyboard) being received by Firefox - this will lead to the daily idle timer not firering, as that is started as a side effect of starting the idle service.
Comment 2 Mike Kristoffersen (:MikeK) 2012-01-25 16:30:13 PST
Created attachment 591653 [details] [diff] [review]
Work in progress, test, debug and "printf"s included, and a bit of agressive timer usage

Work in progress of a patch to this and bug 720490 - not quite debugged yet, so still have test and debug code inside.
Comment 3 Mike Kristoffersen (:MikeK) 2012-01-26 14:06:09 PST
Created attachment 591939 [details] [diff] [review]
Simplified version of the nsIdleService

Ok, found and fixed the bug from yesterday that made the timer go off too often.  I need to run it trough the tryserver, and check I didn't break too much of the coding standard before I ask for a review - but starting to wonder who would be a good candidate to review this?
Comment 4 Mike Kristoffersen (:MikeK) 2012-01-27 10:16:24 PST
Created attachment 592169 [details] [diff] [review]
As good as it gets before review comments...
Comment 5 Mike Kristoffersen (:MikeK) 2012-01-27 10:46:23 PST
Comment on attachment 592169 [details] [diff] [review]
As good as it gets before review comments...

Did a refactor of the idle service, trying to clean it up, can you review it?
Comment 6 Kan-Ru Chen [:kanru] (UTC+8) 2012-01-29 19:41:02 PST
Comment on attachment 592169 [details] [diff] [review]
As good as it gets before review comments...

Review of attachment 592169 [details] [diff] [review]:
-----------------------------------------------------------------

Hi, I like your patch though I cannot review the coding standard parts. You might want to replace 1000 with PR_MSEC_PER_SEC so they are consisted with the usage of PR_USEC_PER_SEC.

::: widget/xpwidgets/nsIdleService.cpp
@@ +289,5 @@
> +
> +  // Ensure timer is running
> +  ReconfigureTimer();
> +
> +  fflush(stdout);

Debug code?
Comment 7 Mike Kristoffersen (:MikeK) 2012-01-30 15:09:19 PST
(In reply to Kan-Ru Chen [:kanru] from comment #6)
> Comment on attachment 592169 [details] [diff] [review]
> As good as it gets before review comments...
> 
> Review of attachment 592169 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Hi, I like your patch though I cannot review the coding standard parts. You

Hmm... do you have a good candidate to do that part of the review, after I deal with you other comments?

> might want to replace 1000 with PR_MSEC_PER_SEC so they are consisted with
> the usage of PR_USEC_PER_SEC.

Why didn't I think about that?  Sounds like a good idea, will update it :)

> 
> ::: widget/xpwidgets/nsIdleService.cpp
> @@ +289,5 @@
> > +
> > +  // Ensure timer is running
> > +  ReconfigureTimer();
> > +
> > +  fflush(stdout);
> 
> Debug code?

Yes :)  Consider it gone...
Comment 8 Mike Kristoffersen (:MikeK) 2012-02-01 13:30:52 PST
Created attachment 593590 [details] [diff] [review]
Patch 592169 Updated with review comments
Comment 9 Mike Kristoffersen (:MikeK) 2012-02-01 14:13:51 PST
Comment on attachment 593590 [details] [diff] [review]
Patch 592169 Updated with review comments

Woups, version 2 coming up...
Comment 10 Mike Kristoffersen (:MikeK) 2012-02-01 14:26:53 PST
Created attachment 593608 [details] [diff] [review]
Should now be able to compile in other than the mobile configuration

This is the reason it is good we have the try server, the previous patch failed to compile on the platforms I couldn't compile locally - this one should be able to compile on all platforms :)
Comment 11 Brian R. Bondy [:bbondy] 2012-02-01 15:41:57 PST
Comment on attachment 593608 [details] [diff] [review]
Should now be able to compile in other than the mobile configuration

Review of attachment 593608 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch.

I didn't review this for technical correctness, nor try the patch, but I did a fast pass over it to fix the formatting. 
Please try someone from https://wiki.mozilla.org/Modules/All#Widget for the official review.

::: widget/xpwidgets/nsIdleService.cpp
@@ +59,5 @@
> +// interval in milliseconds between internal idle time requests.
> +#define MIN_IDLE_POLL_INTERVAL_MSEC (5 * PR_MSEC_PER_SEC) /* 5 sec */
> +
> +// Time used by the daily idle serivce to determine a significant idle time
> +#define DAILY_IDLE_SERVICE_SIGNIFICANT_IDLE_TIME_SEC 300 /* 5 min */

Perhaps something shorter like DAILY_SIGNIFICANT_IDLE_SERVICE_SEC

@@ +102,5 @@
>    }
>  
>    // Stop observing idle for today.
> +  (void)mIdleService->RemoveIdleObserver(this,
> +                              DAILY_IDLE_SERVICE_SIGNIFICANT_IDLE_TIME_SEC);

With renamed constant, align to opening parentheses

@@ +167,5 @@
>  
>    // The one thing we do every day is to start waiting for the user to "have
>    // a significant idle time".
> +  (void)me->mIdleService->AddIdleObserver(me,
> +                                  DAILY_IDLE_SERVICE_SIGNIFICANT_IDLE_TIME_SEC);

With the renamed constant, please align the constant with "me"

@@ +176,5 @@
> + * The idle services goal is to notify subscribers when a certain time has
> + * passed since the last user interaction with the system.
> + *
> + * On some platforms this is defined as the last time user events reached this
> + * application, on other platforms it is a system wide thing - the prefered

nit: preferred

@@ +213,5 @@
> + * - User interaction is detected
> + *
> + * - The timer expires
> + *
> + * When a new listener is added it's idle timeout, is compared with the next

nit: its

@@ +247,5 @@
>  
> +nsIdleService::nsIdleService() : mCurrentlySetToTimeoutAtInPR(0)
> +                               , mAnyObserverIdle(false)
> +                               , mDeltaToNextIdleSwitchInS(PR_UINT32_MAX)
> +                               , mLastUserInteractionInPR(0)

Put "," at the end of each line instead of before, align parameters as you did.

@@ +255,5 @@
>  }
>  
>  nsIdleService::~nsIdleService()
>  {
> +  if(mTimer) mTimer->Cancel();

if(mTimer) {
    mTimer->Cancel();
  }

When it's all on one line it is hard to debug.

@@ +264,5 @@
>  {
>    NS_ENSURE_ARG_POINTER(aObserver);
> +  // We don't accept idle time at 0, and we can't handle idle time that are too
> +  // high either - no more than ~136 years
> +  NS_ENSURE_ARG_RANGE(aIdleTimeInS, 1, (PR_UINT32_MAX/10)-1);

nit: Spaces should go between each side of a binary operator. 

  NS_ENSURE_ARG_RANGE(aIdleTimeInS, 1, PR_UINT32_MAX / 10 - 1);

You can optionally keep the parentheses but they aren't needed in this case.

@@ +281,5 @@
>      NS_ENSURE_SUCCESS(rv, rv);
>    }
>  
> +  // Check if the newly added observer has a smaller wait time than what we
> +  // are wating for now

nit: end sentence comments with a "." thoughout

@@ +284,5 @@
> +  // Check if the newly added observer has a smaller wait time than what we
> +  // are wating for now
> +  if (mDeltaToNextIdleSwitchInS > aIdleTimeInS) {
> +    // If it is, then this is the next to move to idle (at this point we
> +    // don't care if it shuld have switched allready

nit: typos
"...it should have switched already"

@@ +295,5 @@
>    return NS_OK;
>  }
>  
>  NS_IMETHODIMP
> +nsIdleService::RemoveIdleObserver(nsIObserver* aObserver, PRUint32 aTimeInS)

I like that you renamed the variables to specify if in seconds or ms by the way :)

@@ +323,2 @@
>  
> +  // If noone is idle, then we are done, any existing timers can keep running...

nit: "noone" -> "no one"

@@ +357,5 @@
> +  if (!numberOfPendingNotifications) {
> +    return;
> +  }
> +
> +  // Now send "back" events to all, if any should have timed out allready, then

nit: typo, already

@@ +358,5 @@
> +    return;
> +  }
> +
> +  // Now send "back" events to all, if any should have timed out allready, then
> +  // they will be re-waken by the timer that is allready running

nit: reawaken, already

@@ +399,5 @@
>      *idleTime = polledIdleTimeMS;
>      return NS_OK;
>    }
>  
> +  // timeSinceReset is in milli seconds.

nit: milliseconds

@@ +401,5 @@
>    }
>  
> +  // timeSinceReset is in milli seconds.
> +  PRUint32 timeSinceResetInMS = (PR_Now() - mLastUserInteractionInPR) /
> +                                                               PR_USEC_PER_MSEC;

Align PR_USEC_PER_MSEC to "(", so one space after the = on the next line.

@@ +455,5 @@
> +  // we do the caluculation in ms to lessen the chance for rounding errors to
> +  // trigger wrong results, it is also very important that we call PR_Now AFTER
> +  // the call to GetIdleTime()
> +  if (((PR_Now() - mLastUserInteractionInPR) / PR_USEC_PER_MSEC) >
> +                                                            currentIdleTimeInMS)

align second line to the right of "if(" on the next line

@@ +461,5 @@
> +    // We had user activity, so handle that part first (to ensure the listeners
> +    // don't risk getting an non-idle after they get a new idle indication
> +    ResetIdleTimeOut(currentIdleTimeInMS);
> +
> +    // NOTE: We can't bail here, as we might have something allready timed out

nit:already

@@ +470,4 @@
>  
> +  // Now check if anyone should go to idle mode
> +  if (mDeltaToNextIdleSwitchInS <= currentIdleTimeInS)
> +  {

Brace on the previous line.

@@ +534,3 @@
>  {
> +  // Bail if we don't have a timer service
> +  if (!mTimer) return;

if (!mTimer) {
  return;
}

@@ +552,4 @@
>      }
> +
> +    // Add a little (10 ms to ensure we don't undershoot, and never get a "0"
> +    // timer

Remove this from the comment:
"a little ("
Move timer to the previous line as long as it is under 80 chars.

@@ +554,5 @@
> +    // Add a little (10 ms to ensure we don't undershoot, and never get a "0"
> +    // timer
> +    mCurrentlySetToTimeoutAtInPR += 10 * PR_USEC_PER_MSEC;
> +
> +    // Start the timer, magic number +10 is to ensure that we always overshoot

nit: + 10

@@ +560,5 @@
> +    // rounding
> +    mTimer->InitWithFuncCallback(StaticIdleTimerCallback,
> +            this,
> +            (mCurrentlySetToTimeoutAtInPR - currentTime) / PR_USEC_PER_MSEC,
> +            nsITimer::TYPE_ONE_SHOT);

Parameters here should be aligned to the opening parentheses.
This line would be split on more than 1 line:
(mCurrentlySetToTimeoutAtInPR - 
 currentTime) / PR_USEC_PER_MSEC,

@@ +583,5 @@
> +  // ticking while we are processing
> +  PRTime curTimeInPR = PR_Now();
> +
> +  PRTime nextTimeoutAtInPR = mLastUserInteractionInPR +
> +                          (((PRTime)mDeltaToNextIdleSwitchInS) * PR_USEC_PER_SEC);

Move ((PRTime)mDeltaToNextIdleSwitchInS) on the previous line.
Can optionally get rid of the surrounding (...) since * has higher precedence.
Comment 12 Mike Kristoffersen (:MikeK) 2012-02-02 13:45:47 PST
(In reply to Brian R. Bondy [:bbondy] from comment #11)

Thank you very much for the feedback, I have updated the code, and will post the updated patch after I see it still compiles :)  

One question remains thou:

@@ +583,5 @@
> > +  // ticking while we are processing
> > +  PRTime curTimeInPR = PR_Now();
> > +
> > +  PRTime nextTimeoutAtInPR = mLastUserInteractionInPR +
> > +                          (((PRTime)mDeltaToNextIdleSwitchInS) * PR_USEC_PER_SEC);
> 
> Move ((PRTime)mDeltaToNextIdleSwitchInS) on the previous line.
> Can optionally get rid of the surrounding (...) since * has higher
> precedence.

There is not enough space on the previous line to have the (((PRTime)nDelta.... so I have
formatted it as:

  PRTime nextTimeoutAtInPR = mLastUserInteractionInPR +
                             (((PRTime)mDeltaToNextIdleSwitchInS) *
                              PR_USEC_PER_SEC);

I know the * binds harder than +, but think it makes it clearer with the ()...  and it wouldn't fit on one line, even I got rid of the ()...
Comment 13 Mike Kristoffersen (:MikeK) 2012-02-02 14:09:41 PST
Created attachment 593970 [details] [diff] [review]
Updated with codingstandard comments

Try server result from before formatting changes:
https://tbpl.mozilla.org/?tree=Try&rev=6829368114b8
Try server (build only) from this patch:
https://tbpl.mozilla.org/?tree=Try&rev=d59ff4ff5e4b

Didn't do a full try run on the updated patch, as it only contains non-logic changes
Comment 14 Brian R. Bondy [:bbondy] 2012-02-02 15:33:04 PST
> so I have
> formatted it as:
> ...

Yup that looks correct to me.

> I know the * binds harder than +, but think it makes it clearer with the ()

No problem, I do it for clarity sometimes too.
Comment 15 Brian R. Bondy [:bbondy] 2012-02-07 15:24:01 PST
I'll be working on this review tomorrow by the way.
Comment 16 Mike Kristoffersen (:MikeK) 2012-02-07 23:18:47 PST
Thank you :)
Comment 17 Brian R. Bondy [:bbondy] 2012-02-08 19:56:07 PST
Comment on attachment 593970 [details] [diff] [review]
Updated with codingstandard comments

Review of attachment 593970 [details] [diff] [review]:
-----------------------------------------------------------------

It's looking great, thanks for the patch.
Please see below and then re-request review with a new patch, I'd like to do another pass on it before giving r+.

Would it be possible to add some test cases for this?

Did this pass all tests on try on all platforms by the way?

I haven't tried using it yet but the logic looks good. How are you testing it for correctness by the way?
Should I make some code to try it myself or do you have a better suggestion?

Are there any comments that should be updated in widget/nsIIdleService.idl ?

::: widget/xpwidgets/nsIdleService.cpp
@@ +228,5 @@
> + * of active listeners, we don't stop the timer, we just let it expire.
> + *
> + * When user interaction is detected, either because it was directly detected or
> + * because we polled the system timer and found it to be unexpected low, then we
> + * check the flag that tells us if any service is in idle mode, if there are

Nit: If there are any listeners not service right?

@@ +525,5 @@
> +                                                        timeStr.get());
> +    }
> +  } else {
> +    // If we didn't expect anyone to be idle, then just re-start the timer.
> +    ReconfigureTimer();

nit: Swap this condition at the start of the if since it's smaller and easier to see what it belongs to.

@@ +555,4 @@
>      }
> +
> +    // Add 10 ms to ensure we don't undershoot, and never get a "0" timer.
> +    mCurrentlySetToTimeoutAtInPR += 10 * PR_USEC_PER_MSEC;

How about:

if (currentTime >= mCurrentlySetToTimeoutAtInPR) {
    mCurrentlySetToTimeoutAtInPR = currentTime + 10 * PR_SEC_PER_MSEC;
}

And remove these 2 lines?
		 
- // Add 10 ms to ensure we don't undershoot, and never get a "0" timer.
- mCurrentlySetToTimeoutAtInPR += 10 * PR_USEC_PER_MSEC;

::: widget/xpwidgets/nsIdleService.h
@@ +132,5 @@
> +   * Function that resets the idle time in the service, in other words it
> +   * sets the time for the last user interaction to now, or now-idleDelta
> +   *
> +   * @param idleDelta the time (in milliseconds) since the last user inter
> +   *        action

The word "action" should be aligned with the word "the" from the previous line.

@@ +168,3 @@
>  private:
>    /**
> +   * Ensure that the timer is expirering at least at the given time

nit: expiring

@@ +176,2 @@
>     */
> +  void SetTimerExpireryToAtOrBefore(PRTime aNextTimeoutInPR);

nit: Please typo in function name (Expiry).
How about SetTimerExpiryIfBefore since there is no action when it is "at"
Comment 18 Mike Kristoffersen (:MikeK) 2012-02-09 14:15:50 PST
(In reply to Brian R. Bondy [:bbondy] from comment #17)
> Comment on attachment 593970 [details] [diff] [review]
> Updated with codingstandard comments
> 
> Review of attachment 593970 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It's looking great, thanks for the patch.
> Please see below and then re-request review with a new patch, I'd like to do
> another pass on it before giving r+.

Sure, thank you for the review, I'll post an updated patch in a few days.

> 
> Would it be possible to add some test cases for this?

Currently we have test_bug343416.xul that is testing the idle service, it is randomly failing as described in Bug#517482, the reason for the random failure I believe is described in bug#720490 which this patch should also fix.  Do you have ideas for other aspects of the service that you would like to see a test for?

It should be noted that the idle service is currently not started until there is use input or the idle service is explicitly requested - this can in theory prevent the daily idle timer from starting - but that behavior is not changed by this patch, I think we should consider if it an issue that should have its own bug.

> 
> Did this pass all tests on try on all platforms by the way?

I ran a full test suite for all platforms on the try server, the results are here: https://tbpl.mozilla.org/?tree=Try&rev=6829368114b8

I don't believe any of the failures are related to the idle service/this patch.

I'll re-run after I have updated the patch.

> 
> I haven't tried using it yet but the logic looks good. How are you testing
> it for correctness by the way?
> Should I make some code to try it myself or do you have a better suggestion?

What I did was to add some test code locally with debug output, and then time the behavior manually -unfortunately I can't document this, but if the existing test doesn't test the service in a way that makes us comfortable, then I think we should add automatic test(s) to raise our level of confidence - so if you have ideas for new tests that we could create, please let me know.

> 
> Are there any comments that should be updated in widget/nsIIdleService.idl ?

I'll double check, but as the service didn't change its interface/behavior I don't expect so.
Comment 19 Brian R. Bondy [:bbondy] 2012-02-10 06:52:25 PST
Thanks for the info: 
test_bug343416.xul looks good as we don't have many entry points into the interface to cover.

> What I did was to add some test code locally with debug output,

It would be helpful to me if you could provide that in a different patch that I can apply as well.  But if you don't have this handy anymore that's ok.

> Sure, thank you for the review, I'll post an updated patch in a few days.

Sounds good, thanks!
Comment 20 Mike Kristoffersen (:MikeK) 2012-02-12 13:20:58 PST
(In reply to Brian R. Bondy [:bbondy] from comment #17)
> > +
> > +    // Add 10 ms to ensure we don't undershoot, and never get a "0" timer.
> > +    mCurrentlySetToTimeoutAtInPR += 10 * PR_USEC_PER_MSEC;
> 
> How about:
> 
> if (currentTime >= mCurrentlySetToTimeoutAtInPR) {
>     mCurrentlySetToTimeoutAtInPR = currentTime + 10 * PR_SEC_PER_MSEC;
> }
> 
> And remove these 2 lines?
> 		 
> - // Add 10 ms to ensure we don't undershoot, and never get a "0" timer.
> - mCurrentlySetToTimeoutAtInPR += 10 * PR_USEC_PER_MSEC;

Nja... that would not cover the case where the difference between currentTime and mCurrentlySetToTimeoutAtInPR is 1 usec.  The way the patch is now, I try to ensure that a timer will never have a duration below 10 ms.  btw changed currentTime to currentTimeInPR
Comment 21 Mike Kristoffersen (:MikeK) 2012-02-12 13:35:39 PST
Created attachment 596518 [details] [diff] [review]
Updated after review

Just pushed the updated patch to try:
https://tbpl.mozilla.org/?tree=Try&rev=fca03e490a8c
Lets see how it behaves there :)
Comment 22 Mike Kristoffersen (:MikeK) 2012-02-13 13:35:28 PST
(In reply to Mike Kristoffersen (:MikeK) from comment #21)
> Created attachment 596518 [details] [diff] [review]
> Updated after review
> 
> Just pushed the updated patch to try:
> https://tbpl.mozilla.org/?tree=Try&rev=fca03e490a8c
> Lets see how it behaves there :)

Hmm... While the issue this bug is about seem to have been fixed by the patch, there is still the random failure in test_bug343416.xul, I would have expected that to be fixed as a side effect but apparently not, will dig a bit more into that...
Comment 23 Brian R. Bondy [:bbondy] 2012-02-13 18:18:56 PST
Sounds good
Comment 24 Mike Kristoffersen (:MikeK) 2012-02-15 15:47:57 PST
This seems a bit hard to reproduce when running the .xul test locally (there are many reasons why this could happen), wondering if there is an easy way to run this specific test on the try server, instead of the whole group it is inside?
Comment 25 Brian R. Bondy [:bbondy] 2012-02-16 05:38:24 PST
You can run a specific test on your local machine but I don't know of a way to do it on the try server.  You can re-run an entire group on the try server only without re-doing a build/push by pressing the plus sign after clicking on the test group.
Comment 26 Mike Kristoffersen (:MikeK) 2012-02-20 12:27:37 PST
I have had a script running the test_bug343416.xul test 1000 times over the weekend, it passed every time on my 64-bit Ubuntu machine.  The random error I saw once on try was on a windows build - there is a known random error currently described in bug 517482 which corresponds to this.  

Unfortunately currently I don't have a system set up to build on windows....  Unless someone has a system available, where we can test if the number of random failures increases with this patch, then I'll try to figure out how to setup a windows machine for building/testing locally...  Or I guess I could grab the build from the try-server, unpack it and just run the test on a local windows machine without worrying about setting up the build environment???
Comment 27 Justin Lebar (not reading bugmail) 2012-02-21 02:26:03 PST
> Or I guess I could grab the build from the try-server, unpack it and just run the test on 
> a local windows machine without worrying about setting up the build environment?

Yes, that works.  But then, how is this going to help?  You already saw the error on try on Windows.  If you observe the random failure on your local machine and you don't have a build environment set up, you won't be able to debug it...
Comment 28 Mike Kristoffersen (:MikeK) 2012-02-21 03:34:16 PST
(In reply to Justin Lebar [:jlebar] from comment #27)
> > Or I guess I could grab the build from the try-server, unpack it and just run the test on 
> > a local windows machine without worrying about setting up the build environment?
> 
> Yes, that works.  But then, how is this going to help?  You already saw the
> error on try on Windows.  If you observe the random failure on your local
> machine and you don't have a build environment set up, you won't be able to
> debug it...

What I want to find out, is if the frequency of the (known) random failure changes, if it doesn't change for the worse then I think we should consider submitting this patch as it solves a specific problem in the current IdleService.
Comment 29 Justin Lebar (not reading bugmail) 2012-02-21 03:36:53 PST
In that case, I'd recommend testing by kicking off many builds on tryserver.  (In TBPL, click the relevant test and then click the blue plus sign near the bottom of the screen.)  In my experience, the frequency of randomoranges in local testing bears little correlation with the frequency of randomoranges on tryserver.
Comment 30 Mike Kristoffersen (:MikeK) 2012-02-21 03:59:40 PST
(In reply to Justin Lebar [:jlebar] from comment #29)
> In that case, I'd recommend testing by kicking off many builds on tryserver.
> (In TBPL, click the relevant test and then click the blue plus sign near the
> bottom of the screen.)  In my experience, the frequency of randomoranges in
> local testing bears little correlation with the frequency of randomoranges
> on tryserver.

Thanks, I'll try that tonight when I finished my day job :)
Comment 31 Mike Kristoffersen (:MikeK) 2012-02-26 15:20:16 PST
Ok, I have run the test 29 times, shared between different windows builds, none of them show the random error - so the question is if we will accept that as sufficient proof that this patch doesn't make the random error more likely - It should be mentioned again that I saw the error once in my initial test of the patch.
Comment 32 Mike Kristoffersen (:MikeK) 2012-02-26 15:20:56 PST
Oh, forgot to link to the results of the test run: https://tbpl.mozilla.org/?tree=Try&rev=a70da34a2115
Comment 33 Justin Lebar (not reading bugmail) 2012-02-27 05:28:39 PST
(In reply to Mike Kristoffersen (:MikeK) from comment #31)
> Ok, I have run the test 29 times, shared between different windows builds,
> none of them show the random error - so the question is if we will accept
> that as sufficient proof that this patch doesn't make the random error more
> likely

Sure, that's sufficient for me.  I mean, someone needs to *fix* the random error; I'm not going to get hung up over probabilities that we can't measure well anyway.
Comment 34 Justin Lebar (not reading bugmail) 2012-02-27 07:57:11 PST
This is a critical bug to fix because lots of things are triggered off idle, particularly telemetry, gc, and database cleanups.
Comment 35 Mike Kristoffersen (:MikeK) 2012-02-27 12:41:29 PST
Comment on attachment 596518 [details] [diff] [review]
Updated after review

Let's deal with the random error elsewhere :)
Comment 36 Brian R. Bondy [:bbondy] 2012-02-27 15:35:51 PST
Comment on attachment 596518 [details] [diff] [review]
Updated after review

Review of attachment 596518 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks!
Comment 37 Justin Lebar (not reading bugmail) 2012-02-27 16:24:19 PST
Mike, in which bug are you going to move the idle service away from PR_Now and start using PR_IntervalNow?

This is critical for devices where the system timer is systematically reset e.g. via NTP.  In practice, most (all?) modern desktop and phone operating systems do this.
Comment 38 Justin Lebar (not reading bugmail) 2012-02-27 16:43:50 PST
So we poll every 5s on some operating systems?  Which ones?

This seems like it would waste a fair bit of battery life.

Detecting user-active via polling, moreover, is pretty crummy, because we want to know *right* when the user is back, not 5s afterwards.  Why can't we hook into the native event loop to detect user-back?
Comment 39 Mike Kristoffersen (:MikeK) 2012-02-27 22:32:47 PST
(In reply to Justin Lebar [:jlebar] from comment #37)
> Mike, in which bug are you going to move the idle service away from PR_Now
> and start using PR_IntervalNow?
> 
> This is critical for devices where the system timer is systematically reset
> e.g. via NTP.  In practice, most (all?) modern desktop and phone operating
> systems do this.

That would be in Bug 555313
Comment 40 Mike Kristoffersen (:MikeK) 2012-02-27 22:43:21 PST
(In reply to Justin Lebar [:jlebar] from comment #38)
> So we poll every 5s on some operating systems?  Which ones?

I'll get back to you on this one, need to check the code, I don't have it checked out on this machine.

> 
> This seems like it would waste a fair bit of battery life.

As most of the devices we are on will wake up regularly to listen to some kind of radio anyway I don't think it is that much of an issue - that is assuming we don't use exact timers...  but anyway one time each 5s doesn't seem too bad, we are not doing a lot of calculations each time.

> 
> Detecting user-active via polling, moreover, is pretty crummy, because we
> want to know *right* when the user is back, not 5s afterwards.  Why can't we
> hook into the native event loop to detect user-back?

But we are hooked into the native event loop on all platforms, so we will leave idle mode as soon as mouse/finger/key input is detected in the application.
Comment 41 Mike Kristoffersen (:MikeK) 2012-02-28 12:30:56 PST
Created attachment 601369 [details] [diff] [review]
Added needed info (user, proper description) to the patch
Comment 42 Mike Kristoffersen (:MikeK) 2012-02-28 12:39:08 PST
(In reply to Mike Kristoffersen (:MikeK) from comment #41)
> Created attachment 601369 [details] [diff] [review]
> Added needed info (user, proper description) to the patch

And that was the wrong file.... one sec...
Comment 43 Mike Kristoffersen (:MikeK) 2012-02-28 13:27:55 PST
(In reply to Justin Lebar [:jlebar] from comment #38)
> So we poll every 5s on some operating systems?  Which ones?

We do it on GTK builds, OS/2, Windows, X (mac)

Especially we don't do it on Android.

There is something that looks strange on QT, it looks like it was intended to use polling on X except on maemo, but I think it falls back to only look at user input in the app - but as it signals that it supports poll mode it has the poll timers running anyway.

It should be noted that polling is the more accurate way of doing it as it asks the OS for the idle time, where we in non-pooling mode only consider the app it self.

None of the above behavior has changed with this patch.
Comment 44 Justin Lebar (not reading bugmail) 2012-02-28 13:36:59 PST
> That would be in Bug 555313

Thanks.

> But we are hooked into the native event loop on all platforms, so we will leave idle mode as soon as 
> mouse/finger/key input is detected in the application.

Personally, I'd rather never poll and wake up only when the native event loop says we're back.  That way, we don't have to worry about battery use, etc.  I don't think any of the consumers of the idle service care about global-idle as much as they care about Firefox-idle.

But I'll file a separate bug.  Thanks for your hard work!
Comment 45 Mike Kristoffersen (:MikeK) 2012-02-28 13:43:48 PST
(In reply to Mike Kristoffersen (:MikeK) from comment #39)
> (In reply to Justin Lebar [:jlebar] from comment #37)
> > Mike, in which bug are you going to move the idle service away from PR_Now
> > and start using PR_IntervalNow?
> > 
> > This is critical for devices where the system timer is systematically reset
> > e.g. via NTP.  In practice, most (all?) modern desktop and phone operating
> > systems do this.
> 
> That would be in Bug 555313

It is correct that we will detect idle time wrongly if the clock changes under us - but I think it should be put a bit in perspective - most changes of the clock will be very small, and you can only request to get informed of idle timeouts with a resolution of 1 second.

The biggest issue is when we change to-from daylight savings, at this point the idle time could get wrong by the same amount the clock is adjusted with - but the impact of this is limited.  (as soon as any mouse or key input reaches the application, the internal mLastUserInteractionInPR variable will be reset to the current time)

We should still update the the service to stop using PR_Now, but I believe it isn't a change most users would notice in the behavior of the application.
Comment 46 Mike Kristoffersen (:MikeK) 2012-02-28 13:47:15 PST
Created attachment 601396 [details] [diff] [review]
Now it should be updated with only the user and description compared to the R+'ed version :)
Comment 47 Justin Lebar (not reading bugmail) 2012-02-28 13:52:43 PST
> We should still update the the service to stop using PR_Now, but I believe it isn't a change most 
> users would notice in the behavior of the application.

Agreed.  But two things:

 * It's really important that telemetry not miss pings due to the clock being adjusted, because that will throw off our data.

 * When we expose the idle service to the web (bug 715041), pages will notice these bugs.
Comment 48 Mike Kristoffersen (:MikeK) 2012-02-28 14:02:51 PST
(In reply to Justin Lebar [:jlebar] from comment #47)
> > We should still update the the service to stop using PR_Now, but I believe it isn't a change most 
> > users would notice in the behavior of the application.
> 
> Agreed.  But two things:
> 
>  * It's really important that telemetry not miss pings due to the clock
> being adjusted, because that will throw off our data.
> 
>  * When we expose the idle service to the web (bug 715041), pages will
> notice these bugs.

I'll grab that as the next thing I work on.  Can we jump to Bug 555313 to continue the discussion on this specific topic?
Comment 49 Ryan VanderMeulen [:RyanVM] 2012-02-28 15:48:45 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/712ea53253f5

Also, the title of your patch should be a description of what it is actually doing rather than just using the bug title. See the page below:
https://developer.mozilla.org/en/Mercurial_FAQ#How_can_I_generate_a_patch_for_somebody_else_to_check-in_for_me.3F
Comment 50 Matt Brubeck (:mbrubeck) 2012-02-29 11:21:14 PST
https://hg.mozilla.org/mozilla-central/rev/712ea53253f5
Comment 51 Marco Bonardo [::mak] 2012-03-02 03:51:26 PST
Looks like this fixed idle service to a point that now it can fire idle-daily (and I suppose other idle events) even after xpcom-shutdown. I don't think there was a specific protection before, but likely was so broken to not fire it.
The problem is that idle-daily is a category, so it can try to init other components too late.
Comment 52 Marco Bonardo [::mak] 2012-03-02 03:54:22 PST
filed bug 732368 on this.

Note You need to log in before you can comment on or make changes to this bug.