Closed Bug 1355746 Opened 8 years ago Closed 7 years ago

Parser should use idle dispatch and not a timer for background tabs

Categories

(Core :: DOM: HTML Parser, defect, P1)

50 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla56
Performance Impact high
Tracking Status
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: smaug, Assigned: hchang)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

http://searchfox.org/mozilla-central/rev/2fc8c8d483d9ec9fd0ec319c6c53807f7fa8e8a2/parser/html/nsHtml5TreeOpExecutor.cpp#264,280-288

Currently nsIThread has IdleDispatch, but no way to pass timeout value.
Timeout would be needed here. Or other option is to use timeout and idle dispatch in parser, and if the idle dispatch runnable is called before timeout, the timeout would be cancelled.
Priority: -- → P1
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p1]
Blocks: 1356682
Blocks: 1357114
Blocks: 1357146
Blocks: 1357154
Any slow down and jank in first use must be prioritized because negative users first experience may cause them to leave.
Depends on: 1358476
No longer blocks: 1356682
No longer blocks: 1357114
No longer blocks: 1357146
No longer blocks: 1357154
Hi :smaug,

I'd like to help solve this bug but want to be more clear first:

Once we do "idleDispatch with timeout", what it benefits is the background tab 
but "NOT the foreground tab" because the current implementation is equivalent
to giving the lowest priority (even lower than "idle") to the background tab
in the next 50 ms.

Also, if my above interpretation is correct, is it possible to retard
the foreground tab? For example, let's say the thread will become
idle at 45ms and there will be a foreground tab task landing at 49ms.
If the background tab task cannot be done within 5 ms, the new arrived
foreground task will be scheduled a little bit later as compared to
the current design. I know this is a very edge case but I just want to
point out that this is not impossible.

Thanks :)
Flags: needinfo?(bugs)
Assignee: nobody → hchang
We don't want to change foreground tab handling in this bug in any way.
In general we want foreground tabs to load as fast as possible, so the current mechanism should work ok for them. It is that we don't want loading background tabs to disturb interaction with foreground tab.

Also, in many cases using idle request may actually speed up background loading, since if there isn't really anything else happening, currently we run the timeout every 50ms, but we could just handle idle requests all the time.

Idle requests are (or will be very soon bug 1311425) timer aware, so they know if there is a timer coming soon, so this will also help with that issue. Currently the 50ms just runs at random times, whether or not there is something more important happening.
Currently if the timer runs, handling foreground tab is blocked until the timer callback has been executed, so this shouldn't make it any worse, and usually better.

I'd prefer if we kept the changes in this bug minimal. No major changes to ContentSink or anything.
Just replace the timer usage with idle dispatch. If we want to improve the handling, like take request's deadline into account, that should be done in a separate bug.
In other words, I'd prefer if we used this bug as an example of how to replace simple timer usage with idle dispatch.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #3)
> We don't want to change foreground tab handling in this bug in any way.
> In general we want foreground tabs to load as fast as possible, so the
> current mechanism should work ok for them. It is that we don't want loading
> background tabs to disturb interaction with foreground tab.
> 
> Also, in many cases using idle request may actually speed up background
> loading, since if there isn't really anything else happening, currently we
> run the timeout every 50ms, but we could just handle idle requests all the
> time.
> 
> Idle requests are (or will be very soon bug 1311425) timer aware, so they
> know if there is a timer coming soon, so this will also help with that
> issue. Currently the 50ms just runs at random times, whether or not there is
> something more important happening.
> Currently if the timer runs, handling foreground tab is blocked until the
> timer callback has been executed, so this shouldn't make it any worse, and
> usually better.
> 
> I'd prefer if we kept the changes in this bug minimal. No major changes to
> ContentSink or anything.
> Just replace the timer usage with idle dispatch. If we want to improve the
> handling, like take request's deadline into account, that should be done in
> a separate bug.
> In other words, I'd prefer if we used this bug as an example of how to
> replace simple timer usage with idle dispatch.

Thanks Olli! It' very clear to me right now. So I will have to wait for 
bug 1311425 if I am understanding correctly and simply replace the timer 
with that "timeout-aware idle request".

Thanks :)
We probably want to process background tab parsing only when SetDeadline give a deadline far enough in the future, say 3ms or so at least, or when SetDeadline hasn't been called at all (meaning the timer has run)

(3ms happens to be the default value for content.sink.interactive_parse_time)
Bug 1311425 is done at least from the API point of view so it's probably ok for you to start on this now, Henry :)
Flags: needinfo?(hchang)
(In reply to Andrew Overholt [:overholt] from comment #6)
> Bug 1311425 is done at least from the API point of view so it's probably ok
> for you to start on this now, Henry :)

Thanks for the notice. I will start working on this one. Thanks!
Flags: needinfo?(hchang)
Hi Olli,

I tried to translate your comment 5 to code and put it in the patch.

The patch still includes a tons of annoying trailing white spaces
and hasn't been tested yes. Do you mind having it a quick look and 
give me some early feedback to ensure I am on the right track?

Thanks!
Flags: needinfo?(bugs)
Comment on attachment 8875790 [details]
Bug 1355746 - Part 1. Rename CollectRunner to IdleTaskRunner and move to xpcom/threads/IdleTaskRunner.h/cpp

https://reviewboard.mozilla.org/r/147200/#review151388

::: parser/html/nsHtml5TreeOpExecutor.cpp:271
(Diff revision 1)
> +  void Run()
> +  {
> +    if (mDeadline.isSome()) {
> +      TimeDuration period = mDeadline.value() - TimeStamp::Now();
> +      if (period.ToMilliseconds() < 3) {
> +        return;

So what happens when we return here. Does anything reschedule the flusher?
We may need similar-ish setup what 
CollectorRunner has
https://hg.mozilla.org/mozilla-central/file/1c66da2b1e88/dom/base/nsJSEnvironment.cpp#l322

::: parser/html/nsHtml5TreeOpExecutor.cpp:287
(Diff revision 1)
> -    NS_RELEASE(gFlushTimer);
> -  }
> +    }
> -}
> +  }
>  
> +private:
> +  Maybe<TimeStamp> mDeadline;

Why Maybe? If deadline isn't set, mDeadline would return true from IsNull().
Hi Olli,

I submitted a new patch which I think already in a good shape.
It has two parts (maybe I will split it later):

1) class IdleScheduler, which internally works with NS_IdleDispatchToCurrentThread
   and NewIdleRunnableMethodWithTimer to schedule and re-schedule the given
   "scheduler callback" in idle cycles with bounded schedule time. The "budget"
   (as you said in comment 5) constraint is also taken into account.

2) Replacing nsHtml5TreeOpExecutor.cpp::gFlushTimer with gBackgroundFlushScheduler,
   where gBackgroundFlushScheduler is the specialized IdleScheduler. Since 
   IdleScheduler is designed to behave pretty similar to a repeating timer,
   it should be easy to understand and review from the the "use of the new API"
   perspective.

I haven't had a test case for IdleScheduler and I do think it is mandatory.
However, what I'd to know the most is how I can make sure I do or do not
regress the background/foreground parsing. I don't have a good idea to 
systematically test it except for comparing the background page loading time
with the old ones.

According to your comment 3, 

"I'd prefer if we used this bug as an example of how to replace simple timer usage with idle dispatch."

do you think the IdleScheduler (maybe it deserves a better name) is good
(in terms of generality and usability) enough to be the example?

Thanks!
Comment on attachment 8875790 [details]
Bug 1355746 - Part 1. Rename CollectRunner to IdleTaskRunner and move to xpcom/threads/IdleTaskRunner.h/cpp

https://reviewboard.mozilla.org/r/147200/#review151766

::: xpcom/threads/IdleScheduler.h:20
(Diff revision 3)
> +// (when current thread is in the idle cycle) no later than the given 
> +// timeout. Also, a "minimum budget" can be specified so that the task 
> +// will not be executed and will be re-scheduled if we are not given 
> +// enough time to finish the task.
> +template<uint32_t TimeoutMs, uint32_t MinBudgetMs = 0>
> +class IdleScheduler final

I'd prefer if this extended IdleRunnable to avoid extra objects.
No need to use NewIdleRunnableMethodWithTimer then but just NS_IdleDispatchToCurrentThread(runnable, delay) and re-dispatch the same object.

::: xpcom/threads/IdleScheduler.h:48
(Diff revision 3)
> +    if (mCancelled) {
> +      NS_WARNING("IdleScheduler has been cancelled.");
> +      return;
> +    }
> +
> +    RefPtr<MyIdleSchedulerType> unused;

What is this variable?

::: xpcom/threads/IdleScheduler.h:54
(Diff revision 3)
> +    if (!mDeadline.IsNull()) {
> +      // Check if we have sufficient time to execute task.
> +      TimeDuration budget = mDeadline - TimeStamp::Now();
> +      if (budget.ToMilliseconds() < (double)MinBudgetMs) {
> +        NS_WARNING("Budget not enough. Schedule again.");
> +        nsresult rv = MyIdleSchedulerType::ScheduleInternal(this);

So this is problematic. If there is just short idle period, say 2ms, we end up re-scheduling all the time until all that 2ms is used. That is the reason why CollectorRunner has a timer for scheduling.
Rewrote IdleScheduler a few bits in reference to CollectorRunner:

1) Let IdleScheduler extend IdleRunnable and maintain its own timers:
   one for nsIIdleRunnable::SetTimer and one for "delayed reschedule".
2) Remove template.
3) Rename IdleScheduler::Schedule to IdleScheduler::Create as a factory method
4) Because of (3), we can rename ScheduleInternal() to Schedule() since
   it has been the only one.
5) Using an additional timer "mDelayedScheduleTimer" for "delayed schedule"
   when the given idle cycle is too short.
6) Align the semantics of "budget" to what's used in CollectorRunner, where
   "budget" is the minimum time slot requirement of the given idle cycle.

Once the new IdleScheduler is in place, maybe we can replace CollectorRunner
with it. The differences and lacks of IdleScheduler are

1) CollectorRunner seems to use the callback return value and mRepeating
   to decide if we should repeat. IMHO, using the callback return value
   is simpler and good enough.
2) CollectorRunner supports "disallow idle dispatch".
3) CollectorRunner seems dependent on nsRefreshDriver. Its insight hasn't
   been clear to me so have no idea how to generalize such a case.
Hi Olli,

I polished my IdleScheduler and compared with CollectorRunner
as comment 16 and the latest patch is on review board now. I will
be writing test cases in the next couple days. 

What I might need your feedback for now is, do you think if I should
just "copy" the implementation CollectorRunner? The more I polish
my IdleScheduler, the more I feel it's exactly what IdleScheduler
is supposed to be. However, CollectorRunner is also quite new
and hasn't been verified very well. So, what would you like me to
move forward at this moment?

Thanks :)
"CollectorRunner seems to use the callback return value and mRepeating
 to decide if we should repeat. IMHO, using the callback return value
 is simpler and good enough."

CollectorRunner uses return value to indicate if callback did some work. It has nothing to do with repeat.
http://searchfox.org/mozilla-central/rev/d840ebd5858a61dbc1622487c1fab74ecf235e03/dom/base/nsJSEnvironment.cpp#232-233

All idle handling depends on refresh driver. CollectorRunner uses nsRefreshDriver::DispatchIdleRunnableAfterTick so that it doesn't need to create extra 
mScheduleTimer all the time, and RefreshDriver tick is anyhow better time to schedule another idle dispatch.


It might make sense to move CollectorRunner to some more generic place, rename it and just use it for this bug too. Better to have just one idle runner, if possible.

I may make some tweaks to CollectorRunner, but doesn't matter if the code has moved elsewhere.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #18)
> "CollectorRunner seems to use the callback return value and mRepeating
>  to decide if we should repeat. IMHO, using the callback return value
>  is simpler and good enough."
> 
> CollectorRunner uses return value to indicate if callback did some work. It
> has nothing to do with repeat.
> http://searchfox.org/mozilla-central/rev/
> d840ebd5858a61dbc1622487c1fab74ecf235e03/dom/base/nsJSEnvironment.cpp#232-233
> 

Oh my understanding is we can still repeat the runner by returning
'false' without mRpeating==true. (the difference is it will not
schedule immediately.)

> All idle handling depends on refresh driver. CollectorRunner uses
> nsRefreshDriver::DispatchIdleRunnableAfterTick so that it doesn't need to
> create extra 
> mScheduleTimer all the time, and RefreshDriver tick is anyhow better time to
> schedule another idle dispatch.
> 
> 
> It might make sense to move CollectorRunner to some more generic place,
> rename it and just use it for this bug too. Better to have just one idle
> runner, if possible.
> 
> I may make some tweaks to CollectorRunner, but doesn't matter if the code
> has moved elsewhere.

I decided to drop my own idle runner and work on CollectorRunner directly.
In order to make it easier to review, I created two patches

1) 'Locally' modify and rename CollectorRunner (to IdleTaskRunner) in nsJSEnvironment.cpp 
   so that we can see the diff.

2) Move IdleTaskRunner to xpcom/thread/IdleTaskRunner.h so that this bug
   can reuse it.

Here is the summary of the changes I made to CollectorRunner:

1) Rename CollectorRunner to IdleTaskRunner.
2) Change the callback type from plain c function pointer to std::function()
   so that we can bind data into it by lambda when needed.
3) Because of (2), we can remove the closure we pass via the constructor.
4) Because of (3), we have to slightly modify the callbacks (i.e. [1])
5) http://searchfox.org/mozilla-central/rev/61054508641ee76f9c49bcf7303ef3cfb6b410d2/dom/base/nsJSEnvironment.cpp#263

```
    // Deadline is null when called from timer.
    bool deadLineWasNull = mDeadline.IsNull();
    bool didRun = false;
    if (deadLineWasNull || ((TimeStamp::Now() + mBudget) < mDeadline)) {
      CancelTimer();
      didRun = mCallback(mDeadline, mData);
    }

    if (mCallback && (mRepeating || !didRun)) {
      // If we didn't do meaningful work, don't schedule using immediate
      // idle dispatch, since that could lead to a loop until the idle
      // period ends.
      Schedule(didRun);
    }

    return NS_OK;
```

I would think we should always call CancelTimer() because the "side flag"
|mTimerActive| would block SetTimer() from being called by [2]

-------------------------------------------------------------------

So, do you think we should leave the patches in this bug or split them
to two different bugs? What I am doing is the former but I am also glad
to do the later.

Thanks!



[1] http://searchfox.org/mozilla-central/rev/61054508641ee76f9c49bcf7303ef3cfb6b410d2/dom/base/nsJSEnvironment.cpp#1769
[2] http://searchfox.org/mozilla-central/rev/61054508641ee76f9c49bcf7303ef3cfb6b410d2/xpcom/threads/nsThreadUtils.cpp#382
Flags: needinfo?(bugs)
(In reply to Henry Chang [:hchang] from comment #23)
> Oh my understanding is we can still repeat the runner by returning
> 'false' without mRpeating==true. (the difference is it will not
> schedule immediately.)
sure. mRepeating == true means that the runner behaves as repeating timer.
Returning false from callback means nothing was actually done, so a new run is needed.
Perhaps that is a bit confusing but the existing GC/CC handling needed it.


> 5)
> http://searchfox.org/mozilla-central/rev/
> 61054508641ee76f9c49bcf7303ef3cfb6b410d2/dom/base/nsJSEnvironment.cpp#263
> 
> ```
>     // Deadline is null when called from timer.
>     bool deadLineWasNull = mDeadline.IsNull();
>     bool didRun = false;
>     if (deadLineWasNull || ((TimeStamp::Now() + mBudget) < mDeadline)) {
>       CancelTimer();
>       didRun = mCallback(mDeadline, mData);
>     }
> 
>     if (mCallback && (mRepeating || !didRun)) {
>       // If we didn't do meaningful work, don't schedule using immediate
>       // idle dispatch, since that could lead to a loop until the idle
>       // period ends.
>       Schedule(didRun);
>     }
> 
>     return NS_OK;
> ```
> 
> I would think we should always call CancelTimer() because the "side flag"
> |mTimerActive| would block SetTimer() from being called by [2]
no. The whole point is that if we didn't run the callback, we don't want to restart the timer.
In fact, CancelTimer should probably be called only if didRun is true.



> So, do you think we should leave the patches in this bug or split them
> to two different bugs? What I am doing is the former but I am also glad
> to do the later.
up to you, both are fine.
Flags: needinfo?(bugs)
Attachment #8877035 - Flags: review?(bugs)
Attachment #8875790 - Flags: review?(bugs)
(In reply to Olli Pettay [:smaug] from comment #24)
> (In reply to Henry Chang [:hchang] from comment #23)
>
> > So, do you think we should leave the patches in this bug or split them
> > to two different bugs? What I am doing is the former but I am also glad
> > to do the later.
> up to you, both are fine.

So I will be taking the former and have already flagged you for review :)

I *intentionally* leave local IdleTaskRunner in nsJSEnvironment.cpp in
the 2nd patch (it's supposed to use xpcom/thread/IdleTaskRunner.h) 
just because it would be easier to apply when they are under development.

After r+'ed,

I will 1) Remove the local IdleTaskRunner in nsJSEnvironment.cpp and
       2) Maybe move the implementation to a file.

Besides, one change I forgot to mention in comment 23 is I removed
the dependency on |sShuttingDown| [1]. The alternative is using 
a callback |ShouldCancel| to query the external state for "passive"
cancellation (from the caller perspective, because the caller doesn't
explicitly Cancel() the runner.)

[1] http://searchfox.org/mozilla-central/rev/d67ef71097da4d1aa344c9d9c672e49a7228e765/dom/base/nsJSEnvironment.cpp#241
Comment on attachment 8875790 [details]
Bug 1355746 - Part 1. Rename CollectRunner to IdleTaskRunner and move to xpcom/threads/IdleTaskRunner.h/cpp

https://reviewboard.mozilla.org/r/147200/#review153578

::: dom/base/nsJSEnvironment.cpp:365
(Diff revision 7)
>  
>          // We weren't allowed to do idle dispatch immediately, do it after a
>          // short timeout.
> -        mScheduleTimer->InitWithFuncCallback(ScheduleTimedOut, this, 16,
> +        uint32_t lateScheduleDelayMs;
> +        if (mDeadline.IsNull()) {
> +          lateScheduleDelayMs = 16; // XXX: Is it reasonable?

Why wouldn't it be? It is refresh driver tick

::: dom/base/nsJSEnvironment.cpp:1976
(Diff revision 7)
>  void
>  GCTimerFired(nsITimer *aTimer, void *aClosure)
>  {
>    nsJSContext::KillGCTimer();
>    // Now start the actual GC after initial timer has fired.
> -  sInterSliceGCRunner = CollectorRunner::Create(InterSliceGCRunnerFired,
> +  sInterSliceGCRunner = IdleTaskRunner::Create([aClosure](TimeStamp aDeadline) {

Indentation looks weird here, or is it just some MozReview bug.


(and I do so hate C++ lambdas. So easy way to make code unreadable.)

::: dom/base/nsJSEnvironment.cpp:2264
(Diff revision 7)
>      nsCycleCollector_dispatchDeferredDeletion();
>  
>      sCCRunner =
> -      CollectorRunner::Create(CCRunnerFired, NS_CC_SKIPPABLE_DELAY,
> -                              kForgetSkippableSliceDuration, true);
> +      IdleTaskRunner::Create(CCRunnerFired, NS_CC_SKIPPABLE_DELAY,
> +                              kForgetSkippableSliceDuration, true,
> +                              []{ return sShuttingDown; });

Fix indentation here. Params should align under CCRunnerFired

::: dom/base/nsJSEnvironment.cpp:2442
(Diff revision 7)
>  
>        // Schedule another GC slice if the GC has more work to do.
>        nsJSContext::KillInterSliceGCRunner();
>        if (!sShuttingDown && !aDesc.isComplete_) {
>          sInterSliceGCRunner =
> -          CollectorRunner::Create(InterSliceGCRunnerFired, NS_INTERSLICE_GC_DELAY,
> +          IdleTaskRunner::Create([](TimeStamp aDeadline) {

Huh, this is ugly, but that is what C++ lambdas are.
Attachment #8875790 - Flags: review?(bugs) → review+
Comment on attachment 8877035 [details]
Bug 1355746 - Part 2. Polish IdleTaskRunner and reuse it for background parsing.

https://reviewboard.mozilla.org/r/148376/#review153632

::: parser/html/nsHtml5TreeOpExecutor.cpp:64
(Diff revision 3)
>        mExecutor->RunFlushLoop();
>        return NS_OK;
>      }
>  };
>  
> +static RefPtr<IdleTaskRunner> gBackgroundFlushRunner = nullptr;

StaticRefPtr, and doesn't need to be explicitly initialized to nullptr;

::: parser/html/nsHtml5TreeOpExecutor.cpp:293
(Diff revision 3)
>      }
> +    // Now we set up a repetitive idle scheduler for flushing background list.
> +    gBackgroundFlushRunner =
> +      IdleTaskRunner::Create(&BackgroundFlushCallback, 
> +                            50, // The hard deadline: 50ms.
> +                            3,  // Required budget: 3ms.

Want to add comment why 3

::: xpcom/threads/IdleTaskRunner.h:16
(Diff revision 3)
> +#include "nsRefreshDriver.h"
> +
> +namespace mozilla {
> +
> +// Repeating callback runner.
> +class IdleTaskRunner final : public IdleRunnable

Hmm, dumping this class all to .h file doesn't look good. Add also .cpp and put non-trivial method implementations there.

Also, this patch is missing to remove the code from nsJSEnvironment.
Attachment #8877035 - Flags: review?(bugs) → review-
Comment on attachment 8875790 [details]
Bug 1355746 - Part 1. Rename CollectRunner to IdleTaskRunner and move to xpcom/threads/IdleTaskRunner.h/cpp

https://reviewboard.mozilla.org/r/147200/#review153634

::: dom/base/nsJSEnvironment.cpp:368
(Diff revision 7)
> -        mScheduleTimer->InitWithFuncCallback(ScheduleTimedOut, this, 16,
> +        uint32_t lateScheduleDelayMs;
> +        if (mDeadline.IsNull()) {
> +          lateScheduleDelayMs = 16; // XXX: Is it reasonable?
> +        } else {
> +          lateScheduleDelayMs =
> +            (uint32_t)((mDeadline - TimeStamp::Now()).ToMilliseconds());

You have 'now' already as a variable.

And mDeadline - TimeStamp::Now() doesn't make sense since mDeadline points to uninitilized TimeStamp here.
Attachment #8875790 - Flags: review+ → review-
Comment on attachment 8875790 [details]
Bug 1355746 - Part 1. Rename CollectRunner to IdleTaskRunner and move to xpcom/threads/IdleTaskRunner.h/cpp

https://reviewboard.mozilla.org/r/147200/#review153578

Thanks Olli! 

So would you like me to remove all the uses of lambda here?
(given that you said you are not a fan of it)

> Why wouldn't it be? It is refresh driver tick

It just occured to me that if mDeadline is null, according to the following flow,


    bool deadLineWasNull = mDeadline.IsNull();
    bool didRun = false;
    if (deadLineWasNull || ((TimeStamp::Now() + mBudget) < mDeadline)) {
      CancelTimer();
      didRun = mCallback(mDeadline);
    }

    if (mCallback && (mRepeating || !didRun)) {
      // If we didn't do meaningful work, don't schedule using immediate
      // idle dispatch, since that could lead to a loop until the idle
      // period ends.
      Schedule(didRun);
    }

Schedule should be called with didRun==true very likely (because that's the
point we have timeout as a hard deadline of the task, isn't it?) so we should
very unlikely to hit the case on line 365.

If we still reach line 365, since we don't know how far the refresh tick is,
will 8ms be a more suitable value which leads to the greatest probability that
we "just" had a refresh tick? Or we need to guarantee we have passed a refresh tick
in this case?

> Indentation looks weird here, or is it just some MozReview bug.
> 
> 
> (and I do so hate C++ lambdas. So easy way to make code unreadable.)

It seems to be the mozreview line-wrap bug :(
Comment on attachment 8877035 [details]
Bug 1355746 - Part 2. Polish IdleTaskRunner and reuse it for background parsing.

https://reviewboard.mozilla.org/r/148376/#review153632

Sorry for not explicitly pointing out that I was explaining why I didn't 
remove code in nsJSEnvironment and put everything in the header in
the following comment

https://bugzilla.mozilla.org/show_bug.cgi?id=1355746#c27

"I *intentionally* leave local IdleTaskRunner in nsJSEnvironment.cpp in
the 2nd patch (it's supposed to use xpcom/thread/IdleTaskRunner.h) 
just because it would be easier to apply when they are under development.

After r+'ed,

I will 1) Remove the local IdleTaskRunner in nsJSEnvironment.cpp and
       2) Maybe move the implementation to a file."
Comment on attachment 8875790 [details]
Bug 1355746 - Part 1. Rename CollectRunner to IdleTaskRunner and move to xpcom/threads/IdleTaskRunner.h/cpp

https://reviewboard.mozilla.org/r/147200/#review153578

> It just occured to me that if mDeadline is null, according to the following flow,
> 
> 
>     bool deadLineWasNull = mDeadline.IsNull();
>     bool didRun = false;
>     if (deadLineWasNull || ((TimeStamp::Now() + mBudget) < mDeadline)) {
>       CancelTimer();
>       didRun = mCallback(mDeadline);
>     }
> 
>     if (mCallback && (mRepeating || !didRun)) {
>       // If we didn't do meaningful work, don't schedule using immediate
>       // idle dispatch, since that could lead to a loop until the idle
>       // period ends.
>       Schedule(didRun);
>     }
> 
> Schedule should be called with didRun==true very likely (because that's the
> point we have timeout as a hard deadline of the task, isn't it?) so we should
> very unlikely to hit the case on line 365.
> 
> If we still reach line 365, since we don't know how far the refresh tick is,
> will 8ms be a more suitable value which leads to the greatest probability that
> we "just" had a refresh tick? Or we need to guarantee we have passed a refresh tick
> in this case?

Oops I got it. Line 365 is the case where we are almost in the beginning
of a full refresh cycle so 16 ms makes the timer fire right after the next 
tick.

If so, should I revert back to the original design (always 16)
because the "RefreshDriver is ticking" case will be handled well
on line 342: 

nsRefreshDriver::DispatchIdleRunnableAfterTick(this, mDelay);
Comment on attachment 8875790 [details]
Bug 1355746 - Part 1. Rename CollectRunner to IdleTaskRunner and move to xpcom/threads/IdleTaskRunner.h/cpp

https://reviewboard.mozilla.org/r/147200/#review153578

> Oops I got it. Line 365 is the case where we are almost in the beginning
> of a full refresh cycle so 16 ms makes the timer fire right after the next 
> tick.
> 
> If so, should I revert back to the original design (always 16)
> because the "RefreshDriver is ticking" case will be handled well
> on line 342: 
> 
> nsRefreshDriver::DispatchIdleRunnableAfterTick(this, mDelay);

Hmmm I think didn't acutally get it...

```
  TimeStamp GetIdleDeadlineHint(TimeStamp aDefault)
  {
    MOZ_ASSERT(NS_IsMainThread());

    if (LastTickSkippedAnyPaints()) {
      return TimeStamp::Now();
    }

    TimeStamp mostRecentRefresh = MostRecentRefresh();
    TimeDuration refreshRate = GetTimerRate();
    TimeStamp idleEnd = mostRecentRefresh + refreshRate;

    if (idleEnd +
          refreshRate * nsLayoutUtils::QuiescentFramesBeforeIdlePeriod() <
        TimeStamp::Now()) {
      return aDefault;
    }

    idleEnd = idleEnd - TimeDuration::FromMilliseconds(
      nsLayoutUtils::IdlePeriodDeadlineLimit());
    return idleEnd < aDefault ? idleEnd : aDefault;
  }
```

```
TimeStamp hint = nsRefreshDriver::GetIdleDeadlineHint(now);
    if (hint != now) {
      // Case 1: RefreshDriver is ticking
    } else {
      // Case 2: RefreshDriver doesn't seem to be running.
    }
```

For case 1 ==> (adjusted) idleEnd < now


For case 2 ==> (adjusted) idleEnd >= now:

What we only know here (case 2) is we will not be ticking very soon but the
|idleEnd| from now may still range from 1 (idlePeriodDeadLineLimit) to 15 ms.
Ideally in case 2 we'd like the timer to be fired right "after" the next tick.
Not sure if we have to be such ideal but I just want to make sure I didn't
misunderstand something here :)
Note, we use the schedule timer to re-scheduling using _idle_dispatch_, so it doesn't matter too much what the time is, since nothing heavy work will happen when that timer fires.
But sure, right after deadline (when deadline was non-null) would be good.
(In reply to Olli Pettay [:smaug] from comment #35)
> Note, we use the schedule timer to re-scheduling using _idle_dispatch_, so
> it doesn't matter too much what the time is, since nothing heavy work will
> happen when that timer fires.
> But sure, right after deadline (when deadline was non-null) would be good.

Thanks Olli :)

I have decided to reorder my patches as the following strategy:

1) Part 1. Rename, move code around and separate implementation to cpp.
   This patch is meant to be very trivial:
     1) Rename CollectorRunner to IdleTaskRunner.
     2) Move IdleTaskRunner New files: IdleTaskTimer.h/cpp.
 
2) Part 2. Since we know part 1 doesn't change any bit of the runner,
   we can regard IdleTaskTimer.h/cpp is exactly the same as the CollectorRunner
   in nsJSEnvironment and review the diff.

3) Test cases for testing basic feature of IdleTaskRunner: repeating/non-repeating/ShouldCancel/...

As for the best reschedule delay, I didn't change anything from the last
patch: (deadline - now) or 16ms if no deadline

Could you please review my latest patches again? Thanks :)
Attachment #8875790 - Flags: review?(bugs)
Attachment #8877035 - Flags: review?(bugs)
Attachment #8878372 - Flags: review?(bugs)
Attachment #8875790 - Flags: review?(bugs)
Attachment #8877035 - Flags: review?(bugs)
Comment on attachment 8875790 [details]
Bug 1355746 - Part 1. Rename CollectRunner to IdleTaskRunner and move to xpcom/threads/IdleTaskRunner.h/cpp

https://reviewboard.mozilla.org/r/147200/#review154448

I don't understand why this compiles.
But ok, the next patch seems to change that somehow..

::: xpcom/threads/IdleTaskRunner.cpp:16
(Diff revision 8)
> +
> +already_AddRefed<IdleTaskRunner>
> +IdleTaskRunner::Create(IdleTaskRunnerCallback aCallback, uint32_t aDelay,
> +                       int64_t aBudget, bool aRepeating, void* aData)
> +{
> +  if (sShuttingDown) {

What sShuttingDown is this now referring to?

::: xpcom/threads/IdleTaskRunner.cpp:35
(Diff revision 8)
> +  , mBudget(TimeDuration::FromMilliseconds(aBudget))
> +  , mRepeating(aRepeating), mTimerActive(false), mData(aData)
> +{
> +}
> +
> +nsresult 

This should be NS_IMETHODIMP

::: xpcom/threads/IdleTaskRunner.cpp:35
(Diff revision 8)
> +  , mBudget(TimeDuration::FromMilliseconds(aBudget))
> +  , mRepeating(aRepeating), mTimerActive(false), mData(aData)
> +{
> +}
> +
> +nsresult 

Do you have space after nsresult

::: xpcom/threads/IdleTaskRunner.cpp:67
(Diff revision 8)
> +{
> +  RefPtr<IdleTaskRunner> runnable = static_cast<IdleTaskRunner*>(aClosure);
> +  runnable->Run();
> +}
> +
> +void 

space after void?

::: xpcom/threads/IdleTaskRunner.cpp:93
(Diff revision 8)
> +                                 nsITimer::TYPE_ONE_SHOT);
> +    mTimerActive = true;
> +  }
> +}
> +
> +nsresult 

space after nsresult?

::: xpcom/threads/IdleTaskRunner.cpp:110
(Diff revision 8)
> +{
> +  RefPtr<IdleTaskRunner> runnable = static_cast<IdleTaskRunner*>(aClosure);
> +  runnable->Schedule(true);
> +}
> +
> +void 

space after void?
Attachment #8875790 - Flags: review?(bugs) → review+
Comment on attachment 8877035 [details]
Bug 1355746 - Part 2. Polish IdleTaskRunner and reuse it for background parsing.

https://reviewboard.mozilla.org/r/148376/#review154452

::: commit-message-ce9c9:5
(Diff revision 4)
> +Bug 1355746 - Part 2. Polish IdleTaskRunner and reuse it for background parsing.
> +
> +This patch is mainly to make IdleTaskRunner reusable by nsHtml5TreeOpExecutor.
> +The only necessary work to that purpose is to remove the dependency of
> +sShuttingDown, which was a static variable in nsHtml5TreeOpExecutor.cpp.

No, it was static variable in nsJSEnvironment.cpp, not in nsHTML5TreeOpExecutor.cpp

::: parser/html/nsHtml5TreeOpExecutor.cpp:291
(Diff revision 4)
> -                                             "FlushTimerCallback");
>      }
> +    // Now we set up a repetitive idle scheduler for flushing background list.
> +    gBackgroundFlushRunner =
> +      IdleTaskRunner::Create(&BackgroundFlushCallback, 
> +                             50, // The hard deadline: 50ms.

We can probably increase this 50 a lot. If foreground tab is busy, why require processing background tabs so often (in the same child process).
I'd be ok to try 250. But up to you, can be done also in a separate bug.

::: xpcom/threads/IdleTaskRunner.h:20
(Diff revision 4)
>  class IdleTaskRunner final : public IdleRunnable
>  {
>  public:
> +  // Return true if some meaningful work was done.
> +  using CallbackType = std::function<bool(TimeStamp aDeadline)>;
> +  

some spaces here.

::: xpcom/threads/IdleTaskRunner.h:21
(Diff revision 4)
>  {
>  public:
> +  // Return true if some meaningful work was done.
> +  using CallbackType = std::function<bool(TimeStamp aDeadline)>;
> +  
> +  using ShouldCancelCallbackType = std::function<bool()>;

This really needs a comment. And perhaps the name isn't quite right.
MayContinueProcessingCallbackType perhaps?

::: xpcom/threads/IdleTaskRunner.cpp:158
(Diff revision 4)
> -      mScheduleTimer->InitWithFuncCallback(ScheduleTimedOut, this, 16,
> +      uint32_t lateScheduleDelayMs;
> +      if (deadline.IsNull()) {
> +        lateScheduleDelayMs = 16;
> +      } else {
> +        lateScheduleDelayMs =
> +          (uint32_t)((deadline - now).ToMilliseconds());

'now' can be larger than 'deadline', and often is.
If now > deadline, we can actually dispatch to idle queue immediately.
So perhaps just change
if (aAllowIdleDispatch) { to
if (aAllowIdleDispatch || now > deadline) {
Or do some similar check when calling Schedule in Run()
Attachment #8877035 - Flags: review?(bugs) → review+
Comment on attachment 8878372 [details]
Bug 1355746 - Part 3. Test cases for IdleTaskRunner.

https://reviewboard.mozilla.org/r/149352/#review154456

::: xpcom/tests/gtest/TestThreadUtils.cpp:586
(Diff revision 1)
> +  using namespace mozilla;
> +
> +  // Repeating.
> +  int cnt1 = 0;
> +  RefPtr<IdleTaskRunner> runner1 =
> +    IdleTaskRunner::Create([&cnt1](TimeStamp) { cnt1++; return true; }, 

Some extra spaces at the end of many lines here

::: xpcom/tests/gtest/TestThreadUtils.cpp:592
(Diff revision 1)
> +                           10,
> +                           3,
> +                           true,
> +                           nullptr);
> +
> +  // Non-repeating but callback always return true so it's still repeating.

This comment is wrong. Callback returns false.
Attachment #8878372 - Flags: review?(bugs) → review+
Comment on attachment 8875790 [details]
Bug 1355746 - Part 1. Rename CollectRunner to IdleTaskRunner and move to xpcom/threads/IdleTaskRunner.h/cpp

https://reviewboard.mozilla.org/r/147200/#review154448

Yes this doesn't compile due to the static variable |sShuttingDown| from original place (nsJSEnvironment).

> What sShuttingDown is this now referring to?

It's referring to nowhere so this patch itself doesn't compile 
and the next patch would fix this. The reason I didn't fix it 
in this patch is to keep this patch as simple as possible (only rename and code mvoe)
Hi Olli,

I was about to land the patch but realized (thanks to gtest) we have to
explicitly call IdleTaskRunner::Cancel() for a repeating runner
to completely delete the IdleTaskRunner object and stop repeating.
Does it make sense to you? I think that's because IdleTaskRunner
may be also refcounted by perhaps the thread object so it requires an
explicit cycle reference breakup. 

I would imagine this issue can be fixed by separating IdleTaskRunner
and the runnable but just like to know your though. If you feel the
current design and is totally fine, I will add comments to describe 
this fact and land it as soon as possible.

Thanks :)
Flags: needinfo?(bugs)
Yes, explicit Cancel is rather expected for a repeating runner.
Why would you separate runner and runnable?
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] from comment #51)
> Yes, explicit Cancel is rather expected for a repeating runner.
> Why would you separate runner and runnable?

If the runner is NOT a runnable, we may be able to have the chance 
to remove circular references in the runner dtor. But since it's 
fine to you, I'll just leave everything as it and land the patches.
Thanks :)
Pushed by hchang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/733c828d5ba3
Part 1. Rename CollectRunner to IdleTaskRunner and move to xpcom/threads/IdleTaskRunner.h/cpp r=smaug
https://hg.mozilla.org/integration/autoland/rev/38722eed8c8a
Part 2. Polish IdleTaskRunner and reuse it for background parsing. r=smaug
https://hg.mozilla.org/integration/autoland/rev/8cd0a75a643a
Part 3. Test cases for IdleTaskRunner. r=smaug
https://hg.mozilla.org/mozilla-central/rev/733c828d5ba3
https://hg.mozilla.org/mozilla-central/rev/38722eed8c8a
https://hg.mozilla.org/mozilla-central/rev/8cd0a75a643a
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Backed out for frequently failing mochitest dom/html/test/test_fullscreen-api.html and devtools' browser_service_workers_multi_content_process.js, both on Linux:

https://hg.mozilla.org/mozilla-central/rev/6779547fa0fca357851739ebee9bfd461675c7aa
https://hg.mozilla.org/mozilla-central/rev/8bb14d77ad3a975e83af424d21b6d3098df74c13
https://hg.mozilla.org/mozilla-central/rev/c87eba9664d6e85d7a36f2811f8709877842e115

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=8cd0a75a643a5b4747d9acd15b3bc90a8251305c&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable

Failure log devtools: https://treeherder.mozilla.org/logviewer.html#?job_id=108926190&repo=autoland

[task 2017-06-21T17:28:35.450810Z] 17:28:35     INFO - TEST-START | devtools/client/aboutdebugging/test/browser_service_workers_multi_content_process.js
[task 2017-06-21T17:28:37.001596Z] 17:28:37     INFO - GECKO(1177) | SW registered
[task 2017-06-21T17:29:20.499811Z] 17:29:20     INFO - TEST-INFO | started process screentopng
[task 2017-06-21T17:29:21.101117Z] 17:29:21     INFO - TEST-INFO | screentopng: exit 0
[task 2017-06-21T17:29:21.105094Z] 17:29:21     INFO - Buffered messages logged at 17:28:35
[task 2017-06-21T17:29:21.105145Z] 17:29:21     INFO - Entering test bound 
[task 2017-06-21T17:29:21.105189Z] 17:29:21     INFO - Force two content processes
[task 2017-06-21T17:29:21.105231Z] 17:29:21     INFO - opening about:debugging
[task 2017-06-21T17:29:21.105283Z] 17:29:21     INFO - Adding a new tab with URL: about:debugging#workers
[task 2017-06-21T17:29:21.105324Z] 17:29:21     INFO - Tab added and finished loading
[task 2017-06-21T17:29:21.105400Z] 17:29:21     INFO - TEST-PASS | devtools/client/aboutdebugging/test/browser_service_workers_multi_content_process.js | warning message is rendered - 
[task 2017-06-21T17:29:21.105476Z] 17:29:21     INFO - Adding a new tab with URL: http://example.com/browser/devtools/client/aboutdebugging/test/service-workers/empty-sw.html
[task 2017-06-21T17:29:21.106484Z] 17:29:21     INFO - Buffered messages logged at 17:28:37
[task 2017-06-21T17:29:21.107249Z] 17:29:21     INFO - Tab added and finished loading
[task 2017-06-21T17:29:21.108035Z] 17:29:21     INFO - Buffered messages finished
[task 2017-06-21T17:29:21.108238Z] 17:29:21     INFO - TEST-UNEXPECTED-FAIL | devtools/client/aboutdebugging/test/browser_service_workers_multi_content_process.js | Test timed out - 
[task 2017-06-21T17:29:21.109937Z] 17:29:21     INFO - Removing tab.
[task 2017-06-21T17:29:21.110114Z] 17:29:21     INFO - Waiting for event: 'TabClose' on [object XULElement].
[task 2017-06-21T17:29:21.113237Z] 17:29:21     INFO - Got event: 'TabClose' on [object XULElement].
[task 2017-06-21T17:29:21.114006Z] 17:29:21     INFO - Tab removed and finished closing
[task 2017-06-21T17:29:21.114124Z] 17:29:21     INFO - Removing tab.
[task 2017-06-21T17:29:21.114919Z] 17:29:21     INFO - Waiting for event: 'TabClose' on [object XULElement].
[task 2017-06-21T17:29:21.115003Z] 17:29:21     INFO - Got event: 'TabClose' on [object XULElement].
[task 2017-06-21T17:29:21.115710Z] 17:29:21     INFO - Tab removed and finished closing
[task 2017-06-21T17:29:21.116427Z] 17:29:21     INFO - GECKO(1177) | MEMORY STAT | vsize 2381MB | residentFast 375MB | heapAllocated 146MB


Failure log mochitest plain: https://treeherder.mozilla.org/logviewer.html#?job_id=108809756&repo=autoland
[task 2017-06-21T10:46:48.541797Z] 10:46:48     INFO - TEST-PASS | dom/html/test/test_fullscreen-api.html | [prefixed] Should get prefixed event on [object Window] 
[task 2017-06-21T10:46:48.543664Z] 10:46:48     INFO - [prefixed] Check fullscreen state after failed request in test case 1
[task 2017-06-21T10:46:48.545481Z] 10:46:48     INFO - TEST-PASS | dom/html/test/test_fullscreen-api.html | [prefixed] Should not be in fullscreen 
[task 2017-06-21T10:46:48.547286Z] 10:46:48     INFO - TEST-PASS | dom/html/test/test_fullscreen-api.html | [prefixed] Fullscreen element should be null 
[task 2017-06-21T10:46:48.549224Z] 10:46:48     INFO - TEST-PASS | dom/html/test/test_fullscreen-api.html | [prefixed] document.mozFullScreenElement should be identical to fullscreenElement 
[task 2017-06-21T10:46:48.550964Z] 10:46:48     INFO - Buffered messages logged at 10:41:48
[task 2017-06-21T10:46:48.552839Z] 10:46:48     INFO - TEST-PASS | dom/html/test/test_fullscreen-api.html | [prefixed] Fullscreen element should not match :fullscreen pseudo class 
[task 2017-06-21T10:46:48.554742Z] 10:46:48     INFO - TEST-PASS | dom/html/test/test_fullscreen-api.html | [prefixed] Fullscreen element should not match :-moz-full-screen pseudo class 
[task 2017-06-21T10:46:48.556593Z] 10:46:48     INFO - Buffered messages finished
[task 2017-06-21T10:46:48.558518Z] 10:46:48     INFO - TEST-UNEXPECTED-FAIL | dom/html/test/test_fullscreen-api.html | Test timed out. 
[task 2017-06-21T10:46:48.560354Z] 10:46:48     INFO -     reportError@SimpleTest/TestRunner.js:121:7
[task 2017-06-21T10:46:48.562101Z] 10:46:48     INFO -     TestRunner._checkForHangs@SimpleTest/TestRunner.js:142:7
[task 2017-06-21T10:46:48.563889Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.565647Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.567293Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.568981Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.570689Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.572349Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.573994Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.575664Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.577438Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.579116Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.581080Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.583096Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.584878Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.586608Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.588218Z] 10:46:48     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-21T10:46:48.589888Z] 10:46:48     INFO -     TestRunner.runTests@SimpleTest/TestRunner.js:380:5
[task 2017-06-21T10:46:48.591476Z] 10:46:48     INFO -     RunSet.runtests@SimpleTest/setup.js:194:3
[task 2017-06-21T10:46:48.593069Z] 10:46:48     INFO -     RunSet.runall@SimpleTest/setup.js:173:5
[task 2017-06-21T10:46:48.594612Z] 10:46:48     INFO -     hookupTests@SimpleTest/setup.js:266:5
[task 2017-06-21T10:46:48.596238Z] 10:46:48     INFO - parseTestManifest@http://mochi.test:8888/manifestLibrary.js:36:5
[task 2017-06-21T10:46:48.597955Z] 10:46:48     INFO - getTestManifest/req.onload@http://mochi.test:8888/manifestLibrary.js:49:11
[task 2017-06-21T10:46:48.599619Z] 10:46:48     INFO - EventHandlerNonNull*getTestManifest@http://mochi.test:8888/manifestLibrary.js:45:3
[task 2017-06-21T10:46:48.601463Z] 10:46:48     INFO -     hookup@SimpleTest/setup.js:246:5
[task 2017-06-21T10:46:48.603436Z] 10:46:48     INFO - EventHandlerNonNull*@http://mochi.test:8888/tests?autorun=1&closeWhenDone=1&consoleLevel=INFO&manifestFile=tests.json&dumpOutputDirectory=%2Ftmp&cleanupCrashes=true:11:1
[task 2017-06-21T10:46:48.693984Z] 10:46:48     INFO - GECKO(2783) | MEMORY STAT | vsize 1061MB | residentFast 233MB | heapAllocated 82MB
[task 2017-06-21T10:46:48.702684Z] 10:46:48     INFO - TEST-OK | dom/html/test/test_fullscreen-api.html | took 327083ms
Status: RESOLVED → REOPENED
Flags: needinfo?(hchang)
Resolution: FIXED → ---
https://treeherder.mozilla.org/#/jobs?repo=try&revision=780b7c9ed356
https://treeherder.mozilla.org/#/jobs?repo=try&revision=523ae2151299
https://treeherder.mozilla.org/#/jobs?repo=try&revision=64756ef88e84
https://treeherder.mozilla.org/#/jobs?repo=try&revision=699e0b0c5d50
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1eabc47a117d
(In reply to Olli Pettay [:smaug] from comment #41)
> Comment on attachment 8877035 [details]
>
> ::: parser/html/nsHtml5TreeOpExecutor.cpp:291
> (Diff revision 4)
> > -                                             "FlushTimerCallback");
> >      }
> > +    // Now we set up a repetitive idle scheduler for flushing background list.
> > +    gBackgroundFlushRunner =
> > +      IdleTaskRunner::Create(&BackgroundFlushCallback, 
> > +                             50, // The hard deadline: 50ms.
> 
> We can probably increase this 50 a lot. If foreground tab is busy, why
> require processing background tabs so often (in the same child process).
> I'd be ok to try 250. But up to you, can be done also in a separate bug.
> 

I had to use 50ms as the idle dispatch timeout otherwise test_fullscreen-api.html
would fail very frequently.  

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1eabc47a117d392ae903ed99d8edeb0b8331cd26

Even if I set the budget to 0, that test case would still fail a lot:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=699e0b0c5d50d7b818263ed5fad13e7ce40b8a23

Will file bug for investigating the idle dispatch timeout issue.
Flags: needinfo?(hchang)
Why is that fullscreen test failing? Is it a racy test?
I have totally no idea :( 

It only happens on Linux platform and I can't reproduce it on my local machine.
xidorn, if you're in SF and also Henry, perhaps you could try to figure out why the fullscreen test fails?
Flags: needinfo?(xidorn+moz)
From the latest try results (with 50ms hard deadline),

https://treeherder.mozilla.org/#/jobs?repo=try&revision=14bc8f48e2de076245b173279a5935025f8d68ac

almost every test is good except 'Linux Debug bc14'. It failed at different test cases
among 8 retries. Investigating...
Depends on: 1377131
Yeah, sure, Henry, where are you?
Flags: needinfo?(xidorn+moz)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=deb5d074a2c1
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a43a19c7a89e
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9e26f2423af5
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ae6f4780562c
https://treeherder.mozilla.org/#/jobs?repo=try&revision=101077d16863
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f667d9acf0f1
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d52f81cfd068
https://treeherder.mozilla.org/#/jobs?repo=try&revision=20a6fec9213c
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7efaf2e8cd33
https://treeherder.mozilla.org/#/jobs?repo=try&revision=701e62ee5519
No longer depends on: 1377131
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0e943f6547b1
https://treeherder.mozilla.org/#/jobs?repo=try&revision=71cd7cf42b7c
Attachment #8885586 - Flags: review?(xidorn+moz)
Comment on attachment 8885586 [details]
Bug 1355746 - Part 4. Fix intermittent full screen test failures.

https://reviewboard.mozilla.org/r/156430/#review161588
Attachment #8885586 - Flags: review+
Comment on attachment 8885586 [details]
Bug 1355746 - Part 4. Fix intermittent full screen test failures.

https://reviewboard.mozilla.org/r/156430/#review161592
Attachment #8885586 - Flags: review?(xidorn+moz) → review+
Hi Xidorn,

I did a little research on the full screen testing failures 
(only on Linux opt build) but only concluded that the the 
failure is due to the failed check (FullscreenElementReadyCheck)
[1] which is caused by [2] (IsInActiveTab).

To be more detailed, right after requesting to exit fullscreen mode,
(via gtk API on Linux [3]), the window will not get active/focused
synchronously. I haven't figured out if it's the limitation of gtk
or gecko's essential issue.

I modified the test cases to ensure the testing window will have been active
before we request fullscreen. That totally fixed the intermittent failures.
Could you help review the patch?

Thanks!

[1] http://searchfox.org/mozilla-central/rev/31311070d9860b24fe4a7a36976c14b328c16208/dom/base/nsDocument.cpp#11512
[2] http://searchfox.org/mozilla-central/rev/31311070d9860b24fe4a7a36976c14b328c16208/dom/base/nsDocument.cpp#11407
[3] http://searchfox.org/mozilla-central/rev/31311070d9860b24fe4a7a36976c14b328c16208/widget/gtk/nsWindow.cpp#5007
I've r+ed the patch at comment 90, no?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ff67b02625e6
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #92)
> I've r+ed the patch at comment 90, no?

Oops! I left the comment just a few minutes later than your review!
Thanks!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3f0fc35271f6088f1650b9693a01b815425e8f48

The fullscreen failures have gone but there are still some more frequent intermittent 
failures than ever. I am flagging checkin-needed to let the sherif decide if we are fine
to land the patches.
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b223c191f442
Part 1. Rename CollectRunner to IdleTaskRunner and move to xpcom/threads/IdleTaskRunner.h/cpp r=smaug
https://hg.mozilla.org/integration/autoland/rev/e0fd0ae58d0f
Part 2. Polish IdleTaskRunner and reuse it for background parsing. r=smaug
https://hg.mozilla.org/integration/autoland/rev/ae4f6149befb
Part 3. Test cases for IdleTaskRunner. r=smaug
https://hg.mozilla.org/integration/autoland/rev/8a5e11d641cc
Part 4. Fix intermittent full screen test failures. r=smaug,xidorn
Keywords: checkin-needed
Backed out for frequently failing mochitests with: /tests/dom/base/test/test_data_uri.html logged result after SimpleTest.finish(): iframe should have NullPrincipal:

https://hg.mozilla.org/integration/autoland/rev/3a8257f5d27be7af836deeaee029caf0facd7763
https://hg.mozilla.org/integration/autoland/rev/23059882ec88991533906672b4117cc9db26993f
https://hg.mozilla.org/integration/autoland/rev/ee6785ebedc5c2858898b8721d3a66f67ac5ab6a
https://hg.mozilla.org/integration/autoland/rev/82edb51e8a4c8b9b90c67f395008b2ee9d42d9c0

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=8a5e11d641cc6e80ab5f9f390e64f18c6b84f79f&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=113688964&repo=autoland

[task 2017-07-12T16:45:55.953158Z] 16:45:55     INFO - TEST-START | dom/base/test/test_createHTMLDocument.html
[task 2017-07-12T16:45:56.352579Z] 16:45:56     INFO - GECKO(1871) | MEMORY STAT | vsize 20974084MB | residentFast 836MB
[task 2017-07-12T16:45:56.355040Z] 16:45:56     INFO - TEST-OK | dom/base/test/test_createHTMLDocument.html | took 368ms
[task 2017-07-12T16:45:56.413617Z] 16:45:56     INFO - TEST-START | dom/base/test/test_data_uri.html
[task 2017-07-12T16:45:56.994062Z] 16:45:56     INFO - GECKO(1871) | img onload
[task 2017-07-12T16:45:57.587232Z] 16:45:57     INFO - GECKO(1871) | MEMORY STAT | vsize 20974083MB | residentFast 837MB
[task 2017-07-12T16:45:57.696611Z] 16:45:57     INFO - TEST-OK | dom/base/test/test_data_uri.html | took 1284ms
[task 2017-07-12T16:45:57.739592Z] 16:45:57    ERROR - /tests/dom/base/test/test_data_uri.html logged result after SimpleTest.finish(): iframe should have NullPrincipal.
Flags: needinfo?(hchang)
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #97)
> Backed out for frequently failing mochitests with:
> /tests/dom/base/test/test_data_uri.html logged result after
> SimpleTest.finish(): iframe should have NullPrincipal:
> 
> https://hg.mozilla.org/integration/autoland/rev/
> 3a8257f5d27be7af836deeaee029caf0facd7763
> https://hg.mozilla.org/integration/autoland/rev/
> 23059882ec88991533906672b4117cc9db26993f
> https://hg.mozilla.org/integration/autoland/rev/
> ee6785ebedc5c2858898b8721d3a66f67ac5ab6a
> https://hg.mozilla.org/integration/autoland/rev/
> 82edb51e8a4c8b9b90c67f395008b2ee9d42d9c0
> 
> Push with failures:
> https://treeherder.mozilla.org/#/
> jobs?repo=autoland&revision=8a5e11d641cc6e80ab5f9f390e64f18c6b84f79f&filter-
> resultStatus=testfailed&filter-resultStatus=busted&filter-
> resultStatus=exception&filter-resultStatus=retry&filter-
> resultStatus=usercancel&filter-resultStatus=runnable
> Failure log:
> https://treeherder.mozilla.org/logviewer.html#?job_id=113688964&repo=autoland
> 
> [task 2017-07-12T16:45:55.953158Z] 16:45:55     INFO - TEST-START |
> dom/base/test/test_createHTMLDocument.html
> [task 2017-07-12T16:45:56.352579Z] 16:45:56     INFO - GECKO(1871) | MEMORY
> STAT | vsize 20974084MB | residentFast 836MB
> [task 2017-07-12T16:45:56.355040Z] 16:45:56     INFO - TEST-OK |
> dom/base/test/test_createHTMLDocument.html | took 368ms
> [task 2017-07-12T16:45:56.413617Z] 16:45:56     INFO - TEST-START |
> dom/base/test/test_data_uri.html
> [task 2017-07-12T16:45:56.994062Z] 16:45:56     INFO - GECKO(1871) | img
> onload
> [task 2017-07-12T16:45:57.587232Z] 16:45:57     INFO - GECKO(1871) | MEMORY
> STAT | vsize 20974083MB | residentFast 837MB
> [task 2017-07-12T16:45:57.696611Z] 16:45:57     INFO - TEST-OK |
> dom/base/test/test_data_uri.html | took 1284ms
> [task 2017-07-12T16:45:57.739592Z] 16:45:57    ERROR -
> /tests/dom/base/test/test_data_uri.html logged result after
> SimpleTest.finish(): iframe should have NullPrincipal.

test_data_uri.html is as new as my patch and it doesn't seem to be relevant 
to my patches. Will rebase and check what's going on.
Flags: needinfo?(hchang)
(In reply to Henry Chang [:hchang] from comment #95)
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=3f0fc35271f6088f1650b9693a01b815425e8f48
> 
> The fullscreen failures have gone but there are still some more frequent
> intermittent 
> failures than ever. I am flagging checkin-needed to let the sherif decide if
> we are fine
> to land the patches.

I don't think sheriff usually reads the comment before landing patches, especially for MozReview ones. And landing patches with known frequent intermittents would just waste everyone's time, so please don't do this.
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #99)
> (In reply to Henry Chang [:hchang] from comment #95)
> > https://treeherder.mozilla.org/#/
> > jobs?repo=try&revision=3f0fc35271f6088f1650b9693a01b815425e8f48
> > 
> > The fullscreen failures have gone but there are still some more frequent
> > intermittent 
> > failures than ever. I am flagging checkin-needed to let the sherif decide if
> > we are fine
> > to land the patches.
> 
> I don't think sheriff usually reads the comment before landing patches,
> especially for MozReview ones. And landing patches with known frequent
> intermittents would just waste everyone's time, so please don't do this.

Sorry for that. Maybe I should have directly ni'ed someone.

Investigating test_data_uri.html, it appears that [1] a "data:" iframe principal
is not nullprincipal when it ought to be. Cannot reproduce it locally even
with throttled CPU to simulate the try environment.

[1] http://searchfox.org/mozilla-central/rev/cbd628b085ac809bf5a536109e6288aa91cbdff0/dom/base/test/test_data_uri.html#28
Yoshi wrote that test, perhaps he can help.
Flags: needinfo?(allstars.chh)
A quick finding according to the log [1] is nsIPrincipal.isNullPrincipal 
may be false even if nsIPrincipal.origin is something like 
'moz-nullprincipal:{3b13cb12-f07d-4a18-9b2c-9e27998e4e1f}'

[1] https://treeherder.mozilla.org/logviewer.html#?job_id=114724500&repo=try&lineNumber=8760
(In reply to Henry Chang [:hchang] from comment #102)
> A quick finding according to the log [1] is nsIPrincipal.isNullPrincipal 
> may be false even if nsIPrincipal.origin is something like 
> 'moz-nullprincipal:{3b13cb12-f07d-4a18-9b2c-9e27998e4e1f}'
> 
> [1]
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=114724500&repo=try&lineNumber=8760

It seems technically possible

http://searchfox.org/mozilla-central/rev/01d27fdd3946f7210da91b18fcccca01d7324fe2/caps/BasePrincipal.cpp#369

if ContentPrincipal::GenerateOriginNoSuffixFromURI() failed.
(In reply to Henry Chang [:hchang] from comment #103)
> (In reply to Henry Chang [:hchang] from comment #102)
> > A quick finding according to the log [1] is nsIPrincipal.isNullPrincipal 
> > may be false even if nsIPrincipal.origin is something like 
> > 'moz-nullprincipal:{3b13cb12-f07d-4a18-9b2c-9e27998e4e1f}'
> > 
> > [1]
> > https://treeherder.mozilla.org/logviewer.
> > html#?job_id=114724500&repo=try&lineNumber=8760
> 
> It seems technically possible
> 
> http://searchfox.org/mozilla-central/rev/
> 01d27fdd3946f7210da91b18fcccca01d7324fe2/caps/BasePrincipal.cpp#369
> 
> if ContentPrincipal::GenerateOriginNoSuffixFromURI() failed.

Oops I was totally wrong. NullPrincipal would be created instead if codebaseprincipal
so the weird state in comment 102 remains weird...
(In reply to Henry Chang [:hchang] from comment #104)
> (In reply to Henry Chang [:hchang] from comment #103)
> > (In reply to Henry Chang [:hchang] from comment #102)
> > > A quick finding according to the log [1] is nsIPrincipal.isNullPrincipal 
> > > may be false even if nsIPrincipal.origin is something like 
> > > 'moz-nullprincipal:{3b13cb12-f07d-4a18-9b2c-9e27998e4e1f}'
> > > 
> > > [1]
> > > https://treeherder.mozilla.org/logviewer.
> > > html#?job_id=114724500&repo=try&lineNumber=8760
> > 
> > It seems technically possible
> > 
> > http://searchfox.org/mozilla-central/rev/
> > 01d27fdd3946f7210da91b18fcccca01d7324fe2/caps/BasePrincipal.cpp#369
> > 
> > if ContentPrincipal::GenerateOriginNoSuffixFromURI() failed.
> 
Oops I was totally wrong. NullPrincipal would be created instead at

http://searchfox.org/mozilla-central/rev/01d27fdd3946f7210da91b18fcccca01d7324fe2/caps/BasePrincipal.cpp#369

I thought the one which is created is NullPrincipalURI. So, I still cannot explain
comment 102.
Well, the very original error message had told the reason:

[task 2017-07-17T10:19:15.400501Z] 10:19:15    ERROR - /tests/dom/base/test/test_data_uri.html logged result after SimpleTest.finish(): iframe should have NullPrincipal

It's just because [1] might run before [2] and that's what the error message suggested.


[1] http://searchfox.org/mozilla-central/rev/01d27fdd3946f7210da91b18fcccca01d7324fe2/dom/base/test/test_data_uri.html#85
[2] http://searchfox.org/mozilla-central/rev/01d27fdd3946f7210da91b18fcccca01d7324fe2/dom/base/test/test_data_uri.html#28
Flags: needinfo?(allstars.chh)
Pushed by hchang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4f39e2e274e0
Part 1. Rename CollectRunner to IdleTaskRunner and move to xpcom/threads/IdleTaskRunner.h/cpp r=smaug
https://hg.mozilla.org/integration/autoland/rev/c72349ea26d7
Part 2. Polish IdleTaskRunner and reuse it for background parsing. r=smaug
https://hg.mozilla.org/integration/autoland/rev/a85ddf5510dc
Part 3. Test cases for IdleTaskRunner. r=smaug
https://hg.mozilla.org/integration/autoland/rev/4e1b7f382b22
Part 4. Fix intermittent full screen test failures. r=smaug,xidorn
yippee, crossing fingers :)
Performance Impact: --- → P1
Whiteboard: [qf:p1]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: