Closed Bug 1200063 Opened 9 years ago Closed 9 years ago

Intermittent test_wheel_transactions.html | Test timed out

Categories

(Core :: Panning and Zooming, defect)

Unspecified
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox43 --- fixed

People

(Reporter: RyanVM, Assigned: botond)

References

Details

(Keywords: intermittent-failure)

Attachments

(7 files)

Attached image test screenshot
      No description provided.
Flags: needinfo?(botond)
Thanks, I'll investigate.
Assignee: nobody → botond
Flags: needinfo?(botond)
Works better when the logging actually compiles:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4d79a6e896d6
From the logs, what's happening is:

  - The test hangs because it never receives a 'scroll' event
    after dispatching a native wheel event.

  - This is because the repaint request that's expected to
    generate this 'scroll' event is never sent by APZ.

  - APZ *does* call RequestContentRepaint() after processing
    the wheel event.

    ==> The repaint request is likely stuck in the TaskThrottler. 

Reading the TaskThrottler code, it seems to me that this can easily happen if, at the time the RequestContentRepaint() call happens, there has been a previous repaint request within 500 ms (TaskThrottler::mMaxWaitTime), and that repaint request does not produce a layer transaction (which can happen if nothing changes), so TaskComplete() is never called.

Kats, what's supposed to be triggering the execution of the task in TaskThrottler in such a case?
Flags: needinfo?(bugmail.mozilla)
Hm, good catch. It does appear that this would result in the repaint request to get stuck. The max-wait timeout should prevent it from getting permanently stuck in production code since the next repaint request will get processed but in test scenarios we might not get one of those and/or the test may assert things in the meantime.

We should probably rethink the task throttle architecture; it may not make sense to keep any more particularly since it's per-APZC and so not super effective at throttling anyway.
Flags: needinfo?(bugmail.mozilla)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)
> The max-wait timeout should prevent it from getting
> permanently stuck in production code since the next repaint request will get
> processed

Do you mean that in production we should always get a layer transaction within a short window of time that would trigger a call to TaskComplete()? Is that the case even if nothing is changing on the page?
No, I just mean that in production the user will likely do something that triggers a repaint, and it won't get stuck permanently. If the user does nothing and nothing is happening on the page then yeah, we might end up in a user-visible bad state.
OK, thanks, that makes sense.

What do you think about revising TaskThrottler in the following ways:

  - Keep just one instance per APZCTreeManager, and have all APZCs
    in it use that.

    In addition to making the throttling per-APZCTM rather than
    per-APZC, this will make the tracking of average paint durations
    per-APZCTM. Is that reasonable?

  - Use an nsITimer to make sure the pending repaint request is sent
    when mMaxWaitTime elapses, even if no subsequent repaint request
    has arrived in that time.
I think we should have one per process (or one per layers id), rather than one per APZCTM. We shouldn't really be blocking paints in one process if paints are outstanding in some other process. And using a timer of some sort to ensure the repaint request is flushed would be good, yes.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #19)
> I think we should have one per process (or one per layers id), rather than
> one per APZCTM. We shouldn't really be blocking paints in one process if
> paints are outstanding in some other process.

Yep, that's an excellent point.
Bug 1200063 - Make APZCTreeManager the place where GetFrameTime() can be overridden. r=kats
Attachment #8662057 - Flags: review?(bugmail.mozilla)
Bug 1200063 - Share a paint throttler between APZCs in the same layers id. r=kats
Attachment #8662058 - Flags: review?(bugmail.mozilla)
Bug 1200063 - Add a generic implementation of nsITimerCallback that's usable with a lambda or other function object. r=kats

We can consider moving this to xpcom in the future.
Attachment #8662059 - Flags: review?(bugmail.mozilla)
Bug 1200063 - Make sure TaskThrottler sends its pending repaint request eventually, even if it never receives a notification from the previous request. r=kats

This is important because the previous repaint can be a no-op, and those don't trigger notifications.
Attachment #8662060 - Flags: review?(bugmail.mozilla)
Here's an attempt at fixing this. A few notes for Kats:

  - When the APZCTreeManager is destroyed, so will the throttlers.
    Do you think there's a risk of an APZC still being alive and
    accessing a throttler (to which it holds a reference) past that
    time? If so, I guess we'd need to do something like store a pointer
    to the throttler in APZC, null it out in Destroy(), and null-check
    all accesses to it, but I'd like to avoid that if possible.

  - If you don't like GenericTimerCallback being in APZThreadUtils.h,
    I'm happy to move it somewhere else (especially if you suggest
    a somewhere else).

  - I didn't make TaskThrottler reference-counted, and the timer
    callback captures and uses a raw pointer to it, but the
    destructor cancels the timer, so it shouldn't fire after the
    throttler is destroyed. Does that seem reasonable?
(In reply to Botond Ballo [:botond] from comment #33)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=047fc56ceb24

And, of course, STLport doesn't like storing a UniquePtr in a map...
Attachment #8662057 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 8662057 [details]
MozReview Request: Bug 1200063 - Make APZCTreeManager the place where GetFrameTime() can be overridden. r=kats

https://reviewboard.mozilla.org/r/19491/#review17453

::: gfx/layers/apz/src/AsyncPanZoomController.cpp:816
(Diff revision 1)
> +     // mTreeManager must be initialized before GetFrameTime() is called

Technically there is a call to GetFrameTime in the paint throttler initialization too but since it goes away in a future patch ao it doesn't matter. :)

::: gfx/tests/gtest/TestAsyncPanZoomController.cpp:158
(Diff revision 1)
> +  TimeStamp GetFrameTime() override {

nit: add blank lines before and after this function.
Comment on attachment 8662058 [details]
MozReview Request: Bug 1200063 - Share a paint throttler between APZCs in the same layers id. r=kats

https://reviewboard.mozilla.org/r/19493/#review17461

That turned out to be much cleaner than I thought it would be, nice work!
Attachment #8662058 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 8662059 [details]
MozReview Request: Bug 1200063 - Add a generic implementation of nsITimerCallback that's usable with a lambda or other function object. r=kats

https://reviewboard.mozilla.org/r/19495/#review17455

Looks good, but please file a follow-up bug to move it to xpcom.

::: gfx/layers/apz/util/APZThreadUtils.h:90
(Diff revision 1)
> +GenericTimerCallback<Function>* MakeTimerCallback(const Function& aFunction)

This should probably be called NewTimerCallback to be consistent with NewRunnableMethod and friends. I didn't think of that when I wrote MakeAPZCInstance but that one should probably be renamed as well.
Attachment #8662059 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 8662060 [details]
MozReview Request: Bug 1200063 - Make sure TaskThrottler sends its pending repaint request eventually, even if it never receives a notification from the previous request. r=kats

https://reviewboard.mozilla.org/r/19489/#review17465

::: gfx/layers/apz/src/TaskThrottler.cpp:51
(Diff revision 1)
> +          MakeTimerCallback([this, timeoutTime]()

I'm not a huge fan of the large indent on the callback implementation but I can live with it because it's a small function. I'd like to find a good coding style for this and put it in the gecko coding style doc before we have too many of these things in the codebase.
Attachment #8662060 - Flags: review?(bugmail.mozilla) → review+
(In reply to Botond Ballo [:botond] from comment #32)
> Here's an attempt at fixing this. A few notes for Kats:
> 
>   - When the APZCTreeManager is destroyed, so will the throttlers.
>     Do you think there's a risk of an APZC still being alive and
>     accessing a throttler (to which it holds a reference) past that
>     time? If so, I guess we'd need to do something like store a pointer
>     to the throttler in APZC, null it out in Destroy(), and null-check
>     all accesses to it, but I'd like to avoid that if possible.

Hm, I feel like this is a legitimate concern, because we might have layers holding on to AsyncPanZoomController instances and which end up doing things like run another animation frame or something. That might then try to use the dead task throttler which would probably be a UAF vulnerability. We can at least detect it somewhat by adding MOZ_ASSERT(mTreeManager) at places where we use the task throttler (wrapping it in a helper if needed). Another option is to have a static dummy instance of a TaskThrottler, and in Destroy() we point the reference to the dummy instance.

>   - If you don't like GenericTimerCallback being in APZThreadUtils.h,
>     I'm happy to move it somewhere else (especially if you suggest
>     a somewhere else).

This is fine, although it would be good to move it into xpcom.

>   - I didn't make TaskThrottler reference-counted, and the timer
>     callback captures and uses a raw pointer to it, but the
>     destructor cancels the timer, so it shouldn't fire after the
>     throttler is destroyed. Does that seem reasonable?

As long as the TaskThrottler destructor and the timer callback are running on the same thread, yes. Can we add asserts for that?
The issue in comment 34 made me have to make TaskThrottler reference-counted, which I believe will end up mooting both lifetime issues. Will post updated patches as soon as I confirm it's building on STLport platforms.
Comment on attachment 8662057 [details]
MozReview Request: Bug 1200063 - Make APZCTreeManager the place where GetFrameTime() can be overridden. r=kats

Bug 1200063 - Make APZCTreeManager the place where GetFrameTime() can be overridden. r=kats
Comment on attachment 8662058 [details]
MozReview Request: Bug 1200063 - Share a paint throttler between APZCs in the same layers id. r=kats

Bug 1200063 - Share a paint throttler between APZCs in the same layers id. r=kats

In this process, TaskThrottler is made reference-counted.
Comment on attachment 8662059 [details]
MozReview Request: Bug 1200063 - Add a generic implementation of nsITimerCallback that's usable with a lambda or other function object. r=kats

Bug 1200063 - Add a generic implementation of nsITimerCallback that's usable with a lambda or other function object. r=kats

We can consider moving this to xpcom in the future.
Comment on attachment 8662060 [details]
MozReview Request: Bug 1200063 - Make sure TaskThrottler sends its pending repaint request eventually, even if it never receives a notification from the previous request. r=kats

Bug 1200063 - Make sure TaskThrottler sends its pending repaint request eventually, even if it never receives a notification from the previous request. r=kats

This is important because the previous repaint can be a no-op, and those don't trigger notifications.
Bug 1200063 - Rename MakeAPZCInstance to NewAPZCInstance for consistency. r=kats
Attachment #8662103 - Flags: review?(bugmail.mozilla)
Comment on attachment 8662058 [details]
MozReview Request: Bug 1200063 - Share a paint throttler between APZCs in the same layers id. r=kats

Re-requesting review. I believe the reference-counting of TaskThrottler solves the issue of APZC accessing a potentially destroyed throttler.
Attachment #8662058 - Flags: review+ → review?(bugmail.mozilla)
Comment on attachment 8662060 [details]
MozReview Request: Bug 1200063 - Make sure TaskThrottler sends its pending repaint request eventually, even if it never receives a notification from the previous request. r=kats

Re-requesting review. The lambda now captures an nsRefPtr to the throttler, keeping it alive. (The destructor still calls Cancel() for good measure.)
Attachment #8662060 - Flags: review+ → review?(bugmail.mozilla)
(In reply to Botond Ballo [:botond] from comment #47)
> Comment on attachment 8662060 [details]
> MozReview Request: Bug 1200063 - Make sure TaskThrottler sends its pending
> repaint request eventually, even if it never receives a notification from
> the previous request. r=kats
> 
> Re-requesting review. The lambda now captures an nsRefPtr to the throttler,
> keeping it alive. (The destructor still calls Cancel() for good measure.)

Also, let me know if you like this indentation better :)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #37)
> Looks good, but please file a follow-up bug to move it to xpcom.

Filed bug 1205480.
Note that I had to change NS_DECL_ISUPPORTS in GenericTimerCallbackBase to NS_DECL_THREADSAFE_ISUPPORTS because the way we're using it, Release() on the timer callback can be called on different threads. Fixed locally.
(In reply to Botond Ballo [:botond] from comment #50)
> Updated Try push, looking much better:
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=8aaed62ecfb5

In this Try push I've retriggered test_wheel_transactions 20 times on both a release and a debug build, and it's passing in all cases, suggesting that the patches really do fix the underlying problem that caused the intermittent.
Attachment #8662058 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 8662058 [details]
MozReview Request: Bug 1200063 - Share a paint throttler between APZCs in the same layers id. r=kats

https://reviewboard.mozilla.org/r/19493/#review17489
Comment on attachment 8662060 [details]
MozReview Request: Bug 1200063 - Make sure TaskThrottler sends its pending repaint request eventually, even if it never receives a notification from the previous request. r=kats

https://reviewboard.mozilla.org/r/19489/#review17493

Looks good, thanks!
Attachment #8662060 - Flags: review?(bugmail.mozilla) → review+
Attachment #8662103 - Flags: review?(bugmail.mozilla) → review+
Comment on attachment 8662103 [details]
MozReview Request: Bug 1200063 - Rename MakeAPZCInstance to NewAPZCInstance for consistency. r=kats

https://reviewboard.mozilla.org/r/19501/#review17497
Comment on attachment 8662415 [details] [diff] [review]
Rename MakeAPZCInstance to NewAPZCInstance for consistency

Argh, ignore this. Apologies, I typed `hg bz` :(.
Blocks: 1237905
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: