Closed Bug 1125999 Opened 5 years ago Closed 5 years ago

dom/svg/test/test_pathAnimInterpolation.xhtml | application timed out w/ silk refresh driver enabled

Categories

(Core :: Graphics, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox36 --- wontfix
firefox37 --- wontfix
firefox38 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: mchang, Assigned: mchang)

References

Details

Attachments

(3 files)

From https://treeherder.mozilla.org/#/jobs?repo=try&revision=17f8b0217a92 - 

The test is timing out with the refresh driver enabled, and only the refresh driver enabled. e.g. compositor + touch resampling are disabled.

From an initial investigation, it seems to only occur with the child refresh driver enabled. The timeout occurs due to the test_end mochitest notification not occurring. Without Silk, this should occur, but doesn't.

I/GeckoDump(  434): ⰲ겿{"action":"test_end","time":1422305807113,"thread":"","js_source":"TestRunner","pid":null,"source":"mochitest","test":"/tests/dom/svg/test/test_pathLength.html","status":"OK","message":"Finished in 917ms","extra":{"runtime":917}}ⰲ겿

Investigate and find out why.
I think all the test time outs probably suffer the same problem (M7, M9, Debug M14, Debug M17, Debug M20). Each suite contains a large test that all executes on the main thread. While this test is running, the software vsync thread is sending vsync notifications to the child process, main thread. Since the test is executing and blocking, a large number of vsync notifications pile up on the main thread to tick the refresh driver on the content process. Once the test finishes, the Test framework calls into the SpecialPowers (https://dxr.mozilla.org/mozilla-central/source/testing/specialpowers/content/specialpowersAPI.js#847), which posts another task on the main thread with setTimeout. The posted task is what actually executes the TestFinished code in mochitest. However, this posted task is behind a large number of run refresh driver tasks, so it never executes within the timeout period.

On my local machine, which has 32 gigs of ram + quad core, the test takes 13,631 ms to execute, which is (13,631 / 16.6 = 821 posted tasks to finish before getting to the TestFinished task. In addition, while these tasks are executing, more vsync tasks are coming every 16.6 ms. Locally, I tested a "one-shot" type of vsync notification and the test passed locally. Waiting on try before confirming.
BTW, the current refresh driver implementation will only schedule the next refresh driver tick after the current refresh driver has ticked, so there is no pile up.
This is my simple test. Wait for try result.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1519a4ea85e

1) use early return if the tick is useless.
2) reuse messageloop in softwareVsync. In desktop, I saw a lot of Enable/DisableVsync() call and we create and delete the new thread for several times.
From comment 3 - the try still has a timeout in Debug M14 and Opt M7. 

(In reply to Jerry Shih[:jerry] (UTC+8) from comment #3)
> This is my simple test. Wait for try result.
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1519a4ea85e
> 
> 1) use early return if the tick is useless.
> 2) reuse messageloop in softwareVsync. In desktop, I saw a lot of
> Enable/DisableVsync() call and we create and delete the new thread for
> several times.

We shouldn't have an implementation that knows whether or not the vsync has a message loop, nor should it depend only on the software vsync.

Try with a one shot type of refresh driver - https://treeherder.mozilla.org/#/jobs?repo=try&revision=f3cf0c24ede4 - All the timeouts are gone. The implementation is terrible though, re-working.
(In reply to Mason Chang [:mchang] from comment #4)
> From comment 3 - the try still has a timeout in Debug M14 and Opt M7. 
> 
> (In reply to Jerry Shih[:jerry] (UTC+8) from comment #3)
> > This is my simple test. Wait for try result.
> > https://treeherder.mozilla.org/#/jobs?repo=try&revision=b1519a4ea85e
> > 
> > 1) use early return if the tick is useless.
> > 2) reuse messageloop in softwareVsync. In desktop, I saw a lot of
> > Enable/DisableVsync() call and we create and delete the new thread for
> > several times.
> 

I still got timeout with this implementation, so this is not work.

> We shouldn't have an implementation that knows whether or not the vsync has
> a message loop, nor should it depend only on the software vsync.
> 

I only change the SoftwareVsyncSource, not the framework part.


> Try with a one shot type of refresh driver -
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=f3cf0c24ede4 - All
> the timeouts are gone. The implementation is terrible though, re-working.
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #5)
> I only change the SoftwareVsyncSource, not the framework part.

I looked at the patch pushed to try. So it's not really re-using the message loop, just not starting / stopping the thread right? I think we should still keep this behavior since that's what all the real platforms do as well (At least OSX and b2g).
I've been thinking of a couple of ways to throttle the messages. I don't really want to check within the IPC system that a flood of messages have occured since that seems like bad design. Another option is to have a keepalive IPC message where if we don't get a response back from the child every couple of IPC messages, then we stop listening to vsync. When we process the vsync messages, we can send theupdated keepalive. But since we don't have a specific number of IPC messages that have been sent, it would be complicated to figure out when the message queue has been flushed. Starting / stopping the refresh driver timer at specific times would break how the refresh driver already starts/stops timers with the various other timers. Keeping a queue of sent messages doesn't really work either since the child always gets them on the main thread, so when the message is acted upon depends on the IPC pipe.

The current refresh driver implementation without silk re-schedules the next timer once the current tick has finished and is a "one shot" type of timer. Jerry originally went with this option in the first implementation. I think we may have to go with this option since the amount of time between refreshes can be long due to the main thread being blocked. Figuring out some way in the parent to detect if the child is busy seems like a bad design.

The only wrinkle is that we would constantly enable/disable vsync due to how it's currently implemented. I think we can probably change this too such that we only enable/disable vsync when we start/stop timers.
large number of pending vsync ticks in message queue if child's main thread has
a heavy task at previous frame.

It might solve the timeout problem that we have a lot of pending vsync tick during
the long long test.
Attachment #8555780 - Flags: feedback?(bent.mozilla)
Comment on attachment 8555780 [details] [diff] [review]
Compress vsync IPC messages sent to the content process

From try - https://treeherder.mozilla.org/#/jobs?repo=try&revision=de41cd35eef6 - Compressing vsync notifications in the content process works pretty well! Will create a part 2 of this patch for the parent process which enforces the same behavior.
Attachment #8555780 - Flags: feedback?(bent.mozilla) → review?(bent.mozilla)
Attachment #8555780 - Attachment description: With "compress", we can skip the overdue vsync tick. Thus, we will not have a → Compress vsync IPC messages sent to the content process
IIRC compressed messages mean that only the most recent IPC message is processed? Been googling through the IPDL docs on MDN but couldn't find mention of the compress keyword.
(In reply to Mason Chang [:mchang] from comment #11)
> IIRC compressed messages mean that only the most recent IPC message is
> processed? Been googling through the IPDL docs on MDN but couldn't find
> mention of the compress keyword.

This is correct.
Hi Mason,
the last change of "compress" message is at bug 1076820.
https://bugzilla.mozilla.org/show_bug.cgi?id=1076820#c10
I think the most recent one is processed.
Attachment #8555780 - Flags: review?(bent.mozilla) → review+
WIP - Compress vsync notifications sent to the refresh driver in the parent process. Waiting on try before asking for review.
Compressing vsync notifications on the main thread looks good too - https://treeherder.mozilla.org/#/jobs?repo=try&revision=2e5f6beb160c
Attachment #8556120 - Flags: review?(bugmail.mozilla)
Duplicate of this bug: 1125032
For attachment 8556120 [details] [diff] [review], we prevent the main thread from piling vsync notifications. When a vsync occurs, we update a vsync timestamp and only post a task if no task has been posted before. If the main thread in the parent process is busy, the posted task will not execute and we just update the latest vsync timestamp. When the posted task to tick the refresh drivers occurs, it will use the most recent vsync timestamp and allow tasks from the vsync thread to post again. This essentially mimics the behavior of the compress keyword on IPDL for the parent process vsync notifications.
Attachment #8556120 - Flags: review?(bugmail.mozilla) → review+
Duplicate of this bug: 1078179
https://hg.mozilla.org/mozilla-central/rev/77f201d0f462
https://hg.mozilla.org/mozilla-central/rev/0cdf611d1e9f
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Comment on attachment 8555780 [details] [diff] [review]
Compress vsync IPC messages sent to the content process

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 1123762, refresh driver for silk
User impact if declined: We cannot enable silk for the refresh driver
Testing completed: Manual testing and mochitests.
Risk to taking this patch (and alternatives if risky): Low. This patch prevents the main thread from being flooded with vsync messages and is required to pass mochitests with silk enabled.
String or UUID changes made by this patch: None
Attachment #8555780 - Flags: approval-mozilla-b2g37?
Comment on attachment 8556120 [details] [diff] [review]
WIP - Part 2: Compress vsync notifications in parent process

See comment 21.
Attachment #8556120 - Flags: approval-mozilla-b2g37?
Attachment #8555780 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Attachment #8556120 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Blocks: 1130678
You need to log in before you can comment on or make changes to this bug.