Closed Bug 1147953 Opened 5 years ago Closed 5 years ago

Vsync Refresh Driver Stops Painting Content on Windows 10

Categories

(Core :: Graphics, defect, P2)

x86_64
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox38 --- unaffected
firefox39 --- affected
firefox40 --- fixed

People

(Reporter: mchang, Assigned: mchang)

References

()

Details

Attachments

(2 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1144317 +++

When we enable the vsync refresh driver on Windows 10, technical preview 10041, content and chrome seems to stop painting. Users cannot click on anything and keyboard keys do not show on the screen. This seems to occur regardless of video card. OMTC is enabled and the vsync compositor works fine.

Can reproduce with windows 10 TP on vmware fusion on a macbook pro as well. Find and fix this.
See Also: → 1147753
DwmGetCompositionTimingInfo and DwmFlush APIs are pretty different on Windows 10. DwmFlush wakes up before the vblank time, so when QueryPerformanceCounter returns, it returns a timestamp before vblank according to DwmGetCompositionTimingInfo. This creates a negative timestamp, but since QPC_TIME is an unsigned unsigned int, it becomes a very large number. Since we subtract this number from TimeStamp::Now(), we skip refresh driver ticks since we can't have timestamps in the past, causing nothing to paint.
Duplicate of this bug: 1148002
Fallback to software vsync on Windows 10. The timestamps of when DwmFlush wakes up as well as the timestamps provided by DwmGetCompositionTimingInfo don't make a lot of sense and vary wildly, even more so than software timers. For example, I was getting +- 5ms to +-40 ms on a test program outside of gecko. I tried using D3DKMTWaitForVerticalBlankEvent, but on Windows 10, the documentation says that it is only valid and reserved for system use / windows drivers. I couldn't find any documentation regarding DwmGetCompositionTimingInfo changes on Windows 10, so for now let's fallback to software vsync on Windows 10 and check back when Windows 10 gets closer to release.

As a side note, I filed a feedback form with Windows 10, so we'll see if that fixes anything.
Attachment #8584154 - Flags: review?(jmuizelaar)
(In reply to Mason Chang [:mchang] from comment #3)
> Created attachment 8584154 [details] [diff] [review]
> Fallback to software vsync on Windows 10
> 
> Fallback to software vsync on Windows 10. The timestamps of when DwmFlush
> wakes up as well as the timestamps provided by DwmGetCompositionTimingInfo
> don't make a lot of sense and vary wildly, even more so than software
> timers. For example, I was getting +- 5ms to +-40 ms on a test program
> outside of gecko. I tried using D3DKMTWaitForVerticalBlankEvent, but on
> Windows 10, the documentation says that it is only valid and reserved for
> system use / windows drivers. I couldn't find any documentation regarding
> DwmGetCompositionTimingInfo changes on Windows 10, so for now let's fallback
> to software vsync on Windows 10 and check back when Windows 10 gets closer
> to release.
> 
> As a side note, I filed a feedback form with Windows 10, so we'll see if
> that fixes anything.

Should we land a fix for this? You're testing against a prerelease os, this bug will likely be fixed in the rc. I'm worried this will land and then be forgotten. 

Do Windows 10 users have a work around? Can this new vsync feature be disabled via a pref for example?
(In reply to Jim Mathies [:jimm] from comment #4)
> (In reply to Mason Chang [:mchang] from comment #3)
> > Created attachment 8584154 [details] [diff] [review]
> > Fallback to software vsync on Windows 10
> > 
> > Fallback to software vsync on Windows 10. The timestamps of when DwmFlush
> > wakes up as well as the timestamps provided by DwmGetCompositionTimingInfo
> > don't make a lot of sense and vary wildly, even more so than software
> > timers. For example, I was getting +- 5ms to +-40 ms on a test program
> > outside of gecko. I tried using D3DKMTWaitForVerticalBlankEvent, but on
> > Windows 10, the documentation says that it is only valid and reserved for
> > system use / windows drivers. I couldn't find any documentation regarding
> > DwmGetCompositionTimingInfo changes on Windows 10, so for now let's fallback
> > to software vsync on Windows 10 and check back when Windows 10 gets closer
> > to release.
> > 
> > As a side note, I filed a feedback form with Windows 10, so we'll see if
> > that fixes anything.
> 
> Should we land a fix for this? You're testing against a prerelease os, this
> bug will likely be fixed in the rc. I'm worried this will land and then be
> forgotten. 
> 
> Do Windows 10 users have a work around? Can this new vsync feature be
> disabled via a pref for example?

I'm not quite sure I understand what you mean. All the vsync work is currently behind a preference and this particular bug was caused by bug 1144317, which was backed out. I file a follow up bug as Windows gets closer to release to test again.

Windows 10 users will fallback to software timers, which is currently what we have before the vsync work. They should mostly no worse off than they are today. Does that answer your question?
(In reply to Mason Chang [:mchang] from comment #5)
> Windows 10 users will fallback to software timers, which is currently what
> we have before the vsync work. They should mostly no worse off than they are
> today. Does that answer your question?

The current windows code does use DwmGetCompositionTimingInfo for timing, which typically does better than software timers. Not sure how well it does on Windows 10 though, as it wasn't around when the pre-silk windows vsync code landed.
> I'm not quite sure I understand what you mean. All the vsync work is
> currently behind a preference and this particular bug was caused by bug
> 1144317, which was backed out. I file a follow up bug as Windows gets closer
> to release to test again.
> 
> Windows 10 users will fallback to software timers, which is currently what
> we have before the vsync work. They should mostly no worse off than they are
> today. Does that answer your question?

There's no point in landing this patch, in fact it hurts you a bit since windows 10 users can't test for you. I'd suggest you not hard code windows 10 to software, and ask those users to flip 'gfx.vsync.refreshdriver' to test when they get new releases. The dwm api problems you're seeing should be fixed by the rc.
(In reply to Jim Mathies [:jimm] from comment #7)
> > I'm not quite sure I understand what you mean. All the vsync work is
> > currently behind a preference and this particular bug was caused by bug
> > 1144317, which was backed out. I file a follow up bug as Windows gets closer
> > to release to test again.
> > 
> > Windows 10 users will fallback to software timers, which is currently what
> > we have before the vsync work. They should mostly no worse off than they are
> > today. Does that answer your question?
> 
> There's no point in landing this patch, in fact it hurts you a bit since
> windows 10 users can't test for you. I'd suggest you not hard code windows
> 10 to software, and ask those users to flip 'gfx.vsync.refreshdriver' to
> test when they get new releases. The dwm api problems you're seeing should
> be fixed by the rc.

Well, if we don't land this patch, we can't land silk at all for Windows 7 and Windows 8 users, which already show good performance benefits. We want to enable silk and enable it by default for the platforms we care about, and it's already enabled by default on b2g and OS X. Yes, they should be fixed by RC, so we can just delete this patch once RC comes around.
Updated to adjust the timestamp to Now in case of negative timestamps.
Attachment #8584154 - Attachment is obsolete: true
Attachment #8584154 - Flags: review?(jmuizelaar)
Attachment #8584187 - Flags: review?(jmuizelaar)
Comment on attachment 8584187 [details] [diff] [review]
Fallback to software timers on Windows 10

Review of attachment 8584187 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +2158,5 @@
> +  // bug 1147953 - Windows 10 TP 10041 DwmComposition produces very
> +  // different and non-uniform timestamps. DwmFlush also wakes up the thread
> +  // before DwmGetCompositionTimingInfo vblank time. Disable actual vsync
> +  // on Windows 10 until it gets closer to release to double check again.
> +  if (IsWin10OrLater()) {

What if we also add an exception depending on a preference such as gfx.vsync.hw-vsync.forced ?

Otherwise, this is hardcoded to never work, but if there's a new Windows 10 build which fixes the issue or improve it, or if there are some other conditions which could make HW vsync behave as we expect it to, no one will be able to find out since it's just disabled by this code and that's it - until the next time you give it a go.

This is no longer an issue of it absolutely doesn't work like it was initially since the guard against negative adjustment should make it work.

Let's allow people to play with it even when we think it doesn't work reliably, like we do with D2D1.1 and other gfx stuff which is disabled by default but still allows users to experiment with by forcing the "dodgy" code paths.
I just verified that DwmComposition on Windows 10 in our current refresh driver implementation is also broken. It always falls back to software timers.
Duplicate of this bug: 1148160
No, I'm not a fan of letting users experiment with dodgy code paths. In my opinion, Windows 10 is pre-release software and we shouldn't spend a lot of time supporting pre-release software. We also have the problem where DwmFlush wakes up a lot sooner than vsync as well than on Windows 7/8, so they will tick the refresh driver more often, which will be even jankier than software timers and burn CPU. Realistically, the refresh driver code path from Vsync -> RefreshDriver doesn't really matter where vsync comes from. The refresh drivers should just tick. As long as the patch is roughly equal to what we have now, I'm ok turning this on so we can land silk on Windows 7, 8, and 8.1. I'd much rather users test Windows 10 again after we internally have verified that it works rather than have them crash into our bugs.
Would it be possible to allow the Vsync Refresh Driver to fall back to the software timers automatically? Say if its mean frequency is nowhere near the reported refresh rate, or if it varies wildly. Playing with an implementation of my own a few years back on Windows 7, I found that the DWM would sometimes be completely unusable on my laptop, with DWMFlush() returning with wildly varying times or sometimes even immediately. I assume that was a driver bug of some sort, but AMD's legacy drivers are the latest available for my laptop so it might well still be there.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #14)
> Would it be possible to allow the Vsync Refresh Driver to fall back to the
> software timers automatically? Say if its mean frequency is nowhere near the
> reported refresh rate, or if it varies wildly. Playing with an
> implementation of my own a few years back on Windows 7, I found that the DWM
> would sometimes be completely unusable on my laptop, with DWMFlush()
> returning with wildly varying times or sometimes even immediately. I assume
> that was a driver bug of some sort, but AMD's legacy drivers are the latest
> available for my laptop so it might well still be there.

So we do that already. If DwmComposition is unavailable, we fallback to software timers. This current patch checks for crazy values and resets them to sane values and ticks. The problem is that the thread should sleep until DwmFlush() finishes, which blocks the thread until DWM finishes composting. In theory, DWM should control all compositing across the OS, so Gecko going ahead of DWM doesn't really help because DWM will just delay showing our content to the screen until it's ready. I think on Windows 8 and on, you have to use DWM.

In terms of DWMFush returning wildly varying times, yes you are right. We correct for this by using DwmGetCompositionTimingInfo to get the actual vblank time, we don't use Now() from when DWMFlush() finishes. We then get the actual vblank time and feed that through to the refresh driver.
(In reply to Mason Chang [:mchang] from comment #15)
> (In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #14)
> > Would it be possible to allow the Vsync Refresh Driver to fall back to the
> > software timers automatically? Say if its mean frequency is nowhere near the
> > reported refresh rate, or if it varies wildly. Playing with an
> > implementation of my own a few years back on Windows 7, I found that the DWM
> > would sometimes be completely unusable on my laptop, with DWMFlush()
> > returning with wildly varying times or sometimes even immediately. I assume
> > that was a driver bug of some sort, but AMD's legacy drivers are the latest
> > available for my laptop so it might well still be there.
> 
> So we do that already. If DwmComposition is unavailable, we fallback to
> software timers. This current patch checks for crazy values and resets them
> to sane values and ticks. The problem is that the thread should sleep until
> DwmFlush() finishes, which blocks the thread until DWM finishes composting.
> In theory, DWM should control all compositing across the OS, so Gecko going
> ahead of DWM doesn't really help because DWM will just delay showing our
> content to the screen until it's ready. I think on Windows 8 and on, you
> have to use DWM.
> 
> In terms of DWMFush returning wildly varying times, yes you are right. We
> correct for this by using DwmGetCompositionTimingInfo to get the actual
> vblank time, we don't use Now() from when DWMFlush() finishes. We then get
> the actual vblank time and feed that through to the refresh driver.

Sorry forgot your main question! Yes we could have another check to make sure we get sane values or if the mean strays a lot, but this hasn't been a problem on Windows 7 and 8. I'd really only like to build this iff we have to. OS X has as flat as a line as you can get. b2g on a flame varies +- 0.01 ms. Windows should be able to do this, and Dwm on windows 7 and 8 can.
*note - I only tested on VMware, will test on real hardware tomorrow, but I expect the same thing since the issues in bug 1144317 reproduced in a VM.
One piece of information that's missing here is the graphics overall strategy for Windows 10.  Stay tuned for that, and I'll make this depend on bug 1148406 so that we know we need to get back to it.
Depends on: 1148406
I just chatted with :jimm. We're ok landing this patch for now and filing a follow up patch to double check the DWM APIs once Windows 10 gets closer to release.
Blocks: 1148481
Comment on attachment 8584187 [details] [diff] [review]
Fallback to software timers on Windows 10

Tested on real hardware thanks to Avih. The DwmGetCompositionTimingInfo information makes sense, but DwmFlush wakes up before the vblank time now, causing an negative timestamp. I guess this would've been caught since we had an assert there if we had Windows 10 integration testing. Reworking.
Attachment #8584187 - Flags: review?(jmuizelaar)
The main difference is that on Windows 7 and 8, DwmFlush always wakes up AFTER the reported vsync timestamp from DwmGetCompositionTimingInfo. On Windows 10, this is the inverse. Therefore, we had a very large negative adjustment due to QPC_TIME being unsigned and skipped refresh driver ticks since refresh driver ticks must always go forward in time. This patch allows for negative adjustment values to calculate the vsync timestamp. On Windows 10, the vsync refresh driver with this patch works locally.

The problem of bad vsync timestamps from DwmGetCompositionTimingInfo before might be from running in a VM.
Attachment #8584754 - Flags: review?(jmuizelaar)
Attachment #8584754 - Flags: review?(jmuizelaar) → review+
(In reply to Mason Chang [:mchang] from comment #21)
> The main difference is that on Windows 7 and 8, DwmFlush always wakes up
> AFTER the reported vsync timestamp from DwmGetCompositionTimingInfo. On
> Windows 10, this is the inverse. Therefore, we had a very large negative
> adjustment due to QPC_TIME being unsigned and skipped refresh driver ticks
> since refresh driver ticks must always go forward in time.

This is only speculation, but perhaps they moved composition time to somewhere during VDraw to avoid situations where software tries to present during VBlank and ends up racing with composition (which definitely happens in Windows 7). IIRC OS X also does something like this with their compositor.
By the way, I stopped experiencing this bug on the 2015-03-28 Nightly build. Has something changed?
(In reply to Josh Tumath from comment #25)
The change got backed out. bug 1144317 comment 32.
I tested the timings received from DwmGetCompositionTimingInfo on Windows 8.1 on VMWare Fusion on a macbook pro. The timings are also quite inconsistent on Windows 8, but DwmFlush still executes after the vblank times. I also tested the vsync refresh driver in this configuration and it works. The main assumption that was broken on Windows 10 was that DwmFlush wakes up after the vblank time according to DwmGetCompositionTimingInfo, so the adjustment for vsync timestamps would be positive.

With the patch in inbound, we allow for DwmFlush to wake up before DwmGetCompositionTimingInfo and correct the timestamps in case we get a large negative timestamp. In this error case, we won't be smooth but at least we'll paint. I don't think we should have any expectation of being smooth if the vsync timestamps are flaky, which is the case in a VM.
https://hg.mozilla.org/mozilla-central/rev/b63d5342d4aa
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Did it manage to land in 39? If not, should we consider uplifting this?
Comment on attachment 8584754 [details] [diff] [review]
Allow negative adjustment timestamps

Approval Request Comment
[Feature/regressing bug #]: Bug 1140723, Silk on Windows
[User impact if declined]: The user can get incorrect timestamps on Windows 10, causing some animations to be keyed off incorrectly if OMTA is enabled.
[Describe test coverage new/current, TreeHerder]: Manual, no tests due to lack of Windows 10 testing infrastructure.
[Risks and why]: Low, this patch fixes an invalid assumption on Windows 10.
[String/UUID change made/needed]: None

I think this is right, aurora is 39 now right?
Attachment #8584754 - Flags: approval-mozilla-aurora?
Please, please do not uplift this.  We really have no capacity to deal with any additional regressions potentially coming from things that land directly on aurora.
To be clear - if we don't have the vsync refresh driver enabled on Windows in 39, there is no point in uplifting the fix to "if the refresh driver is enabled", and we do not want the vsync refresh driver in 39.
(In reply to Milan Sreckovic [:milan] from comment #33)
> To be clear - if we don't have the vsync refresh driver enabled on Windows
> in 39, there is no point in uplifting the fix to "if the refresh driver is
> enabled", and we do not want the vsync refresh driver in 39.

I'm more worried about passing invalid timestamps to the vsync compositor on Windows 10 and then this hitting release. I don't think we have OMTA enabled on desktop, but if anything uses the timestamp from the compositor, it will be wrong. I won't enable the refresh driver on 39.
Fair enough, but if we're worried about anything, we would disable compositor vsync in 39 as well.
(In reply to Milan Sreckovic [:milan] from comment #35)
> Fair enough, but if we're worried about anything, we would disable
> compositor vsync in 39 as well.

Sure, that sounds like a good backup plan.
OMTA was just enabled in bug 980770, which uses the timestamps from the compositor. This needs to be uplifted to 39 for Windows 10 users or we disable the vsync compositor on 39. I'd rather uplift this to 39 since the vsync compositor is already enabled and stable so far.
Comment on attachment 8584754 [details] [diff] [review]
Allow negative adjustment timestamps

From irc, Milan requested that we disable the vsync compositor on Windows to contain risk. Cancelling request.
Attachment #8584754 - Flags: approval-mozilla-aurora?
Disable the vsync compositor in gecko 39 in bug 1150223.
Depends on: 1190257
You need to log in before you can comment on or make changes to this bug.