Closed Bug 1127151 Opened 5 years ago Closed 5 years ago

Create a Vsync Source on Windows

Categories

(Firefox :: General, defect, P1)

x86_64
Windows 7
defect

Tracking

()

RESOLVED FIXED
Firefox 39
Tracking Status
firefox38 --- fixed

People

(Reporter: mchang, Assigned: mchang)

References

()

Details

Attachments

(4 files, 4 obsolete files)

We have hardware vsync sources on OSX + b2g. Create the same thing on Windows platforms.
Component: Performance → General
Product: Firefox OS → Firefox
Hardware: ARM → x86_64
Target Milestone: --- → Firefox 39
Version: unspecified → Trunk
From an initial investigation with windows outside of Firefox, using the WaitForVBlank API - https://msdn.microsoft.com/en-us/library/windows/desktop/bb174559%28v=vs.85%29.aspx. 

It looks like Windows makes it pretty explicit about which GPU and monitor we're on and it waits for vblank on a specific monitor/GPU combination. It also blocks the current thread rather than making a dedicated vsync thread like OSX/b2g. Thus the vsync message may be bad for specific gpu/monitor combinations whereas OS X has a specific vsync rate that works across all connected monitors. 

The option I'm going to try is to create a new nsThread like the Software vsync thread that WaitForVblank and sends a vsync notification. The problem with this is that it is noisy compared to the other platforms. We'll have to get the TimeStamp of when the thread woke up instead of when the vsync actually occurred. From limited testing, this is still better than software timers, but much noiser than OS X. e.g. 16.64-16.68 ms ranges from my limited, non stress-tested testing.
On platforms where the Desktop Window Manager (DWM) exists (Windows Vista and up) and is enabled (guaranteed on Windows 8 and up), DwmGetCompositionTimingInfo [1] should give you the QueryPerformanceCounter timestamp of the last VBlank (or the timestamp of the last composition, which may be more relevant). There's also DwmFlush [2], which does something similar to WaitForVBlank (but I'm not sure how they compare exactly).

Also, does DXGI require Direct3D 10+ hardware?

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/aa969514.aspx
[2] https://msdn.microsoft.com/en-us/library/windows/desktop/dd389405.aspx
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #2)
> On platforms where the Desktop Window Manager (DWM) exists (Windows Vista
> and up) and is enabled (guaranteed on Windows 8 and up),
> DwmGetCompositionTimingInfo [1] should give you the QueryPerformanceCounter
> timestamp of the last VBlank (or the timestamp of the last composition,
> which may be more relevant). There's also DwmFlush [2], which does something
> similar to WaitForVBlank (but I'm not sure how they compare exactly).
> 
> Also, does DXGI require Direct3D 10+ hardware?
> 
> [1] https://msdn.microsoft.com/en-us/library/windows/desktop/aa969514.aspx
> [2] https://msdn.microsoft.com/en-us/library/windows/desktop/dd389405.aspx

I don't know if DXGI requires D3D 10+ hardware, but requires D3D10. D3D10 looks like Windows Vista, so I think it's probably a safe assumption? I can't find the link, but I think in the worst case, windows falls back to a software display.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #2)
> On platforms where the Desktop Window Manager (DWM) exists (Windows Vista
> and up) and is enabled (guaranteed on Windows 8 and up),
> DwmGetCompositionTimingInfo [1] should give you the QueryPerformanceCounter
> timestamp of the last VBlank (or the timestamp of the last composition,
> which may be more relevant). There's also DwmFlush [2], which does something
> similar to WaitForVBlank (but I'm not sure how they compare exactly).
> 
> Also, does DXGI require Direct3D 10+ hardware?
> 
> [1] https://msdn.microsoft.com/en-us/library/windows/desktop/aa969514.aspx
> [2] https://msdn.microsoft.com/en-us/library/windows/desktop/dd389405.aspx

Thanks for these tips! So I tried DwmGetCompositionTimingInfo which gives us the last timestamp of the previous composite, but it doesn't actually wait for a vsync, so it would be useful to ensure clean timestamps. I also tried DwmFlush, which is similar to WaitForVBlank but much noiser in terms of when it wakes up than WaitForVBlank. The problem is that WaitForVBlank actually kills the whole system, which I'm not sure why. Still investigating, maybe a combination of DwmFlush + getting the timestamp from [1] may be the right way to go.
So the Dwm should be calling WaitForVBlank so I'm surprised that having another process call it would ruin the system.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #5)
> So the Dwm should be calling WaitForVBlank so I'm surprised that having
> another process call it would ruin the system.

Yes... I am very confused. Actually creating a separate vsync thread that just does this:

for (;;) {
  if (!vsyncEnabled) return;
  IDXGIOutput::WaitForVBlank();
}

Kills the whole thing. BUT doing this spin loop:

for (;;) {
  if (!vsyncEnabled) return;
  sleep(1);
  IDXGIOutput::WaitForVBlank();
}

Works just fine....
Is WaitForVBlank actually working? or is the loop spinning a lot more without the sleep?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #7)
> Is WaitForVBlank actually working? or is the loop spinning a lot more
> without the sleep?

Yeah the WaitForVBlank is actually working. I ran Firefox through VTune and the thread is actually still sleeping without the sleep. So it's not spin looping CPU.
Attached patch Create Windows Vsync Source (obsolete) — Splinter Review
Create a hardware vsync source using DwmFlush and DwmGetCompositionTimingInfo. This creates a separate thread that just loops forever for DwmFlushes() and grabs the last vblank time from DwmGetCompositionTimingInfo. This works pretty well on my windows machine whereas WaitForVBlank doesn't.

@Bas - I was having problems getting WaitForVBlank to work correctly. Just creating a new vsync thread + calling IDXGIOutput->WaitForVBlank() and doing nothing else in a loop killed performance on Firefox. I couldn't type anything or load any menus. Interestingly, if I put a sleep(1); WaitForVBlank(), everything worked fine. Any thoughts on why this could occur? I already verified that the thread is actually sleeping and not using CPU in vtune.

Something that may help is that we create the Vsync thread during gfxPlatform creation time. So during gfxPlatform::init(), I get the adapter via GetDXGIAdapter(). Then in the vsync thread loop:

IDXGIOutput mOutput;
GetDXGIAdapter()->EnumOutputs(0, mOutput);

vsync loop:
for (;;) {
  mOutput->WaitForVBlank().
}

I have a feeling that I'm just using it too early or calling it in some bad context. Any hints would be very useful. Thanks!
Flags: needinfo?(bas)
Comment on attachment 8557432 [details] [diff] [review]
Create Windows Vsync Source

Review of attachment 8557432 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +1943,5 @@
> +          }
> +
> +          // Use a combination of DwmFlush + DwmGetCompositionTimingInfoPtr
> +          // The qpcVBlank is always AFTER Now(), so it behaves like b2g
> +          // Using mPrimaryDisplay->WaitForVBlank, the whole system dies :/

Using WaitForVBlank here produces lockups everywhere.

@@ +1970,5 @@
> +      Monitor mVsyncEnabledLock;
> +      base::Thread* mVsyncThread;
> +      bool mVsyncEnabled;
> +      nsRefPtr<IDXGIAdapter1> mAdapter;
> +      IDXGIOutput* mPrimaryDisplay;

Will clean these up after figuring out if its possible to use WaitForVBlank().
On the plus side, this + APZ on windows while holding down the "down" arrow keypad browsing engadget, it's a perfectly smooth scroll with almost no jank. Without silk + APZ, engadget is super janky.
This needs at least a check for Windows XP, since the DWM was introduced with Windows Vista. In addition, it can be turned off dynamically - you can use DwmIsCompositionEnabled [1] to check if it's active. These checks could be omitted on Windows 8+, where the DWM is always active. I assume DwmFlush will return immediately if the compositor is disabled, but I haven't tested this.

It would also be nice to have a solution for Windows XP or when the compositor is disabled. You could do this by presenting continually using a Direct3D 9 device created with a PresentationInterval of D3DPRESENT_INTERVAL_ONE [2], or do something more precise using IDirect3DDevice9::GetRasterStatus [3], Sleep and as little busy-looping as you can get away with (D3DRASTER_STATUS::ScanLine is not guaranteed to be accurate). I've had success in the past using Sleep(n), Sleep(1), Sleep(0) and SwitchToThread() and a bit of statistics, but as you can probably tell this can get hairy ;) I also understand if we choose to just ignore these situations, given that their prevalence should be decreasing.

[1] https://msdn.microsoft.com/en-us/library/windows/desktop/aa969518.aspx
[2] https://msdn.microsoft.com/en-us/library/windows/desktop/bb172588.aspx
[3] https://msdn.microsoft.com/en-us/library/windows/desktop/bb174402.aspx
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #12)
> This needs at least a check for Windows XP, since the DWM was introduced
> with Windows Vista. In addition, it can be turned off dynamically - you can
> use DwmIsCompositionEnabled [1] to check if it's active. These checks could
> be omitted on Windows 8+, where the DWM is always active. I assume DwmFlush
> will return immediately if the compositor is disabled, but I haven't tested
> this.
> 
> It would also be nice to have a solution for Windows XP or when the
> compositor is disabled. You could do this by presenting continually using a
> Direct3D 9 device created with a PresentationInterval of
> D3DPRESENT_INTERVAL_ONE [2], or do something more precise using
> IDirect3DDevice9::GetRasterStatus [3], Sleep and as little busy-looping as
> you can get away with (D3DRASTER_STATUS::ScanLine is not guaranteed to be
> accurate). I've had success in the past using Sleep(n), Sleep(1), Sleep(0)
> and SwitchToThread() and a bit of statistics, but as you can probably tell
> this can get hairy ;) I also understand if we choose to just ignore these
> situations, given that their prevalence should be decreasing.
> 
> [1] https://msdn.microsoft.com/en-us/library/windows/desktop/aa969518.aspx
> [2] https://msdn.microsoft.com/en-us/library/windows/desktop/bb172588.aspx
> [3] https://msdn.microsoft.com/en-us/library/windows/desktop/bb174402.aspx

Ahh good to know! I think realistically, on windows XP we'll just fallback to software vsync and not use hardware vsync. Software vsync is already implemented here - https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/SoftwareVsyncSource.cpp?from=SoftwareVsyncSource.cpp&case=true#1
That will no doubt cause tearing, but then I doubt the approach of 'notify a thread that VBlank has happened, and so we should paint in time for the next one' is really amenable to presenting *inside* VBlank - unless it can be done in less than 0.5ms. You'd want your actual Present() call in the VSync thread for that.
Oh, though if I recall correctly we already create devices with D3DPRESENT_INTERVAL_ONE / D3DPRESENT_INTERVAL_DEFAULT or the Direct3D 10+ equivalent, so that will probably prevent tearing. We'll just drop paint notifications on the floor every now and then (as 60fps tends to be slightly faster than the actual refresh rate).
Mason, I might have missed that on one of the other bugs, but does the new system support "vsync off" style iterations for performance tests? Right now "ASAP mode" (vsync off to iterate as fast as possible) is used extensively in our performance tests because it's a great indication of how fast the system can iterate animations and other rendering iterations.

Is it supported? How would such mode be triggered? Maybe using software mode with very high frequency? or maybe an explicit mode where it iterates as fast as possible?

FWIW, currently it's controlled using the pref layout.frame_rate where -1 is the default (use vsync if available or 60hz otherwise), a positive number is an explicit frequency, and 0 is ASAP mode.
(In reply to Avi Halachmi (:avih) from comment #16)
> Mason, I might have missed that on one of the other bugs, but does the new
> system support "vsync off" style iterations for performance tests? Right now
> "ASAP mode" (vsync off to iterate as fast as possible) is used extensively
> in our performance tests because it's a great indication of how fast the
> system can iterate animations and other rendering iterations.
> 
> Is it supported? How would such mode be triggered? Maybe using software mode
> with very high frequency? or maybe an explicit mode where it iterates as
> fast as possible?
> 
> FWIW, currently it's controlled using the pref layout.frame_rate where -1 is
> the default (use vsync if available or 60hz otherwise), a positive number is
> an explicit frequency, and 0 is ASAP mode.

Ahh good timing, I was going to ping you about it soon. I have to think a bit more about how to do this. Knowing that the tests are controlled with that pref is good to know! I suspect in the short term, we can just not use hardware vsync if that pref is set to -1, so all the old code paths will still execute. In the long run, we may have to use a software mode with a high frequency, but that would be kind of ugly right now. I'll have to think about it more.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #14)
> That will no doubt cause tearing, but then I doubt the approach of 'notify a
> thread that VBlank has happened, and so we should paint in time for the next
> one' is really amenable to presenting *inside* VBlank - unless it can be
> done in less than 0.5ms. You'd want your actual Present() call in the VSync
> thread for that.

Hmm, yeah we shouldn't have tearing with software vsync. Even with the current implementation (not silk), everything is on software timers and the vsync notifications we get in the refresh driver aren't really waiting on vsync. It tries to predict the next vsync but still uses a software timer to schedule the next paint. I think we use Present(0) to wait for the next frame in the compositor to prevent tearing. Lumping the compositor and refresh driver into 1 software timer shouldn't cause anymore tearing than what we already have without silk since both are already on independent software timers.
Attached patch WaitForVBlank Implementation (obsolete) — Splinter Review
Creates and enables silk on Windows using the WaitForVBlank() implementation.

@David - Jeff suggested I try this on another machine to see if it also locks up. Can you please try this patch locally and see if everything becomes 10x slower? Thanks!
Flags: needinfo?(bas)
Attachment #8558045 - Flags: feedback?(dvander)
Attached patch DwmFlush implementation (obsolete) — Splinter Review
If attachment 8558045 [details] [diff] [review], WaitForVBlank is really slow, can you please try this implementation? It uses DwmFlush instead. Just some verification that it doesn't slow down Firefox is enough. Thanks!
Attachment #8558047 - Flags: feedback?(dvander)
Blocks: 1128165
Comment on attachment 8558045 [details] [diff] [review]
WaitForVBlank Implementation

Review of attachment 8558045 [details] [diff] [review]:
-----------------------------------------------------------------

Yes indeed, that's very strange. It seems like vblanks are being delivered at a reasonable rate... I think it's worth digging into why the main thread is blocked. I'll poke at it a bit tonight.
Attachment #8558045 - Flags: feedback?(dvander)
From chatting with dvander, he can reproduce the issue with WaitForVBlank and using the dwmflush method resolves the issue.
Comment on attachment 8558047 [details] [diff] [review]
DwmFlush implementation

Indeed, the DWM variant works fine for me and the WaitForVBlank version is super laggy. I tried digging into why. The furthest I was able to get tonight is that VsyncRefreshDriverTimer::RunRefreshDrivers() very often takes 200-300ms in the bad build. With the Sleep(1) it's usually much less than that. It seems like other messages that come in are getting processed fine.
Attachment #8558047 - Flags: feedback?(dvander) → feedback+
I think we'll go with the DwmFlush + DwmGetCompositionTiming method. WaitForVBlank seems to have some internal lock that causes all d3d calls to spend most of their time waiting for locks. Plus, DwmGetCompositionTiming has more accurate timestamps due to the qpcVBlank info. 

Part 1 exposes the DwmFlush method in WinUtils.
Attachment #8557432 - Attachment is obsolete: true
Attachment #8558045 - Attachment is obsolete: true
Attachment #8558047 - Attachment is obsolete: true
Attachment #8558874 - Flags: review?(jmuizelaar)
Creates a D3DVsyncSource in gfxWindowsPlatform if dwm is enabled, otherwise falls back to software vsync. Since DwmFlush halts the current thread, we create a new thread that calls DwmFlush, much like the SoftwareVsync thread.
Attachment #8558876 - Flags: review?(bugmail.mozilla)
Uses DwmGetCompositionTime in the vsync thread to grab the last vsync time. Then adjusts the vsync timestamp from Now() to when the qpcVBlank occurred. An alternative to adjusting the vsync timestamp would be to implement a direct TimeStamp_Value in TimeStamp_windows.cpp.
Attachment #8558877 - Flags: review?(jmuizelaar)
Attachment #8558877 - Flags: feedback?(bugmail.mozilla)
Comment on attachment 8558877 [details] [diff] [review]
Part 3: Use DwmFlush and DwmGetCompositionTime in vsync thread loop

Review of attachment 8558877 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +1946,5 @@
> +        LARGE_INTEGER qpcNow;
> +        LARGE_INTEGER frequency;
> +        QueryPerformanceFrequency(&frequency);
> +        TimeStamp vsync = TimeStamp::Now();
> +        TimeStamp prev = TimeStamp::Now();

will delete this prev, just realized it's here.
This is hinted at elsewhere in this bug but I figured I'd make it explicit. What is likely happening when we call WaitForVBlank is that we take a DXGI lock and that blocks us from making progress elsewhere. It seems reasonable to assume that WaitForVBlank is intended to be called on the same thread as is doing the Present. (I think I may be able to confirm that this is what the DWM does).

Starting in Windows 8.1 it may make sense to use the GetFrameLatencyWaitableObject method but I haven't looked in detail https://msdn.microsoft.com/en-us/library/windows/apps/dn448914.aspx
Comment on attachment 8558877 [details] [diff] [review]
Part 3: Use DwmFlush and DwmGetCompositionTime in vsync thread loop

Review of attachment 8558877 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +1959,5 @@
> +          // Use a combination of DwmFlush + DwmGetCompositionTimingInfoPtr
> +          // The qpcVBlank is always AFTER Now(), so it behaves like b2g
> +          // Using WaitForVBlank, the whole system locks due to lock contention in WaitForVBlank
> +          WinUtils::dwmFlushProcPtr();
> +          HRESULT hr = WinUtils::dwmGetCompositionTimingInfoPtr(0, &vblankTime);

The first parameter is HWND to the window we're interested at, which implies the monitor we're getting the info for (in case of multiple monitors).

0 is IIRC half-documented way to get the desktop window, which means that the info will apply only to the main monitor. The current implementation of windows vsync ended up with 0 since it was hard to get the handle to the actual window.

However, apparently with Windows 8.1 it must be 0, which I guess will end up with windows choosing the window according to the process which makes this API call - https://msdn.microsoft.com/en-us/library/windows/desktop/aa969514%28v=vs.85%29.aspx

Also, IIRC the first 1/few calls to to this API can end up without results, so if this happens - we shouldn't disable HW timing forever and try few more times before we're giving up.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #29)
> This is hinted at elsewhere in this bug but I figured I'd make it explicit.
> What is likely happening when we call WaitForVBlank is that we take a DXGI
> lock and that blocks us from making progress elsewhere. It seems reasonable
> to assume that WaitForVBlank is intended to be called on the same thread as
> is doing the Present. 
> 
> Starting in Windows 8.1 it may make sense to use the
> GetFrameLatencyWaitableObject method but I haven't looked in detail
> https://msdn.microsoft.com/en-us/library/windows/apps/dn448914.aspx

Ahh thanks for that link. Actually it's explicit in that doc:

"With the flip model swap chain, back buffer "flips" are queued whenever your game calls IDXGISwapChain::Present. When the rendering loop calls Present(), the system blocks the thread until it is done presenting a prior frame, making room to queue up the new frame, before it actually presents." I think you're right, WaitForVBLank is intended to be called on the same thread.

> (I think I may be able to confirm that this is what the DWM does).
I traced through the dissassembly of DwmFlush and ran it through a vtune. I don't see WaitForVBlank ever called in the DwmFlush mechanism and didn't see it in the disassembly, but I might have done a mis-step while debugging. I do think it uses a different path since WaitForVBlank waits for a hardware signal whereas DwmFlush is when all surface updates have been done, not necessarily vblank I think.

> Starting in Windows 8.1 it may make sense to use the
> GetFrameLatencyWaitableObject method but I haven't looked in detail
> https://msdn.microsoft.com/en-us/library/windows/apps/dn448914.aspx

Ahh that would be very nice. Do we know if we want to do that since it isn't backwards compatible to Windows 7? I'd rather not do two vsync implementations.
(In reply to Avi Halachmi (:avih) from comment #30)
> Comment on attachment 8558877 [details] [diff] [review]
> Part 3: Use DwmFlush and DwmGetCompositionTime in vsync thread loop
> 
> Review of attachment 8558877 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/thebes/gfxWindowsPlatform.cpp
> @@ +1959,5 @@
> > +          // Use a combination of DwmFlush + DwmGetCompositionTimingInfoPtr
> > +          // The qpcVBlank is always AFTER Now(), so it behaves like b2g
> > +          // Using WaitForVBlank, the whole system locks due to lock contention in WaitForVBlank
> > +          WinUtils::dwmFlushProcPtr();
> > +          HRESULT hr = WinUtils::dwmGetCompositionTimingInfoPtr(0, &vblankTime);
> 
> The first parameter is HWND to the window we're interested at, which implies
> the monitor we're getting the info for (in case of multiple monitors).
> 
> 0 is IIRC half-documented way to get the desktop window, which means that
> the info will apply only to the main monitor. The current implementation of
> windows vsync ended up with 0 since it was hard to get the handle to the
> actual window.
> 
> However, apparently with Windows 8.1 it must be 0, which I guess will end up
> with windows choosing the window according to the process which makes this
> API call -
> https://msdn.microsoft.com/en-us/library/windows/desktop/aa969514%28v=vs.
> 85%29.aspx

Yeah, I chose 0 since it's explicit that it must be 0 in newer versions of windows. 

> Also, IIRC the first 1/few calls to to this API can end up without results,
> so if this happens - we shouldn't disable HW timing forever and try few more
> times before we're giving up.

Yea, this code checks to ensure that the call succeeded, otherwise we just use the current TimeStamp, which is when the thread woke up and not the actual vblank time.
Comment on attachment 8558877 [details] [diff] [review]
Part 3: Use DwmFlush and DwmGetCompositionTime in vsync thread loop

Review of attachment 8558877 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know anything about windows APIs, so I can't really provide any feedback here. But the way it's integrated into the vsync framework looks ok.
Attachment #8558877 - Flags: feedback?(bugmail.mozilla)
Comment on attachment 8558876 [details] [diff] [review]
Part 2: Create a Vsync Source on Windows

Review of attachment 8558876 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +1909,5 @@
> +        }
> +
> +        MOZ_RELEASE_ASSERT(mVsyncThread->Start(), "Could not start Windows vsync thread");
> +        CancelableTask* vsyncStart = NewRunnableMethod(this,
> +            &D3DVsyncDisplay::VBlankLoop);

This function doesn't exist. Please try to split patches so that no parts depend on future parts.
Attachment #8558876 - Flags: review?(bugmail.mozilla) → review+
Attachment #8558878 - Flags: review?(bugmail.mozilla) → review+
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #34)
> Comment on attachment 8558876 [details] [diff] [review]
> Part 2: Create a Vsync Source on Windows
> 
> Review of attachment 8558876 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/thebes/gfxWindowsPlatform.cpp
> @@ +1909,5 @@
> > +        }
> > +
> > +        MOZ_RELEASE_ASSERT(mVsyncThread->Start(), "Could not start Windows vsync thread");
> > +        CancelableTask* vsyncStart = NewRunnableMethod(this,
> > +            &D3DVsyncDisplay::VBlankLoop);
> 
> This function doesn't exist. Please try to split patches so that no parts
> depend on future parts.

Ahh yeah, but from comment 33, I tried to extract the Windows API parts from the VsyncSource parts. I thought this time it would be appropriate.
Comment on attachment 8558877 [details] [diff] [review]
Part 3: Use DwmFlush and DwmGetCompositionTime in vsync thread loop

If you don't mind, I have some questions about this patch.

> +          if (hr == S_OK) {

The DWM can be disabled dynamically, either by the user or by starting a program that does not support it (such as some games or VNC), or specifically disables it. So I think you need to handle failure here in some way, or check WinUtils::dwmIsCompositionEnabledPtr() directly. It would also be nice if we could try again at some later point, so we don't get stuck with software vsync for the whole session!

> +            QPC_TIME adjust = qpcNow.QuadPart - vblankTime.qpcVBlank;

Is qpcVBlank actually the value we want here? Technically I don't think we care about VBlank *itself* - we care that we're in time for the next composition. So perhaps qpcCompose or even qpcFrameDisplayed would be more suitable, assuming those report sane values.

> +            MOZ_ASSERT(adjust > 0);

Isn't it conceivable that vblank *just* started, and we managed to call QPC before its value changed? More seriously, IIRC QPC values are not guaranteed to be in sync across threads, so this assertion might trip. I would expect the actual difference to be negligible.
Comment on attachment 8558877 [details] [diff] [review]
Part 3: Use DwmFlush and DwmGetCompositionTime in vsync thread loop

Review of attachment 8558877 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxWindowsPlatform.cpp
@@ +1934,5 @@
>          MonitorAutoLock lock(mVsyncEnabledLock);
>          return mVsyncEnabled;
>        }
>  
> +      void VBlankLoop()

What is the overhead of this loop? How do talos numbers look?

@@ +1961,5 @@
> +          // Using WaitForVBlank, the whole system locks due to lock contention in WaitForVBlank
> +          WinUtils::dwmFlushProcPtr();
> +          HRESULT hr = WinUtils::dwmGetCompositionTimingInfoPtr(0, &vblankTime);
> +          vsync = TimeStamp::Now();
> +          if (hr == S_OK) {

use SUCCEEDED(hr) here.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #36)
> Comment on attachment 8558877 [details] [diff] [review]
> Part 3: Use DwmFlush and DwmGetCompositionTime in vsync thread loop
> 
> If you don't mind, I have some questions about this patch.
> 
> > +          if (hr == S_OK) {
> 
> The DWM can be disabled dynamically, either by the user or by starting a
> program that does not support it (such as some games or VNC), or
> specifically disables it. So I think you need to handle failure here in some
> way, or check WinUtils::dwmIsCompositionEnabledPtr() directly. It would also
> be nice if we could try again at some later point, so we don't get stuck
> with software vsync for the whole session!

Oh that's terrible, ok thanks. I'll rework this patch to incorporate this feedback, thanks!

> > +            QPC_TIME adjust = qpcNow.QuadPart - vblankTime.qpcVBlank;
> 
> Is qpcVBlank actually the value we want here? Technically I don't think we
> care about VBlank *itself* - we care that we're in time for the next
> composition. So perhaps qpcCompose or even qpcFrameDisplayed would be more
> suitable, assuming those report sane values.

I'll double check these values tomorrow. I actually do think we care about vblank itself and not necessarily when we composites. If we have variable length composites, we would also get variable length timestamps, which isn't what we want, but I'll check again tomorrow.

> 
> > +            MOZ_ASSERT(adjust > 0);
> 
> Isn't it conceivable that vblank *just* started, and we managed to call QPC
> before its value changed? More seriously, IIRC QPC values are not guaranteed
> to be in sync across threads, so this assertion might trip. I would expect
> the actual difference to be negligible.

Hm, I guess yes that is conceivable. I don't know the inner workings of DwmGetCompositionTimingInfoPtr if it blocks to sync QPC or not, but yes, I can delete this.
(In reply to Jim Mathies [:jimm] from comment #37)
> Comment on attachment 8558877 [details] [diff] [review]
> Part 3: Use DwmFlush and DwmGetCompositionTime in vsync thread loop
> 
> Review of attachment 8558877 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/thebes/gfxWindowsPlatform.cpp
> @@ +1934,5 @@
> >          MonitorAutoLock lock(mVsyncEnabledLock);
> >          return mVsyncEnabled;
> >        }
> >  
> > +      void VBlankLoop()
> 
> What is the overhead of this loop? How do talos numbers look?

The overhead should be rather small, otherwise we would have lots of problems! It only posts tasks on other threads. On my Macbook Pro on OS X (which has the exact same loop, just listening to vsync), it usually takes ~0.01 milliseconds. Even with ~10 windows open, still only takes ~0.02 milliseconds. The DwmFlush puts the thread to sleep so it's not burning CPU. We already do the same thing with software timers in the refresh driver / compositor with the MessageLoop.

I haven't tested on talos yet since this is always enabled by default and the talos tests always execute in ASAP mode, so they execute composites/refresh driver ticks as soon as possible. This scheduling mechanism won't be in effect for talos tests.
Sorry, should say this is always disabled by default.
(In reply to Mason Chang [:mchang] from comment #38)
> I'll double check these values tomorrow. I actually do think we care about
> vblank itself and not necessarily when we composites. If we have variable
> length composites, we would also get variable length timestamps, which isn't
> what we want, but I'll check again tomorrow.

My reasoning is that composition must happen *before* VBlank, so the DWM can copy or flip the buffer or whatever it does internally when VBlank does hit to avoid tearing. I would expect the DWM to have a set cutoff point, where it collects whatever presents have been queued up and merges them into a single updated surface.

But perhaps this time is dynamic, based on how much work there is for it to do. VBlank is still a nice stable source for us to use, and it should always follow composition, so perhaps it is the right thing to use here.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #41)
> (In reply to Mason Chang [:mchang] from comment #38)
> > I'll double check these values tomorrow. I actually do think we care about
> > vblank itself and not necessarily when we composites. If we have variable
> > length composites, we would also get variable length timestamps, which isn't
> > what we want, but I'll check again tomorrow.
> 
> My reasoning is that composition must happen *before* VBlank, so the DWM can
> copy or flip the buffer or whatever it does internally when VBlank does hit
> to avoid tearing. I would expect the DWM to have a set cutoff point, where
> it collects whatever presents have been queued up and merges them into a
> single updated surface.
> 
> But perhaps this time is dynamic, based on how much work there is for it to
> do. VBlank is still a nice stable source for us to use, and it should always
> follow composition, so perhaps it is the right thing to use here.

I just did some tests and using the qpcComposeTime is ~2-3x more variable than the qpcVBlankTime. It must mean that composition time is variable depending on how much work to do, which makes sense. Using qpcVBlankTime, we get std-deviations of ~0.01-~0.02ms whereas with qpcComposeTime, we get ~0.03ms - 0.07ms. This testing is done outside of Gecko. With qpcVBlankTime, we sometimes get below <0.008! Tested over 100 loop iterations, so qpcVBlankTime is the way to go for more uniform timestamps.

I also tested qpcFrameDisplayed, qpcFrameComplete and qpcFrameDisplayed. All these numbers are always 0 on Windows 7, so these numbers cannot be used.
Updated with feedback from various comments. Also falls back to posting a delayed task is DwmComposition becomes disabled.
Attachment #8558877 - Attachment is obsolete: true
Attachment #8558877 - Flags: review?(jmuizelaar)
Attachment #8559970 - Flags: review?(jmuizelaar)
Attachment #8558874 - Flags: review?(jmuizelaar) → review+
Attachment #8559970 - Flags: review?(jmuizelaar) → review+
I'd like to point out the solution Chromium is investigating (crbug.com/467617) for Windows-based v-sync: D3DKMTWaitForVerticalBlankEvent. See https://msdn.microsoft.com/en-us/library/windows/hardware/ff547265(v=vs.85).aspx - apparently it works on Vista+ and does not depend on the DWM.
Interesting. From the Chromium bug, here's an implementation: https://gist.github.com/anonymous/4397e4909c524c939bee

Aside from the obvious advantage of not requiring the DWM to be active, and the other advantages listed in the Chromium bug, it would be interesting to see if it does the right thing for multiple monitors. But this discussion should probably go into a new bug.
(In reply to ashley from comment #47)
> I'd like to point out the solution Chromium is investigating
> (crbug.com/467617) for Windows-based v-sync:
> D3DKMTWaitForVerticalBlankEvent. See
> https://msdn.microsoft.com/en-us/library/windows/hardware/ff547265(v=vs.85).
> aspx - apparently it works on Vista+ and does not depend on the DWM.

Ahh cool, thanks for bringing this up! I can try this in a bit to see if it helps.
You need to log in before you can comment on or make changes to this bug.