Closed Bug 1615590 Opened 4 years ago Closed 4 years ago

10-second long startup hangs on developer machines after each update because of the sanity test "snapshot" checks, if webrender compositor is enabled

Categories

(Core :: Graphics, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla75
Performance Impact ?
Tracking Status
firefox-esr68 --- unaffected
firefox73 --- wontfix
firefox74 + disabled
firefox75 --- verified

People

(Reporter: Gijs, Assigned: sotaro, NeedInfo)

References

(Regression)

Details

(Keywords: hang, perf, regression)

Attachments

(4 files, 1 obsolete file)

I'm seeing this both on current nightly and on beta. Profile from beta: https://perfht.ml/2SG3Boi

Looks like the parent is doing sync IPC via mozilla::layers::PWebRenderBridgeChild::SendGetSnapshot at startup. Stack:

mozilla::layers::PWebRenderBridgeChild::SendGetSnapshot(mozilla::layers::PTextureChild*)
PWebRenderBridge::Msg_GetSnapshot
mozilla::layers::WebRenderLayerManager::MakeSnapshotIfRequired(mozilla::gfx::IntSizeTyped<mozilla::LayoutDevicePixel>)
mozilla::layers::WebRenderLayerManager::EndTransactionWithoutLayer(nsDisplayList*, nsDisplayListBuilder*, WrFiltersHolder&&, mozilla::layers::WebRenderBackgroundData*)
nsDisplayList::PaintRoot(nsDisplayListBuilder*, gfxContext*, unsigned int)
nsDisplayList::PaintRoot
static nsLayoutUtils::PaintFrame(gfxContext*, nsIFrame*, nsRegion const&, unsigned int, nsDisplayListBuilderMode, nsLayoutUtils::PaintFrameFlags)
nsLayoutUtils::PaintFrame
mozilla::PresShell::RenderDocument(nsRect const&, mozilla::RenderDocumentFlags, unsigned int, gfxContext*)
mozilla::dom::CanvasRenderingContext2D::DrawWindow(nsGlobalWindowInner&, double, double, double, double, nsTSubstring<char> const&, unsigned int, mozilla::ErrorResult&)
mozilla::dom::CanvasRenderingContext2D_Binding::drawWindow(JSContext*, JS::Handle<JSObject *>, void*, JSJitMethodCallArgs const&)
CanvasRenderingContext2D.drawWindow
mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy,mozilla::dom::binding_detail::ThrowExceptions>(JSContext*, unsigned int, JS::Value*)
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
Interpret(JSContext*, js::RunState&)
js::RunScript(JSContext*, js::RunState&)
takeWindowSnapshot
testCompositor
runSanityTest
receiveMessage
js::RunScript
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)
JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>)
mozilla::dom::MessageListener::ReceiveMessage(JSContext*, JS::Handle<JS::Value>, mozilla::dom::ReceiveMessageArgument const&, JS::MutableHandle<JS::Value>, mozilla::ErrorResult&)
nsFrameMessageManager::ReceiveMessage(nsISupports*, nsFrameLoader*, bool, nsTSubstring<char16_t> const&, bool, mozilla::dom::ipc::StructuredCloneData*, mozilla::jsipc::CpowHolder*, nsIPrincipal*, nsTArray<mozilla::dom::ipc::StructuredCloneData>*, mozilla::ErrorResult&)
MessageListener.receiveMessage
message manager handler
mozilla::dom::BrowserParent::ReceiveMessage(nsTString<char16_t> const&, bool, mozilla::dom::ipc::StructuredCloneData*, mozilla::jsipc::CpowHolder*, nsIPrincipal*, nsTArray<mozilla::dom::ipc::StructuredCloneData>*)
mozilla::dom::BrowserParent::RecvAsyncMessage(nsTString<char16_t> const&, nsTArray<mozilla::jsipc::CpowEntry>&&, nsIPrincipal*, mozilla::dom::ClonedMessageData const&)
BrowserParent::RecvAsyncMessage gfxSanity:ContentLoaded
mozilla::dom::PBrowserParent::OnMessageReceived(IPC::Message const&)
PBrowser::Msg_AsyncMessage
mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&)
mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&)
nsThread::ProcessNextEvent(bool, bool*)
NS_ProcessNextEvent(nsIThread*, bool)
mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)
MessageLoop::RunHandler()
MessageLoop::Run()
nsBaseAppShell::Run()
nsAppShell::Run()
nsAppStartup::Run()
XREMain::XRE_mainRun()
XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&)
XREMain::XRE_main
XRE_main(int, char**, mozilla::BootstrapConfig const&)
NS_internal_main(int, char**, char**)
wmain(int, wchar_t**)

When this happens, the parent stops drawing, in some cases even showing the windows "not responding" titlebar. This is a powerful machine, it should not be hanging for 10 seconds. I have no idea what this code is doing, but unless this is somehow restricted to my machine, we can't ship this.

Attached file about support data
Flags: needinfo?(sotaro.ikeda.g)
Keywords: regression
Regressed by: 1592509
Blocks: wr-74

So this is some JS API calling into CanvasRenderingContext2D.drawWindow... Do we know what's calling that on startup?

Ah, no, SanityTest.jsm, sorry, should've looked at the profiles rather than just the stack :)

I confirmed that on today's nightly, if I use mozregression --launch 2020-02-14 --pref gfx.webrender.compositor:false, the problem goes away. I'll see if this is a regression for webrender compositor on this machine (ie if I can find a specific bug if I force-enable the pref while running mozregression)

Regressed by: 1585619
Has Regression Range: --- → yes
Whiteboard: [qf]

I don't quite understand the profile. It shows that, for most of the time we spend waiting, the GPU process isn't actually running? Or maybe whatever delays the GPU process happens before the profiler gets initialized?

(In reply to Markus Stange [:mstange] from comment #8)

I don't quite understand the profile. It shows that, for most of the time we spend waiting, the GPU process isn't actually running? Or maybe whatever delays the GPU process happens before the profiler gets initialized?

I can reproduce this issue very reliably, so if there is some way for me to get this information please let me know and I can try doing so. The graphics failure section in about:support does include "Killing GPU process due to IPC reply timeout" so perhaps something breaks the GPU process? I have no idea about the specifics of the profile, the profile was just how I figured out what was causing this massive hang on startup that I was seeing...

Hey Gijs, I looked at this today with jrmuizel and mstange, and their strong suspicion is that the GPU process crashes or has hung, the launcher times out, and relaunches a new GPU process, and the main thread in the parent process gets blocked waiting on the sync IPC message to be serviced. Eventually that process gets killed automatically after 10s, the sanity test completes, and a new GPU process gets launched.

So what we need here is the stack for the hung GPU process. Is there a crash report in about:crashes? We might have gathered one when force-killing the hung GPU process.

Flags: needinfo?(gijskruitbosch+bugs)

If there's no crash report, we could extend the auto-kill time so that a debugger can be attached. You bump that time by setting layers.gpu-process.ipc_reply_timeout_ms

(In reply to Mike Conley (:mconley) (:⚙️) from comment #11)

If there's no crash report, we could extend the auto-kill time so that a debugger can be attached. You bump that time by setting layers.gpu-process.ipc_reply_timeout_ms

There's no crash report. So I did what you suggested, but MSVS at least out of the box did not break on anything when attached to the gpu process... it just updates the UI to indicate the GPU process died (ie no longer indicates it's attached to anything etc.).

What would be the easiest way for me to gather the actually-interesting stack here? I tried looking at https://searchfox.org/mozilla-central/rev/174f1195ec740e8f17223b48018f7e14e6d4e40e/gfx/layers/ipc/CompositorManagerChild.cpp#258 and can find the parent process side of things from there, but I'm not sure how to find out what the child process is doing at that point - when I break the debugger in the child process at this point it seems to "just" be waiting in mozilla::widget::WinUtils' WaitForMessage loop. It's basically using (almost) no CPU while the parent process is "stuck" waiting for the child.

Flags: needinfo?(mstange)
Flags: needinfo?(mconley)
Flags: needinfo?(gijskruitbosch+bugs)

Is there perhaps (NSPR?) logging I can use to gather more info?

(In reply to :Gijs (he/him) from comment #12)

when I break the debugger in the child process at this point

Which process are you referring to here, exactly? Is the "child process" the GPU process? Another confusing aspect to this is that the "parent process" / "UI process" is the child-side of the CompositorManager actor, and the GPU process is the parent-side of that actor.
When the parent process main thread is blocking in GetSnapshot, it's waiting for something to happen on the Compositor thread of the GPU process, which is waiting for something to happen on the Renderer thread of the GPU process. The WaitForMessage stack sounds like it might have been on the main thread of the GPU process. So the stacks we need are of the Compositor thread and on the Renderer thread of the GPU process.
Any content processes are not of interest here - this hang is purely an interaction between the parent process and the gpu process.

Flags: needinfo?(mstange)

(In reply to Markus Stange [:mstange] from comment #14)

(In reply to :Gijs (he/him) from comment #12)

when I break the debugger in the child process at this point

Which process are you referring to here, exactly? Is the "child process" the GPU process?

Yeah, sorry, the GPU process.

Another confusing aspect to this is that the "parent process" / "UI process" is the child-side of the CompositorManager actor, and the GPU process is the parent-side of that actor.
When the parent process main thread is blocking in GetSnapshot, it's waiting for something to happen on the Compositor thread of the GPU process, which is waiting for something to happen on the Renderer thread of the GPU process. The WaitForMessage stack sounds like it might have been on the main thread of the GPU process. So the stacks we need are of the Compositor thread and on the Renderer thread of the GPU process.

Renderer:

>	xul.dll!mozilla::wr::RenderCompositorANGLE::MaybeReadback(const mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> & aReadbackSize, const mozilla::wr::ImageFormat & aReadbackFormat, const mozilla::Range<unsigned char> & aReadbackBuffer) Line 945	C++
 	xul.dll!mozilla::wr::RendererOGL::UpdateAndRender(const mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > & aReadbackSize, const mozilla::Maybe<mozilla::wr::ImageFormat> & aReadbackFormat, const mozilla::Maybe<mozilla::Range<unsigned char> > & aReadbackBuffer, bool aHadSlowFrame, mozilla::wr::RendererStats * aOutStats) Line 168	C++
 	xul.dll!mozilla::wr::RenderThread::UpdateAndRender(mozilla::wr::WrWindowId aWindowId, const mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> & aStartId, const mozilla::TimeStamp & aStartTime, bool aRender, const mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > & aReadbackSize, const mozilla::Maybe<mozilla::wr::ImageFormat> & aReadbackFormat, const mozilla::Maybe<mozilla::Range<unsigned char> > & aReadbackBuffer, bool aHadSlowFrame) Line 478	C++
 	xul.dll!mozilla::wr::WebRenderAPI::Readback::Readback::Run(mozilla::wr::RenderThread & aRenderThread, mozilla::wr::WrWindowId aWindowId) Line 492	C++
 	xul.dll!mozilla::wr::RenderThread::RunEvent(mozilla::wr::WrWindowId aWindowId, mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> > aEvent) Line 414	C++
 	[Inline Frame] xul.dll!mozilla::detail::RunnableMethodArguments<mozilla::wr::WrWindowId,mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> > &&>::applyImpl(mozilla::wr::RenderThread * o, void(mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId, mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> >) args, mozilla::Tuple<StoreCopyPassByConstLRef<mozilla::wr::WrWindowId>,StoreCopyPassByRRef<mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> > > > &, std::integer_sequence<unsigned long long,0,1>) Line 1160	C++
 	[Inline Frame] xul.dll!mozilla::detail::RunnableMethodArguments<mozilla::wr::WrWindowId,mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> > &&>::apply(mozilla::wr::RenderThread * o, void(mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId, mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> >)) Line 1166	C++
 	xul.dll!mozilla::detail::RunnableMethodImpl<mozilla::wr::RenderThread *,void (mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId, mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> >),1,mozilla::RunnableKind::Standard,mozilla::wr::WrWindowId,mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> > &&>::Run() Line 1215	C++
 	[Inline Frame] xul.dll!MessageLoop::RunTask(already_AddRefed<nsIRunnable> aTask) Line 442	C++
 	xul.dll!MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask && pending_task) Line 450	C++
 	xul.dll!MessageLoop::DoWork() Line 523	C++
 	xul.dll!base::MessagePumpDefault::Run(base::MessagePump::Delegate * delegate) Line 35	C++
 	[Inline Frame] xul.dll!MessageLoop::RunInternal() Line 315	C++
 	xul.dll!MessageLoop::RunHandler() Line 309	C++
 	[Inline Frame] xul.dll!MessageLoop::Run() Line 290	C++
 	xul.dll!base::Thread::ThreadMain() Line 192	C++
 	xul.dll!`anonymous namespace'::ThreadFunc(void * closure) Line 20	C++
 	[External Code]	
 	[Inline Frame] mozglue.dll!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared<mozilla::interceptor::MMPolicyInProcess,1> >,void (*)(int, void *, void *)>::operator()(int &) Line 151	C++
 	mozglue.dll!patched_BaseThreadInitThunk(int aIsInitialThread, void * aStartAddress, void * aThreadParam) Line 583	C++
 	[External Code]	

Compositor:

 	[External Code]	
>	nss3.dll!_PR_MD_WAIT_CV(_MDCVar * cv, _MDLock * lock, unsigned int timeout) Line 252	C
 	[Inline Frame] nss3.dll!_PR_WaitCondVar(PRThread * thread, PRCondVar * cvar, PRLock * lock, unsigned int timeout) Line 177	C
 	nss3.dll!PR_WaitCondVar(PRCondVar * cvar, unsigned int timeout) Line 539	C
 	nss3.dll!PR_Wait(PRMonitor * mon, unsigned int ticks) Line 308	C
 	[Inline Frame] xul.dll!mozilla::ReentrantMonitor::Wait(unsigned int aInterval) Line 95	C++
 	[Inline Frame] xul.dll!mozilla::layers::SynchronousTask::Wait() Line 25	C++
 	xul.dll!mozilla::wr::WebRenderAPI::Readback(const mozilla::TimeStamp & aStartTime, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> size, const mozilla::gfx::SurfaceFormat & aFormat, const mozilla::Range<unsigned char> & buffer) Line 517	C++
 	xul.dll!mozilla::layers::WebRenderBridgeParent::RecvGetSnapshot(mozilla::layers::PTextureParent * aTexture) Line 1876	C++
 	xul.dll!mozilla::layers::PWebRenderBridgeParent::OnMessageReceived(const IPC::Message & msg__, IPC::Message * & reply__) Line 979	C++
 	xul.dll!mozilla::layers::PCompositorManagerParent::OnMessageReceived(const IPC::Message & msg__, IPC::Message * & reply__) Line 590	C++
 	xul.dll!mozilla::ipc::MessageChannel::DispatchSyncMessage(mozilla::ipc::ActorLifecycleProxy * aProxy, const IPC::Message & aMsg, IPC::Message * & aReply) Line 2184	C++
 	xul.dll!mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message && aMsg) Line 2133	C++
 	xul.dll!mozilla::ipc::MessageChannel::MessageTask::Run() Line 2009	C++
 	[Inline Frame] xul.dll!MessageLoop::RunTask(already_AddRefed<nsIRunnable> aTask) Line 442	C++
 	xul.dll!MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask && pending_task) Line 450	C++
 	xul.dll!MessageLoop::DoWork() Line 523	C++
 	xul.dll!base::MessagePumpForUI::DoRunLoop() Line 204	C++
 	[Inline Frame] xul.dll!base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate * delegate, base::MessagePumpWin::Dispatcher * dispatcher) Line 57	C++
 	xul.dll!base::MessagePumpWin::Run(base::MessagePump::Delegate * delegate) Line 79	C++
 	[Inline Frame] xul.dll!MessageLoop::RunInternal() Line 315	C++
 	xul.dll!MessageLoop::RunHandler() Line 309	C++
 	[Inline Frame] xul.dll!MessageLoop::Run() Line 290	C++
 	xul.dll!base::Thread::ThreadMain() Line 192	C++
 	xul.dll!`anonymous namespace'::ThreadFunc(void * closure) Line 20	C++
 	[External Code]	
 	[Inline Frame] mozglue.dll!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared<mozilla::interceptor::MMPolicyInProcess,1> >,void (*)(int, void *, void *)>::operator()(int &) Line 151	C++
 	mozglue.dll!patched_BaseThreadInitThunk(int aIsInitialThread, void * aStartAddress, void * aThreadParam) Line 583	C++
 	[External Code]	

I think this points at the code from bug 1585619 (the regressing bug) but I don't know what other information this really provides... It looks like the Renderer thread is hung in PrintWindow()? But I don't know why... Does that help or did I look at the wrong thing?

Flags: needinfo?(mconley) → needinfo?(mstange)

The "Remarks" section on https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-printwindow#remarks does seem to indicate that blocking on this is unwise...

Thanks, can you get the rest of the stack of the Renderer thread (the part in external code), too? Your last comment indicates that it's blocking in PrintWindow, ok. So not in WaitForCommitCompletion. Can you confirm?

Flags: needinfo?(mstange)
 	ntdll.dll!KiUserCallbackDispatch()	Unknown
 	win32u.dll!NtUserPrintWindow()	Unknown
 	xul.dll!mozilla::wr::RenderCompositorANGLE::MaybeReadback(const mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> & aReadbackSize, const mozilla::wr::ImageFormat & aReadbackFormat, const mozilla::Range<unsigned char> & aReadbackBuffer) Line 945	C++
 	xul.dll!mozilla::wr::RendererOGL::UpdateAndRender(const mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > & aReadbackSize, const mozilla::Maybe<mozilla::wr::ImageFormat> & aReadbackFormat, const mozilla::Maybe<mozilla::Range<unsigned char> > & aReadbackBuffer, bool aHadSlowFrame, mozilla::wr::RendererStats * aOutStats) Line 168	C++
 	xul.dll!mozilla::wr::RenderThread::UpdateAndRender(mozilla::wr::WrWindowId aWindowId, const mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> & aStartId, const mozilla::TimeStamp & aStartTime, bool aRender, const mozilla::Maybe<mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> > & aReadbackSize, const mozilla::Maybe<mozilla::wr::ImageFormat> & aReadbackFormat, const mozilla::Maybe<mozilla::Range<unsigned char> > & aReadbackBuffer, bool aHadSlowFrame) Line 478	C++
 	xul.dll!mozilla::wr::WebRenderAPI::Readback::Readback::Run(mozilla::wr::RenderThread & aRenderThread, mozilla::wr::WrWindowId aWindowId) Line 492	C++
 	xul.dll!mozilla::wr::RenderThread::RunEvent(mozilla::wr::WrWindowId aWindowId, mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> > aEvent) Line 414	C++
 	[Inline Frame] xul.dll!mozilla::detail::RunnableMethodArguments<mozilla::wr::WrWindowId,mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> > &&>::applyImpl(mozilla::wr::RenderThread * o, void(mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId, mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> >) args, mozilla::Tuple<StoreCopyPassByConstLRef<mozilla::wr::WrWindowId>,StoreCopyPassByRRef<mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> > > > &, std::integer_sequence<unsigned long long,0,1>) Line 1160	C++
 	[Inline Frame] xul.dll!mozilla::detail::RunnableMethodArguments<mozilla::wr::WrWindowId,mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> > &&>::apply(mozilla::wr::RenderThread * o, void(mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId, mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> >)) Line 1166	C++
 	xul.dll!mozilla::detail::RunnableMethodImpl<mozilla::wr::RenderThread *,void (mozilla::wr::RenderThread::*)(mozilla::wr::WrWindowId, mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> >),1,mozilla::RunnableKind::Standard,mozilla::wr::WrWindowId,mozilla::UniquePtr<mozilla::wr::RendererEvent,mozilla::DefaultDelete<mozilla::wr::RendererEvent> > &&>::Run() Line 1215	C++
 	[Inline Frame] xul.dll!MessageLoop::RunTask(already_AddRefed<nsIRunnable> aTask) Line 442	C++
 	xul.dll!MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask && pending_task) Line 450	C++
 	xul.dll!MessageLoop::DoWork() Line 523	C++
 	xul.dll!base::MessagePumpDefault::Run(base::MessagePump::Delegate * delegate) Line 35	C++
 	[Inline Frame] xul.dll!MessageLoop::RunInternal() Line 315	C++
 	xul.dll!MessageLoop::RunHandler() Line 309	C++
 	[Inline Frame] xul.dll!MessageLoop::Run() Line 290	C++
 	xul.dll!base::Thread::ThreadMain() Line 192	C++
 	xul.dll!`anonymous namespace'::ThreadFunc(void * closure) Line 20	C++
>	kernel32.dll!BaseThreadInitThunk()	Unknown
 	[Inline Frame] mozglue.dll!mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared<mozilla::interceptor::MMPolicyInProcess,1> >,void (*)(int, void *, void *)>::operator()(int &) Line 151	C++
 	mozglue.dll!patched_BaseThreadInitThunk(int aIsInitialThread, void * aStartAddress, void * aThreadParam) Line 583	C++
 	ntdll.dll!RtlUserThreadStart()	Unknown

This is all I can get out of MSVS... does that answer your question?

Flags: needinfo?(mstange)

(In reply to Markus Stange [:mstange] from comment #17)

So not in WaitForCommitCompletion. Can you confirm?

Right, it's definitely in PrintWindow().

With WR native compositor, screenshot becomes very slow(normally 30ms-60ms). Then when profiler screenshot happens, WR native compositor is disabled by Bug 1602643 . But we could not do it in this case. So at first, we might need to think about a possibility of removing drawWindow() if WR compositor is enabled on Windows.
https://searchfox.org/mozilla-central/rev/174f1195ec740e8f17223b48018f7e14e6d4e40e/toolkit/components/gfx/SanityTest.jsm#108

I wonder if it would work to use something like:

        hdc_target = GetDC(window);

        BitBlt(hdc, 0, 0, capture->width, capture->height, hdc_target,
               capture->x, capture->y, SRCCOPY);

        ReleaseDC(NULL, hdc_target);

instead of PrintWindow()

BitBlt() does not work for capturing DirectComposition.

Hmm, I tested the following STR as to cause PrintWindow() on 2 Win10 PCs, but RenderCompositorANGLE::MaybeReadback() took around 30-40ms.

[1] Confirm that WebRender is enabled in about:support
[2] Enable the prefs gfx.webrender.compositor=true
[3] Remve "sanity-test.driver-version" pref
[4] Restart Firefox

Depends on: 1615946
Assignee: nobody → sotaro.ikeda.g
Flags: needinfo?(sotaro.ikeda.g)

Sanity test was originally added by Bug 1156135. It checks hardware video decoding and rendering capability. If it failed, "media.hardware-video-decoding.failed" pref is set to true.

Then Bug 1377866 added a check if Advanced Layers works. If the check fails, Advanced layer was disabled and update compositor by calling utils.triggerDeviceReset().

It seems that we could skip gfx sanity tests when WebRender is used. It is basically better to be handled by blacklist. And GPU process support(Bug 1314133) was enabled after Bug 1156135. By Bug 1338011, content process does not do hardware video decoding by default. It seems that it could reduce the video rendering problem. And D3D9DXVA2Manager blacklists several GPUs now. Further, WebRender is carefully rolling out. WebRender is not qualified for old gpu drivers.

And if WebRender is used, it seems that we could disable Advanced Layers as fallback candidate.

If the above could be acceptable, gfx sanity test could be skipped if WebRender is used.

Pushed by sikeda.birchill@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/beaad55e6508
Skip drawWindow() when LayerManger type is WebRender r=jrmuizel
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75

Hoping to test that the fix works on my machine once today's nightlies are built, though I'll note here for anyone else looking at this that bug 1615683 turned webrender compositor off again on nightly (making this issue go away anyway - but not affecting beta), which confused me for a bit.

See Also: → 1615683
Summary: 10-second long startup hangs on developer machines after each update because of the sanity test "snapshot" checks → 10-second long startup hangs on developer machines after each update because of the sanity test "snapshot" checks, if webrender compositor is enabled

I'm still seeing this hang on today's nightly. Do you want me to file a follow-up bug, or reopen this one, or...?

Flags: needinfo?(sotaro.ikeda.g)

Oh, sorry. I made a silly mistake during creating the patch. I am going to re-open the bug and create a followup fix.

Flags: needinfo?(sotaro.ikeda.g)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attachment #9127727 - Attachment is obsolete: true
Pushed by sikeda.birchill@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e5f581db2514
Skip takeWindowSnapshot() with WebRender r=nical
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED

Sotaro, is that something we could uplift safely to beta or should it ride the 75 train?

Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(mstange)

:Gijs, can you check if the problem is addressed on latest nightly?

Flags: needinfo?(gijskruitbosch+bugs)

Yep, today's nightly seems to not have this issue anymore.

Status: RESOLVED → VERIFIED
Flags: needinfo?(gijskruitbosch+bugs)

Good. Thank you!

Flags: needinfo?(sotaro.ikeda.g)

Comment on attachment 9127739 [details]
Bug 1615590 - Skip takeWindowSnapshot() with WebRender

Beta/Release Uplift Approval Request

  • User impact if declined: 10-second long startup hang might happen on some PCs just after updating Firefox.

https://phabricator.services.mozilla.com/D63125 is necessary to this patch.

  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): With the change, hardware decoded video readback test is skipped. If there is a PCs that are not blacklisted for hardware decoded video yet and the PCs are qualified for WebRender, hardware decoded video rendering could cause a problem. But hardware decoded video is already managed by the blacklisting. If the problem will happen, the blacklist should be updated.
  • String changes made/needed: none
Attachment #9127739 - Flags: approval-mozilla-beta?
Attachment #9127190 - Flags: approval-mozilla-beta?

Comment on attachment 9127739 [details]
Bug 1615590 - Skip takeWindowSnapshot() with WebRender

Given the importantperf regression fixed, uplift approved for 74.0b8. If this patch causes functional regressions or instabilities, we will back it out before shipping 74. Thanks

Attachment #9127739 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9127190 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Conflict on toolkit/components/gfx/SanityTest.jsm when I import https://hg.mozilla.org/mozilla-central/rev/e5f581db2514.

Flags: needinfo?(sotaro.ikeda.g)

Razvan Maries, I could apply the 2 patches on beta without problem. D63431 depends on D63125, did you apply them in the same order?

Flags: needinfo?(sotaro.ikeda.g) → needinfo?(rmaries)
Regressions: 1618175

:jrmuizel, is it better to backout the fix from beta? DC compositor is going to be disabled by Bug 1618377 and the fix causes Bug 1618175.

Flags: needinfo?(jmuizelaar)

Yeah. I think that is probably best.

Flags: needinfo?(jmuizelaar)

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #44)

https://hg.mozilla.org/releases/mozilla-beta/rev/641687384e58
https://hg.mozilla.org/releases/mozilla-beta/rev/bdf8c742a153

:aryx, can you backout the above from beta because of Comment 45 and Comment 46? Thanks!

Flags: needinfo?(aryx.bugmail)

Pascal can the flags be modified so this doesn't show up in the uplift query since it got backed out?

Flags: needinfo?(pascalc)
Flags: needinfo?(pascalc)
Attachment #9127190 - Flags: approval-mozilla-beta+ → approval-mozilla-beta-
Attachment #9127739 - Flags: approval-mozilla-beta+ → approval-mozilla-beta-

Gijs, can you try capturing a Firefox or Edge window with https://webrtc.github.io/samples/src/content/getusermedia/getdisplaymedia/. I believe the Window capture code also uses PrintWindow so it would be interesting to see if you see hangs there too.

Flags: needinfo?(gijskruitbosch+bugs)
Attached video screen-recording.webm

(In reply to Jeff Muizelaar [:jrmuizel] from comment #50)

Gijs, can you try capturing a Firefox or Edge window with https://webrtc.github.io/samples/src/content/getusermedia/getdisplaymedia/. I believe the Window capture code also uses PrintWindow so it would be interesting to see if you see hangs there too.

I don't see a lack of responsiveness on the page or the UI but the printwindow call doesn't seem to work properly - see attached recording from the page. Almost the entire screen is white, the only bit that isn't is what it thinks is the window frame - which looks like a win7 aero basic frame, which is odd because this is win10 and my window frame looks nothing like that, and the titlebar is not even enabled anyway (ie tabs are in the titlebar). Profile from another attempt in case that's enlightening: https://perfht.ml/2VDY4lo .

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(jmuizelaar)
See Also: → 1736479
Performance Impact: --- → ?
Whiteboard: [qf]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: