Crash in test_9999_cleanup.xul (painting in the middle of reflow) after setting transparent accentcolor in DevEdition lightweight theme

RESOLVED FIXED in Firefox 40

Status

()

defect
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: bgrins, Assigned: tnikkel)

Tracking

unspecified
mozilla40
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox40 fixed)

Details

Attachments

(2 attachments, 1 obsolete attachment)

After landing Bug 1158872 (https://hg.mozilla.org/mozilla-central/rev/7882ac796e0f), a permaorange crash developed in test_9999_cleanup.xul (Bug 1159150).
Windows XP 32-bit fx-team debug test mochitest-other

https://treeherder.mozilla.org/logviewer.html#?job_id=2852909&repo=fx-team

22:51:55 INFO - Assertion failure: mPresContext->mLayoutPhaseCount[eLayoutPhase_Reflow] == 0 (painting in the middle of reflow), at c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\src\layout\base\nsAutoLayoutPhase.cpp:37 

22:52:07 WARNING - PROCESS-CRASH | toolkit/mozapps/update/tests/chrome/test_9999_cleanup.xul | application crashed [@ nsAutoLayoutPhase::Enter()]
22:52:07 INFO - Crash dump filename: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpnc99lb.mozrunner\minidumps\33848d32-2947-49c0-b591-c96e3dd1f840.dmp
22:52:07 INFO - Operating system: Windows NT
22:52:07 INFO - 5.1.2600 Service Pack 3
22:52:07 INFO - CPU: x86
22:52:07 INFO - GenuineIntel family 6 model 30 stepping 5
22:52:07 INFO - 8 CPUs
22:52:07 INFO - Crash reason: EXCEPTION_BREAKPOINT
22:52:07 INFO - Crash address: 0x29eb872
22:52:07 INFO - Thread 0 (crashed)
22:52:07 INFO - 0 xul.dll!nsAutoLayoutPhase::Enter() [nsAutoLayoutPhase.cpp:7882ac796e0f : 51 + 0x20]
22:52:07 INFO - eip = 0x029eb872 esp = 0x0012ec08 ebp = 0x0012ec14 ebx = 0x0ff531f0
22:52:07 INFO - esi = 0x00000025 edi = 0x0fc84520 eax = 0x00000000 ecx = 0x002b0ad9
22:52:07 INFO - edx = 0x00370ea0 efl = 0x00200212
22:52:07 INFO - Found by: given as instruction pointer in context
22:52:07 INFO - 1 xul.dll!nsAutoLayoutPhase::nsAutoLayoutPhase(nsPresContext *,nsLayoutPhase) [nsAutoLayoutPhase.cpp:7882ac796e0f : 20 + 0x4]
22:52:07 INFO - eip = 0x029cf6b2 esp = 0x0012ec10 ebp = 0x0012ec14
22:52:07 INFO - Found by: call frame info
22:52:07 INFO - 2 xul.dll!PresShell::Paint(nsView *,nsRegion const &,unsigned int) [nsPresShell.cpp:7882ac796e0f : 6227 + 0x10]
22:52:07 INFO - eip = 0x02a330ed esp = 0x0012ec1c ebp = 0x0012ed14
22:52:07 INFO - Found by: call frame info
22:52:07 INFO - 3 xul.dll!nsViewManager::ProcessPendingUpdatesPaint(nsIWidget *) [nsViewManager.cpp:7882ac796e0f : 445 + 0x1b]
22:52:07 INFO - eip = 0x027e56d9 esp = 0x0012ed1c ebp = 0x0012ed54
22:52:07 INFO - Found by: call frame info
22:52:07 INFO - 4 xul.dll!nsViewManager::ProcessPendingUpdatesForView(nsView *,bool) [nsViewManager.cpp:7882ac796e0f : 385 + 0x8]
22:52:07 INFO - eip = 0x027e557d esp = 0x0012ed5c ebp = 0x0012ed74
22:52:07 INFO - Found by: call frame info
22:52:07 INFO - 5 xul.dll!nsViewManager::ProcessPendingUpdates() [nsViewManager.cpp:7882ac796e0f : 1075 + 0xb]
22:52:07 INFO - eip = 0x027e547a esp = 0x0012ed7c ebp = 0x0012ed98
22:52:07 INFO - Found by: call frame info
22:52:07 INFO - 6 xul.dll!nsViewManager::WillPaintWindow(nsIWidget *) [nsViewManager.cpp:7882ac796e0f : 679 + 0x6]
22:52:07 INFO - eip = 0x027e60c2 esp = 0x0012ed8c ebp = 0x0012ed98
22:52:07 INFO - Found by: call frame info
22:52:07 INFO - 7 xul.dll!nsView::WillPaintWindow(nsIWidget *) [nsView.cpp:7882ac796e0f : 1027 + 0x11]
22:52:07 INFO - eip = 0x027e3cc4 esp = 0x0012eda0 ebp = 0x0012edac
22:52:07 INFO - Found by: call frame info
22:52:07 INFO - 8 xul.dll!nsWindow::OnPaint(HDC__ *,unsigned int) [nsWindowGfx.cpp:7882ac796e0f : 290 + 0x7]
22:52:07 INFO - eip = 0x02852bc6 esp = 0x0012edb4 ebp = 0x0012ef10
22:52:07 INFO - Found by: call frame info
22:52:07 INFO - 9 xul.dll!nsWindow::ProcessMessage(unsigned int,unsigned int &,long &,long *) [nsWindow.cpp:7882ac796e0f : 4800 + 0xa]
22:52:07 INFO - eip = 0x02854d18 esp = 0x0012ef18 ebp = 0x0012f0ac
22:52:07 INFO - Found by: call frame info
22:52:07 INFO - 10 xul.dll!nsWindow::WindowProcInternal(HWND__ *,unsigned int,unsigned int,long) [nsWindow.cpp:7882ac796e0f : 4365 + 0x16]
22:52:07 INFO - eip = 0x02859461 esp = 0x0012f0b4 ebp = 0x0012f0d8
22:52:07 INFO - Found by: call frame info
The patch that caused the permaorange (which adds an accentcolor property on an installed lightweight theme).
I have an ongoing try push that switches the color to rgba(255, 255, 255, .001) to see if it still crashes: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8cd88b6a4e02
Perhaps we can fix this by collecting the frames that need nsContainerFrame::SyncWindowProperties called when we reflow them, and then calling it on all the collected frames after reflow is finished.
jimm -- who's the right person to debug what's going on here?

Also, re comment 3, I think an opacity of .001 will round to 0, since we store the opacity just like a color component (0-255); try 0.004.
Flags: needinfo?(jmathies)
(In reply to David Baron [:dbaron] ⏰UTC-7 from comment #5)
> Also, re comment 3, I think an opacity of .001 will round to 0, since we
> store the opacity just like a color component (0-255); try 0.004.

Thanks for the info - I've sent a try push with rgba(255, 255, 255, 0.004): https://treeherder.mozilla.org/#/jobs?repo=try&revision=f38f06f2eacb

And a push with rgba(255, 255, 255, 1): https://treeherder.mozilla.org/#/jobs?repo=try&revision=171b22160fa3
(In reply to David Baron [:dbaron] ⏰UTC-7 from comment #5)
> jimm -- who's the right person to debug what's going on here?
> 
> Also, re comment 3, I think an opacity of .001 will round to 0, since we
> store the opacity just like a color component (0-255); try 0.004.

Anyone from layout with windows box to test on, also maybe gfx since it involves painting/transparency.
Flags: needinfo?(jmathies)
To be clear, "anyone from layout" doesn't include me, since I wouldn't know what to look for.  Why is the windows widget code spinning a nested event loop and processing paints here?  And what would prevent it from doing so?  Or is it really that this widget API can't be called during reflow because it might always trigger painting?  (Didn't we have some sort of general solution to that problem a while ago?)
Flags: needinfo?(jmathies)
(Needing to avoid making the calls would mean we'd want tn's solution from comment 4.)
(In reply to David Baron [:dbaron] ⏰UTC-7 from comment #8)
> To be clear, "anyone from layout" doesn't include me, since I wouldn't know
> what to look for.  Why is the windows widget code spinning a nested event
> loop and processing paints here?  And what would prevent it from doing so? 
> Or is it really that this widget API can't be called during reflow because
> it might always trigger painting?  (Didn't we have some sort of general
> solution to that problem a while ago?)

I don't see a nested event loop here. What I see is:

1) a call to widget api SetTransparencyMode during reflow that sets transparency on the native window.
2) this triggers a windows api call (SetWindowLongPtrW)
3) during 2 the main thread ends up in os theme code
4) This somehow triggers a WM_WINDOWPOSCHANGED event on the window. Which is weird, never seen this happen before in our widget code. Feels as though the system may have been blocked for a while here.
5) we try to dispatch a resize event and the dispatcher asserts.

I would suggest looking at that size event, it's weird that that shows up here.

10:50:21     INFO -  #01: mozilla::EventDispatcher::DispatchDOMEvent(nsISupports *,mozilla::WidgetEvent *,nsIDOMEvent *,nsPresContext *,nsEventStatus *) [dom/events/EventDispatcher.cpp:698]
10:50:21     INFO -  #02: nsGlobalWindow::DispatchEvent(nsIDOMEvent *,bool *) [dom/base/nsGlobalWindow.cpp:9630]
10:50:21     INFO -  #03: nsGlobalWindow::DispatchEvent(nsIDOMEvent *,bool *) [dom/base/nsGlobalWindow.cpp:9605]
10:50:21     INFO -  #04: nsContentUtils::DispatchEvent(nsIDocument *,nsISupports *,nsAString_internal const &,bool,bool,bool,bool *,bool) [dom/base/nsContentUtils.cpp:3681]
10:50:21     INFO -  #05: nsContentUtils::DispatchTrustedEvent(nsIDocument *,nsISupports *,nsAString_internal const &,bool,bool,bool *) [dom/base/nsContentUtils.cpp:3650]
10:50:21     INFO -  #06: nsGlobalWindow::DispatchCustomEvent(nsAString_internal const &) [dom/base/nsGlobalWindow.cpp:5871]
10:50:21     INFO -  #07: nsWebShellWindow::SizeModeChanged(nsSizeMode) [xpfe/appshell/nsWebShellWindow.cpp:354]
10:50:21     INFO -  #08: nsWindow::OnWindowPosChanged(tagWINDOWPOS *) [widget/windows/nsWindow.cpp:5968]
10:50:21     INFO -  #09: nsWindow::ProcessMessage(unsigned int,unsigned int &,long &,long *) [widget/windows/nsWindow.cpp:5187]
10:50:21     INFO -  #10: nsWindow::WindowProcInternal(HWND__ *,unsigned int,unsigned int,long) [widget/windows/nsWindow.cpp:4365]
10:50:21     INFO -  #11: CallWindowProcCrashProtected [xpcom/base/nsCrashOnException.cpp:35]
10:50:21     INFO -  #12: nsWindow::WindowProc(HWND__ *,unsigned int,unsigned int,long) [widget/windows/nsWindow.cpp:4317]
10:50:21     INFO -  #13: USER32 + 0x8734
10:50:21     INFO -  #14: USER32 + 0x1bdf1
10:50:21     INFO -  #15: USER32 + 0x18ea0
10:50:21     INFO -  #16: USER32 + 0x1be3b
10:50:21     INFO -  #17: ntdll + 0xe453
10:50:21     INFO -  #18: UxTheme + 0x7ce8
10:50:21     INFO -  #19: USER32 + 0x1e565
10:50:21     INFO -  #20: UxTheme + 0x7d8e
10:50:21     INFO -  #21: UxTheme + 0x1f37b
10:50:21     INFO -  #22: UxTheme + 0x20f67
10:50:21     INFO -  #23: UxTheme + 0x4b53
10:50:21     INFO -  #24: UxTheme + 0x1ac7
10:50:21     INFO -  #25: UxTheme + 0x1b3d
10:50:21     INFO -  #26: USER32 + 0x194ed
10:50:21     INFO -  #27: USER32 + 0x8734
10:50:21     INFO -  #28: USER32 + 0x8816
10:50:21     INFO -  #29: USER32 + 0x1a013
10:50:21     INFO -  #30: USER32 + 0x1a039
10:50:21     INFO -  #31: nsWindow::WindowProcInternal(HWND__ *,unsigned int,unsigned int,long) [widget/windows/nsWindow.cpp:4372]
10:50:21     INFO -  #32: CallWindowProcCrashProtected [xpcom/base/nsCrashOnException.cpp:35]
10:50:21     INFO -  #33: nsWindow::WindowProc(HWND__ *,unsigned int,unsigned int,long) [widget/windows/nsWindow.cpp:4317]
10:50:21     INFO -  #34: USER32 + 0x8734
10:50:21     INFO -  #35: USER32 + 0x1bdf1
10:50:21     INFO -  #36: USER32 + 0x18ea0
10:50:21     INFO -  #37: USER32 + 0x1c31c
10:50:21     INFO -  #38: ntdll + 0xe453
10:50:21     INFO -  #39: USER32 + 0x1c2d0
10:50:21     INFO -  #40: nsWindow::HideWindowChrome(bool) [widget/windows/nsWindow.cpp:2765]
10:50:21     INFO -  #41: nsWindow::SetWindowTranslucencyInner(nsTransparencyMode) [widget/windows/nsWindow.cpp:6846]
10:50:21     INFO -  #42: nsWindow::SetTransparencyMode(nsTransparencyMode) [widget/windows/nsWindow.cpp:2631]
10:50:21     INFO -  #43: nsContainerFrame::SyncWindowProperties(nsPresContext *,nsIFrame *,nsView *,nsRenderingContext *) [layout/generic/nsContainerFrame.cpp:656]
10:50:21     INFO -  #44: PresShell::DoReflow(nsIFrame *,bool) [layout/base/nsPresShell.cpp:9250]
10:50:21     INFO -  #45: PresShell::ProcessReflowCommands(bool) [layout/base/nsPresShell.cpp:9376]
10:50:21     INFO -  #46: PresShell::FlushPendingNotifications(mozilla::ChangesToFlush) [layout/base/nsPresShell.cpp:4314]
10:50:21     INFO -  #47: nsRefreshDriver::Tick(__int64,mozilla::TimeStamp) [layout/base/nsRefreshDriver.cpp:1732]
10:50:21     INFO -  #48: mozilla::RefreshDriverTimer::TickDriver(nsRefreshDriver *,__int64,mozilla::TimeStamp) [layout/base/nsRefreshDriver.cpp:199]
10:50:21     INFO -  #49: mozilla::RefreshDriverTimer::Tick(__int64,mozilla::TimeStamp) [layout/base/nsRefreshDriver.cpp:189]
10:50:21     INFO -  #50: mozilla::VsyncRefreshDriverTimer::RunRefreshDrivers(mozilla::TimeStamp) [layout/base/nsRefreshDriver.cpp:441]
10:50:21     INFO -  #51: mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(mozilla::TimeStamp) [layout/base/nsRefreshDriver.cpp:376]
10:50:21     INFO -  #52: nsRunnableMethodImpl<void ( mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp),1,mozilla::TimeStamp>::Run() [xpcom/glue/nsThreadUtils.h:812]
10:50:21     INFO -  #53: nsThread::ProcessNextEvent(bool,bool *) [xpcom/threads/nsThread.cpp:868]
10:50:21     INFO -  #54: NS_ProcessNextEvent(nsIThread *,bool) [xpcom/glue/nsThreadUtils.cpp:265]
10:50:21     INFO -  #55: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:99]
10:50:21     INFO -  #56: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:233]
10:50:21     INFO -  #57: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:227]
10:50:21     INFO -  #58: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:201]
10:50:21     INFO -  #59: nsBaseAppShell::Run() [widget/nsBaseAppShell.cpp:167]
10:50:21     INFO -  #60: nsAppShell::Run() [widget/windows/nsAppShell.cpp:180]
10:50:21     INFO -  #61: nsAppStartup::Run() [toolkit/components/startup/nsAppStartup.cpp:281]
10:50:21     INFO -  #62: XREMain::XRE_mainRun() [toolkit/xre/nsAppRunner.cpp:4071]
10:50:21     INFO -  #63: XREMain::XRE_main(int,char * * const,nsXREAppData const *) [toolkit/xre/nsAppRunner.cpp:4151]
10:50:21     INFO -  #64: XRE_main [toolkit/xre/nsAppRunner.cpp:4240]
Flags: needinfo?(jmathies)
(In reply to Jim Mathies [:jimm] from comment #10)
> 1) a call to widget api SetTransparencyMode during reflow that sets
> transparency on the native window.
> 2) this triggers a windows api call (SetWindowLongPtrW)
> 3) during 2 the main thread ends up in os theme code
> 4) This somehow triggers a WM_WINDOWPOSCHANGED event on the window. Which is
> weird, never seen this happen before in our widget code. Feels as though the
> system may have been blocked for a while here.
> 5) we try to dispatch a resize event and the dispatcher asserts.

The link in comment 1 seems to have a different stack.

https://treeherder.mozilla.org/logviewer.html#?job_id=2852909&repo=fx-team

nsWindow::SetWindowTranslucencyInner triggers nsWindow::OnPaint
(In reply to David Baron [:dbaron] ⏰UTC-7 from comment #8)
> (Didn't we have some sort of general
> solution to that problem a while ago?)

We had the same problem when calling Show on the widget during reflow (for menupopup's at least). nsIWidget::Show can trigger synchronous painting. So we made the Show calls async.
(In reply to Timothy Nikkel (:tn) from comment #11)
> The link in comment 1 seems to have a different stack.
> 
> https://treeherder.mozilla.org/logviewer.html#?job_id=2852909&repo=fx-team
> 
> nsWindow::SetWindowTranslucencyInner triggers nsWindow::OnPaint

Yeah, basically the same trigger with a different windowing event.

I'm not sure how soon that code in nsBrowserGlue takes effect. If it happens after the window is visible on the desktop we should find a way to set it sooner. You're probably taking a big perf hit here too.
(In reply to Jim Mathies [:jimm] from comment #13)
> I'm not sure how soon that code in nsBrowserGlue takes effect. If it happens
> after the window is visible on the desktop we should find a way to set it
> sooner. You're probably taking a big perf hit here too.

The main place this `accentcolor` is used is here: https://dxr.mozilla.org/mozilla-central/source/toolkit/modules/LightweightThemeConsumer.jsm#122.  This is called when a lightweight-theme-styling-update event fires (when a theme is enabled or disabled).  It also seems to be called when the LightweightThemeConsumer is constructed, in https://dxr.mozilla.org/mozilla-central/source/toolkit/content/widgets/general.xml#73.

It will always have the potential to be called after the window is visible, since it fires when a lightweight theme changes.  I'm not sure about exactly when the initial construction happens.
(In reply to Brian Grinstead [:bgrins] from comment #6)
> Thanks for the info - I've sent a try push with rgba(255, 255, 255, 0.004):
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=f38f06f2eacb
> 
> And a push with rgba(255, 255, 255, 1):
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=171b22160fa3

Just an update to this - the push with rgba(255, 255, 255, 0.004) is crashing just like the `transparent` one did, but the one with rgba(255, 255, 255, 1) is fine.
To simplify the test case, I made a change directly to toolkit/modules/LightweightThemeConsumer.jsm and omitted the change to nsBrowserGlue.js.  This reproduces the crash: https://treeherder.mozilla.org/#/jobs?repo=try&revision=84d8263b9dee.
Posted patch async window properties (obsolete) — Splinter Review
I don't know if we'll be able to make these widget functions guarantee that no sync painting will happen. So I think this is what we'll need in that case.
Attachment #8600224 - Flags: feedback?(bgrinstead)
Warnings as errors compile fix.
Attachment #8600224 - Attachment is obsolete: true
Attachment #8600224 - Flags: feedback?(bgrinstead)
Attachment #8600228 - Flags: feedback?(bgrinstead)
(In reply to Timothy Nikkel (:tn) from comment #17)
> I don't know if we'll be able to make these widget functions guarantee that
> no sync painting will happen. So I think this is what we'll need in that
> case.

Thanks!  I've sent two try pushes, the first using the original patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a651c852cd47.  And a second using the more reduced case to LightweightThemeConsumer: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fd2069cf2000.
Comment on attachment 8600228 [details] [diff] [review]
async window properties

Try push looks good, thanks!
Attachment #8600228 - Flags: feedback?(bgrinstead) → feedback+
Comment on attachment 8600228 [details] [diff] [review]
async window properties

I realized after that nsIPresShell needs an IID bump. I'll include that in the patch.
Attachment #8600228 - Flags: review?(mats)
Comment on attachment 8600228 [details] [diff] [review]
async window properties

>layout/base/nsPresShell.cpp
>+nsIPresShell::SyncWindowProperties(nsView* aView)
>+{
>+  nsRenderingContext rcx(CreateReferenceRenderingContext());
>+  nsIFrame* frame = aView->GetFrame();
>+  if (frame && mPresContext) {
>+    nsContainerFrame::SyncWindowProperties(mPresContext, frame, aView, &rcx, true);
>+  }

'rcx' can go inside the 'if'

>layout/generic/nsContainerFrame.cpp
>+  nsRefPtr<nsPresContext> pc(aPresContext);
> ...
>+  nsBoxLayoutState aState(pc, aRC);
> ...
>   SetSizeConstraints(aPresContext, windowWidget, minSize, maxSize);

Should probably use 'pc' in the SetSizeConstraints() call too,
for consistency.  Or, it might be clearer in intent like this:

>+  nsRefPtr<nsPresContext> kungFuDeathGrip(aPresContext);
> ...
>+  nsBoxLayoutState aState(aPresContext, aRC);
> ...
>   SetSizeConstraints(aPresContext, windowWidget, minSize, maxSize);


>layout/generic/nsContainerFrame.h
>   static void SyncWindowProperties(nsPresContext*       aPresContext,
>                                    nsIFrame*            aFrame,
>+                                   nsView*              aView,
>+                                   nsRenderingContext*  aRC,
>+                                   bool                 aSync);

I'd prefer 'uint32_t aFlags' or an enum (for clarity at the call sites).

>view/nsView.cpp
>+  if (mViewManager) mViewManager->PostPendingUpdate();

Style nit:
if (mViewManager) {
  ...
}

>view/nsViewManager.cpp
>     if (view) {
>+      if (view->mNeedsWindowPropertiesSync) {
>+        view->mNeedsWindowPropertiesSync = false;
>+        if (view->GetFrame()) {
>+          if (nsViewManager* vm = view->GetViewManager()) {
>+            if (nsIPresShell* ps = vm->GetPresShell()) {
>+              ps->SyncWindowProperties(view);
>+            }
>+          }
>+        }

It's not immediately clear why the GetFrame() check is needed here,
so I think you should drop it since it's checked by SyncWindowProperties
(where it is used).
(I'm expecting it to be non-null always, but checked just in case...)
Attachment #8600228 - Flags: review?(mats) → review+
(In reply to Mats Palmgren (:mats) from comment #22)
> Comment on attachment 8600228 [details] [diff] [review]
> async window properties

Thanks for the review. I made those changes.
Looks like menu popup frame also makes SetTransparencyMode calls when it is not safe based on code inspection. I filed bug 1161341 for that.
Assignee: nobody → tnikkel
Status: NEW → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/f58aab6a4e62
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
You need to log in before you can comment on or make changes to this bug.