Closed Bug 608515 Opened 14 years ago Closed 14 years ago

After nVidia driver is crashed, Aero glass isn't recovered if Firefox button is visible

Categories

(Core :: Widget: Win32, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- final+

People

(Reporter: masayuki, Assigned: masayuki)

References

Details

Attachments

(3 files, 1 obsolete file)

Attached image screenshot
spinning out from bug 604124 comment 26.

If Firefox button is visible, the window's Aero glass isn't painted correctly after nVidia driver is recovered from crash. Then, the minimize, maximize and close buttons isn't painted and they don't work.

If I open a new window, its Aero glass is fine. And also if I hide the firefox button, this bug isn't reproduced.
I think that this bug must be fixed in betaN.
blocking2.0: --- → ?
Note:

Non-glass toolbar such as bookmark toolbar is painted correctly. If somebody guess the cause and create a trial patch but cannot test it. I'll check that. Feel free to ask me.
How often do nVidia drivers "crash"? Is there any way to simulate something like this?
My work machine sees several crashes per day. Unfortunately the crashes occur in kernel mode and are inconsistent. Any non-minimized browser windows have the effect shown in the screenshot and the minimize/maximize/close buttons are usually invisible and don't work. This should really block the final release at least because the only reasonable solution for a user to fix this is restarting Firefox.
(In reply to comment #3)
> How often do nVidia drivers "crash"? Is there any way to simulate something
> like this?

Several times per hour on my notebook. I think that the driver version should be in blacklist at final release, but it's useful for QA for now :-)

Unfortunately, I'm not sure what causes the crash. When I surf some pages, sometimes it's crashed.
I see this too.
blocking2.0: ? → final+
I see this bug on another application which also uses non-client area.

However, I cannot reproduce this bug on MS Office 2010 which uses non-client area too.
blocking2.0: final+ → ?
Oops, I'm sorry. I reset the blocking flag.

# It seems that bugzilla doesn't check the conflict of blocking flag, I'll look for the bug report and I'll file it if there is no bug report.
Blacklisting seems appropriate if the we can determine which set of drivers fail.

Addressing loss of aero on a per window basis is going to be tough. We receive notification of this from windows but our media query system relies on a global flag that's tied to desktop composition settings.

I'm not sure if a media query like -moz-windows-compositor can easily be applied per window. I think it can't, dbaron, any thoughts?
I found the cause, I'll post the patch soon.
Assignee: nobody → masayuki
Status: NEW → ASSIGNED
Attached patch Patch v1.0Splinter Review
http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsWindow.cpp#5268
> 5268   case WM_DWMCOMPOSITIONCHANGED:
> 5269     UpdateNonClientMargins();
> 5270     BroadcastMsg(mWnd, WM_DWMCOMPOSITIONCHANGED);
> 5271     DispatchStandardEvent(NS_THEMECHANGED);
> 5272     UpdateGlass();
> 5273     Invalidate(PR_FALSE);
> 5274     break;

We cache the composition state to nsUXThemeData::sHaveCompositor.

It'll be modified by nsUXThemeData::CheckForCompositor(). It is called only by UpdateGlass() in above code. However, NS_THEMECHANGED event causes calling nsLookAndFeel::GetMetric(eMetric_MaemoClassic) and the result will be cached.

So, nsPresShell caches old state, therefore, the painting is broken.

This patch removes nsUXThemeData::sHaveCompositor. I think that it's not needed. nsLookAndFeel::GetMetric(eMetric_MaemoClassic) isn't called many times. So, we don't need to worry about the performance.
Attachment #487891 - Flags: review?(jmathies)
blocking2.0: ? → final+
Attachment #487891 - Flags: review?(jmathies) → review+
http://hg.mozilla.org/mozilla-central/rev/b6ccf7feb2e7
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b8
Ugh, this isn't fixed yet. I couldn't reproduce this bug with patched debug build, but I can still reproduce this bug with nightly build.

I'm not sure what are difference between them.

-> REOPEN
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla2.0b8 → ---
Hmm, it seems that there is a timing issue. If the driver crashed when mouse cursor is on another application's window, the aero glass of Fx is recovered sometimes. I guess that if Fx redraws something at crashing or recovering, we're breaking the non-client area painting before WM_DWMCOMPOSITIONCHANGED message is handled.
(In reply to comment #14)
> Hmm, it seems that there is a timing issue. If the driver crashed when mouse
> cursor is on another application's window, the aero glass of Fx is recovered
> sometimes. I guess that if Fx redraws something at crashing or recovering,
> we're breaking the non-client area painting before WM_DWMCOMPOSITIONCHANGED
> message is handled.

Note, nsUXThemeData::CheckForCompositor() checks a global desktop setting. So depending on when it gets called you'll might get different results. We get WM_DWMCOMPOSITIONCHANGED during the transition which triggers a flush of the theme settings via Pres Shell.

During a driver crash I wonder if we get multiple WM_DWMCOMPOSITIONCHANGED events, and I wonder if somehow Pres Shell gets out of sync.
When this happens, is the aero shutdown for the whole desktop or for individual windows?
Using the latest nightly, one Minefield window will recover successfully and another one will turn black. On the window which turns black, Alt-tabbing between the window and another window causes a (non-glass?) title bar to flash for an instant. I recorded a video of it but it is too large to attach; email me (bsmith@mozilla.com) if you want it. My work machine (Lenovo T510) is in 650 castro regularly exhibits this bug as the video driver crashes often. Unfortunately, I cannot figure out how to reproduce it on demand.

I usually have multiple Minefield windows open and I think that might be a contributing factor in this bug.
(In reply to comment #17)
> Using the latest nightly, one Minefield window will recover successfully and
> another one will turn black. On the window which turns black, Alt-tabbing
> between the window and another window causes a (non-glass?) title bar to flash
> for an instant. I recorded a video of it but it is too large to attach; email
> me (bsmith@mozilla.com) if you want it. My work machine (Lenovo T510) is in 650
> castro regularly exhibits this bug as the video driver crashes often.
> Unfortunately, I cannot figure out how to reproduce it on demand.
> 
> I usually have multiple Minefield windows open and I think that might be a
> contributing factor in this bug.

Can you dump your nVidia driver info from the Lenovo in here? We might want to black list it.
(In reply to comment #15)
> Note, nsUXThemeData::CheckForCompositor() checks a global desktop setting. So
> depending on when it gets called you'll might get different results. We get
> WM_DWMCOMPOSITIONCHANGED during the transition which triggers a flush of the
> theme settings via Pres Shell.

Yes, right.

> During a driver crash I wonder if we get multiple WM_DWMCOMPOSITIONCHANGED
> events, and I wonder if somehow Pres Shell gets out of sync.

2 WM_DWMCOMPSOTIONCHANGED message are sent per a crash. Probably, one is for disabling the composition (crashed), the other is for enabling the composition (recovered).

The nsILookAndFeel user is only nsCSSRuleProcessor.
http://mxr.mozilla.org/mozilla-central/source/layout/style/nsCSSRuleProcessor.cpp#1023
And it caches the result. But I don't find the user of the cached result.
Attached patch Additional patch v1.0 (obsolete) — Splinter Review
This patch fixes two issues.

1. WM_DWMCOMPOSITIONCHANGED message shouldn't be lower priority than input messages.

key and mouse input messages should be processed after all UIs are recovered.

2. Cache the compositor state and refresh it only when WM_DWMCOMPOSITIONCHANGED is received.

This patch cache the compositor state again. But it cannot be referred directly. When CheckForCompositior() is called without argument, it returns cached state. By this change, all drawing methods which call CheckForCompositor() draw for same state.

E.g., in the old logic, a possible scenario is, a method paints something for disabled composition after driver crash, then, next method paints something for enabled composition after driver recovered but before WM_DWMCOMPOSITIONCHANGED. This patch prevents such strange behavior.

I reproduced the crash 3 times with the patched build. The window was recovered all times.
Attachment #489164 - Flags: review?(jmathies)
Comment on attachment 489164 [details] [diff] [review]
Additional patch v1.0

>-    if (PeekKeyAndIMEMessage(&msg, NULL) ||
>+    // However, we must not ignore WM_DWM* messages which are handled by us,
>+    // see bug 608515.

No need to mention the bug #, we can rely on hg blame for that. Also, this comment isn't clear. We aren't ignoring WM_DWMCOMPOSITIONCHANGED, this comment should explain why we peek for it.

>+    if (
>+#if MOZ_WINSDK_TARGETVER >= MOZ_NTDDI_LONGHORN
>+        ::PeekMessageW(&msg, NULL, WM_DWMCOMPOSITIONCHANGED,
>+                       WM_DWMCOMPOSITIONCHANGED, PM_REMOVE) ||
>+#endif
>+        PeekKeyAndIMEMessage(&msg, NULL) ||

We should run this through a talos run on Try since we're adding overhead to every message we process.

 
>+  // This method returns cached compostor state. Most callers should call
>+  // without the argument.
>+  // The cache should be modified only when application is starting and
>+  // received WM_DWMCOMPOSITIONCHANGED.

We receive WM_DWMCOMPOSITIONCHANGED in other cases besides startup. For example when we switch from aero to aero basic, and when full screen games like quake live take over the desktop. Pardon my picky comment editing but I'd like this to be a bit more clear, something like:

This method returns the cached compositor state. Most
callers should call without the argument. The cache
should be modified only when the application receives
WM_DWMCOMPOSITIONCHANGED. This rule prevents inconsistent
results for two or more calls which check the state during
composition transition.


>+  static PRBool CheckForCompositor(PRBool aUseCachedValue = PR_TRUE) {
>+    static BOOL sCachedValue = FALSE;
> #if MOZ_WINSDK_TARGETVER >= MOZ_NTDDI_LONGHORN
>-    if(dwmIsCompositionEnabledPtr)
>-      dwmIsCompositionEnabledPtr(&compositionIsEnabled);
>+    if(!aUseCachedValue && dwmIsCompositionEnabledPtr) {
>+      dwmIsCompositionEnabledPtr(&sCachedValue);
>+    }
> #endif // MOZ_WINSDK_TARGETVER >= MOZ_NTDDI_LONGHORN
>-    return (compositionIsEnabled != FALSE);
>+    return (sCachedValue != FALSE);
>   }
> };

Do we get a WM_DWMCOMPOSITIONCHANGED event in early startup that primes the cached value? The only place I see you calling |CheckForCompositor(PR_TRUE)| is in WM_DWMCOMPOSITIONCHANGED.


>+  // XXX If you want to handle WM_DWM* messages newly, the messages shouldn't
>+  //     be reordered by other messages.
>+  //     Check nsAppShell::ProcessNextNativeEvent().
>   case WM_DWMCOMPOSITIONCHANGED:
>+    // First, update the compositor state to latest.
>+    nsUXThemeData::CheckForCompositor(PR_TRUE);
>     UpdateNonClientMargins();
>     BroadcastMsg(mWnd, WM_DWMCOMPOSITIONCHANGED);

Please skip the 'XXX' and format the comment correctly. In fact, I don't think we event need that upper comment since "First, update the compositor state to latest." expains what's going on.
Attachment #489164 - Flags: review?(jmathies) → review-
(In reply to comment #18)
> Can you dump your nVidia driver info from the Lenovo in here? We might want to
> black list it.

The system is running Win7-x64 on a T510 with nVidia 3100M, driver version 8.17.12.5738, which is the latest driver that can be installed on this system without hacking the .inf files of the newer driver released by NVIDIA. But, people not using ThinkPads might be able to install the newer one from the NVIDIA website.
> >+    if (
> >+#if MOZ_WINSDK_TARGETVER >= MOZ_NTDDI_LONGHORN
> >+        ::PeekMessageW(&msg, NULL, WM_DWMCOMPOSITIONCHANGED,
> >+                       WM_DWMCOMPOSITIONCHANGED, PM_REMOVE) ||
> >+#endif
> >+        PeekKeyAndIMEMessage(&msg, NULL) ||
> 
> We should run this through a talos run on Try since we're adding overhead to
> every message we process.

I confirmed that this new patch can fix this bug without message optimization. This was removed.

> >+  static PRBool CheckForCompositor(PRBool aUseCachedValue = PR_TRUE) {
> >+    static BOOL sCachedValue = FALSE;
> > #if MOZ_WINSDK_TARGETVER >= MOZ_NTDDI_LONGHORN
> >-    if(dwmIsCompositionEnabledPtr)
> >-      dwmIsCompositionEnabledPtr(&compositionIsEnabled);
> >+    if(!aUseCachedValue && dwmIsCompositionEnabledPtr) {
> >+      dwmIsCompositionEnabledPtr(&sCachedValue);
> >+    }
> > #endif // MOZ_WINSDK_TARGETVER >= MOZ_NTDDI_LONGHORN
> >-    return (compositionIsEnabled != FALSE);
> >+    return (sCachedValue != FALSE);
> >   }
> > };
> 
> Do we get a WM_DWMCOMPOSITIONCHANGED event in early startup that primes the
> cached value? The only place I see you calling |CheckForCompositor(PR_TRUE)| is
> in WM_DWMCOMPOSITIONCHANGED.

Oh, the previous patch is wrong. PR_TRUE for the argument equals calling without it. So, |CheckForCompositor(PR_FALSE)| is I intended.

However, PR_FALSE is misleadable, I reverted the meaning of the argument in this patch.
Attachment #489164 - Attachment is obsolete: true
Attachment #490026 - Flags: review?(jmathies)
Attachment #490026 - Flags: review?(jmathies) → review+
http://hg.mozilla.org/mozilla-central/rev/c7128de9595a

I hope this patch fixes this bug completely...
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: