Closed Bug 1284092 Opened 8 years ago Closed 8 years ago

Crash in mozilla::widget::WinCompositorWidget::ClearTransparentWindow

Categories

(Core :: Widget: Win32, defect, P4)

50 Branch
x86
Windows
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox49 --- unaffected
firefox50 + fixed

People

(Reporter: alice0775, Assigned: dvander)

References

Details

(Keywords: crash, regression, Whiteboard: tpi:+)

Crash Data

Attachments

(3 files)

Attached file userChrome.css
[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is 
report bp-88f8dcdc-efb0-4acf-914c-fa7af2160703.
=============================================================

See report http://forums.mozillazine.org/viewtopic.php?p=14645597#p14645597

Steps To Reproduce:
1. Disable e10s
2. Create following userChrome.css in your profile\chrome folder

-------------cut-----------------
@charset "utf-8";

@namespace url("http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul");
#PopupAutoComplete {
  background: none!important;
  border: none!important;
}
-------------cut-----------------

3. Open any web page with autocomplete input field
   e.g. https://bugzilla.mozilla.org/
4. Type abc and press ENTER
5. Open again the page https://bugzilla.mozilla.org/
6. Type abd

Actual Results:
Crash
Flags: needinfo?(dvander)
Regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=a643f56f7523bf90d726c354aa714a6372aed92d&tochange=56154813d29d7b56d2039fc3a3c80f27c0a4571f

Regressed by: 56154813d29d	Jonathan Watt — Bug 1279628, part 2 - Switch all but one of the gfxASurface::CheckSurfaceSize calls to call Factory::CheckSurfaceSize. r=mstange
Blocks: 1279628
No longer blocks: 1281998
Component: Widget: Win32 → Graphics
Flags: needinfo?(dvander) → needinfo?(jwatt)
Tracking 50+ for this graphics regression.
Component: Graphics → Widget: Win32
Bas, would you be able to find someone to take a look at this? I probably won't have time before I go off on parental leave. I suspect the cause is probably another instance of the issue I described in bug 1279628 comment 5. If so the fix would be fairly simple as it was in the 'part 3' patch on that bug where an extra early return was added at the beginning of CanvasRenderingContext2D::DrawWindow.
Flags: needinfo?(jwatt) → needinfo?(bas)
Alice: I can't reproduce this on Windows either. Can you still repro? If so, is the issue fixed in this patched build:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ebeba4e17450&selectedJob=24429315
Flags: needinfo?(alice0775)
I can reproduce the crash on latest Nightly50.0a1[1].
[1]
https://hg.mozilla.org/mozilla-central/rev/78dd94ba93c77d0bba45f8e4525947629a305a41
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0 ID:20160727030230

And I can still reproduce the crash on the try build.
[2]
https://hg.mozilla.org/try/rev/ebeba4e174501de516783f312795b8d3344d40f3
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0 ID:20160724125141
Flags: needinfo?(alice0775)
Are you able to give me a stack for the Try build crash? It should be crashing at one of the MOZ_CRASH macros that I added, and where it crashes will give me a clue about how to fix this.
Flags: needinfo?(alice0775)
WinDbg log for the try build.
(according to https://developer.mozilla.org/en-US/docs/Mozilla/How_to_get_a_stacktrace_with_WinDbg)
Flags: needinfo?(alice0775)
Unfortunately that WinDbg log doesn't seem to identify either the stack that was originally reported, or any of the locations where I added MOZ_CRASH() calls. In fact the stack data doesn't seem to show anything that makes sense. :(

I also can't reproduce either on Win7 on real hardware, or on Win8.1 in a VM.
Can't reproduce on Win10 on real hardware either.
make sure
* disabled e10s
* type abc as stored form history
* and then type abd   --- not abc

I can also reproduce the crash on Win10x64 IP(build 14393.5) on VMware Player.
bp-cfb9b3da-2aef-4388-9d20-ab1d82160728
(In reply to Alice0775 White from comment #10)
> * and then type abd   --- not abc

Ah-hah! I thought I was supposed to retype the same string. That was want I had missed, thanks!
So this crash happens because when we're under the stack:

  gfxWindowsSurface::gfxWindowsSurface
  WinCompositorWidget::CreateTransparentSurface
  WinCompositorWidget::ResizeTransparentWindow
  nsWindow::Resize

CheckSurfaceSize now returns false when one of |size|'s dimensions is zero, and therefore CreateTransparentSurface calls MakeInvalid and creates an invalid gfxWindowSurface which gets assigned to WinCompositorWidget::mTransparentSurface. Later when we hit WinCompositorWidget::ClearTransparentWindow we call CreateDrawTargetForSurface(mTransparentSurface, size) which returns nullptr as the resulting DrawTarget, then we crash on trying to reference that DrawTarget.
I guess we could try and fix this in any of the following ways:

 1. Special case the gfxWindowsSurface ctor code to allow zero sized
    surfaces as they were permitted before bug 1279628 landed.

 2. Make WinCompositorWidget::CreateTransparentSurface null out
    mTransparentSurface if it fails to create a valid surface, and
    make the code that uses mTransparentSurface handle it being null.

 3. Make the code that handles mTransparentSurface check to see if the
    surface is valid before using it.

It seems like the code should be doing either #2 or #3 anyway since we can't assume that we're able to create a valid gfxWindowsSurface.

Any strong opinion on the above, Bas?
Flags: needinfo?(dvander)
I think it would be better to fix WinCompositorWidget to handle null/invalid mTransparentSurface and mMemoryDC, but I don't have time to do that. In case nobody gets to that here's a patch to revert the behavior of the gfxWindowsSurface constructor.
Assignee: nobody → jwatt
Attachment #8775690 - Flags: review?(bas)
(2) sounds best to me to be honest, as you said we should be error-checking anyway.
Flags: needinfo?(dvander)
I'm about to be leave for a couple months. Any chance you might be able to do that, David?
Sure thing.
Assignee: jwatt → dvander
Status: NEW → ASSIGNED
Thank you!
Flags: needinfo?(bas)
Attachment #8775690 - Flags: review?(bas) → review+
Priority: -- → P4
Whiteboard: tpi:+
Is there something we can land here?
Flags: needinfo?(dvander)
I can't reproduce this anymore, it should have been obsoleted by bug 1289236.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Flags: needinfo?(dvander)
Resolution: --- → WORKSFORME
Still happening in 50, 51 and 53:
https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3Awidget%3A%3AWinCompositorWidget%3A%3AClearTransparentWindow&date=%3E%3D2016-10-15T17%3A58%3A56.000Z&date=%3C2016-11-15T17%3A58%3A56.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1

Line for the failure (and probably exact trigger) has changed, probably due to bug 1289236, but it's till happening.

NI's for visibility.  Flags/pri might need adjustment
Status: RESOLVED → REOPENED
Flags: needinfo?(jwatt)
Flags: needinfo?(dvander)
Resolution: WORKSFORME → ---
I was not able to see this signature on anything other than 53. Unless this spikes significantly, I'd consider it a wontfix for 50.
dvander, are you able to implement #2? (You're more familiar with this code.)
Flags: needinfo?(jwatt)
(In reply to Ritu Kothari (:ritu) from comment #22)
> I was not able to see this signature on anything other than 53. Unless this
> spikes significantly, I'd consider it a wontfix for 50.

https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3Awidget%3A%3AWinCompositorWidget%3A%3AClearTransparentWindow&date=%3E%3D2016-08-17T01%3A42%3A59.000Z&date=%3C2016-11-17T01%3A42%3A59.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1

It's a rare-ish crash, so you need to expand the time looked at. That shows 50, 51, 52 and 53 crashing in the last 3 months.
I only see two crashes after this bug was fixed, and they're both OOMs whereas the original bug was a 0 size. It looks easy to fix this with a null check, but it's a separate bug and looks super low volume.
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Flags: needinfo?(dvander)
Resolution: --- → FIXED
Seems like this got stuck in limbo. For cleaner tracking, I'm going to consider this bug "done" per comment 20 and will file a new bug to track the remaining crashes (which are still happening per crash-stats).
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: