Closed
Bug 797568
Opened 12 years ago
Closed 12 years ago
Crash in nsWindow::OnMotionNotifyEvent with abort message: "DRI2DestroyDrawable: BadDrawable (invalid Pixmap or Window parameter)"
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
RESOLVED
FIXED
mozilla19
People
(Reporter: wgianopoulos, Assigned: karlt)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(8 files, 1 obsolete file)
47.57 KB,
image/png
|
Details | |
5.56 KB,
patch
|
roc
:
review+
|
Details | Diff | Splinter Review |
987 bytes,
patch
|
BenWa
:
review+
|
Details | Diff | Splinter Review |
1.92 KB,
patch
|
roc
:
review+
|
Details | Diff | Splinter Review |
2.29 KB,
patch
|
roc
:
review+
|
Details | Diff | Splinter Review |
7.11 KB,
patch
|
roc
:
review+
|
Details | Diff | Splinter Review |
2.37 KB,
patch
|
roc
:
review+
|
Details | Diff | Splinter Review |
1.82 KB,
patch
|
roc
:
review+
|
Details | Diff | Splinter Review |
I was asked to file this on IRC, even though to create it you need to set alter a pref layers.acceleration.force-enabled;true.
Steps to reproduce:
1. set layers.acceleration.force-enabled;true
2. visit http://www.patriots.com.
3. Hover over team and click on Trnasactions
crash occurs
bp-8883fd9e-1682-4eeb-88a4-bf40a2121003
Reporter | ||
Comment 1•12 years ago
|
||
Guess I could have spelled Transactions better :-)
Reporter | ||
Comment 2•12 years ago
|
||
The graphics section of about:support would have been here had there been a useful way to copy it to the clipboard. I guess we need another bug for that.
Comment 3•12 years ago
|
||
Since this is an X error (that we're just killing Firefox for), I'm going to call this a Widget bug.
Component: Graphics → Widget: Gtk
Updated•12 years ago
|
Crash Signature: [@ mozalloc_abort | NS_DebugBreak_P | X11Error]
Keywords: crash
Summary: Crash @ mozalloc_abort | NS_DebugBreak_P | X11Error → Crash in nsWindow::OnMotionNotifyEvent with abort message: "DRI2DestroyDrawable: BadDrawable (invalid Pixmap or Window parameter)"
Reporter | ||
Comment 4•12 years ago
|
||
FYI I probably would not even have asked if a bug was wanted here since I was overriding something turned off on purpose was that WebGL demos run orders of magnitude faster with layers acceleration on using this OS hardware combo and this is the only crash I have so far encountered with it forced enabled.
Reporter | ||
Comment 5•12 years ago
|
||
added this as a screenshot since it seems impossible to bet useful about:support output to attach to a bug.
Reporter | ||
Comment 6•12 years ago
|
||
second try with a better image format.
Reporter | ||
Updated•12 years ago
|
Attachment #667718 -
Attachment is obsolete: true
Assignee | ||
Comment 7•12 years ago
|
||
Mesa tries to suppress this error
http://cgit.freedesktop.org/mesa/mesa/tree/src/glx/dri2.c?h=8.0&id=59997d619d8957fca3b2042fe4ebeed0709c0204#n185
but that only works if the error is received during _XReply, not while waiting for subsequent events
http://cgit.freedesktop.org/xorg/lib/libX11/tree/src/xcb_io.c?id=ed00b460acb08787b695f27b864e96102dfd4867#n202
Bill wasn't able to reproduce with MOZ_X_SYNC=1 in the environment.
I expect that is because XSynchronize is implemented with an _XReply after every async request.
Assignee | ||
Comment 8•12 years ago
|
||
This is a GLX layers issue, though it might be a Mesa bug.
Component: Widget: Gtk → Graphics: Layers
Comment 9•12 years ago
|
||
Nicolas has expressed a lot of interest in getting GL layers enabled by default on Linux :)
Assignee: nobody → nical.bugzilla
Assignee | ||
Comment 10•12 years ago
|
||
If the XSync before the DRI2DestroyDrawable request were instead after the request, I expect we wouldn't see this issue. That XSync was there in the first revision
http://cgit.freedesktop.org/mesa/mesa/tree/src/glx/x11/dri2.c?id=e82dd8c6e1fa2fff5b960de26961080ba5e9651d#n217
before the DRI2Error function was added
http://cgit.freedesktop.org/mesa/mesa/commit/?h=8.0&id=16887d042a917fa4773e4d853f50051b54e9948c
There may be a bug in Gecko calling glXDestroyPixmap too late.
An xtrace log may contain some helpful clues.
http://xtrace.alioth.debian.org/
Perhaps there's a chance attachment 663949 [details] [diff] [review] may help by making the context not current earlier.
Assignee | ||
Comment 11•12 years ago
|
||
Does the crash still happen with Flash Player disabled?
We are calling glXDestroyPixmap in the gfxXlibSurface destructor and that is not XFinished() before PluginInstanceParent::RecvShow() returns.
Assignee | ||
Comment 12•12 years ago
|
||
If libbonoboui-2.so.0 is loaded, it suppresses all BadDrawable messages, so to reproduce that needs to be moved out of the way.
http://git.gnome.org/browse/libbonoboui/tree/bonobo/bonobo-ui-main.c#n41
Once that was out of the way I could reproduce intermittently but reasonably frequently clicking on "Watch Press Conference", pressing back and trying again if necessary.
The resource id in the abort message was for a resource created in the plugin process, supporting the theory in comment 11.
Reporter | ||
Comment 13•12 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #10)
> If the XSync before the DRI2DestroyDrawable request were instead after the
> request, I expect we wouldn't see this issue. That XSync was there in the
> first revision
> http://cgit.freedesktop.org/mesa/mesa/tree/src/glx/x11/dri2.
> c?id=e82dd8c6e1fa2fff5b960de26961080ba5e9651d#n217
> before the DRI2Error function was added
> http://cgit.freedesktop.org/mesa/mesa/commit/?h=8.
> 0&id=16887d042a917fa4773e4d853f50051b54e9948c
>
> There may be a bug in Gecko calling glXDestroyPixmap too late.
>
> An xtrace log may contain some helpful clues.
> http://xtrace.alioth.debian.org/
>
> Perhaps there's a chance attachment 663949 [details] [diff] [review] may
> help by making the context not current earlier.
Unfortunately, that patch did not seem to help. I will probably find more time to investigate the other suggestions over the weekend.
Reporter | ||
Comment 14•12 years ago
|
||
I think you are right on that this is flash related. My sense of what is happening here is that if there is a load of a flash object pending when I do something to either close the tab or navigate away form the window, there are cases when this crash occurs when the flash object load completes.
Reporter | ||
Comment 15•12 years ago
|
||
So like when the object load completes, a missing check to see if the context that caused the load still exists.
Assignee | ||
Comment 16•12 years ago
|
||
Thanks, Bill. Comments 11 and 12 identify an issue that we need to fix, and point to that as the likely cause here.
This is probably a regression from bug 787853, because that delayed the glXDestroyPixmap until gfxXlibSurface destruction.
I expect the simplest fix would ensure that PluginInstanceParent::RecvShow() releases the reference held by the old mFrontSurface and ensures that mImageContainer is not holding a reference before calling FinishX().
The code there is remnant from when the PluginInstanceOwner or nsObjectFrame managed the ImageContainer. Now that the PluginInstanceParent owns the image container, I suspect it would make sense for the RecvShow just to update the ImageContainer and the code called on RecvNPN_InvalidateRect would no longer need to call GetImageContainer just to update the Image in the container.
Perhaps the MacIOSurface code could be left as is if the same doesn't make sense there. There's some callbacks attached to surfaces, and I'm not familiar with the purpose of that.
Blocks: 787853
Reporter | ||
Comment 17•12 years ago
|
||
So, is there something else I should be testing here? OR are we waiting for a patch to test?
Assignee | ||
Comment 18•12 years ago
|
||
I think we can fix the known issue, then test that when we have a patch, before doing further investigation, thanks.
Updated•12 years ago
|
Keywords: regression
Version: Trunk → 18 Branch
Reporter | ||
Comment 19•12 years ago
|
||
OK but I would be willing to test a not ready for review patch to see if it fixes the issue I am encountering.
Reporter | ||
Comment 20•12 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #16)
> Thanks, Bill. Comments 11 and 12 identify an issue that we need to fix, and
> point to that as the likely cause here.
>
> This is probably a regression from bug 787853, because that delayed the
> glXDestroyPixmap until gfxXlibSurface destruction.
>
> I expect the simplest fix would ensure that PluginInstanceParent::RecvShow()
> releases the reference held by the old mFrontSurface and ensures that
> mImageContainer is not holding a reference before calling FinishX().
>
> The code there is remnant from when the PluginInstanceOwner or nsObjectFrame
> managed the ImageContainer. Now that the PluginInstanceParent owns the
> image container, I suspect it would make sense for the RecvShow just to
> update the ImageContainer and the code called on RecvNPN_InvalidateRect
> would no longer need to call GetImageContainer just to update the Image in
> the container.
>
> Perhaps the MacIOSurface code could be left as is if the same doesn't make
> sense there. There's some callbacks attached to surfaces, and I'm not
> familiar with the purpose of that.
It seems you are on to something backing out the fix for bug 787853 seems to fix the issue for me.
Reporter | ||
Comment 21•12 years ago
|
||
So, I am re-spinning my builds without the fix for bug 787853.
Reporter | ||
Comment 22•12 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #18)
> I think we can fix the known issue, then test that when we have a patch,
> before doing further investigation, thanks.
Well I gave time here because of the uplift. If this can't be fixed then the patch causing the regression should be backed out.
Assignee | ||
Updated•12 years ago
|
Assignee: nical.bugzilla → karlt
Assignee | ||
Comment 23•12 years ago
|
||
When trying to follow the plugin ImageContainer code involved here, I found it quite difficult to follow, partly because the implementation has changed over time but there are still remnants of old implementations to distract you. I'll attach some cleanup patches.
Having nsObjectFrame::GetImageContainer() is essentially deceitful, because this just creates an empty ImageContainer and never really uses it. In bug 724886, ownership of the ImageContainer moved from the nsObjectFrame to the PluginInstanceParent.
Despite appearances, the logic for dispatching NotifyPaintWaiter() is unchanged by this patch.
Attachment #676439 -
Flags: review?(roc)
Assignee | ||
Comment 24•12 years ago
|
||
This code is obsolete, perhaps since we now always use async drawing for OOP plugins as of https://hg.mozilla.org/mozilla-central/rev/b8fa6f780f38
(Note that UseAsyncRendering here is quite different from IsAsyncDrawing() in PluginInstanceParent/Child.) IP plugins don't have an ImageContainer.
Does this mean that the mIOSurface code in PluginInstanceParent (and matching Child code) for sync
NPDrawingModelCoreAnimation/NPDrawingModelInvalidatingCoreAnimation is now
unused?
Attachment #676442 -
Flags: review?(bgirard)
Assignee | ||
Comment 25•12 years ago
|
||
Attachment #676444 -
Flags: review?(roc)
Assignee | ||
Comment 26•12 years ago
|
||
GetImageContainer(void) creates a container and never fails.
It's a little easier to follow the code when there are fewer places that can create ImageContainers. This code comes from Bug 651192.
Attachment #676448 -
Flags: review?(roc)
Reporter | ||
Updated•12 years ago
|
status-firefox19:
--- → affected
tracking-firefox19:
--- → ?
Assignee | ||
Comment 27•12 years ago
|
||
Now that the PluginInstanceParent owns the container, it is easier to follow the state of the ImageContainer Image when the PluginInstanceParent controls it.
When the plugin was shutting down, RecvNPN_InvalidateRect would not cause GetImageContainer(ImageContainer**) to be called, and so there was the special case of the PluginInstanceParent updating the container if it received a null surface, but the nsPluginInstanceOwner initiating when there is a surface.
However, I think I recall new non-null surfaces arriving during destruction, and these messages should remove the old surfaces from use, which wasn't happening because GetImageContainer(ImageContainer**) wasn't called.
The surface flush was a no-op given cairo hadn't been drawing to the surface.
It is not so simple to update the mFrontIOSurface code to do the same because ref counting in nsPluginInstanceOwner::GetImageContainer(ImageContainer**) assumes that PluginInstanceParent::GetImageContainer() creates a new image for each call.
Attachment #676451 -
Flags: review?(roc)
Assignee | ||
Comment 28•12 years ago
|
||
At this point, it would be possible to rearrange the order of code in RecvShow and null out mFrontSurface at the right time to run ~gfxXlibSurface.
However relying on the timing of removal of references can be quite fickle.
gfxASurface::Finish() looks like what we want.
Assignee | ||
Comment 29•12 years ago
|
||
Try debug tests pass with changes very similar to these.
https://tbpl.mozilla.org/?tree=Try&rev=3a78a1b95c29
https://tbpl.mozilla.org/?tree=Try&rev=dc6e868153ad
Still to run Android tests.
Attachment #676439 -
Flags: review?(roc) → review+
Attachment #676444 -
Flags: review?(roc) → review+
Attachment #676448 -
Flags: review?(roc) → review+
Attachment #676451 -
Flags: review?(roc) → review+
Assignee | ||
Comment 30•12 years ago
|
||
I considered setting mDrawable to None, but it seems quite reasonable to use
ReleasePixmap() after calling Finish(). Even cairo_xlib_surface_get_drawable
returns the drawable after cairo_surface_finish() has been called.
Attachment #676469 -
Flags: review?(roc)
Assignee | ||
Comment 31•12 years ago
|
||
http://www.cairographics.org/manual/cairo-cairo-surface-t.html#cairo-surface-finish is what we need here.
I wonder why we weren't having the similar trouble to this bug for RENDER
layers with the XRenderFreePicture call for destruction of
cairo_xlib_surfaces.
cairo_surface_finish calls cairo_surface_flush, and the flush may be necessary even for image surfaces.
Attachment #676470 -
Flags: review?(roc)
Attachment #676469 -
Flags: review?(roc) → review+
Attachment #676470 -
Flags: review?(roc) → review+
Updated•12 years ago
|
status-firefox18:
--- → affected
Comment 32•12 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #24)
> Created attachment 676442 [details] [diff] [review]
> Does this mean that the mIOSurface code in PluginInstanceParent (and
> matching Child code) for sync
> NPDrawingModelCoreAnimation/NPDrawingModelInvalidatingCoreAnimation is now
> unused?
I think so but I'd have to do a few experiments to be 100% sure.
Updated•12 years ago
|
Attachment #676442 -
Flags: review?(bgirard) → review+
Assignee | ||
Comment 33•12 years ago
|
||
Assignee | ||
Comment 34•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/0b0b4ef87dac
https://hg.mozilla.org/integration/mozilla-inbound/rev/08783347c920
https://hg.mozilla.org/integration/mozilla-inbound/rev/caad55e54b0b
https://hg.mozilla.org/integration/mozilla-inbound/rev/d7e77004a0d0
https://hg.mozilla.org/integration/mozilla-inbound/rev/2886e8e67bfd
https://hg.mozilla.org/integration/mozilla-inbound/rev/541b971b847b
https://hg.mozilla.org/integration/mozilla-inbound/rev/1ba88bd2f440
Comment 35•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/0b0b4ef87dac
https://hg.mozilla.org/mozilla-central/rev/08783347c920
https://hg.mozilla.org/mozilla-central/rev/caad55e54b0b
https://hg.mozilla.org/mozilla-central/rev/d7e77004a0d0
https://hg.mozilla.org/mozilla-central/rev/2886e8e67bfd
https://hg.mozilla.org/mozilla-central/rev/541b971b847b
https://hg.mozilla.org/mozilla-central/rev/1ba88bd2f440
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
Updated•12 years ago
|
Assignee | ||
Comment 36•12 years ago
|
||
> tracking-firefox18: --- → ?
AFAIK this affects only those on systems without libgnomeui (to pull in libbonoboui-2.so.0) that have set layers.acceleration.force-enabled true to test an incomplete feature.
If there are crashes happening due to errors from RenderFreePicture, then those may also be addressed by this, but I haven't seen that happen.
You need to log in
before you can comment on or make changes to this bug.
Description
•