Closed Bug 763449 Opened 8 years ago Closed 2 years ago

Intermittent test_browserElement_oop_Close.html, test_browserElement_oop_BackForward.html | Test timed out, "ABORT: X_CopyArea: BadDrawable (invalid Pixmap or Window parameter); 4 requests ago: nsX11ErrorHandler.cpp, line 157"

Categories

(Core :: Graphics: Layers, defect, P3, critical)

x86
Linux
defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: emorley, Assigned: karlt)

References

Details

(Keywords: crash, intermittent-failure, qawanted, Whiteboard: [leave open])

Attachments

(3 files)

Rev3 Fedora 12 mozilla-inbound opt test mochitests-2/5 on 2012-06-10 22:22:19 PDT for push 61fd66629c4f

slave: talos-r3-fed-070

https://tbpl.mozilla.org/php/getParsedLog.php?id=12543356&tree=Mozilla-Inbound

{
821 INFO TEST-START | /tests/dom/browser-element/mochitest/test_browserElement_oop_Close.html
creating 1!
[Child 2244] ###!!! ABORT: X_CopyArea: BadDrawable (invalid Pixmap or Window parameter); 4 requests ago: file ../../../toolkit/xre/nsX11ErrorHandler.cpp, line 157
[Child 2244] ###!!! ABORT: X_CopyArea: BadDrawable (invalid Pixmap or Window parameter); 4 requests ago: file ../../../toolkit/xre/nsX11ErrorHandler.cpp, line 157
822 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/browser-element/mochitest/test_browserElement_oop_Close.html | Test timed out.
XScreenSaver state: Disabled
User input has been idle for 969 seconds
args: ['/home/cltbld/talos-slave/test/build/bin/screentopng']
SCREENSHOT: <snip>
823 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/browser-element/mochitest/test_browserElement_oop_Close.html | This test left crash dumps behind, but we weren't expecting it to!
824 INFO TEST-INFO | Found unexpected crash dump file /tmp/tmpQU2mnt/minidumps/141750e9-841d-1590-7bd98799-2dc599d5.dmp.
825 INFO TEST-INFO | Found unexpected crash dump file /tmp/tmpQU2mnt/minidumps/141750e9-841d-1590-7bd98799-2dc599d5.extra.
826 INFO TEST-END | /tests/dom/browser-element/mochitest/test_browserElement_oop_Close.html | finished in 305522ms
}

...

{
159428 INFO TEST-START | Shutdown
159429 INFO Passed: 142295
159430 INFO Failed: 2
159431 INFO Todo:   16488
159432 INFO SimpleTest FINISHED
159433 INFO TEST-INFO | Ran 0 Loops
159434 INFO SimpleTest FINISHED
NOTE: child process received `Goodbye', closing down
NOTE: child process received `Goodbye', closing down
INFO | automation.py | Application ran for: 0:12:40.292715
INFO | automation.py | Reading PID log: /tmp/tmpNsOgW9pidlog
==> process 2192 launched child process 2244
==> process 2192 launched child process 2257
==> process 2192 launched child process 2285
==> process 2192 launched child process 2288
==> process 2192 launched child process 2290
==> process 2192 launched child process 2293
INFO | automation.py | Checking for orphan process with PID: 2244
INFO | automation.py | Checking for orphan process with PID: 2257
INFO | automation.py | Checking for orphan process with PID: 2285
INFO | automation.py | Checking for orphan process with PID: 2288
INFO | automation.py | Checking for orphan process with PID: 2290
INFO | automation.py | Checking for orphan process with PID: 2293
Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux/1339389994/firefox-16.0a1.en-US.linux-i686.crashreporter-symbols.zip
PROCESS-CRASH | Main app process exited normally | application crashed (minidump found)
Crash dump filename: /tmp/tmpQU2mnt/minidumps/141750e9-841d-1590-7bd98799-2dc599d5.dmp
Operating system: Linux
                  0.0.0 Linux 2.6.31.5-127.fc12.i686.PAE #1 SMP Sat Nov 7 21:25:57 EST 2009 i686
CPU: x86
     GenuineIntel family 6 model 23 stepping 10
     2 CPUs

Crash reason:  SIGSEGV
Crash address: 0x0

Thread 0 (crashed)
 0  libmozalloc.so!mozalloc_abort [mozalloc_abort.cpp : 19 + 0x0]
    eip = 0x00852eb2   esp = 0xbfa7f600   ebp = 0x00c59844   ebx = 0x00853118
    esi = 0x00c59844   edi = 0xbfa7f648   eax = 0x0000000a   ecx = 0x00000001
    edx = 0x00c5a32c   efl = 0x00210246
    Found by: given as instruction pointer in context
 1  libxul.so!NS_DebugBreak_P [nsDebugImpl.cpp : 382 + 0x5]
    eip = 0x01cca569   esp = 0xbfa7f620   ebp = 0x00c59844   ebx = 0x024efe20
    esi = 0xbfa7fa34   edi = 0xbfa7f648
    Found by: call frame info
 2  libxul.so!X11Error [nsX11ErrorHandler.cpp : 157 + 0x1e]
    eip = 0x0118d161   esp = 0xbfa7fa60   ebp = 0x00000004   ebx = 0x024efe20
    esi = 0x00000000   edi = 0xbfa802a8
    Found by: call frame info
 3  libX11.so.6.3.0 + 0x3c120
    eip = 0x00661121   esp = 0xbfa80380   ebp = 0xbfa80438   ebx = 0x00759f44
    esi = 0xb55f8490   edi = 0xbfa803bc
    Found by: call frame info
 4  libX11.so.6.3.0 + 0x428e6
    eip = 0x006678e7   esp = 0xbfa80440   ebp = 0xbfa804a8
    Found by: previous frame's frame pointer
 5  libX11.so.6.3.0 + 0x42f95
    eip = 0x00667f96   esp = 0xbfa804b0   ebp = 0xbfa804f8
    Found by: previous frame's frame pointer
 6  libX11.so.6.3.0 + 0x36707
    eip = 0x0065b708   esp = 0xbfa80500   ebp = 0xbfa80538
    Found by: previous frame's frame pointer
 7  libxul.so!mozilla::layers::ShadowLayerForwarder::PlatformSyncBeforeUpdate [ShadowLayerUtilsX11.cpp : 155 + 0x10]
    eip = 0x01d533df   esp = 0xbfa80540   ebp = 0xb2e89e64
    Found by: previous frame's frame pointer
 8  libxul.so!mozilla::layers::ShadowLayerForwarder::EndTransaction [ShadowLayers.cpp : 321 + 0x4]
    eip = 0x01d51e01   esp = 0xbfa80560   ebp = 0xb2e89e64   ebx = 0x024efe20
    Found by: call frame info
 9  libxul.so!mozilla::layers::BasicShadowLayerManager::ForwardTransaction [BasicLayers.cpp : 3597 + 0x16]
    eip = 0x01d3b484   esp = 0xbfa81330   ebp = 0xb204bc40   ebx = 0x024efe20
    esi = 0xbfa81368   edi = 0xb2e89e10
    Found by: call frame info
10  libxul.so!mozilla::layers::BasicShadowLayerManager::EndTransaction [BasicLayers.cpp : 3569 + 0x7]
    eip = 0x01d3bb68   esp = 0xbfa81bc0   ebp = 0xb204bc40   ebx = 0x024efe20
    esi = 0xb2e89e10   edi = 0xbfa82170
}
Would it be possible to run this test a few dozen times on Linux with the
--sync command line arg?

-> Widget:GTK until we get the real stack for the X11Error
Component: General → Widget: Gtk
Keywords: qawanted
QA Contact: general → gtk
Guessing a layers surface ownership/sync issue.
Component: Widget: Gtk → Graphics: Layers
QA Contact: gtk → graphics-layers
Or perhaps caused by whatever is causing bug 733323.
https://tbpl.mozilla.org/php/getParsedLog.php?id=12977506&tree=Mozilla-Inbound
Summary: Intermittent abort during test_browserElement_oop_Close.html | Test timed out. ("ABORT: X_CopyArea: BadDrawable (invalid Pixmap or Window parameter); 4 requests ago: file ../../../toolkit/xre/nsX11ErrorHandler.cpp, line 157") → Intermittent abort during test_browserElement_oop_Close.html, test_browserElement_oop_Back | Test timed out. ("ABORT: X_CopyArea: BadDrawable (invalid Pixmap or Window parameter); 4 requests ago: file ../../../toolkit/xre/nsX11ErrorHandler.cpp line 157")
https://tbpl.mozilla.org/php/getParsedLog.php?id=12977043&full=1&branch=mozilla-inbound
Summary: Intermittent abort during test_browserElement_oop_Close.html, test_browserElement_oop_Back | Test timed out. ("ABORT: X_CopyArea: BadDrawable (invalid Pixmap or Window parameter); 4 requests ago: file ../../../toolkit/xre/nsX11ErrorHandler.cpp line 157") → Intermittent test_browserElement_oop_Close.html, test_browserElement_oop_BackForward.html | Test timed out, "ABORT: X_CopyArea: BadDrawable (invalid Pixmap or Window parameter); 4 requests ago: nsX11ErrorHandler.cpp, line 157"
Blocks: 782505
The issue is that both the Shadow and Shadowable layers can access the front
buffer, but when the Shadow layer destroys the front buffer it hasn't told the
Shadowable layer not to use it.

This seems specific to BasicShadowThebesLayer, but I'm not making sense of the
D3D10 code where LayerManagerD3D10 also calls
ShadowLayerForwarder::PaintedThebesBuffer() yet I don't see any D3D10 layers
code handling the OpThebesBufferSwap reply.

Prior to the call from ShadowLayerParent::Destroy() to mLayer->Disconnect()
being added for bug 623255, the Disconnect() would only be called on the Shadow
layer when its Shadowable layer is being deleted.  I wonder why that bug
wasn't resolved by making the BasicTextureImage keep an owning ref to its
mGLContext?

I played with delaying destruction of the front buffer to the
BasicShadowThebesLayer destructor, but mAllocator has been deleted at that
point.

I'm wondering whether having separate Destroy and Disconnect methods on Layers
may be the way to go?  Destroy could be the signal corresponding to widget
destruction, while Disconnect is the signal corresponding to disconnection
from the Shadowable.

Or perhaps Destroy can somehow be called when GL layers are removed from their
layer tree (resolving bug 623255 in a different way).
Assignee: nobody → karlt
Only BasicShadowThebesLayers and ShadowThebesLayerOGLs with
OpenDescriptorForDirectTexturing return a read-only front buffer from Swap.
I'm not familiar with PGrallocBuffer, but perhaps the system looks after ref
counting and saves us there.

(In reply to Karl Tomlinson (:karlt) from comment #19)
> I wonder why that bug
> wasn't resolved by making the BasicTextureImage keep an owning ref to its
> mGLContext?

Making TextureImages keep a reference to the GL context wouldn't be a complete
solution because GL layers also delete textures, framebuffers, and buffers,
(and ?programs) directly and this would all need to be adjusted to use a
shared context.

> Or perhaps Destroy can somehow be called when GL layers are removed from
> their layer tree (resolving bug 623255 in a different way).

Detecting when layers are removed, but won't be re-added looks like it would
require changes into FrameLayerBuilder (and I don't know how extensive that
would be).

I think it would be possible to have separate Destroy and Disconnect methods
for Shadow GL layers, but making sure to destroy GL resources before the
context looks like an unnecessary burden.

The option I'm preferring is probably to change the GL layer resource handling
to use a shared context to release resources (or no cleanup is necessary if
there is no shared context).  That would remove the need to have these Destroy
methods, that are almost specific to GL layers.

Explicitly calling Destroy on window destruction rather than waiting for a
__delete__ message may provide slightly faster release of resources, and could
use the existing context, rather than having to switch to another context, but
they seem minor advantages for the extra code and complexity involved.
This is not a full solution, but it is part of the solution, and finishing the SyncFrontBufferToBackBuffer earlier may reduce the frequency of these test failures.
Attachment #663893 - Flags: review?(jones.chris.g)
A Destroy method gives more control over when to destroy a context while still allowing clients to keep a reference, which it can use to detect when the context is destroyed.
Attachment #663949 - Flags: review?(bgirard)
Still WIP:

TextureImages on child, including orphan, shadow layers have a bare pointer
to this GLContext.  Make this into a strong reference, so that TextureImages
can determine whether they need to destroy resources with a shared context.

LayerOGLs use the GLContext on the LayerManagerOGL to clean up resources.
LayerManagerOGL can switch its mGLContext to the shared context on Destroy.

Some systems don't use a shared context.  For these, resources are destroyed
automatically when the context is destroyed.  We could null-check the context,
but there is debug resource tracking on the GLContext methods.  Therefore it may be
best to keep a reference to the destroyed GLContext on the LayerManagerOGL, so
that the GLContext methods can still be used to track resources, even though
the methods will be essentially no-ops when the context is destroyed.
Attachment #663949 - Flags: review?(bgirard) → review+
Comment on attachment 663959 [details] [diff] [review]
part 3: keep a reference to mGLContext from LayerManagerOGL, but Destroy GLContext on manager Destroy

Review of attachment 663959 [details] [diff] [review]:
-----------------------------------------------------------------

::: widget/gtk2/nsWindow.cpp
@@ -595,5 @@
>  
>      /** Need to clean our LayerManager up while still alive */
>      if (mLayerManager) {
> -        nsRefPtr<GLContext> gl = nullptr;
> -        if (mLayerManager->GetBackendType() == mozilla::layers::LAYERS_OPENGL) {

Nice, much better
Attachment #663959 - Flags: review?(bgirard) → review+
Blocks: 788935
Comment on attachment 663893 [details] [diff] [review]
Finish X requests on mROFrontBuffer before deleting shadow layer

Nice fix!
Attachment #663893 - Flags: review?(jones.chris.g) → review+
Comment on attachment 663893 [details] [diff] [review]
Finish X requests on mROFrontBuffer before deleting shadow layer

https://hg.mozilla.org/integration/mozilla-inbound/rev/ef440fbe0d52
Attachment #663893 - Flags: checkin+
Attachment 663959 [details] [diff] causes shutdown crashes on Android mochitests (but still green - bug 794244) due to TiledLayerBufferOGL accessing the mGLContext after the LayerManagerOGL is Destroy()ed.
Whiteboard: [orange] → [orange][leave open]
Whiteboard: [orange][leave open] → [leave open]
No longer blocks: 788935
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Mass-closing old bugs I filed that have not had recent activity/no longer affect me.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.