Closed Bug 612573 Opened 14 years ago Closed 13 years ago

Crash in content process - RenderCreatePicture: BadDrawable (invalid Pixmap or Window parameter)

Categories

(Firefox for Android Graveyard :: General, defect)

Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: critical

Tracking

(status1.9.2 unaffected, status1.9.1 unaffected, fennec-)

VERIFIED FIXED
Firefox 4.0
Tracking Status
status1.9.2 --- unaffected
status1.9.1 --- unaffected
fennec - ---

People

(Reporter: MikeK, Assigned: cjones)

References

Details

(Keywords: testcase, Whiteboard: [approved-patches-landed])

Attachments

(4 files)

Crashes are frequent when loading pages from the local file system; loading the same pages from a web server does not crash.

The page I have been using to reproduce this is attached, together with a stack trace from gdb.
To reproduce, load this page into Fennec; if it doesn't fail immediately, just hit "reload" until it does.
Please run with --sync to get a useful stack.
Severity: major → critical
Keywords: stackwanted
Summary: Crash in content process (Seems to originate from passing bad stuff into X) → Crash in content process - RenderCreatePicture: BadDrawable (invalid Pixmap or Window parameter)
tracking-fennec: --- → ?
I have the same crash.

I load this code:
<!DOCTYPE html>
<html>
    <body>
        <div>i</div>
    </body>
</html>

served from a local HTTP server.

The dump in the console:

###!!! ABORT: RenderCreatePicture: BadDrawable (invalid Pixmap or Window parameter); 21 requests ago: file /builds/slave/linux-fennec-trunk-nightly/build/mozilla-central/toolkit/xre/nsX11ErrorHandler.cpp, line 190
###!!! ABORT: RenderCreatePicture: BadDrawable (invalid Pixmap or Window parameter); 21 requests ago: file /builds/slave/linux-fennec-trunk-nightly/build/mozilla-central/toolkit/xre/nsX11ErrorHandler.cpp, line 190

If I run fennec --sync, there is "almost" no crash (maybe once every 20 loads), with the same dump.
Stack trace of paul's example. I tried to get one using the --sync flag, but I was unable to crash with it.
tracking-fennec: ? → 2.0-
i don't think that this happens on device, but it is pretty annoying for debugging on the desktop.
(In reply to comment #2)
> Please run with --sync to get a useful stack.

I don't believe that works properly with content processes, but I know that setting MOZ_X_SYNC=1 in the environment definitely does.

(In reply to comment #5)
> i don't think that this happens on device, but it is pretty annoying for
> debugging on the desktop.

It won't because we barely use X11 in fennec on maemo, and only in the chrome process.  A "workaround" is to build with --enable-mobile-optimize on desktop, which fennec devs ought to be doing anyway ;).  (I have been for a very long time, probably why I never saw this crash.)

I can't repro with MOZ_X_SYNC=1, which might mean that this is an inter-process synchronization issue :S.  In the readback part of the basic layers swap-and-readback code, this

-  if (OptionalThebesBuffer::Tnull_t == aReadOnlyFrontBuffer.type()) {
+  if (true || OptionalThebesBuffer::Tnull_t == aReadOnlyFrontBuffer.type()) {

makes the crashes go away.  It sure looks to me like we're syncing correctly on layers transactions, but I'll work this out on paper.
The layers log showed the problem here.

The synchronization logic is still OK, but the model it depends on was being violated by layers being destroyed in the middle of transactions (i.e., there was a missing Hold() during txns).  The race condition this created was

    C                                       P
   ------------------------------------   ----------------------
    [finish txn]
    BasicShadowableThebesLayer(X):
(1)   [request copy from front-->back]

    BeginTxn()
      SetRoot(newRoot)
        ~oldRoot()
          ~BasicShadowableThebesLayer
(2)         Send__delete__()
    ...                                    BasicShadowThebesLayer
                                             Recv__delete__()
(3)                                            [destroy front]

A ShadowableThebesLayer fired off the readback request (1) to the X server through the content-process's Display.  Then, that same ShadowableThebesLayer was deleted (erroneously!) during a layers txn.  This caused the __delete__ message to be delivered to the parent, which set off the request to destroy the parent's front buffer, through the chrome-process's Display.  The processing of the readback request then raced with the processing of the destroy request in the X server, and on the occasions when the readback lost, the content process got the X error and aborted.
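
The ordering dependence described above can be modeled with a toy stand-in for the X server. This is only an illustrative sketch, not Gecko or Xlib code: `FakeXServer`, `ReplayRace`, `Order`, and the drawable id are all hypothetical names invented for the example. It replays request (1) and request (3) from the diagram in both possible orders and reports what the readback sees.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the X server: it only tracks which drawables exist.
struct FakeXServer {
    std::set<int> drawables;

    // Models a RenderCreatePicture-style request.
    // Returns an empty string on success, or an X-style error name.
    std::string CreatePicture(int drawable) const {
        return drawables.count(drawable) ? "" : "BadDrawable";
    }

    // Models the destroy request for the front buffer.
    void FreePixmap(int drawable) { drawables.erase(drawable); }
};

enum class Order { ReadbackFirst, DestroyFirst };

// Replays the two racing requests in the chosen order and returns the
// result of the content process's readback request.
std::string ReplayRace(Order order) {
    FakeXServer server;
    const int frontBuffer = 42;  // arbitrary id, assumption for the sketch
    server.drawables.insert(frontBuffer);

    if (order == Order::DestroyFirst) {
        server.FreePixmap(frontBuffer);            // (3) via the chrome Display
        return server.CreatePicture(frontBuffer);  // (1) via the content Display
    }
    std::string result = server.CreatePicture(frontBuffer);  // (1) wins
    server.FreePixmap(frontBuffer);                          // (3) afterwards
    return result;
}
```

In this model, `ReplayRace(Order::ReadbackFirst)` returns an empty string, while `ReplayRace(Order::DestroyFirst)` returns `"BadDrawable"`, which corresponds to the case where the readback loses the race.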

This patch fixes up the model violation by ensuring that setting a new root doesn't destroy the old root or its children during a txn, and adds a guard assertion for thebeslayers.
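
A minimal sketch of the idea behind the fix, assuming a reference-counted layer tree. It is not the landed patch: `Layer`, `LayerManagerSketch`, `gInTransaction`, and `sDestroyed` are hypothetical names, and `std::shared_ptr` stands in for Gecko's own reference counting. The old root is held until the transaction ends, and the layer destructor carries a guard assertion.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical global flag for the sketch: true while a transaction is open.
static bool gInTransaction = false;

// Hypothetical layer type; counts destructions so the behavior can be observed.
struct Layer {
    static int sDestroyed;
    ~Layer() {
        // Guard assertion: a layer must never die in the middle of a transaction.
        assert(!gInTransaction && "layer destroyed mid-transaction");
        ++sDestroyed;
    }
};
int Layer::sDestroyed = 0;

class LayerManagerSketch {
public:
    void BeginTransaction() { gInTransaction = true; }

    void SetRoot(std::shared_ptr<Layer> newRoot) {
        if (gInTransaction && mRoot) {
            // Hold the old root (and thus its subtree) until the transaction ends.
            mKeepAlive.push_back(mRoot);
        }
        mRoot = std::move(newRoot);
    }

    void EndTransaction() {
        gInTransaction = false;
        mKeepAlive.clear();  // held layers may now be destroyed safely
    }

private:
    std::shared_ptr<Layer> mRoot;
    std::vector<std::shared_ptr<Layer>> mKeepAlive;
};
```

With this arrangement, replacing the root between `BeginTransaction()` and `EndTransaction()` leaves the old root alive until `EndTransaction()` runs, so no delete message could be triggered while a readback request from the same transaction is still pending.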
Assignee: nobody → jones.chris.g
Attachment #492930 - Flags: review?(karlt)
Attachment #492930 - Flags: review?(karlt) → review+
Keywords: stackwanted
Comment on attachment 492930 [details] [diff] [review]
Make sure shadowable layers aren't destroyed in the middle of transactions

Safe patch for a bug that's going to bite fennec-on-desktop and people working on multi-process desktop firefox.
Attachment #492930 - Flags: approval2.0?
Comment on attachment 492930 [details] [diff] [review]
Make sure shadowable layers aren't destroyed in the middle of transactions

chris, is there a test needed for this change?

I see the problem every day and it is a pita.
Attachment #492930 - Flags: approval2.0? → approval2.0+
I would really like a test, but it would be next to impossible to write one that's at all reliable (I don't know how to, at least).  This bug could have appeared intermittently in existing mochitests, if we were running them on a machine with remote=true and X rendering enabled.
s/mochitests/any tests that paint/
yeah, i was trying to think of how you would tickle this problem.  it is a pretty inconsistent failure and I'll be glad when it is gone.
http://hg.mozilla.org/mozilla-central/rev/dde5e4be82c1
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
I am still seeing the issue with the desktop version:
!! remote browser loaded
loading about:blank, 1
[TabChild] RESIZE to (w,h)= (800d, 500d)
###!!! ABORT: RenderCreatePicture: BadDrawable (invalid Pixmap or Window parameter); 10 requests ago: file /builds/slave/places-mobile-browser-linux-nightly/build/places/toolkit/xre/nsX11ErrorHandler.cpp, line 190
###!!! ABORT: RenderCreatePicture: BadDrawable (invalid Pixmap or Window parameter); 10 requests ago: file /builds/slave/places-mobile-browser-linux-nightly/build/places/toolkit/xre/nsX11ErrorHandler.cpp, line 190

###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv


The steps I took were to:
1) open a new tab
2) open to http://people.mozilla.com/~nhirata/html_tp/
3) try to open div.html
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I can't repro using the steps from comment 14 in either opt or debug desktop builds of m-c/m-b (without --enable-mobile-optimize).  Which build were you testing?
(m-c 9a0741efda5c and m-b 4a2c0c666d7b)
While working on the reftest-ipc harness, I ran firefox through the shadow-layer+X11 code path probably >1000 times and didn't see a crash like this.  Is anyone still seeing this?
I just started seeing this (or something like it) again in my Linux desktop builds. No symbols unfortunately, but I can turn them on and see if I get a stack.
Whiteboard: [approved-patches-landed]
patches landed, i don't see this.  if you still see this sort of problem, please open a new bug and include a stack.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 13 years ago
Resolution: --- → FIXED
VERIFIED FIXED on:

Build Id: Mozilla /5.0 (Android;Linux armv7l;rv:7.0a1) Gecko/20110530 Firefox/7.0a1 Fennec/7.0a1 

Build Id:Mozilla /5.0 (Android;Linux armv7l;rv:6.0a2) Gecko/20110529 Firefox/6.0a2 Fennec/6.0a2 

Device: HTC Desire Z (Android 2.2)
Status: RESOLVED → VERIFIED
Target Milestone: --- → Firefox 4.0