Closed Bug 560246 Opened 14 years ago Closed 14 years ago

crash [@ nsDisplayList::PaintThebesLayers] zooming street view

Categories

(Core Graveyard :: Plug-ins, defect)

x86
Linux
defect
Not set
major

Tracking

(blocking1.9.2 .4+, status1.9.2 .4-fixed)

VERIFIED FIXED
mozilla1.9.3a5
Tracking Status
blocking1.9.2 --- .4+
status1.9.2 --- .4-fixed

People

(Reporter: kbrosnan, Assigned: karlt)

References

()

Details

(Keywords: regression, Whiteboard: [qa-examined-192])

Attachments

(3 files, 2 obsolete files)

Zoomed using the mouse wheel in and out of the street view along the route. Not 100% reproducible but zooming in and out several times will crash eventually.

Signature	nsDisplayList::PaintThebesLayers
UUID	81622faa-61bd-46c0-b5b4-835f82100419
Time 	2010-04-19 08:56:37.365999
Uptime	8952
Last Crash	8965 seconds before submission
Product	Firefox
Version	3.7a5pre
Build ID	20100419030620
Branch	1.9.3
OS	Linux
OS Version	0.0.0 Linux 2.6.33-ARCH #1 SMP PREEMPT Sun Apr 4 10:27:30 CEST 2010 x86_64
CPU	amd64
CPU Info	family 6 model 30 stepping 5
Crash Reason	SIGSEGV
Crash Address	0x0
User Comments	kbrosnan
Processor Notes 	
Related Bugs

Crashing Thread
Frame 	Module 	Signature 	Source
0 	libxul.so 	nsDisplayList::PaintThebesLayers 	layout/style/nsRuleNode.h:724
1 	libxul.so 	nsDisplayList::Paint 	layout/base/nsDisplayList.cpp:778
2 	libxul.so 	nsLayoutUtils::PaintFrame 	layout/base/nsLayoutUtils.cpp:1254
3 	libxul.so 	PresShell::Paint 	layout/base/nsPresShell.cpp:5625
4 	libxul.so 	nsViewManager::Refresh 	view/src/nsViewManager.cpp:426
5 	libxul.so 	nsViewManager::DispatchEvent 	view/src/nsViewManager.cpp:877
6 	libxul.so 	HandleEvent 	view/src/nsView.cpp:160
7 	libxul.so 	nsWindow::DispatchEvent 	widget/src/gtk2/nsWindow.cpp:587
8 	libxul.so 	nsWindow::OnExposeEvent 	widget/src/gtk2/nsWindow.cpp:2494
9 	libxul.so 	expose_event_cb 	widget/src/gtk2/nsWindow.cpp:5703
10 	libgtk-x11-2.0.so.0.2000.0 	libgtk-x11-2.0.so.0.2000.0@0x138287
Karl, probably more "paint must win races" issues, can you look?
Assignee: nobody → karlt
blocking1.9.2: --- → ?
Haven't reproduced a crash with a debug build from e3c21e4b1d7d.
Saw some messages:

###!!! ASSERTION: constructing frames in the middle of a paint: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_Paint] == 0', file /home/karl/moz/dev/layout/base/nsPresContext.h, line 1237
WARNING: Fix the caller!: file /home/karl/moz/dev/content/events/src/nsEventDispatcher.cpp, line 488

Most crash reports seem to be 20100418030330 or 20100419030620, though there
is one 20100411030619.

Will try fresh source.

Kevin, re "using the mouse wheel in and out of the street view", do you mean
only zooming so far in and out that street view is still showing, or do you
mean zooming in and out such that street view appears and disappears?
If the latter, does the crash seem to happen after zooming in, or out, or both?
Summary: crash [@ nsDisplayList::PaintThebesLayers] → crash [@ nsDisplayList::PaintThebesLayers] zooming street view
The latter. Bouncing in and out of the street view. Start scrolling the wheel till the street view is triggered, then immediately reverse direction to exit the street view.
(In reply to comment #1)
> Karl, probably more "paint must win races" issues, can you look?

Looks like it.
Um

#if defined(MOZ_X11)
    switch (npevent->type) {
    case GraphicsExpose:
        PLUGIN_LOG_DEBUG(("  schlepping drawable 0x%lx across the pipe\n",
                          npevent->xgraphicsexpose.drawable));
        // Make sure the X server has created the Drawable and completes any
        // drawing before the plugin draws on top.
        //
        // XSync() waits for the X server to complete.  Really this parent
        // process does not need to wait; the child is the process that needs
        // to wait.  A possibly-slightly-better alternative would be to send
        // an X event to the child that the child would wait for.
        XSync(GetXDisplay(), False);
        break;
    case ButtonPress:
        // Release any active pointer grab so that the plugin X client can
        // grab the pointer if it wishes.
        Display *dpy = GetXDisplay();
#  ifdef MOZ_WIDGET_GTK2
        // GDK attempts to (asynchronously) track whether there is an active
        // grab so ungrab through GDK.
        gdk_pointer_ungrab(npevent->xbutton.time);
#  else
        XUngrabPointer(dpy, npevent->xbutton.time);
#  endif
        // Wait for the ungrab to complete.
        XSync(dpy, False);
        break;

        return CallPaint(npremoteevent, &handled) ? handled : 0;
    }
#endif

CallPaint() is dead code above.
Attachment #440147 - Flags: review?(jones.chris.g)
Attachment #440147 - Flags: review?(jones.chris.g) → review+
Blocks: 544211, OOPP
I didn't write a test for "paint wins" way back when because it seemed very hard.  I think this might work though

  browser                            plugin
 ------------------                 ---------------
  [onload]
  call armEvalTimer ==>              [wakeup]
                                     NPN_AsyncCall(evalDomMod, 0ms)
  setTimeout(1sec, forceRepaint)
  [event loop]
                                     [wakeup]
                                     PR_Sleep(2secs)

  [wakeup]
  forceRepaint()
  ...
  CallHandleEvent()
                                     [~1sec elapses]
                                     eval("dom.modify();")
                                     CallEval()
            ==> RPC race <==

The problems I see with this test are (i) timing dependent, obviously, though the failure mode should be spurious pass; (ii) assuming no NPAPI calls are made on the plugin between |setTimeout(forceRepaint)| and |repaint()|.
Test seems to work

plugin: sleeping for 2 seconds
plugin: forcing a 'safe' RPC race with SetWindow()
  (parent won, so we're not deferring)
  (parent won, so we're deferring)
  (processing deferred in-call)
plugin: forcing Evaluate() to race with a painting call
  (child won, so we're not deferring)
  (child won, so we're deferring)
###!!! ASSERTION: constructing frames in the middle of a paint: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_Paint] == 0', file /home/cjones/mozilla/mozilla-central/layout/base/nsPresContext.h, line 1258
These tests work quite reliably for me when run in isolation, but quite poorly when run in mochitest-ipcplugins.  The reason is that the testplugin makes assumptions about which browser message is pending, and GC throws those assumptions out the window by sending masses of __delete__() messages at random times.

Another issue is that although the test reliably repro's the modify-during-paint bug and I see the assertion failure, I don't know to make the test fail other than to crash.  Karl, would it crash in an opt build?
Attachment #440237 - Flags: review?(karlt)
Uses roc's code to force crashes in opt builds.  This relies on crashing the OOPP to remove any possibility of __delete__() caused by GC interfering with the painting tests, but this could possibly be removed with effort.

Before karl's push (opt build):

plugin: sleeping for 2 seconds
plugin: enqueuing deferred Evaluate() race
plugin: forcing Evaluate() to race with a painting call
  (child won, so we're not deferring)
  (child won, so we're deferring)
  (processing deferred in-call)
TEST-UNEXPECTED-FAIL | automation.py | Exited with code 11 during test run

After:

plugin: sleeping for 2 seconds
plugin: enqueuing deferred Evaluate() race
plugin: forcing Evaluate() to race with a painting call
  (parent won, so we're deferring)
  (parent won, so we're not deferring)
  (processing deferred in-call)

###!!! [Parent][RPCChannel] Error: Channel error: cannot send/recv
Attachment #440237 - Attachment is obsolete: true
Attachment #440426 - Flags: review?(karlt)
Attachment #440237 - Flags: review?(karlt)
Comment on attachment 440147 [details] [diff] [review]
use CallPaint for GraphicsExpose [pushed m-c]

http://hg.mozilla.org/mozilla-central/rev/f19782aa6ada
Attachment #440147 - Attachment description: use CallPaint for GraphicsExpose → use CallPaint for GraphicsExpose [pushed m-c]
Attachment #440147 - Flags: approval1.9.2.4?
blocking1.9.2: ? → .4+
Keywords: regression
Comment on attachment 440426 [details] [diff] [review]
Test SetWindow() and Paint() racing with Evaluate()

The test depends on our current paint model somewhat to get SetWindow then
a paint event.  This might change with retained layers.

A possible improvement might be to do 2 consecutive runs on the same plugin,
and so the second run with the same invalidate rect should not have a
SetWindow call.

Using getPaintCount from the plugin's evaluate to test success would make 2
consecutive runs on the same plugin easier, and we'd be more likely to know
when the test wasn't doing what we wanted, and could mean that only one
plugin crash would be required.

Guaranteeing a SetWindow before the paint is harder.  A plugin resize would do
this but I fear that might cause style re-resolution and therefore more
messages to mess things up.

However, if when the paint model changes painting becomes async, then the order
won't matter and this test will no longer be required.  So I think it's
reasonable to use this approach for now and we can see what changes in our
next paint model overhaul.

(I talked with cjones on irc about some other minor issues.)
Attachment #440426 - Flags: review?(karlt)
Marking fixed (though the test is still to land).
Status: NEW → RESOLVED
Closed: 14 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a5
Updated per karlt's IRC comments.  Ready to land, tree willin' and the creek don't rise.
Attachment #440426 - Attachment is obsolete: true
Attachment #440716 - Flags: review+
Comment on attachment 440147 [details] [diff] [review]
use CallPaint for GraphicsExpose [pushed m-c]

a=LegNeato for 1.9.2.4
Attachment #440147 - Flags: approval1.9.2.4? → approval1.9.2.4+
v. Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.3a5pre) Gecko/20100422 Minefield/3.7a5pre
Status: RESOLVED → VERIFIED
At the last second, I found a problem with this test for IPP, apparently related to retaining the window object incorrectly (too long?).  I'll fix this up and make sure it still does what we want.
When will the test be checked in for this?
I don't know.  It will take a long time for me to figure out whether the test is failing in-process because the test is buggy, or because our in-process plugin code is buggy.  It's not a terribly high priority for me atm.
All right. QA would like to verify that this fix is functioning correctly. The original repro case didn't crash all the time, which makes it difficult.

No offense meant to anyone but I don't want a test that is mostly written to move completely to the back of someone's queue on branch and then never wind up being finished, so it was worth asking.
(In reply to comment #18)
> At the last second, I found a problem with this test for IPP, apparently
> related to retaining the window object incorrectly (too long?).

Could that be related to bp-84588e1e-744f-4f08-bdba-2fc4b2100424?
Likely.  bsmedberg says he might know what the problem is (appears to me to be some wrapper getting GC'd too early), spun this off into bug 561844.
Whiteboard: [qa-examined-192]
(In reply to comment #22)
I spent a good 15 min zooming in and out of street view, normally I would crash 1 min or so. I am reasonably confident that the patch resolved the issue.
Kevin, did you look at this with a 1.9.2 build?
I was able to reproduce the crash on trunk from a build from 4/19 but not on the latest trunk nightly. I was not able to reproduce the problem on 1.9.2 at all, not on 3.6.4, nor a nightly from 4/19, nor in the latest 1.9.2 nightly.
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: