Closed Bug 560246 Opened 15 years ago Closed 15 years ago

crash [@ nsDisplayList::PaintThebesLayers] zooming street view

Categories

(Core Graveyard :: Plug-ins, defect)

x86
Linux
defect
Not set
major

Tracking

(blocking1.9.2 .4+, status1.9.2 .4-fixed)

VERIFIED FIXED
mozilla1.9.3a5
Tracking Status
blocking1.9.2 --- .4+
status1.9.2 --- .4-fixed

People

(Reporter: kbrosnan, Assigned: karlt)

References

()

Details

(Keywords: regression, Whiteboard: [qa-examined-192])

Attachments

(3 files, 2 obsolete files)

Zoomed using the mouse wheel in and out of the street view along the route. Not 100% reproducible but zooming in and out several times will crash eventually. Signature nsDisplayList::PaintThebesLayers UUID 81622faa-61bd-46c0-b5b4-835f82100419 Time 2010-04-19 08:56:37.365999 Uptime 8952 Last Crash 8965 seconds before submission Product Firefox Version 3.7a5pre Build ID 20100419030620 Branch 1.9.3 OS Linux OS Version 0.0.0 Linux 2.6.33-ARCH #1 SMP PREEMPT Sun Apr 4 10:27:30 CEST 2010 x86_64 CPU amd64 CPU Info family 6 model 30 stepping 5 Crash Reason SIGSEGV Crash Address 0x0 User Comments kbrosnan Processor Notes Related Bugs Crashing Thread Frame Module Signature Source 0 libxul.so nsDisplayList::PaintThebesLayers layout/style/nsRuleNode.h:724 1 libxul.so nsDisplayList::Paint layout/base/nsDisplayList.cpp:778 2 libxul.so nsLayoutUtils::PaintFrame layout/base/nsLayoutUtils.cpp:1254 3 libxul.so PresShell::Paint layout/base/nsPresShell.cpp:5625 4 libxul.so nsViewManager::Refresh view/src/nsViewManager.cpp:426 5 libxul.so nsViewManager::DispatchEvent view/src/nsViewManager.cpp:877 6 libxul.so HandleEvent view/src/nsView.cpp:160 7 libxul.so nsWindow::DispatchEvent widget/src/gtk2/nsWindow.cpp:587 8 libxul.so nsWindow::OnExposeEvent widget/src/gtk2/nsWindow.cpp:2494 9 libxul.so expose_event_cb widget/src/gtk2/nsWindow.cpp:5703 10 libgtk-x11-2.0.so.0.2000.0 libgtk-x11-2.0.so.0.2000.0@0x138287
Karl, probably more "paint must win races" issues, can you look?
Assignee: nobody → karlt
blocking1.9.2: --- → ?
Haven't reproduced a crash with a debug build from e3c21e4b1d7d. Saw some messages: ###!!! ASSERTION: constructing frames in the middle of a paint: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_Paint] == 0', file /home/karl/moz/dev/layout/base/nsPresContext.h, line 1237 WARNING: Fix the caller!: file /home/karl/moz/dev/content/events/src/nsEventDispatcher.cpp, line 488 Most crash reports seem to be 20100418030330 or 20100419030620, though there is one 20100411030619. Will try fresh source. Kevin, re "using the mouse wheel in and out of the street view", do you mean only zooming so far in and out that street view is still showing, or do you mean zooming in and out such that street view appears and disappears? If the latter, does the crash seem to happen after zooming in, or out, or both?
Summary: crash [@ nsDisplayList::PaintThebesLayers] → crash [@ nsDisplayList::PaintThebesLayers] zooming street view
The latter. Bouncing in and out of the street view. Start scrolling the wheel till the street view is triggered, then immediately reverse direction to exit the street view.
(In reply to comment #1) > Karl, probably more "paint must win races" issues, can you look? Looks like it.
Um #if defined(MOZ_X11) switch (npevent->type) { case GraphicsExpose: PLUGIN_LOG_DEBUG((" schlepping drawable 0x%lx across the pipe\n", npevent->xgraphicsexpose.drawable)); // Make sure the X server has created the Drawable and completes any // drawing before the plugin draws on top. // // XSync() waits for the X server to complete. Really this parent // process does not need to wait; the child is the process that needs // to wait. A possibly-slightly-better alternative would be to send // an X event to the child that the child would wait for. XSync(GetXDisplay(), False); break; case ButtonPress: // Release any active pointer grab so that the plugin X client can // grab the pointer if it wishes. Display *dpy = GetXDisplay(); # ifdef MOZ_WIDGET_GTK2 // GDK attempts to (asynchronously) track whether there is an active // grab so ungrab through GDK. gdk_pointer_ungrab(npevent->xbutton.time); # else XUngrabPointer(dpy, npevent->xbutton.time); # endif // Wait for the ungrab to complete. XSync(dpy, False); break; return CallPaint(npremoteevent, &handled) ? handled : 0; } #endif CallPaint() is dead code above.
Attachment #440147 - Flags: review?(jones.chris.g)
Attachment #440147 - Flags: review?(jones.chris.g) → review+
Blocks: 544211, OOPP
I didn't write a test for "paint wins" way back when because it seemed very hard. I think this might work though browser plugin ------------------ --------------- [onload] call armEvalTimer ==> [wakeup] NPN_AsyncCall(evalDomMod, 0ms) setTimeout(1sec, forceRepaint) [event loop] [wakeup] PR_Sleep(2secs) [wakeup] forceRepaint() ... CallHandleEvent() [~1sec elapses] eval("dom.modify();") CallEval() ==> RPC race <== The problems I see with this test are (i) timing dependent, obviously, though the failure mode should be spurious pass; (ii) assuming no NPAPI calls are made on the plugin between |setTimeout(forceRepaint)| and |repaint()|.
Test seems to work plugin: sleeping for 2 seconds plugin: forcing a 'safe' RPC race with SetWindow() (parent won, so we're not deferring) (parent won, so we're deferring) (processing deferred in-call) plugin: forcing Evaluate() to race with a painting call (child won, so we're not deferring) (child won, so we're deferring) ###!!! ASSERTION: constructing frames in the middle of a paint: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_Paint] == 0', file /home/cjones/mozilla/mozilla-central/layout/base/nsPresContext.h, line 1258
These tests work quite reliably for me when run in isolation, but quite poorly when run in mochitest-ipcplugins. The reason is that the testplugin makes assumptions about which browser message is pending, and GC throws those assumptions out the window by sending masses of __delete__() messages at random times. Another issue is that although the test reliably repro's the modify-during-paint bug and I see the assertion failure, I don't know to make the test fail other than to crash. Karl, would it crash in an opt build?
Attachment #440237 - Flags: review?(karlt)
Uses roc's code to force crashes in opt builds. This relies on crashing the OOPP to remove any possibility of __delete__() caused by GC interfering with the painting tests, but this could possibly be removed with effort. Before karl's push (opt build): plugin: sleeping for 2 seconds plugin: enqueuing deferred Evaluate() race plugin: forcing Evaluate() to race with a painting call (child won, so we're not deferring) (child won, so we're deferring) (processing deferred in-call) TEST-UNEXPECTED-FAIL | automation.py | Exited with code 11 during test run After: plugin: sleeping for 2 seconds plugin: enqueuing deferred Evaluate() race plugin: forcing Evaluate() to race with a painting call (parent won, so we're deferring) (parent won, so we're not deferring) (processing deferred in-call) ###!!! [Parent][RPCChannel] Error: Channel error: cannot send/recv
Attachment #440237 - Attachment is obsolete: true
Attachment #440426 - Flags: review?(karlt)
Attachment #440237 - Flags: review?(karlt)
Attachment #440147 - Attachment description: use CallPaint for GraphicsExpose → use CallPaint for GraphicsExpose [pushed m-c]
Attachment #440147 - Flags: approval1.9.2.4?
blocking1.9.2: ? → .4+
Keywords: regression
Comment on attachment 440426 [details] [diff] [review] Test SetWindow() and Paint() racing with Evaluate() The test depends on our current paint model somewhat to get SetWindow then a paint event. This might change with retained layers. A possible improvement might be to do 2 consecutive runs on the same plugin, and so the second run with the same invalidate rect should not have a SetWindow call. Using getPaintCount from the plugin's evaluate to test success would make 2 consecutive runs on the same plugin easier, and we'd be more likely to know when the test wasn't doing what we wanted, and could mean that only one plugin crash would be required. Guaranteeing a SetWindow before the paint is harder. A plugin resize would do this but I fear that might cause style re-resolution and therefore more messages to mess things up. However, if when the paint model changes painting becomes async, then the order won't matter and this test will no longer be required. So I think it's reasonable to use this approach for now and we can see what changes in our next paint model overhaul. (I talked with cjones on irc about some other minor issues.)
Attachment #440426 - Flags: review?(karlt)
Marking fixed (though the test is still to land).
Status: NEW → RESOLVED
Closed: 15 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a5
Updated per karlt's IRC comments. Ready to land, tree willin' and the creek don't rise.
Attachment #440426 - Attachment is obsolete: true
Attachment #440716 - Flags: review+
Comment on attachment 440147 [details] [diff] [review] use CallPaint for GraphicsExpose [pushed m-c] a=LegNeato for 1.9.2.4
Attachment #440147 - Flags: approval1.9.2.4? → approval1.9.2.4+
v. Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.3a5pre) Gecko/20100422 Minefield/3.7a5pre
Status: RESOLVED → VERIFIED
At the last second, I found a problem with this test for IPP, apparently related to retaining the window object incorrectly (too long?). I'll fix this up and make sure it still does what we want.
When will the test be checked in for this?
I don't know. It will take a long time for me to figure out whether the test is failing in-process because the test is buggy, or because our in-process plugin code is buggy. It's not a terribly high priority for me atm.
All right. QA would like to verify that this fix is functioning correctly. The original repro case didn't crash all the time, which makes it difficult. No offense meant to anyone but I don't want a test that is mostly written to move completely to the back of someone's queue on branch and then never wind up being finished, so it was worth asking.
(In reply to comment #18) > At the last second, I found a problem with this test for IPP, apparently > related to retaining the window object incorrectly (too long?). Could that be related to bp-84588e1e-744f-4f08-bdba-2fc4b2100424?
Likely. bsmedberg says he might know what the problem is (appears to me to be some wrapper getting GC'd too early), spun this off into bug 561844.
Whiteboard: [qa-examined-192]
(In reply to comment #22) I spent a good 15 min zooming in and out of street view, normally I would crash 1 min or so. I am reasonably confident that the patch resolved the issue.
Kevin, did you look at this with a 1.9.2 build?
I was able to reproduce the crash on trunk from a build from 4/19 but not on the latest trunk nightly. I was not able to reproduce the problem on 1.9.2 at all, not on 3.6.4, nor a nightly from 4/19, nor in the latest 1.9.2 nightly.
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: