Closed Bug 1000633 Opened 10 years ago Closed 10 years ago

B2G: Data Race via AsyncPanZoomAnimation::ExecuteDeferredTasks

Categories

(Core :: Panning and Zooming, defect)

All
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32
blocking-b2g 1.4+
Tracking Status
firefox30 --- wontfix
firefox31 --- fixed
firefox32 --- fixed
b2g-v1.3 --- unaffected
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: gwagner, Assigned: botond)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

Current trunk on nexus 4.

STR:
scrolling in settings app

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 4428.4499]
0xb4803856 in operator[] (i=1, this=0xa34cd610) at ../../dist/include/mozilla/Vector.h:372
372	      MOZ_ASSERT(!entered);
(gdb) bt
#0  0xb4803856 in operator[] (i=1, this=0xa34cd610) at ../../dist/include/mozilla/Vector.h:372
#1  mozilla::layers::AsyncPanZoomAnimation::ExecuteDeferredTasks (this=0xa34cd600) at ../../../gfx/layers/apz/src/AsyncPanZoomController.h:956
#2  0xb48051be in mozilla::layers::AsyncPanZoomController::SampleContentTransformForFrame (this=0xa346d800, aSampleTime=..., aNewTransform=0xafaff244, 
    aScrollOffset=<optimized out>) at ../../../gfx/layers/apz/src/AsyncPanZoomController.cpp:1624
#3  0xb481b44e in mozilla::layers::AsyncCompositionManager::ApplyAsyncContentTransformToTree (this=0xae2766a0, aCurrentFrame=..., aLayer=0xa34e7800, 
    aWantNextFrame=0xafaffbfb) at ../../../gfx/layers/composite/AsyncCompositionManager.cpp:518
#4  0xb481b3ec in mozilla::layers::AsyncCompositionManager::ApplyAsyncContentTransformToTree (this=0xae2766a0, aCurrentFrame=..., aLayer=0xa34e7000, 
    aWantNextFrame=0xafaffbfb) at ../../../gfx/layers/composite/AsyncCompositionManager.cpp:501
#5  0xb481b3ec in mozilla::layers::AsyncCompositionManager::ApplyAsyncContentTransformToTree (this=0xae2766a0, aCurrentFrame=..., aLayer=0xa34e2400, 
    aWantNextFrame=0xafaffbfb) at ../../../gfx/layers/composite/AsyncCompositionManager.cpp:501
#6  0xb481b3ec in mozilla::layers::AsyncCompositionManager::ApplyAsyncContentTransformToTree (this=0xae2766a0, aCurrentFrame=..., aLayer=0xa793e800, 
    aWantNextFrame=0xafaffbfb) at ../../../gfx/layers/composite/AsyncCompositionManager.cpp:501
#7  0xb481b3ec in mozilla::layers::AsyncCompositionManager::ApplyAsyncContentTransformToTree (this=0xae2766a0, aCurrentFrame=..., aLayer=0xa7939800, 
    aWantNextFrame=0xafaffbfb) at ../../../gfx/layers/composite/AsyncCompositionManager.cpp:501
#8  0xb481b3ec in mozilla::layers::AsyncCompositionManager::ApplyAsyncContentTransformToTree (this=0xae2766a0, aCurrentFrame=..., aLayer=0xaf67f000, 
    aWantNextFrame=0xafaffbfb) at ../../../gfx/layers/composite/AsyncCompositionManager.cpp:501
#9  0xb481b3ec in mozilla::layers::AsyncCompositionManager::ApplyAsyncContentTransformToTree (this=0xae2766a0, aCurrentFrame=..., aLayer=0xaf67e800, 
    aWantNextFrame=0xafaffbfb) at ../../../gfx/layers/composite/AsyncCompositionManager.cpp:501
#10 0xb481c238 in mozilla::layers::AsyncCompositionManager::TransformShadowTree (this=0xae2766a0, aCurrentFrame=...)
    at ../../../gfx/layers/composite/AsyncCompositionManager.cpp:870
#11 0xb482910e in mozilla::layers::CompositorParent::CompositeToTarget (this=0xae56c400, aTarget=0x0) at ../../../gfx/layers/ipc/CompositorParent.cpp:639
#12 0xb4372290 in DispatchToMethod<FdWatcher, void (FdWatcher::*)()> (method=
    (void (FdWatcher::*)(FdWatcher * const)) 0xb4829297 <mozilla::layers::CompositorParent::Composite()>, obj=<optimized out>, arg=<optimized out>)
    at ../../../ipc/chromium/src/base/tuple.h:383
#13 RunnableMethod<FdWatcher, void (FdWatcher::*)(), Tuple0>::Run (this=<optimized out>) at ../../../ipc/chromium/src/base/task.h:307
#14 0xb45461c4 in MessageLoop::RunTask (this=0xafaffde0, task=0xac11e560) at ../../../ipc/chromium/src/base/message_loop.cc:344
#15 0xb4546b02 in MessageLoop::DeferOrRunPendingTask (this=<optimized out>, pending_task=<optimized out>) at ../../../ipc/chromium/src/base/message_loop.cc:352
#16 0xb4546e06 in DoDelayedWork (next_delayed_work_time=0xb213b6f0, this=0xafaffde0) at ../../../ipc/chromium/src/base/message_loop.cc:457
#17 MessageLoop::DoDelayedWork (this=0xafaffde0, next_delayed_work_time=0xb213b6f0) at ../../../ipc/chromium/src/base/message_loop.cc:440
#18 0xb4547e2e in base::MessagePumpDefault::Run (this=0xb213b6e0, delegate=0xafaffde0) at ../../../ipc/chromium/src/base/message_pump_default.cc:39
#19 0xb454630a in MessageLoop::RunInternal (this=0xafaffde0) at ../../../ipc/chromium/src/base/message_loop.cc:226
#20 0xb4546322 in RunHandler (this=0xafaffde0) at ../../../ipc/chromium/src/base/message_loop.cc:219
#21 MessageLoop::Run (this=0xafaffde0) at ../../../ipc/chromium/src/base/message_loop.cc:193
#22 0xb45499ae in base::Thread::ThreadMain (this=0xb213a760) at ../../../ipc/chromium/src/base/thread.cc:164
#23 0xb453e1fc in ThreadFunc (closure=<optimized out>) at ../../../ipc/chromium/src/base/platform_thread_posix.cc:39
#24 0xb6efca5c in __thread_entry (func=0xb453e1f5 <ThreadFunc(void*)>, arg=0xb213a760, tls=0xafafff00) at bionic/libc/bionic/pthread_create.cpp:92
#25 0xb6efcbd8 in pthread_create (thread_out=0xb213a768, attr=<optimized out>, start_routine=0x78, arg=0xb213a760) at bionic/libc/bionic/pthread_create.
Main Thread:

* 3    Thread 4428.4499  0xb4803856 in operator[] (i=1, this=0xa34cd610) at ../../dist/include/mozilla/Vector.h:372
  2    Thread 4428.4489  __futex_syscall3 () at bionic/libc/arch-arm/bionic/futex_arm.S:40
  1    Thread 4428.4428  write () at bionic/libc/arch-arm/syscalls/write.S:10
(gdb) thread 1
[Switching to thread 1 (Thread 4428.4428)]
#0  write () at bionic/libc/arch-arm/syscalls/write.S:10
10	    mov     r7, ip
(gdb) bt
#0  write () at bionic/libc/arch-arm/syscalls/write.S:10
#1  0xb453d8d8 in base::MessagePumpLibevent::ScheduleWork (this=0xb6b01e50) at ../../../ipc/chromium/src/base/message_pump_libevent.cc:363
#2  0xb4546ba4 in MessageLoop::PostTask_Helper (this=0xb2a52de0, from_here=<optimized out>, task=0xac11e400, delay_ms=<optimized out>, nestable=true)
    at ../../../ipc/chromium/src/base/message_loop.cc:315
#3  0xb4546bd0 in MessageLoop::PostTask (this=<optimized out>, from_here=<optimized out>, task=<optimized out>)
    at ../../../ipc/chromium/src/base/message_loop.cc:259
#4  0xb455891e in mozilla::ipc::ProcessLink::SendMessage (this=<optimized out>, msg=0xa34bcf00) at ../../../ipc/glue/MessageLink.cpp:170
#5  0xb45574aa in mozilla::ipc::MessageChannel::Send (this=<optimized out>, aMsg=<optimized out>) at ../../../ipc/glue/MessageChannel.cpp:416
#6  0xb4570e4a in mozilla::dom::PBrowserParent::SendNotifyAPZStateChange (this=0xabaa0570, aViewId=@0xbeff9b40, aChange=@0xbeff9b50, aArg=<optimized out>)
    at PBrowserParent.cpp:481
#7  0xb4ace016 in mozilla::dom::TabParent::NotifyAPZStateChange (this=<optimized out>, aViewId=12589493212747399168, 
    aChange=mozilla::layers::GeckoContentController::StartTouch, aArg=0) at ../../../dom/ipc/TabParent.cpp:562
#8  0xb520b3b8 in mozilla::layout::RemoteContentController::NotifyAPZStateChange (this=0xac6839c0, aGuid=..., aChange=<optimized out>, aArg=0)
    at ../../../layout/ipc/RenderFrameParent.cpp:674
#9  0xb48057aa in mozilla::layers::AsyncPanZoomController::OnTouchStart (this=0xa346d800, aEvent=<optimized out>)
    at ../../../gfx/layers/apz/src/AsyncPanZoomController.cpp:626
#10 0xb480581a in mozilla::layers::AsyncPanZoomController::HandleInputEvent (this=0xa346d800, aEvent=...)
    at ../../../gfx/layers/apz/src/AsyncPanZoomController.cpp:549
#11 0xb48064be in mozilla::layers::AsyncPanZoomController::ReceiveInputEvent (this=0xa346d800, aEvent=...)
    at ../../../gfx/layers/apz/src/AsyncPanZoomController.cpp:530
#12 0xb48014cc in mozilla::layers::APZCTreeManager::ProcessTouchEvent (this=0xb0021c90, aEvent=..., aOutTargetGuid=0xbeff9d68)
    at ../../../gfx/layers/apz/src/APZCTreeManager.cpp:497
#13 0xb4acfec0 in SendRealTouchEvent (event=..., this=0xabaa0570) at ../../../dom/ipc/TabParent.cpp:902
#14 mozilla::dom::TabParent::SendRealTouchEvent (this=0xabaa0570, event=...) at ../../../dom/ipc/TabParent.cpp:864
#15 0xb4c317c6 in mozilla::EventStateManager::HandleCrossProcessEvent (this=0xaee9cea0, aEvent=0xbeffa100, aTargetFrame=<optimized out>, aStatus=0xbeffa0a8)
    at ../../../dom/events/EventStateManager.cpp:1220
#16 0xb4c34c6c in mozilla::EventStateManager::PostHandleEvent (this=0xaee9cea0, aPresContext=0xae57cc00, aEvent=0xbeffa100, aTargetFrame=0xaba06f00, 
    aStatus=0xbeffa0a8) at ../../../dom/events/EventStateManager.cpp:2634
#17 0xb50dadc6 in PresShell::HandleEventInternal (this=0xaf7cac00, aEvent=0xbeffa100, aStatus=0xbeffa0a8) at ../../../layout/base/nsPresShell.cpp:7402
#18 0xb50dafc0 in PresShell::HandlePositionedEvent (this=0xaf7cac00, aTargetFrame=<optimized out>, aEvent=0xbeffa100, aEventStatus=0xbeffa0a8)
    at ../../../layout/base/nsPresShell.cpp:7119
#19 0xb50db9fe in PresShell::HandleEvent (this=0xb21c8b80, aFrame=<optimized out>, aEvent=0xbeffa100, aDontRetargetEvents=<optimized out>, 
    aEventStatus=0xbeffa0a8) at ../../../layout/base/nsPresShell.cpp:6919
#20 0xb4da09e0 in nsViewManager::DispatchEvent (this=<optimized out>, aEvent=0xbeffa100, aView=<optimized out>, aStatus=0xbeffa0a8)
    at ../../../view/src/nsViewManager.cpp:782
#21 0xb4d9e4fc in nsView::HandleEvent (this=<optimized out>, aEvent=0xbeffa100, aUseAttachedEvents=<optimized out>) at ../../../view/src/nsView.cpp:1097
#22 0xb4ae986e in nsWindow::DispatchEvent (this=0xb005a2e0, aEvent=<optimized out>, aStatus=@0xbeffa0d4) at ../../../widget/gonk/nsWindow.cpp:460
#23 0xb4aea9f2 in nsWindow::DispatchInputEvent (aEvent=..., aWasCaptured=0xbeffa0fb) at ../../../widget/gonk/nsWindow.cpp:269
#24 0xb4ae89b8 in sendTouchEvent (captured=0xbeffa0fb, data=...) at ../../../widget/gonk/nsAppShell.cpp:273
#25 GeckoInputDispatcher::dispatchOnce (this=<optimized out>) at ../../../widget/gonk/nsAppShell.cpp:737
Recent regression?
Component: Graphics → Panning and Zooming
That code was added recently in bug 965871, so likely yeah a regression from that.
Assignee: nobody → botond
Blocks: 965871
After examining the code, I found a data race which is likely the one causing this error.

The race is between 

  - AsyncPanZoomController::CancelAnimation(), which can be called 
    from the main thread, invoking the animation's destructor and
    therefore the destructor of the Vector stored by the animation

  - AsyncPanZoomController::SampleContentTransformForFrame() which
    calls ExecuteDeferredTasks() on the animation, which accesses
    the Vector via operator[]

CancelAnimation() acquires mMonitor before accessing the animation,
but SampleContentTransformForFrame() does not at the place where
it calls ExecuteDeferredTasks(). I did this on purpose because the
deferred tasks can call APZCTreeManager functions which can acqurie
the tree lock, and holding the monitor while doing that would
violate the lock order. I did not, of course, realize that I would
be introducing the race described above.

To make everyone happy, we can change ExecuteDeferredTasks() to
GetDeferredTasks(), call that while we are still holding the 
monitor, then release the monitor, and finally execute the tasks.
The tasks themselves cannot then be allowed to reference the
animation, but that's fine because they shouldn't need to (and 
don't currently).
Attached patch bug1000633.patch (obsolete) — Splinter Review
T-push: https://tbpl.mozilla.org/?tree=Try&rev=ab9c35312976
Attachment #8412757 - Flags: review?(bugmail.mozilla)
Comment on attachment 8412757 [details] [diff] [review]
bug1000633.patch

Review of attachment 8412757 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the change below

::: gfx/layers/apz/src/AsyncPanZoomController.h
@@ +949,2 @@
>     */
> +  Vector<Task*> GetDeferredTasks() {

I'd rather change the name here to something that indicates it's clearing mDeferredTasks as well. GetAndClearDeferredTasks() ?
Attachment #8412757 - Flags: review?(bugmail.mozilla) → review+
Nom'ing for 1.4 uplift since the regression was introduced in 30.
blocking-b2g: --- → 1.4?
Attached patch bug1000633.patchSplinter Review
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> ::: gfx/layers/apz/src/AsyncPanZoomController.h
> @@ +949,2 @@
> >     */
> > +  Vector<Task*> GetDeferredTasks() {
> 
> I'd rather change the name here to something that indicates it's clearing
> mDeferredTasks as well. GetAndClearDeferredTasks() ?

Updated to TakeDeferredTasks(), carrying r+.
Attachment #8412757 - Attachment is obsolete: true
Attachment #8412791 - Flags: review+
(In reply to Botond Ballo [:botond] from comment #5)
> Created attachment 8412757 [details] [diff] [review]
> bug1000633.patch
> 
> T-push: https://tbpl.mozilla.org/?tree=Try&rev=ab9c35312976

The B2G debug build failures are "fetch errors", so unrelated.

https://hg.mozilla.org/integration/mozilla-inbound/rev/107913740c1f
So, this is a regression, introduced in 30 just before it went Aurora.  We would like it uplifted to 30 (1.4).  Is this relevant to Fennec as well?  I'm assuming 1.4 == 30beta still?
Keywords: regression
(In reply to Milan Sreckovic [:milan, travelling, maybe slow to respond] from comment #10)
> Is this relevant to Fennec as well? 

Nope. This is in the part of the APZ code that Fennec doesn't use (which is most of it).
(In reply to Milan Sreckovic [:milan, travelling, maybe slow to respond] from comment #10)
> I'm assuming 1.4 == 30beta still?

1.4 has been branched and is no longer the same as mozilla-beta.
https://hg.mozilla.org/mozilla-central/rev/107913740c1f
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
IRC context for blocking analysis:

10:35 AM <jsmith_> botond: For bug 1000633 - can you indicate what the user impact is if we don't fix that bug on 1.4?
10:36 AM <kats> jsmith_: possible crashes in the b2g process
10:37 AM <jsmith_> kats: ok - what's the risk of taking the patch?
10:38 AM <kats> jsmith_: pretty low, it just adjusts a little code to be locked more correctly
10:39 AM <kats> jsmith_: even if it's broken it should only affect a small set of scenarios and be easily caught
1.4+ per comment #14.
blocking-b2g: 1.4? → 1.4+
Comment on attachment 8412791 [details] [diff] [review]
bug1000633.patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
  Bug 965871 (APZ fling handoff)

User impact if declined: 
  Possibility of undefined behaviour (memory corruption etc.)
  when performing flings. Intermittent crashes in debug builds.

Testing completed (on m-c, etc.): 
  Has been on m-c for almost a month.

Risk to taking this patch (and alternatives if risky): 
  Very low. This code is mainly used on B2G, and it was
  already uplifted to 1.4 (and will go into 2.0 via the
  normal train).

String or IDL/UUID changes made by this patch:
  None.

Note that this was already uplifted to B2G 1.4. The purpose of this uplift is mainly to have a dependent patch in bug 1015331 uplift cleanly.
Attachment #8412791 - Flags: approval-mozilla-aurora?
Comment on attachment 8412791 [details] [diff] [review]
bug1000633.patch

Approving on aurora so this can get uplifted cleanly : https://bugzilla.mozilla.org/show_bug.cgi?id=1015331
Attachment #8412791 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Moving to Gonk since this is a B2G bug.
OS: Mac OS X → Gonk (Firefox OS)
Hardware: x86 → All
You need to log in before you can comment on or make changes to this bug.