Closed Bug 1213273 Opened 6 years ago Closed 6 years ago

crash in nsTimerImpl::Cancel()

Categories

(Core :: Panning and Zooming, defect)

43 Branch
Unspecified
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: tracy, Assigned: botond)

References

Details

(Keywords: crash, topcrash-win)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-71158104-0449-4c1a-acb3-894922151008.
=============================================================

with other issues clearing out, this has begun to appear in top 10 crashes on Nightly. It is currently at #5 with ~150 crashes in the past seven days.

It has been around in low volume since middle of September.  However, it picked up in volume (4x the previous rate) on 2015100103.

top of the crash thread:
Frame 	Module 	Signature 	Source
0 	xul.dll 	nsTimerImpl::Cancel() 	xpcom/threads/nsTimerImpl.cpp
1 	xul.dll 	mozilla::layers::AsyncPanZoomController::FlushRepaintForNewInputBlock() 	gfx/layers/apz/src/AsyncPanZoomController.cpp
2 	xul.dll 	mozilla::layers::APZCTreeManager::FlushRepaintsRecursively(mozilla::layers::HitTestingTreeNode*) 	gfx/layers/apz/src/APZCTreeManager.cpp
3 	xul.dll 	mozilla::layers::APZCTreeManager::FlushRepaintsRecursively(mozilla::layers::HitTestingTreeNode*) 	gfx/layers/apz/src/APZCTreeManager.cpp
4 	xul.dll 	mozilla::layers::APZCTreeManager::FlushRepaintsRecursively(mozilla::layers::HitTestingTreeNode*) 	gfx/layers/apz/src/APZCTreeManager.cpp
5 	xul.dll 	mozilla::layers::APZCTreeManager::FlushRepaintsRecursively(mozilla::layers::HitTestingTreeNode*) 	gfx/layers/apz/src/APZCTreeManager.cpp
6 	xul.dll 	mozilla::layers::APZCTreeManager::FlushRepaintsToClearScreenToGeckoTransform() 	gfx/layers/apz/src/APZCTreeManager.cpp
7 	xul.dll 	mozilla::layers::APZCTreeManager::ReceiveInputEvent(mozilla::InputData&, mozilla::layers::ScrollableLayerGuid*, unsigned __int64*) 	gfx/layers/apz/src/APZCTreeManager.cpp
8 	xul.dll 	mozilla::layers::APZCTreeManager::ProcessWheelEvent(mozilla::WidgetWheelEvent&, mozilla::layers::ScrollableLayerGuid*, unsigned __int64*) 	gfx/layers/apz/src/APZCTreeManager.cpp
9 	xul.dll 	mozilla::layers::APZCTreeManager::ReceiveInputEvent(mozilla::WidgetInputEvent&, mozilla::layers::ScrollableLayerGuid*, unsigned __int64*) 	gfx/layers/apz/src/APZCTreeManager.cpp
10 	xul.dll 	nsBaseWidget::DispatchAPZAwareEvent(mozilla::WidgetInputEvent*) 	widget/nsBaseWidget.cpp
11 	xul.dll 	nsWindow::DispatchWheelEvent(mozilla::WidgetWheelEvent*) 	widget/windows/nsWindow.cpp
Keywords: topcrash-win
ni? Botond for investigation, since Kats is away for several weeks.
Component: XPCOM → Panning and Zooming
Flags: needinfo?(botond)
Crash Signature: [@ nsTimerImpl::Cancel()] → [@ nsTimerImpl::Cancel()] [@ nsTimerImpl::Cancel]
This is a regression from bug 1200063 or one of its follow-ups (bug 1207270 and bug 1208973). I will investigate.
Assignee: nobody → botond
Flags: needinfo?(botond)
There's a comment in nsITimer.idl that says the following:

 * By default a timer will fire on the thread that created it.  Set the .target
 * attribute to fire on a different thread.  Once you have set a timer's .target
 * and called one of its init functions, any further interactions with the timer
 * (calling cancel(), changing member fields, etc) should only be done by the
 * target thread, or races may occur with bad results like timers firing after
 * they've been canceled, and/or not firing after re-initiatization.

Currently, TaskThrottler:

 - always creates the timer on the main thread
 - initializes or re-initializes it on either the compositor thread or the main thread
 - cancels it on either the compositor thread or the main thread

This doesn't seem to obey the stated contract. I wonder if that could be responsible for the crashes seen here and in bug 1213605.

[1] https://dxr.mozilla.org/mozilla-central/rev/d374d16cbb251c9dac5af69f8e186e821ce82fe2/xpcom/threads/nsITimer.idl#68
Bug 1213273 - Use a chromium Task instead of an nsITimer for the timeout in TaskThrottler. r=mstange
Assuming the problem is what I've described in comment 3, working around it by e.g. posting a message to another thread just to cancel the timer seems fairly messy. Instead, I thought I'd try getting rid of nsITimer altogether, and using MessageLoop::PostDelayedTask() instead. (I originally discarded this idea because I had in mind calling it via GeckoContentController::PostDelayedTask(), which requires that the call happen on the controller thread, which isn't necessarily the case here, but then I realized I can use MessageLoop::PostDelayedTask() directly.)

Try push:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=4c8b4c6751f8
Blocks: 1213605
(In reply to Botond Ballo [:botond] from comment #5)
> Try push:
>   https://treeherder.mozilla.org/#/jobs?repo=try&revision=4c8b4c6751f8

Gtests are falling over due to a thinko I made. Will post a fixed patch.
Comment on attachment 8675096 [details]
MozReview Request: Bug 1213273 - Use a chromium Task instead of an nsITimer for the timeout in TaskThrottler. r=mstange

Bug 1213273 - Use a chromium Task instead of an nsITimer for the timeout in TaskThrottler. r=mstange
That looks better. Markus, are you comfortable reviewing this, or should we wait for Kats?
Comment on attachment 8675096 [details]
MozReview Request: Bug 1213273 - Use a chromium Task instead of an nsITimer for the timeout in TaskThrottler. r=mstange

Bug 1213273 - Use a chromium Task instead of an nsITimer for the timeout in TaskThrottler. r=mstange
Attachment #8675096 - Flags: review?(mstange)
Comment on attachment 8675096 [details]
MozReview Request: Bug 1213273 - Use a chromium Task instead of an nsITimer for the timeout in TaskThrottler. r=mstange

https://reviewboard.mozilla.org/r/22327/#review19961

::: gfx/layers/apz/src/TaskThrottler.h:104
(Diff revision 2)
> +                                 // which deletes it

Won't this pointer be dangling after the task fires? And then CancelTimeoutTask might access it.
Attachment #8675096 - Flags: review?(mstange)
(In reply to Markus Stange [:mstange] from comment #10)
> Won't this pointer be dangling after the task fires? And then
> CancelTimeoutTask might access it.

Yes, good catch! I believe we can solve this by nulling out the pointer in OnTimeout().
Comment on attachment 8675096 [details]
MozReview Request: Bug 1213273 - Use a chromium Task instead of an nsITimer for the timeout in TaskThrottler. r=mstange

Bug 1213273 - Use a chromium Task instead of an nsITimer for the timeout in TaskThrottler. r=mstange
Attachment #8675096 - Flags: review?(mstange)
Comment on attachment 8675096 [details]
MozReview Request: Bug 1213273 - Use a chromium Task instead of an nsITimer for the timeout in TaskThrottler. r=mstange

https://reviewboard.mozilla.org/r/22327/#review20073
Attachment #8675096 - Flags: review?(mstange) → review+
https://hg.mozilla.org/mozilla-central/rev/3711cda860a1
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Depends on: 1217251
Depends on: 1217529
How does this work on Fennec? The controller thread in Fennec (the Java UI thread) doesn't have a MessageLoop::current(), so it should be crashing.
Flags: needinfo?(botond)
Ah, never mind. I see that's bug 1217251. I'll comment there.
Flags: needinfo?(botond)
Duplicate of this bug: 1213605
You need to log in before you can comment on or make changes to this bug.