Intermittent devtools/client/canvasdebugger/test/browser_profiling-canvas.js | application crashed [@ MessageLoop::DeletePendingTasks()]

RESOLVED FIXED in Firefox 56

Status

()

defect
--
critical
RESOLVED FIXED
2 years ago
10 months ago

People

(Reporter: intermittent-bug-filer, Assigned: lsalzman)

Tracking

({crash, intermittent-failure})

unspecified
mozilla56
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox54 unaffected, firefox55 unaffected, firefox56 fixed)

Details

(crash signature)

Attachments

(1 attachment)

Initial investigation points at the runnable stuck on the work queue to be of ObjectWatcher::Watch, which following the chain of users seems to point to ProcessWatcher::EnsureProcessTerminated. The most probable user of that to be causing that seems like GeckoChildProcessHost::~GeckoChildProcessHost. Why this would in any way be related to a change to use DirectWrite for fonts on Windows 7 is as yet undetermined, and I am still looking into it. Superficially, it seems unrelated and more likely to be a result of changed timing patterns in the tests exposing a pre-existing problem.
Flags: needinfo?(lsalzman)
Note that the oldest open report of this assertion failure to date was from 2016-01-03 (bug 1236350).

Lee, are you willing to work on this bug?  Looks like you went deeper with hunting this down than me in bug 1355196.
Flags: needinfo?(lsalzman)
Component: Developer Tools: Canvas Debugger → IPC
Product: Firefox → Core
(In reply to Honza Bambas (:mayhemer) from comment #4)
> Note that the oldest open report of this assertion failure to date was from
> 2016-01-03 (bug 1236350).
> 
> Lee, are you willing to work on this bug?  Looks like you went deeper with
> hunting this down than me in bug 1355196.

I am still in the process of investigating this.
Flags: needinfo?(lsalzman)
Great!  Can I assign this bug then to you?
Assignee: nobody → lsalzman
The source of this particular instance of the intermittent in DeletePendingTasks was coming from ObjectWatcher queuing a task after the message loop is already in its death throes. We let it queue the task even though it would never actually be processed and merely trigger this assertion.

This adds tracking for whether the MessageLoop is shutting down. Callers sometimes adds tasks before the Run() loop is actually started, and usually during the Run() loop itself, both of which are safe times to add tasks since they will still actually get processed. However, after the Run() loop returns, it is never invoked again, so it is thus not safe to add tasks.

The added assertion in PostTask_Helper here will help us pinpoint the intermittents at the actual time they are triggered by calls to PostTask, rather than waiting all the way till DeletePendingTasks is called.

And in particular with ObjectWatcher::DoneWaiting, we simply skip adding the pending task if the MessageLoop is shutting down. ObjectWatcher only gets used by ChildReaper, and ChildReaper will always call KillProcess() in its destructor in the worst case, to ensure the process actually gets cleaned up.
Attachment #8887624 - Flags: review?(bobowencode)
Comment on attachment 8887624 [details] [diff] [review]
don't allow new tasks to be posted to MessageLoop when it is shutting down

Review of attachment 8887624 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not a peer for this code, so redirecting to billm.

(In reply to Lee Salzman [:lsalzman] from comment #7)
...
> And in particular with ObjectWatcher::DoneWaiting, we simply skip adding the
> pending task if the MessageLoop is shutting down. ObjectWatcher only gets
> used by ChildReaper, and ChildReaper will always call KillProcess() in its
> destructor in the worst case, to ensure the process actually gets cleaned up.

Well in the non-forced case, we really have to rely on ChildReaper::WillDestroyCurrentMessageLoop to do the clean-up, because I think we won't get deleted unless that happens anyway.

As far as I can tell in the non-forced case we will never kill a hung/slow-to-shutdown child process, but that is no change from how it works now.

::: ipc/chromium/src/base/message_loop.h
@@ +120,4 @@
>    // NOTE: These methods may be called on any thread.  The Task will be invoked
>    // on the thread that executes MessageLoop::Run().
>  
> +  bool IsProcessingTasks() const { return !shutting_down_; }

I think this name is slightly misleading, perhaps IsAcceptingTasks would be better?

::: ipc/chromium/src/base/object_watcher.cc
@@ +128,5 @@
>    // provided, which in turn ensures our change to did_signal can be observed
>    // on the target thread.
> +  if (watch->origin_loop->IsProcessingTasks()) {
> +    watch->origin_loop->PostTask(addrefedWatch.forget());
> +  }

I think that this is OK, because any remaining clean-up should still happen as a result of ChildReaper::WillDestroyCurrentMessageLoop and ObjectWatcher::WillDestroyCurrentMessageLoop in the non-forced case.
In the forced/ref-counted case ~ChildReaper calling KillProcess will do it as a last resort.
Attachment #8887624 - Flags: review?(wmccloskey)
Attachment #8887624 - Flags: review?(bobowencode)
Attachment #8887624 - Flags: feedback+
Attachment #8887624 - Flags: review?(wmccloskey) → review+
Pushed by lsalzman@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/04e3649ca8fc
don't allow new tasks to be posted to MessageLoop when it is shutting down. r=billm
It should be noted again that, provided this landing sticks, that we're going to essentially be trading the assertion inside MessageLoop::DeletePendingTasks() for assertions inside MessageLoop::PostTask_Helper coming from different callers. This will be nothing to be alarmed about, because in general it will point the finger at the thread that decided to queue the bad task where we actually have enough context information to track down things, as opposed to the receiver, which has lost that context.
https://hg.mozilla.org/mozilla-central/rev/04e3649ca8fc
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in before you can comment on or make changes to this bug.