Closed Bug 1433855 Opened 5 years ago Closed 5 years ago

Crash in mozilla::plugins::FunctionBroker<T>::PostToDispatchThread

Categories

(Core :: Security: Process Sandboxing, defect, P1)

Unspecified
Windows 7
defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr52 --- unaffected
firefox58 --- unaffected
firefox59 --- unaffected
firefox60 --- fixed

People

(Reporter: calixte, Assigned: handyman)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is
report bp-fd382b98-2a80-4aec-89e4-abd060180129.
=============================================================

Top 10 frames of crashing thread:

0 ntdll.dll NtWaitForKeyedEvent 
1 ntdll.dll RtlSleepConditionVariableCS 
2 kernel32.dll SleepConditionVariableCS 
3 mozglue.dll mozilla::detail::ConditionVariableImpl::wait mozglue/misc/ConditionVariable_windows.cpp:58
4 xul.dll mozilla::CondVar::Wait xpcom/threads/CondVar.h:68
5 xul.dll mozilla::plugins::FunctionBroker<1, short >::PostToDispatchThread dom/plugins/ipc/FunctionBroker.h:1447
6 xul.dll mozilla::plugins::FunctionBroker<1, short >::BrokerCallClient dom/plugins/ipc/FunctionBroker.h:1340
7 xul.dll mozilla::plugins::FunctionBroker<1, short >::MaybeBrokerCallClient dom/plugins/ipc/FunctionBroker.h:1313
8 xul.dll mozilla::plugins::FunctionBroker<1, short >::InterceptorStub dom/plugins/ipc/FunctionBroker.h:1231
9 npswf64_28_0_0_137.dll F905260817____________________________ F_578802783_______________________________________________________________:6145

=============================================================

There are 14 crashes (from are 8 installations) in nightly 60 with buildid 20180128220152. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1382251.

[1] https://hg.mozilla.org/mozilla-central/rev?node=1e37c536895f
Flags: needinfo?(davidp99)
Crash Signature: [@ mozilla::plugins::FunctionBroker<T>::PostToDispatchThread] → [@ mozilla::plugins::FunctionBroker<T>::PostToDispatchThread] [@ hang | mozilla::plugins::FunctionBroker<T>::PostToDispatchThread]
Crash Signature: [@ mozilla::plugins::FunctionBroker<T>::PostToDispatchThread] [@ hang | mozilla::plugins::FunctionBroker<T>::PostToDispatchThread] → [@ mozilla::plugins::FunctionBroker<T>::PostToDispatchThread] [@ hang | mozilla::plugins::FunctionBroker<T>::PostToDispatchThread] [@ nsThread::Dispatch]
Crash Signature: [@ mozilla::plugins::FunctionBroker<T>::PostToDispatchThread] [@ hang | mozilla::plugins::FunctionBroker<T>::PostToDispatchThread] [@ nsThread::Dispatch] → [@ mozilla::plugins::FunctionBroker<T>::PostToDispatchThread] [@ hang | mozilla::plugins::FunctionBroker<T>::PostToDispatchThread] [@ nsThread::Dispatch] [@ F_733287607____________________________________________________________________________] [@ F_…
Assignee: nobody → davidp99
Flags: needinfo?(davidp99)
Priority: -- → P1
Crash Signature: F_213380032____________________________________] [@ @0x0 | F_937183022___________________________________________________________________________________] → F_213380032____________________________________] [@ @0x0 | F_937183022___________________________________________________________________________________] [@ mozilla::plugins::FunctionBroker<T>::PostToDispatchHelper]
Most of these signatures are top-crashers in nightly for plugin process.
Keywords: topcrash
Looking into this.  Unfortunately, I don't have any leads yet.
FYI, the crashes in some of the signatures are not tied to bug 1382251 (they may be related -- I'm still looking).  For example, [@nsThread::Dispatch] [1] has crashes going back to 1/24 but bug 1382251 (which is definitely responsible for some of of the signatures) went in on 1/27.

[1] https://crash-stats.mozilla.com/signature/?signature=nsThread%3A%3ADispatch
This patch attempts to correct a synchronization issue in the PostToDispatchThread, but (I think) it butts up against an inconsistency in the docs for the synchronization primitives so it may be wrong.  I also don't have an STR or anything like that and I can't say that I see how this would cause the crash.  But it is a bug that could cause some failure.  Note that some of the crash signatures say that this is a hang, which is what I'd expect if this is the cause for the bug.

Here's the location of the failure [1]:

>  void FunctionBroker::PostToDispatchThread {
>  [...]
>  // Run PostToDispatchHelper on the dispatch thread.  It will notify our
>  // waiting monitor when it is done.
>  Monitor monitor("FunctionDispatchThread Lock");
>  MonitorAutoLock lock(monitor);
>  bool success = false;
>  FunctionBrokerChild::GetInstance()->PostToDispatchThread(
>    NewRunnableFunction("FunctionDispatchThreadRunnable", &PostToDispatchHelper,
>                        this, &monitor, &success, &aWinError, &aRet,
>                        &aParameters...));
>  monitor.Wait();
>  return success;
>  }

And the runnable it wants to dispatch [2]:

>  static void
>  PostToDispatchHelper(const SelfType* bmhi, Monitor* monitor, bool* ok, uint32_t* winErr, ResultType* r, ParamTypes*... p)
>  {
>    // Note: p is also non-null... its just hard to assert that.
>    MOZ_ASSERT(bmhi && monitor && ok && winErr && r);
>    *ok = bmhi->BrokerCallClient(*winErr, *r, *p...);
>    *ok &= NS_SUCCEEDED(monitor->Notify());
>  };

The issue would come in the unlikely event that the PostToDispatchThread happened, the dispatch thread ran PostToDispatchHelper, got its response and notified the monitor -- all before PostToDispatchThread got to the next line, monitor.Wait().  Obviously, the Wait() misses the Notify() in that case and you get a hang.  Now, the way to fix this is to Lock() the monitor in PostToDispatchHelper before calling Notify().  (Remember, the first thing Wait() does is release the monitor so this is ok.)  Note that we dont need the lock for the BrokerCallClient call -- there is no data race.  This is all about the timing of the Notify call.

So where does PostToDispatchHelper free the lock?  This is where the docs come in.  In general, Monitor is based on PRCondVar and PR_NotifyCondVar says [3]:

> ** Returns PR_FAILURE if the caller has not locked the lock associated
> ** with the condition variable.

...meaning we should (err, must) hold the lock during the Notify call.  This is obviously not true since we don't see PR_FAILURE -- it just works.  The deal is that the Windows implementation of Monitor uses a non-POSIX backend [4]:

> WakeConditionVariable(&platformData()->cv_);

...and, with CONDITION_VARIABLEs (i.e. cv_ above), we _dont_ need the lock before, as seen in this MSDN example [5].  I dont know if its an error to hold the lock before notifying on Windows but I know that we can free the lock _before_ notifying and still get the sequential guarantee we need to fix _this_ bug.  So thats what I do.  I expect this wouldn't work on POSIX-based platforms as-is but it will tell us for sure if this mistake is related to this bug.

(To be clear, if this works and Windows allows Notify on a Locked monitor, then we can just move the Notify call into the block with the MonitorAutoLock to fix POSIX.  Im currently just trying to avoid trading one error for another, which would mean I'd learn nothing from this patch in that case.)

Trying tests now [6]

---------

[1] https://searchfox.org/mozilla-central/rev/c56f656febb1aa58c72b218699e24a76425bd284/dom/plugins/ipc/FunctionBroker.h#1440
[2] https://searchfox.org/mozilla-central/rev/c56f656febb1aa58c72b218699e24a76425bd284/dom/plugins/ipc/FunctionBroker.h#1267
[3] https://searchfox.org/mozilla-central/rev/c56f656febb1aa58c72b218699e24a76425bd284/nsprpub/pr/include/prcvar.h#77
[4] https://searchfox.org/mozilla-central/rev/c56f656febb1aa58c72b218699e24a76425bd284/mozglue/misc/ConditionVariable_windows.cpp#45
[5] https://msdn.microsoft.com/en-us/ms686903(v=vs.85)
[6] https://treeherder.mozilla.org/#/jobs?repo=try&revision=08a25dcf060515054e64a71b6c5615532097f682
Comment on attachment 8946958 [details] [diff] [review]
Make sure plugin function broker's PostToDispatchThread is Waiting before Notifying it

Still no STR.  See my last comment.
Attachment #8946958 - Flags: review?(jmathies)
Comment on attachment 8946958 [details] [diff] [review]
Make sure plugin function broker's PostToDispatchThread is Waiting before Notifying it

thanks for the detailed explanation, lets land it and see.
Attachment #8946958 - Flags: review?(jmathies) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5cc1a1751a0e
Make sure plugin function broker's PostToDispatchThread is Waiting before Notifying it. r=jimm
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/5cc1a1751a0e
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Duplicate of this bug: 1434863
I am still seeing a number of plugin crashes in the latest nightly: http://bit.ly/2EgKI63. Not all the signatures, but specifically some of the flash ones (top 7 in the crash signatures list).
(In reply to Marcia Knous [:marcia - use ni] from comment #10)
> I am still seeing a number of plugin crashes in the latest nightly:
> http://bit.ly/2EgKI63. Not all the signatures, but specifically some of the
> flash ones (top 7 in the crash signatures list).

Should I reopen this bug? Or do we want the continuing issue in some signatures in a new bug?
Flags: needinfo?(davidp99)
I hadn't gone over the whole list closely enough but I believe there are at least two different bugs here.  The signatures that have tapered off to almost (!) zero are related to the Monitor fix we added but the others, the ones with only Flash+garbage stacks, are something else.  Their timing suggests their cause is the same original patch but they make me think there is a function Flash is calling that I'm returning junk for -- and messing it up.

So its probably a good idea to create a new bug for the crashes that didn't almost go away.  The others (the ones with FunctionBroker in the signature) need to be addressed eventually but aren't a priority.  Big thanks, Marcia.
Flags: needinfo?(davidp99) → needinfo?(mozillamarcia.knous)
Marcia,

Unless you've already started making a new one, I think you can just re-open this bug.  I've spent a few more hours cross-checking and there is definitely a correlation between the "mystery stacks" full of garbage and the ones with FunctionBroker in them.  I'm not 100% certain they are the same thing but the odds now feel more strongly than not that they are.  Apologies for not catching that sooner.
...and we come full circle.

Marcia (I'm deleting the NI), I think I've found what could be a cause for this.  Its in about the same spot but its not really related to this patch (it fixes something else).  So its easier to leave this as-is and add the new fix in a new bug, which I'm doing now.
Flags: needinfo?(mozillamarcia.knous)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla60 → ---
(In reply to David Parks (dparks) [:handyman] from comment #14)
> ...and we come full circle.
> 
> Marcia (I'm deleting the NI), I think I've found what could be a cause for
> this.  Its in about the same spot but its not really related to this patch
> (it fixes something else).  So its easier to leave this as-is and add the
> new fix in a new bug, which I'm doing now.

Thanks David - I assume the new bug is Bug 1436253? Also the crash rate is very high on nightly because of these flash and plugin crashes.
Sorry, I misread comment 14.
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Yep, bug 1436253.  Good news: its already got a patch that I have high hopes for.  Bad news: I would probably have said that here, too.
You need to log in before you can comment on or make changes to this bug.