Closed
Bug 1135424
Opened 9 years ago
Closed 9 years ago
Consider making the state machine thread a task queue
Categories
(Core :: Audio/Video, defect)
Tracking
()
RESOLVED
FIXED
mozilla39
Tracking | Status | |
---|---|---|
firefox39 | --- | fixed |
People
(Reporter: bholley, Assigned: bholley)
References
Details
Attachments
(4 files)
7.73 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
8.12 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
54.45 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
4.07 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
Right now MediaDecoderStateMachineScheduler::GetStateMachineThread returns mEventTarget, which is an nsIThreadPool. Because it's a thread pool, tasks dispatched to it can run concurrently with other tasks (in particular, the state machine). So anybody trying to dispatch things "to the state machine thread" in the hopes of avoiding races is doing the wrong thing. And grepping through the code, we have at least a couple of instances of this. I think we should switch the idea of a "state machine thread" to a "state machine task queue". This should be pretty straightforward to do, and I'm happy to do it as part of the other work that I'm doing. Any thoughts or objections?
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(cpearce)
Assignee | ||
Comment 1•9 years ago
|
||
Facepalm - I just realized that the shared thread pool sets a thread limit of 1, making it not much of a pool at all. So I don't think there are any races here. That still leaves the question of whether it makes sense to run the MDSM task-queue-on-thread-pool to decrease interference between MDSMs and make the code more consistent. I've got patches that get us most of the way there already, though the timer is certainly still a sticky issue. We should probably discuss this in a larger discussion about our threading model.
Group: core-security
Flags: needinfo?(cpearce)
Summary: Usage of GetStateMachineThread potentially unsafe → Consider making the state machine thread a task queue
Assignee | ||
Updated•9 years ago
|
Comment 2•9 years ago
|
||
I'm hoping we can remove the timer in future. Once roc moves the video queue and A/V sync into the compositor, if we can figure out how to update HTMLMediaElement.currentTime for audio only streams we can probably remove the timer. Removing the audio thread will likely help here too.
Assignee | ||
Comment 3•9 years ago
|
||
Per bug 1135785 comment 10, we should try to do this shortly after bug 1135424 lands.
Depends on: 1135785
Assignee | ||
Comment 4•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2222b5111987
Assignee | ||
Comment 5•9 years ago
|
||
The original idea behind the current model was that we wanted ironclad guarantees that consumers would always get a callback on their promise. But we now have use cases where the consumer wants to forget about a promise (using the new Disconnect()) feature, and in some cases wants to shut down the task queue that the response is going to be dispatched on. In the case of this bug, we want to avoid waiting for the longest outstanding timer promise to be resolved before shutting down the MDSM. So this patch fixes up the pieces needed to make this work: * Loosening our invariants to allow dispatch targets to be released on any thread, since MediaTaskQueue and nsIEventTarget both have thread-safe refcounting. * Releasing mThisVal in Disconnect, so that we no longer depend on successful dispatch to release it on the correct (dispatch) thread. * Fiddling with various assertions. We also make some assertions fatal in nightly/aurora builds while we're at it.
Attachment #8576345 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 6•9 years ago
|
||
Attachment #8576346 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 7•9 years ago
|
||
Attachment #8576347 -
Flags: review?(matt.woodrow)
Comment 8•9 years ago
|
||
Comment on attachment 8576345 [details] [diff] [review] Part 1 - Allow MediaPromise dispatch to fail if the ThenValue has been disconnected. v1 Review of attachment 8576345 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/media/MediaPromise.h @@ +245,5 @@ > PROMISE_LOG("%s Then() call made from %s [Runnable=%p, Promise=%p, ThenValue=%p]", > resolved ? "Resolving" : "Rejecting", ThenValueBase::mCallSite, > runnable.get(), aPromise, this); > + nsresult rv = detail::DispatchMediaPromiseRunnable(mResponseTarget, runnable); > + (void) rv; mozilla::unused?
Attachment #8576345 -
Flags: review?(matt.woodrow) → review+
Comment 9•9 years ago
|
||
Comment on attachment 8576346 [details] [diff] [review] Part 2 - Implement MediaTimer. v1 Review of attachment 8576346 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/media/MediaTimer.h @@ +74,5 @@ > + nsRefPtr<MediaTimerPromise::Private> mPromise; > + > + explicit Entry(const TimeStamp& aTimeStamp) > + : mTimeStamp(aTimeStamp) > + , mPromise(new MediaTimerPromise::Private(__func__)) This __func__ is probably going to be fairly useless. We could make WaitUntil take a const char* parameter and have the callers of that use __func__ to get more useful values.
Attachment #8576346 -
Flags: review?(matt.woodrow) → review+
Comment 10•9 years ago
|
||
Comment on attachment 8576347 [details] [diff] [review] Part 3 - Switch the MDSM to a task queue. v1 Review of attachment 8576347 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/media/MediaDecoderStateMachine.h @@ +322,5 @@ > > + // Invokes ScheduleStateMachine to run at |aTimeStamp|, unless it's > + // already scheduled to run earlier, in which case the request is > + // discarded. > + void ScheduleStateMachineAt(const TimeStamp aTimeStamp); Why did this change to an absolute time instead of a relative one (TimeDuration)? All the callers have to use TimeStamp::Now() to convert relative to absolute, and I'm not really sure what passing a value < Now() would actually mean.
Attachment #8576347 -
Flags: review?(matt.woodrow) → review+
Assignee | ||
Comment 11•9 years ago
|
||
Green try push in comment 4 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/e52401d30a67 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/d86806ea63f4 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/584d91ffdf88
Comment 12•9 years ago
|
||
Apparently you need mochitest-e10s-3 on your try push too, to persuade me of your innocence. Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/1e99f9913951 on suspicion of being the cause of 330 seconds without output hangs in test_playback.html in mochitest-e10s-3. I don't feel great about putting the blame on you, but retriggering I got three of them on you, and none on the pushes below you.
Assignee | ||
Comment 13•9 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #12) > Apparently you need mochitest-e10s-3 on your try push too, to persuade me of > your innocence. Will do! > Backed out in > https://hg.mozilla.org/integration/mozilla-inbound/rev/1e99f9913951 on > suspicion of being the cause of 330 seconds without output hangs in > test_playback.html in mochitest-e10s-3. I don't feel great about putting the > blame on you, but retriggering I got three of them on you, and none on the > pushes below you. Well, there's also a legitimate assertion failure (which I've fixed). But yes, let's investigate this on try. Control: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9557ba6176a4 Logging: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dcc31427f8cf
Assignee | ||
Comment 14•9 years ago
|
||
After several iterations of logging, it looks like we're hanging in the AwaitShutdownAndIdle call in nsDecoderDisposeEvent::Run. I'm not sure why exactly, but I would guess that one of the tasks in the task queue is blocking on the main thread, either with synchronous dispatch or because we're spinning the event loop and something higher up on the stack is holding a blocking resource (the event loop spinning stuff might explain why this is e10s-only). To be honest, shutting down the task queue synchronously on the main thread was kind of cutting a corner to begin with. I'll write up a patch to do it properly with the MediaPromise that BeginShutdown already returns.
Assignee | ||
Comment 15•9 years ago
|
||
Attachment #8577870 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 16•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0de522268b51
Assignee | ||
Comment 17•9 years ago
|
||
(In reply to Bobby Holley (:bholley) from comment #16) > https://treeherder.mozilla.org/#/jobs?repo=try&revision=0de522268b51 Looks green. \o/
Updated•9 years ago
|
Attachment #8577870 -
Flags: review?(matt.woodrow) → review+
Assignee | ||
Comment 18•9 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/1bec876948ea remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/9d1e2d6f156b remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/3b51954d4b0d remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/08bb5eab67b7
Comment 19•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/1bec876948ea https://hg.mozilla.org/mozilla-central/rev/9d1e2d6f156b https://hg.mozilla.org/mozilla-central/rev/3b51954d4b0d https://hg.mozilla.org/mozilla-central/rev/08bb5eab67b7
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox39:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Comment 20•9 years ago
|
||
I get the following error doing a self build of current trunk, which I suspect was caused by the patches for this bug: clang++ -o nsTypeAheadFind.o -c -fvisibility=hidden -DSTATIC_EXPORTABLE_JS_API -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DAB_CD=en-US -DNO_NSPR_10_SUPPORT -I/usr/local/src/Mozilla/ bugzilla1110911/blah/mozilla-central/toolkit/components/typeaheadfind -I. -I../../../dist/include -I/usr/local/src/Mozil la/bugzilla1110911/blah/mozilla-central/obj-firefox-64bit/dist/include/nspr -I/usr/local/src/Mozilla/bugzilla1110911/blah/mozilla-central/obj-firefox-64bit/dist/include/nss -fPIC -Qunused-arguments -DMOZILLA_CLIENT -include ../../../mozilla-config.h -MD -MP -MF .deps/nsTypeAheadFind.o.pp -Qunused-arguments -Qunused-arguments -Wall -Wempty-body -Woverloaded-virtual -Wsign-compare -Wwrite-strings -Wno-invalid-offsetof -Wno-c++0x-extensions -Wno-extended-offsetof -Wno-unknown-warning-option -Wno-return-type-c-linkage -g -gfull -isysroot /Developer/SDKs/MacOSX10.7.sdk -fno-exceptions -fno-strict-aliasing -fno-exceptions -fno-math-errno -std=gnu++0x -pthread -DNO_X11 -pipe -DNDEBUG -DTRIMMED -g -fno-omit-frame-pointer /usr/local/src/Mozilla/bugzilla1110911/blah/mozilla-central/toolkit/components/typeaheadfind/nsTypeAheadFind.cpp In file included from /usr/local/src/Mozilla/bugzilla1110911/blah/mozilla-central/obj-firefox-64bit/layout/generic/Unified_cpp_layout_generic3.cpp:38: In file included from /usr/local/src/Mozilla/bugzilla1110911/blah/mozilla-central/layout/generic/nsVideoFrame.cpp:14: In file included from ../../dist/include/mozilla/dom/HTMLVideoElement.h:11: In file included from ../../dist/include/mozilla/dom/HTMLMediaElement.h:22: In file included from ../../dist/include/MediaDecoder.h:192: ../../dist/include/MediaPromise.h:267:16: error: 'Disconnect' marked 'override' but does not override any member functions virtual void Disconnect() MOZ_OVERRIDE ^
Assignee | ||
Comment 21•9 years ago
|
||
(In reply to Steven Michaud from comment #20) > I get the following error doing a self build of current trunk, which I > suspect was caused by the patches for this bug: You need to update your clang. See bug 1129499.
Comment 22•9 years ago
|
||
> You need to update your clang. See bug 1129499.
That's going to be difficult. I use Apple's clang (which comes with XCode), but because I'm building on OS X 10.7.5 I can't use Apple's latest clang (or XCode). I suppose I can hack something up, though (perhaps by building my own clang).
I expect this isn't the last you'll hear of this. In other words, I expect others will also have run into this difficulty.
Thanks, though, for pointing out the bug that made the change (and triggered the problem).
Assignee | ||
Comment 23•9 years ago
|
||
(In reply to Steven Michaud from comment #22) > That's going to be difficult. I use Apple's clang (which comes with XCode), > but because I'm building on OS X 10.7.5 I can't use Apple's latest clang (or > XCode). I suppose I can hack something up, though (perhaps by building my > own clang). The easiest thing would probably be to write a local patch to remove the MOZ_OVERRIDE annotation. > I expect this isn't the last you'll hear of this. In other words, I expect > others will also have run into this difficulty. I am open to suggestions on how to improve the situation. > Thanks, though, for pointing out the bug that made the change (and triggered > the problem). To be clear, that was a different instance of the same problem.
Comment 24•9 years ago
|
||
>> I expect this isn't the last you'll hear of this. In other words, >> I expect others will also have run into this difficulty. > > I am open to suggestions on how to improve the situation. I just found I can use a patched build of (non-Apple) clang r185949, which I'd created to do Mac ASan builds on OS X 10.7.5 (http://people.mozilla.org/~stmichaud/bmo/firefox-asan-howto.txt). So you don't need to worry about me. But if enough other people show up with the same problem, who can't easily upgrade their clang, I suggest you continue to accomodate old clang versions (those with a bug in how they handle the "override" keyword) -- here and in future patches. Here you'd presumably need to get rid of the OVERRIDE keyword in the following line: https://hg.mozilla.org/mozilla-central/annotate/2e2222a40262/dom/media/MediaPromise.h#l267
Assignee | ||
Comment 25•9 years ago
|
||
(In reply to Steven Michaud from comment #24) > Here you'd presumably need to get rid of the OVERRIDE keyword in the > following line: > > https://hg.mozilla.org/mozilla-central/annotate/2e2222a40262/dom/media/ > MediaPromise.h#l267 We can't do that, because then static analysis builds fail. We now require that all overrides are marked MOZ_OVERRIDE.
Comment 26•9 years ago
|
||
Oh well :-( Let's hope there aren't too many others who still need to build on OS X 10.7. Or if there are some, let's hope they can use my workaround from comment #24. (As I remember it, the latest non-Apple clang doesn't build/work on OS X 10.7, so you can't just upgrade to the latest version.)
Comment 27•9 years ago
|
||
I compile in a 10.7 VM with clang from macport (so I can get the clang plugin working) and it works fine for me.
Comment 28•9 years ago
|
||
> clang from macport
It's good to know that also works.
You need to log in
before you can comment on or make changes to this bug.
Description
•