Bug 1161405 (CLOSED) - Suboptimal parallelism of nsThreadPool
Opened 8 years ago; closed 8 years ago
Categories: Core :: XPCOM, defect
Status: RESOLVED FIXED
Target Milestone: mozilla41
People: Reporter: jwwang; Assigned: jwwang
Attachments: 2 files, 3 obsolete files
Attachment (patch, 3.07 KB): froydnj: review+; lizzard: approval-mozilla-aurora+; lizzard: approval-mozilla-beta+; lizzard: approval-mozilla-esr38-; jocheng: approval-mozilla-b2g37+
Attachment (patch, 2.67 KB): froydnj: review+
https://hg.mozilla.org/mozilla-central/file/102d0e9aa9e1/xpcom/threads/nsThreadPool.cpp#l87

nsThreadPool::PutEvent will not spawn new threads if there are already idle threads. However, is it possible for nsThreadPool::PutEvent to kick in again before the idle thread is woken up to consume the events? That is, we could end up having 2 events in the queue but only one thread to serve them, and the 2 events will run in sequence.
Updated • 8 years ago (Assignee)
Flags: needinfo?(benjamin)
Comment 1 • 8 years ago (Assignee)
This test case will fail and prove my assumption in comment 0. I think we should also consider the number of pending events as well as |mIdleCount| when deciding to spawn a new thread to maximize the parallelism of the thread pool.
Comment 2 • 8 years ago
Gerald had been investigating that issue earlier today and was working on a fix.
Comment 3 • 8 years ago
Argh, you beat me by 6 hours! I've also experienced this today in a test where ~18 videos are started and are all garbage-collected together at the end, causing a flurry of task dispatches (to shut down everything) while there were 3 idle threads at that time. The dispatches happened pretty much together, so the idle threads didn't have time to grab events from the queue, and therefore the idle count stayed at 3. At the end of all this, there were only 5 threads running (out of 25 max), and 20+ events still in the queue which could have been handled by new threads.

My thoughts on this: either PutEvent should decrement the idle count (but this may have side effects?) as suggested by Jean-Yves, or maybe when the last idle thread grabs something from the queue, it could check if more events are on the queue and create thread(s) for them.

I was *not* working on a fix (yet)! But please let me know if you'd like me to really work on it.
Comment 5 • 8 years ago
You probably want Nathan, who's the new XPCOM owner.
Flags: needinfo?(nfroyd)
Comment 7 • 8 years ago (Assignee)
Try is green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b8e307c28757
Comment 8 • 8 years ago (Assignee)
Improve parallelism of nsThreadPool by taking the number of pending events into account when spawning a new thread.
Assignee: nobody → jwwang
Attachment #8601363 - Attachment is obsolete: true
Attachment #8601890 - Flags: review?(nfroyd)
Comment 9 • 8 years ago (Assignee)
Test case for parallel execution of runnables.
Attachment #8601893 - Flags: review?(nfroyd)
Comment 10 • 8 years ago
A data point, in relation to my comment 3: this patch + EME tests from bug 1138294 still show issues in the Media stack, with too many tasks created waiting for subsequent tasks to respond, but not enough available threads to deal with the latter ones. E.g.: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f4736a0246f

Taking one log: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/gsquelart@mozilla.com-7f4736a0246f/try-linux/try_ubuntu32_vm_test-mochitest-3-bm02-tests1-linux32-build741.txt.gz

Search for 'SetIdle': we see 12 of them, but all 25 MediaTaskQueue threads are busy, so no thread can deal with the pending DoDrain tasks (which would help get rid of the SetIdle's).

So while this patch is definitely needed to fix a bug in ThreadPool, it is not enough to fix some classes of issues in the Media stack using it. Bug 1161892 will probably help, by using separate thread pools for separate task queues. Go team!
Comment 11 • 8 years ago (Assignee)
If it is about the thread limit of the pool, can you change 25 to 50 or above to see if that helps?
Comment 12 • 8 years ago
Much better with 250 threads: https://treeherder.mozilla.org/#/jobs?repo=try&revision=19c762c50c7c I previously tried 250 threads but without your fix and still had lock-ups, so your fix is definitely helping.
Comment 13 • 8 years ago
Comment on attachment 8601893 [details] [diff] [review]
1161405_part2_testcase-v1.patch

Review of attachment 8601893 [details] [diff] [review]:
-----------------------------------------------------------------

I'm trying to think of some way to make this test less timing-dependent; especially on b2g emulators, it seems plausible that we could intermittently fail. What about forcing some sort of explicit synchronization between the two dispatched events, say r1 spinning on an Atomic<bool> that r2 sets to true? The test would deadlock if r1 and r2 had to execute on the same thread, but maybe that wouldn't be such a bad thing?

::: xpcom/tests/gtest/TestThreadPool.cpp
@@ +93,5 @@
> + PR_Sleep(PR_MillisecondsToInterval(2000));
> +
> + // Dispatch 2 events in a row. Since we are still within the thread limit,
> + // We should wake up the idle thread and spawn a new thread to serve these
> + // 2 events. We also expect |r2| (seqNum=2) to finish before |r1| (seqNum=1).

Shouldn't seqNum=3 for r1?

So, just to make sure I understand things, the desired sequence of events here is:

1. Dispatch r1
2. Have an idle thread, service r1 on that.
3. Dispatch r2
4. We haven't reached our limit of threads running, spawn another thread to handle that.

which is, I think, what we get with the current patch, but the behavior today is:

1. Dispatch r1
2. Have an idle thread, service r1 on that. Let's call this T1.
3. Dispatch r2
4. We *should* have no idle threads at this point.
5. But T1 might not have decremented the count of idle threads, because it might not have retrieved r1 from the queue yet.
6. It looks like we have an idle thread, then. Stick r2 in the queue and wait.

Is that correct? If so, then at step 4/5, we're really at the mercy of the thread scheduling of the OS, and whether or not T1 actually gets to update the state of the thread pool prior to trying to figure out what to do with r2, correct?
It seems like we ought to be able to do better here by more eagerly updating the status of the thread pool, which I think is what Gerald was suggesting in comment 3.
Attachment #8601893 - Flags: review?(nfroyd) → feedback+
Comment 14 • 8 years ago
Comment on attachment 8601890 [details] [diff] [review]
1161405_part1_improve_parallelism-v1.patch

Review of attachment 8601890 [details] [diff] [review]:
-----------------------------------------------------------------

I'm curious how alternate approaches would work, so just f+'ing this while we discuss.

::: xpcom/threads/nsEventQueue.cpp
@@ +135,5 @@
> + for (Page* page = mHead; page != mTail; page = page->mNext) {
> +   count += EVENTS_PER_PAGE;
> + }
> +
> + count += mOffsetTail - mOffsetHead;

This computation looks suspicious since mOffsetTail and mOffsetHead don't necessarily have to point to the same thing. But I think everything actually works out. A comment is probably worthwhile... maybe initializing |count| with |-mOffsetHead| would read a little better, once the comment was written?
Attachment #8601890 - Flags: review?(nfroyd) → feedback+
Comment 16 • 8 years ago

(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #13)

Isn't it also possible that T1 could handle both r1 and r2 before a new thread has had time to spin up? In which case we'll spin up an extra thread for nothing.
Comment 17 • 8 years ago (Assignee)
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #13)
> Is that correct? If so, then at step 4/5, we're really at the mercy of the
> thread scheduling of the OS, and whether or not T1 actually gets to update
> the state of the thread pool prior to trying to figure out what to do with
> r2, correct?

Exactly.

> It seems like we ought to be able to do better here by more eagerly updating
> the status of the thread pool, which I think is what Gerald was suggesting
> in comment 3.

I thought about that and decided it is complicated to implement. Say we have an idle and an active thread and a new event about to be put into the queue. If we eagerly decrement |mIdleCount| in nsThreadPool::PutEvent, then the idle thread won't have to decrement |mIdleCount| again when it wakes up. However, it is also possible the active thread picks up the event first, and then we have to restore |mIdleCount|.

Therefore I take a simple approach: compare the idle thread count with the pending event count. If the idle thread count is larger than the pending event count, we are guaranteed to have at least one idle thread to serve the incoming event, and we don't need to spawn a new thread. Otherwise, we spawn a new thread for the new event, even though it is possible another active thread could pick up the new event before the newly spawned thread does. However, that is fine since we want as much parallelism as possible for the thread pool.

I will re-write the test case to avoid the timing issue.
Flags: needinfo?(jwwang)
Comment 18 • 8 years ago (Assignee)
Add comments.
Attachment #8601890 - Attachment is obsolete: true
Attachment #8602479 - Flags: review?(nfroyd)
Comment 19 • 8 years ago (Assignee)
Fix timing issue in testcase-v1.
Attachment #8601893 - Attachment is obsolete: true
Attachment #8602480 - Flags: review?(nfroyd)
Comment 20 • 8 years ago
I think it's possible for comment 16 to still be true, yes? Also wondering if it makes more sense for nsEventQueue to just count events as they're added/removed.

I won't be able to look at these today (too many other things in the queue); I'll go through them tomorrow.
Flags: needinfo?(jwwang)
Comment 21 • 8 years ago (Assignee)
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #20)
> I think it's possible for comment 16 to still be true, yes?

Do you mean the testcase-v2? No. Since r1 won't finish until r2 sets mDone to true, there is no way for T1 to handle both r1 and r2.

> Also wondering if it makes more sense for nsEventQueue to just count events
> as they're added/removed.

Do you mean the implementation of nsEventQueue::Count()? It is a lot easier to do that. I am fine with both options.
Flags: needinfo?(jwwang)
Comment 22 • 8 years ago

(In reply to JW Wang [:jwwang] from comment #21)
> (In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #20)
> > I think it's possible for comment 16 to still be true, yes?
> Do you mean the testcase-v2?

No, he (and I) mean for generic runnables r1 and r2, not for the ones in your test.

> > Also wondering if it makes more sense for nsEventQueue to just count events
> > as they're added/removed.
> Do you mean the implementation of nsEventQueue::Count()? It is a lot easier
> to do that. I am fine with both options.

I would say no actually, because this way the cost (small though it may be) is eaten by the caller of Count(), not by everyone using an nsEventQueue.
Comment 23 • 8 years ago (Assignee)
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #22)
> No, he (and I) mean for generic runnables r1 and r2, not for the ones in
> your test.

Ah, I see. Then the answer is yes. I think that is fine since we want to maximize concurrency, and we will shut down idle threads that time out.

> I would say no actually because this way the cost (small though it may be)
> is eaten by the caller of Count(), not by everyone using an nsEventQueue.

Are you saying no to my implementation, where the cost is eaten by the caller of Count()?
Comment 24 • 8 years ago
(In reply to JW Wang [:jwwang] from comment #23)
> Ah, I see. Then the answer is yes. I think that is fine since we want to
> maximize concurrency and we will shutdown idle threads that times out.

Well, maximizing concurrency is nice, but minimizing the time (memory, power, etc.) wasted by spinning up threads we don't need is also desirable. That being said, it looks like many of our uses of nsThreadPool already set threadLimit == idleThreadLimit, so spinning up an extra thread doesn't necessarily cause a bunch of churn, since that thread is likely to hang around for a while anyway.
Comment 25 • 8 years ago
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

Review of attachment 8602479 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/threads/nsEventQueue.cpp
@@ +131,5 @@
> + if (!mHead) {
> +   return 0;
> + }
> +
> + /* How we count the number of events in the queue:

Thanks for writing this comment.
Attachment #8602479 - Flags: review?(nfroyd) → review+
Updated • 8 years ago
Attachment #8602480 - Flags: review?(nfroyd) → review+
Comment 26 • 8 years ago (Assignee)
Thanks for the review.
Comment 27 • 8 years ago (Assignee)
Try with gtests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e3b5bf59f049
Updated • 8 years ago (Assignee)
Keywords: checkin-needed
Updated • 8 years ago (Assignee)
Keywords: checkin-needed
Comment 28 • 8 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/eac3aa030c4e https://hg.mozilla.org/integration/mozilla-inbound/rev/a4700f2b25fc
Comment 29 • 8 years ago
https://hg.mozilla.org/mozilla-central/rev/eac3aa030c4e https://hg.mozilla.org/mozilla-central/rev/a4700f2b25fc
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox41: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Comment 30 • 8 years ago (Assignee)
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

Approval Request Comment
[Feature/regressing bug #]: unknown
[User impact if declined]: This is required to fix the test case timeout in bug 1105720.
[Describe test coverage new/current, TreeHerder]: Tested on TreeHerder and Central for a week. So far so good.
[Risks and why]: Low. The change is simple and has been tested on Central.
[String/UUID change made/needed]: none
Attachment #8602479 - Flags: approval-mozilla-beta?
Attachment #8602479 - Flags: approval-mozilla-aurora?
Comment 31 • 8 years ago (Assignee)
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: This patch increases the concurrency of the thread pool for better media playback performance and prevents media playback from hanging in some rare cases. This patch is also required to fix the test case timeout in bug 1105720.
User impact if declined: media playback would hang in rare cases.
Fix Landed on Version: 41
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8602479 - Flags: approval-mozilla-esr38?
Comment 32 • 8 years ago (Assignee)
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): unknown
User impact if declined: This patch is required to fix the test case timeout in bug 1105720. Otherwise the test will have to be disabled, decreasing our test coverage as well as software quality.
Testing completed: Treeherder, and run on Central for a week.
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8602479 - Flags: approval-mozilla-b2g37?
Updated • 8 years ago
status-b2g-v2.2: --- → affected
status-b2g-master: --- → fixed
Comment 33 • 8 years ago
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

Approving; please NI if this causes any issue and needs a backout.
Attachment #8602479 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Updated • 8 years ago

Updated • 8 years ago
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

We are trying to keep ESR changes to major-impact security and stability fixes.
Attachment #8602479 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38-
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

Approved for uplift to beta (39) and aurora (40) to improve media playback stability and make sure our existing tests work correctly.
Attachment #8602479 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8602479 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 36 • 8 years ago
https://hg.mozilla.org/releases/mozilla-aurora/rev/1f92963a4c91 https://hg.mozilla.org/releases/mozilla-aurora/rev/eb63bda4d444
Flags: in-testsuite+
Comment 37 • 8 years ago
https://hg.mozilla.org/releases/mozilla-beta/rev/4e1782f281d5 https://hg.mozilla.org/releases/mozilla-beta/rev/5f5e1ff2d8a4
Comment 38 • 8 years ago
Backed out from beta for hitting reproducible crashes on linux32 mochitest-bc. https://treeherder.mozilla.org/logviewer.html#?job_id=371260&repo=mozilla-beta https://hg.mozilla.org/releases/mozilla-beta/rev/94832733e36b
Flags: needinfo?(jwwang)
Comment 39 • 8 years ago (Assignee)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9da332daa6ab There are crashes on Beta without the patch applied.
Flags: needinfo?(jwwang)
Comment 40 • 8 years ago
mozilla-beta went green post-backout. Also, your Try push is showing bustage on e10s runs, which is expected for mozilla-beta since e10s is disabled there. Note that the non-e10s runs are green. https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&filter-searchStr=Linux%20pgo%20Mochitest%20Mochitest%20Browser%20Chrome%20M%28bc1%29
Flags: needinfo?(jwwang)
Comment 41 • 8 years ago (Assignee)
The only cause I can think of is that this patch has the thread pool use more threads, making it more likely to run out of memory. However, it is strange that the crashes only show up on PGO builds.

Btw, how can I make PGO builds appear on Treeherder? https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build doesn't work for me.

I will try to tweak the configuration of the thread pool to find out if it is really about OOM.
Flags: needinfo?(jwwang) → needinfo?(ryanvm)
Updated • 8 years ago
Flags: qe-verify-
Comment 42 • 8 years ago
They won't show up as pgo because it's a build system hack, but you can verify by checking the build log that they are.
Flags: needinfo?(ryanvm)
Comment 43 • 8 years ago
Also, I was able to reproduce on Try without pgo, fwiw.
Comment 44 • 8 years ago (Assignee)
Finally I can repro it on a Linux opt build (https://treeherder.mozilla.org/#/jobs?repo=try&revision=69e2df98a9ef). However, I can't repro it after setting a hard limit (3 at most) on the number of threads in the pool (https://treeherder.mozilla.org/#/jobs?repo=try&revision=228bf46baf39).

It is obvious my patch makes it more likely to use more threads, and thus more memory (and more likely to OOM), in order to maximize concurrency and improve performance. I will add logs to find out how many threads are enough to cause OOM.
Comment 45 • 8 years ago
I doubt this is going to make Beta39 at this point. JW, do you think this is still worth trying to get in for B2G v2.2?
Flags: needinfo?(jwwang)
Comment 46 • 8 years ago (Assignee)
I think not. I think B2G 2.2 is fine without this small improvement.
Flags: needinfo?(jwwang)
Updated • 8 years ago