Closed Bug 1161405 Opened 9 years ago Closed 9 years ago

Suboptimal parallelism of nsThreadPool

Categories

(Core :: XPCOM, defect)


Tracking


RESOLVED FIXED
mozilla41
Tracking Status
firefox39 --- wontfix
firefox40 --- fixed
firefox41 --- fixed
firefox-esr38 --- wontfix
b2g-v2.2 --- wontfix
b2g-master --- fixed

People

(Reporter: jwwang, Assigned: jwwang)

References

Details

Attachments

(2 files, 3 obsolete files)

https://hg.mozilla.org/mozilla-central/file/102d0e9aa9e1/xpcom/threads/nsThreadPool.cpp#l87

nsThreadPool::PutEvent will not spawn new threads if there are already idle threads. However, is it possible for nsThreadPool::PutEvent to kick in again before the idle thread is woken up to consume the events? That is, we would end up with 2 events in the queue but only one thread to serve them, and the 2 events would run in sequence.
Flags: needinfo?(benjamin)
Attached patch 1161405_testcase.patch (obsolete) — Splinter Review
This test case will fail and prove my assumption in comment 0. I think we should also consider the number of pending events as well as |mIdleCount| when deciding to spawn a new thread to maximize the parallelism of the thread pool.
Blocks: 1105720
gerald had been investigating that issue earlier today and was working on a fix.
Argh, you beat me by 6 hours!
I've also experienced this today in a test where ~18 videos are started and are all garbage-collected together at the end, causing a flurry of task dispatches (to shut everything down) at a time when there were 3 idle threads. The dispatches happened pretty much together, so the idle threads didn't have time to grab events from the queue, and therefore the idle count stayed at 3.
At the end of all this, there were only 5 threads running (out of 25 max), and 20+ events still in the queue, which could have been handled by new threads.

My thoughts on this: either PutEvent should decrement the idle count (but this may have side effects?) as suggested by Jean-Yves, or maybe when the last idle thread grabs something from the queue, it could check whether more events are on the queue and create a thread (or threads) for them.

I was *not* working on a fix (yet)! But please let me know if you'd like me to really work on it.
I cannot help with this.
Flags: needinfo?(benjamin)
You probably want Nathan, who's the new XPCOM owner.
Flags: needinfo?(nfroyd)
I will ask for review when patch is ready.
Flags: needinfo?(nfroyd)
Improve parallelism of nsThreadPool by taking the number of pending events into account when spawning a new thread.
Assignee: nobody → jwwang
Attachment #8601363 - Attachment is obsolete: true
Attachment #8601890 - Flags: review?(nfroyd)
Attached patch 1161405_part2_testcase-v1.patch (obsolete) — Splinter Review
Test case for parallel execution of runnables.
Attachment #8601893 - Flags: review?(nfroyd)
A data point, in relation with my comment 3:

This patch + EME tests from bug 1138294 still shows issue in the Media stack, with too many tasks created, waiting for subsequent tasks to respond, but there are not enough available threads to deal with the latter ones.

E.g.: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f4736a0246f
Taking one log: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/gsquelart@mozilla.com-7f4736a0246f/try-linux/try_ubuntu32_vm_test-mochitest-3-bm02-tests1-linux32-build741.txt.gz
Searching for 'SetIdle', we see 12 of them, but all 25 MediaTaskQueue threads are busy, so no thread can deal with the pending DoDrain tasks (which would help get rid of the SetIdle's).

So while this patch is definitely needed to fix a bug in ThreadPool, it is not enough to fix some classes of issues in the Media stack using it. Bug 1161892 will probably help, by using separate thread pools for separate task queues. Go team!
If it is about the thread limit of the pool, can you change 25 to 50 or above to see if that helps?
Much better with 250 threads:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=19c762c50c7c

I previously tried 250 threads but without your fix and still had lock-ups, so your fix is definitely helping.
Comment on attachment 8601893 [details] [diff] [review]
1161405_part2_testcase-v1.patch

Review of attachment 8601893 [details] [diff] [review]:
-----------------------------------------------------------------

I'm trying to think of some way to make this test less timing-dependent; especially on b2g emulators, it seems plausible that we could intermittently fail.

What about forcing some sort of explicit synchronization between the two dispatched events, say r1 spinning on an Atomic<bool> that r2 sets to true?  The test would deadlock if r1 and r2 had to execute on the same thread, but maybe that wouldn't be such a bad thing?

::: xpcom/tests/gtest/TestThreadPool.cpp
@@ +93,5 @@
> +  PR_Sleep(PR_MillisecondsToInterval(2000));
> +
> +  // Dispatch 2 events in a row. Since we are still within the thread limit,
> +  // We should wake up the idle thread and spawn a new thread to serve these
> +  // 2 events. We also expect |r2| (seqNum=2) to finish before |r1| (seqNum=1).

Shouldn't seqNum=3 for r1?

So, just to make sure I understand things, the desired sequence of events here is:

1. Dispatch r1
2. Have an idle thread, service r1 on that.
3. Dispatch r2
4. We haven't reached our limit of threads running, spawn another thread to handle that.

which is, I think, what we get with the current patch, but the behavior today is:

1. Dispatch r1
2. Have an idle thread, service r1 on that.  Let's call this T1.
3. Dispatch r2
4. We *should* have no idle threads at this point.
5. But T1 might not have decremented the count of idle threads, because it might not have retrieved r1 from the queue yet.
6. It looks like we have an idle thread, then.  Stick r2 in the queue and wait.

Is that correct?  If so, then at step 4/5, we're really at the mercy of the thread scheduling of the OS, and whether or not T1 actually gets to update the state of the thread pool prior to trying to figure out what to do with r2, correct?

It seems like we ought to be able to do better here by more eagerly updating the status of the thread pool, which I think is what Gerald was suggesting in comment 3.
Attachment #8601893 - Flags: review?(nfroyd) → feedback+
Comment on attachment 8601890 [details] [diff] [review]
1161405_part1_improve_parallelism-v1.patch

Review of attachment 8601890 [details] [diff] [review]:
-----------------------------------------------------------------

I'm curious how alternate approaches would work, so just f+'ing this while we discuss.

::: xpcom/threads/nsEventQueue.cpp
@@ +135,5 @@
> +  for (Page* page = mHead; page != mTail; page = page->mNext) {
> +    count += EVENTS_PER_PAGE;
> +  }
> +
> +  count += mOffsetTail - mOffsetHead;

This computation looks suspicious since mOffsetTail and mOffsetHead don't necessarily have to point to the same thing.

But I think everything actually works out.  A comment is probably worthwhile...maybe initializing |count| with |-mOffsetHead| would read a little better, once the comment was written?
Attachment #8601890 - Flags: review?(nfroyd) → feedback+
ni? for comment 13.
Flags: needinfo?(jwwang)
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #13)

Isn't it also possible that T1 could handle both r1 and r2 before a new thread has had time to spin up? In which case we'll spin up an extra thread for nothing.
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #13)
> Is that correct?  If so, then at step 4/5, we're really at the mercy of the
> thread scheduling of the OS, and whether or not T1 actually gets to update
> the state of the thread pool prior to trying to figure out what to do with
> r2, correct?
Exactly.

> It seems like we ought to be able to do better here by more eagerly updating
> the status of the thread pool, which I think is what Gerald was suggesting
> in comment 3.

I thought about that and decided it is complicated to implement.
Say we have an idle thread and an active thread, and a new event about to be put into the queue. If we eagerly decrement |mIdleCount| in nsThreadPool::PutEvent, then the idle thread won't have to decrement |mIdleCount| again when it wakes up. However, it is also possible that the active thread picks up the event first, and then we have to restore |mIdleCount|.

Therefore I took a simple approach of comparing the idle thread count with the pending event count. If the idle thread count is larger than the pending event count, we are guaranteed to have at least one idle thread to serve the incoming event, and we don't need to spawn a new thread. Otherwise, we spawn a new thread for the new event, even though it is possible another active thread could pick up the new event before the newly spawned thread does. However, that is fine since we want as much parallelism as possible in the thread pool.

I will re-write the test case to avoid the timing issue.
Flags: needinfo?(jwwang)
Add comments.
Attachment #8601890 - Attachment is obsolete: true
Attachment #8602479 - Flags: review?(nfroyd)
Fix timing issue in testcase-v1.
Attachment #8601893 - Attachment is obsolete: true
Attachment #8602480 - Flags: review?(nfroyd)
I think it's possible for comment 16 to still be true, yes?

Also wondering if it makes more sense for nsEventQueue to just count events as they're added/removed.

I won't be able to look at these today, too many other things in the queue, I'll go through them tomorrow.
Flags: needinfo?(jwwang)
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #20)
> I think it's possible for comment 16 to still be true, yes?
Do you mean the testcase-v2? No. Since r1 won't finish until r2 sets mDone to true, there is no way for T1 to handle both r1 and r2.

> Also wondering if it makes more sense for nsEventQueue to just count events
> as they're added/removed.
Do you mean the implementation of nsEventQueue::Count()? It is a lot easier to do that. I am fine with both options.
Flags: needinfo?(jwwang)
(In reply to JW Wang [:jwwang] from comment #21)
> (In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #20)
> > I think it's possible for comment 16 to still be true, yes?
> Do you mean the testcase-v2?

No, he (and I) mean for generic runnables r1 and r2, not for the ones in your test.

> > Also wondering if it makes more sense for nsEventQueue to just count events
> > as they're added/removed.
> Do you mean the implementation of nsEventQueue::Count()? It is a lot easier
> to do that. I am fine with both options.

I would say no actually because this way the cost (small though it may be) is eaten by the caller of Count(), not by everyone using an nsEventQueue.
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #22)
> No, he (and I) mean for generic runnables r1 and r2, not for the ones in
> your test.
Ah, I see. Then the answer is yes. I think that is fine since we want to maximize concurrency and we will shut down idle threads that time out.

> I would say no actually because this way the cost (small though it may be)
> is eaten by the caller of Count(), not by everyone using an nsEventQueue.
Are you saying no to my implementation, where the cost is eaten by the caller of Count()?
(In reply to JW Wang [:jwwang] from comment #23)
> (In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #22)
> > No, he (and I) mean for generic runnables r1 and r2, not for the ones in
> > your test.
> Ah, I see. Then the answer is yes. I think that is fine since we want to
> maximize concurrency and we will shut down idle threads that time out.

Well, maximizing concurrency is nice, but minimizing time (memory, power, etc.) wasted by spinning up threads we don't need is also desirable.

That being said, it looks like many of our uses of nsThreadPool already set threadLimit == idleThreadLimit, so spinning up an extra thread doesn't necessarily cause a bunch of churn, since that thread is likely to hang around for a while anyway.
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

Review of attachment 8602479 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/threads/nsEventQueue.cpp
@@ +131,5 @@
> +  if (!mHead) {
> +    return 0;
> +  }
> +
> +  /* How we count the number of events in the queue:

Thanks for writing this comment.
Attachment #8602479 - Flags: review?(nfroyd) → review+
Attachment #8602480 - Flags: review?(nfroyd) → review+
Thanks for the review.
Keywords: checkin-needed
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/eac3aa030c4e
https://hg.mozilla.org/mozilla-central/rev/a4700f2b25fc
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

Approval Request Comment
[Feature/regressing bug #]: unknown
[User impact if declined]: This is required to fix the test case timeout in bug 1105720.
[Describe test coverage new/current, TreeHerder]: Tested on TreeHerder and Central for a week. So far so good.
[Risks and why]: Low. The change is simple and has been tested on Central.
[String/UUID change made/needed]: none
Attachment #8602479 - Flags: approval-mozilla-beta?
Attachment #8602479 - Flags: approval-mozilla-aurora?
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
This patch increases the concurrency of the thread pool for better media playback performance and prevents media playback from hanging in some rare cases. This patch is also required to fix test case timeout in bug 1105720.
User impact if declined: media playback would hang in rare cases.
Fix Landed on Version: 41
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8602479 - Flags: approval-mozilla-esr38?
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): unknown
User impact if declined: This patch is required to fix the test case timeout in bug 1105720. Otherwise the test will have to be disabled, decreasing our test coverage as well as software quality.
Testing completed: Treeherder and run on Central for a week.
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8602479 - Flags: approval-mozilla-b2g37?
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

Approving and please NI if causing any issue and need backout.
Attachment #8602479 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

We are trying to keep ESR changes to major impact security and stability fixes.
Attachment #8602479 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38-
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

Approved for uplift to beta (39) and aurora (40) to improve media playback stability and make sure our existing tests work correctly.
Attachment #8602479 - Flags: approval-mozilla-beta?
Attachment #8602479 - Flags: approval-mozilla-beta+
Attachment #8602479 - Flags: approval-mozilla-aurora?
Attachment #8602479 - Flags: approval-mozilla-aurora+
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9da332daa6ab
There are crashes on Beta without the patch applied.
Flags: needinfo?(jwwang)
mozilla-beta went green post-backout. Also, your Try push is showing bustage on e10s runs, which is expected for mozilla-beta since e10s is disabled there. Note that the non-e10s runs are green.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&filter-searchStr=Linux%20pgo%20Mochitest%20Mochitest%20Browser%20Chrome%20M%28bc1%29
Flags: needinfo?(jwwang)
The only cause I can think of is that this patch has the thread pool use more threads and is thus more likely to cause out-of-memory crashes. However, it is strange that the crashes only show up on PGO builds.

Btw, how can I make PGO builds appear on Treeherder? https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build doesn't work for me.

I will try to tweak configuration of thread pool to find out if it is really about OOM.
Flags: needinfo?(jwwang) → needinfo?(ryanvm)
Flags: qe-verify-
They won't show up as pgo because it's a build system hack, but you can verify by checking the build log that they are.
Flags: needinfo?(ryanvm)
Also, I was able to reproduce on Try without pgo, fwiw.
Finally I can repro it on Linux opt build. (https://treeherder.mozilla.org/#/jobs?repo=try&revision=69e2df98a9ef)

However, I can't repro it after setting a hard limit (3 at most) on the number of threads in the pool. (https://treeherder.mozilla.org/#/jobs?repo=try&revision=228bf46baf39) It is obvious my patch makes it more likely to use more threads and thus more memory (and more likely to OOM) in order to maximize concurrency and improve performance.

I will add logs to find out how many threads are enough to cause OOM.
I doubt this is going to make Beta39 at this point. JW, do you think this is still worth trying to get in for B2G v2.2?
Flags: needinfo?(jwwang)
I think not. I think B2G 2.2 is fine without this small improvement.
Flags: needinfo?(jwwang)