Closed Bug 1161405 Opened 9 years ago Closed 9 years ago

Suboptimal parallelism of nsThreadPool

Categories

(Core :: XPCOM, defect)


Tracking


RESOLVED FIXED
mozilla41
Tracking Status
firefox39 --- wontfix
firefox40 --- fixed
firefox41 --- fixed
firefox-esr38 --- wontfix
b2g-v2.2 --- wontfix
b2g-master --- fixed

People

(Reporter: jwwang, Assigned: jwwang)

References

Details

Attachments

(2 files, 3 obsolete files)

https://hg.mozilla.org/mozilla-central/file/102d0e9aa9e1/xpcom/threads/nsThreadPool.cpp#l87

nsThreadPool::PutEvent will not spawn new threads if there are already idle threads. However, is it possible for nsThreadPool::PutEvent to kick in again before the idle thread is woken up to consume the events? That is, we would end up with 2 events in the queue but only one thread to serve them, and the 2 events would run in sequence.
Flags: needinfo?(benjamin)
Attached patch 1161405_testcase.patch (obsolete) — Splinter Review
This test case will fail and prove my assumption in comment 0. I think we should also consider the number of pending events as well as |mIdleCount| when deciding to spawn a new thread to maximize the parallelism of the thread pool.
Blocks: 1105720
gerald had been investigating that issue earlier today and was working on a fix.
Argh, you beat me by 6 hours!
I've also experienced this today in a test where ~18 videos are started and are all garbage-collected together at the end, causing a flurry of task dispatches (to shut everything down) at a time when there were 3 idle threads. The dispatches happened pretty much together, so the idle threads didn't have time to grab events from the queue, and therefore the idle count stayed at 3.
At the end of all this, there were only 5 threads running (out of 25 max), and 20+ events still in the queue, which could have been handled by new threads.

My thoughts on this: either PutEvent should decrement the idle count (but this may have side effects?) as suggested by Jean-Yves, or maybe when the last idle thread grabs something from the queue, it could check whether more events are on the queue and create a thread (or threads) for them.

I was *not* working on a fix (yet)! But please let me know if you'd like me to really work on it.
I cannot help with this.
Flags: needinfo?(benjamin)
You probably want Nathan, who's the new XPCOM owner.
Flags: needinfo?(nfroyd)
I will ask for review when patch is ready.
Flags: needinfo?(nfroyd)
Improve parallelism of nsThreadPool by taking the number of pending events into account when spawning a new thread.
Assignee: nobody → jwwang
Attachment #8601363 - Attachment is obsolete: true
Attachment #8601890 - Flags: review?(nfroyd)
Attached patch 1161405_part2_testcase-v1.patch (obsolete) — Splinter Review
Test case for parallel execution of runnables.
Attachment #8601893 - Flags: review?(nfroyd)
A data point, in relation with my comment 3:

This patch + EME tests from bug 1138294 still shows issue in the Media stack, with too many tasks created, waiting for subsequent tasks to respond, but there are not enough available threads to deal with the latter ones.

E.g.: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f4736a0246f
Taking one log: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/gsquelart@mozilla.com-7f4736a0246f/try-linux/try_ubuntu32_vm_test-mochitest-3-bm02-tests1-linux32-build741.txt.gz
Searching for 'SetIdle', we see 12 of them, but all 25 MediaTaskQueue threads are busy, so no thread can deal with the pending DoDrain tasks (which would help get rid of the SetIdle's).

So while this patch is definitely needed to fix a bug in ThreadPool, it is not enough to fix some classes of issues in the Media stack using it. Bug 1161892 will probably help, by using separate thread pools for separate task queues. Go team!
If it is about the thread limit of the pool, can you change 25 to 50 or above to see if that helps?
Much better with 250 threads:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=19c762c50c7c

I previously tried 250 threads but without your fix and still had lock-ups, so your fix is definitely helping.
Comment on attachment 8601893 [details] [diff] [review]
1161405_part2_testcase-v1.patch

Review of attachment 8601893 [details] [diff] [review]:
-----------------------------------------------------------------

I'm trying to think of some way to make this test less timing-dependent; especially on b2g emulators, it seems plausible that we could intermittently fail.

What about forcing some sort of explicit synchronization between the two dispatched events, say r1 spinning on an Atomic<bool> that r2 sets to true?  The test would deadlock if r1 and r2 had to execute on the same thread, but maybe that wouldn't be such a bad thing?

::: xpcom/tests/gtest/TestThreadPool.cpp
@@ +93,5 @@
> +  PR_Sleep(PR_MillisecondsToInterval(2000));
> +
> +  // Dispatch 2 events in a row. Since we are still within the thread limit,
> +  // We should wake up the idle thread and spawn a new thread to serve these
> +  // 2 events. We also expect |r2| (seqNum=2) to finish before |r1| (seqNum=1).

Shouldn't seqNum=3 for r1?

So, just to make sure I understand things, the desired sequence of events here is:

1. Dispatch r1
2. Have an idle thread, service r1 on that.
3. Dispatch r2
4. We haven't reached our limit of threads running, spawn another thread to handle that.

which is, I think, what we get with the current patch, but the behavior today is:

1. Dispatch r1
2. Have an idle thread, service r1 on that.  Let's call this T1.
3. Dispatch r2
4. We *should* have no idle threads at this point.
5. But T1 might not have decremented the count of idle threads, because it might not have retrieved r1 from the queue yet.
6. It looks like we have an idle thread, then.  Stick r2 in the queue and wait.

Is that correct?  If so, then at step 4/5, we're really at the mercy of the thread scheduling of the OS, and whether or not T1 actually gets to update the state of the thread pool prior to trying to figure out what to do with r2, correct?

It seems like we ought to be able to do better here by more eagerly updating the status of the thread pool, which I think is what Gerald was suggesting in comment 3.
Attachment #8601893 - Flags: review?(nfroyd) → feedback+
Comment on attachment 8601890 [details] [diff] [review]
1161405_part1_improve_parallelism-v1.patch

Review of attachment 8601890 [details] [diff] [review]:
-----------------------------------------------------------------

I'm curious how alternate approaches would work, so just f+'ing this while we discuss.

::: xpcom/threads/nsEventQueue.cpp
@@ +135,5 @@
> +  for (Page* page = mHead; page != mTail; page = page->mNext) {
> +    count += EVENTS_PER_PAGE;
> +  }
> +
> +  count += mOffsetTail - mOffsetHead;

This computation looks suspicious since mOffsetTail and mOffsetHead don't necessarily have to point to the same thing.

But I think everything actually works out.  A comment is probably worthwhile...maybe initializing |count| with |-mOffsetHead| would read a little better, once the comment was written?
Attachment #8601890 - Flags: review?(nfroyd) → feedback+
ni? for comment 13.
Flags: needinfo?(jwwang)
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #13)

Isn't it also possible that T1 could handle both r1 and r2 before a new thread has had time to spin up? In which case we'll spin up an extra thread for nothing.
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #13)
> Is that correct?  If so, then at step 4/5, we're really at the mercy of the
> thread scheduling of the OS, and whether or not T1 actually gets to update
> the state of the thread pool prior to trying to figure out what to do with
> r2, correct?
Exactly.

> It seems like we ought to be able to do better here by more eagerly updating
> the status of the thread pool, which I think is what Gerald was suggesting
> in comment 3.

I thought about that and decided it is complicated to implement.
Say we have an idle thread and an active thread, and a new event about to be put into the queue. If we eagerly decrement |mIdleCount| in nsThreadPool::PutEvent, then the idle thread won't have to decrement |mIdleCount| again when it wakes up. However, it is also possible that the active thread picks up the event first, and then we have to restore |mIdleCount|.

Therefore I took a simple approach of comparing the idle thread count with the pending event count. If the idle thread count is larger than the pending event count, we are guaranteed to have at least one idle thread to serve the incoming event, and we don't need to spawn a new thread. Otherwise, we spawn a new thread for the new event, even though it is possible another active thread could pick up the new event before the newly spawned thread does. However, that is fine since we want as much parallelism as possible in the thread pool.

I will re-write the test case to avoid the timing issue.
Flags: needinfo?(jwwang)
Add comments.
Attachment #8601890 - Attachment is obsolete: true
Attachment #8602479 - Flags: review?(nfroyd)
Fix timing issue in testcase-v1.
Attachment #8601893 - Attachment is obsolete: true
Attachment #8602480 - Flags: review?(nfroyd)
I think it's possible for comment 16 to still be true, yes?

Also wondering if it makes more sense for nsEventQueue to just count events as they're added/removed.

I won't be able to look at these today, too many other things in the queue, I'll go through them tomorrow.
Flags: needinfo?(jwwang)
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #20)
> I think it's possible for comment 16 to still be true, yes?
Do you mean the testcase-v2? No. Since r1 won't finish until r2 sets mDone to true, there is no way for T1 to handle both r1 and r2.

> Also wondering if it makes more sense for nsEventQueue to just count events
> as they're added/removed.
Do you mean the implementation of nsEventQueue::Count()? It is a lot easier to do that. I am fine with both options.
Flags: needinfo?(jwwang)
(In reply to JW Wang [:jwwang] from comment #21)
> (In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #20)
> > I think it's possible for comment 16 to still be true, yes?
> Do you mean the testcase-v2?

No, he (and I) mean for generic runnables r1 and r2, not for the ones in your test.

> > Also wondering if it makes more sense for nsEventQueue to just count events
> > as they're added/removed.
> Do you mean the implementation of nsEventQueue::Count()? It is a lot easier
> to do that. I am fine with both options.

I would say no actually because this way the cost (small though it may be) is eaten by the caller of Count(), not by everyone using an nsEventQueue.
(In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #22)
> No, he (and I) mean for generic runnables r1 and r2, not for the ones in
> your test.
Ah, I see. Then the answer is yes. I think that is fine since we want to maximize concurrency and we will shut down idle threads that time out.

> I would say no actually because this way the cost (small though it may be)
> is eaten by the caller of Count(), not by everyone using an nsEventQueue.
Are you saying no to my implementation, where the cost is eaten by the caller of Count()?
(In reply to JW Wang [:jwwang] from comment #23)
> (In reply to Ben Turner [:bent] (use the needinfo flag!) from comment #22)
> > No, he (and I) mean for generic runnables r1 and r2, not for the ones in
> > your test.
> Ah, I see. Then the answer is yes. I think that is fine since we want to
> maximize concurrency and we will shut down idle threads that time out.

Well, maximizing concurrency is nice, but minimizing time (memory, power, etc.) wasted by spinning up threads we don't need is also desirable.

That being said, it looks like many of our uses of nsThreadPool already set threadLimit == idleThreadLimit, so spinning up an extra thread doesn't necessarily cause a bunch of churn, since that thread is likely to hang around for a while anyway.
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

Review of attachment 8602479 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/threads/nsEventQueue.cpp
@@ +131,5 @@
> +  if (!mHead) {
> +    return 0;
> +  }
> +
> +  /* How we count the number of events in the queue:

Thanks for writing this comment.
Attachment #8602479 - Flags: review?(nfroyd) → review+
Attachment #8602480 - Flags: review?(nfroyd) → review+
Thanks for the review.
Keywords: checkin-needed
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/eac3aa030c4e
https://hg.mozilla.org/mozilla-central/rev/a4700f2b25fc
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

Approval Request Comment
[Feature/regressing bug #]: unknown
[User impact if declined]: This is required to fix the test case timeout in bug 1105720.
[Describe test coverage new/current, TreeHerder]: Tested on TreeHerder and Central for a week. So far so good.
[Risks and why]: Low. The change is simple and has been tested on Central.
[String/UUID change made/needed]: none
Attachment #8602479 - Flags: approval-mozilla-beta?
Attachment #8602479 - Flags: approval-mozilla-aurora?
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
This patch increases the concurrency of the thread pool for better media playback performance and prevents media playback from hanging in some rare cases. This patch is also required to fix test case timeout in bug 1105720.
User impact if declined: media playback would hang in rare cases.
Fix Landed on Version: 41
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8602479 - Flags: approval-mozilla-esr38?
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): unknown
User impact if declined: This patch is required to fix the test case timeout in bug 1105720. Otherwise the test will have to be disabled, decreasing our test coverage as well as software quality.
Testing completed: Treeherder and run on Central for a week.
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8602479 - Flags: approval-mozilla-b2g37?
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

Approving and please NI if causing any issue and need backout.
Attachment #8602479 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

We are trying to keep ESR changes to major impact security and stability fixes.
Attachment #8602479 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38-
Comment on attachment 8602479 [details] [diff] [review]
1161405_part1_improve_parallelism-v2.patch

Approved for uplift to beta (39) and aurora (40) to improve media playback stability and make sure our existing tests work correctly.
Attachment #8602479 - Flags: approval-mozilla-beta?
Attachment #8602479 - Flags: approval-mozilla-beta+
Attachment #8602479 - Flags: approval-mozilla-aurora?
Attachment #8602479 - Flags: approval-mozilla-aurora+
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9da332daa6ab
There are crashes on Beta without the patch applied.
Flags: needinfo?(jwwang)
mozilla-beta went green post-backout. Also, your Try push is showing bustage on e10s runs, which is expected for mozilla-beta since e10s is disabled there. Note that the non-e10s runs are green.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&filter-searchStr=Linux%20pgo%20Mochitest%20Mochitest%20Browser%20Chrome%20M%28bc1%29
Flags: needinfo?(jwwang)
The only cause I can think of is that this patch has the thread pool use more threads and is thus more likely to cause out-of-memory crashes. However, it is strange that the crashes only show up on PGO builds.

Btw, how can I make PGO builds appear on Treeherder? https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build doesn't work for me.

I will try to tweak configuration of thread pool to find out if it is really about OOM.
Flags: needinfo?(jwwang) → needinfo?(ryanvm)
Flags: qe-verify-
They won't show up as pgo because it's a build system hack, but you can verify by checking the build log that they are.
Flags: needinfo?(ryanvm)
Also, I was able to reproduce on Try without pgo, fwiw.
Finally I can repro it on Linux opt build. (https://treeherder.mozilla.org/#/jobs?repo=try&revision=69e2df98a9ef)

However, I can't repro it after setting a hard limit (3 at most) on the number of threads in the pool. (https://treeherder.mozilla.org/#/jobs?repo=try&revision=228bf46baf39) It is obvious my patch makes it more likely to use more threads and thus more memory (and more likely to OOM) in order to maximize concurrency and improve performance.

I will add logs to find out how many threads are enough to cause OOM.
I doubt this is going to make Beta39 at this point. JW, do you think this is still worth trying to get in for B2G v2.2?
Flags: needinfo?(jwwang)
I think not. I think B2G 2.2 is fine without this small improvement.
Flags: needinfo?(jwwang)