The stream transport service's thread pool keeps starting and stopping threads during downloads
Categories
(Core :: Networking, defect, P2)
Tracking
()
People
(Reporter: mstange, Unassigned)
References
Details
(Whiteboard: [necko-triaged])
Attachments
(5 files, 1 obsolete file)
It seems we tried to address this in bug 1535361 but I'm not sure if it fully worked.
Here's a profile from bug 1646976 that shows many calls to NS_NewNamedThread
during the course of a download: https://share.firefox.dev/2CfdEMK
Updated•5 years ago
|
![]() |
||
Comment 1•5 years ago
|
||
I can't reproduce with a local https server hosting a 3GB file downloading to a fast ssd (100MB download), Nightly. I only see some 7 stream trans threads.
For this too many threads issue, I will ask for logs, with MOZ_LOG=timestamp,nsThreadPool:5
![]() |
||
Updated•5 years ago
|
![]() |
||
Comment 3•5 years ago
|
||
Thanks you!
I found one possible cause: when initializing the pool for stream trans, we set the idle interval to a wrong value. Instead of PR_SecondsToInterval
we must pass only the number of seconds, directly.
Our thread pools have two limits: total number of active threads and the limit of idle thread that can linger for some period of time (tens of seconds) The stream trans pool has these limits set to 25 total active and 5 idle.
The logic controlling the number of threads is kinda flawed. The stream trans pool never reaches more than 5 threads, actually.
I can see a lot of cases in the log where there is a burst of events put to the pool; the pooled threads pick up (with a small delay - not significant, it's an expected system thread scheduling context switching). Because of that, when there are more events waiting in the queue than we have idle threads, we spawn a new (usually sixth) thread. But before the new thread starts, the pending events are all handled by idle threads. The new thread than doesn't find any event and the number of idle threads is now at the limit. We immediately kill that thread.
In the end, this is reiteration of bug 1535361 even with a higher idle limit.
So, there are two things to fix:
- the initialization of the idle timeout for stream trans
- remove these lines to keep fresh threads around for a while
![]() |
||
Comment 4•5 years ago
|
||
![]() |
||
Updated•5 years ago
|
![]() |
||
Comment 5•5 years ago
|
||
![]() |
||
Comment 6•5 years ago
|
||
Depends on D84700
![]() |
||
Comment 7•5 years ago
|
||
Some local experiments and deeper look to the log show that none of the changes I proposed in comment 3 would work here.
The problem (or behavior) I can see in the log from comment 2 is that we put 6 or more events to the pool's queue. On each we notify the threads to wake up and handle them, but the threads still sleep (a low core loaded system?). On the sixths event we see that idle thread number < pending events number and spawn a new thread (and wait for it to be spawned - causing context switch) Before this new thread starts, all idle threads wake up (thanks the context switch) and handle all the queued events, often just one thread, very quickly (<1ms). The new thread then sees no events and is thus immediately killed.
So, for the stream trans service I would propose to have 25 idle threads. As observed and as the trivial algorithm we have behaves, we never have more than ~6 threads, regardless the overall active limit is set to 25. With progressive timeout (shortened with number of idle threads) new threads would linger for a shorter time.
With setting 25 idle threads (same as total num) for stream trans service, during download I can see 4 to 9 threads created and nice, even distribution of tasks among them. But as I can't reproduce this issue easily locally, I can see something very similar with vanilla Gecko... So hard to say.
@Jonas, would you be willing to test an experimental build? (need to create one first)
Updated•5 years ago
|
![]() |
||
Comment 8•5 years ago
|
||
![]() |
||
Comment 10•5 years ago
|
||
@Jonas, can you please check this build? https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/B0sRv9NoT4id__BEJ0nRMw/runs/0/artifacts/public/build/target.dmg
Thanks.
Comment 12•5 years ago
|
||
Comment 13•5 years ago
|
||
Reporter | ||
Comment 14•5 years ago
|
||
It probably didn't affect CPU usage much, but I think it did help with the thread starting and stopping issue. Only 26 samples in NS_NewNamedThread
now: https://share.firefox.dev/39qwTPH
![]() |
||
Comment 15•5 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #14)
It probably didn't affect CPU usage much, but I think it did help with the thread starting and stopping issue. Only 26 samples in
NS_NewNamedThread
now: https://share.firefox.dev/39qwTPH
That was the purpose of the patch:
2020-07-24 15:49:59.662 ⁃ name=StreamTrans #61 ⁃ nsThreadPool::thread ⁃ StreamTrans #61 ⁃ created ⁃ alive-time=44354 ⁃ run-time=581 ⁃ idle-time=43308 ⁃ tasks-ran=6755
2020-07-24 15:49:59.665 ⁃ name=StreamTrans #62 ⁃ nsThreadPool::thread ⁃ StreamTrans #62 ⁃ created ⁃ alive-time=44356 ⁃ run-time=566 ⁃ idle-time=43350 ⁃ tasks-ran=6712
2020-07-24 15:49:59.667 ⁃ name=StreamTrans #63 ⁃ nsThreadPool::thread ⁃ StreamTrans #63 ⁃ released ⁃ alive-time=44111 ⁃ run-time=571 ⁃ idle-time=43056 ⁃ tasks-ran=6754
2020-07-24 15:49:59.793 ⁃ name=StreamTrans #64 ⁃ nsThreadPool::thread ⁃ StreamTrans #64 ⁃ created ⁃ alive-time=44331 ⁃ run-time=564 ⁃ idle-time=43312 ⁃ tasks-ran=6673
2020-07-24 15:50:00.181 ⁃ name=StreamTrans #65 ⁃ nsThreadPool::thread ⁃ StreamTrans #65 ⁃ created ⁃ alive-time=43834 ⁃ run-time=531 ⁃ idle-time=42904 ⁃ tasks-ran=6588
2020-07-24 15:50:00.228 ⁃ name=StreamTrans #66 ⁃ nsThreadPool::thread ⁃ StreamTrans #66 ⁃ created ⁃ alive-time=43788 ⁃ run-time=536 ⁃ idle-time=42825 ⁃ tasks-ran=6602
2020-07-24 15:50:00.278 ⁃ name=StreamTrans #67 ⁃ nsThreadPool::thread ⁃ StreamTrans #67 ⁃ released ⁃ alive-time=43610 ⁃ run-time=684 ⁃ idle-time=42483 ⁃ tasks-ran=6529
2020-07-24 15:50:02.785 ⁃ name=StreamTrans #68 ⁃ nsThreadPool::thread ⁃ StreamTrans #68 ⁃ created ⁃ alive-time=41228 ⁃ run-time=411 ⁃ idle-time=40461 ⁃ tasks-ran=5898
2020-07-24 15:50:03.992 ⁃ name=StreamTrans #69 ⁃ nsThreadPool::thread ⁃ StreamTrans #69 ⁃ created ⁃ alive-time=40029 ⁃ run-time=455 ⁃ idle-time=39163 ⁃ tasks-ran=5641
2020-07-24 15:50:13.886 ⁃ name=StreamTrans #70 ⁃ nsThreadPool::thread ⁃ StreamTrans #70 ⁃ created ⁃ alive-time=30135 ⁃ run-time=262 ⁃ idle-time=29654 ⁃ tasks-ran=3565
2020-07-24 15:50:16.228 ⁃ name=StreamTrans #71 ⁃ nsThreadPool::thread ⁃ StreamTrans #71 ⁃ created ⁃ alive-time=27788 ⁃ run-time=239 ⁃ idle-time=27363 ⁃ tasks-ran=3218
2020-07-24 15:50:24.111 ⁃ name=StreamTrans #72 ⁃ nsThreadPool::thread ⁃ StreamTrans #72 ⁃ created ⁃ alive-time=20026 ⁃ run-time=137 ⁃ idle-time=19802 ⁃ tasks-ran=1769
2020-07-24 15:50:24.251 ⁃ name=StreamTrans #73 ⁃ nsThreadPool::thread ⁃ StreamTrans #73 ⁃ created ⁃ alive-time=19886 ⁃ run-time=127 ⁃ idle-time=19650 ⁃ tasks-ran=1762
2020-07-24 15:50:24.304 ⁃ name=StreamTrans #74 ⁃ nsThreadPool::thread ⁃ StreamTrans #74 ⁃ released ⁃ alive-time=19474 ⁃ run-time=128 ⁃ idle-time=19266 ⁃ tasks-ran=1737
2020-07-24 15:50:26.143 ⁃ name=StreamTrans #75 ⁃ nsThreadPool::thread ⁃ StreamTrans #75 ⁃ created ⁃ alive-time=17914 ⁃ run-time=88 ⁃ idle-time=17751 ⁃ tasks-ran=1539
2020-07-24 15:50:26.144 ⁃ name=StreamTrans #76 ⁃ nsThreadPool::thread ⁃ StreamTrans #76 ⁃ released ⁃ alive-time=17869 ⁃ run-time=97 ⁃ idle-time=17701 ⁃ tasks-ran=1536
only few threads created for the download, instead of several hundreds. This will have some small effect on perf, but is not the root cause of the original bug.
Updated•5 years ago
|
![]() |
||
Comment 16•5 years ago
|
||
Depends on D84701
![]() |
||
Comment 17•5 years ago
|
||
@Jonas, and thanks for testing and providing all the data!
![]() |
||
Comment 18•5 years ago
|
||
Nathan, can you please check on https://phabricator.services.mozilla.com/D84701#2663470 ?
![]() |
||
Comment 19•5 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #18)
Nathan, can you please check on https://phabricator.services.mozilla.com/D84701#2663470 ?
Done.
![]() |
||
Comment 20•5 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #19)
(In reply to Honza Bambas (:mayhemer) from comment #18)
Nathan, can you please check on https://phabricator.services.mozilla.com/D84701#2663470 ?
Done.
Replied, thanks.
![]() |
||
Updated•5 years ago
|
Updated•4 years ago
|
Description
•