Bug 1670737 (Open) - Opened 4 years ago, Updated 9 days ago

nsThreadPool is not reusing threads the way it should

Categories

(Core :: XPCOM, defect)

Tracking

Tracking Status
firefox83 --- affected

People

(Reporter: Gijs, Assigned: KrisWright)

References

Details

Attachments

(1 file)

It appears that thread pools end up creating a thread when there are more queued events than idle threads:

https://searchfox.org/mozilla-central/rev/c2e3ac11be4837718c2604e58085fbc8252b69dd/xpcom/threads/nsThreadPool.cpp#89-93

but then end up destroying that thread shortly afterwards, when realizing that we cannot keep more idle threads:

https://searchfox.org/mozilla-central/rev/c2e3ac11be4837718c2604e58085fbc8252b69dd/xpcom/threads/nsThreadPool.cpp#251-254,266

and because the exiting thread immediately removes itself from mThreads, shutting down the underlying "real" thread asynchronously, we can create another thread soon afterwards, and then this repeats.

I'm not entirely clear on how we can both have a thread failing to find an event to process and, at the same time, the thread pool finding more events than the idle threads can service, but I see us churn through 100 Stream Transport threads on startup on macOS (checked with some custom logging), and Florian's profiling markers were showing similar issues.

This kind of situation could also potentially explain bug 1667581.

Here's a sorted log of what happens around the stream transport service thread pool's handling of threads.

In this 2.5 minute session (most of the log is taken up by the first few seconds of startup), some 200 threads were created and destroyed just for the stream transport service. Here's a typical example of the issue:

1602526897410460 StreamTrans Put event, now got 1
1602526897410464 StreamTrans #148 - yay event
1602526897410464 StreamTrans Put event, now got 1
1602526897410470 StreamTrans #143 - yay event
1602526897410470 StreamTrans Put event, now got 1
1602526897410484 StreamTrans Put event, now got 2
1602526897410503 StreamTrans Put event, now got 3
1602526897410508 StreamTrans Put event, now got 3
1602526897410509 StreamTrans #145 - yay event
1602526897410514 StreamTrans 3 events (limit 3), spawning
1602526897410533 StreamTrans Put event, now got 4
1602526897410536 StreamTrans New named thread (6 / 25)
1602526897410539 StreamTrans #140 - yay event
1602526897410544 StreamTrans #140 - yay event
1602526897410545 StreamTrans #135 - yay event
1602526897410547 StreamTrans #148 - no event
1602526897410548 StreamTrans #143 - yay event
1602526897410561 StreamTrans #147 - no event
1602526897410570 StreamTrans #145 - no event
1602526897410573 StreamTrans #143 - no event
1602526897410575 StreamTrans #135 - no event
1602526897410585 StreamTrans #140 - no event
1602526897410587 StreamTrans #140 - ending because it's going idle but we already have exactly 5 idle threads

We start off with 0 events in the queue. An event gets added and processed. Then something adds a bunch of events in a row. When we hit 5 events, with 2 already taken up by previously idle threads, we have 3 queued events and 3 idle threads, so we create a new thread. All the events get processed, and so 50µs after we created a thread, we destroy a thread, because once it is about to go idle we check and realize we already have 5 idle threads (which is the maximum).
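
To make the interaction concrete, here's a toy model of the two rules involved (this is not the real nsThreadPool code; the names, the exact conditions, and the 25-thread / 5-idle limits are paraphrased from the searchfox links and the log above). It replays the excerpt: a burst of three events meets three idle threads, a sixth thread gets spawned, the burst drains, and the new capacity is immediately thrown away.

#include <cstdio>

// Toy model of the two rules described above. NOT the real nsThreadPool.
struct PoolModel {
  int threadLimit = 25;     // maximum number of threads in the pool
  int idleThreadLimit = 5;  // maximum number of threads allowed to sit idle
  int threads = 5;          // current thread count
  int idle = 5;             // threads currently parked waiting for work
  int queued = 0;           // events waiting in the queue

  // Rule 1 (event dispatch): spawn a thread once the queued events reach the
  // number of idle threads, as long as we are under the thread limit.
  void PutEvent() {
    ++queued;
    if (queued >= idle && threads < threadLimit) {
      ++threads;
      std::printf("spawn -> %d threads\n", threads);
    }
  }

  // An idle thread wakes up and takes an event.
  void IdleThreadTakesEvent() {
    if (queued > 0 && idle > 0) {
      --queued;
      --idle;
    }
  }

  // A running (or newly spawned) thread looks for its next event. If the
  // queue is empty, it either parks as idle or, when the idle limit is
  // already reached (Rule 2), retires immediately.
  void ThreadLooksForWork() {
    if (queued > 0) {
      --queued;   // "yay event"
      return;
    }
    if (idle >= idleThreadLimit) {
      --threads;  // "no event" and the idle limit is hit: retire
      std::printf("retire -> %d threads\n", threads);
      return;
    }
    ++idle;       // "no event" but under the limit: park
  }
};

int main() {
  PoolModel pool;  // 5 threads, all idle, empty queue

  // Two events get taken by previously idle threads (idle drops to 3)...
  pool.PutEvent(); pool.IdleThreadTakesEvent();
  pool.PutEvent(); pool.IdleThreadTakesEvent();

  // ...then a burst of three more arrives: 3 queued events vs 3 idle
  // threads, so Rule 1 spawns a sixth thread ("3 events (limit 3)").
  pool.PutEvent(); pool.PutEvent(); pool.PutEvent();

  // The existing threads drain the burst almost immediately...
  pool.ThreadLooksForWork(); pool.ThreadLooksForWork(); pool.ThreadLooksForWork();

  // ...so when threads look for work again there is none left. Two park as
  // idle, and the last one hits the idle limit and retires.
  pool.ThreadLooksForWork(); pool.ThreadLooksForWork(); pool.ThreadLooksForWork();
  return 0;
}

Running it prints one spawn followed by one retire, which is the churn visible in the log.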

Kris probably has better ideas about a longer-term "right" fix, but some things that look like they might improve this are:

  1. increase the maximum number of idle threads (which would reduce how often the idle threads get saturated)
  2. come up with a way of destroying threads less eagerly
  3. make the limit on idle threads more dynamic, based on the load the thread pool sees
  4. don't create more threads quite so eagerly, and accept waiting for events to be picked up - given that we're off the main thread and using multiple processes, it's not clear that adding more threads gets the events processed any faster
  5. investigate consumers that post series of runnables to the thread pool. One particularly bad example I've noticed through Florian's work on profiler markers is nsSegmentedBuffer, which is used (via storage stream) by http channels, the startupcache, and various other bits of infrastructure. When destroyed, it frees its data in segments, with one runnable per segment (implemented in bug 1584898). Chunking that work (freeing e.g. 8 or 32 segments per task) would likely alleviate some of the problem here; see the sketch after this list.
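
For item 5, a purely illustrative sketch of what chunked freeing could look like (not an actual patch; kChunkSize, FreeSegmentsInChunks, and aDispatch are made-up names, with aDispatch standing in for posting a runnable to the thread pool):

#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical chunk size; the suggestion above is something like 8 or 32.
constexpr size_t kChunkSize = 32;

// Instead of one task per segment, each dispatched task frees up to
// kChunkSize segments, so a buffer with hundreds of segments posts only a
// handful of runnables to the pool.
void FreeSegmentsInChunks(
    std::vector<void*> aSegments,
    const std::function<void(std::function<void()>)>& aDispatch) {
  for (size_t start = 0; start < aSegments.size(); start += kChunkSize) {
    size_t end = std::min(start + kChunkSize, aSegments.size());
    std::vector<void*> chunk(aSegments.begin() + start,
                             aSegments.begin() + end);
    aDispatch([chunk = std::move(chunk)]() mutable {
      for (void* segment : chunk) {
        free(segment);  // one task now covers a whole chunk of segments
      }
    });
  }
}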

We have more or less concluded that this is an issue inherent to the way nsThreadPool ramps its thread count up and down, and there's no easy way to fix it. Maybe this won't be an issue anymore after Bug 1672257, once we start sending things to the scheduler pool. The easiest "fix" for now is to keep idle threads alive longer instead of immediately destroying them when we exceed the idle thread limit. That doesn't fix the underlying issue that nsThreadPool can create threads faster than they can destroy themselves, but since the problem usually emerges when frequent, large batches of quick tasks (like FreeOMT calls) hit the thread pool, keeping threads alive longer makes it more likely that a thread will be available to service the next batch of tasks as it arrives, even if we have exceeded the idle thread limit.
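
A minimal sketch of that "keep idle threads alive longer" idea, written as a generic worker loop rather than as a change to nsThreadPool itself (the PoolState members, the 50ms grace period, and the omitted dispatch/notify side are all assumptions for illustration):

#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Generic stand-in for the pool's shared state, not nsThreadPool's members.
struct PoolState {
  std::mutex lock;
  std::condition_variable cv;
  std::deque<std::function<void()>> queue;
  int idleCount = 0;
  int idleLimit = 5;                        // cf. the idle thread limit
  std::chrono::milliseconds idleGrace{50};  // made-up grace period
  bool shuttingDown = false;
};

// Worker loop. The only behavioural change from "retire as soon as the idle
// limit is reached" is the wait_for branch: a thread that is over the idle
// limit lingers for idleGrace before exiting, so the next burst of events
// can reuse it instead of forcing a brand new thread to be spawned.
void WorkerLoop(PoolState& pool) {
  std::unique_lock<std::mutex> guard(pool.lock);
  while (!pool.shuttingDown) {
    if (!pool.queue.empty()) {
      auto task = std::move(pool.queue.front());
      pool.queue.pop_front();
      guard.unlock();
      task();       // run the event outside the lock
      guard.lock();
      continue;
    }
    if (pool.idleCount >= pool.idleLimit) {
      // Over the idle limit: don't exit immediately, wait briefly for work.
      bool gotWork = pool.cv.wait_for(guard, pool.idleGrace, [&] {
        return !pool.queue.empty() || pool.shuttingDown;
      });
      if (gotWork && !pool.shuttingDown) {
        continue;   // a burst arrived in time; this thread services it
      }
      return;       // still nothing to do after the grace period: retire
    }
    // Under the idle limit: park indefinitely until an event is dispatched.
    ++pool.idleCount;
    pool.cv.wait(guard, [&] {
      return !pool.queue.empty() || pool.shuttingDown;
    });
    --pool.idleCount;
  }
}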

This seems to be causing an issue in the background thread pool right now so I'll be looking into this again.

Assignee: nobody → kwright

I remember we had discussed having a limit on the number of threads that can be shutting down concurrently for a thread pool - was that ever implemented?

(In reply to :Gijs (he/him) from comment #1)
> 5. investigate consumers that post series of runnables to the thread pool. One particularly bad example I've noticed through Florian's work on profiler markers is nsSegmentedBuffer, which is used (via storage stream) by http channels, the startupcache, and various other bits of infrastructure. When destroyed, it frees its data in segments, with one runnable per segment (implemented in bug 1584898). Chunking that work (freeing e.g. 8 or 32 segments per task) would likely alleviate some of the problem here.

FWIW, we have batched these frees together since bug 1667581, which I imagine reduces the severity of this issue somewhat.

Severity: -- → S2
Flags: needinfo?(kwright)

We haven't implemented any kind of permanent fix for this. As it stands, a thread pool is not aware of when a thread's shutdown resolves. We could probably fix this by adding some thread shutdown context that removes the thread from a shutdown list once its shutdown resolves.
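
A rough sketch of that idea (all names here are hypothetical; the completion hook would be whatever notification the thread's asynchronous shutdown provides, and the counter would need to be protected by the pool's lock or made atomic):

#include <cstdint>

// Hypothetical helper, not an existing class: the pool would own one of
// these and consult it (under its own lock) when deciding whether to spawn.
class PoolShutdownTracker {
 public:
  // Called when the pool asks a thread to shut down asynchronously.
  void NoteShutdownStarted() { ++mPendingShutdowns; }

  // Called from whatever completion notification the thread's asynchronous
  // shutdown provides, i.e. the "thread shutdown context" mentioned above.
  void NoteShutdownResolved() { --mPendingShutdowns; }

  // Treat still-shutting-down threads as if they were alive, so a burst of
  // events does not immediately re-create threads that are mid-teardown.
  uint32_t EffectiveThreadCount(uint32_t aLiveThreads) const {
    return aLiveThreads + mPendingShutdowns;
  }

 private:
  uint32_t mPendingShutdowns = 0;
};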

Flags: needinfo?(kwright)
Severity: S2 → S3
See Also: → 1891664