Bug 1538784 (Open) - Opened 6 years ago - Updated 1 month ago

Set nsThreadPool limits more intelligently

Categories

(Core :: XPCOM, enhancement)

Other Branch
enhancement

Tracking

Performance Impact: medium

People

(Reporter: alexical, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: perf:resource-use)

There are a large number of places (the only exception I can find is the PaintWorker thread pool) where we hard-code the thread limit, often to a number like 4. This value is straightforwardly counter-productive if the user has fewer than 4 logical cores, unless the workers are IO-bound. In general I suspect we should never set this value higher than max(1, PR_GetNumberOfProcessors() - 1).
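
A minimal sketch of that clamp (ClampThreadLimit is a hypothetical helper, not in the tree; PR_GetNumberOfProcessors() is NSPR's real API):

#include <algorithm>
#include <cstdint>
#include "prsystem.h"  // PR_GetNumberOfProcessors()

// Hypothetical helper: clamp a requested pool size so we never ask for
// more worker threads than max(1, cores - 1).
static uint32_t ClampThreadLimit(uint32_t aRequested) {
  PRInt32 cores = PR_GetNumberOfProcessors();     // returns -1 on failure
  PRInt32 cap = std::max<PRInt32>(1, cores - 1);  // failure case also lands on 1
  return std::min(aRequested, static_cast<uint32_t>(cap));
}

// Usage: instead of a hard-coded pool->SetThreadLimit(4), call
//   pool->SetThreadLimit(ClampThreadLimit(4));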

Tentatively marking this as a qf:p2:resource issue; this sounds particularly bad on weaker hardware.

Whiteboard: [qf] → [qf:p2:resource]
Component: General → XPCOM
Performance Impact: --- → P2
Whiteboard: [qf:p2:resource]
Severity: normal → S3

I think comment 0 is still accurate in describing the status quo: we set explicit limits quite rarely, and AFAICT only in one case based on the actual core count.
I am less sure about the performance gain we could achieve by lowering that number, as machines have hardly ever had fewer than 4 logical cores for quite a while now (even Android phones). And bug 1891664 significantly reduced the overhead of spinning threads up and down on frequent bursts of workload, which previously would have been mitigated to some extent simply by lowering the number of threads.

Increasing the default for all pools based on the number of logical cores might also be counter-productive: we run multiple processes, so the number of cores effectively available to any one process can be much lower at any given moment. It is probably best not to use very high numbers in general and to let the OS work out the scheduling when there are more threads than cores. Note that bug 1891664 also improved scheduling in a way that makes it much more likely that the same thread is reused for subsequent (sparse) events, which in turn makes it easier for the OS to stick to the same core and benefit from hot caches and the like.

It might be worth adding telemetry to see which thread pools actually hit their limits often, to check for blind spots. My expectation is that in many cases we run events through a (serializing) TaskQueue on top of a pool, so hitting the limits may happen less often than we fear; that pattern is sketched below.
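
For illustration, a minimal sketch of that layering (the helper and queue name are made up; TaskQueue::Create() and NS_NewRunnableFunction() are the real mozilla-central APIs, assuming their current signatures):

#include "mozilla/TaskQueue.h"
#include "nsThreadUtils.h"  // NS_NewRunnableFunction()

// A serializing TaskQueue runs its events strictly one at a time on
// whichever pool thread happens to be free, so many such queues can
// share a small pool without coming close to the pool's thread limit.
static RefPtr<mozilla::TaskQueue> MakeSerialQueue(nsIEventTarget* aPool) {
  return mozilla::TaskQueue::Create(do_AddRef(aPool), "ExampleSerialQueue");
}

// Usage (hypothetical):
//   RefPtr<mozilla::TaskQueue> q = MakeSerialQueue(pool);
//   q->Dispatch(NS_NewRunnableFunction("Example", [] { /* work */ }));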

Depends on: 1895073
See Also: → 1891664