Open Bug 1402265 Opened 7 years ago Updated 1 year ago

Better estimate CPU resources and apportion HelperThreads to tasks

Categories

(Core :: JavaScript Engine, enhancement, P2)

People

(Reporter: lth, Unassigned)

References

(Blocks 1 open bug)

Details

A number of things are going on in an uncoordinated way:

Bug 1400383 clamped the CPU count in the HelperThreads system to 8, based on the observation that most single-CPU systems do not have more than 8 cores, and that on NUMA systems with more than 8 cores, exposing more than 8 CPUs to the helper threads system leads to a slowdown.  8 is OK at the moment, but recent high-end CPUs actually have higher core counts and we should deal with that.  (And NUMA interconnects may also not be a uniform thing.)  See e.g. bug 1400383 comment 4.
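Roughly, the clamp amounts to something like the following (an illustrative sketch, not the actual code; the names are made up):

  #include <algorithm>
  #include <cstdint>
  #include <thread>

  // Sketch only: the helper-thread subsystem refuses to believe in more than
  // 8 cores, regardless of what the machine reports.
  static const uint32_t MaxHelperThreadCpus = 8;

  uint32_t EffectiveCpuCount() {
    uint32_t reported = std::thread::hardware_concurrency();  // may be 0
    return std::min<uint32_t>(reported ? reported : 1, MaxHelperThreadCpus);
  }

A fixed constant like this is exactly what goes stale as core counts climb.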

Various parts of the JS system assume that they can use a large share of the available helper threads for their own purposes: wasm will use up to cpuCount threads for parallel compilation; the GC uses up to cpuCount threads for sweeping and compacting; ion compilation will use all available threads (more than the number of cores).  If these activities happen to overlap we'll thrash a bit, limited only by the number of threads available.  There are other static resource allocation decisions in HelperThreads.cpp; see every max*Threads() method.
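The static caps look roughly like this (a sketch; the method names and formulas here are illustrative, the real ones live in HelperThreads.cpp):

  #include <cstdint>

  // Sketch of the kind of per-consumer static caps in HelperThreads.cpp.
  struct HelperThreadCaps {
    uint32_t cpuCount;     // the (clamped) core count
    uint32_t threadCount;  // total helper threads in the pool

    uint32_t maxWasmCompilationThreads() const { return cpuCount; }
    uint32_t maxGCParallelThreads() const { return cpuCount; }
    // Ion is allowed to queue work onto every helper thread, which can exceed
    // the core count when the pool is larger than the core count.
    uint32_t maxIonCompilationThreads() const { return threadCount; }
  };

Each consumer sizes itself against the whole pool, so the caps only compose well when the consumers happen not to run at the same time.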

Additionally we have wasm tier-2 compilation, which we want to finish as soon as we can, but never in a way that slows down other tasks or causes jank.  There might be other jobs in this category.

We have a crude static-priority scheduler in the HelperThreads system, which will cdr down a list of possible task types and always pick the first one it comes to, never mind if that starves lower-priority tasks.  Some lower-priority tasks probably should not starve.
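In pseudo-C++ the current policy is essentially this (a sketch; the task-kind list is illustrative):

  #include <cstddef>

  // Sketch of the pick-first-ready policy: task kinds sit in a fixed priority
  // order and an idle thread always takes the first kind with pending work,
  // so a later kind can starve for as long as earlier kinds keep producing.
  enum class TaskKind { GCParallel, IonCompile, WasmCompile, Parse, Compress, Count };

  // hasWork stands in for whatever query the real queues provide; it is
  // passed in here only to keep the sketch self-contained.
  bool pickNextTask(bool (*hasWork)(TaskKind), TaskKind* out) {
    for (size_t i = 0; i < size_t(TaskKind::Count); i++) {
      TaskKind kind = TaskKind(i);
      if (hasWork(kind)) {
        *out = kind;
        return true;
      }
    }
    return false;  // nothing pending; the helper thread goes back to waiting
  }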

(And there are complications to handle, like ongoing GC holding back compression tasks and complicated & duplicated cancellation / shutdown semantics.)

Presumably some of this is also sometimes at cross purposes with Firefox.  For one thing, we compute the core count and thread count ourselves; we do not allow the host environment to set them or control them dynamically.
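One could imagine an embedder-facing knob along these lines (hypothetical; no such API exists today, and the name and signature are made up):

  #include <algorithm>
  #include <atomic>
  #include <cstdint>

  // Hypothetical hook: let the host (e.g. Gecko) set, and later adjust, the
  // helper-thread budget instead of the engine deriving it from the raw core
  // count.  Purely illustrative.
  static std::atomic<uint32_t> helperThreadBudget{8};

  void SetHelperThreadBudget(uint32_t n) {
    helperThreadBudget.store(std::max<uint32_t>(n, 1), std::memory_order_relaxed);
  }

  uint32_t HelperThreadBudget() {
    return helperThreadBudget.load(std::memory_order_relaxed);
  }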

It's not obvious how (or how much) to fix this, but more and more it looks like it needs a general scheduler solution.  There's no indication that we have really awful performance or anything like that, but it seems like a system that has grown slightly out of control.
A few notes from a conversation I had with Luke, plus thoughts around that.

It's tempting to make use of the OS scheduler for our own scheduling decisions by having more threads and then manipulating thread priority as appropriate to express task priority.  For example, we can run wasm tier-2 compilation on low-priority threads.  But to do so we need to be sure the wasm compilation does not use threads out of our fixed thread pool in a way that prevents other higher-priority work from being done.  So we'll likely end up adding more threads, and/or having a separate pool of threads for low-priority work.
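A minimal sketch of the separate low-priority pool (POSIX/Linux-specific and purely illustrative; real code would go through the platform thread abstraction and handle errors):

  #ifndef _GNU_SOURCE
  #define _GNU_SOURCE 1  // for SCHED_IDLE on glibc
  #endif
  #include <pthread.h>
  #include <sched.h>
  #include <cstddef>

  // Placeholder worker loop: the real one would pull background tasks (wasm
  // tier-2 compiles, etc.) off a dedicated low-priority queue.
  void* backgroundWorkerMain(void*) { return nullptr; }

  // Spawn a pool of threads dedicated to background work and ask the OS to
  // schedule them only when nothing else wants the CPU.
  void spawnLowPriorityPool(size_t nThreads) {
    for (size_t i = 0; i < nThreads; i++) {
      pthread_t thread;
      if (pthread_create(&thread, nullptr, backgroundWorkerMain, nullptr) != 0)
        continue;
      sched_param param = {};
      param.sched_priority = 0;                           // required by SCHED_IDLE
      pthread_setschedparam(thread, SCHED_IDLE, &param);  // best effort
      pthread_detach(thread);
    }
  }

The point of the separate pool is that these threads never count against the budget the normal-priority consumers plan around.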

There's possibly a distinction between low-priority work that can't be left to starve (compression?) and low-priority work that can starve, at least for a while (wasm tier-2 compilation; ion freeing?).  So perhaps we have several priority levels, cut along these lines.
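Spelled out, that might be no more than this (the classification of specific task kinds is a guess):

  // Sketch of the priority split: the interesting bit is the distinction
  // between low-priority work that must still make progress and low-priority
  // work that may wait indefinitely.
  enum class HelperPriority {
    Normal,        // e.g. ion compilation, parsing, GC parallel work
    LowNoStarve,   // low priority, but must make progress (compression?)
    LowMayStarve,  // low priority, may wait indefinitely (wasm tier-2? ion freeing?)
  };

  bool mayStarve(HelperPriority p) { return p == HelperPriority::LowMayStarve; }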

For some tasks we can perhaps try to determine the level of resourcing we need; e.g., for a particular tier-2 compilation job perhaps we don't want to try to allocate all the low-priority threads, but only some of them (so as to allow other work to happen too).  But it's not obvious that this is any better than having a notion of job groups at a given priority level and making sure that the job groups are fairly scheduled by interleaving the tasks of the groups among the threads at that priority level.
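A sketch of the job-group idea (illustrative names; the real thing would need locking around the queues):

  #include <cstddef>
  #include <deque>
  #include <functional>
  #include <utility>
  #include <vector>

  // One group per job (say, one tier-2 compilation); idle threads at this
  // priority level take tasks from the groups round-robin so that one large
  // job cannot monopolize all the threads at the level.
  using Task = std::function<void()>;

  struct JobGroup {
    std::deque<Task> tasks;
  };

  struct FairLevel {
    std::vector<JobGroup> groups;
    size_t next = 0;

    // Fill *out with the next task, rotating among groups; false if all empty.
    bool take(Task* out) {
      for (size_t tried = 0; tried < groups.size(); tried++) {
        JobGroup& g = groups[(next + tried) % groups.size()];
        if (!g.tasks.empty()) {
          *out = std::move(g.tasks.front());
          g.tasks.pop_front();
          next = (next + tried + 1) % groups.size();
          return true;
        }
      }
      return false;
    }
  };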
Blocks: 1391197
Priority: -- → P2
There's also the balance between the size of the jobs and the number of jobs.  Right now we're using a static work division based on measurements Benjamin made of the throughput of both compilers.  But the work division for tier-2 ion might be different, because we may have a different metric for what constitutes good resource use; it may be that low-priority threads should have higher availability, for example, which would lead us to use smaller work increments for background compilation.
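For concreteness, priority-dependent work division could be as simple as this (the numbers are made up; real values would come from throughput measurements like Benjamin's):

  #include <cstddef>

  // Sketch: batch size (functions per compile task) chosen per compiler and
  // per priority; background work takes smaller increments so a thread frees
  // up sooner after finishing a batch.
  enum class CompileMode { Tier1Baseline, Tier2Ion };

  size_t functionsPerTask(CompileMode mode, bool backgroundPriority) {
    size_t base = (mode == CompileMode::Tier1Baseline) ? 16 : 4;
    return backgroundPriority && base > 1 ? base / 2 : base;
  }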
Blocks: 1435997
Severity: normal → S3