There are three desiderata for changing the way Workers are scheduled:

(1) Prevent workers from starving each other
(2) Prevent sync XHR from tying up an OS thread / scheduling slot
(3) With (1) and (2), schedule workers fairly and to meet quantitative efficiency goals: maximum computational throughput and minimum latency in responding to IO events

I think all these goals can be met by implementing a moderately simple userspace "Worker threads" scheme, built on top of NSPR/XPCOM threads. Each Worker would be assigned an OS thread. There would be a fairly high limit on total OS threads allocated, say 256; exceeding that limit would throw an exception on creating a worker. (Probably want a per-domain limit too.) We would choose a much smaller "optimal" number of workers to execute concurrently, call this K.

This library would be built on the following basic mechanisms:

* scheduling OS threads with mutex/condvars: when a thread is scheduled, its condvar is signaled. When it's pre-empted, it waits on the condvar until being scheduled again.
* pre-emption of Worker JS with JS_TriggerOperationCallback()
* Worker JS XHR code "yield()"ing when it would block
* interval timer on the main thread simulating a HW timer interrupt

With these mechanisms, I think a reasonable scheduling scheme is:

* limit the number of scheduled OS threads to K
* maintain a "runnable" FIFO queue; keep workers from the same page/domain adjacent in the queue for better data locality when executing them
* every T_c ms, where T_c is the "time slice" allotted to computational Workers, fire a timer event on the main thread, schedule up to K workers, and pre-empt descheduled workers. To be clear, this means that if there are < K runnable workers, then the workers can run indefinitely without being pre-empted, so as to avoid "context switch" overhead
* give Worker threads one of two "personalities" --- INTERACTIVE or COMPUTATIONAL. Workers become INTERACTIVE when they send an XHR, and revert to COMPUTATIONAL when pre-empted. INTERACTIVE workers are given a time slice T_i, with T_i < T_c
* reserve 1 scheduling slot for INTERACTIVE workers, so that perceived responsiveness is good wrt sync XHR responses. The slot is "reserved" in the sense that the worker running in the slot (if any) will be immediately pre-empted when a sync XHR response arrives, and the Worker receiving the response will be scheduled in its place

T_c and T_i should be set based on profiling/experience, but I think reasonable initial values are 1000 ms and 100 ms, respectively.

AFAICT, this should only improve on the current implementation. One potential regression is in memory usage, as we would increase the number of OS threads allocated to workers.
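The timer-driven part of this scheme can be modelled in a few lines. The following is only a rough sketch in Python, not NSPR/XPCOM code: the `Scheduler` class and its method names are hypothetical, workers are plain labels, and the INTERACTIVE/COMPUTATIONAL personalities and the reserved slot are left out. It shows K slots, a FIFO runnable queue, and a `tick()` standing in for the T_c timer event, pre-empting only when some worker is actually waiting.

```python
from collections import deque


class Scheduler:
    """Toy model: K scheduling slots fed from a FIFO runnable queue."""

    def __init__(self, k):
        self.k = k                # max workers scheduled at once (K)
        self.runnable = deque()   # workers waiting for a slot, FIFO order
        self.running = []         # workers holding a slot, oldest first

    def add(self, worker):
        self.runnable.append(worker)

    def tick(self):
        """Stand-in for the timer event fired every T_c ms."""
        free = self.k - len(self.running)
        # Pre-empt only as many running workers as there are waiters
        # that cannot be served from free slots.
        n = min(len(self.running), max(0, len(self.runnable) - free))
        preempted, self.running = self.running[:n], self.running[n:]
        # Hand out slots from the front of the queue.
        while len(self.running) < self.k and self.runnable:
            self.running.append(self.runnable.popleft())
        # Pre-empted workers rejoin at the back of the queue.
        self.runnable.extend(preempted)
        return list(self.running)
```

With K = 2 and three workers A, B, C, successive ticks run A and B, then B and C, then C and A, so no worker starves. With a single worker, every tick leaves it running, which matches the "no pre-emption when there is nothing to switch to" rule above.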
(In reply to comment #0)
> * pre-emption of Worker JS with JS_TriggerOperationCallback()

The SM guys tell me that the latency of TriggerOperationCallback() is "next opcode" in the interpreter, and "next branch" in JITed code, so using this as a pre-emption mechanism should be quite precise. Its only vulnerability is extremely long basic blocks in loopy, JITed code, but I don't think this is worth worrying about.
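To make the "pre-emption takes effect at the next checkpoint" idea concrete, here is a small Python model of the mutex/condvar mechanism. It is a sketch under assumptions, not SpiderMonkey or NSPR code: `WorkerGate`, `checkpoint()` and `run_worker()` are hypothetical names, and `checkpoint()` stands in for the point where the operation callback would run (next opcode or next branch).

```python
import threading


class WorkerGate:
    """Per-worker gate: the scheduler flips it, the worker parks on it."""

    def __init__(self):
        self._cond = threading.Condition()
        self._scheduled = False   # workers start out descheduled

    def schedule(self):
        # Scheduler side: give the worker a slot and wake it up.
        with self._cond:
            self._scheduled = True
            self._cond.notify_all()

    def preempt(self):
        # Scheduler side: only records the request; the worker stops
        # at its next checkpoint, not immediately.
        with self._cond:
            self._scheduled = False

    def checkpoint(self):
        # Worker side: block here for as long as we are descheduled.
        with self._cond:
            while not self._scheduled:
                self._cond.wait()


def run_worker(gate, steps, log):
    for i in range(steps):
        gate.checkpoint()   # pre-emption can only take effect here
        log.append(i)       # one unit of "JS" work between checkpoints
```

The pre-emption latency in this model is the time between two `checkpoint()` calls, which is why a long stretch of work with no checkpoint delays the switch. A real implementation would presumably test a cheap flag rather than take a lock at every step.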
(In reply to comment #1)
> (In reply to comment #0)
> > * pre-emption of Worker JS with JS_TriggerOperationCallback()
>
> The SM guys tell me that the latency of TriggerOperationCallback() is "next
> opcode" in the interpreter, and "next branch" in JITed code, so using this as a
> pre-emption mechanism should be quite precise. Its only vulnerability is
> extremely long basic blocks in loopy, JITed code, but I don't think this is
> worth worrying about.

Hmm ... IIUC, this would also be vulnerable to calls into native code that could take a long time, like String.match(/* really long string */). But these probably aren't worth worrying about either.
(In reply to comment #0)
> * give Worker threads one of two "personalities" --- INTERACTIVE or
> COMPUTATIONAL. Workers become INTERACTIVE when they send an XHR, and revert to
> COMPUTATIONAL when pre-empted. INTERACTIVE workers are given a time slice T_i,
> with T_i < T_c

An interesting follow-up project would be to add VISIBLE and HIDDEN attributes, where Workers in the currently focused page are VISIBLE and all others are HIDDEN. Then VISIBLE workers could be given higher priority than HIDDEN workers, and scheduling choices could be made with a lottery, e.g.
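One way such a lottery could look, sketched in Python. The ticket counts (4 for VISIBLE, 1 for HIDDEN) and the names `TICKETS` and `pick_worker` are assumptions made up for illustration; nothing in this bug specifies them.

```python
import random

# Hypothetical ticket counts: a VISIBLE worker is four times as likely
# to win a given draw as a HIDDEN one.
TICKETS = {"VISIBLE": 4, "HIDDEN": 1}


def pick_worker(workers, rng=random):
    """Pick the next worker to schedule by weighted lottery.

    workers: list of (name, visibility) pairs, with visibility being
    "VISIBLE" or "HIDDEN". Returns the name of the winner.
    """
    names = [name for name, _ in workers]
    weights = [TICKETS[visibility] for _, visibility in workers]
    return rng.choices(names, weights=weights, k=1)[0]
```

Because every worker holds at least one ticket, HIDDEN workers still win some draws, so they get lower priority without being starved outright.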
(In reply to comment #0)
> There would be a fairly high limit on
> total OS threads allocated, say 256; exceeding that limit would throw an
> exception on creating a worker. (Probably want a per-domain limit too.)

I'm beginning to feel uneasy about this part of the proposal. It could be seen as a regression from the current implementation, which can deal with a very large number of workers (though they might be infinitely starved). I would prefer to treat OS thread allocation as we would when trying to allocate huge arrays --- obey the requests until the platform APIs error out. There would still be somewhat of a regression in that we couldn't allocate as many workers, since each becomes more expensive, but at least we wouldn't be imposing an arbitrary limit. (Though we would still respect an explicit limit set by a user pref.)

Put another way, this proposal would trade more memory per Worker for fair scheduling. Thoughts?
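The "obey requests until the platform errors out" policy could be sketched as follows, again in Python as a model only. `spawn_worker`, `WorkerLimitError`, `live_count` and `pref_limit` are hypothetical names; the one platform fact relied on is that CPython's `Thread.start()` raises `RuntimeError` when the OS refuses to create a thread.

```python
import threading


class WorkerLimitError(Exception):
    """Raised when a worker cannot be given its own OS thread."""


def spawn_worker(target, live_count, pref_limit=None,
                 thread_factory=threading.Thread):
    # Respect an explicit user pref if one is set; otherwise impose
    # no limit of our own.
    if pref_limit is not None and live_count >= pref_limit:
        raise WorkerLimitError("user pref limit reached")
    thread = thread_factory(target=target, daemon=True)
    try:
        thread.start()
    except RuntimeError as exc:
        # The platform itself refused: surface that as the worker
        # creation failure instead of enforcing an arbitrary cap.
        raise WorkerLimitError("platform could not allocate a thread") from exc
    return thread
```

The `thread_factory` parameter exists only so the failure path can be exercised without actually exhausting OS threads.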
Bug 649537 gives each worker its own thread, up to 20 per domain by default. Beyond 20 the workers are queued (we still create the JS object but don't actually start the worker). Can this bug be resolved?
There are some things I'd like to try here beyond OS-thread-per-worker, but TBH it's not going to happen anytime soon. If closing this would help you somehow, please do ;).
There are many other worker improvements we should do before this, and I don't think having this bug around is helping anyone.