Closed Bug 1509283 Opened 6 years ago Closed 6 years ago

When compiling in parallel, compile the last compileTask locally

Categories

(Core :: JavaScript: WebAssembly, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: bbouvier, Assigned: bbouvier)

Details

Attachments

(1 file)

(This came from a discussion with Nical on parallel work, work-stealing and the Rayon crate.)

When the master thread is done launching the compilation batches on the shared task queue, it's just waiting for other threads to finish their work. This is bad, especially when doing baseline compilation, because the parallelization overhead might be high in this case, and the master thread might get descheduled. If we compile the last task on the master thread, then with some chance it'll be the last one to finish compilation, and won't have to wait for helper threads to finish, which means no descheduling either.

(Of course, in an ideal system, we'd have something like the master thread pushing tasks on a lock-free double-ended queue, and helper threads stealing tasks from this queue, and everybody would be working until the queue is empty. This would be a bigger project.)

The simple change consisting in moving the last task compilation on the master thread causes up to a 5% speedup (~10ms) on baseline compilation of angry bots on my machine. Benchmarking with taskset on an increasing number of CPUs from 1 to 8 shows that this is more effective the more CPUs we have, but it's always a win.
Attachment #9026923 - Flags: review?(luke)
Comment on attachment 9026923 [details] [diff] [review]
locallycompilecurrenttask.patch

Review of attachment 9026923 [details] [diff] [review]:
-----------------------------------------------------------------

It's subtle, but I think this isn't quite what you want for doing work-stealing since finishFuncDefs() isn't blocking; it's issuing work that can proceed in parallel to the remaining generator-thread work that occurs between finishFuncDefs() and the actual blocking call in finishCodeTier().  Really, to do work-stealing, you want to change finishOutstandingTask() to, instead of wait()ing, first check if there are any pending CompileTasks in the queue and, if there are, work on those instead.  This would be a fantastic change for two reasons:
 - it would also avoid the blocking that happens during the middle of compilation (in MG::compileFuncDef())
 - it would categorically avoid the current dining-philosopher-deadlock hazard and thus the need for the hacky 'isMaster' workaround in GlobalHelperThreadState::checkTaskThreadLimit()

The only problem is that I think Lars tried this a while back (for the same motivation) and there was a problem.  I forget the exact details, but the problem was that we needed to change GlobalHelperThreadState::currentTask dynamically and this broke some assumption that currentTask didn't change intra-task.  ISTR the need to change currentTask had to do with the asm.js ParseTask corner case which give me hope that we can solve it either by deoptimizing asm.js or maybe things are better now than before.  Do you remember the details Lars?  I bet it's in a bug comment somewhere...
Attachment #9026923 - Flags: review?(luke)
Attachment #9026923 - Flags: review-
Attachment #9026923 - Flags: feedback?(lhansen)
(In reply to Luke Wagner [:luke] from comment #1)
> Comment on attachment 9026923 [details] [diff] [review]
> locallycompilecurrenttask.patch
> 
> Review of attachment 9026923 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> It's subtle, but I think this isn't quite what you want for doing
> work-stealing since finishFuncDefs() isn't blocking; it's issuing work that
> can proceed in parallel to the remaining generator-thread work that occurs
> between finishFuncDefs() and the actual blocking call in finishCodeTier(). 

Note that I wasn't pretending to implement a full-fledged work stealing solution here, but just wanted to highlight that there's an easy win in having the last batch of work being done on the main thread, since it ends up blocking eventually in finishCodeTier() anyway. It was proven to be a small win in some cases (esp. for the baseline compiler), but it's small enough that it doesn't matter much. If you think this isn't going anywhere, we can just close this bug.
Comment on attachment 9026923 [details] [diff] [review]
locallycompilecurrenttask.patch

Review of attachment 9026923 [details] [diff] [review]:
-----------------------------------------------------------------

This actually feels safe to me, though see below.  (Clearly there's a tiny semantic change here in that we do not check cancellation before compiling the last task and I think that should probably be changed back, and the resetting of the state variables should probably not be duplicated this way, but I don't think that affects correctness.)  I think the only way this could make trouble is if there's an assumption somewhere that if parallel_ is true then all compilation work is done on worker threads, ie, if there's an assumption that the main thread is available to do something on behalf of the worker threads while the worker threads are compiling.

All that said, it certainly sounds familiar that we tried this but didn't do it because there was something that prevented us from doing it :(  The "master task" was always a pain.  And that was definitely brought on by asm.js.  And currentTask_ is certainly used as a flag in some contexts.

I'm going to just clear f? here because I'm not actually sure.  This seems like a nice easy win, if correct...
Attachment #9026923 - Flags: feedback?(lhansen)
(In reply to Benjamin Bouvier [:bbouvier] from comment #2)
Yeah, I was questioning whether it was actually a win, but I suppose that if we compare (1) the parallelism achieved between firing off that final batch and processing the module "tail", (2) the reduction in thread-dispatch-overhead from not having to fire off a parallel task (for a non-full batch), I can see how (1) is usually going to be smaller.  And I suppose in the rather-small module case, it could avoid any thread dispatch entirely.

I was actually hoping to nerd-snipe you into doing full work-stealing, but I suppose we'll just continue waiting until bug 1435997.
Attachment #9026923 - Flags: review- → review+
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4201f7161e7a
Compile the last wasm function batch on the main thread; r=luke
https://hg.mozilla.org/mozilla-central/rev/4201f7161e7a
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65
Assignee: nobody → bbouvier
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: