IonMonkey: Compile in chunks/parts on the background thread

RESOLVED FIXED in mozilla33

Status

()

defect
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: h4writer, Assigned: h4writer)

Tracking

(Blocks 1 bug)

unspecified
mozilla33
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 6 obsolete attachments)

We decided to only compile one IonBuilder at a time for a few reasons. Like battery consumption and to not take all cpu power away from the benchmark. Though there is an issue with this. A big script will stall the background compiler until it is finished. This is an issues on octane-mandreel and asmjs tests.

As a solution I propose a possibility to pause an ion compilation and continue another ionbuilder. That way we can interleave compilation, even when only having one compiler busy at a time.

This would help a lot of issues I'm seeing happening:
1) Would make it more feasible to enable backtracking by default (using optimization levels)
2) Fix the regression of bug 911738 in octane-mandreel
3) Make it possible to remove our script limit for ionmonkey, which would help bug 1010417
4) It will decrease the bimodal of our scores due to bad timed compilation and stalls (I hope this will fix the windows 8 octane-mandreel bad performance)
Posted patch Very WIP (obsolete) — Splinter Review
This is just my playing patch I used to test if it fixes bug 911738.
It brings the score back from 25000 to 26400 on my computer.
27000 is the original score before without any patch. So already much better. There are still some rough edges that will improve performance a little bit.
Assignee: nobody → hv1989
Blocks: 911738
I wonder whether it is really so important to only use one compilation thread:
 - I assume that the bursts of Ion compilation that would actually use more than one core would be relatively rare and not enough to significantly impact battery life.  IIUC, we have power-measurement hardware/people that could actually measure this nowadays.
 - If we limited the number of concurrent Ion compilations to the number of *cores* instead of HW threads, then CPU shouldn't be stolen away from the main thread.  There would be some extra contention for memory access, but it's not clear how this balances faster main thread execution (when the Ion compilations return sooner).  Have there been any measurements of this?
(In reply to Luke Wagner [:luke] from comment #2)
> I wonder whether it is really so important to only use one compilation
> thread:

We can still do that. This patch doesn't make it impossible to do that. If we are committed in doing that, we should open a bug for it...

Now there are some extra reasons why having a way to pause/proceed a compilation would help. Even if we decide to just have more compilation workers.
1) If we have only have 1-2 cores it will still be rather easy to stall the compilation thread. Only 1-2 big tasks will already stall it. With this solution that isn't the case. It removes the compilation stalls, without depending on another factor that can differ between different computers.
2) Possibility to switch tasks due to priority. E.g. with compilation levels I want to start a higher tier compilation. But when a lower level tier gets scheduled to compile, we should "pause" the higher tier and start the lower tier immediately. (Instead of waiting till at least one cpu is free again)

This patch is rather small and not really complex.
Every IonBuilder has two extra functions. One to request it to "pause" and another to request it to "proceed".
We have a pool of threads (5 in this case) where we schedule the ionbuilders. An extra thread oversees this pool and let every thread work for a small bit, whereafter it pauses the thread and starts the next thread.
Currently I settled for 1ms. Most compilations are done during that time.
Sounds good
Blocks: 1010417
Posted patch WIP (obsolete) — Splinter Review
I would like to have some feedback on this patch, since you almost own HelperThreads.cpp.

Goal:
To be able to pause tasks on the background thread, allowing us to switch tasks during execution in order to prevent stalling the threads with big tasks. (e.g. for now when a big ionscript is scheduled all other ionbuilders need to wait till it is finished. Or maybe in the future if we decide to allow running ionbuilder concurrently to not take all threads, so other tasks cannot get finished).

I already listed the bugs that would be able to preceed. E.g. possibly enable backtracking and also a big step for optimization levels. This should make it possible to schedule scripts better. (lower optimization levels shouldn't get blocked by higher optimization level, esp since those take longer).

Implementation:
A thread can mark itself temporary "pausable". Which currently only IonBuilder does.
When there are "pausable" tasks a timer gets enabled that will switch active/paused threads every 1ms. Currently this is just a rotation, but it is possible to add priority here. (usecount/optimization level/sort task). Currently it is only allowed to have 5 paused tasks. In order to not have a big pool of paused threads that don't get finished.

The patch itself can still use some cleanup, but mostly works. The 1ms is choosen so almost all tasks are finished during that period. Only the big scripts will go over it. The maximum of 5 paused tasks is also arbitrarily chosen and in a optmized build isn't exhausted. (debug builds have a lot of debug checks and take much longer and go to this limit.). I think it might be lower, like 3 or something.
Attachment #8425379 - Attachment is obsolete: true
Attachment #8439143 - Flags: feedback?(bhackett1024)
Comment on attachment 8439143 [details] [diff] [review]
WIP

Review of attachment 8439143 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think this is the right approach.  I'd really like to keep the helper threads as simple as possible, to make it easier to give them new kinds of jobs in the future, keep the code easy to read, and avoid bugs.  Additionally, pausing threads makes it harder to fully utilize all cores (since the number of helper threads is tied to the number of cores) and to reason about the possibility of deadlocking when there aren't enough threads available.

An alternative I think would be both more effective and a lot simpler would be to do stopping/resuming of compilations within the Ion compilation backend itself.  The compilation thread can periodically check whether there is a shorter and/or higher priority compilation it could be doing instead (ideally this would be done every 1ms I guess, since doing this check will require locking).  If there is another job for it, it just marks where it is in the compilation, sticks the IonBuilder back on the HelperThreadState worklist, then returns and gets another job to do.  When that thread, or any other, gets back to the interrupted compilation it just resumes where it left off.

The main new complexity here is we'd need a way to mark the point a compilation is at, and be able to resume there later.  If we only marked/resumed between compilation passes this would be easy, but that's probably insufficient for long passes like regalloc.  Most of the passes, including regalloc, are either rpo traversals or worklist based, so it should be easy to mark where in the pass we are (remember the block we're at for rpo, or remember the worklist), and to resume we'd need to make sure the pass state is on the heap rather than the stack.

Additionally, I don't think there's any particularly good reason why we only let one thread do background Ion compilation at a time.  We could increase that but should still make sure we don't have so many compilations active that they saturate all the cores (i.e. so the main thread can always make progress without having to fight over a core).  It seems like if we permitted multiple compilation threads but only allowed one at a time to compile big scripts, we could get most of the benefits of this patch for machines with > 2 cores.
Attachment #8439143 - Flags: feedback?(bhackett1024) → feedback-
Since it is quite related to what I'm doing and not that hard. I've split the worklist into a small/large worktask. This should already ameliorate the sitation a bit.

I decided on 120 blocks to be a nice cut-off. Most tasks are completed in less than 1ms. Also most human written code is smaller than 120 blocks. So the worktask is mostly doing largish asmjsish tasks. Which can take a long time.
Attachment #8439143 - Attachment is obsolete: true
Attachment #8444483 - Flags: review?(bhackett1024)
Comment on attachment 8444483 [details] [diff] [review]
Have two types of ion compilation queues. (big/small tasks)

Review of attachment 8444483 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, though I think it would be cleaner, more explicit and more flexible vis a vis future tuning if we kept the single worklist and only used GetIonBuilderType in canStartIonCompile (to allow a second thread to compile a small builder if a big builder is already in progress) and handleIonWorkload (to prioritize compiling small builders over big builders).

::: js/src/vm/HelperThreads.cpp
@@ +538,5 @@
>      JS_ASSERT(isLocked());
>      return !asmJSWorklist().empty() && !numAsmJSFailedJobs;
>  }
>  
> +#define SMALL_IONBUILDER 120

static const size_t
Attachment #8444483 - Flags: review?(bhackett1024)
1) Can you try using priorities on threads

I tried, but there are only 3 priority levels available. Also the documentation says it can just ignore the value, without giving which platforms it doesn't honor it. It is a bit a best effort thing, but can and might be discarded.
So I think it would be distastrous to depend on this. It might even cause new discrepancies between platforms that are hard to fix.

2) What about saving the state in IonBuilder itself. So that we can just stop an ionbuilder. And use that same thread to start another ionbuilder. When ready, just take the old ionbuilder again and it will proceed where it left off?

I'm personally not in favor of this. It might be very easy to do when only doing it between optimization passes. But some passes really take a long time. E.g. regalloc/gvn/... It looks really ugly to do it that way. I talked with Jan about it and he is also not a real fan of this.

3) What did I do?

I decreased the scope a bit. (I took away the timed switching of threads. Which might have helped too, but is not the main thing I need for the blocked bugs).

The starting/stopping of an IonBuilder is confined in the canStartIonXXX and handleIonWorkload now. It has 5 extra threads for disponal. When a task arrives with a higher priority the thread is put to sleep and a new task is started on one of the extra threads. At all times only CpuCount threads are active. All other threads are either disabled or sleeping.
Attachment #8444519 - Flags: review?(bhackett1024)
Comment on attachment 8444519 [details] [diff] [review]
Enable delaying an ion compilation task

Review of attachment 8444519 [details] [diff] [review]:
-----------------------------------------------------------------

Per IRC discussion, this is better but I think it is still too complicated.  I don't think the DISABLED mechanism is necessary since it seems to exist to keep too many threads from being active at once, but the canStart methods can just as easily control this.  Also, the new lock and cvar in MIRGenerator seems unnecessary and it isn't clear what it's relation to the HelperThreadState locks are.

I'll attach a patch in a minute which should allow compilation threads to pause each other with a pretty simple synchronization mechanism that fits in well with the existing code.  A new cvar is added to HelperThreadState which is waited on by threads which have been told to pause.  When a new compilation job is added it can indicate to an active thread that it needs to pause, and thereby restrict the number of unpaused compilation threads at a time.  This patch permits MaximumIonCompilationThreads() compilation threads at a time, where the builders that are being actively compiled are the ones with the highest priority as determined by IonBuilderHasHigherPriority() (which has a lot of freedom in its behavior, i.e. the relative priorities of builders can shift over time as use counts change).
Attachment #8444519 - Flags: review?(bhackett1024)
Posted patch pause cvar patch (obsolete) — Splinter Review
Comment on attachment 8444733 [details] [diff] [review]
pause cvar patch

Review of attachment 8444733 [details] [diff] [review]:
-----------------------------------------------------------------

I had a quick look and now I understand what you mean by having the pauseLock in HelperThreads.cpp. I agree this is better.

But I'm kinda terrified/surprised of the idea of removing the "disabled threads" part.
I'm not entirely convinced we can just leave that part out.
The wisdom behind it is that having X running threads == CPU count cores, is most optimimal.
I mostly think this is because the threads eventually will be able to saturate one CPU core
and will not have to switch between cores, which might happen if we have more active running cores.

Also given the current code. There is not much different with having the "disabled threads" added.
It only requires adding another lock,
and the following rule:
- waking a thread, means you need to disable/sleep another.
- putting a thread to bed, means you need to awake a disable thread.
(In reply to Hannes Verschore [:h4writer] from comment #12)
> But I'm kinda terrified/surprised of the idea of removing the "disabled
> threads" part.
> I'm not entirely convinced we can just leave that part out.
> The wisdom behind it is that having X running threads == CPU count cores, is
> most optimimal.
> I mostly think this is because the threads eventually will be able to
> saturate one CPU core
> and will not have to switch between cores, which might happen if we have
> more active running cores.

Or another solution would be to have all "canStartXXX" count how many active tasks there are and report false in case there are too many. Which would do exactly the same, but no need for an additional lock?
(In reply to Hannes Verschore [:h4writer] from comment #12)
> But I'm kinda terrified/surprised of the idea of removing the "disabled
> threads" part.
> I'm not entirely convinced we can just leave that part out.
> The wisdom behind it is that having X running threads == CPU count cores, is
> most optimimal.
> I mostly think this is because the threads eventually will be able to
> saturate one CPU core
> and will not have to switch between cores, which might happen if we have
> more active running cores.

Yes, but there are considerations here besides the optimal use of the machine's cores.  We also want to make sure that different parts of the system aren't being starved; this bug is basically about avoiding starvation specifically for Ion compilation, without using more cores than are desired.  But other users can be starved by the helpers, both in the current state of things and with your patch.

For example, currently, if we do an off thread asm.js parse it will saturate all remaining helper threads with asm.js compilation jobs, while the main thread can still browse normally.  If the main thread triggers an off thread compression then that compression task will not end up running until the asm.js parse has completely finished, and the main thread might end up being blocked waiting for that compression task to run.  The patch I attached avoids this by having asm.js jobs only use up to cpuCount threads, and if a compression or parse task or whatever is triggered it will use one of the remaining helper threads, overutilizing the threads but avoiding starvation.  I think this is a worthwhile trade/off.
(In reply to Brian Hackett (:bhackett) from comment #14)
> Yes, but there are considerations here besides the optimal use of the
> machine's cores.  We also want to make sure that different parts of the
> system aren't being starved; this bug is basically about avoiding starvation
> specifically for Ion compilation, without using more cores than are desired.
> But other users can be starved by the helpers, both in the current state of
> things and with your patch.

Makes sense. I'm ok with this patch. It feels like doing most things I wanted to accomplish too. How do we proceed? Do you make it review ready? Should I take the patch and finish it? (In that case I assume I should request review to e.g. jandem, since you actually made the patch and I will only by polishing it...)
(In reply to Brian Hackett (:bhackett) from comment #14)
> For example, currently, if we do an off thread asm.js parse it will saturate
> all remaining helper threads with asm.js compilation jobs, while the main
> thread can still browse normally.  If the main thread triggers an off thread
> compression then that compression task will not end up running until the
> asm.js parse has completely finished, and the main thread might end up being
> blocked waiting for that compression task to run.

IIUC, the only reason that can happen is because asm.js is handled first in the if/elseif chain in HelperThread::threadLoop.  Is that right or is there some other hard block?  Assuming it's just the if/elseif: could this be fixed more generally by rotating what job queue gets serviced first.  It seems like this would be useful in the future as we do more with helper tasks and introduce more opportunities for starvation.
(In reply to Hannes Verschore [:h4writer] from comment #15)
> Makes sense. I'm ok with this patch. It feels like doing most things I
> wanted to accomplish too. How do we proceed? Do you make it review ready?
> Should I take the patch and finish it? (In that case I assume I should
> request review to e.g. jandem, since you actually made the patch and I will
> only by polishing it...)

Yeah, can you finish the patch up and ask jandem or billm for review?  I tested it locally just by running it through jit-tests --ion-eager but didn't do any performance testing or anything (plus the IonBuilder priority comparison and maximum allowable compilation threads functions are just placeholders).
(In reply to Luke Wagner [:luke] from comment #16)
> (In reply to Brian Hackett (:bhackett) from comment #14)
> > For example, currently, if we do an off thread asm.js parse it will saturate
> > all remaining helper threads with asm.js compilation jobs, while the main
> > thread can still browse normally.  If the main thread triggers an off thread
> > compression then that compression task will not end up running until the
> > asm.js parse has completely finished, and the main thread might end up being
> > blocked waiting for that compression task to run.
> 
> IIUC, the only reason that can happen is because asm.js is handled first in
> the if/elseif chain in HelperThread::threadLoop.  Is that right or is there
> some other hard block?  Assuming it's just the if/elseif: could this be
> fixed more generally by rotating what job queue gets serviced first.  It
> seems like this would be useful in the future as we do more with helper
> tasks and introduce more opportunities for starvation.

Changing the if/else chain would address a lot of this problem but wouldn't be a complete fix.  If a compression or other task comes in while all the helper threads are busy with long running asm.js or Ion compilations then that new task will need to wait until one of the existing tasks finishes.

I think the patch in this bug will do a good job addressing the starvation issue by trying to avoid saturating all the helper threads.  Right now the only way we are likely to have many jobs running at once is through asm.js compilation, though after this bug Ion compilation could cause the behavior too.  There will only ever be one GC helper task per runtime, and only one compression task per parsing thread.  Off thread parsing could I guess lead to a lot of tasks running at once, though I don't know if that would happen in practice due to the speed of parsing vs. the latency required to initially fetch the parsed script from the network / disk.
(In reply to Brian Hackett (:bhackett)
asm.js parallel compilation throttles itself by maintaining an upper bound on outstanding compilation jobs, so you could *almost* fix the problem (without even doing what I suggested in comment 16) just by changing the current limit (numParallelJobs in CheckFunctionsParallel) from (threadCount + 1) to (Min(cpuCount, threadCount)).  (Regardless, numParallel should be updated to reflect the changes in this patch.)  Technically, this only limits the number of asm.js compilation jobs from a *single* asm.js module compilation; there could be multiple async asm.js compilations in progress which could clog all the threads, but this seems at least as rare as the other cases you mentioned.
This is more or less the patch Brian gave. I only adjusted the priority function.
Attachment #8444483 - Attachment is obsolete: true
Attachment #8444519 - Attachment is obsolete: true
Attachment #8444733 - Attachment is obsolete: true
Attachment #8447178 - Flags: review?(jdemooij)
(In reply to Hannes Verschore [:h4writer] from comment #20)
Again, from comment 19, I don't think we need to change canStartAsmJSCompile() if we change numParallelJobs in AsmJS.cpp (which we should do anyway as it will create too many threads with this patch).
Updated with comments of luke.
Attachment #8447178 - Attachment is obsolete: true
Attachment #8447178 - Flags: review?(jdemooij)
Attachment #8447286 - Flags: review?(jdemooij)
Comment on attachment 8447286 [details] [diff] [review]
Make it possible to pause ion tasks

Review of attachment 8447286 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with nits below addressed. Also requesting review from Luke as this touches asm.js code.

::: js/src/jit/MIRGenerator.h
@@ +166,1 @@
>      mozilla::Atomic<bool, mozilla::Relaxed> cancelBuild_;

Nit: MIRGraph.cpp has:

 cancelBuild_(false),
 pauseBuild_(nullptr),

I'm pretty sure some compilers will warn about this; just add pauseBuild after cancelBuild.

::: js/src/vm/HelperThreads.cpp
@@ +575,5 @@
> +    if (first->script()->hasIonScript() != second->script()->hasIonScript())
> +        return !first->script()->hasIonScript();
> +
> +    // Lastly depend on the useCount.
> +    return first->script()->getUseCount() > second->script()->getUseCount();

Nit: // A higher useCount indicates a higher priority.

@@ +586,5 @@
> +}
> +
> +jit::IonBuilder *
> +GlobalHelperThreadState::highestPriorityPendingIonCompile(bool remove /* = false */)
> +{

JS_ASSERT(isLocked()); here.

@@ +606,5 @@
> +}
> +
> +HelperThread *
> +GlobalHelperThreadState::lowestPriorityUnpausedIonCompileAtThreshold()
> +{

And here.

@@ +626,5 @@
> +}
> +
> +HelperThread *
> +GlobalHelperThreadState::highestPriorityPausedIonCompile()
> +{

And here.

@@ +1334,5 @@
>  
> +void
> +js::PauseCurrentHelperThread()
> +{
> +}

Nit: add a MOZ_ASSUME_UNREACHABLE or MOZ_CRASH here, like the previous method; AFAICS it's only called in threadsafe builds (please double-check this).

::: js/src/vm/HelperThreads.h
@@ +85,5 @@
>      // Runtimes which have sweeping / allocating work to do.
>      GCHelperStateVector gcHelperWorklist_;
>  
>    public:
> +    size_t maximumIonCompilationThreads() {

Nit: make this const: size_t maximumIonCompilationThreads() const {

Also, I think maxIonCompilationThreads is shorter and matches what we do elsewhere (maxInlineDepth, maxMallocBytes etc)

Same for the AsmJS one.

@@ +114,5 @@
>          // For notifying threads doing work that they may be able to make progress.
> +        PRODUCER,
> +
> +        // For notifying threads doing work which have paused that they may be
> +        // able to resume making progress.

Nit: I had to read this sentence twice. Maybe:

// For notifying threads paused while doing work that they may be able to
// resume making progress.
Attachment #8447286 - Flags: review?(luke)
Attachment #8447286 - Flags: review?(jdemooij)
Attachment #8447286 - Flags: review+
Blocks: 1032160
Attachment #8447286 - Flags: review?(luke) → review+
Comment on attachment 8448135 [details] [diff] [review]
Tracelogger: Mark the time a thread is paused

Review of attachment 8448135 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!
Attachment #8448135 - Flags: review?(jdemooij) → review+
Depends on: 1032930
Status: NEW → RESOLVED
Closed: 5 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Hannes, since this is complete, would it be possible to move forward on the plan to try to remove the script size limit? (as also discussed on bug 1010417)
(In reply to Alon Zakai (:azakai) from comment #30)
> Hannes, since this is complete, would it be possible to move forward on the
> plan to try to remove the script size limit? (as also discussed on bug
> 1010417)

Sorry, I haven't been able to follow-up on it yet. Other things needed my attention. I'll try to squeeze it in. It isn't that hard anymore to finally finish this ;).
Cool, thanks. Not urgent in any way, of course, but will give some nice speedups on non-asm, and in general one less thing to worry about in that area.
You need to log in before you can comment on or make changes to this bug.