Closed Bug 1728781 Opened 3 months ago Closed 3 months ago

Ion hangs on web.autocad.com

Categories

(Core :: Javascript: WebAssembly, defect, P2)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
94 Branch
Tracking Status
firefox94 --- fixed

People

(Reporter: lth, Assigned: lth)

References

Details

Attachments

(1 file, 1 obsolete file)

Create an account on web.autocad.com. Make your browser remember your login. If necessary, navigate into the webapp until you see a folder view with a folder "samples" but go no further.

Once logged in, open about:config and disable javascript.options.wasm_optimizingjit, make sure javascript.options.wasm_baselinejit is enabled, and enable javascript.options.wasm_verbose. Close all tabs and quit.

Reopen the browser from the command line. The folder view should be displayed. In the terminal window you should see something like this:

JavaScript warning: https://web.autocad.com/fabric.c89f4d74a33e58c54283eae0ff8d554c.js, line 2: WebAssembly verbose: async compileStreaming() started
JavaScript warning: https://web.autocad.com/fabric.c89f4d74a33e58c54283eae0ff8d554c.js, line 2: WebAssembly verbose: available wasm compilers: tier1=baseline tier2=none
JavaScript warning: , line 0: WebAssembly verbose: async compile succeeded

Open the samples folder, and open a demo project, the "Drive Roller Assembly" project is suitably small. It should display after a few seconds.

Now go to about:config and enable j.o.wasm_optimizingjit and disable j.o.wasm_baselinejit. Close all tabs and quit.

Reopen the browser from the command line, etc. Observe that the line 'async compile succeeded' never appears (I waited 12 minutes at one point). If you open a process viewer or htop or somesuch, you will see that Firefox is pegging one core. If you open the samples and try to open a sample, it will never load.

Almost certainly this:

#0  0x00007ffff2bea207 in js::jit::CodePosition::operator<=(js::jit::CodePosition) const (this=0x7ffde21c3a20, other=...) at js/src/jit/RegisterAllocator.h:169
#1  js::jit::BacktrackingAllocator::tryMergeReusedRegister(js::jit::VirtualRegister&, js::jit::VirtualRegister&) (this=this@entry=0x7fffd87e5510, def=..., input=...) at js/src/jit/BacktrackingAllocator.cpp:1138
#2  0x00007ffff2be809a in js::jit::BacktrackingAllocator::mergeAndQueueRegisters() (this=this@entry=0x7fffd87e5510) at js/src/jit/BacktrackingAllocator.cpp:1273
#3  0x00007ffff2be7c0b in js::jit::BacktrackingAllocator::go() (this=0x7fffd87e5510) at js/src/jit/BacktrackingAllocator.cpp:885
#4  0x00007ffff2dce563 in js::jit::GenerateLIR(js::jit::MIRGenerator*) (mir=0x7fffd87e6090) at js/src/jit/Ion.cpp:1518
#5  0x00007ffff2f90484 in js::wasm::IonCompileFunctions(js::wasm::ModuleEnvironment const&, js::wasm::CompilerEnvironment const&, js::LifoAlloc&, mozilla::Vector<js::wasm::FuncCompileInput, 8ul, js::SystemAllocPolicy> const&, js::wasm::CompiledCode*, mozilla::UniquePtr<char [], JS::FreePolicy>*) (moduleEnv=..., compilerEnv=<optimized out>, lifo=..., inputs=..., code=code@entry=0x7fffc318f278, error=0x7fffd87e7b18) at js/src/wasm/WasmIonCompile.cpp:5826
#6  0x00007ffff2f796a7 in ExecuteCompileTask(js::wasm::CompileTask*, mozilla::UniquePtr<char [], JS::FreePolicy>*) (task=task@entry=0x7fffc318ef60, error=0x7ffdf2f80068, error@entry=0x7fffd87e7b18) at js/src/wasm/WasmGenerator.cpp:715
#7  0x00007ffff2f79504 in js::wasm::CompileTask::runHelperThreadTask(js::AutoLockHelperThreadState&) (this=0x7fffc318ef60, lock=...) at js/src/wasm/WasmGenerator.cpp:747
#8  0x00007ffff277e77d in js::GlobalHelperThreadState::runTaskLocked(js::HelperThreadTask*, js::AutoLockHelperThreadState&) (this=this@entry=0x7fffda707000, task=0x7fffc318ef60, locked=...) at js/src/vm/HelperThreads.cpp:2840
#9  0x00007ffff277e3b9 in js::GlobalHelperThreadState::runOneTask(js::AutoLockHelperThreadState&) (this=0x7fffda707000, lock=...) at js/src/vm/HelperThreads.cpp:2809
...

The program appears never to return from mergeAndQueueRegisters. Number of virtual registers = 332053... This could just be deadly slow.

Comment on attachment 9239171 [details]
WIP: Bug 1728781 - Short-circuit the regalloc in very large graphs.

This patch papers over the problem by forcing a copy if we think the analysis would run too long, but the question is, is this the best we can do, and if it is, how should we best determine an "optimal" cutoff? (And if this is not the best we can do, how do we do better?)

Attachment #9239171 - Flags: feedback?(nicolas.b.pierron)
Attachment #9239171 - Flags: feedback?(jseward)
Attachment #9239171 - Flags: feedback?(jdemooij)
Attachment #9239171 - Flags: feedback?(iireland)

I guess the cutoff could be a function of the number of times the analysis has been run and not the size of the vreg set, though on x64, almost everything is must-reuse, so it might not matter in practice.

Comment on attachment 9239171 [details]
WIP: Bug 1728781 - Short-circuit the regalloc in very large graphs.

I think it would make sense to apply a similar limit, not based on the total number of virtual registers, as this would apply to everything after the threshold is reached.

Instead, I suggest to apply a threshold based on the size of the LiveRange, either when attempting to merge, or after creating the a larger LiveRange after merging multiple: https://searchfox.org/mozilla-central/source/js/src/jit/BacktrackingAllocator.cpp#1096,1179-1180

Attachment #9239171 - Flags: feedback?(nicolas.b.pierron) → feedback+

It seems to me important that we distinguish the cases

  1. it takes too long, but would eventually terminate if we left it long enough
  2. the allocator is in a loop and will never terminate

If we're 100% sure it's (1) then some kind of heuristic cutoff is good, at least
short term. But if there's any chance that it's (2) then we have a much more
serious problem, because then it might happen on other, smaller inputs too.

(In reply to Julian Seward [:jseward] from comment #6)

It seems to me important that we distinguish the cases

  1. it takes too long, but would eventually terminate if we left it long enough
  2. the allocator is in a loop and will never terminate

If we're 100% sure it's (1)

In this case, there's no evidence of anything else. We know Ion has some serious nonlinearity that bites wasm hard, bug 1561127 is another case in point.

See Also: → 1561127

Do not extend an input live range with a must-reuse definition if the
input range is already very large, but insert a copy. This has the
effect of splitting the live range at the definition.

The purpose of this change is to avoid creating live ranges that are
so large that they will make the compiler's nonlinear live range
processing algorithms dominate compilation time to the point where the
compiler does not, practically speaking, terminate.

(In reply to Nicolas B. Pierron [:nbp] from comment #5)

Instead, I suggest to apply a threshold based on the size of the LiveRange, either when attempting to merge, or after creating the a larger LiveRange after merging multiple: https://searchfox.org/mozilla-central/source/js/src/jit/BacktrackingAllocator.cpp#1096,1179-1180

Thanks. I've created a patch around that idea and tried to reason about why it works (though it does work). With this applied and a limit on the live range size of 1e6, I get an 8s compile time for the application on my 2017-era Xeon. With a limit of 1e5 this drops to 5s, but I felt this was possibly too restrictive.

(In reply to Lars T Hansen [:lth] from comment #9)
Maybe it's safe to set the limit to something way lower, say 1000 instead of 100,000 ?
The vast majority of live ranges are short, so this won't affect many, plus, any
live range that is 100k insns long is likely to get either split or spilled anyway.
So it doesn't sound like a big deal to skip a coalescing opportunity for it.

(In reply to Julian Seward [:jseward] from comment #10)

(In reply to Lars T Hansen [:lth] from comment #9)
Maybe it's safe to set the limit to something way lower, say 1000 instead of 100,000 ?
The vast majority of live ranges are short, so this won't affect many, plus, any
live range that is 100k insns long is likely to get either split or spilled anyway.
So it doesn't sound like a big deal to skip a coalescing opportunity for it.

Yeah, "maybe" and maybe "not many". This is the issue - how do we tell? We have poor facilities for understanding wasm performance at the moment, and the impacts of compiler changes that are not about local code generation patterns.

1e6 is very conservative to be sure, and since it resolves the problem OK it seemed like an OK compromise. But it's just an initial guess and the only data points I have are the ones posted in comment 9. I think lowering it is fine, but arguments in favor of that probably need to be backed by data about typical live ranges in a substantial corpus of code (and of course, with our luck some large live ranges will be in somebody's performance-sensitive loop).

(Remember also that ion live ranges are conservative, there may be holes and they may be larger than how we normally think about them for that reason.)

Attachment #9239171 - Flags: feedback?(iireland) → feedback+
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/14ed7588e639
Do not merge in very large live ranges. r=nbp
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch
Attachment #9239171 - Attachment is obsolete: true
Attachment #9239171 - Flags: feedback?(jseward)
Attachment #9239171 - Flags: feedback?(jdemooij)
You need to log in before you can comment on or make changes to this bug.