Closed Bug 1402958 · Opened 7 years ago · Closed 7 years ago

Extremely slow page startup due to js::str_normalize

Categories

(Core :: JavaScript Engine, defect, P2)

Tracking

RESOLVED FIXED
mozilla58
Performance Impact medium
Tracking Status
firefox58 --- fixed

People

(Reporter: azakai, Assigned: bhackett1024)

Details

(Keywords: perf)

Attachments

(4 files)

Testcase: http://138.197.196.76/Emscripten/ClothUIWASM/

Reported to emscripten in https://github.com/kripken/emscripten/issues/5591

STR: Load the testcase link. In Chrome, startup finishes within a few seconds and a cloth animation appears. In Firefox, it gets stuck on the spinner for a very long time (minutes).

Profiling this using perf on linux nightly, I see

65.79%  JS Helper        libxul.so           [.] js::str_normalize
 7.08%  JS Helper        libxul.so           [.] SkRecorder::onDrawPosTextH
 2.73%  JS Helper        libxul.so           [.] JS_CopyPropertiesFrom
 0.48%  JS Helper        libxul.so           [.] GetPM

and with -g the execution path is

- 90.15% JSObject::enclosingEnvironment
   - 89.79% sh::TIntermConstantUnion::foldBinary
     - 89.76% sh::TIntermConstantUnion::foldBinary
       - 89.70% color_xform_RGBA<(SrcFormat)0, (DstFormat)7, (SkAlphaType)3, (ColorSpaceMatch)0>
         - 89.70% color_xform_RGBA<(SrcFormat)0, (DstFormat)6, (SkAlphaType)3, (ColorSpaceMatch)0>
           - 88.55% SkState_Shader_Blitter<StateF16>::blitRect
             - 87.09% std::vector<mozilla::SdpFingerprintAttributeList::Fingerprint, std::allocator<mozilla::SdpFingerprintAttributeList::Fingerprint> >::operator=
               - 75.59% js::str_concat
                 - 74.02% GetObjectSlotNameFunctor::operator()
                   - 70.82% js::str_normalize

So maybe that spinner keeps rendering text, and str_normalize is slow?

It's also possible this is a graphics bug, and those graphics methods shouldn't be calling into JS so much?
The symptoms here have changed a little over the last few days. The problem I see now is:

1. Load the testcase
2. Wait until it shows the cloth UI for a few seconds
3. Close the tab

The content process *still* uses 100% CPU after the tab is closed, and it keeps allocating more memory. If I open several of those tabs and close them all, I see multiple cores in use. Basically it puts the content process into a very bad state - not hung, but very busy with unnecessary work that consumes time and ever-increasing memory.

Profiling one of those content processes in the bad state, I see stacks like the ones pasted below. It looks like wasm Ion compilation, specifically register allocation. Perhaps the reason the symptoms just changed is that the wasm baseline compiler was turned on, so an existing compilation bug moved from the main thread to a background thread (while the page runs in baseline code)?

(Side note: the previous profiling info from the original comment may have been wrong; I think I may have profiled the parent process by mistake.)

+   59.76%     0.00%  JS Helper        libxul.so                [.] _ZN2js6detail16ThreadTrampolineIRFvPvEJPNS_12HelperThreadEEE5StartES2_
+   59.76%     0.00%  JS Helper        libpthread-2.23.so       [.] start_thread
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js3jit21BacktrackingAllocator2goEv
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js3jit11GenerateLIREPNS0_12MIRGeneratorE
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js4wasm19IonCompileFunctionsERKNS0_17ModuleEnvironmentERNS_9LifoAllocER
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZL18ExecuteCompileTaskPN2js4wasm11CompileTaskEPN7mozilla9UniquePtrIA_cN2JS
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js4wasm34ExecuteCompileTaskFromHelperThreadEPNS0_11CompileTaskE
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js12HelperThread18handleWasmWorkloadERNS_25AutoLockHelperThreadStateENS
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js12HelperThread10threadLoopEv.cold.615
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js3jit21BacktrackingAllocator17buildLivenessInfoEv
+   40.42%    39.96%  JS Helper        libxul.so                [.] _ZN2js3jit15VirtualRegister15addInitialRangeERNS0_13TempAllocatorENS0_12Cod
Attached file ClothUI.wasm
This reproduces in the shell as well. Attached is a testcase. STR:

js --no-wasm-baseline a.js ClothUI.wasm 

Compiling this takes at least several minutes; I'm not sure whether it eventually finishes.

So there seem to be two issues here:

1. This testcase compiles very, very slowly in Ion.
2. When we run it in baseline and compile with Ion in the background, that compilation continues to run even though the tab has been closed.
Attached file a.js
Attached file f.wasm
The problem happens in function $5463, which is indeed quite large. Here is a much smaller wasm file with the same issue (relevant function is now $61).
(In reply to Alon Zakai (:azakai) from comment #2)

> 2. When we run it in baseline and compile in ion on the background, that
> compilation continues to run even though the tab has been closed.

To close off this particular thread: that is by design, since we already break wasm compilation into chunks and each chunk is expected to (Ion-)compile in a reasonable amount of time, absent pathological behavior (like here).  There are no interrupt mechanisms; we just don't compile any remaining chunks once we determine we're shutting down.
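
To illustrate that design (a minimal sketch, not the actual SpiderMonkey code; CompileTask, CompileChunk, and HelperThreadLoop below are placeholder names): a helper thread only needs to check a shutdown flag between chunks, since each chunk is expected to finish quickly on its own.

  // Sketch of chunked background compilation that drops remaining chunks
  // once shutdown is requested; no mid-chunk interrupt is needed.
  #include <atomic>
  #include <cstdio>
  #include <deque>

  struct CompileTask { int chunkIndex; };   // placeholder for one chunk of functions

  void CompileChunk(const CompileTask& task) {
      // Stand-in for Ion-compiling one chunk; normally bounded in time.
      std::printf("compiled chunk %d\n", task.chunkIndex);
  }

  void HelperThreadLoop(std::deque<CompileTask>& tasks,
                        const std::atomic<bool>& shuttingDown) {
      while (!tasks.empty()) {
          if (shuttingDown.load()) {
              tasks.clear();   // shutting down: skip the remaining chunks
              return;
          }
          CompileTask task = tasks.front();
          tasks.pop_front();
          CompileChunk(task);
      }
  }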
Ran Gecko Profiler on the page, which got me this: https://perfht.ml/2xPDwus

but that does not seem to highlight the culprit thread here; the slow thread is outside the Gecko Profiler's reach (or beyond my configuration abilities - filed https://github.com/devtools-html/perf.html/issues/608)
I see that on current Firefox Nightly the page loads instantly. I ran mozregression on this, which led to

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9e7267c06e4e7fb037e539809b409519ea3d63c0&tochange=40ffba3a6034e0cc9a1a8e4fc9e739adb07f9e04

so it looks like the recently landed wasm compilation tiering masks this bug: in the most recent Firefox Nightly the page loads immediately thanks to tiering, but setting the pref javascript.options.wasm_baselinejit to false uncovers the startup hang again, i.e. the effect Alon was seeing with the background compilation taking a long time. Flipping the pref effectively moves that compilation back to the foreground, which helps diagnose the issue.
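
(For reference, and assuming only the standard Firefox pref mechanism rather than anything specific to this bug: the pref can be flipped in about:config, or via a user.js line like the one below, to reproduce the non-tiered behavior.)

  // user.js - disable the wasm baseline compiler so Ion compiles up front
  user_pref("javascript.options.wasm_baselinejit", false);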
Makes sense.  We'll look into this and see if we can find the pathological algorithm here.
Yeah, I can confirm easily with
  js --no-wasm-baseline --no-threads a.js f.wasm
which had been running for at least 7 minutes when I killed it.

This seems to hit a hardcore quadratic (or worse) case. The function in question is only 133876 bytes, and all 7 minutes were spent in buildLivenessInfo().  Brian, could you take a look and see what's exploding here?
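
As a rough illustration of the kind of blowup suspected here (a simplified model, not the actual BacktrackingAllocator code; LiveRange and addRangeSorted are hypothetical names): if each new live range for a vreg is inserted into a sorted list by scanning from the head, n ranges cost O(n^2) work in total, which is catastrophic once there are millions of ranges.

  // Simplified model of quadratic live-range insertion.
  #include <cstdint>
  #include <list>

  struct LiveRange { uint32_t from; uint32_t to; };   // span of LIR positions

  // Keep the list sorted by 'from' using a linear scan from the head.
  // Inserting n ranges this way costs O(n^2) comparisons overall.
  void addRangeSorted(std::list<LiveRange>& ranges, LiveRange r) {
      auto it = ranges.begin();
      while (it != ranges.end() && it->from < r.from)
          ++it;
      ranges.insert(it, r);
  }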
Flags: needinfo?(bhackett1024)
Priority: -- → P2
Whiteboard: [qf:p2]
Keywords: perf
Attached patch patch (Splinter Review)
The function being compiled here has 18 thousand virtual registers with over 17 million distinct live ranges (a live range is a span of LIR instructions where a vreg is live); despite being relatively small, the function elicits quadratic complexity in the basic structures needed to accurately keep track of those live ranges.  The backtracking allocator makes things worse by merging live ranges for a vreg with a linear-time algorithm (turning the quadratic behavior cubic), but even if this were fixed the allocator would still choke trying to allocate this graph.

The attached patch puts a cap on the number of live ranges the allocator will accurately keep track of in its liveness computations.  Above this cap, distinct ranges where a vreg is live might end up being merged together.  The resulting allocation will be worse, in that the allocator will need to make sure the vreg's value is stored somewhere in places where it isn't actually live, but I doubt this will make any measurable performance difference since all these vregs are almost certainly being spilled shortly after they are defined.

The 100k cap here is about 50 times larger than the largest live range count on Octane, and allows this testcase to complete in 1.2 seconds.
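
A minimal sketch of the capping idea (assumed structure only; the real patch lives in the backtracking allocator and differs in detail - MaxLiveRanges, VirtualRegisterRanges, and addRange below are illustrative names): once a vreg exceeds the cap, new ranges are folded into an existing range instead of being tracked separately, so bookkeeping stays bounded at the cost of a coarser, more conservative liveness picture.

  // Sketch of capping the number of distinct live ranges per virtual register.
  #include <algorithm>
  #include <cstdint>
  #include <vector>

  constexpr size_t MaxLiveRanges = 100000;   // the 100k cap mentioned above

  struct LiveRange { uint32_t from; uint32_t to; };

  struct VirtualRegisterRanges {
      std::vector<LiveRange> ranges;

      void addRange(uint32_t from, uint32_t to) {
          if (ranges.size() >= MaxLiveRanges) {
              // Over the cap: widen the last range to cover the new span
              // instead of tracking it distinctly. The vreg is then treated
              // as live over a larger region than it really is, which is
              // safe but can force extra spills.
              LiveRange& last = ranges.back();
              last.from = std::min(last.from, from);
              last.to = std::max(last.to, to);
              return;
          }
          ranges.push_back(LiveRange{from, to});
      }
  };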
Assignee: nobody → bhackett1024
Flags: needinfo?(bhackett1024)
Attachment #8916431 - Flags: review?(sunfish)
Thanks Brian!
Comment on attachment 8916431 [details] [diff] [review]
patch

Review of attachment 8916431 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds reasonable to me.
Attachment #8916431 - Flags: review?(sunfish) → review+
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3c86b72b8700
Put a cap on the number of distinct live ranges the backtracking allocator can handle, r=sunfish.
https://hg.mozilla.org/mozilla-central/rev/3c86b72b8700
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Performance Impact: --- → P2
Whiteboard: [qf:p2]