Bug 1402958 (Closed)
Opened 7 years ago, closed 7 years ago
Extremely slow page startup due to js::str_normalize
Categories: Core :: JavaScript Engine, defect, P2
Tracking
firefox58: fixed
People
(Reporter: azakai, Assigned: bhackett1024)
Details
(Keywords: perf)
Attachments
(4 files)
Testcase: http://138.197.196.76/Emscripten/ClothUIWASM/

Reported to emscripten in https://github.com/kripken/emscripten/issues/5591

STR: Load the testcase link. In Chrome it finishes startup within a few seconds and shows a cloth animation. In Firefox it gets stuck with the spinner for a very, very long time (minutes).

Profiling this using perf on linux nightly, I see:

  65.79%  JS Helper  libxul.so  [.] js::str_normalize
   7.08%  JS Helper  libxul.so  [.] SkRecorder::onDrawPosTextH
   2.73%  JS Helper  libxul.so  [.] JS_CopyPropertiesFrom
   0.48%  JS Helper  libxul.so  [.] GetPM

and with -g the execution path is:

  - 90.15% JSObject::enclosingEnvironment
    - 89.79% sh::TIntermConstantUnion::foldBinary
      - 89.76% sh::TIntermConstantUnion::foldBinary
        - 89.70% color_xform_RGBA<(SrcFormat)0, (DstFormat)7, (SkAlphaType)3, (ColorSpaceMatch)0>
          - 89.70% color_xform_RGBA<(SrcFormat)0, (DstFormat)6, (SkAlphaType)3, (ColorSpaceMatch)0>
            - 88.55% SkState_Shader_Blitter<StateF16>::blitRect
              - 87.09% std::vector<mozilla::SdpFingerprintAttributeList::Fingerprint, std::allocator<mozilla::SdpFingerprintAttributeList::Fingerprint> >::operator=
                - 75.59% js::str_concat
                  - 74.02% GetObjectSlotNameFunctor::operator()
                    - 70.82% js::str_normalize

So maybe that spinner keeps rendering text, and str_normalize is slow? It is also possible this is a graphics bug, and those graphics methods shouldn't be calling into JS so much.
Reporter
Comment 1 • 7 years ago
The symptoms here have changed a little over the last few days. The problem I see now is:

1. Load the testcase.
2. Wait until it shows the cloth UI for a few seconds.
3. Close the tab.

It *still* uses 100% CPU after closing it, and it continues allocating more memory constantly. If I open several of those tabs and close them all, I see multiple cores being used; basically it sends the content process into a very bad state. Not hanging, but very busy with unnecessary work that consumes time and increasing amounts of memory.

Profiling one of those content processes in the bad state, I see stuff like what is pasted below. Looks like wasm Ion compilation, specifically register allocation? Perhaps the reason the symptoms just changed is that wasm baseline was turned on, so an existing compilation bug moved from the main thread to a background thread (while the page runs in baseline)?

(Side note: the previous profiling info from the original comment may have been wrong; I think I may have profiled the parent process by mistake.)

  + 59.76%  0.00%  JS Helper  libxul.so           [.] _ZN2js6detail16ThreadTrampolineIRFvPvEJPNS_12HelperThreadEEE5StartES2_
  + 59.76%  0.00%  JS Helper  libpthread-2.23.so  [.] start_thread
  + 40.42%  0.00%  JS Helper  libxul.so           [.] _ZN2js3jit21BacktrackingAllocator2goEv
  + 40.42%  0.00%  JS Helper  libxul.so           [.] _ZN2js3jit11GenerateLIREPNS0_12MIRGeneratorE
  + 40.42%  0.00%  JS Helper  libxul.so           [.] _ZN2js4wasm19IonCompileFunctionsERKNS0_17ModuleEnvironmentERNS_9LifoAllocER
  + 40.42%  0.00%  JS Helper  libxul.so           [.] _ZL18ExecuteCompileTaskPN2js4wasm11CompileTaskEPN7mozilla9UniquePtrIA_cN2JS
  + 40.42%  0.00%  JS Helper  libxul.so           [.] _ZN2js4wasm34ExecuteCompileTaskFromHelperThreadEPNS0_11CompileTaskE
  + 40.42%  0.00%  JS Helper  libxul.so           [.] _ZN2js12HelperThread18handleWasmWorkloadERNS_25AutoLockHelperThreadStateENS
  + 40.42%  0.00%  JS Helper  libxul.so           [.] _ZN2js12HelperThread10threadLoopEv.cold.615
  + 40.42%  0.00%  JS Helper  libxul.so           [.] _ZN2js3jit21BacktrackingAllocator17buildLivenessInfoEv
  + 40.42% 39.96%  JS Helper  libxul.so           [.] _ZN2js3jit15VirtualRegister15addInitialRangeERNS0_13TempAllocatorENS0_12Cod
Reporter
Comment 2 • 7 years ago
This reproduces in the shell as well. Attached is a testcase.

STR:

  js --no-wasm-baseline a.js ClothUI.wasm

It takes at least minutes to compile this; not sure if it eventually finishes or not. So there seem to be two issues here:

1. This testcase compiles very, very slowly in Ion.
2. When we run it in baseline and compile in Ion in the background, that compilation continues to run even though the tab has been closed.
Reporter
Comment 3 • 7 years ago
Reporter
Comment 4 • 7 years ago
The problem happens in function $5463, which is indeed quite large. Here is a much smaller wasm file with the same issue (relevant function is now $61).
Comment 5 • 7 years ago
(In reply to Alon Zakai (:azakai) from comment #2)
> 2. When we run it in baseline and compile in ion on the background, that
> compilation continues to run even though the tab has been closed.

To close off this particular thread: that is by design. We already break wasm compilation into chunks, and each chunk is expected to (Ion-)compile in a reasonable amount of time, absent pathological behavior (like here). There are no interrupt mechanisms; we just don't compile any remaining chunks once we determine we're shutting down.
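The chunked-compilation behavior described above can be sketched as follows. This is a minimal illustration, not SpiderMonkey's actual code: all names here are hypothetical, and the real engine runs this on helper threads rather than a plain loop.

```python
# Illustrative sketch: background wasm compilation split into chunks.
# A chunk cannot be interrupted mid-compile, but no further chunks are
# started once shutdown has been flagged.

compiled = []

def compile_chunk(chunk):
    compiled.append(chunk)  # stands in for Ion-compiling one chunk

def background_compile(chunks, is_shutting_down):
    for chunk in chunks:
        if is_shutting_down():   # checked only *between* chunks
            return False         # abandon the remaining chunks
        compile_chunk(chunk)     # a pathological chunk stalls right here
    return True

# Normal run: everything compiles.
background_compile(["f0", "f1"], lambda: False)
# Shutdown flagged: no further chunks are compiled.
background_compile(["f2", "f3"], lambda: True)
print(compiled)
```

This is why a single pathological function (like $5463 here) defeats the mechanism: the shutdown check never runs while one chunk is stuck inside the allocator.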
Comment 6 • 7 years ago
Ran the Gecko Profiler on the page, which got me this: https://perfht.ml/2xPDwus. But that does not seem to highlight the culprit thread here; the slow thread is outside the Gecko Profiler's reach (or my configuration capabilities; filed https://github.com/devtools-html/perf.html/issues/608).
Comment 7 • 7 years ago
I see that on current Firefox Nightly the page does load up instantaneously. Ran mozregression on this, which led to https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9e7267c06e4e7fb037e539809b409519ea3d63c0&tochange=40ffba3a6034e0cc9a1a8e4fc9e739adb07f9e04

So it looks like the recently landed wasm compilation tiering masks this bug: on the most recent Firefox Nightly the page loads immediately with the help of tiering, and if one sets the pref javascript.options.wasm_baselinejit to false, that uncovers the startup hang again, i.e. the effect Alon was seeing with the background compilation taking long. Flipping the pref puts that compilation back on the foreground, in a sense, which helps diagnose the issue.
Comment 8 • 7 years ago
Makes sense. We'll look into this and see if we can find the pathological algorithm here.
Comment 9 • 7 years ago
Yeah, I can confirm. Easily reproduced with:

  js --no-wasm-baseline --no-threads a.js f.wasm

which had run for at least 7 minutes when I killed it. This seems to hit a hardcore quadratic (or worse) case. The function in question is only 133876 bytes, and all of the 7 minutes were spent in buildLivenessInfo(). Brian, do you suppose you could look and see what's exploding here?
Flags: needinfo?(bhackett1024)
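As a hypothetical illustration of the kind of blow-up suspected in comment 9 (this is not the real buildLivenessInfo): if every new live range is inserted into a per-register sorted list via a linear scan and shift, n insertions cost O(n^2) in total, which explodes once a single function produces millions of ranges.

```python
# Toy model of quadratic live-range bookkeeping (names are illustrative).

def insert_sorted(ranges, new_range):
    # Linear scan to keep ranges ordered by start: O(len(ranges)) per insert.
    i = 0
    while i < len(ranges) and ranges[i][0] < new_range[0]:
        i += 1
    ranges.insert(i, new_range)  # the list shift is also O(len(ranges))

def build_liveness(n):
    ranges = []
    for start in range(n, 0, -1):  # worst case: ranges arrive out of order
        insert_sorted(ranges, (start, start + 1))
    return ranges

print(build_liveness(4))  # [(1, 2), (2, 3), (3, 4), (4, 5)]
```

With n in the millions, as in the failing function, this kind of cost model alone accounts for minutes of compile time.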
Updated • 7 years ago
Assignee
Comment 10 • 7 years ago
The function being compiled here has 18 thousand virtual registers with over 17 million distinct live ranges (a span of LIR instructions where a vreg is live); despite being relatively small, the function has elicited quadratic complexity in the basic structures needed to accurately keep track of those live ranges. The backtracking allocator makes things worse by merging live ranges for a vreg with a linear time algorithm (turning the quadratic behavior cubic), but even if this was fixed, the allocator would still choke trying to allocate this graph.

The attached patch puts a cap on the number of live ranges the allocator will accurately keep track of in its liveness computations. Above this cap, distinct ranges where a vreg is live might end up being merged together. The resulting allocation will be worse, in that the allocator will need to make sure the vreg's value is stored somewhere in places where it isn't actually live, but I doubt this will make any measurable performance difference, since all these vregs are almost certainly being spilled shortly after they are defined.

The 100k cap here is about 50 times larger than the largest live range count on Octane, and allows this testcase to complete in 1.2 seconds.
Assignee: nobody → bhackett1024
Flags: needinfo?(bhackett1024)
Attachment #8916431 - Flags: review?(sunfish)
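The capping idea from comment 10 can be sketched as follows. This is a minimal sketch under assumed names, not the actual patch: the real cap is 100k, a tiny cap is used here for demonstration, and the merge policy is simplified.

```python
# Sketch: cap the number of distinct live ranges tracked per virtual
# register; beyond the cap, fold new ranges into the last one. The widened
# range conservatively treats the vreg as live over a larger span, trading
# allocation quality for bounded compile time.

MAX_RANGES = 4  # illustrative; the patch described above uses 100k

class VirtualRegister:
    def __init__(self):
        self.ranges = []  # list of (start, end) spans

    def add_range(self, start, end):
        if len(self.ranges) < MAX_RANGES:
            self.ranges.append((start, end))
        else:
            # Cap hit: widen the last range to cover the new one too.
            s, e = self.ranges[-1]
            self.ranges[-1] = (min(s, start), max(e, end))

vreg = VirtualRegister()
for start in range(0, 12, 2):  # six disjoint ranges requested
    vreg.add_range(start, start + 1)
print(len(vreg.ranges), vreg.ranges[-1])  # 4 (6, 11)
```

As comment 10 notes, the cost of the over-approximation is that the vreg's value must be kept somewhere across spans where it is not actually live; for heavily spilled vregs, that cost is expected to be negligible.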
Comment 11 • 7 years ago
Thanks Brian!
Comment 12 • 7 years ago
Comment on attachment 8916431 [details] [diff] [review]
patch

Review of attachment 8916431 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds reasonable to me.
Attachment #8916431 - Flags: review?(sunfish) → review+
Comment 13 • 7 years ago
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3c86b72b8700
Put a cap on the number of distinct live ranges the backtracking allocator can handle, r=sunfish.
Comment 14 • 7 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/3c86b72b8700
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Updated • 2 years ago
Performance Impact: --- → P2
Whiteboard: [qf:p2]