Closed Bug 1402958 · Opened 7 years ago · Closed 7 years ago

Extremely slow page startup due to js::str_normalize

Categories

(Core :: JavaScript Engine, defect, P2)

Tracking

RESOLVED FIXED
mozilla58
Performance Impact medium
Tracking Status
firefox58 --- fixed

People

(Reporter: azakai, Assigned: bhackett1024)

Details

(Keywords: perf)

Attachments

(4 files)

Testcase: http://138.197.196.76/Emscripten/ClothUIWASM/

Reported to emscripten in https://github.com/kripken/emscripten/issues/5591

STR: Load the testcase link. In Chrome, startup finishes within a few seconds and a cloth animation appears. In Firefox, it gets stuck on the spinner for a very long time (minutes).

Profiling this using perf on linux nightly, I see

65.79%  JS Helper        libxul.so           [.] js::str_normalize
 7.08%  JS Helper        libxul.so           [.] SkRecorder::onDrawPosTextH
 2.73%  JS Helper        libxul.so           [.] JS_CopyPropertiesFrom
 0.48%  JS Helper        libxul.so           [.] GetPM

and with -g the execution path is

- 90.15% JSObject::enclosingEnvironment
   - 89.79% sh::TIntermConstantUnion::foldBinary
     - 89.76% sh::TIntermConstantUnion::foldBinary
       - 89.70% color_xform_RGBA<(SrcFormat)0, (DstFormat)7, (SkAlphaType)3, (ColorSpaceMatch)0>
         - 89.70% color_xform_RGBA<(SrcFormat)0, (DstFormat)6, (SkAlphaType)3, (ColorSpaceMatch)0>
           - 88.55% SkState_Shader_Blitter<StateF16>::blitRect
             - 87.09% std::vector<mozilla::SdpFingerprintAttributeList::Fingerprint, std::allocator<mozilla::SdpFingerprintAttributeList::Fingerprint> >::operator=
               - 75.59% js::str_concat
                 - 74.02% GetObjectSlotNameFunctor::operator()
                   - 70.82% js::str_normalize

So maybe that spinner keeps rendering text, and str_normalize is slow?

It's also possible this is a graphics bug, and those graphics methods shouldn't be calling into JS so much?
The symptoms here have changed a little over the last few days. The problem I see now is:

1. Load the testcase
2. Wait until it shows the cloth UI for a few seconds
3. Close the tab

The content process *still* uses 100% CPU after the tab is closed, and it keeps allocating more memory. If I open several of those tabs and close them all, I see multiple cores in use. Basically it puts the content process into a very bad state - not hung, but very busy with unnecessary work that consumes time and ever-increasing memory.

Profiling one of those content processes in the bad state, I see stacks like the ones pasted below. It looks like wasm Ion compilation, specifically register allocation. Perhaps the reason the symptoms just changed is that the wasm baseline compiler was turned on, so an existing compilation bug moved from the main thread to a background thread (while the page runs in baseline code)?

(Side note: the previous profiling info from the original comment may have been wrong; I think I may have profiled the parent process by mistake.)

+   59.76%     0.00%  JS Helper        libxul.so                [.] _ZN2js6detail16ThreadTrampolineIRFvPvEJPNS_12HelperThreadEEE5StartES2_
+   59.76%     0.00%  JS Helper        libpthread-2.23.so       [.] start_thread
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js3jit21BacktrackingAllocator2goEv
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js3jit11GenerateLIREPNS0_12MIRGeneratorE
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js4wasm19IonCompileFunctionsERKNS0_17ModuleEnvironmentERNS_9LifoAllocER
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZL18ExecuteCompileTaskPN2js4wasm11CompileTaskEPN7mozilla9UniquePtrIA_cN2JS
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js4wasm34ExecuteCompileTaskFromHelperThreadEPNS0_11CompileTaskE
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js12HelperThread18handleWasmWorkloadERNS_25AutoLockHelperThreadStateENS
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js12HelperThread10threadLoopEv.cold.615
+   40.42%     0.00%  JS Helper        libxul.so                [.] _ZN2js3jit21BacktrackingAllocator17buildLivenessInfoEv
+   40.42%    39.96%  JS Helper        libxul.so                [.] _ZN2js3jit15VirtualRegister15addInitialRangeERNS0_13TempAllocatorENS0_12Cod
Attached file ClothUI.wasm
This reproduces in the shell as well. Attached is a testcase. STR:

js --no-wasm-baseline a.js ClothUI.wasm 

Compiling this takes at least several minutes; I'm not sure whether it eventually finishes.

So there seem to be two issues here:

1. This testcase compiles very, very slowly in Ion.
2. When we run it in baseline and compile with Ion in the background, that compilation continues to run even though the tab has been closed.
Attached file a.js
Attached file f.wasm
The problem happens in function $5463, which is indeed quite large. Here is a much smaller wasm file with the same issue (relevant function is now $61).
(In reply to Alon Zakai (:azakai) from comment #2)

> 2. When we run it in baseline and compile in ion on the background, that
> compilation continues to run even though the tab has been closed.

To close off this particular thread: that is by design, since we already break wasm compilation into chunks and each chunk is expected to (Ion-)compile in a reasonable amount of time, absent pathological behavior (like here).  There are no interrupt mechanisms; we just don't compile any remaining chunks once we determine we're shutting down.
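
To illustrate that design (a minimal sketch, not the actual SpiderMonkey code; CompileTask, CompileChunk, and HelperThreadLoop below are placeholder names): a helper thread only needs to check a shutdown flag between chunks, since each chunk is expected to finish quickly on its own.

  // Sketch of chunked background compilation that drops remaining chunks
  // once shutdown is requested; no mid-chunk interrupt is needed.
  #include <atomic>
  #include <cstdio>
  #include <deque>

  struct CompileTask { int chunkIndex; };   // placeholder for one chunk of functions

  void CompileChunk(const CompileTask& task) {
      // Stand-in for Ion-compiling one chunk; normally bounded in time.
      std::printf("compiled chunk %d\n", task.chunkIndex);
  }

  void HelperThreadLoop(std::deque<CompileTask>& tasks,
                        const std::atomic<bool>& shuttingDown) {
      while (!tasks.empty()) {
          if (shuttingDown.load()) {
              tasks.clear();   // shutting down: skip the remaining chunks
              return;
          }
          CompileTask task = tasks.front();
          tasks.pop_front();
          CompileChunk(task);
      }
  }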
Ran Gecko Profiler on the page, which got me this: https://perfht.ml/2xPDwus

but that does not seem to highlight the culprit thread here; the slow thread is outside the Gecko Profiler's reach (or beyond my configuration abilities - filed https://github.com/devtools-html/perf.html/issues/608)
I see that on current Firefox Nightly the page loads instantly. I ran mozregression on this, which led to

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9e7267c06e4e7fb037e539809b409519ea3d63c0&tochange=40ffba3a6034e0cc9a1a8e4fc9e739adb07f9e04

so it looks like the recently landed wasm compilation tiering masks this bug: in the most recent Firefox Nightly the page loads immediately thanks to tiering, but setting the pref javascript.options.wasm_baselinejit to false uncovers the startup hang again, i.e. the effect Alon was seeing with the background compilation taking a long time. Flipping the pref effectively moves that compilation back to the foreground, which helps diagnose the issue.
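
(For reference, and assuming only the standard Firefox pref mechanism rather than anything specific to this bug: the pref can be flipped in about:config, or via a user.js line like the one below, to reproduce the non-tiered behavior.)

  // user.js - disable the wasm baseline compiler so Ion compiles up front
  user_pref("javascript.options.wasm_baselinejit", false);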
Makes sense.  We'll look into this and see if we can find the pathological algorithm here.
Yeah, I can confirm easily with
  js --no-wasm-baseline --no-threads a.js f.wasm
which had been running for at least 7 minutes when I killed it.

This seems to hit a hardcore quadratic (or worse) case. The function in question is only 133876 bytes, and all 7 minutes were spent in buildLivenessInfo().  Brian, could you take a look and see what's exploding here?
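
As a rough illustration of the kind of blowup suspected here (a simplified model, not the actual BacktrackingAllocator code; LiveRange and addRangeSorted are hypothetical names): if each new live range for a vreg is inserted into a sorted list by scanning from the head, n ranges cost O(n^2) work in total, which is catastrophic once there are millions of ranges.

  // Simplified model of quadratic live-range insertion.
  #include <cstdint>
  #include <list>

  struct LiveRange { uint32_t from; uint32_t to; };   // span of LIR positions

  // Keep the list sorted by 'from' using a linear scan from the head.
  // Inserting n ranges this way costs O(n^2) comparisons overall.
  void addRangeSorted(std::list<LiveRange>& ranges, LiveRange r) {
      auto it = ranges.begin();
      while (it != ranges.end() && it->from < r.from)
          ++it;
      ranges.insert(it, r);
  }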
Flags: needinfo?(bhackett1024)
Priority: -- → P2
Whiteboard: [qf:p2]
Keywords: perf
Attached patch patch (Splinter Review)
The function being compiled here has 18 thousand virtual registers with over 17 million distinct live ranges (a live range is a span of LIR instructions where a vreg is live); despite being relatively small, the function elicits quadratic complexity in the basic structures needed to accurately keep track of those live ranges.  The backtracking allocator makes things worse by merging live ranges for a vreg with a linear-time algorithm (turning the quadratic behavior cubic), but even if this were fixed the allocator would still choke trying to allocate this graph.

The attached patch puts a cap on the number of live ranges the allocator will accurately keep track of in its liveness computations.  Above this cap, distinct ranges where a vreg is live might end up being merged together.  The resulting allocation will be worse, in that the allocator will need to make sure the vreg's value is stored somewhere in places where it isn't actually live, but I doubt this will make any measurable performance difference since all these vregs are almost certainly being spilled shortly after they are defined.

The 100k cap here is about 50 times larger than the largest live range count on Octane, and allows this testcase to complete in 1.2 seconds.
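
A minimal sketch of the capping idea (assumed structure only; the real patch lives in the backtracking allocator and differs in detail - MaxLiveRanges, VirtualRegisterRanges, and addRange below are illustrative names): once a vreg exceeds the cap, new ranges are folded into an existing range instead of being tracked separately, so bookkeeping stays bounded at the cost of a coarser, more conservative liveness picture.

  // Sketch of capping the number of distinct live ranges per virtual register.
  #include <algorithm>
  #include <cstdint>
  #include <vector>

  constexpr size_t MaxLiveRanges = 100000;   // the 100k cap mentioned above

  struct LiveRange { uint32_t from; uint32_t to; };

  struct VirtualRegisterRanges {
      std::vector<LiveRange> ranges;

      void addRange(uint32_t from, uint32_t to) {
          if (ranges.size() >= MaxLiveRanges) {
              // Over the cap: widen the last range to cover the new span
              // instead of tracking it distinctly. The vreg is then treated
              // as live over a larger region than it really is, which is
              // safe but can force extra spills.
              LiveRange& last = ranges.back();
              last.from = std::min(last.from, from);
              last.to = std::max(last.to, to);
              return;
          }
          ranges.push_back(LiveRange{from, to});
      }
  };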
Assignee: nobody → bhackett1024
Flags: needinfo?(bhackett1024)
Attachment #8916431 - Flags: review?(sunfish)
Thanks Brian!
Comment on attachment 8916431 [details] [diff] [review]
patch

Review of attachment 8916431 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds reasonable to me.
Attachment #8916431 - Flags: review?(sunfish) → review+
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3c86b72b8700
Put a cap on the number of distinct live ranges the backtracking allocator can handle, r=sunfish.
https://hg.mozilla.org/mozilla-central/rev/3c86b72b8700
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Performance Impact: --- → P2
Whiteboard: [qf:p2]