Closed
Bug 1329019
Opened 7 years ago
Closed 7 years ago
Wasm: Too many vector grows and copies in asmMergeWith
Categories
(Core :: JavaScript Engine: JIT, defect, P3)
Core
JavaScript Engine: JIT
Tracking
()
RESOLVED
FIXED
mozilla58
Tracking | Status | |
---|---|---|
firefox58 | --- | fixed |
People
(Reporter: away, Assigned: luke)
References
Details
Attachments
(8 files, 3 obsolete files)
3.86 KB,
patch
|
Details | Diff | Splinter Review | |
26.16 KB,
patch
|
lth
:
review+
|
Details | Diff | Splinter Review |
3.00 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
8.42 KB,
patch
|
yury
:
review+
|
Details | Diff | Splinter Review |
13.26 KB,
patch
|
lth
:
review+
|
Details | Diff | Splinter Review |
3.53 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
168.69 KB,
patch
|
lth
:
review+
|
Details | Diff | Splinter Review |
5.90 KB,
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
In an xperf profile I noticed a lot of time spent on grows and copies in Assembler::asmMergeWith and friends. Here's a hacky patch that reserves an initial 40 megs in the masm buffer, just to demonstrate that there's room for improvement. (This is on top of the patches for bug 1329018 in my queue) Threads: 1 2 3 4 5 6 7 8 baseline: 621 441 347 305 292 283 275 268 patch: 604 436 327 292 269 262 252 244 The wins are not as prominent as when I last measured; maybe the tree has generally gotten better in the meantime. So we don't necessarily need to fix this urgently. Luke mentioned that for Cretonne there may be a re-working of the buffer handling in this area. That could be a good opportunity to reduce some of the layers of copying.
Updated•7 years ago
|
Priority: -- → P3
Assignee | ||
Comment 1•7 years ago
|
||
Taking. I have some patches that pass tests, but they turned off batching so a major regression which I need to work on next.
Assignee: nobody → luke
Assignee | ||
Comment 2•7 years ago
|
||
This patch just cleans up some stuff I noticed on the real patch: some parts of LinkData::InternalLink are now unnecessary and we can also reuse Bind() and remove the custom wasm paths that were doing the same thing.
Attachment #8906162 -
Flags: review?(lhansen)
Assignee | ||
Comment 3•7 years ago
|
||
Here's a WIP that passes wasm tests (fails some debug tests for probably simple reasons). Overall, it feels like a major simplification and cleanup, and it's also a lot more clear to me now how this will plug into cretonne. This patch provides a 25% reduction in overall baseline compile time when running with `taskset 0x3` and --cpu-count=2 by reducing the time to do parallel function compilation by 34%. Unfortunately, the simplistic approach of doing all the linking at the end loses parallelism because the linking/copying is now one big sequential operation at the end; while asmMergeWith was being done in parallel. Thus, as you add more hyperthreads/cores, the speedup gets dominated by the sequential linking. So next step is to go back to doing the linking incrementally (as CompileTasks return) which shouldn't be too bad. But, progress.
Comment 4•7 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #3) > This patch provides a 25% reduction in overall baseline compile time when > running with `taskset 0x3` and --cpu-count=2 by reducing the time to do > parallel function compilation by 34%. Oh, I *want* this!
Comment 5•7 years ago
|
||
Comment on attachment 8906162 [details] [diff] [review] simplify-internal-link Review of attachment 8906162 [details] [diff] [review]: ----------------------------------------------------------------- Nice cleanup.
Attachment #8906162 -
Flags: review?(lhansen) → review+
Assignee | ||
Comment 6•7 years ago
|
||
Updated patch WIP, almost done, just need to fill in ARM and there is now a dependency on the size estimation in bug 1380033. So the patch now shows a pretty big speedup on all --cpu-count (on Tanks) with baseline: =1 638ms -> 547ms (-14%) =2 430ms -> 314ms (-27%) =4 287ms -> 196ms (-31%) =8 260ms -> 153ms (-41%) Ion also gets better, but less pronounced. Another fun metric is comparing the ion / baseline ratio change (on Tanks): =1 11.2x -> 12.6x =2 10.0x -> 12.2x =4 6.7x -> 8.3x =8 4.5x -> 7.7x so the patch makes baseline look relatively better.
Attachment #8906164 -
Attachment is obsolete: true
Assignee | ||
Comment 7•7 years ago
|
||
Later patches hit two omissions in podResizeToFit() (I expect wasm is the only client of this function): (1) realloc don't accept a target length of 0, (2) mReserved was not being updated on shrink.
Attachment #8908191 -
Flags: review?(nfroyd)
Assignee | ||
Comment 8•7 years ago
|
||
This patch adds an overload to replaceRawBuffer() that allows providing a vector with length != capacity.
Attachment #8908193 -
Flags: review?(nfroyd)
Assignee | ||
Comment 9•7 years ago
|
||
This patch only computes the hash when debugging is enabled. This was taking like 7% of total compilation time.
Attachment #8908194 -
Flags: review?(ydelendik)
Assignee | ||
Comment 10•7 years ago
|
||
This patch splits CallSiteAndTargetVector into separate CallSiteVector and CallSiteTargetVector. This allows CallSiteVector to go straight into MetadataTier (during parallel compilation) while keeping the CallSiteTargetVector local to the ModuleGenerator (b/c it isn't needed after linking). Otherwise, the two were being split in the final sequential phase. (This matters more in conjunction with the next patch.)
Attachment #8908196 -
Flags: review?(lhansen)
Comment 11•7 years ago
|
||
Comment on attachment 8908191 [details] [diff] [review] fix-vector-resize-to-fit Review of attachment 8908191 [details] [diff] [review]: ----------------------------------------------------------------- Could we add a standalone test for this, possibly one test for each of the problematic cases? I am unsure about the statement "realloc doesn't accept a target length of 0". Would it be more accurate to say that "realloc frees the target buffer if the length is 0, which the code was not expecting"? Clearing review for my confusion below and tests. ::: mfbt/Vector.h @@ +249,5 @@ > + aV.mTail.mCapacity = aV.kInlineCapacity; > +#ifdef DEBUG > + aV.mTail.mReserved = 0; > +#endif > + return; This hunk seems correct. @@ +261,5 @@ > aV.mTail.mCapacity = aV.mLength; > +#ifdef DEBUG > + aV.mTail.mReserved = aV.mTail.mReserved < aV.mLength > + ? aV.mTail.mReserved > + : aV.mLength; Hm, so the doc comment for mBegin says: * Pointer to the buffer, be it inline or heap-allocated. Only [mBegin, * mBegin + mLength) hold valid constructed T objects. The range [mBegin + * mLength, mBegin + mCapacity) holds uninitialized memory. The range * [mBegin + mLength, mBegin + mReserved) also holds uninitialized memory * previously allocated by a call to reserve(). so mReserved should be >= mLength, and this assignment does not respect that invariant in the first case, right? (The invariant is explicitly spelled out in the documentation comment for the reserved() method.) It's not clear to me that mReserved is everywhere updated to respect this invariant, so I am certainly willing to admit my misunderstanding of this code.
Attachment #8908191 -
Flags: review?(nfroyd)
Comment 12•7 years ago
|
||
Comment on attachment 8908193 [details] [diff] [review] enhance-replace-raw-buffer Review of attachment 8908193 [details] [diff] [review]: ----------------------------------------------------------------- Again, tests would be great here...but I see we don't have any tests for replaceRawBuffer in the first place, so maybe we can just choose to keep punting. r=me; the comment below is just me wondering out loud what the right thing to do is. Interested in your thoughts here. ::: mfbt/Vector.h @@ +779,5 @@ > * N.B. This call assumes that there are no uninitialized elements in the > + * passed range [aP, aP + aLength). The range [aP + aLength, aP + > + * aCapacity) must be allocated uninitialized memory. > + */ > + void replaceRawBuffer(T* aP, size_t aLength, size_t aCapacity); Do we think an overload is sufficient here, or should this be called something like replaceOversizedRawBuffer to clue the reader in to what's going on?
Attachment #8908193 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 13•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #12) > Do we think an overload is sufficient here, or should this be called > something like replaceOversizedRawBuffer to clue the reader in to what's > going on? That's a good question. Honestly, I was surprised replaceRawBuffer() *didn't* take a capacity, so it seems pretty natural to me to take 3 args (viewing the other overload as sugar), but maybe that's just b/c I wrote Vector all those years ago and think in terms of [begin, length, capacity] :)
Assignee | ||
Comment 14•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #11) > I am unsure about the statement "realloc doesn't accept a target length of > 0". Would it be more accurate to say that "realloc frees the target buffer > if the length is 0, which the code was not expecting"? I was just too-briefly restating this assert that I hit: https://hg.mozilla.org/mozilla-central/file/593158cd4910/js/public/Utility.h#l253 > so mReserved should be >= mLength, and this assignment does not respect that > invariant in the first case, right? (The invariant is explicitly spelled > out in the documentation comment for the reserved() method.) > > It's not clear to me that mReserved is everywhere updated to respect this > invariant, so I am certainly willing to admit my misunderstanding of this > code. Thanks! I hadn't noticed that invariant. I see we do update mReserved when mLength is changed, e.g., by growBy() so that certainly does appear to be the intention. That makes that line simpler :)
Assignee | ||
Comment 15•7 years ago
|
||
Attachment #8908191 -
Attachment is obsolete: true
Attachment #8908333 -
Flags: review?(nfroyd)
Comment 16•7 years ago
|
||
Comment on attachment 8908333 [details] [diff] [review] fix-vector-resize-to-fit Review of attachment 8908333 [details] [diff] [review]: ----------------------------------------------------------------- Thank you!
Attachment #8908333 -
Flags: review?(nfroyd) → review+
Comment 17•7 years ago
|
||
Comment on attachment 8908194 [details] [diff] [review] fix-hash Review of attachment 8908194 [details] [diff] [review]: ----------------------------------------------------------------- Looks good.
Attachment #8908194 -
Flags: review?(ydelendik) → review+
Assignee | ||
Comment 18•7 years ago
|
||
Ok, I ended up doing the simplistic thing for ARM which leaves some possible performance gains for the future (ARM's masm's segmented-chunk buffer cannot be so-trivially swap()ed with a Vector<uint8_t> as x86-shared's can). I'd suggest reading WasmGenerator.h to see the new types first: the basic idea is that a CompileTask now contains N FuncCompileInput's which are compiled into a single CompiledCode struct which contains everything that used to go in a MacroAssembler. A key optimization is that CompiledCode swaps its empty-but-reserved Vectors in *and* out of the temporary MacroAssembler. I'll wait on bug 1380033 to land so I can replace the TODO. Unfortunately, I wasn't able to make func-index == code-range-index in this patch b/c the FarJumpIslands inserted by patchCallSites() get in the way. However, I realized I could have two VectorCodeRange (1 for functions, 1 for everything else) in some future patch.
Attachment #8907935 -
Attachment is obsolete: true
Attachment #8908761 -
Flags: review?(lhansen)
Comment 19•7 years ago
|
||
Comment on attachment 8908196 [details] [diff] [review] split-callsite-and-target Review of attachment 8908196 [details] [diff] [review]: ----------------------------------------------------------------- Seems reasonable enough.
Attachment #8908196 -
Flags: review?(lhansen) → review+
Comment 20•7 years ago
|
||
Comment on attachment 8908761 [details] [diff] [review] rm-asmMergeWith Review of attachment 8908761 [details] [diff] [review]: ----------------------------------------------------------------- OK. This patch is massive so the review is a lot more "no obvious bugs" than "obviously no bugs". ::: js/src/wasm/WasmGenerator.h @@ +201,5 @@ > const CodeRange& funcCodeRange(uint32_t funcIndex) const; > + const CodeRange& trapCodeRange(Trap trap) const; > + const CodeRange& debugTrapCodeRange() const; > + > + bool linkCallSites(); The removals of MOZ_MUST_USE here and below (many instances) seems unmotivated. ::: js/src/wasm/WasmTypes.cpp @@ +677,5 @@ > funcBeginToNormalEntry_(0), > funcBeginToTierEntry_(0), > kind_(kind) > { > + MOZ_ASSERT_IF(!oom::IsThreadSimulatingOOM(), begin_ <= end_); The guard on the assertion (also many cases below) is a little strange. If this can fail during simulated OOM (plausible) then why not during real OOM, and if so, are the effects benign?
Attachment #8908761 -
Flags: review?(lhansen) → review+
Assignee | ||
Comment 21•7 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #20) Thanks! Sorry it was so big. (Some of the intermediate iterations (that were slowdowns) were even more drastic.) > > + bool linkCallSites(); > > The removals of MOZ_MUST_USE here and below (many instances) seems > unmotivated. I'm certainly open to reverting, but my thinking here was that MOZ_MUST_USE is most useful as a tool in common utilities to ensure return values are checked and, within the private confines of ModuleGenerator, it wasn't providing much value while at the same time making all the signatures look gross and shouty and shifted over 13 columns. WDYT? > > + MOZ_ASSERT_IF(!oom::IsThreadSimulatingOOM(), begin_ <= end_); > > The guard on the assertion (also many cases below) is a little strange. If > this can fail during simulated OOM (plausible) then why not during real OOM, > and if so, are the effects benign? This just avoided a bunch of otherwise-unnecessary "if (masm.oom()) return false" checks right before codeRanges.emplaceBack() calls (that would hit the assert on OOM but were benign). But I don't have a strong opinion here either; probably it is cleaner to keep asserts tight and avoid confusion for future readers, so I'll revert.
Comment 22•7 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #21) > (In reply to Lars T Hansen [:lth] from comment #20) > Thanks! Sorry it was so big. (Some of the intermediate iterations (that > were slowdowns) were even more drastic.) > > > > + bool linkCallSites(); > > > > The removals of MOZ_MUST_USE here and below (many instances) seems > > unmotivated. > > I'm certainly open to reverting, but my thinking here was that MOZ_MUST_USE > is most useful as a tool in common utilities to ensure return values are > checked and, within the private confines of ModuleGenerator, it wasn't > providing much value while at the same time making all the signatures look > gross and shouty and shifted over 13 columns. WDYT? Your call. Myself, I like to use 'MOZ_MUST_USE bool' everywhere as a marker for 'must check the result'; maybe eventually we'll move to something Result-based here. > > > + MOZ_ASSERT_IF(!oom::IsThreadSimulatingOOM(), begin_ <= end_); > > > > The guard on the assertion (also many cases below) is a little strange. If > > this can fail during simulated OOM (plausible) then why not during real OOM, > > and if so, are the effects benign? > > This just avoided a bunch of otherwise-unnecessary "if (masm.oom()) return > false" checks right before codeRanges.emplaceBack() calls (that would hit > the assert on OOM but were benign). But I don't have a strong opinion here > either; probably it is cleaner to keep asserts tight and avoid confusion for > future readers, so I'll revert. Hm, sounds related to the hack that I introduced earlier that defaulted the values of some of these structures so that these asserts would pass. I did that to avoid OOM issues too.
Comment 23•7 years ago
|
||
Pushed by lwagner@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/7b360a7ded8d Baldr: simplify InternalLink code (r=lth) https://hg.mozilla.org/integration/mozilla-inbound/rev/55b6ce9eb26d Vector::podResizeToFit should update reserved (r=froydnj) https://hg.mozilla.org/integration/mozilla-inbound/rev/fd601c9bcd00 Allow specifying capacity to Vector::replaceRawBuffer (r=froydnj) https://hg.mozilla.org/integration/mozilla-inbound/rev/1f23975f1d1f Baldr: Only compute bytecode hash in debug mode (r=yury) https://hg.mozilla.org/integration/mozilla-inbound/rev/30f8a995d4c1 Baldr: Split CallSiteAndTarget into CallSite and CallSiteTarget (r=lth) https://hg.mozilla.org/integration/mozilla-inbound/rev/6ff0f49d6e25 Baldr: don't transport MacroAssemblers between helper and ModuleGenerator threads (r=lth)
Comment 24•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/7b360a7ded8d https://hg.mozilla.org/mozilla-central/rev/55b6ce9eb26d https://hg.mozilla.org/mozilla-central/rev/fd601c9bcd00 https://hg.mozilla.org/mozilla-central/rev/1f23975f1d1f https://hg.mozilla.org/mozilla-central/rev/30f8a995d4c1 https://hg.mozilla.org/mozilla-central/rev/6ff0f49d6e25
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox58:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Comment 25•7 years ago
|
||
Pushed by lwagner@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/f26c09f0f737 fix JS_CODEGEN_NONE (r=npotb)
Comment 26•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f26c09f0f737
Comment 27•7 years ago
|
||
hi, mips build failed after this patch, I have created a patch, can you help to review this? thank you.
Attachment #8913179 -
Flags: review?(luke)
Assignee | ||
Comment 28•7 years ago
|
||
Comment on attachment 8913179 [details] [diff] [review] 0001-MIPS-Fix-build-error.patch Review of attachment 8913179 [details] [diff] [review]: ----------------------------------------------------------------- Oh right, sorry for the churn.
Attachment #8913179 -
Flags: review?(luke) → review+
Comment 29•7 years ago
|
||
thank you. Can somebody help to checkin this patch? I can not modify this to add a checkin needed flag.
Comment 30•7 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/238ad8ca8df0 MIPS: Add some missing functions to fix build errors. r=luke
Comment 31•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/238ad8ca8df0
Comment 32•7 years ago
|
||
hi, this patch: https://hg.mozilla.org/mozilla-central/rev/6ff0f49d6e25 use +AssemblerMIPSShared::appendRawCode(const uint8_t* code, size_t numBytes) replace -AssemblerMIPSShared::asmMergeWith(const AssemblerMIPSShared& other) and it remove a function call: addLongJump(BufferOffset(size() + off)); so when append a new wasm code, it would not handle the long jumps? so after that if move the codes by call executableCopy() will failed on mips platform? because we can not update the new target address?
You need to log in
before you can comment on or make changes to this bug.
Description
•