Closed Bug 1329019 Opened 7 years ago Closed 7 years ago

Wasm: Too many vector grows and copies in asmMergeWith

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla58

Tracking Flags:

Tracking

Status

firefox58

---

fixed

People

(Reporter: away, Assigned: luke)

References

Details

Attachments

(8 files, 3 obsolete files)

Hacky proof of concept (don't land) 7 years ago (Away) 3.86 KB, patch		Details \| Diff \| Splinter Review
simplify-internal-link 7 years ago Luke Wagner [:luke] 26.16 KB, patch	lth : review+	Details \| Diff \| Splinter Review
rm-asmMergeWith (WIP) 7 years ago Luke Wagner [:luke] 164.52 KB, patch		Details \| Diff \| Splinter Review
rm-asmMergeWith (WIP 2) 7 years ago Luke Wagner [:luke] 166.02 KB, patch		Details \| Diff \| Splinter Review
fix-vector-resize-to-fit 7 years ago Luke Wagner [:luke] 1.21 KB, patch		Details \| Diff \| Splinter Review
enhance-replace-raw-buffer 7 years ago Luke Wagner [:luke] 3.00 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
fix-hash 7 years ago Luke Wagner [:luke] 8.42 KB, patch	yury : review+	Details \| Diff \| Splinter Review
split-callsite-and-target 7 years ago Luke Wagner [:luke] 13.26 KB, patch	lth : review+	Details \| Diff \| Splinter Review
fix-vector-resize-to-fit 7 years ago Luke Wagner [:luke] 3.53 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
rm-asmMergeWith 7 years ago Luke Wagner [:luke] 168.69 KB, patch	lth : review+	Details \| Diff \| Splinter Review
0001-MIPS-Fix-build-error.patch 7 years ago yuyin 5.90 KB, patch	luke : review+	Details \| Diff \| Splinter Review

(Away)

Reporter

Description

•

7 years ago

Attached patch Hacky proof of concept (don't land) — Details — Splinter Review

In an xperf profile I noticed a lot of time spent on grows and copies in Assembler::asmMergeWith and friends.

Here's a hacky patch that reserves an initial 40 megs in the masm buffer, just to demonstrate that there's room for improvement.

(This is on top of the patches for bug 1329018 in my queue)
Threads:       1   2   3   4   5   6   7   8
baseline:      621 441 347 305 292 283 275 268
patch:         604 436 327 292 269 262 252 244

The wins are not as prominent as when I last measured; maybe the tree has generally gotten better in the meantime. So we don't necessarily need to fix this urgently. Luke mentioned that for Cretonne there may be a re-working of the buffer handling in this area. That could be a good opportunity to reduce some of the layers of copying.

Hannes Verschore [:h4writer]

Updated

•

7 years ago

Priority: -- → P3

Luke Wagner [:luke]

Assignee

Comment 1

•

7 years ago

Taking.  I have some patches that pass tests, but they turned off batching so a major regression which I need to work on next.

Assignee: nobody → luke

Luke Wagner [:luke]

Assignee

Comment 2

•

7 years ago

Attached patch simplify-internal-link — Details — Splinter Review

This patch just cleans up some stuff I noticed on the real patch: some parts of LinkData::InternalLink are now unnecessary and we can also reuse Bind() and remove the custom wasm paths that were doing the same thing.

Attachment #8906162 - Flags: review?(lhansen)

Luke Wagner [:luke]

Assignee

Comment 3

•

7 years ago

Attached patch rm-asmMergeWith (WIP) (obsolete) — Details — Splinter Review

Here's a WIP that passes wasm tests (fails some debug tests for probably simple reasons).  Overall, it feels like a major simplification and cleanup, and it's also a lot more clear to me now how this will plug into cretonne.

This patch provides a 25% reduction in overall baseline compile time when running with `taskset 0x3` and --cpu-count=2 by reducing the time to do parallel function compilation by 34%.

Unfortunately, the simplistic approach of doing all the linking at the end loses parallelism because the linking/copying is now one big sequential operation at the end; while asmMergeWith was being done in parallel. Thus, as you add more hyperthreads/cores, the speedup gets dominated by the sequential linking.

So next step is to go back to doing the linking incrementally (as CompileTasks return) which shouldn't be too bad.  But, progress.

Lars T Hansen [:lth]

Comment 4

•

7 years ago

(In reply to Luke Wagner [:luke] from comment #3)

> This patch provides a 25% reduction in overall baseline compile time when
> running with `taskset 0x3` and --cpu-count=2 by reducing the time to do
> parallel function compilation by 34%.

Oh, I *want* this!

Lars T Hansen [:lth]

Comment 5

•

7 years ago

Comment on attachment 8906162 [details] [diff] [review]
simplify-internal-link

Review of attachment 8906162 [details] [diff] [review]:
-----------------------------------------------------------------

Nice cleanup.

Attachment #8906162 - Flags: review?(lhansen) → review+

Luke Wagner [:luke]

Assignee

Comment 6

•

7 years ago

Attached patch rm-asmMergeWith (WIP 2) (obsolete) — Details — Splinter Review

Updated patch WIP, almost done, just need to fill in ARM and there is now a dependency on the size estimation in bug 1380033.

So the patch now shows a pretty big speedup on all --cpu-count (on Tanks) with baseline:
 =1  638ms -> 547ms  (-14%)
 =2  430ms -> 314ms  (-27%)
 =4  287ms -> 196ms  (-31%)
 =8  260ms -> 153ms  (-41%)
Ion also gets better, but less pronounced.

Another fun metric is comparing the ion / baseline ratio change (on Tanks):
 =1  11.2x -> 12.6x
 =2  10.0x -> 12.2x
 =4   6.7x ->  8.3x
 =8   4.5x ->  7.7x
so the patch makes baseline look relatively better.

Attachment #8906164 - Attachment is obsolete: true

Luke Wagner [:luke]

Assignee

Comment 7

•

7 years ago

Attached patch fix-vector-resize-to-fit (obsolete) — Details — Splinter Review

Later patches hit two omissions in podResizeToFit() (I expect wasm is the only client of this function): (1) realloc don't accept a target length of 0, (2) mReserved was not being updated on shrink.

Attachment #8908191 - Flags: review?(nfroyd)

Luke Wagner [:luke]

Assignee

Comment 8

•

7 years ago

Attached patch enhance-replace-raw-buffer — Details — Splinter Review

This patch adds an overload to replaceRawBuffer() that allows providing a vector with length != capacity.

Attachment #8908193 - Flags: review?(nfroyd)

Luke Wagner [:luke]

Assignee

Comment 9

•

7 years ago

Attached patch fix-hash — Details — Splinter Review

This patch only computes the hash when debugging is enabled.  This was taking like 7% of total compilation time.

Attachment #8908194 - Flags: review?(ydelendik)

Luke Wagner [:luke]

Assignee

Comment 10

•

7 years ago

Attached patch split-callsite-and-target — Details — Splinter Review

This patch splits CallSiteAndTargetVector into separate CallSiteVector and CallSiteTargetVector.  This allows CallSiteVector to go straight into MetadataTier (during parallel compilation) while keeping the CallSiteTargetVector local to the ModuleGenerator (b/c it isn't needed after linking).  Otherwise, the two were being split in the final sequential phase.  (This matters more in conjunction with the next patch.)

Attachment #8908196 - Flags: review?(lhansen)

Nathan Froyd [:froydnj]

Comment 11

•

7 years ago

Comment on attachment 8908191 [details] [diff] [review]
fix-vector-resize-to-fit

Review of attachment 8908191 [details] [diff] [review]:
-----------------------------------------------------------------

Could we add a standalone test for this, possibly one test for each of the problematic cases?

I am unsure about the statement "realloc doesn't accept a target length of 0".  Would it be more accurate to say that "realloc frees the target buffer if the length is 0, which the code was not expecting"?

Clearing review for my confusion below and tests.

::: mfbt/Vector.h
@@ +249,5 @@
> +      aV.mTail.mCapacity = aV.kInlineCapacity;
> +#ifdef DEBUG
> +      aV.mTail.mReserved = 0;
> +#endif
> +      return;

This hunk seems correct.

@@ +261,5 @@
>      aV.mTail.mCapacity = aV.mLength;
> +#ifdef DEBUG
> +    aV.mTail.mReserved = aV.mTail.mReserved < aV.mLength
> +                         ? aV.mTail.mReserved
> +                         : aV.mLength;

Hm, so the doc comment for mBegin says:

   * Pointer to the buffer, be it inline or heap-allocated. Only [mBegin,
   * mBegin + mLength) hold valid constructed T objects. The range [mBegin +
   * mLength, mBegin + mCapacity) holds uninitialized memory. The range
   * [mBegin + mLength, mBegin + mReserved) also holds uninitialized memory
   * previously allocated by a call to reserve().

so mReserved should be >= mLength, and this assignment does not respect that invariant in the first case, right?  (The invariant is explicitly spelled out in the documentation comment for the reserved() method.)

It's not clear to me that mReserved is everywhere updated to respect this invariant, so I am certainly willing to admit my misunderstanding of this code.

Attachment #8908191 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 12

•

7 years ago

Comment on attachment 8908193 [details] [diff] [review]
enhance-replace-raw-buffer

Review of attachment 8908193 [details] [diff] [review]:
-----------------------------------------------------------------

Again, tests would be great here...but I see we don't have any tests for replaceRawBuffer in the first place, so maybe we can just choose to keep punting.

r=me; the comment below is just me wondering out loud what the right thing to do is.  Interested in your thoughts here.

::: mfbt/Vector.h
@@ +779,5 @@
>     * N.B. This call assumes that there are no uninitialized elements in the
> +   *      passed range [aP, aP + aLength). The range [aP + aLength, aP +
> +   *      aCapacity) must be allocated uninitialized memory.
> +   */
> +  void replaceRawBuffer(T* aP, size_t aLength, size_t aCapacity);

Do we think an overload is sufficient here, or should this be called something like replaceOversizedRawBuffer to clue the reader in to what's going on?

Attachment #8908193 - Flags: review?(nfroyd) → review+

Luke Wagner [:luke]

Assignee

Comment 13

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #12)
> Do we think an overload is sufficient here, or should this be called
> something like replaceOversizedRawBuffer to clue the reader in to what's
> going on?

That's a good question.  Honestly, I was surprised replaceRawBuffer() *didn't* take a capacity, so it seems pretty natural to me to take 3 args (viewing the other overload as sugar), but maybe that's just b/c I wrote Vector all those years ago and think in terms of [begin, length, capacity] :)

Luke Wagner [:luke]

Assignee

Comment 14

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #11)
> I am unsure about the statement "realloc doesn't accept a target length of
> 0".  Would it be more accurate to say that "realloc frees the target buffer
> if the length is 0, which the code was not expecting"?

I was just too-briefly restating this assert that I hit:
  https://hg.mozilla.org/mozilla-central/file/593158cd4910/js/public/Utility.h#l253

> so mReserved should be >= mLength, and this assignment does not respect that
> invariant in the first case, right?  (The invariant is explicitly spelled
> out in the documentation comment for the reserved() method.)
> 
> It's not clear to me that mReserved is everywhere updated to respect this
> invariant, so I am certainly willing to admit my misunderstanding of this
> code.

Thanks!  I hadn't noticed that invariant.  I see we do update mReserved when mLength is changed, e.g., by growBy() so that certainly does appear to be the intention.  That makes that line simpler :)

Luke Wagner [:luke]

Assignee

Comment 15

•

7 years ago

Attached patch fix-vector-resize-to-fit — Details — Splinter Review

Attachment #8908191 - Attachment is obsolete: true

Attachment #8908333 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 16

•

7 years ago

Comment on attachment 8908333 [details] [diff] [review]
fix-vector-resize-to-fit

Review of attachment 8908333 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you!

Attachment #8908333 - Flags: review?(nfroyd) → review+

Yury Delendik (:yury)

Comment 17

•

7 years ago

Comment on attachment 8908194 [details] [diff] [review]
fix-hash

Review of attachment 8908194 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

Attachment #8908194 - Flags: review?(ydelendik) → review+

Luke Wagner [:luke]

Assignee

Comment 18

•

7 years ago

Attached patch rm-asmMergeWith — Details — Splinter Review

Ok, I ended up doing the simplistic thing for ARM which leaves some possible performance gains for the future (ARM's masm's segmented-chunk buffer cannot be so-trivially swap()ed with a Vector<uint8_t> as x86-shared's can).

I'd suggest reading WasmGenerator.h to see the new types first: the basic idea is that a CompileTask now contains N FuncCompileInput's which are compiled into a single CompiledCode struct which contains everything that used to go in a MacroAssembler.  A key optimization is that CompiledCode swaps its empty-but-reserved Vectors in *and* out of the temporary MacroAssembler.

I'll wait on bug 1380033 to land so I can replace the TODO.

Unfortunately, I wasn't able to make func-index == code-range-index in this patch b/c the FarJumpIslands inserted by patchCallSites() get in the way.  However, I realized I could have two VectorCodeRange (1 for functions, 1 for everything else) in some future patch.

Attachment #8907935 - Attachment is obsolete: true

Attachment #8908761 - Flags: review?(lhansen)

Lars T Hansen [:lth]

Comment 19

•

7 years ago

Comment on attachment 8908196 [details] [diff] [review]
split-callsite-and-target

Review of attachment 8908196 [details] [diff] [review]:
-----------------------------------------------------------------

Seems reasonable enough.

Attachment #8908196 - Flags: review?(lhansen) → review+

Luke Wagner [:luke]

Assignee

Updated

•

7 years ago

Blocks: 1401827

Lars T Hansen [:lth]

Comment 20

•

7 years ago

Comment on attachment 8908761 [details] [diff] [review]
rm-asmMergeWith

Review of attachment 8908761 [details] [diff] [review]:
-----------------------------------------------------------------

OK. This patch is massive so the review is a lot more "no obvious bugs" than "obviously no bugs".

::: js/src/wasm/WasmGenerator.h
@@ +201,5 @@
>      const CodeRange& funcCodeRange(uint32_t funcIndex) const;
> +    const CodeRange& trapCodeRange(Trap trap) const;
> +    const CodeRange& debugTrapCodeRange() const;
> +
> +    bool linkCallSites();

The removals of MOZ_MUST_USE here and below (many instances) seems unmotivated.

::: js/src/wasm/WasmTypes.cpp
@@ +677,5 @@
>      funcBeginToNormalEntry_(0),
>      funcBeginToTierEntry_(0),
>      kind_(kind)
>  {
> +    MOZ_ASSERT_IF(!oom::IsThreadSimulatingOOM(), begin_ <= end_);

The guard on the assertion (also many cases below) is a little strange.  If this can fail during simulated OOM (plausible) then why not during real OOM, and if so, are the effects benign?

Attachment #8908761 - Flags: review?(lhansen) → review+

Luke Wagner [:luke]

Assignee

Comment 21

•

7 years ago

(In reply to Lars T Hansen [:lth] from comment #20)
Thanks!  Sorry it was so big.  (Some of the intermediate iterations (that were slowdowns) were even more drastic.)

> > +    bool linkCallSites();
> 
> The removals of MOZ_MUST_USE here and below (many instances) seems
> unmotivated.

I'm certainly open to reverting, but my thinking here was that MOZ_MUST_USE is most useful as a tool in common utilities to ensure return values are checked and, within the private confines of ModuleGenerator, it wasn't providing much value while at the same time making all the signatures look gross and shouty and shifted over 13 columns.  WDYT?

> > +    MOZ_ASSERT_IF(!oom::IsThreadSimulatingOOM(), begin_ <= end_);
> 
> The guard on the assertion (also many cases below) is a little strange.  If
> this can fail during simulated OOM (plausible) then why not during real OOM,
> and if so, are the effects benign?

This just avoided a bunch of otherwise-unnecessary "if (masm.oom()) return false" checks right before codeRanges.emplaceBack() calls (that would hit the assert on OOM but were benign).  But I don't have a strong opinion here either; probably it is cleaner to keep asserts tight and avoid confusion for future readers, so I'll revert.

Lars T Hansen [:lth]

Comment 22

•

7 years ago

(In reply to Luke Wagner [:luke] from comment #21)
> (In reply to Lars T Hansen [:lth] from comment #20)
> Thanks!  Sorry it was so big.  (Some of the intermediate iterations (that
> were slowdowns) were even more drastic.)
> 
> > > +    bool linkCallSites();
> > 
> > The removals of MOZ_MUST_USE here and below (many instances) seems
> > unmotivated.
> 
> I'm certainly open to reverting, but my thinking here was that MOZ_MUST_USE
> is most useful as a tool in common utilities to ensure return values are
> checked and, within the private confines of ModuleGenerator, it wasn't
> providing much value while at the same time making all the signatures look
> gross and shouty and shifted over 13 columns.  WDYT?

Your call.   Myself, I like to use 'MOZ_MUST_USE bool' everywhere as a marker for 'must check the result'; maybe eventually we'll move to something Result-based here.

> > > +    MOZ_ASSERT_IF(!oom::IsThreadSimulatingOOM(), begin_ <= end_);
> > 
> > The guard on the assertion (also many cases below) is a little strange.  If
> > this can fail during simulated OOM (plausible) then why not during real OOM,
> > and if so, are the effects benign?
> 
> This just avoided a bunch of otherwise-unnecessary "if (masm.oom()) return
> false" checks right before codeRanges.emplaceBack() calls (that would hit
> the assert on OOM but were benign).  But I don't have a strong opinion here
> either; probably it is cleaner to keep asserts tight and avoid confusion for
> future readers, so I'll revert.

Hm, sounds related to the hack that I introduced earlier that defaulted the values of some of these structures so that these asserts would pass.  I did that to avoid OOM issues too.

Pulsebot

Comment 23

•

7 years ago

Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7b360a7ded8d
Baldr: simplify InternalLink code (r=lth)
https://hg.mozilla.org/integration/mozilla-inbound/rev/55b6ce9eb26d
Vector::podResizeToFit should update reserved (r=froydnj)
https://hg.mozilla.org/integration/mozilla-inbound/rev/fd601c9bcd00
Allow specifying capacity to Vector::replaceRawBuffer (r=froydnj)
https://hg.mozilla.org/integration/mozilla-inbound/rev/1f23975f1d1f
Baldr: Only compute bytecode hash in debug mode (r=yury)
https://hg.mozilla.org/integration/mozilla-inbound/rev/30f8a995d4c1
Baldr: Split CallSiteAndTarget into CallSite and CallSiteTarget (r=lth)
https://hg.mozilla.org/integration/mozilla-inbound/rev/6ff0f49d6e25
Baldr: don't transport MacroAssemblers between helper and ModuleGenerator threads (r=lth)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 24

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/7b360a7ded8d
https://hg.mozilla.org/mozilla-central/rev/55b6ce9eb26d
https://hg.mozilla.org/mozilla-central/rev/fd601c9bcd00
https://hg.mozilla.org/mozilla-central/rev/1f23975f1d1f
https://hg.mozilla.org/mozilla-central/rev/30f8a995d4c1
https://hg.mozilla.org/mozilla-central/rev/6ff0f49d6e25

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox58: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla58

Pulsebot

Comment 25

•

7 years ago

Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f26c09f0f737
fix JS_CODEGEN_NONE (r=npotb)

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 26

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f26c09f0f737

yuyin

Comment 27

•

7 years ago

Attached patch 0001-MIPS-Fix-build-error.patch — Details — Splinter Review

hi, mips build failed after this patch, I have created a patch, can you help to review this? thank you.

Attachment #8913179 - Flags: review?(luke)

Luke Wagner [:luke]

Assignee

Comment 28

•

7 years ago

Comment on attachment 8913179 [details] [diff] [review]
0001-MIPS-Fix-build-error.patch

Review of attachment 8913179 [details] [diff] [review]:
-----------------------------------------------------------------

Oh right, sorry for the churn.

Attachment #8913179 - Flags: review?(luke) → review+

yuyin

Comment 29

•

7 years ago

thank you. 

Can somebody help to checkin this patch? I can not modify this to add a checkin needed flag.

Pulsebot

Comment 30

•

7 years ago

Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/238ad8ca8df0
MIPS: Add some missing functions to fix build errors. r=luke

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 31

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/238ad8ca8df0

Benjamin Bouvier [:bbouvier] (inactive)

Updated

•

7 years ago

Depends on: 1406889

yuyin

Comment 32

•

7 years ago

hi,
this patch:

https://hg.mozilla.org/mozilla-central/rev/6ff0f49d6e25
use

+AssemblerMIPSShared::appendRawCode(const uint8_t* code, size_t numBytes)
replace
-AssemblerMIPSShared::asmMergeWith(const AssemblerMIPSShared& other)

and it remove a function call:
addLongJump(BufferOffset(size() + off));

so when append a new wasm code, it would not handle the long jumps？

so after that if move the codes by call executableCopy() will failed on mips platform? because we can not update the new target address?

You need to log in before you can comment on or make changes to this bug.