Closed
Bug 1333447
Opened 7 years ago
Closed 7 years ago
Wasm baseline: AssemblerBufferWithConstantPools::allocEntry is very slow
Categories
(Core :: JavaScript Engine: JIT, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla54
Tracking | Status | |
---|---|---|
firefox54 | --- | fixed |
People
(Reporter: lth, Assigned: lth)
References
Details
Attachments
(2 files, 2 obsolete files)
4.94 KB,
text/plain
|
Details | |
13.58 KB,
patch
|
nbp
:
review+
|
Details | Diff | Splinter Review |
Profiling suggested that nearly 15% of the running time of the baseline compiler was spent in AssemblerBufferWithConstantPools::allocEntry and ABWCP::insertEntryForwards (which is called by allocEntry). This is based on compiling AngryBots, with --no-threads, and this is with the profiler including the time to read the file from disk, so the fraction spent in these functions is actually higher. The problem seems to be that especially allocEntry is very general, takes a ton of parameters, and uses many registers, so there's a lot of work on both sides of the call boundary. insertEntryForwards is similarly general. In practice, most of the calls are on behalf of putInt() putting an ALU instruction. This is a particularly simple case and putInt can make the call to allocEntry a special case by checking some preconditions quickly and then just depositing the data. Doing so reduces the compilation time from 2035ms to 1870ms, or about 7%. The attached patch will need to be vetted further but it seems worth it to pursue this. Even multi-threaded compilation speeds up, though not by that amount. (It's possible to observe a speedup with Ion as well, for single-threaded compilation.)
Assignee | ||
Comment 1•7 years ago
|
||
Raw benchmark results, for posterity.
Assignee | ||
Comment 2•7 years ago
|
||
Oh, there's gold in them thar hills! By further streamlining (a little) the path to just put an instruction when there's space for it, which is the common case, the single-thread compilation time drops to 1730ms, for a total reduction (just from this modest change) of 15%. Even then, AssemblerBuffer and assembler functions dominate the profile, and it would not be surprising if there are other fast paths that should get some attention. (I'll need to put this aside for a couple weeks but I'll attach the WIP patch.)
Assignee | ||
Comment 3•7 years ago
|
||
Attachment #8829896 -
Attachment is obsolete: true
Comment 4•7 years ago
|
||
I am moving this to P1. It looks important enough and we have a rough patch. Feel free to disagree with this promotion ;).
Priority: P3 → P1
Assignee | ||
Comment 5•7 years ago
|
||
I'll claim it, then, since I probably care the most about it and have hardware that's useful for working on it. I'll be sure to test on some other hardware too.
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Comment 6•7 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #2) > Oh, there's gold in them thar hills! I do recall stepping through this path on occasion and thinking "surely, we can't be doing all this work on *every* instruction...".
Assignee | ||
Comment 7•7 years ago
|
||
This adds a single fast path through the assembler for when we want to just put an instruction, which is nearly always. This speeds up the compilation of AngryBots on my ARM dev board (compiling with --wasm-always-baseline --no-threads, which highlights assembly cost the most) by 17%, reducing the time from 2010ms to 1725ms and removing most assembler work from the top of the profile. One hopes other compilation jobs will benefit as well. The one part of the change that I would especially like an opinion is whether the the new fast-path guard in putInt() in IonAssemblerBufferWithConstantPools.h is correct. This was lifted out of allocEntry(), and then simplified, as the comment indicates. The effect of the simplification is to delay certain error reporting, which should be fine, but I want to make sure it doesn't compromise correctness in the process.
Attachment #8830229 -
Attachment is obsolete: true
Attachment #8841675 -
Flags: review?(nicolas.b.pierron)
Comment 8•7 years ago
|
||
Comment on attachment 8841675 [details] [diff] [review] bug1333447-fast-paths-v2.patch Review of attachment 8841675 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/arm/Assembler-arm.cpp @@ -1682,2 @@ > spew(m_buffer.getInstOrNull(offs)); > -#endif Can we move these function calls to the header as well? I do not really like the idea of having the same code written twice as this is a high risk that we will only check one and not the other. If not, please add a comment to recall the next persons who is changing this code to update both versions of this function. ::: js/src/jit/shared/IonAssemblerBuffer.h @@ +139,5 @@ > + MOZ_ALWAYS_INLINE > + void putU32Aligned(uint32_t value) { > + MOZ_ASSERT(bytelength_ + 4 <= SliceSize); > + MOZ_ASSERT((bytelength_ & 3) == 0); > + *reinterpret_cast<uint32_t*>(&instructions[bytelength_]) = value; nit: Either assert the alignment of &instruction[0] in addition to the previous checks, or the alignment of &instruction[byteLength_]. @@ +375,5 @@ > // bounds of the buffer. Use |getInstOrNull()| if |off| may be unassigned. > Inst* getInst(BufferOffset off) { > const int offset = off.getOffset(); > + // This function is hot, do not make the next line a RELEASE_ASSERT. > + MOZ_ASSERT(off.assigned() && offset >= 0 && (unsigned)offset < size()); existing-nit: s/(unsigned)offset/unsigned(offset)/ ::: js/src/jit/shared/IonAssemblerBufferWithConstantPools.h @@ -832,3 @@ > BufferOffset allocEntry(size_t numInst, unsigned numPoolEntries, > - uint8_t* inst, uint8_t* data, PoolEntry* pe = nullptr, > - bool markAsBranch = false) note: markAsBranch-noop removal confused me while doing this review and would have better been in a different patch. Next time maybe ;) @@ +891,5 @@ > + // which is set to zero when nopFill_ is set. > + // > + // We assume that we don't have to check this->oom() if there is space to > + // insert a plain instruction; there will always come a later time when it will be > + // checked anyway. This indeed sounds correct. Looking at places which can produce these OOM only highlight cases where we either allocate a new slice, or a pool entry. The only case which is not covered here is when we register a new branch such that we can later add it to the pool, and we would indeed push instruction, but the flag should still be present, thus preventing us to commit the code to any executable page.
Attachment #8841675 -
Flags: review?(nicolas.b.pierron) → review+
Assignee | ||
Comment 9•7 years ago
|
||
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4693c2d4cd6a791f3a58389b656479518f0fd484 Pushed with the wrong bug number (sigh): https://hg.mozilla.org/integration/mozilla-inbound/rev/7453899cfe44
Comment 10•7 years ago
|
||
Pushed by cbook@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c807c80d954a ARM assembler: fast paths for putting simple instructions. r=nbp
Assignee | ||
Comment 11•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/bcd5db7ac471cad3370ab332e3425c3cbac9bf15 Bug 1333447 - ARM assembler: fast paths for putting simple instructions. r=nbp
Comment 12•7 years ago
|
||
Backout by cbook@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/bb9d82f04cf6 Backed out changeset c807c80d954a wrong patch landed
Comment 13•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/bcd5db7ac471
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox54:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
You need to log in
before you can comment on or make changes to this bug.
Description
•