Closed
Bug 598839
Opened 14 years ago
Closed 14 years ago
JM: Faster x64 Syncing
Categories
(Core :: JavaScript Engine, defect)
Tracking
RESOLVED
FIXED
People
(Reporter: sstangl, Unassigned)
References
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(1 file, 4 obsolete files)
24.24 KB, patch
dvander: review+
Details | Diff | Splinter Review
From bug 598491 we have storeValueFromComponents(), which emits more efficient whole-Value storing code on x64. Once bug 591836 settles we can use this function to optimize x64 syncing, which has a large effect on performance, as previously measured in bug 587444.
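As a rough illustration of why whole-Value stores matter here, the sketch below models the two boxing layouts in standalone C++. All type and function names are invented for illustration; they are not the real SpiderMonkey definitions. The point is only that NUNBOX32 syncs a Value with two 32-bit stores, while PUNBOX64 can combine the tag and payload in a scratch register and emit a single 64-bit store, which is the shape storeValueFromComponents() exploits.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only: under a PUNBOX64-style layout, a boxed Value
// packs the type tag in the upper 32 bits and the payload in the lower 32.
typedef uint64_t Value;

// NUNBOX32-style sync: the data and type halves are written separately,
// costing two stores per Value.
static void storeComponents32(uint32_t tag, uint32_t payload, uint32_t slot[2]) {
    slot[0] = payload; // data half
    slot[1] = tag;     // type half
}

// PUNBOX64-style sync: combine tag and payload in a scratch "register",
// then write the whole Value with one 64-bit store.
static void storeComponents64(uint32_t tag, uint32_t payload, Value *slot) {
    Value scratch = (static_cast<Value>(tag) << 32) | payload;
    *slot = scratch;
}
```

Both paths produce the same memory image; the 64-bit path just halves the number of stores emitted per sync.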
Reporter
Comment 1•14 years ago
=============================================================================
** TOTAL **:           1.040x as fast    581.6ms +/- 0.3%   559.3ms +/- 0.4%
=============================================================================
  3d:                  1.068x as fast    116.7ms +/- 1.0%   109.3ms +/- 1.4%
    cube:              1.121x as fast     35.7ms +/- 1.5%    31.8ms +/- 0.6%
    morph:             -                  44.9ms +/- 1.3%    44.3ms +/- 1.6%
    raytrace:          1.086x as fast     36.1ms +/- 1.2%    33.3ms +/- 2.6%
  access:              1.062x as fast     79.8ms +/- 0.9%    75.1ms +/- 0.9%
    binary-trees:      -                  11.3ms +/- 1.4%    11.3ms +/- 2.1%
    fannkuch:          1.042x as fast     31.7ms +/- 1.0%    30.5ms +/- 1.6%
    nbody:             1.127x as fast     23.2ms +/- 1.1%    20.6ms +/- 0.9%
    nsieve:            1.059x as fast     13.6ms +/- 1.2%    12.8ms +/- 1.1%
  bitops:              1.079x as fast     39.9ms +/- 0.5%    36.9ms +/- 0.6%
    3bit-bits-in-byte: 1.085x as fast      4.0ms +/- 1.4%     3.7ms +/- 3.9%
    bits-in-byte:      1.106x as fast     14.7ms +/- 0.9%    13.3ms +/- 1.1%
    bitwise-and:       ??                  7.7ms +/- 1.7%     7.8ms +/- 1.7%
    nsieve-bits:       1.107x as fast     13.4ms +/- 1.0%    12.1ms +/- 0.8%
  controlflow:         1.052x as fast      6.2ms +/- 2.0%     5.9ms +/- 2.2%
    recursive:         1.052x as fast      6.2ms +/- 2.0%     5.9ms +/- 2.2%
  crypto:              1.029x as fast     41.4ms +/- 1.0%    40.2ms +/- 0.7%
    aes:               -                  22.2ms +/- 1.0%    21.9ms +/- 1.0%
    md5:               1.048x as fast     11.6ms +/- 1.5%    11.1ms +/- 0.6%
    sha1:              1.051x as fast      7.6ms +/- 2.2%     7.3ms +/- 1.9%
  date:                -                  61.5ms +/- 0.7%    60.9ms +/- 0.8%
    format-tofte:      -                  26.0ms +/- 1.3%    25.7ms +/- 1.0%
    format-xparb:      -                  35.5ms +/- 0.7%    35.2ms +/- 1.0%
  math:                1.062x as fast     56.7ms +/- 0.4%    53.4ms +/- 1.0%
    cordic:            1.132x as fast     20.5ms +/- 0.7%    18.1ms +/- 0.9%
    partial-sums:      -                  23.6ms +/- 0.8%    23.3ms +/- 1.8%
    spectral-norm:     1.056x as fast     12.5ms +/- 1.1%    11.9ms +/- 1.2%
  regexp:              -                  22.4ms +/- 0.6%    22.4ms +/- 0.7%
    dna:               -                  22.4ms +/- 0.6%    22.4ms +/- 0.7%
  string:              1.012x as fast    157.1ms +/- 0.4%   155.2ms +/- 0.4%
    base64:            ??                 14.3ms +/- 1.0%    14.5ms +/- 1.7%
    fasta:             1.032x as fast     25.8ms +/- 0.5%    25.0ms +/- 0.6%
    tagcloud:          1.011x as fast     42.3ms +/- 0.5%    41.9ms +/- 0.5%
    unpack-code:       -                  57.6ms +/- 0.6%    57.3ms +/- 0.6%
    validate-input:    1.028x as fast     17.0ms +/- 0.8%    16.6ms +/- 1.1%
Reporter
Comment 2•14 years ago
This patch introduces a function ensureFeSynced() that knows how to efficiently sync a full Value on x64, with equivalent syncing on x86 to what exists now. The patch then integrates this function throughout the syncing paths, so that x64 sees the performance improvement posted above. Most of the code is shared between x86 and x64.
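A minimal sketch of the bookkeeping this buys, assuming per-part synced flags as in FrameState but with every name simplified and invented for illustration: when neither half of an entry is synced, a PUNBOX64 build can write the whole Value with one store, while each stale half otherwise costs a store of its own.

```cpp
#include <cassert>

// Toy model of a frame entry (illustrative, not the real FrameState API):
// each half of a Value tracks whether its in-memory copy is current.
struct Part {
    bool synced;
};

struct FrameEntry {
    Part type, data;
};

// Count the stores a sync would emit: if both halves are stale, a
// PUNBOX64 build writes the whole Value with a single store; otherwise
// each stale half costs one store of its own.
static int storesForSync(const FrameEntry &fe, bool punbox64) {
    if (!fe.type.synced && !fe.data.synced)
        return punbox64 ? 1 : 2;
    return (!fe.type.synced ? 1 : 0) + (!fe.data.synced ? 1 : 0);
}
```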
Attachment #479822 - Flags: review?(dvander)
Reporter
Comment 3•14 years ago
Same as previous patch, except without duplicate syncing in syncAndKill().
Attachment #479822 - Attachment is obsolete: true
Attachment #480000 - Flags: review?(dvander)
Attachment #479822 - Flags: review?(dvander)
Reporter
Comment 4•14 years ago
Same as previous patch, but fixes overly ambitious syncing on x86 in sync() and syncAndKill() during the register-syncing phases. It is difficult to determine whether x86 performance is affected by this patch -- the results are always within the noisy range.
Attachment #480000 - Attachment is obsolete: true
Attachment #480430 - Flags: review?(dvander)
Attachment #480000 - Flags: review?(dvander)
Comment 5•14 years ago

Comment on attachment 480430 [details] [diff] [review]
Restore x86 perf in ::sync() and ::syncAndKill().

This patch looks sound, but given the delicacy of this area and the chance to clean up some things, some comments below:

> inline void
>+FrameState::ensureFeSynced(const FrameEntry *fe, Address to, Assembler &masm) const

This looks like it's designed to sync a FrameEntry to a slot on the frame, as opposed to FrameState::storeTo, which syncs to an arbitrary address. From looking at the rest of this patch, ensureFeSynced is only passed an fe/to mismatch in one place: storeLocal. I think we can factor this better. Remove the |to| parameter and always expect the actual |fe|, not the backing store. Or perhaps better, keep the address and assert that it's addressOf(fe).

>+{
>+    const FrameEntry *orig = fe;
>+    if (fe->isCopy())
>+        fe = fe->copyOf();

Keeping in line with canonical naming, |fe| should stay as-is and the backing store assigned to "backing".

>+
>+    JS_ASSERT_IF(!orig->type.synced(), fe->type.inRegister() || fe->type.isConstant());
>+    JS_ASSERT_IF(!orig->data.synced(), fe->data.inRegister() || fe->data.isConstant());

These asserts are tautologies, so you can remove them.

>+    /* If syncing to a different address, always perform sync. */
>+    if (addressOf(orig).base != to.base || addressOf(orig).offset != to.offset) {

This block should go away. (If for no other reason, we should shy away from address snooping.)

> inline void
>@@ -507,18 +543,25 @@ FrameState::syncData(const FrameEntry *fe, Address to, Assembler &masm) const
>          !fe->data.synced());
>     JS_ASSERT(fe->data.inRegister() || fe->data.isConstant());
>
>-    if (fe->data.isConstant()) {
>+#if defined JS_PUNBOX64

Good changes, but we need a follow-up bug to split up FrameState. This whole function looks like it wants to be box format-specific. `sync() const` is almost totally box-specific too.

>-    if ((!fe->type.synced() && !backing->type.inRegister()) ||
>-        (!fe->data.synced() && !backing->data.inRegister())) {
>+    /* Fall back to a slower sync algorithm if load required. */
>+    if ((!fe->type.synced() && backing->type.inMemory()) ||
>+        (!fe->data.synced() && backing->data.inMemory())) {

Nice - good catch, this could avoid some fallback into syncFancy().

>+    /* Sync now, unless synced above. */

Change this comment to something explaining which phase of the algorithm we're in. Maybe: "Any register-backed FE parts have been synced up-front. If a part still needs syncing, it is either a copy or a constant."

>+#if defined JS_PUNBOX64
>+    if (!fe->type.inRegister() && !fe->data.inRegister())
>+        ensureFeSynced(fe, addressOf(fe), masm);

Comment in here: "If either part of a PUNBOX FE was register-backed, then the entire value was already synced."

>+#elif defined JS_NUNBOX32
>+    if (!fe->data.synced() && !fe->data.inRegister()) {
>+        syncData(backing, addressOf(fe), masm);
>+        if (fe->isConstant())
>+            continue;

Not your code, but please add a :FIXME: bug 601657.

>
>@@ -530,6 +527,27 @@ FrameState::syncAndKill(Registers kill, Uses uses, Uses ignore)
>
>     JS_ASSERT(fe->isTracked());
>
>+#if defined JS_PUNBOX64
>+    /* Sync entire FE to prevent loads. */
>+    ensureFeSynced(fe, addressOf(fe), masm);
>+
>+    if (!fe->data.synced())
>+        fe->data.sync();
>+    if (!fe->type.synced())
>+        fe->type.sync();

It can wait for a follow-up patch, but I expect we'll want two versions of ensureFeSynced: a const that takes a masm, and a non-const that doesn't. In the latter case, marking of data/type as synced would be done for the caller. Out of curiosity, is there any reason not to use ensureFeSynced() here on x86 as well?

>+#if defined JS_PUNBOX64
>+    /* Attempt to sync both simultaneously. */
>+    if (!fe->isCopy() || !(backing->data.inMemory() || backing->type.inMemory())) {
>+        ensureFeSynced(backing, address, masm);
>+        if (!fe->type.synced())
>+            fe->type.sync();
>+        if (!fe->data.synced())
>+            fe->data.sync();
>+    }
>+#endif

Another case where a mutable ensureFeSynced() would help.

>+    /* If not possible, do a piecewise sync. */
>     if (!fe->data.synced()) {
>-        if (backing != fe && backing->data.inMemory())
>+        if (fe->isCopy() && backing->data.inMemory())
>             tempRegForData(backing);

fe != backing test is faster.

>         syncData(backing, address, masm);
>         fe->data.sync();
>         if (fe->isConstant() && !fe->type.synced()) {
>             fe->type.sync();

Bleh, another :FIXME: bug 601657 :)

Actually, I'm semi-convinced this entire path (save for marking killed registers) could be squeezed into a mutable ensureFeSynced. Even better if you made *that* function call into an ensureDataSynced() and ensureTypeSynced(), because...

>+    if (!backing->data.inMemory() && !backing->type.inMemory()) {
>+        ensureFeSynced(backing, addressOf(local), masm);
>+    } else {
>+        if (!local->data.synced()) {
>+            if (backing->data.inMemory())
>+                tempRegForData(backing);
>+            syncData(backing, addressOf(local), masm);
>+        }
>+        if (!local->type.synced()) {
>+            if (backing->type.inMemory())
>+                tempRegForType(backing);
>+            syncType(backing, addressOf(local), masm);
>+        }

All of this could go away. So major bonus points if you go that route; otherwise, we can leave it for a follow-up.

>+    /* TODO: It is unclear why this call is here. */
>     if (closed)
>         forgetEntry(local);

Closed variables cannot be optimized, so their known types must be forgotten. Though I think, as we discussed on IRC, this one is stray and the work is already done by resetSynced() later. Either remove the TODO or the block ;)

BTW: Please make sure to test with --jitflags=m,md on both x86/x64, since with debug mode FS hits very, very different code paths.
Attachment #480430 - Flags: review?(dvander)
Reporter
Comment 6•14 years ago
This is the same as the above patch, plus dvander's fixes, plus rebasing on top of bug 601657. From the previous patch, the following code is new:

- What used to be called sync{Type,Data} are now called ensure{Type,Data}Synced. The 'ensure' family of functions are const. The new 'sync' family of functions (sync{Type,Data,Fe}) may allocate registers and will set the synced flags of the relevant FE appropriately.

- No functions in the 'ensure' family take an Address parameter, as it was never actually necessary. (We could also get rid of the masm parameter, but then we would have to eliminate it from the entirety of FrameState, which can be done in a separate patch.)

Passes trace-tests on x86 and x64 with --jitflags=m,md.
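The const/non-const split described in this comment can be sketched in a few lines. This is a hypothetical simplification, not the real FrameState methods: the 'ensure' family is const and only emits stores, while the 'sync' family emits the same stores and then updates the synced flags for the caller. Here the "emitted" stores are just counted rather than assembled.

```cpp
#include <cassert>

// Toy frame-entry parts (illustrative names, not the real FrameState API).
struct Part {
    bool synced_;
    bool synced() const { return synced_; }
    void sync() { synced_ = true; }
};

struct FrameEntry {
    Part type, data;
};

// 'ensure' family: const, emits a store per stale part, leaves the
// bookkeeping untouched. Returns the number of stores "emitted".
static int ensureFeSynced(const FrameEntry &fe) {
    int stores = 0;
    if (!fe.type.synced())
        stores++;
    if (!fe.data.synced())
        stores++;
    return stores;
}

// 'sync' family: non-const, emits the same stores and then records that
// both parts are now synced.
static int syncFe(FrameEntry &fe) {
    int stores = ensureFeSynced(fe);
    if (!fe.type.synced())
        fe.type.sync();
    if (!fe.data.synced())
        fe.data.sync();
    return stores;
}
```

The payoff is at call sites like syncAndKill(): callers that must also update flags use syncFe(), while pure code-emission paths keep the const ensureFeSynced().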
Attachment #480430 - Attachment is obsolete: true
Attachment #481151 - Flags: review?(dvander)
Reporter
Comment 7•14 years ago
For encouragement, the newest x64 SS-0.9.1 results, which are slightly improved (as a result of better code in syncAndKill(), from syncFe()'s existence):

=============================================================================
** TOTAL **:           1.056x as fast    581.8ms +/- 0.5%   550.7ms +/- 0.4%
=============================================================================
  3d:                  1.065x as fast    116.9ms +/- 1.2%   109.7ms +/- 1.4%
    cube:              1.099x as fast     35.5ms +/- 1.5%    32.3ms +/- 1.5%
    morph:             -                  44.8ms +/- 1.2%    44.5ms +/- 1.6%
    raytrace:          1.113x as fast     36.7ms +/- 1.7%    32.9ms +/- 2.3%
  access:              1.088x as fast     79.7ms +/- 0.7%    73.2ms +/- 0.8%
    binary-trees:      1.035x as fast     10.1ms +/- 1.6%     9.7ms +/- 2.1%
    fannkuch:          1.076x as fast     32.3ms +/- 0.8%    30.1ms +/- 1.3%
    nbody:             1.142x as fast     23.2ms +/- 1.1%    20.3ms +/- 0.9%
    nsieve:            1.073x as fast     14.1ms +/- 1.5%    13.1ms +/- 0.7%
  bitops:              1.139x as fast     40.2ms +/- 1.4%    35.3ms +/- 0.6%
    3bit-bits-in-byte: 1.155x as fast      3.9ms +/- 2.8%     3.4ms +/- 4.1%
    bits-in-byte:      1.24x as fast      15.1ms +/- 1.4%    12.2ms +/- 0.9%
    bitwise-and:       -                   7.8ms +/- 2.4%     7.7ms +/- 1.7%
    nsieve-bits:       1.118x as fast     13.5ms +/- 1.5%    12.0ms +/- 0.5%
  controlflow:         1.173x as fast      6.2ms +/- 2.4%     5.3ms +/- 2.5%
    recursive:         1.173x as fast      6.2ms +/- 2.4%     5.3ms +/- 2.5%
  crypto:              1.111x as fast     42.6ms +/- 1.0%    38.3ms +/- 0.6%
    aes:               1.099x as fast     23.2ms +/- 1.2%    21.1ms +/- 1.0%
    md5:               1.130x as fast     11.7ms +/- 1.4%    10.3ms +/- 1.3%
    sha1:              1.119x as fast      7.7ms +/- 2.0%     6.9ms +/- 1.4%
  date:                1.049x as fast     64.6ms +/- 0.8%    61.6ms +/- 0.5%
    format-tofte:      1.047x as fast     27.0ms +/- 1.2%    25.8ms +/- 0.6%
    format-xparb:      1.050x as fast     37.6ms +/- 0.7%    35.8ms +/- 0.6%
  math:                1.098x as fast     56.7ms +/- 0.5%    51.6ms +/- 0.8%
    cordic:            1.171x as fast     20.1ms +/- 0.5%    17.2ms +/- 0.6%
    partial-sums:      1.032x as fast     24.0ms +/- 1.2%    23.3ms +/- 1.5%
    spectral-norm:     1.121x as fast     12.6ms +/- 1.1%    11.2ms +/- 1.1%
  regexp:              -                  22.3ms +/- 0.6%    22.2ms +/- 0.5%
    dna:               -                  22.3ms +/- 0.6%    22.2ms +/- 0.5%
  string:              ??                152.5ms +/- 0.4%   153.3ms +/- 0.6%
    base64:            1.047x as fast     13.8ms +/- 0.9%    13.2ms +/- 0.8%
    fasta:             1.039x as fast     26.0ms +/- 0.6%    25.0ms +/- 1.3%
    tagcloud:          1.008x as fast     41.1ms +/- 0.4%    40.8ms +/- 0.4%
    unpack-code:       *1.075x as slow*   54.7ms +/- 0.9%    58.8ms +/- 1.0%
    validate-input:    1.089x as fast     16.9ms +/- 1.2%    15.5ms +/- 0.9%
Comment 8•14 years ago

Comment on attachment 481151 [details] [diff] [review]
Faster x64 Syncing, Second Take

>+FrameState::ensureFeSynced(const FrameEntry *fe, Assembler &masm) const
...
>+    /* If we can, sync the type and data in one go. */
>+    if (!fe->data.synced() && !fe->type.synced()) {

It seems like it's always more advantageous to avoid a load. Should this be special-cased to x64, for something like:

if (!dataSynced || !typeSynced && (constant || ((typeReg || typeConst) && dataReg))

?

>+FrameState::ensureTypeSynced(const FrameEntry *fe, Assembler &masm) const
...
>+#if defined JS_PUNBOX64
>+    /* Attempt to store the entire Value, to prevent a load. */
>+    if (backing->data.inRegister()) {
>+        RegisterID dreg = backing->data.reg();
>+        if (backing->isConstant())
>+            masm.storeValue(backing->getValue(), to);

This backing->isConstant() case is unreachable. Should be joisted like ensureDataSynced()?

>+    if (JS_UNLIKELY(needTypeReg && needDataReg)) {
>+        /* Use ValueReg to do a whole-Value mem-to-mem move. */
>+        JS_ASSERT(backing != fe);

Explain this assert - maybe something like "Registers are only needed for memory-to-memory moves, which can only occur for a copy backed by memory."

>+inline void
>+FrameState::syncType(FrameEntry *fe, Assembler &masm)

No need to take in a masm, I think. Helps distinguish it from the const case too.

>+inline void
>+FrameState::syncData(FrameEntry *fe, Assembler &masm)

Ditto.

>+    /*
>+     * Likewise, storeLocal() may have set this FE, with a known type,
>+     * to be a copy of another FE, which has an unknown type.
>+     * Just forget the type, since the backing is used in all cases.
>+     */
>+    if (fe->isCopy()) {
>+        fe->type.invalidate();
>+        return;

I think this is safe, but we shouldn't do it because LOCALINC is wrong in the first place. The storeLocal can leave an illegal copy in the case of an eval or closed slot. We should just pop and re-push a copy. It looks like callers of storeLocal have to be audited for this case, because !popGuaranteed w/ closed variables must not remain copies. Let's fix LOCALINC in this patch. I'll file a follow-up for the others.

>-    if (closed)
>+    if (closed) {
>+        if (!local->isCopy())
>+            forgetEntry(local);
>         local->resetSynced();

Existing code, but stick a comment in here, like "If the FE can have registers, make sure to free them before resetting."
Attachment #481151 - Flags: review?(dvander)
Comment 9•14 years ago

(In reply to comment #8)
> Should be joisted like ensureDataSynced()?

Oh, let's hoist instead of joist.
Reporter
Comment 10•14 years ago
(In reply to comment #8)
> It seems like it's always more advantageous to avoid a load. Should this be
> special-cased to x64, for something like:
>
> if (!dataSynced || !typeSynced && (constant || ((typeReg || typeConst) &&
> dataReg))
>
> ?

If (!dataSynced && !typeSynced), then we emit sync code with no load. If (dataSynced && !typeSynced), then only ensureTypeSynced() will have any output, and ensureTypeSynced() contains x64-specific code that avoids the load if possible. Likewise for (!dataSynced && typeSynced). So the condition as it exists is OK -- it always avoids a load if possible.

This is implicit and annoying to read: we could equivalently change the code to be:

> if (!fe->data.synced() && !fe->type.synced())
>     ...
> else if (!fe->data.synced())
>     ensureDataSynced(fe, masm);
> else if (!fe->type.synced())
>     ensureTypeSynced(fe, masm);

But the entire point of the 'ensure' family is that we don't have to do such checks. And if we don't do that, we can special-case the 'both' logic to be x64-specific, since it has no effect on x86 except to add bloated, redundant code.

> >+FrameState::ensureTypeSynced(const FrameEntry *fe, Assembler &masm) const
> ...
> >+#if defined JS_PUNBOX64
> >+    /* Attempt to store the entire Value, to prevent a load. */
> >+    if (backing->data.inRegister()) {
> >+        RegisterID dreg = backing->data.reg();
> >+        if (backing->isConstant())
> >+            masm.storeValue(backing->getValue(), to);
>
> This backing->isConstant() case is unreachable. Should be hoisted like
> ensureDataSynced()?

Oops! Good catch.

As discussed on IRC, we'll leave jsop_localinc() changes for a separate patch. Version three forthcoming.
Reporter
Comment 11•14 years ago
Attachment #481151 - Attachment is obsolete: true
Attachment #481420 - Flags: review?(dvander)
Comment 12•14 years ago

Comment on attachment 481420 [details] [diff] [review]
Faster x64 Syncing, Third Take

Awesome patch, good to see a lot of the FS redundancy go away.
Attachment #481420 - Flags: review?(dvander) → review+
Reporter
Comment 13•14 years ago
http://hg.mozilla.org/tracemonkey/rev/08970767d83d
Whiteboard: fixed-in-tracemonkey
Reporter
Comment 14•14 years ago
Was backed out: http://hg.mozilla.org/tracemonkey/rev/ce646a5a0454 http://hg.mozilla.org/tracemonkey/rev/42cf12e5e5e5
Whiteboard: fixed-in-tracemonkey
Reporter
Comment 15•14 years ago
You put your right hand in, http://hg.mozilla.org/tracemonkey/rev/ae702233627c
Whiteboard: fixed-in-tracemonkey
Comment 16•14 years ago
http://hg.mozilla.org/mozilla-central/rev/ae702233627c
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED