Open Bug 832718 Opened 12 years ago Updated 1 year ago

[meta] IonMonkey: Use SIMD to optimize gaussian blur.

Categories

(Core :: JavaScript Engine, enhancement, P3)

x86_64
Linux
enhancement

Tracking

()

REOPENED

People

(Reporter: nbp, Unassigned)

References

(Depends on 3 open bugs, Blocks 1 open bug)

Details

(Keywords: meta)

Attachments

(1 file)

Gaussian blur (Kraken benchmark) is interesting because it contains a loop that repeats the same operation 4 times, and the current MIR optimizations, such as GVN & LICM, are able to move most of the boilerplate out of the loop.

These 4 identical operations are performed on Doubles (the inputs might be Int32), as inferred by type inference, and they can be divided into 2 contiguous, 16-byte-aligned inputs (2 values each from the element vector). The goal would be to merge r & g and b & a into 2 xmm registers, and to manipulate each pair as one entity.

As opposed to our current usage of xmm registers, we would have to introduce a new MIR type to reflect the fact that 2 values / 2 doubles are packed into one FloatRegister. This would have an impact on snapshots and on register allocation, which would have to handle any spill.

How we should schedule such things:

1/ Add SIMD support to the assembler for loading 2 Values, checking whether they are boxed ints, unboxing them, multiplying them, adding them, dividing them, and storing them.

2/ Evaluate the performance gain by hacking the engine and manually substituting the compilation result. This will determine whether the previous set of patches should land, and whether we should continue any investigation and implementation.

3/ As this is a complex integration, I suggest adding a flag to the JS Shell to enable or disable this feature, since adding such a feature might have consequences for multiple aspects of the code.

4.1/ Support a simple test case for copying values with xmm registers. See whether xmm registers are still valuable for a plain copy, or whether we would need to add heuristics to a later optimization phase.

4.2/ Add register allocator & snapshot support for MIRType_PackedValue and MIRType_PackedDouble. Add one instruction for loading and one for storing.

5/ Add the rest of the MIR / LIR, as well as the new phase(s) for converting vectorizable code, such as Gaussian blur, into SIMD-powered assembly.
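To make the packing idea concrete, here is a minimal C++ sketch of what "r & g in one xmm register, b & a in the other" could look like with SSE2 intrinsics. It is not taken from the patches: the function names `accumulateRGBA` and `normalizeRGBA` and the r, g, b, a memory layout are assumptions made for illustration, and the real work would be done by JIT-generated code rather than by intrinsics. A scalar fallback is included for targets without SSE2.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2: two doubles per 128-bit xmm register
#endif

// Hypothetical helper (illustration only): adds one weighted RGBA sample to a
// running sum. `pixel` and `sum` each point to 4 contiguous doubles laid out
// as r, g, b, a.
void accumulateRGBA(const double* pixel, double weight, double* sum) {
#if defined(__SSE2__)
    const __m128d w = _mm_set1_pd(weight);
    // r & g share one register, b & a share the other.
    const __m128d rg = _mm_loadu_pd(pixel);
    const __m128d ba = _mm_loadu_pd(pixel + 2);
    // One packed multiply and one packed add per pair, instead of two each.
    _mm_storeu_pd(sum, _mm_add_pd(_mm_loadu_pd(sum), _mm_mul_pd(rg, w)));
    _mm_storeu_pd(sum + 2, _mm_add_pd(_mm_loadu_pd(sum + 2), _mm_mul_pd(ba, w)));
#else
    // Scalar fallback: four independent multiply-adds.
    for (std::size_t c = 0; c < 4; ++c) sum[c] += pixel[c] * weight;
#endif
}

// Hypothetical helper (illustration only): divides the four accumulated
// channels by the total kernel weight.
void normalizeRGBA(double* sum, double totalWeight) {
    assert(totalWeight != 0.0);
#if defined(__SSE2__)
    const __m128d t = _mm_set1_pd(totalWeight);
    _mm_storeu_pd(sum, _mm_div_pd(_mm_loadu_pd(sum), t));
    _mm_storeu_pd(sum + 2, _mm_div_pd(_mm_loadu_pd(sum + 2), t));
#else
    for (std::size_t c = 0; c < 4; ++c) sum[c] /= totalWeight;
#endif
}
```

The sketch uses unaligned loads and stores so that it works on any buffer. If the inputs really are 16-byte aligned, as described above, the aligned variants (`_mm_load_pd` / `_mm_store_pd`) could be used instead.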
Doing SIMD-based parallelization is an upcoming goal for the ParallelArray project, I believe. I've been curious whether the work there could also be used for vectorizing normal JS, and how often that would be helpful.
Depends on: 832777
Depends on: 832778
Depends on: 832779
This is interesting for sure, but do we know why we are currently slower than V8 on gaussian-blur? There may be some simpler, lower-hanging fruit there.
I have finished testing the prototype, which unboxes packed ints (if they are ints) into doubles and does packed multiplications, additions and divisions before storing the doubles back to memory. The prototype (x64 only) is available at: https://github.com/nbp/mozilla-central/branches/ionmonkey-fosdem-2013 The current results show a 20% improvement (187.1ms --> 149.7ms) over 100 runs of Gaussian blur. Currently this prototype does the unboxing with SIMD, and it will surely benefit from the surely-Double arrays patch that Brian is working on.
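As a rough illustration of "unboxing packed ints (if they are ints) into doubles" with SIMD, the C++ sketch below handles two boxed 64-bit values at once using SSE2. It is not the prototype's code. The tag constant `kSketchInt32Tag` and the helpers `boxInt32`, `boxDouble` and `unboxPairToDoubles` are invented for this example, and the layout (tag in the upper 32 bits, int32 payload in the lower 32 bits, anything else treated as a raw double) is an assumption that only approximates the engine's real Value encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Made-up tag for this sketch only; NOT SpiderMonkey's real Value encoding.
constexpr std::uint32_t kSketchInt32Tag = 0xFFF90000u;

// Hypothetical boxing helpers used to build inputs for the sketch.
std::uint64_t boxInt32(std::int32_t i) {
    return (static_cast<std::uint64_t>(kSketchInt32Tag) << 32) |
           static_cast<std::uint32_t>(i);
}

std::uint64_t boxDouble(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// Unboxes two boxed values into two doubles. A lane tagged as int32 is
// converted to double; any other lane is assumed to already hold a double.
void unboxPairToDoubles(const std::uint64_t* boxed, double* out) {
    assert(boxed != nullptr && out != nullptr);
#if defined(__SSE2__)
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(boxed));
    // Broadcast each value's upper 32 bits (the tag) across its 64-bit lane.
    const __m128i tags = _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 3, 1, 1));
    const __m128d isInt = _mm_castsi128_pd(_mm_cmpeq_epi32(
        tags, _mm_set1_epi32(static_cast<int>(kSketchInt32Tag))));
    // Move both 32-bit payloads into the two low slots, then convert them.
    const __m128i payloads = _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 0, 2, 0));
    const __m128d fromInt = _mm_cvtepi32_pd(payloads);
    const __m128d raw = _mm_castsi128_pd(v);
    // Per lane: pick the converted int where the tag matched, else the raw bits.
    _mm_storeu_pd(out, _mm_or_pd(_mm_and_pd(isInt, fromInt),
                                 _mm_andnot_pd(isInt, raw)));
#else
    for (int i = 0; i < 2; ++i) {
        if ((boxed[i] >> 32) == kSketchInt32Tag) {
            out[i] = static_cast<double>(
                static_cast<std::int32_t>(static_cast<std::uint32_t>(boxed[i])));
        } else {
            std::memcpy(&out[i], &boxed[i], sizeof(double));
        }
    }
#endif
}
```

The branch-free select (mask, and, andnot, or) keeps both lanes in a single register even when one value is an int and the other is already a double. If the array is known to contain only doubles, this whole step could be skipped.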
This is the codegen output of the current (as of the date of this message) version of https://github.com/nbp/mozilla-central/branches/ionmonkey-fosdem-2013 This modification adds a few LIR nodes to avoid doing a dedicated register allocation, and reuses the register allocation made for Float registers to allocate packed-double registers. The snapshot encoding and the *fake* optimization are done in the last MIR step of the graph. This optimization relies on the MIR id & op to substitute/mutate the instructions so that they work on packed doubles. It adds a few arch-specific LIR nodes, which are targeted by the Lowering, in order to allocate temporary registers such as the one needed for unboxing Int32s with SIMD. Snapshots are bad, but no bailout occurs during Gaussian blur, so the prototype just encodes packed doubles as doubles.
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #3)
> Currently this prototype does the unboxing with SIMD, and it will surely
> benefit from the surely-Double arrays patch on which Brian is working on.

That is bug 833898, and has a patch up for review if you want to test with it.
Whiteboard: [ARM-opt]
??? If you look at the attached patches of the dependent bugs, this is far from being an ARM optimization.
Whiteboard: [ARM-opt]
Assignee: general → nobody
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INCOMPLETE
Blocks: sm-opt-jits
Severity: S3 → N/A
Status: RESOLVED → REOPENED
Type: defect → enhancement
Priority: -- → P3
Resolution: INCOMPLETE → ---