Open Bug 832718 Opened 11 years ago Updated 5 months ago

[meta] IonMonkey: Use SIMD to optimize gaussian blur.

Tracking

()

Status:

REOPENED

People

(Reporter: nbp, Unassigned)

References

(Depends on 3 open bugs, Blocks 1 open bug)

Details

(Keywords: meta)

Attachments

(1 file)

Code generated by the prototype. 11 years ago Nicolas B. Pierron [:nbp] 43.54 KB, text/plain		Details

Nicolas B. Pierron [:nbp]

Reporter

Description

•

11 years ago

Gaussian blur (Kraken benchmark) is interesting because it contains a loop that repeats the same operation 4 times, and the current MIR optimizations, such as GVN and LICM, are able to move most of the boilerplate away.

These 4 identical operations are performed on Doubles (the inputs might be Int32), as inferred by type inference, and they can be divided into 2 contiguous, 16-byte-aligned inputs (each being 2 values from the element vector).

The goal would be to merge r & g and b & a into 2 xmm registers, and to manipulate each pair as one entity. Unlike our current usage of xmm registers, this would require a new MIR type to reflect the fact that 2 values / 2 doubles are packed into one FloatRegister. This would have an impact on snapshots and on register allocation, which has to handle any spill.
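As a rough illustration of the packing idea (a Python model, not engine code), the sketch below accumulates one RGBA pixel first with four scalar accumulations, then with two 2-lane accumulations, one for (r, g) and one for (b, a). The `Packed2` class, the function names, and the loop shape are hypothetical stand-ins; in the engine a packed pair would live in a single xmm register.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packed2:
    """Hypothetical stand-in for two doubles packed into one xmm register."""
    lo: float
    hi: float

    def __add__(self, other):
        # One packed add works on both lanes at once (ADDPD in SSE2).
        return Packed2(self.lo + other.lo, self.hi + other.hi)

    def scale(self, k):
        # Multiply both lanes by the same kernel weight
        # (MULPD in SSE2, with k duplicated in both lanes).
        return Packed2(self.lo * k, self.hi * k)


def blur_scalar(pixels, weights):
    """Four independent accumulations per sample: r, g, b, a."""
    r = g = b = a = 0.0
    for (pr, pg, pb, pa), k in zip(pixels, weights):
        r += pr * k
        g += pg * k
        b += pb * k
        a += pa * k
    return (r, g, b, a)


def blur_packed(pixels, weights):
    """Same computation with (r, g) and (b, a) each held as one packed value."""
    rg = Packed2(0.0, 0.0)
    ba = Packed2(0.0, 0.0)
    for (pr, pg, pb, pa), k in zip(pixels, weights):
        rg = rg + Packed2(pr, pg).scale(k)
        ba = ba + Packed2(pb, pa).scale(k)
    return (rg.lo, rg.hi, ba.lo, ba.hi)
```

Both functions perform the same floating-point operations in the same order per channel, so they return identical results; the packed version simply issues half as many arithmetic operations.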

How we should schedule such things:

1/ Add SIMD support to the assembler for loading 2 Values, checking whether they are boxed ints, unboxing them, multiplying, adding, and dividing them, and storing them.

2/ Evaluate the performance gain by hacking the engine and manually substituting the compilation result. This will determine whether the previous set of patches should land, and whether we should continue the investigation and implementation.

3/ As this is a complex integration, I suggest adding a flag to the JS Shell to enable or disable this feature, since adding it might have consequences on multiple aspects of the code.

4.1/ Support a simple test case that copies values with xmm registers. See whether xmm registers are still valuable for a plain copy, or whether we would need to add heuristics to a later optimization phase.

4.2/ Add register allocator and snapshot support for MIRType_PackedValue and MIRType_PackedDouble. Add one instruction for loading and storing.

5/ Add the rest of the MIR / LIR, as well as the new phase(s) for converting vectorizable code, such as gaussian blur, into SIMD-powered assembly.
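The sequence of operations listed in step 1 can be modelled as follows. This is only a sketch in Python under stated assumptions: boxed Values are represented as hypothetical `(tag, payload)` tuples rather than the engine's real Value encoding, and `load2`, `unbox2`, `store2`, and `weighted_pair` are illustrative names, not assembler methods.

```python
INT32, DOUBLE = "int32", "double"  # hypothetical tags, not the real Value encoding


def load2(elements, index):
    """Load two adjacent boxed values from the element vector."""
    return elements[index], elements[index + 1]


def unbox2(pair):
    """Check each lane's tag and convert Int32 payloads to doubles."""
    lanes = []
    for tag, payload in pair:
        if tag == INT32:
            lanes.append(float(payload))
        elif tag == DOUBLE:
            lanes.append(payload)
        else:
            # A real JIT would bail out here; the sketch just raises.
            raise TypeError(f"unexpected tag: {tag}")
    return tuple(lanes)


def mul2(x, y):
    return (x[0] * y[0], x[1] * y[1])


def add2(x, y):
    return (x[0] + y[0], x[1] + y[1])


def div2(x, y):
    return (x[0] / y[0], x[1] / y[1])


def store2(elements, index, pair):
    """Store both lanes back as boxed doubles."""
    elements[index] = (DOUBLE, pair[0])
    elements[index + 1] = (DOUBLE, pair[1])


def weighted_pair(elements, index, acc, weight, divisor):
    """load -> unbox -> multiply -> add -> divide -> store, two lanes at a time."""
    lanes = unbox2(load2(elements, index))
    result = div2(add2(acc, mul2(lanes, (weight, weight))), (divisor, divisor))
    store2(elements, index, result)
    return result
```

Each helper corresponds to one operation the assembler would need in packed form; the tag check in `unbox2` is the part that a surely-Double array representation could make unnecessary.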

Brian Hackett [Laid off!]

Comment 1

•

11 years ago

I believe SIMD-based parallelization is an upcoming goal for the ParallelArray project. I've been curious whether the work there could also be used for vectorizing normal JS, and how often that would be helpful.

Nicolas B. Pierron [:nbp]

Reporter

Updated

•

11 years ago

Depends on: 832777

Nicolas B. Pierron [:nbp]

Reporter

Updated

•

11 years ago

Depends on: 832778

Nicolas B. Pierron [:nbp]

Reporter

Updated

•

11 years ago

Depends on: 832779

Jan de Mooij [:jandem]

Comment 2

•

11 years ago

This is interesting for sure, but do we know why we are currently slower than V8 on gaussian-blur? There may be some simpler, lower-hanging fruit there.

Nicolas B. Pierron [:nbp]

Reporter

Comment 3

•

11 years ago

I finished testing the prototype, which unboxes packed ints (if they are ints) into doubles and does packed multiplications, additions, and divisions before storing the doubles back to memory.

The prototype (which only works on x64) is available at:
https://github.com/nbp/mozilla-central/branches/ionmonkey-fosdem-2013

The current results show a 20% improvement (187.1 ms --> 149.7 ms) over 100 runs of Gaussian blur.

Currently this prototype does the unboxing with SIMD, and it will surely benefit from the surely-Double arrays patch that Brian is working on.
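For reference, the 20% figure follows directly from the two timings quoted above. The small helper below is a hypothetical convenience for that arithmetic, not part of the prototype:

```python
def improvement(before_ms, after_ms):
    """Return (fraction of time saved, speedup factor)."""
    saved = (before_ms - after_ms) / before_ms
    speedup = before_ms / after_ms
    return saved, speedup


saved, speedup = improvement(187.1, 149.7)
print(f"time saved: {saved:.1%}, speedup: {speedup:.2f}x")
# prints "time saved: 20.0%, speedup: 1.25x"
```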

Nicolas B. Pierron [:nbp]

Reporter

Comment 4

•

11 years ago

Attached file: Code generated by the prototype.

This is the codegen output of the current version (as of the date of this message) of
https://github.com/nbp/mozilla-central/branches/ionmonkey-fosdem-2013

This modification adds a few LIR nodes to avoid doing a separate register allocation, and reuses the register allocation made for Float registers to allocate packed-double registers. The snapshot encoding and the *fake* optimization are done in the last MIR step of the graph.

This optimization relies on the MIR id & op to substitute/mutate the instructions so that they work on packed doubles. It adds a few arch-specific LIR nodes, which are targeted by the Lowering, in order to allocate temporary registers such as the ones needed for unboxing Int32s with SIMD.

Snapshots are not handled correctly, but no bailout occurs during gaussian blur, so the prototype just encodes packed doubles as doubles.

Brian Hackett [Laid off!]

Comment 5

•

11 years ago

(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #3)
> Currently this prototype does the unboxing with SIMD, and it will surely
> benefit from the surely-Double arrays patch on which Brian is working on.

That is bug 833898, and has a patch up for review if you want to test with it.

Brad Lassey [:blassey] (use needinfo?)

Updated

•

10 years ago

Whiteboard: [ARM-opt]

Nicolas B. Pierron [:nbp]

Reporter

Comment 7

•

10 years ago

???

If you look at the attached patches of the dependent bugs, this is far from being an ARM optimization.

Whiteboard: [ARM-opt]

Nobody; OK to take it and work on it

Assignee

Updated

•

10 years ago

Assignee: general → nobody

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

BugBot (nomail) [:suhaib / :marco/ :calixte]

Updated

•

2 years ago

Keywords: meta

Matthew Gaudet (he/him) [:mgaudet]

Updated

•

5 months ago

Status: NEW → RESOLVED

Closed: 5 months ago

Resolution: --- → INCOMPLETE

Matthew Gaudet (he/him) [:mgaudet]

Updated

•

5 months ago

Blocks: sm-opt-jits

Severity: S3 → N/A

Status: RESOLVED → REOPENED

Type: defect → enhancement

Priority: -- → P3

Resolution: INCOMPLETE → ---

Categories

(Core :: JavaScript Engine, enhancement, P3)
