Closed Bug 681858 Opened 12 years ago Closed 12 years ago

Improve gfx3DMatrix multiplication performance

Categories

(Core :: Graphics, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla9

People

(Reporter: mattwoodrow, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [inbound][qa-])

Attachments

(1 file, 1 obsolete file)

We now use gfx3DMatrix everywhere in layout, even though the majority of them only contain 2d components. We can check for this, and just do a 2d multiplication in this case.

This may (or may not) be causing a performance problem/regression in nsLayoutUtils::ChangeMatrixBasis on android, we should fix it regardless since it's easy.
Attachment #555631 - Flags: review?(tterribe)
Triple-checked my data and ensured that vfp was being used.  For a test case that uses CSS transforms to pan a page around in response to touch input, a pretty typical profile follows

$ ./report.sh -t 0.5
samples  %        app name                 symbol name
1997      6.8442  libxul.so                bits_image_fetch_bilinear_affine_pad_a8r8g8b8
777       2.6630  libc.so                  memcpy
722       2.4745  libxul.so                pixman_composite_src_0565_0565_asm_neon
545       1.8678  libxul.so                pixman_composite_over_0565_n_0565_asm_neon
403       1.3812  libxul.so                store_scanline_r5g6b5
378       1.2955  libxul.so                gfx3DMatrix::operator*(gfx3DMatrix const&) const
367       1.2578  vmlinux                  schedule
326       1.1173  vmlinux                  sync_buffer
323       1.1070  libxul.so                fetch_scanline_r5g6b5
314       1.0762  vmlinux                  add_event_entry
262       0.8979  oprofiled                get_file
232       0.7951  libxul.so                nsRuleNode::WalkRuleTree(nsStyleStructID, nsStyleContext*)
228       0.7814  vmlinux                  find_vma
209       0.7163  vmlinux                  op_cpu_buffer_read_entry
206       0.7060  vmlinux                  sdhci_request
199       0.6820  vmlinux                  lookup_dcookie
192       0.6580  oprofiled                sfile_log_arc
186       0.6375  libxul.so                nsFrameManager::ReResolveStyleContext(nsPresContext*, nsIFrame*, nsIContent*, nsStyleChangeList*, nsChangeHint, nsRestyleHint, mozilla::css::RestyleTracker&, nsFrameManager::DesiredA11yNotifications, nsTArray<nsIContent*, nsTArrayDefaultAllocator>&, TreeMatchContext&)
182       0.6238  dalvik-jit-code-cache (deleted) /dev/ashmem/dalvik-jit-code-cache (deleted)
170       0.5826  libmozalloc.so           arena_dalloc
164       0.5621  vmlinux                  rb_event_length
157       0.5381  libdvm.so                dalvik_inst
154       0.5278  libc.so                  memmove
Comment on attachment 555631 [details] [diff] [review]
Only do 2d multiplication where we can

Review of attachment 555631 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with a couple of suggestions.

::: gfx/thebes/gfx3DMatrix.cpp
@@ +53,5 @@
>  
>  gfx3DMatrix
>  gfx3DMatrix::operator*(const gfx3DMatrix &aMatrix) const
>  {
> +  if (Is2D() && aMatrix.Is2D()) {

Is2D takes a gfxMatrix * argument that defaults to NULL, and fills it in with the 2D part if it's not NULL. Since you're now doing two of these checks on every multiply, this is likely to be a significant source of calls to this function, and you're always passing NULL.

It might be nice to overload it so there's a normal Is2D() with no arguments that skips that conditional, and then an Is2D(gfxMatrix *) that does what the current one does (calling the no-arg version for the actual check).

::: layout/base/nsLayoutUtils.cpp
@@ +1011,5 @@
> +  /* Translate to the origin before aMatrix */
> +  result.Translate(-aOrigin);
> +
> +  /* Translate back into position after aMatrix */
> +  result *= gfx3DMatrix::Translation(aOrigin);;

The Is2D() checks can help reduce this from 64 multiplies to 12, but they have to check the values of 20 elements first. An actual 3D translation (on the right) can be done with 12 multiplies without any checks. It would be great to have a version of Translate() that applies from that side, too.
Attachment #555631 - Flags: review?(tterribe) → review+
Thanks for the suggestions Tim!

Carrying forward r=derf
Attachment #555631 - Attachment is obsolete: true
Attachment #555854 - Flags: review+
http://hg.mozilla.org/mozilla-central/rev/bbb80094e71a
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla9
Whiteboard: [inbound] → [inbound][qa-]
You need to log in before you can comment on or make changes to this bug.