Closed Bug 1122383 Opened 9 years ago Closed 9 years ago

[GFX 2D] Optimize ConvertBGRXToBGRA

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla38

People

(Reporter: hev, Assigned: hev)

Details

(Whiteboard: [gfx-noted])

Attachments

(1 file, 1 obsolete file)

0001-GFX-2D-Optimize-ConvertBGRXToBGRA.patch, 9 years ago, hev [:hev], 1.09 KB, patch
0001-GFX-2D-Optimize-ConvertBGRXToBGRA.patch, 9 years ago, hev [:hev], 1.19 KB, patch, jwatt: review+

hev [:hev]

Assignee

Description

•

9 years ago

Attached patch 0001-GFX-2D-Optimize-ConvertBGRXToBGRA.patch (obsolete)

User Agent: Mozilla/5.0 (X11; Linux mips64; rv:37.0) Gecko/20100101 Firefox/37.0
Build ID: 20150108132736

Steps to reproduce:

Overwriting only the alpha byte of each pixel, instead of rewriting the whole 32-bit word, gives better performance.

Benchmark (converting 1 GB of data) on a MIPS platform:
old score: 9.7 seconds
new score: 0.7 seconds

hev [:hev]

Assignee

Updated

•

9 years ago

Attachment #8550110 - Flags: review?(gal)

Jonathan Watt [:jwatt]

Comment 1

•

9 years ago

Thanks for CC'ing me.

Is the perf issue due to the bitwise OR assignment, or due to the division (you're also getting rid of the latter)? Just wondering if the compiler you're using isn't hoisting the division out of the loop (which would be sad).

While you're here can you make aStride const (which, if there is a division issue, might help with that)?

aData is of type uint8_t* so you don't need the reinterpret_cast if we make this change.

I think it would also be worth adding a comment here along the lines of "Perf note: we used to cast to uint32_t* and |= a 32-bit mask, but that was slow on MIPS with compiler XXX". Subject to change depending on the answer to my initial question of course.

Which compiler are you using? Did you test any other platforms/compilers? I'd expect the main compilers that we support to optimize the multiply-by-four that you're adding to a shift, but we should check. Better yet would be to avoid the shift, although that would mean leaving the uint32_t* cast and then casting back to uint8_t* to do the setting.

Component: Untriaged → Graphics

Product: Firefox → Core

Version: 37 Branch → Trunk

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

9 years ago

Whiteboard: [gfx-noted]

hev [:hev]

Assignee

Comment 2

•

9 years ago

Thanks for the reply.

(In reply to Jonathan Watt [:jwatt] from comment #1)
> Thanks for CC'ing me.
> 
> Is the perf issue due to the bitwise OR assignment, or due to the division
> (you're also getting rid of the latter)? Just wondering if the compiler
> you're using isn't hoisting the division out of the loop (which would be
> sad).
I think the most important point is that the new version saves memory bandwidth.
old:
1. load pixel (4-byte) from memory.
2. set alpha-byte = 0xff.
3. write pixel (4-byte) to memory.
4. next pixel, goto 1.
new:
1. write alpha-byte = 0xff to memory.
2. next pixel, goto 1.
> 
> While you're here can you make aStride const (which, if there is a
> division issue, might help with that)?
> 
> aData is of type uint8_t* so you don't need the reinterpret_cast if we make
> this change.
You are right.
> 
> I think it would also be worth adding a comment here along the lines of
> "Perf note: we used to cast to uint32_t* and |= a 32-bit mask, but that was
> slow on MIPS with compiler XXX". Subject to change depending on the answer
> to my initial question of course.
> 
> Which compiler are you using? Did you test any other platforms/compilers?
On MIPS, GCC 4.9.2.
> I'd expect the main compilers that we support to optimize the
> multiply-by-four that you're adding to a shift, but we should check. Better
> yet would be to avoid the shift, although that would mean leaving the
> uint32_t* cast and then casting back to uint8_t* to do the setting.
On x86-64 (same test as on MIPS, 1 GB of data): new score 0.11 s, old score 0.82 s.
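
For anyone wanting to repeat the measurement on another platform, a timing harness along these lines might do. It is a sketch under assumptions, not the benchmark actually used for the numbers above: `TimeConversion` and `ExampleByteStore` are hypothetical names, the row stride is taken to be exactly `width * 4`, and alpha is assumed to be byte 3 (little-endian host).

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical converter with the byte-store shape discussed in this bug.
// Assumption: little-endian host, alpha is byte 3 of each pixel.
void ExampleByteStore(std::uint8_t* aData, int aWidth, int aHeight,
                      std::int32_t aStride) {
  for (int row = 0; row < aHeight; ++row) {
    for (int column = 0; column < aWidth; ++column) {
      aData[column * 4 + 3] = 0xFF;
    }
    aData += aStride;
  }
}

// Runs aConvert once over a zero-filled buffer and returns elapsed seconds.
template <typename Convert>
double TimeConversion(Convert aConvert, int aWidth, int aHeight) {
  const std::int32_t stride = aWidth * 4;
  std::vector<std::uint8_t> buffer(
      static_cast<std::size_t>(stride) * static_cast<std::size_t>(aHeight), 0);
  const auto start = std::chrono::steady_clock::now();
  aConvert(buffer.data(), aWidth, aHeight, stride);
  const auto end = std::chrono::steady_clock::now();
  // Read one byte back through a volatile so the buffer stays observably used.
  volatile std::uint8_t sink = buffer[buffer.size() - 1];
  (void)sink;
  return std::chrono::duration<double>(end - start).count();
}
```

Calling `TimeConversion(ExampleByteStore, 16384, 16384)` would cover 1 GiB of pixel data. A single run is noisy, so repeating it several times and comparing medians is advisable.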

hev [:hev]

Assignee

Updated

•

9 years ago

Attachment #8550110 - Attachment is obsolete: true

Attachment #8550110 - Flags: review?(gal)

hev [:hev]

Assignee

Comment 3

•

9 years ago

Attached patch 0001-GFX-2D-Optimize-ConvertBGRXToBGRA.patch

Attachment #8551015 - Flags: review?(gal)

Jonathan Watt [:jwatt]

Comment 4

•

9 years ago

Comment on attachment 8551015
0001-GFX-2D-Optimize-ConvertBGRXToBGRA.patch

Looks good, thank you. I'll land this for you in a day or so.

Looking back over your previous patches it looks like you need to ask someone to push them to Try for you, and then add the "push-needed" keyword to the keywords field of the bug to get your patch landed. Since they are months old now, in this instance you should first also update to master and make sure they are still relevant, apply and build.

Attachment #8551015 - Flags: review?(gal) → review+

Jonathan Watt [:jwatt]

Comment 5

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/ef9bec3906fb

Assignee: nobody → r

Ryan VanderMeulen [:RyanVM]

Comment 6

•

9 years ago

https://hg.mozilla.org/mozilla-central/rev/ef9bec3906fb

Status: UNCONFIRMED → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla38

Jonathan Watt [:jwatt]

Comment 7

•

9 years ago

Thanks for the patch, Heiher!

Andreas Gal :gal

Comment 8

•

9 years ago

Can we test this on ARM? Most platforms don't do 8-bit stores these days. This might be a wash on ARM, but I would love to try.

Categories

(Core :: Graphics, defect)
