[GFX 2D] Optimize ConvertBGRXToBGRA

RESOLVED FIXED in mozilla38

Status

()

Core
Graphics
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: hev, Assigned: hev)

Tracking

Trunk
mozilla38
SGI
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [gfx-noted])

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

3 years ago
Created attachment 8550110 [details] [diff] [review]
0001-GFX-2D-Optimize-ConvertBGRXToBGRA.patch

User Agent: Mozilla/5.0 (X11; Linux mips64; rv:37.0) Gecko/20100101 Firefox/37.0
Build ID: 20150108132736

Steps to reproduce:

Overwrite bytes instead of double words(32-bit) got better performance.

Benchmark (convert 1 Gbytes data):
on MIPS platform:
old score: 9.7 seconds
new score: 0.7 seconds
(Assignee)

Updated

3 years ago
Attachment #8550110 - Flags: review?(gal)
Thanks for CC'ing me.

Is the perf issue due to the bitwise OR assignment, or due to the division (you're also getting rid of the latter)? Just wondering if the compiler you're using isn't hoisting the division out of the loop (which would be sad).

While you're here can you make aStride to const (which, if there is a division issue, might help with that)?

aData is of type uint8_t* so you don't need the reinterpret_cast if we make this change.

I think it would also be worth adding a comment here along the lines of "Perf note: we used to cast to uint32_t* and |= a 32-bit mask, but that was slow on MIPS with compiler XXX". Subject to change depending on the answer to my initial question of course.

Which compiler are you using? Did you test any other platforms/compilers? I'd expect the main compilers that we support to optimize the multiply-by-four that you're adding to a shift, but we should check. Better yet would be to avoid the shift, although that would mean leaving the uint32_t* cast and then casting back to uint8_t* to do the setting.
Component: Untriaged → Graphics
Product: Firefox → Core
Version: 37 Branch → Trunk
Whiteboard: [gfx-noted]
(Assignee)

Comment 2

3 years ago
Thanks for reply.

(In reply to Jonathan Watt [:jwatt] from comment #1)
> Thanks for CC'ing me.
> 
> Is the perf issue due to the bitwise OR assignment, or due to the division
> (you're also getting rid of the latter)? Just wondering if the compiler
> you're using isn't hoisting the division out of the loop (which would be
> sad).
I think the most important is to save the memory bandwidth in new version.
old:
1. load pixel (4-byte) from memory.
2. set alpha-byte = 0xff.
3. write pixel (4-byte) to memory.
4. next pixel, goto 1.
new:
1. write alpha-byte = 0xff to memory.
2. next pixel, goto 1.
> 
> While you're here can you make aStride to const (which, if there is a
> division issue, might help with that)?
> 
> aData is of type uint8_t* so you don't need the reinterpret_cast if we make
> this change.
You are right.
> 
> I think it would also be worth adding a comment here along the lines of
> "Perf note: we used to cast to uint32_t* and |= a 32-bit mask, but that was
> slow on MIPS with compiler XXX". Subject to change depending on the answer
> to my initial question of course.
> 
> Which compiler are you using? Did you test any other platforms/compilers?
on MIPS, GCC 4.9.2
> I'd expect the main compilers that we support to optimize the
> multiply-by-four that you're adding to a shift, but we should check. Better
> yet would be to avoid the shift, although that would mean leaving the
> uint32_t* cast and then casting back to uint8_t* to do the setting.
on x86-64, (same as mips, 1G bytes) new score: 0.11s, old score: 0.82s
(Assignee)

Updated

3 years ago
Attachment #8550110 - Attachment is obsolete: true
Attachment #8550110 - Flags: review?(gal)
(Assignee)

Comment 3

3 years ago
Created attachment 8551015 [details] [diff] [review]
0001-GFX-2D-Optimize-ConvertBGRXToBGRA.patch
Attachment #8551015 - Flags: review?(gal)
Comment on attachment 8551015 [details] [diff] [review]
0001-GFX-2D-Optimize-ConvertBGRXToBGRA.patch

Looks good, thank you. I'll land this for you in a day or so.

Looking back over your previous patches it looks like you need to ask someone to push them to Try for you, and then add the "push-needed" keyword to the keywords field of the bug to get your patch landed. Since they are months old now, in this instance you should first also update to master and make sure they are still relevant, apply and build.
Attachment #8551015 - Flags: review?(gal) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/ef9bec3906fb
Assignee: nobody → r
https://hg.mozilla.org/mozilla-central/rev/ef9bec3906fb
Status: UNCONFIRMED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Thanks for the patch, Heiher!

Comment 8

3 years ago
Can we test this on ARM? Most platforms don't do 8-bit stores these days. This might be a wash on arm but I would love to try.
You need to log in before you can comment on or make changes to this bug.