Closed Bug 679527 Opened 10 years ago Closed 9 years ago

qcms performance regression from Fx3 using LUT ICC profile

Categories

(Core :: GFX: Color Management, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla19

People

(Reporter: cyber.spamage, Assigned: BenWa)

References

(Blocks 1 open bug)

Details

(Whiteboard: [keep open])

Attachments

(4 files)

Tested with:
AMD X2 4800+ @2.4Ghz
2GB DDR400
WinXP SP3 x86
Firefox 8.0a1 http://hg.mozilla.org/mozilla-central/rev/73c4423aafee
gfx.color_management.enablev4 = True

Rough execution time with attached image using a LAB ICCv2 monitor profile:
Firefox 3.0 using lcms1 (LAB monitor profile): ~8 seconds
Firefox 8.0a1pre using qcms (LAB monitor profile): ~21 seconds
Photoshop CS4 (LAB monitor profile): ~1 second

This is a major performance regression from littleCMS1-based color management Firefox 3.0.

qcms performance optimizations are badly needed, since you never want a large image hanging your browser for 21 seconds. The goal should be at least matching or beating the performance of lcms1 in Firefox 3.0 using a LUT monitor profile, before enabling LUT support by default.

If speed goals for qcms aren't easily meet, someone may want to re-assess littleCMS (now version 2.2), to see if it's viable for re-integration or possibly co-existing with qcms.
Keywords: perf, topperf
See Also: → 538114
We don't want to label this as |perf, topperf| performance issue since this is non default behavior.

Should we turn this on by default we would, but this is precisely why we will fix it before turning this on by default.
Keywords: perf, topperf
Whiteboard:
Blocks: 679875
I profiled this:
http://people.mozilla.com/~bgirard/cleopatra/?report=af3d7ced765ec8a5266bc34cfa5371e384c77438

It doesn't seemed like there is in fact a performance problem anymore. I remember use fixing a problem recently where we were using a slow path incorrectly.

Can you reproduce Alex?

I'm going to try again with a LUT output profile.
Whiteboard:
I can reproduce the problem with a v4 output profile. We spend over 75% of the time applying color corrections:

http://people.mozilla.com/~bgirard/cleopatra/?report=d19bc5b5d92fc85fe535315c7d0d284c90349554
I don't have a desktop build ready to look at this today but I will do this soon:
- Use floorf when appropriate
- transform_util.h regresses some inlining.
- Optimize 'qcms_transform_data_tetra_clut_rgba':
  - Useless conversion from int->float->int
  - Used fix point precision instead of float
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attached patch Prefer floorfSplinter Review
Assignee: nobody → bgirard
Status: NEW → ASSIGNED
Attachment #673438 - Flags: review?(jmuizelaar)
Attached patch Use floorf/ceilfSplinter Review
Attachment #673439 - Flags: review?(jmuizelaar)
The objdump of this code is still a nightmare so there's a lot left to be done.
Attachment #673438 - Flags: review?(jmuizelaar) → review+
Attachment #673450 - Flags: review?(jmuizelaar)
Comment on attachment 673439 [details] [diff] [review]
Use floorf/ceilf

Review of attachment 673439 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/qcms/transform_util.c
@@ +465,5 @@
>          for (i = 0; i < length; i++) {
>                  uint16_fract_t result;
>                  double x = ((double) i) / (double) (length - 1);
>                  x = pow(x, gamma);                //XXX turn this conversion into a function
> +                result = floorf(x*65535. + .5);

x is a double so don't use floorf()
Attachment #673439 - Flags: review?(jmuizelaar) → review+
Attachment #673450 - Flags: review?(jmuizelaar) → review+
Still slow:
http://people.mozilla.com/~bgirard/cleopatra/#report=8456eebc95516a18e94319e1e985ae0257130e1b&search=tetra_clut&filter=[{%22type%22%3A%22RangeSampleFilter%22%2C%22start%22%3A904%2C%22end%22%3A7857}]&selection=[%22%28root%29%22%2C%22start%22%2C%22main%22%2C%22XRE_main%22%2C%22Startup%3A%3AXRE_Main%22%2C%22XREMain%3A%3AXRE_main%28int%2C%20char**%2C%20nsXREAppData%20const*%29%22%2C%22XREMain%3A%3AXRE_mainRun%28%29%22%2C%22nsAppStartup%3A%3ARun%28%29%22%2C%22nsAppShell%3A%3ARun%28%29%22%2C%22-[NSApplication%20run]%22%2C%22-[GeckoNSApplication%20nextEventMatchingMask%3AuntilDate%3AinMode%3Adequeue%3A]%22%2C%22-[NSApplication%20nextEventMatchingMask%3AuntilDate%3AinMode%3Adequeue%3A]%22%2C%22_DPSNextEvent%22%2C%22BlockUntilNextEventMatchingListInMode%22%2C%22ReceiveNextEventCommon%22%2C%22RunCurrentEventLoopInMode%22%2C%22CFRunLoopRunSpecific%22%2C%22__CFRunLoopRun%22%2C%22__CFRunLoopDoSources0%22%2C%22__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__%22%2C%22Events%3A%3AProcessGeckoEvents%22%2C%22nsAppShell%3A%3AProcessGeckoEvents%28void*%29%22%2C%22nsBaseAppShell%3A%3ANativeEventCallback%28%29%22%2C%22NS_ProcessPendingEvents_P%28nsIThread*%2C%20unsigned%20int%29%22%2C%22nsThread%3A%3AProcessNextEvent%28bool%2C%20bool*%29%22%2C%22mozilla%3A%3Aimage%3A%3ARasterImage%3A%3ADecodeWorker%3A%3ARun%28%29%22%2C%22mozilla%3A%3Aimage%3A%3ARasterImage%3A%3ADecodeWorker%3A%3ADecodeSomeOfImage%28mozilla%3A%3Aimage%3A%3ARasterImage*%2C%20mozilla%3A%3Aimage%3A%3ARasterImage%3A%3ADecodeWorker%3A%3ADecodeType%29%22%2C%22mozilla%3A%3Aimage%3A%3ARasterImage%3A%3AWriteToDecoder%28char%20const*%2C%20unsigned%20int%29%22%2C%22mozilla%3A%3Aimage%3A%3AnsJPEGDecoder%3A%3AWriteInternal%28char%20const*%2C%20unsigned%20int%29%22%2C%22mozilla%3A%3Aimage%3A%3AnsJPEGDecoder%3A%3AOutputScanlines%28bool*%29%22%2C%22qcms_transform_data_tetra_clut%22]
You need to log in before you can comment on or make changes to this bug.