752254 - Massive Performance Problem for users with B2A CLUT display profiles

Reporter

Description

13 years ago

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0
Build ID: 20120501201020

Steps to reproduce:

Displayed this image on a Q6600 system running Windows 7 64-bit: http://www.malch.com/nikon/DSD_1314ProPhoto.jpg

Actual results:

Firefox consumed approximately 25 seconds of CPU time rendering this image.

Expected results:

It should have completed the task in 2-4 seconds of CPU time (and apparently does so on some CPUs).

The problem has been present from FF8 through 14.0a2 (Aurora). Some have suggested that the problem arises only on CPUs that do NOT support the SSE4.1 instruction set. Apparently, the problem also arises on Linux systems. However, it has also been reported that Firefox 12 on Mepis 11 does not exhibit the problem, for reasons that are unknown but presumably related to the build parameters.

(not currently active) Ted Mielczarek

Updated

13 years ago

Component: Untriaged → Graphics

Product: Firefox → Core

QA Contact: untriaged → thebes

malch

Reporter

Comment 1

13 years ago

Additional test results:

The following tests were carried out on a Q6600 CPU running 64-bit Windows 7. We used Firefox 13.0 but we have seen very similar performance with all recent versions of Firefox. Recorded times are in CPU seconds as reported by Windows Task Manager. These include the time to launch Firefox.

We tested two versions of the same image:

A progressive JPEG available here: http://www.malch.com/nikon/DSD_1314ProPhoto.jpg
A non-progressive JPEG available here: http://www.malch.com/nikon/DSD_1314ProPhotoNP.JPG

Test 1
======
Color management off (gfx.color_management.mode=0)
Images loaded from local hard drive
Non-progressive: 3 seconds
Progressive: 4 seconds

Test 2
======
Color management off (gfx.color_management.mode=0)
Images loaded from network
Non-progressive: 11 seconds
Progressive: 11 seconds

Test 3
======
Color management on (gfx.color_management.mode=2)
Images loaded from local hard drive
Non-progressive: 6 seconds
Progressive: 15 seconds

Test 4
======
Color management on (gfx.color_management.mode=2)
Images loaded from network
Non-progressive: 14 seconds
Progressive: 26 seconds

We did experiment with the "Use Hardware Acceleration when Available" option but it did not appear to make any significant difference.

These experiments appear to suggest two different issues:

1. The CPU times consumed when loading the images over a network appear excessive compared with images loaded from a local hard drive. 7 to 8 seconds of CPU cycles represents an enormous delta.

2. There appears to be a performance issue when applying color management to progressive JPEGs.
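For readers who want the deltas spelled out, the following is a small sketch that tabulates the four tests above and subtracts them. The dictionary layout and the function names are our own choices for illustration; only the CPU-second figures come from the tests.

```python
# CPU seconds from Tests 1-4 above, keyed by
# (color management, image source, JPEG encoding).
TIMINGS = {
    ("off", "local", "non-progressive"): 3,
    ("off", "local", "progressive"): 4,
    ("off", "network", "non-progressive"): 11,
    ("off", "network", "progressive"): 11,
    ("on", "local", "non-progressive"): 6,
    ("on", "local", "progressive"): 15,
    ("on", "network", "non-progressive"): 14,
    ("on", "network", "progressive"): 26,
}


def network_overhead(cm, encoding):
    """Extra CPU seconds when the image comes from the network instead of disk."""
    return TIMINGS[(cm, "network", encoding)] - TIMINGS[(cm, "local", encoding)]


def color_management_overhead(source, encoding):
    """Extra CPU seconds when color management is switched on."""
    return TIMINGS[("on", source, encoding)] - TIMINGS[("off", source, encoding)]


if __name__ == "__main__":
    for cm in ("off", "on"):
        for encoding in ("non-progressive", "progressive"):
            print(f"network overhead, CM {cm}, {encoding}: "
                  f"{network_overhead(cm, encoding)} s")
    for source in ("local", "network"):
        for encoding in ("non-progressive", "progressive"):
            print(f"color management overhead, {source}, {encoding}: "
                  f"{color_management_overhead(source, encoding)} s")
```

With color management off, the network overhead works out to 8 seconds (non-progressive) and 7 seconds (progressive), which is the "7 to 8 seconds" noted in issue 1. With images loaded locally, turning color management on costs 3 seconds for the non-progressive image but 11 seconds for the progressive one, which is the gap behind issue 2.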

malch

Reporter

Comment 2

13 years ago

More data that may explain the slowness in rendering progressive JPEGs...

In nsJPEGDecoder::WriteInternal() we monitored the inner loops for progressive and non-progressive images where calls are made to OutputScanlines(). We also looked at the values of mInfo.output_scanline.

In the non-progressive case, the image appears to be processed in a single pass, as expected.

In the progressive case, the image appears to be processed in 7 passes! This seems high. We expected 3 (or maybe 4) passes.

The additional passes do appear to account for the slowness with progressive images. The overhead is considerable, especially when full color management via qcms is enabled. Others more familiar with the code may need to determine whether the number of passes can be reduced.

Of course, these results do NOT explain:

* The apparent slowness of the image rendering on the Q6600 compared to that reported with other CPUs.

* The large amount of additional CPU time consumed when the images are fetched via HTTP compared with the time to process the same image fetched from the local file system.

We will continue our efforts to gather more data on these issues and post any significant findings here. Overall, we think there is significant scope for improved image handling performance and we hope this information is useful.
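To make the cost of the extra passes concrete, here is a deliberately simple model. It assumes (our assumption, not something we verified in the decoder) that every pass re-outputs, and therefore re-color-manages, every scanline of the image. The function name and the 1000-line image height are hypothetical.

```python
def scanlines_color_managed(height, passes):
    """Scanlines pushed through color management, assuming each pass
    re-outputs the full image (a simplifying assumption)."""
    return height * passes


HEIGHT = 1000  # hypothetical image height in scanlines

non_progressive = scanlines_color_managed(HEIGHT, passes=1)
progressive_observed = scanlines_color_managed(HEIGHT, passes=7)
progressive_expected = scanlines_color_managed(HEIGHT, passes=3)

if __name__ == "__main__":
    print("non-progressive:", non_progressive)
    print("progressive, 7 passes:", progressive_observed)
    print("progressive, 3 passes:", progressive_expected)
```

Under that assumption, 7 passes means 7 times the color management work of the single-pass case, and more than twice the work of the 3 passes we expected.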

malch

Reporter

Comment 3

13 years ago

We started making some comparative timings between a Q6600 and an E7500 system. The E7500 is a newer CPU of broadly equivalent power and with SSE4 support. When (and only when) color management was enabled, it rendered the test images significantly faster. It seemed as if the qcms color management software was running dramatically faster on the CPU with SSE4. That was surprising because, as far as we know, qcms does not use SSE4.

However, the Q6600 and E7500 environments were not identical. The Q6600 was running 64-bit Windows 7 whereas the E7500 was running 32-bit Windows 7. But then we realized that the Q6600 had a calibrated display whereas the E7500 did not. Hey presto, when we installed a calibration profile on the E7500 it slowed down dramatically. It was still a little faster, but only by around 20%. That's certainly something that could be attributable to the hardware.

Then, of course, we deinstalled the calibration profile from the Q6600. We found that it was necessary to delete the gfx.color_management.display_profile value and remove the profile from the Windows control panel. But when we did that, the Q6600 system rendered the image dramatically faster.

We think this means Firefox is applying two transforms to the image data: one to address the ProPhotoRGB ICC profile embedded in the image, converting it to sRGB (we guess), and a second to apply the display profile. This too is a big surprise. Our own software, which uses lcms rather than qcms, applies a single transform. During setup, it creates a transform to map ProPhotoRGB directly to the display profile, and then it can do the whole job in a single pass. An image editor would normally need to apply both transforms because it needs to maintain a copy of the image in the editor's native color space as well as pushing a copy with the display profile applied out to the monitor. Does a browser need to do that? In any event, Firefox seems to be using 2 passes/transforms.

We don't know if that's an oversight, a design decision, or a qcms limitation (although we think the latter is unlikely). We will investigate further and also record/post some updated timings for the various scenarios over the next day or so. We apologize for the length of these posts. However, we do think it's justified by the potential rewards, which seem to us to be quite considerable.
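As a rough illustration of the single-transform idea, the sketch below folds two per-pixel transforms into one that is built once during setup. It is not qcms or lcms code. It assumes both steps are plain 3x3 matrices, and the matrix values are made up rather than taken from any real ProPhotoRGB, sRGB, or display profile. Real profiles with tone curves or CLUTs are not linear, so a matrix product would not be enough for them, but the same "build one combined transform up front" approach applies.

```python
# Illustrative only: both matrices are made-up numbers, not real profile data.
IMAGE_TO_INTERMEDIATE = [
    [0.80, 0.10, 0.10],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
]
INTERMEDIATE_TO_DISPLAY = [
    [1.10, -0.05, -0.05],
    [0.00, 1.00, 0.00],
    [-0.02, 0.02, 1.00],
]


def mat_mul(a, b):
    """Return the 3x3 matrix product of a and b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(matrix, pixel):
    """Apply a 3x3 matrix to one (r, g, b) pixel."""
    return tuple(sum(matrix[i][k] * pixel[k] for k in range(3))
                 for i in range(3))


def two_pass(pixels):
    """Transform every pixel twice, once per profile step."""
    intermediate = [apply(IMAGE_TO_INTERMEDIATE, p) for p in pixels]
    return [apply(INTERMEDIATE_TO_DISPLAY, p) for p in intermediate]


# Built once during setup: the second step composed with the first.
COMBINED = mat_mul(INTERMEDIATE_TO_DISPLAY, IMAGE_TO_INTERMEDIATE)


def one_pass(pixels):
    """Transform every pixel once using the precomposed matrix."""
    return [apply(COMBINED, p) for p in pixels]


if __name__ == "__main__":
    sample = [(0.2, 0.4, 0.6), (1.0, 0.0, 0.0)]
    print(two_pass(sample))
    print(one_pass(sample))
```

Both functions produce the same pixel values up to floating-point rounding, but one_pass touches each pixel once instead of twice.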

Robert Kaiser

Comment 4

13 years ago

Moving to color management as per recent comments.

Component: Graphics → GFX: Color Management

QA Contact: thebes → color-management

malch

Reporter

Comment 5

13 years ago

Gentlemen:

After reviewing many other discussions in Bugzilla and reflecting on the issues raised here and elsewhere, we feel:

1. A review of the whole JPEG life cycle (data source, decoding, color management, and rendering) would be in order. This is probably a long-term issue/project.

2. In the short term, there are a couple of things that would bring FF image performance into the same general ballpark as the competing browsers.

First, drop support for the progressive rendering of progressive JPEGs. Firefox would still handle progressive JPEGs but it would not attempt to render them progressively. Chrome/Safari/IE do this and they render the complete image faster than Firefox can render the first pass. Furthermore, with color management and a display profile active, the Firefox progressive rendering stalls because there are not enough CPU cycles available for it to keep up. The current implementation just isn't working. Progressive rendering with full color management is very CPU intensive. It simply may not be viable until more of these functions have migrated to the GPU (which seems likely in the long term). Meanwhile, emulate the other browsers and render progressive JPEGs non-progressively.

Second, look at combining the two qcms transformations. We think it should be possible to create a transform that maps the embedded ICC profile directly to the display profile. Color management could then be executed with a single transform of the decompressed image instead of the two passes used currently.

3. Absent any comment/feedback from others more actively involved in the code, there's probably not a lot more we can contribute. However, we will continue to monitor this thread and respond if/when appropriate.