Open Bug 715919 Opened 8 years ago Updated 2 years ago

Decoding jpegs is ~2x slower in Firefox than with standalone djpeg/tjbench

Categories: (Core :: ImageLib, defect)
Platform: x86_64, All
Reporter: justin.lebar+bug, Unassigned
References: Blocks 2 open bugs
Assuming color images, 1MP/s = 3MB/s.

Our telemetry [1] indicates that we get around 7MB/s (2MP/s) decoding JPEGs on x86-64 machines.  One standard deviation is, very roughly (I'm doing this by sight here), 4-11MB/s (1-4MP/s).

On x86-32, we see roughly half that speed.  The mode is 2MB/s (0.7MP/s), and one standard deviation is very roughly 1-5MB/s (0.3-2MP/s).

These numbers are calculated as (size of decoded image in bytes) / (time in seconds spent decoding).
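To make the unit conversion concrete, here is a quick sketch in Python (assuming 24-bit color, i.e. the 3 bytes/pixel figure from the top of this comment):

```python
def decode_speed(pixels, seconds, bytes_per_pixel=3):
    """Return (MP/s, MB/s of decoded output), assuming 24-bit color."""
    mp_per_s = pixels / 1e6 / seconds
    return mp_per_s, mp_per_s * bytes_per_pixel

# e.g. a 2 MP image decoded in one second is ~6 MB/s of decoded output
mp, mb = decode_speed(2_000_000, 1.0)
```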

Libjpeg-turbo reports [2] that we should see roughly 10x this performance.  In fact, according to its benchmarks, we're doing much worse than even vanilla libjpeg here.

Something is amiss.  It could be:

 1) My calculations here,
 2) The way we calculate decoder speed,
 3) Libjpeg-turbo's benchmarks,
 4) The comparison between libjpeg-turbo's benchmarks and our decoder speed (perhaps we're counting more work as part of the "decode" than libjpeg-turbo is?), or maybe
 5) Gecko.  (Maybe, for example, we're accidentally running super-slow decoder codepaths.)

[1] https://metrics.mozilla.com/pentaho/content/pentaho-cdf-dd/Render?solution=metrics2&path=telemetry&file=telemetryHistogram.wcdf (not public, I'm sorry to report)

[2] http://www.libjpeg-turbo.org/pmwiki/uploads/About/libjpegturbovsipp.ods
DRC, do you have any idea what might be going on here?
On my fast Linux-64 box, I decode the images at [1] at 4-8MB/s.

(Testcase: Load the page, let the images be discarded, then switch back to the tab.  I'm printf'ing the data written to the IMAGE_DECODE_SPEED_JPEG histogram.  Search RasterImage.cpp for "KBps".)

[1] http://www.boston.com/bigpicture/2011/12/the_year_in_pictures_part_ii.html
We also only send certain-sized chunks to the decoder, and the chunk size depends a lot on whether the data is coming off the network or we're just redecoding straight from the downloaded buffer.

Do we have any speed comparisons between the two scenarios?
All of the images on that web page are progressive JPEGs.  libjpeg-turbo only accelerates baseline JPEG encoding/decoding.  For progressive JPEGs, libjpeg-turbo should not be any slower than libjpeg, but it isn't expected to be any faster, either.
(Copied and adapted from what I wrote in bug 661304 comment 62)
These progressive JPEGs may be very slow to decode due to bug 435628.

I am not too knowledgeable about the matter, but as I understand from what I read, that bug says progressive JPEGs are very slow to decode when gfx.color_management.enabled=true.  From the bug: "Image takes up to 1000% (10x) longer to load than with gfx.color_management.enabled=false".
From what I understand, the gfx.color_management.enabled option has since been removed from about:config (it is effectively always true), so color management is always applied when the image contains a v2 color management profile.
There is now only a gfx.color_management.enablev4 option, which defaults to false, but v2 color management is always enabled, so the images might be slow to decode because of this bug.  Correct me if I misunderstood.
> All of the images on that web page are progressive JPEGs.

Ah, okay.  This does not solve the whole mystery, however!  Let's try another photoblog.  :)

I get ~14MB/s == 5MP/s decoding [2], which jpeginfo says is not progressive.  Isn't that still dog slow?

[1] http://www.theatlantic.com/infocus/2012/01/a-view-inside-iran/100219/
[2] http://cdn.theatlantic.com/static/infocus/iran010612/i41_RTR2MB5Q.jpg
I'm getting something like 50-60 MP/s decoding the individual JPEGs from that page using TJBench (i.e. using libjpeg-turbo outside of the context of any application.)  So yes, I think 5 MP/s is dog slow, and it doesn't seem to be libjpeg-turbo that is causing that.
Blocks: image-suck
It would have been nice to have the following additional information displayed in "image info":
1. The image decoding time and speed (in uncompressed megapixels and/or bytes per second)
2. Whether this is a progressive JPEG or not.

I filed bug 717784 for that.
The JPEG decoder ImageLib used before libjpeg-turbo wasn't vanilla libjpeg either, was it? (bug 411718, bug 412753, …?). Shouldn't the previous decoder therefore also be taken into account in any such comparison?
> The JPEG decoder ImageLib used before libjpeg-turbo wasn't vanilla jpeg either, was it? 

Clearly not, and you're welcome to investigate how fast it was!  But so much has changed since then, even if you told me that we decoded at 0.1MP/s or 50MP/s before libjpeg-turbo, it would be hard for me to say what that measurement means.  (Does it have to do with some new routines which are now counted as "decode" time, or vice versa?  Maybe the old code does some more things after "decode finishes".)

> Shouldn't the previous decoder therefore also be taken into account in any such comparison?

Our changes were supposed to make us faster than stock libjpeg.  So if we're 10x slower than stock libjpeg, presumably we're even slower compared to Mozilla's hacked libjpeg.  You're welcome to verify this, of course.  :)
In case someone can check it easily: what is the typical performance of GIF and PNG decoding in Firefox, and is it slow like the JPEG decoder?
Ah, I messed up compressed versus uncompressed bytes!

test.jpg is 5100x2995 = 15MP.  2.8MB compressed.

$ time ./djpeg test.jpg > /dev/null
0.1s, so 150MP/s or 28MB/s.

$ ./tjbench test.jpg
Frame rate: 13.76 fps (i.e., 0.07s to decompress once)
Throughput: 210MP/s

These numbers are consistent.

In Firefox, I get 14MB/s on the same image.  So there's still a discrepancy (compare 14MB/s to 28MB/s), but it's not gigantic.
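The arithmetic above checks out; reproducing it in Python from the measurements as stated:

```python
# Numbers from the djpeg/tjbench runs above (5100x2995 image, 2.8 MB compressed).
pixels_mp = 5100 * 2995 / 1e6      # ~15.3 MP
djpeg_mp_s = pixels_mp / 0.1       # ~153 MP/s, i.e. the "150MP/s" above
djpeg_mb_s = 2.8 / 0.1             # 28 MB/s of compressed input
tjbench_mp_s = pixels_mp * 13.76   # 13.76 fps -> ~210 MP/s
```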
(In reply to nivtwig from comment #11)
> In case someone can check it easily: what is the typical performance of GIF
> and PNG decoding in Firefox, and is it slow like the JPEG decoder?

This probably isn't as interesting because GIFs and PNGs are typically much smaller than JPEGs and the image complexity varies more widely.  But according to telemetry:

 GIF is roughly uniform from 500B/s to 23KB/s.
 PNG is a rough normal distribution, mean 5KB/s, 1 sigma [1KB/s - 14KB/s].

Again, these decode speeds are measured in *compressed* bytes / second.  I have no idea if these are good numbers.
The discrepancy between djpeg and tjbench is likely due to the disk I/O overhead in djpeg (djpeg loads chunks of the image on-demand from the disk, whereas TJBench does the entire decompression in memory.)

w.r.t. Firefox, I wouldn't call a 2X discrepancy "not gigantic."  :)
> w.r.t. Firefox, I wouldn't call a 2X discrepancy "not gigantic."  :)

I imagine we're counting more things as "decode" than libjpeg-turbo does.  Well...either that, or we're running the decoder twice for each image.  :)
(In reply to Justin Lebar [:jlebar] from comment #12)
> test.jpg is 5100x2995 = 15MP.  2.8MB compressed.
Thanks.
1. Can the image used, "test.jpg", be considered a good representative example? 
   I think it should be tested with more images to be conclusive, such as those you used in comment 6.

2. I think there is no point in talking about or measuring compressed bytes/second, because the compression ratio of the same image can vary widely (5%, 30%, 80%) depending on the compression desired, which causes wide variations when you measure in compressed bytes per second. Megapixels per second, or uncompressed bytes per second, is therefore a better measurement, since it depends less on the compression ratio chosen for the specific image.

3. The PNG and GIF measurements seem to be very slow compared to the JPEG. 
   23KB/s compressed, assuming a high compression of 20x means about 460KB/s uncompressed, or about 150K Pixels/sec, which is 1000X (!) slower than the JPEG performance of 150MP/s .
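Working through that arithmetic in Python (the 20x compression ratio and 3 bytes/pixel are assumptions; note the underlying telemetry numbers are corrected by a factor of 1024 later in this thread, but the calculation itself is the same):

```python
compressed_bps = 23e3              # 23 KB/s, upper end of the GIF distribution
assumed_ratio = 20                 # assumed 20x compression
bytes_per_pixel = 3
pixels_per_s = compressed_bps * assumed_ratio / bytes_per_pixel  # ~153 K pixels/s
slowdown_vs_jpeg = 150e6 / pixels_per_s                          # ~1000x vs 150 MP/s
```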
(In reply to nivtwig from comment #16)
> 2. I think there is no point in talking about or measuring compressed
> bytes/second, because the compression ratio of the same image can vary
> widely (5%, 30%, 80%) depending on the compression desired, which causes
> wide variations when you measure in compressed bytes per second.
> Megapixels per second, or uncompressed bytes per second, is therefore a
> better measurement, since it depends less on the compression ratio chosen
> for the specific image.

Completely agree.  As an example, when you compress nightshot_iso_100.ppm from http://www.imagecompression.info/test_images/ using quality 95 and quality 50 (4:2:2 subsampling), the difference in compression ratio (quality 50 relative to quality 95) is ~4.5x, but the difference in pixel throughput on decompression is only 1.5x, so measuring compressed bytes/second would tell you that the quality=95 image decoded 3x faster than the quality=50 image, when in fact it was 30% slower from the user's point of view.

Compressed bytes/second is meaningful only if the network/disk is the primary bottleneck.  It doesn't tell you anything meaningful about the decompressor.  If there is zero I/O overhead, then the user's perception of performance will be determined by the pixel throughput.
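The nightshot_iso_100 example above can be checked numerically (using the ratios quoted in that comment: 4.5x size difference, 1.5x pixel-throughput difference, same pixel count for both images):

```python
size_ratio = 4.5        # q95 file is 4.5x larger than q50
throughput_ratio = 1.5  # q50 decodes 1.5x more pixels/second than q95

# Same pixel count, so decode time is inversely proportional to pixel throughput.
t_q95 = 1.0
t_q50 = t_q95 / throughput_ratio

# Compressed bytes/second makes q95 look 3x "faster" ...
cbps_ratio = (size_ratio / t_q95) / (1.0 / t_q50)

# ... even though the user actually waits 1.5x longer for the q95 image.
```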
> 1. Can the image used, "test.jpg", be considered a good representative example? 

I get basically the same speeds with the other examples from comment 6.  (If you'd like to test with your own set of images, search for "KBps" in RasterImage.cpp and put a printf there.)

> 2. I think there is no point in talking about or measuring compressed bytes/second,

Cool.  I filed bug 719709 on fixing the telemetry.

> 3. The PNG and GIF measurements seem to be very slow compared to the JPEG.

I just checked; I got the units wrong.  Everything should be multiplied by 1024.  So:

 GIF is roughly uniform from 500KB/s to 23MB/s.
 PNG is a rough normal distribution, mean 5MB/s, 1 sigma [1MB/s - 14MB/s].

> w.r.t. Firefox, I wouldn't call a 2X discrepancy "not gigantic."  :)

Just to be clear, I think this merits further investigation, if only so we know where the time is going.  (I profiled a bit, and I don't see much that isn't libjpeg-turbo code running, so it's not clear where all that time is going.)  It's just nice to know that we're talking about a 2x difference rather than a 10x difference!
Summary: Decoding jpegs is much slower than libjpeg-turbo's benchmarks → Decoding jpegs is ~2x slower than libjpeg-turbo's benchmarks
Summary: Decoding jpegs is ~2x slower than libjpeg-turbo's benchmarks → Decoding jpegs is ~2x slower in Firefox than with standalone djpeg/tjbench
Disabling color management (gfx.color_management.mode = 0) takes me from ~30MP/s to ~37MP/s.
And nsJPEGDecoder::OutputScanlines is not exactly optimized.

On the one hand, we do a QCMS transform from RGB --> RGB, and then we do *another* pass over the data to convert RGB --> FRGB.  And the RGB --> FRGB transform is not even vectorized!

OutputScanlines (presumably, the RGB --> FRGB transform) is about .6x as expensive as the islow dct.
The silliness in OutputScanlines is dominated in cost by the QCMS (color management) transform, for better or for worse.  And if you disable color management, we go straight to ARGB through the jpeg decoder.
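For illustration, the kind of scalar per-pixel pass described above looks roughly like this (a Python sketch of the idea, not Gecko's actual OutputScanlines code): each 24-bit RGB triple is repacked into a 32-bit 0xFFRRGGBB word, one pixel at a time.

```python
def rgb_to_frgb(rgb: bytes) -> list[int]:
    """Repack 24-bit RGB triples into 32-bit 0xFFRRGGBB words, pixel by pixel."""
    out = []
    for i in range(0, len(rgb), 3):
        r, g, b = rgb[i], rgb[i + 1], rgb[i + 2]
        out.append(0xFF000000 | (r << 16) | (g << 8) | b)
    return out
```

A vectorized version would process many pixels per instruction, which is the missed opportunity being pointed out.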
(In reply to DRC from comment #4)
> All of the images on that web page are progressive JPEGs.  libjpeg-turbo
> only accelerates baseline JPEG encoding/decoding.  For progressive JPEGs,
> libjpeg-turbo should not be any slower than libjpeg, but it isn't expected
> to be any faster, either.

I should amend this statement, in case someone stumbles upon this thread via Google.  In fact, further research reveals that progressive JPEGs are faster in libjpeg-turbo than in libjpeg.  However, the performance advantage is quite a bit smaller than with baseline -- something like 30-40% as opposed to 2-4x.
(In reply to DRC from comment #22)
> (In reply to DRC from comment #4)
> > All of the images on that web page are progressive JPEGs.  libjpeg-turbo
> > only accelerates baseline JPEG encoding/decoding.  For progressive JPEGs,
> > libjpeg-turbo should not be any slower than libjpeg, but it isn't expected
> > to be any faster, either.
> 
> I should amend this statement, in case someone stumbles upon this thread via
> Google.  In fact, further research reveals that progressive JPEGs are faster
> in libjpeg-turbo than in libjpeg.  However, the performance advantage is
> quite a bit smaller than with baseline -- something like 30-40% as opposed
> to 2-4x.

Is there room for improvement in libjpeg-turbo here or are progressive JPEGs inherently more difficult to decode?
Yes and yes.  :)  The Huffman routines are more complicated with progressive, and they are not shared with baseline, so none of my Huffman optimizations apply.  Basically, it's a case of Amdahl's Law.  Huffman coding accounts for a larger percentage of execution time with progressive, so speeding up the rest of the code had much less of an effect.

Hard to say how much room for improvement there is.  I would say "some", but doubtful that it would ever reach the same levels of performance that we achieve with baseline.