Last Comment Bug 573948 - Replace libjpeg with libjpeg-turbo
: Replace libjpeg with libjpeg-turbo
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: ImageLib (show other bugs)
: Trunk
: All All
: -- normal with 21 votes (vote)
: mozilla5
Assigned To: Justin Lebar (not reading bugmail)
:
: Milan Sreckovic [:milan]
Mentors:
http://libjpeg-turbo.virtualgl.org/
: 242145 244978 416157 (view as bug list)
Depends on: 570477 580518 583924 646254 646485 646489 646704
Blocks: jpeg-perf 584652 586320 475225 477728 496298 650899 662346
  Show dependency treegraph
 
Reported: 2010-06-23 03:55 PDT by atzaus
Modified: 2014-04-25 15:16 PDT (History)
75 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
-
?


Attachments
JPEG loading test (907 bytes, application/xhtml+xml)
2010-07-20 15:28 PDT, Ryan VanderMeulen [:RyanVM]
no flags Details
Patch v1 - Part 1: Removing libjpeg (1.39 MB, patch)
2010-08-05 10:32 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v1 - Part 2: Adding libjpeg-turbo (517.61 KB, application/x-gzip)
2010-08-05 10:35 PDT, Justin Lebar (not reading bugmail)
no flags Details
Patch v1 - Part 3: Changes to the build (6.76 KB, patch)
2010-08-05 10:43 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v1 - Patching libjpeg-turbo on top of libjpeg (1.43 MB, patch)
2010-08-05 13:23 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Comparison of libjpeg-turbo (left) and libjpeg rendering jpg-size-5x5.jpg (41.68 KB, image/png)
2010-08-06 15:11 PDT, Justin Lebar (not reading bugmail)
no flags Details
Patch v2 - libjpeg-turbo added on top of libjpeg in /jpeg directory (1.42 MB, patch)
2010-08-06 15:12 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Method dispatch fix (1.75 KB, patch)
2010-08-06 15:48 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v2.1 (v2 plus method dispatch fix) (1.42 MB, patch)
2010-08-06 15:50 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v1.3 (1.42 MB, patch)
2010-08-10 13:56 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Patch v1.3.1 (1.39 MB, patch)
2010-09-02 17:32 PDT, Justin Lebar (not reading bugmail)
jmuizelaar: review+
Details | Diff | Splinter Review
Patch v2.0. Part 1: Upgrade to libjpeg-turbo v1.1.0 (1.67 MB, patch)
2011-03-28 13:20 PDT, Justin Lebar (not reading bugmail)
jmuizelaar: review+
Details | Diff | Splinter Review
Patch v2.0. Part 2: Address review comments. (3.53 KB, patch)
2011-03-28 13:21 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Interdiff v1.3.1 -> v2.0 part 1 (869.19 KB, text/plain)
2011-03-28 13:24 PDT, Justin Lebar (not reading bugmail)
no flags Details
Interdiff v1.3.1 -> v2.0 part 1 without deleted files (195.00 KB, text/plain)
2011-03-28 13:45 PDT, Justin Lebar (not reading bugmail)
no flags Details
Proposed upstream patch for addressing concern about jround_up() argument types. (2.40 KB, patch)
2011-03-28 18:01 PDT, DRC
no flags Details | Diff | Splinter Review
New proposed upstream patch for addressing concern about jround_up() argument types. (2.57 KB, patch)
2011-03-28 21:41 PDT, DRC
no flags Details | Diff | Splinter Review
Patch v2.0 Part 3: Re-enable reftests on Windows. (2.80 KB, patch)
2011-03-29 07:48 PDT, Justin Lebar (not reading bugmail)
jmuizelaar: review+
Details | Diff | Splinter Review

Description atzaus 2010-06-23 03:55:57 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4
Build Identifier: 

To quote the URL
"libjpeg-turbo is a high-speed version of libjpeg  for x86 and x86-64 processors which uses SIMD instructions (MMX, SSE2, etc.) to accelerate baseline JPEG compression and decompression. libjpeg-turbo is generally 2-4x as fast as the unmodified version of libjpeg, all else being equal.

libjpeg-turbo was originally based on libjpeg/SIMD by Miyasaka Masaru, but the TigerVNC and VirtualGL projects made numerous enhancements to the codec, including improved support for Mac OS X, 64-bit support, support for 32-bit and big endian pixel formats (RGBA, ABGR, etc.), accelerated Huffman encoding/decoding, and various bug fixes. The goal was to produce a fully open source codec that could replace the partially closed source TurboJPEG/IPP codec used by VirtualGL and TurboVNC. libjpeg-turbo generally achieves 80-120% of the performance of TurboJPEG/IPP. It is faster in some areas but slower in others."

Fedora already working on incorporating it in their distribution.
http://fedoraproject.org/wiki/Features/libjpeg-turbo

License is wxWindows license for some parts and LGPL for the rest.

This would close a lot of bugs regarding libjpeg/SSE Optimization/64-bit Porting

Reproducible: Always
Comment 1 cajbir (:cajbir) 2010-06-23 04:56:25 PDT
You'll probably need to consult with licensing@mozilla.org about the license. See guidelines here: http://www.mozilla.org/MPL/license-policy.html

When looking at third party libraries for the video decoding and YCbCr conversion LGPL was an issue.
Comment 2 Jeff Muizelaar [:jrmuizel] 2010-06-23 05:39:49 PDT
Using libjpeg-simd is probably an easier option here than libjpeg-turbo license wise.
Comment 3 atzaus 2010-06-23 06:06:32 PDT
According to
http://libjpeg-turbo.svn.sourceforge.net/viewvc/libjpeg-turbo/trunk/README-turbo.txt

"Some of the optimizations to the Huffman encoder/decoder were borrowed from
VirtualGL, and thus the libjpeg-turbo distribution, as a whole, falls under the
wxWindows Library Licence, Version 3.1.  A copy of this license can be found in
this directory under LICENSE.txt.  The wxWindows Library License is based on
the LGPL but includes provisions which allow the Library to be statically
linked into proprietary libraries and applications without requiring the
resulting binaries to be distributed under the terms of the LGPL.

The rest of the source code, apart from these modifications, falls under a less
restrictive, BSD-style license (see README.)"

So it seems that wxWindows license is a more liberal version of LGPL and the rest is BSD (I was mistaken in the initial posting)
Comment 4 atzaus 2010-06-23 06:18:48 PDT
(In reply to comment #2)
> Using libjpeg-simd is probably an easier option here than libjpeg-turbo license
> wise.

Sorry for the bug spam

Regarding libjpeg-simd

From the Google translation of http://cetus.sakura.ne.jp/softlab/jpeg-x86simd/jpegsimd.html

"In addition, AMD64 (EM64T/Intel64) in 64bit mode environment (currently) not supported."

"AMD64 compatible version is under construction...(October 26, 2006: The production process is currently stopped. Seems slow release schedule. Sorry.)"

Jpeg-simd would have to be ported to 64-bit which from what I have read is a cumbersome task.
Comment 5 Jeff Muizelaar [:jrmuizel] 2010-06-23 07:34:22 PDT
(In reply to comment #4)
> Jpeg-simd would have to be ported to 64-bit which from what I have read is a
> cumbersome task.

From the diffs between the 64-bit version and the 32-bit version it seems that the process would be mostly mechanical. In fact, it should be possible to use the same source for 64-bit as 32-bit like libvpx does. Further, the 64-bit versions in libjpeg-turbo should be under the same license as jpeg-simd so we should be able to just use them directly.
Comment 6 DRC 2010-06-23 12:08:53 PDT
Hi, all.  I'm the maintainer and one of the creators of libjpeg-turbo, so I thought I'd throw in some comments.  For starters, libjpeg-turbo is not licensed under pure LGPL.  It uses the wxWindows Library License, which eliminates some of the viral provisions of the LGPL.  My understanding of the differences between the two is spelled out here:

http://www.virtualgl.org/About/License

I have confirmed with the wxWindows developers who created the license that this matches their understanding as well.  Essentially, it is our understanding that wxWindows nullifies Sections 6-7 of LGPL v2.1, which means you can statically link the Library with any application without requiring the application as a whole to fall under the provisions of the LGPL.  You are still bound by other sections of the LGPL, which means you would still be required to distribute source for the Library along with the application.

That being said, however, I can understand the desire for a fully unencumbered version of the source.  The only part of the code which is licensed under wxWindows is the Huffman encoding/decoding optimizations (portions of jchuff.c and jdhuff.c.)  These were developed while I was a Sun employee, so unfortunately I don't have the right to dual license them, or else I certainly would do so.

What I could do, however, is develop a mechanism whereby you could easily build the code without the Huffman optimizations ('configure --without-huffman-opt' or something like that.)  This would simply involve building libjpeg-turbo with the "stock" versions of jchuff.c and jdhuff.c from libjpeg-6b.  I could even go one step farther and make it so that the wxWindows-licensed files are not included in the distribution tarball if you 'configure --without-huffman-opt' and then 'make dist'.

This would slow down the overall performance by 20-25%, but it will still be miles faster than regular libjpeg on x86/x86-64 systems.
Comment 7 Gervase Markham [:gerv] 2010-06-24 02:38:18 PDT
What DRC suggests seems like the smart thing to do, to begin with. Let's take the parts which are unambiguously BSD and still provide a significant win. 

We currently have a policy of only taking code which is compatible with all three of our licenses. We'd have to do some analysis on the wxWindows licence to see whether it fits our policy. (Please, people, stop rolling your own licences! :-) If someone could open another bug for that, that would be great.

But if it's not compatible in that way, I think it's highly unlikely we'd want to introduce a new licence into the Mozilla mix just for a 25% win in JPEG decoding (particularly if we'd already recently had a big win over what we'd been using for years).

Gerv
Comment 8 DRC 2010-06-24 11:44:41 PDT
wxWindows is actually a very mature license at this point-- it just isn't well known.  It was created to solve a specific problem -- allowing static linking of an open source library with proprietary code without forcing the proprietary application to fall under the terms of the LGPL, but also guaranteeing the open nature of the code (which BSD-style licenses do not do.)  OpenSceneGraph uses it, and that's where I learned about it originally.

Let me know if there's anything I can do to facilitate this on my end.  If you're just going to merge the code into your build tree, then you can just as easily replace jchuff.c and jdhuff.c with their original versions when you do that, but I'm still willing to add a configure switch to libjpeg-turbo as well.
Comment 9 Jeff Muizelaar [:jrmuizel] 2010-06-24 12:50:39 PDT
(In reply to comment #8)
> Let me know if there's anything I can do to facilitate this on my end.  If
> you're just going to merge the code into your build tree, then you can just as
> easily replace jchuff.c and jdhuff.c with their original versions when you do
> that, but I'm still willing to add a configure switch to libjpeg-turbo as well.

This seems like it might be best option.
Comment 10 Stuart Parmenter 2010-06-24 12:59:25 PDT
Has any work been done on ARM or other platforms?
Comment 11 Jeff Muizelaar [:jrmuizel] 2010-06-24 13:02:57 PDT
(In reply to comment #10)
> Has any work been done on ARM or other platforms?

One thing I've been meaning to do is switch mobile to use the IFAST dct instead of the ISLOW dct. Android uses IFAST and in fact have an ARM implementation of the IFAST dct. Unfortunately, it is licensed with the apache license. Perhaps we can convince Google to relicense it?
Comment 12 Makoto Kato [:m_kato] 2010-06-24 18:52:34 PDT
(In reply to comment #10)
> Has any work been done on ARM or other platforms?

We should write NEON SIMD code if we change to libjpeg-turbo.


(In reply to comment #11)
> (In reply to comment #10)
> > Has any work been done on ARM or other platforms?
> 
> One thing I've been meaning to do is switch mobile to use the IFAST dct instead
> of the ISLOW dct. Android uses IFAST and in fact have an ARM implementation of
> the IFAST dct. Unfortunately, it is licensed with the apache license. Perhaps
> we can convince Google to relicense it?

Past year, Michael Moy tried to change to IFAST on some platforms by bug 416157.   But many test cases became failed.  If changing to IFAST, we have to modify many test cases, too.
Comment 13 Gervase Markham [:gerv] 2010-06-25 02:06:25 PDT
DRC: the problem is not just incompatibility, but simplicity. A library under the straight LGPL would be compatible with the MPL, the LGPL and the GPL (our three licences) but would increase the licensing complexity of our codebase, something we have historically tried to avoid.

Gerv
Comment 14 DRC 2010-06-25 03:22:45 PDT
So, just so I understand, inbound code to the Mozilla project has to either be licensed under the tri-license or something less restrictive that can be re-licensed under the tri-license?  So if it were possible to re-license the offending code in libjpeg-turbo under your tri-license, then that would solve the problem?
Comment 15 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-06-25 03:34:59 PDT
Our goal is that people should be able to redistribute all the code that goes into our products under the terms of any one of the MPL, LGPL, or GPL. So:
-- code that is licensed with a license compatible with all of those (e.g., BSD) is fine
-- code that is licensed with multiple license options that cover the three above licenses is also fine (e.g., dual licensed MPL/LGPL, since the LGPL option is compatible with the GPL as well)

Note that we don't relicense other people's code. E.g., when we import BSD-licensed code into our tree, it keeps the original license.
Comment 16 DRC 2010-06-25 03:49:55 PDT
Unless I'm missing something fundamental, I think wxWindows-licensed code could be legally redistributed under all three licenses.  However, I fully understand the desire not to introduce new licenses into your code tree.

One question-- do you currently copy the libjpeg code into the Mozilla source or do you build it separately?
Comment 17 Reed Loden [:reed] (use needinfo?) 2010-06-29 01:04:50 PDT
(In reply to comment #16)
> One question-- do you currently copy the libjpeg code into the Mozilla source
> or do you build it separately?

It's in our tree, but it's not by any means the original libjpeg, as it's been patched to pieces.

http://mxr.mozilla.org/mozilla-central/source/jpeg/
Comment 18 DRC 2010-06-29 11:30:15 PDT
OK, then it seems as if merging in the "turbo" features (minus the Huffman coding) into your existing code tree would be the way to go.
Comment 19 Makoto Kato [:m_kato] 2010-06-29 19:21:26 PDT
(In reply to comment #13)
> DRC: the problem is not just incompatibility, but simplicity. A library under
> the straight LGPL would be compatible with the MPL, the LGPL and the GPL (our
> three licences) but would increase the licensing complexity of our codebase,
> something we have historically tried to avoid.

Original license of SIMD code is http://cetus.sakura.ne.jp/softlab/jpeg-x86simd/jpegsimd.html#conditions

In Japanese, "この SIMD 拡張版 IJG JPEG software の使用条件については、オリジナル版の IJG JPEG software の使用条件が適用されます。"

In English by Google Translate, "This SIMD extension for IJG JPEG software terms of use, the original version of the IJG JPEG software is subject to terms of use."

So, original author who is MIYASAKA-san says that this code is same license as libjpeg.  I believe that license is no problem if we port MIYASAKA-san's code to our tree, not using libjpeg-turbo.

Gerv, how about?
Comment 20 DRC 2010-06-29 19:26:45 PDT
Why does the idea of using the old libjpeg/SIMD code keep coming up in this discussion?  This has already been covered multiple times-- libjpeg/SIMD does not have 64-bit support or many of the other bug fixes we made to libjpeg-turbo.

libjpeg-turbo also bears the same license as libjpeg, if you don't use our accelerated version of jchuff.c and jdhuff.c.

In short-- merge in libjpeg-turbo, minus jchuff.c and jdhuff.c, and there is no licensing problem.
Comment 21 Vladimir Vukicevic [:vlad] [:vladv] 2010-06-29 19:36:43 PDT
There seems to be some confusion around what should happen here... the right thing, I think, is:

1. Look at our in-tree libjpeg and how it differs from stock libjpeg.

2. Based on that, either merge the delta from libjpeg -> libjpeg-turbo into our tree, OR merge delta of stock libjpeg and in-tree libjpeg into libjpeg-turbo, and then pull that in.

(minus j[cd]huff.c, of course).

I don't really know how extensive the changes in #1 are, and I don't know what, if any, conflicts there would be between the changes for -turbo and our tree version of libjpeg.  But I think it's worth it to find out; the perf wins wouldn't hurt as we try to do decode on draw, and it sounds like there are some pretty significant ones here.
Comment 22 Vladimir Vukicevic [:vlad] [:vladv] 2010-06-29 19:37:00 PDT
Forgot to ask -- who wants to make a patch? :-)
Comment 23 DRC 2010-06-29 20:14:40 PDT
After a cursory look at the code, it seems like libjpeg-turbo may want to borrow Mozilla's enhancements, where applicable.  In particular, the use of static tables in choice areas like RGB->YUV conversion is likely to improve our performance as well.

Also, there is some of the code which is going to conflict,  because your version already has some SIMD acceleration.  I think that most of it could be replaced with ours, but the areas in which you use SSE2 would probably need to be benchmarked against our version to see which is faster (integer IDCT, for instance.)
Comment 24 Jeff Muizelaar [:jrmuizel] 2010-06-29 20:17:07 PDT
(In reply to comment #23)
> After a cursory look at the code, it seems like libjpeg-turbo may want to
> borrow Mozilla's enhancements, where applicable.  In particular, the use of
> static tables in choice areas like RGB->YUV conversion is likely to improve our
> performance as well.
> 
> Also, there is some of the code which is going to conflict,  because your
> version already has some SIMD acceleration.  I think that most of it could be
> replaced with ours, but the areas in which you use SSE2 would probably need to
> be benchmarked against our version to see which is faster (integer IDCT, for
> instance.)

Our integer SSE2 idct is buggy (bug 477728), so while it would be nice to see how the performance compares, I think I would still switch to the libjpeg-turbo one even if ours is faster.
Comment 25 Justin Lebar (not reading bugmail) 2010-07-15 12:18:30 PDT
(In reply to comment #22)
> Forgot to ask -- who wants to make a patch? :-)

I'll try.
Comment 26 Justin Lebar (not reading bugmail) 2010-07-16 15:11:26 PDT
(In reply to comment #6)
> What I could do, however, is develop a mechanism whereby you could easily build
> the code without the Huffman optimizations ('configure --without-huffman-opt'
> or something like that.)

My guess is that we'll not want to include j{c,d}huff.c in our source tree at all, if they're encumbered with an incompatible license.  Is this right?
Comment 27 Gervase Markham [:gerv] 2010-07-17 09:34:11 PDT
jlebar: that seems wisest to me.

Gerv
Comment 28 Justin Lebar (not reading bugmail) 2010-07-19 15:11:18 PDT
To the build team: Are we willing to add a dependency on nasm so we can get this library?  If I understand correctly, we'd need nasm on both Windows and *nix.
Comment 29 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-07-19 17:11:43 PDT
Can you get it building with YASM? YASM is already a build requirement everywhere except Win32. It claims to support nasm-compatible syntax.
Comment 30 Justin Lebar (not reading bugmail) 2010-07-19 17:25:50 PDT
I think yasm will work.
Comment 31 Justin Lebar (not reading bugmail) 2010-07-20 15:22:36 PDT
I have something which is building and running (with yasm) on x64.  I'd love to run some performance tests so we can see whether this is worth investing in further.

I figure I should try and run tprender tests, as in

  dist/bin/firefox -tprender -tp manifest -tpformat text

but this runs the test and produces no output.

Can anyone suggest a way to make this work or an alternative benchmarking methodology?
Comment 32 Ryan VanderMeulen [:RyanVM] 2010-07-20 15:28:53 PDT
Created attachment 458811 [details]
JPEG loading test

I used this to test some of mmoy's JPEG patches back in the day. It's got a bit of scatter, but I found that it was reasonably consistent if I did 10-20 runs (unfortunately, manually on each one). The images you see referenced in the file were just a bunch of ~3MB pictures from a digital camera.
Comment 33 Jeff Muizelaar [:jrmuizel] 2010-07-20 18:23:19 PDT
(In reply to comment #31)
> I have something which is building and running (with yasm) on x64.  I'd love to
> run some performance tests so we can see whether this is worth investing in
> further.
> 
> I figure I should try and run tprender tests, as in
> 
>   dist/bin/firefox -tprender -tp manifest -tpformat text

I don't know what tprender does so I'm not sure it's useful. If you post a patch I can get some tp4 (page load benchmark) numbers to compare against our current baseline.
Comment 34 Vladimir Vukicevic [:vlad] [:vladv] 2010-07-20 18:37:14 PDT
Trender isn't going to help because that's just repaint speed -- so will be after jpeg decode has already been done.  I think you'll have a hard time coming up with an in-browser benchmark, though decoding a number of (very) large images is probably the best you can do..
Comment 35 Justin Lebar (not reading bugmail) 2010-07-20 21:55:21 PDT
Okay.  I filed adding timers for the image decoder as bug 580518.
Comment 36 Justin Lebar (not reading bugmail) 2010-07-23 09:16:13 PDT
Some preliminary performance results, on Linux x64.  We're comparing unvectorized libjpeg to libjpeg-turbo's vectorized 64-bit code.  In brief, it appears that the part of decoding that was vectorized runs 2.3x as fast as the unvectorized part.

These benchmarks were run on a quad-core Xeon W3520 (2.67GHz) running Ubuntu 10.04, kernel 2.6.32-23-generic.

Methodology:

$ perf record -ig dist/bin/firefox -tp manifest
    (manifest contains the path to a local copy of [1], which weighs in at 3,888 × 2,592 pixels, 2.95 MB)
$ perf report -g graph

If you're going to try this yourself, note that you need to save the result of perf report before rebuilding.  Your generated perf.data will be invalid (and unreadable) as soon as you modify the binaries.

perf only tells us how long we spend in a function relative to all the other functions in the program.  To work around this, I added a spin loop to the startup sequence.  The loop always does the same amount of work, so we can compare the fraction of our time spent spinning to the fraction of time spent in the JPEG decoder to determine the speedup.

Original:

    11.93%  jpeg_idct_islow
     8.94%  SpinBriefly()
     8.12%  decode_mcu
     7.59%  qcms_transform_data_rgb_out_lut_sse2
     5.71%  ycc_rgb_convert
     2.37%  0x000000001d2170 (ld.so)
     2.31%  jpeg_fill_bit_buffer
     1.97%  __pthread_mutex_unlock_internal

libjpeg-turbo:

    10.84%  SpinBriefly()
     9.49%  qcms_transform_data_rgb_out_lut_sse2
     7.04%  decode_mcu
     6.29%  0x00000000f90c10 (libxul.so)
     2.41%  0x000000001c8e80 (ld.so)
     2.36%  nsJPEGDecoder::OutputScanlines(int*)
     2.14%  __pthread_mutex_unlock_internal
     1.97%  pthread_mutex_lock

I think the unidentified libxul function in the libjpeg-turbo profile is libjpeg-turbo's vectorized jpeg_idct_islow routine.  If so, we can conclude that the speedup for this function is (11.93/8.94) / (6.29/10.84) = 2.30.  The speedup for the JPEG decoder as a whole is of course smaller.

[1] http://commons.wikimedia.org/wiki/File:Schloss_Lichtenstein_04-2010.jpg
Comment 37 DRC 2010-07-23 09:25:36 PDT
Justin,

The comparison you made would only be valid if the two tests were equal in terms of total execution time, but my read of the above is that the amount of work done was the same but the execution time for both tests was different.  Is that correct?
Comment 38 Justin Lebar (not reading bugmail) 2010-07-23 09:35:27 PDT
The assumption is that SpinBriefly() takes the same amount of time in both tests.  I don't actually know that -- all I know is that it does the same work.  It's the same code in both cases.  But with the assumption, I think we can compare execution times by normalizing the reported percentages against the percentage reported for SpinBriefly.

To illustrate, say we knew that SpinBriefly() takes 1s.  Then in the original case, we could say that 8.94% = 1s, so 11.93% = (11.93/8.94)s = 1.3s.  We'd similarly compute that the 6.29% reported for libjpeg-turbo counts for (6.29/10.84)s = .58s.  And then we'd conclude that our speedup is 1.3s / .58s = 2.3 as before.  But of course we don't need to know how long SpinBriefly is in order to do this computation.
Comment 39 DRC 2010-07-23 09:39:48 PDT
Fair enough.  My other question is why the slow IDCT is the only accelerated function.  We've established that the accelerated Huffman decode can't be used, but aren't there other functions that can be accelerated using libjpeg-turbo?
Comment 40 Justin Lebar (not reading bugmail) 2010-07-23 09:51:13 PDT
(In reply to comment #39)
> My other question is why the slow IDCT is the only accelerated function.

Good question.  Here's where the other functions in the profile live:

qcms_transform_data_rgb_out_lut_sse2 : gfx/qcms/qcmsint.c
decode_mcu : jpeg/jdhuff.c
nsJPEGDecoder::OutputScanlines(int*) : libpr0n/decoders/jpeg/nsJPEGDecoder.cpp

So only the slow IDCT and decode_mcu are functions in libjpeg.  I was actually testing with libjpeg-turbo's jdhuff implementation here, and you can see that decode_mcu is substantially faster with libjpeg-turbo: (8.12/8.94) / (7.04/10.84) = 1.40x speedup.
Comment 41 Justin Lebar (not reading bugmail) 2010-07-23 15:52:35 PDT
On 32-bit, yasm .8 (which is what Debian is shipping) hits an infinite loop.  yasm 1.0 seems to work.

I imagine we wouldn't be willing to require yasm 1.0.  But maybe we could disable JPEG SIMD if the machine's yasm isn't new enough.
Comment 42 Vladimir Vukicevic [:vlad] [:vladv] 2010-07-23 16:02:55 PDT
Pretty sure we can require yasm 1.0, especially if there are easily installable packages for debian/ubuntu/fedora around (even if they're not stock provided).
Comment 43 Justin Lebar (not reading bugmail) 2010-07-23 16:15:11 PDT
Debian unstable is still on yasm 0.8, fwiw.  1.0 was released April 8.
Comment 44 Justin Lebar (not reading bugmail) 2010-07-26 11:44:00 PDT
*** Bug 416157 has been marked as a duplicate of this bug. ***
Comment 45 Jeff Muizelaar [:jrmuizel] 2010-07-26 11:47:59 PDT
(In reply to comment #41)
> On 32-bit, yasm .8 (which is what Debian is shipping) hits an infinite loop. 
> yasm 1.0 seems to work.

Can we work around the infinite loop?
Comment 46 Justin Lebar (not reading bugmail) 2010-07-26 14:18:45 PDT
(In reply to comment #45)
> Can we work around the infinite loop?

I think we can.  jccol{mmx,ss2}.asm includes jcclr{mmx,ss2}.asm multiple times, and I think this is what's causing the hang.  We should be able to split up jccol into separate files, one for each include of jcclr.

Do the libjpeg-turbo guys have any thoughts on whether or not this would be a good idea?  I'd prefer to diverge from upstream as little as possible, of course.
Comment 47 DRC 2010-07-26 14:56:09 PDT
I don't have a problem with making those changes to the libjpeg-turbo source, but prototype it first on your end and make sure it fixes the problem.
Comment 48 Justin Lebar (not reading bugmail) 2010-07-26 15:40:19 PDT
I successfully got jccol{mmx,ss2} to assemble, but now yasm is hanging in jcsammmx.

I think yasm 0.8 may be a game of whack-a-mole we don't want to play.
Comment 49 Justin Lebar (not reading bugmail) 2010-07-26 15:43:55 PDT
(I should add: The reason for the hang in jcsammmx is less obvious than the other files.  I tried collapsing the includes, which aren't as tricky as the jccol includes, to no avail.)
Comment 50 Vladimir Vukicevic [:vlad] [:vladv] 2010-07-26 19:45:58 PDT
(In reply to comment #48)
> I think yasm 0.8 may be a game of whack-a-mole we don't want to play.

Yup.  Add a dependency to yasm 1.0.  Worst case we build a static binary of it and include it as part of our build setup for linux.
Comment 51 Justin Lebar (not reading bugmail) 2010-07-27 17:33:17 PDT
Macports is shipping yasm 1.0.
Comment 52 Justin Lebar (not reading bugmail) 2010-07-28 14:54:01 PDT
Just got this working on OSX 64.  Seeing a speedup similar to what was observed on Linux 64:

Original:

  3.8% SpinBriefly()
  7.8% jpeg_idct_islow()

libjpeg-turbo:

  5.4% SpinBriefly()
  4.2% decoding

Speedup: (7.8/3.8) / (4.2/5.4) = 2.6.
Comment 53 Justin Lebar (not reading bugmail) 2010-07-29 10:34:50 PDT
yasm 1.0 on OSX 10.6 64-bit seems to be emitting incorrect 32-bit code.  At least, it works when I use nasm, but the yasm build decodes all jpegs as garbage.

The nasm flags I've used are:

  -fmacho -DMACHO -DPIC

and the yasm flags I've tried are:

  -fmacho -DMACHO -DPIC -rnasm -pnasm

and

  -fmacho32 -DMACHO -DPIC -rnasm -pnasm

I'm looking into this, but I have basically no idea why this is happening.  Any ideas would be appreciated...
Comment 54 Jeff Muizelaar [:jrmuizel] 2010-07-29 11:25:22 PDT
(In reply to comment #53)
> I'm looking into this, but I have basically no idea why this is happening.  Any
> ideas would be appreciated...

Do you know what the difference is in the generated code?
Comment 55 Justin Lebar (not reading bugmail) 2010-07-29 12:36:19 PDT
(In reply to comment #54)
> Do you know what the difference is in the generated code?

Well...here's *a* difference I've been looking at, in _jsimd_rgb_ycc_convert_sse2 (jccolss2.asm):

nasm:
  leal 0x00001bf3(%ecx,%ebp,8),%ebx

yasm:
  leal 0xffffffd3(%ecx,%ebp,8),%ebx

Here's more context:

..@12.adjust:
  pushl   %ebp
  xorl    %ebp,%ebp
  leal    0x00001bf3(%ecx,%ebp,8),%ebx    <---
..@12.ref:
  popl    %ebp
  movl    %ebx,0xffffff7c(%ebp)
  movl    0x08(%eax),%ecx
  testl   %ecx,%ecx
  je      0x00000412

Just the leal line is different between the two assemblers.

Here's the preprocessed source which is generating this.  It's the same for both yasm and nasm.

..@26.adjust:
  push ebp
  xor ebp,ebp
  db 0x8D,0x9C         <----
  jmp near const_base  <----
..@26.ref:
  pop ebp
  mov dword [ebp-(8-(0))*16-4],ebx
  mov ecx, dword[(eax)+8]
  test ecx,ecx
  jz near .return

Before we try and understand how a db plus a jmp combined to form a leal, let's look at the unprocessed source, from jcclrss2.asm:

  get_GOT ebx                     ; get GOT address
  movpic  POINTER [gotptr], ebx   ; save GOT address
  mov     ecx, JDIMENSION [img_width(eax)]
  test    ecx,ecx
  jz      near .return

get_GOT is a macro defined in jsimdext.inc.  Here's the whole thing:

  ; At present, nasm doesn't seem to support PIC generation for Mach-O. 
  ; The PIC support code below is a little tricky. 

          SECTION SEG_CONST 
  const_base: 

  %define GOTOFF(got,sym) (got) + (sym) - const_base 

  %imacro get_GOT 1 
          ; NOTE: this macro destroys ecx resister. 
          call    %%geteip 
          add     ecx, byte (%%ref - $) 
          jmp     short %%adjust 
  %%geteip: 
          mov     ecx, POINTER [esp] 
          ret 
  %%adjust: 
          push    ebp 
          xor     ebp,ebp         ; ebp = 0 
  %ifidni %1,ebx  ; (%1 == ebx) 
          ; db 0x8D,0x9C + jmp near const_base = 
          ;   lea ebx, [ecx+ebp*8+(const_base-%%ref)] ; 8D,9C,E9,(offset32) 
          db      0x8D,0x9C               ; 8D,9C 
          jmp     near const_base         ; E9,(const_base-%%ref) 
  %%ref: 
  %else  ; (%1 != ebx) 
          ; db 0x8D,0x8C + jmp near const_base = 
          ;   lea ecx, [ecx+ebp*8+(const_base-%%ref)] ; 8D,8C,E9,(offset32) 
          db      0x8D,0x8C               ; 8D,8C                       <----
          jmp     near const_base         ; E9,(const_base-%%ref)       <----
  %%ref:  mov     %1, ecx 
  %endif ; (%1 == ebx) 
          pop     ebp 
  %endmacro 

It appears that we're taking the %1 == ebx branch in the macro above (compare the arguments to db).

The apparent key point here is that the db / jmp lines marked with arrows become a leal instruction.  Why?  I dunno.  But perhaps this is what yasm is interpreting differently than nasm.  I'll keep looking after lunch.
Comment 56 Justin Lebar (not reading bugmail) 2010-07-29 14:14:15 PDT
I put an e-mail into the yasm-devel list about this.
Comment 57 Chris Pearce (:cpearce) 2010-07-29 17:00:46 PDT
(In reply to comment #29)
> Can you get it building with YASM? YASM is already a build requirement
> everywhere except Win32. It claims to support nasm-compatible syntax.

YASM won't work on Win32. YASM doesn't output object files which can link with /SAFESEH, so we can't link YASM object files into xul.dll. For libvpx we worked around this by converting the YASM assembly to MASM syntax using http://mxr.mozilla.org/mozilla-central/source/media/libvpx/yasm2masm-as.sh 

Ideally we'd convince the YASM developers to fix their win32 safeseh section generation, or submit a patch which fixes that ourselves upstream.
Comment 58 Justin Lebar (not reading bugmail) 2010-07-29 17:16:33 PDT
Given that the code barely works under yasm, I'm not hopeful that we'll be able to easily translate to masm, unless the syntax for macros and whatnot is very close.  But it's definitely worth looking at.

For now, I'm just going to use nasm on OSX-32.  I think a new-enough version of nasm ships on OSX by default.  (My understanding is that the version of nasm that ships isn't new enough to build libjpeg-turbo on 64-bit, but there, yasm works fine.)  It would be really nice to get yasm to work, though.
Comment 59 Justin Lebar (not reading bugmail) 2010-07-30 17:11:32 PDT
ted, bsmedberg, cpearce: Do you have any advice as to how to proceed here wrt Windows builds?  It looks like the options are:

* Fix yasm so it works with /SAFESEH or otherwise work around this issue
* Ship nasm in Windows build package (assuming that some version of nasm works on Windows)
* Translate the libjpeg-turbo code to a msam-compatible syntax
* Ship libjpeg-turbo on Linux/Mac and use our current libjpeg implementation on Windows
Comment 60 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-07-30 21:39:16 PDT
I'm actually leaning towards the first option since we're going to be distributing yasm ourselves anyway...
Comment 61 Justin Lebar (not reading bugmail) 2010-07-30 21:54:44 PDT
Fixing yasm certainly sounds to me like the Right Thing to do.  It looks like there's some support for SAFESEH [1] [2], but perhaps it's insufficient [3].

I don't have a good grasp of what the issue is even after doing some reading, so I'm not sure I'm the right person to try and hack the assembler.  But if nobody has time to hack it themselves, I can try, provided someone can sit down with me and explain what's going on.

[1] http://www.tortall.net/projects/yasm/ticket/130
[2] http://www.tortall.net/projects/yasm/changeset/2032
[3] http://www.tortall.net/projects/yasm/ticket/139
Comment 62 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-07-30 22:21:13 PDT
cpearce probably has the best idea at the moment
Comment 63 DRC 2010-07-31 09:24:02 PDT
In response to the second option (shipping NASM on Windows), NASM does work fine to build libjpeg-turbo on Windows, but I don't know whether it also suffers from the /SAFESEH problem.  As with other platforms, you'll need NASM 2.05 or later to build the 64-bit version.

In response to the third option, you'd be responsible for maintaining any such translation, since it is of no use to the upstream project.
Comment 64 Chris Pearce (:cpearce) 2010-07-31 15:27:07 PDT
(In reply to comment #59)
> * Fix yasm so it works with /SAFESEH or otherwise work around this issue

Ideally we should fix YASM, as we need a working YASM for libvpx as well.

YASM has partial support for SafeSeh, but it doesn't go far enough. YASM ticket 139 seems to diagnose the problem. I don't know much about object formats or assemblers, so I'd need to get up to speed before I could be much help fixing YASM. Maybe an email to the YASM mailing list be helpful?
Comment 65 Justin Lebar (not reading bugmail) 2010-07-31 20:30:11 PDT
Looks like the yasm devs the OSX-32 bug in their trunk [1].  I'll spin a build and see for myself on Monday.

http://www.tortall.net/projects/yasm/changeset/2345
Comment 66 Justin Lebar (not reading bugmail) 2010-07-31 23:29:33 PDT
The SAFESEH issue may also have been fixed this weekend.  Hooray!

http://www.tortall.net/projects/yasm/changeset/2343
Comment 67 Ted Mielczarek [:ted.mielczarek] 2010-08-02 07:40:08 PDT
FWIW, I agree with roc. We want to use YASM to build libvpx, so we should fix YASM and require it on Windows. bug 570477 is on file for adding it to MozillaBuild. If we can get a working version (even a nightly build or equivalent) I'd be happy to include it in MozillaBuild. I'd even spin a 1.5.1 release just for that, as long as you can provide me the binaries.
Comment 68 Justin Lebar (not reading bugmail) 2010-08-02 10:57:54 PDT
Success on Mac32!  Speedup of jpeg_idct_islow is 

(8.2/3.2) / (2.8/4.7) = 4.3

I'm not so good at reading Shark traces, but it looks like ycc_rgb_convert is also much faster:

  (8.3/3.2) / (1.3/4.7) = 9.3

But maybe the trace is somehow under-reporting for libjpeg-turbo.  In any case, I think it's faster.  :)
Comment 69 Justin Lebar (not reading bugmail) 2010-08-03 13:44:18 PDT
I managed to get xperf running on Windows, but khuey and I can't figure out how to get it to load symbols from my build.  The article at [1] seems to describe only the settings for retrieving symbols from our symbol server.

I'm going to see if I have more luck with a different profiler (jst suggested eqtime), but if anyone has suggestions for making this work with xperf, I'm all ears.

[1] https://developer.mozilla.org/en/Profiling_with_Xperf
Comment 70 Justin Lebar (not reading bugmail) 2010-08-03 14:07:43 PDT
jrmuizel to the rescue.  I have symbols on Windows.
Comment 71 Vladimir Vukicevic [:vlad] [:vladv] 2010-08-03 14:10:40 PDT
Feel free to poke me if you have problems with xperf, but for symbols, note the bits in that article about making sure you have the latest xperf and the "For a local firefox build" line -- or did that not work?
Comment 72 Justin Lebar (not reading bugmail) 2010-08-03 17:44:32 PDT
Benchmarks for Win32.

Methodology:

xperf -on latency -stackwalk profile
dist/bin/firefox -tp manifest    (same manifest as before)
xperf -d out.etl

xperf outputs the raw sample counts, which is kind of nice.   There's some variance in these (I've observed differences of 10-20% between runs), but it shouldn't be a big deal.

Unless I'm incorrectly counting some functions as belonging to libjpeg (please correct me if I am!), the speedup with libjpeg-turbo is 2.7x.

libjpeg-turbo:

  qcms_transform: 89 samples
  SpinBriefly: 72
  decode_mcu_slow: 50
  jsimd_idct_islow_sse2: 30
  jpeg_fill_bit_buffer: 19
  nsJPECDecoder::OutputScanlines: 15
  decompress_onepass: 12
  fetch_pixel_x8r8g8b8: 9
  jsimd_ycc_rgb_convert_sse2: 8

  JPEG:
    jsimd_idct_islow_sse2: 30
    jsimd_ycc_rgb_convert_sse2: 8
  TOTAL: 38
  
original:

  qcms_transform: 65 samples
  SpinBriefly: 65
  decode_mcu: 62
  ycc_rgb_convert: 47
  nsJPEGDecoder::OutputScanlines: 19
  dct_8x8_inv_16s: 18
  jpeg_fill_bit_buffer: 17
  bits_image_fetch_transformed: 16
  jpeg_idct_islow_sse2: 9
  decompress_onepass: 7
  ownpj_QuantInv_8x8_16s: 7
  ownpj_Add128_x8x_16s8u: 6
  fetch_pixel_x8r8g8b8: 6
  ...
  ippiDCTQuantInv8x8LS_JPEG_16s8u_C1R: 2

  JPEG:
    ycc_rgb_convert: 47
    dct_8x8_inv_16s: 18
    bits_image_fetch_transformed: 16
    jpeg_idct_islow_sse2: 9
    ownpj_QuantInv_8x8_16s: 7
    ownpj_Add128_x8x_16s8u: 6
    ...
    ippiDCTQuantInv8x8LS_JPEG_16s8u_C1R: 2
  TOTAL: 103

103/38 = 2.7
Comment 73 Justin Lebar (not reading bugmail) 2010-08-05 10:32:49 PDT
Created attachment 463214 [details] [diff] [review]
Patch v1 - Part 1: Removing libjpeg

The full patch weighs in at 3.8MB, which is too large for Bugzilla.  I've therefore split it into three parts: Removing libjpeg, adding libjpeg-turbo, and build changes.

There's little that's interesting in this patch; just a lot of - signs.

In the interests of full disclosure: I haven't tested this particular patch on all platforms.  It's working locally, and I'll of course push to try once the assembler is updated on the remote boxes.

I think we probably want configure to try to assemble the optimized code only if it finds a sufficiently new version of yasm -- otherwise, people with outdated versions are going to be confused by the hangs and incorrect code generation.  I have an XXX in configure.in for this; I'm not really sure how to do it.
Comment 74 Justin Lebar (not reading bugmail) 2010-08-05 10:35:08 PDT
Created attachment 463217 [details]
Patch v1 - Part 2: Adding libjpeg-turbo

Unfortunately this part is still too big for bugzilla.  Sorry about gzipping it.

Interesting bits here are probably just modules/libjpeg-turbo/Makefile.in and modules/libjpeg-turbo/simd/Makefile.in.
Comment 75 Boris Zbarsky [:bz] (still a bit busy) 2010-08-05 10:36:03 PDT
Fwiw, if we can create three self-contained patches in this order: add libjpeg-turbo, build changes, remove libjpeg and push them like that, that would be better than the order listed in comment 73, right?  And may be better than pushing a single mega-patch, though it's hard to tell.
Comment 76 Justin Lebar (not reading bugmail) 2010-08-05 10:43:41 PDT
Created attachment 463225 [details] [diff] [review]
Patch v1 - Part 3: Changes to the build
Comment 77 Jeff Muizelaar [:jrmuizel] 2010-08-05 11:25:06 PDT
Is there a good reason to not put libjpeg-turbo on top of libjpeg? Personally I'd prefer that and it should make the patch smaller.
Comment 78 Justin Lebar (not reading bugmail) 2010-08-05 11:29:02 PDT
I thought that it was kind of weird that jpeg got its own top-level directory, which is why I moved it to modules.

But I could move jpeg to modules/libjpeg-turbo and then put the libjpeg-turbo source on top of it, if we wanted.  That would probably reduce the patch size...
Comment 79 Jeff Muizelaar [:jrmuizel] 2010-08-05 11:33:05 PDT
(In reply to comment #78)
> I thought that it was kind of weird that jpeg got its own top-level directory,
> which is why I moved it to modules.
> 
> But I could move jpeg to modules/libjpeg-turbo and then put the libjpeg-turbo
> source on top of it, if we wanted.  That would probably reduce the patch
> size...

I would suggest doing the reverse. Patch libjpeg-turbo onto of jpeg in the current location, and then we can move it to saner place in a different bug.
Comment 80 Gervase Markham [:gerv] 2010-08-05 11:38:50 PDT
brendan is king of repo locations, but my understanding is that jpeg-at-top-level is historical accident, and it could easily be somewhere more sane. (It's just that when we were on CVS, moving it was a big pain.)

Gerv
Comment 81 Ted Mielczarek [:ted.mielczarek] 2010-08-05 11:46:59 PDT
You should get an imglib peer to r+ the add/remove bits. I can review the build system changes.
Comment 82 Bobby Holley (:bholley) (busy with Stylo) 2010-08-05 11:58:05 PDT
Have you run reftests on this patch?
Comment 83 Justin Lebar (not reading bugmail) 2010-08-05 11:59:33 PDT
I'll run them now.
Comment 84 Bobby Holley (:bholley) (busy with Stylo) 2010-08-05 12:16:42 PDT
(In reply to comment #77)
> Is there a good reason to not put libjpeg-turbo on top of libjpeg? Personally
> I'd prefer that and it should make the patch smaller.

In my opinion, it depends on whether the patch is useful or not. If the delta is reasonable enough to make it useable in the case of "oh hey, something broke when we upgraded to libjpeg-turbo", then we should patch it. But if it's not a useful diff, I think we should first remove it and then add it.

At the moment I'd probably prefer to put this in modules/libimg/jpeg, even if we eventually want to get rid of libimg (so that at least in the mean time, everything's in the same place). Joe might disagree though.

Given that this review is going to be more or less a rubber stamp, I'd like to at least know that it passes all our tests on all platforms first. Specifically, I'm concerned about our jpeg reftests, little-endian architectures like ARM, and architectures where we're not running the asm you're testing with. As such, I'm going to cancel review until we've got all that sorted out.

How much code changed? Do we need a security audit for this? (Not like the one we got for LCMS did us any good...grumble grumble). Dveditz, can you weigh in?
Comment 85 Justin Lebar (not reading bugmail) 2010-08-05 13:00:34 PDT
(In reply to comment #84)
> Given that this review is going to be more or less a rubber stamp, I'd like to
> at least know that it passes all our tests on all platforms first.

I think that's fair.  Releng is working on getting the requisite yasm version out to the build servers, and once it's out there, I'll push to try and see what happens.
Comment 86 Benjamin Smedberg [:bsmedberg] 2010-08-05 13:21:07 PDT
You mean for tryservers? I don't think we can push this to mozilla-central until there's a mozillabuild with the version of YASM we want people to use.
Comment 87 Justin Lebar (not reading bugmail) 2010-08-05 13:23:40 PDT
Created attachment 463272 [details] [diff] [review]
Patch v1 - Patching libjpeg-turbo on top of libjpeg

To the question of how much has changed, the answer is: A lot, but not everything.  :)

The vast majority of changes are in the simd folder, which contains 22000 lines of assembly.  We're of course not using all of that.

We could slim down the libjpeg-turbo patch a bit by removing files we don't need, but I'm not sure it's worth messing with.
Comment 88 Justin Lebar (not reading bugmail) 2010-08-05 13:27:50 PDT
(In reply to comment #86)
> You mean for tryservers? I don't think we can push this to mozilla-central
> until there's a mozillabuild with the version of YASM we want people to use.

Yes, I think we're blocking on bug 570477.
Comment 89 Bobby Holley (:bholley) (busy with Stylo) 2010-08-05 13:30:21 PDT
ok, I think we separate remove/add patches. a 2.4 meg jpeg library is bigger than we'd like it to be (the current one is 1.5 megs), but not quite worth making it harder to apply upstream patches by yanking things out.
Comment 90 Jeff Muizelaar [:jrmuizel] 2010-08-05 13:30:22 PDT
(In reply to comment #87)
> Created attachment 463272 [details] [diff] [review]
> Patch v1 - Patching libjpeg-turbo on top of libjpeg
> 
> To the question of how much has changed, the answer is: A lot, but not
> everything.  :)

The diff doesn't look too unwieldy. I can review this if you like.
Comment 91 Bobby Holley (:bholley) (busy with Stylo) 2010-08-05 13:32:22 PDT
(In reply to comment #90)

> The diff doesn't look too unwieldy. I can review this if you like.

If you care enough that you're willing to give it a non-rubber-stamp review, we can do it your way. ;-)
Comment 92 Jeff Muizelaar [:jrmuizel] 2010-08-05 13:37:34 PDT
(In reply to comment #91)
> (In reply to comment #90)
> 
> > The diff doesn't look too unwieldy. I can review this if you like.
> 
> If you care enough that you're willing to give it a non-rubber-stamp review, we
> can do it your way. ;-)

I think we should actually require it. libjpeg-turbo hasn't had nearly the same security exposure as libjpeg so we should at least go through to look for areas of increased risk.
Comment 93 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-08-05 15:05:52 PDT
My understanding is that we don't want to add new stuff to modules/.

For video/audio we created a new toplevel directory media/ to contain third-party libraries (media/libogg, media/libvpx, etc). For gfx we've put third-party libraries under gfx/ (gfx/cairo, gfx/harfbuzz, gfx/ycbcr, etc). I think this policy is working well.

So I suggest creating a new toplevel directory for image codecs, say images/. Move libpr0n and libpng there as well.
Comment 94 Vladimir Vukicevic [:vlad] [:vladv] 2010-08-05 15:10:44 PDT
This is a bit bikeshed-dy, but the fewer toplevel dirs we have the better; jpeg, png, etc. would fit in fine under 'media', I think.
Comment 95 Bobby Holley (:bholley) (busy with Stylo) 2010-08-05 15:13:39 PDT
(In reply to comment #93)
> My understanding is that we don't want to add new stuff to modules/.
> 
> For video/audio we created a new toplevel directory media/ to contain
> third-party libraries (media/libogg, media/libvpx, etc). For gfx we've put
> third-party libraries under gfx/ (gfx/cairo, gfx/harfbuzz, gfx/ycbcr, etc). I
> think this policy is working well.
> 
> So I suggest creating a new toplevel directory for image codecs, say images/.
> Move libpr0n and libpng there as well.

I think we're all on the same page here, but I don't think it's fair to make this patch implement that change. We've already got libpng in modules/libimg, and so I think it makes sense to keep libjpeg with libpng until we figure out the right place for both of them. This is something I plan on getting to "soonish" (ie, after the branch).
Comment 96 Stuart Parmenter 2010-08-05 15:16:16 PDT
agreed on media/ for image decoders
Comment 97 Olli Pettay [:smaug] 2010-08-05 15:22:20 PDT
Don't move existing code before hg (including hg web) can handle it properly, please.
Comment 98 Jeff Muizelaar [:jrmuizel] 2010-08-05 15:24:32 PDT
I've filled bug 584894 for moving the jpeg code.
Comment 99 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-08-05 15:28:17 PDT
(In reply to comment #96)
> agreed on media/ for image decoders

OK. This also might turn out to be handy if we ever start using a video codec for decoding static images as well (I've heard claims that VP8 can compress better than JPEG for example.)

(In reply to comment #95)
> I think we're all on the same page here, but I don't think it's fair to make
> this patch implement that change.

Sure.

(In reply to comment #97)
> Don't move existing code before hg (including hg web) can handle it properly,
> please.

Do you know if that's being worked on?
Comment 100 Olli Pettay [:smaug] 2010-08-05 15:34:13 PDT
> (In reply to comment #97)
> > Don't move existing code before hg (including hg web) can handle it properly,
> > please.
> 
> Do you know if that's being worked on?
http://mercurial.selenic.com/bts/issue1576 has been open for awhile.
Comment 101 Brendan Eich [:brendan] 2010-08-05 16:14:02 PDT
(In reply to comment #93)
> So I suggest creating a new toplevel directory for image codecs, say images/.
> Move libpr0n and libpng there as well.

Sounds good to me (in case anyone cares :-P).

/be
Comment 102 Brendan Eich [:brendan] 2010-08-05 18:20:47 PDT
Or under media -- comment 99 first para reply makes a good point.

Cc'ing djc about the Hg issue.

/be
Comment 103 Joe Drew (not getting mail) 2010-08-05 19:15:44 PDT
I'd prefer we put libjpeg-turbo in jpeg/ initially, and then move it later.
Comment 104 Justin Lebar (not reading bugmail) 2010-08-05 19:21:33 PDT
We're failing some of the jpeg reftests tests on Linux.

In all but one case, I can't distinguish the libjpeg-turbo image from the reference.  The one exception is jpg-cmyk-2.jpg, where the libjpeg-turbo image looks a bit sharper than the reference.  I can put the rendered image up here if anyone is curious.

How should we deal with these reftest failures?  Do we just declare libjpeg-turbo to be the new gold standard of jepg decoding and change the references, or do we investigate what's causing the differences?
Comment 105 Joe Drew (not getting mail) 2010-08-05 21:12:13 PDT
We definitely need to investigate what is causing these reftest failures. If there are bugs, they are likely to be very subtle, and so I'm afraid it'll be some hard slogging.

Start by finding out the magnitude of the differences. (1 pixel value - say r=125 to r=126 - can often be ignored as rounding differences, though not without determining where those rounding differences came from.) Use a screen magnifier that lets you view the pixel values under the cursor, or use something like Gimp that lets you sample any pixel on the screen.

Also, the reftest analyzer has code that'll circle the differences between two images. That might be helpful!

Finally, this doesn't need to block. I'll take a patch (on a relatively short leash), but I'd release without libjpeg-turbo.
Comment 106 Dirkjan Ochtman (:djc) 2010-08-05 23:55:25 PDT
smaug, bsmedberg: issue1576 is the only remaining issue for moving in hg, right? I'll try to conjure up some round tuits to get that into 1.6.3.
Comment 107 Justin Lebar (not reading bugmail) 2010-08-06 11:09:44 PDT
Comment on attachment 463225 [details] [diff] [review]
Patch v1 - Part 3: Changes to the build

Canceling build review, since I need to at least move some things around.

I'll still need some help figuring out how to get configure to reject yasm binaries older than whatever version we require (likely to be 1.0.2).
Comment 108 Justin Lebar (not reading bugmail) 2010-08-06 15:11:06 PDT
Created attachment 463675 [details]
Comparison of libjpeg-turbo (left) and libjpeg rendering jpg-size-5x5.jpg

Interesting.  There's actually a substantial difference in how libjpeg-turbo and libjpeg render this test image.
Comment 109 Justin Lebar (not reading bugmail) 2010-08-06 15:12:47 PDT
Created attachment 463677 [details] [diff] [review]
Patch v2 - libjpeg-turbo added on top of libjpeg in /jpeg directory
Comment 110 Justin Lebar (not reading bugmail) 2010-08-06 15:48:19 PDT
Created attachment 463692 [details] [diff] [review]
Method dispatch fix

Ah.  The rendering difference is due to what appears to be a typo in libjpeg-turbo.

Updated patch now passes the jpeg reftests locally.  I'll roll up a patch with this fix and submit this patch upstream in a sec.
Comment 111 Justin Lebar (not reading bugmail) 2010-08-06 15:50:45 PDT
Created attachment 463695 [details] [diff] [review]
Patch v2.1 (v2 plus method dispatch fix)
Comment 112 Nomis101 2010-08-07 08:07:49 PDT
I've tried to build FF/TB with this patch, but it failed because of a missing yasm (yasm: no such file or directory). After installing yasm via MacPorts it now builds just fine. So, does this mean, with this patch, yasm gets a new dependency to build mozilla applications? Or will yasm be implemented into the mozilla source?

On Mac OS X JPEGs are OK with this patch (quick visual test) if you build FF in x86_64. But if I build FF/TB in i386, than all JPEGs are misrendered.
http://img291.imageshack.us/img291/3319/i386q.jpg

I've applied patch v2.1 and build with the 10.6 SDK.
Comment 113 Justin Lebar (not reading bugmail) 2010-08-07 10:11:05 PDT
(In reply to comment #112)
> After installing yasm via MacPorts it now builds just fine. So, does this mean, with this patch, yasm gets a new dependency to build mozilla applications?

Please see comment #50.

> On Mac OS X JPEGs are OK with this patch (quick visual test) if you build FF in x86_64. But if I build FF/TB in i386, than all JPEGs are misrendered.
> http://img291.imageshack.us/img291/3319/i386q.jpg

See comment #65.
Comment 114 Nomis101 2010-08-07 12:18:46 PDT
Oh, seems I overlooked that...


> > On Mac OS X JPEGs are OK with this patch (quick visual test) if you build FF in x86_64. But if I build FF/TB in i386, than all JPEGs are misrendered.
> > http://img291.imageshack.us/img291/3319/i386q.jpg
> 
> See comment #65.
Thanks, this fixed it. :-)
Comment 115 Justin Lebar (not reading bugmail) 2010-08-09 09:20:32 PDT
*** Bug 242145 has been marked as a duplicate of this bug. ***
Comment 116 Justin Lebar (not reading bugmail) 2010-08-09 09:24:03 PDT
*** Bug 244978 has been marked as a duplicate of this bug. ***
Comment 117 Justin Lebar (not reading bugmail) 2010-08-09 09:31:58 PDT
Right now you can build FF to use the system libjpeg.  I wonder if we'll want to disable this with this patch.  libjpeg-turbo has a number of features we can use to make other code faster (see e.g. bug 584652, maybe bug 571739).  We could write code to conditionally call into libjpeg-turbo only if it's enabled, but since we won't be testing with libjpeg, it's probably safer just to assume we have libjpeg-turbo.
Comment 118 Justin Lebar (not reading bugmail) 2010-08-10 13:56:21 PDT
Created attachment 464561 [details] [diff] [review]
Patch v1.3

Updated to tip of libjpeg-turbo svn, which includes the method dispatch fix.
Comment 119 Justin Lebar (not reading bugmail) 2010-08-10 13:58:17 PDT
Comment on attachment 464561 [details] [diff] [review]
Patch v1.3

Tagging jrmuizel for review.  Jeff, how do you think we should proceed here?
Comment 120 Jeff Muizelaar [:jrmuizel] 2010-08-11 07:21:20 PDT
Comment on attachment 464561 [details] [diff] [review]
Patch v1.3

I'm probably not going to have time to get to this review that quickly as this is lower priority than some of the other work I'm doing. One comment I do have so far is that you should fix up the licensing documentation to match the code that we're including. i.e. the LGPL and WXWindows licenses should not be needed.
Comment 121 Justin Lebar (not reading bugmail) 2010-08-11 11:49:36 PDT
I can remove LICENSE.txt and LGPL.txt.  The only other place I see wxWindows and (L)GPL referenced is in README-turbo, but I'm not sure it's worth patching README-turbo to remove the references to those licenses.

Do you agree?
Comment 122 DRC 2010-08-11 13:10:06 PDT
I should really clarify the language in README-turbo anyhow. What I mean that file to say is that the overall distribution of LJT is covered by wxWindows if the distribution uses the accelerated Huffman codec but not otherwise.
Comment 124 Justin Lebar (not reading bugmail) 2010-08-20 22:48:50 PDT
(In reply to comment #123)
> Todays win32 build works !
> 
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jlebar@mozilla.com-1bb8d265ded9/tryserver-win32/

That tryserver build doesn't include libjpeg-turbo.  See http://hg.mozilla.org/try/log?rev=1bb8d265ded9
Comment 125 fxtech 2010-08-20 23:03:01 PDT
ops.. ..that's the reason becouse it work
Comment 126 Daniel Cater 2010-08-23 10:23:04 PDT
Now that bug 563088 has landed, I think this bug is more important. Changing back to a tab that hasn't been active for a while, I can visibly see images being re-decoded. If the speed gains here really are significant then this will help.
Comment 127 Bobby Holley (:bholley) (busy with Stylo) 2010-08-28 11:23:43 PDT
jlebar - can you throw together a try build with libjpeg-turbo based onto trunk? We'd like to see how noticeable the post-discarding redecode speedup is tabbing back to:

http://flavors.me/johnath
and
http://demos.hacks.mozilla.org/openweb/CSSMAKESUSICK/

after waiting a period longer than image.mem.discarding_timeout_ms * 2 milliseconds.

If it's significant enough, we may re-evaluate the blocking status of libjpeg-turbo.
Comment 128 Justin Lebar (not reading bugmail) 2010-08-28 12:42:21 PDT
I just pushed to try, but then I remembered that the build machines are still running the old, broken version of yasm.  I believe we'll get working Linux and Mac 64-bit builds, and maybe we'll get a working Linux 32-bit build.

It might just be easier for you to pull the patch, install yasm 1.1, and build it yourself.
Comment 129 Justin Lebar (not reading bugmail) 2010-08-28 14:37:16 PDT
My push to try is yielding debug, not opt, builds for some reason.  I'll try and get this sorted out...
Comment 130 Justin Lebar (not reading bugmail) 2010-08-29 15:31:35 PDT
Builds are up: http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jlebar@mozilla.com-53de5036b768/

The Mac 32-bit build is busted -- it'll run but decode images as garbage.  The Linux 32- and 64-bit and the Mac 64-bit builds should all work fine.
Comment 131 Justin Lebar (not reading bugmail) 2010-09-02 17:32:45 PDT
Created attachment 471713 [details] [diff] [review]
Patch v1.3.1

Fixed typo in configure.in.
Comment 132 Nomis101 2010-09-07 05:27:25 PDT
Nice patch, this also speeds up displaying jpeg-picture attachments in Thunderbird, especially for small files. I now see them immediately. For bigger attachments its nearly the same. (I've ported the config changes from this patch to c-c for my build)
Comment 133 Steve Snyder 2010-09-11 13:25:53 PDT
FYI, v1.0.1 was released on September 9th.

http://sourceforge.net/projects/libjpeg-turbo/files/
Comment 134 Justin Lebar (not reading bugmail) 2010-09-13 15:33:26 PDT
Jeff, ping me before you review this and I'll update to whatever is libjpeg-turbo's tip of trunk at that time.
Comment 135 DRC 2010-10-12 02:20:20 PDT
Quick note-- the libjpeg-turbo subversion trunk is used for the latest & greatest (and not necessarily stable) code.  I would suggest continuing to base your in-tree copy of LJT on 1.0.1.  I will create a 1.0.x stable branch as soon as it is needed (i.e. as soon as any bug fixes come in for LJT 1.0.1.)
Comment 136 thewolf86 2010-11-15 15:23:19 PST
Is this going to make into FF4? If so, can it be marked as blocking final at least? Don't mean to rush of course. Just wish set my expectations appropriately. ;-)
Comment 137 Benjamin Smedberg [:bsmedberg] 2010-11-15 15:26:52 PST
This bug is *not* blocking, as the flag indicates. And at this point, I really don't think we should take it. We actually need to ship at some point.
Comment 138 Jorge Quiñónez 2010-12-09 13:15:02 PST
Does anyone foresee that the libjpeg-turbo code will be added to FF 4.1 as a feature in the future?
Comment 139 Jeff Muizelaar [:jrmuizel] 2010-12-09 13:20:22 PST
(In reply to comment #138)
> Does anyone foresee that the libjpeg-turbo code will be added to FF 4.1 as a
> feature in the future?

I think that's when it's likely to land.
Comment 140 Justin Lebar (not reading bugmail) 2011-01-15 10:55:30 PST
This isn't yet reviewed, and before it gets reviewed, it needs to be updated to the latest version of the library.  It looks like the post2.0 tracking bug tracks bugs which are ready to land, so I'm removing that dependency.
Comment 141 DRC 2011-01-27 21:52:35 PST
FYI:

Stable branch for libjpeg-turbo 1.0.x has been created:

https://libjpeg-turbo.svn.sourceforge.net/svnroot/libjpeg-turbo/branches/1.0.x

This is probably what you want to keep your in-tree copy synced with for now, unless you need any of the features in 1.1.
Comment 142 Virtual_ManPL [:Virtual] - (ni? me) 2011-03-11 14:36:12 PST
FYI: 1.1.0 version was released as stable some time ago
Comment 143 pbo 2011-03-15 05:09:55 PDT
Bug on integration libjpeg-turbo in Google Chrome, already have status verified fixed...

http://code.google.com/p/chromium/issues/detail?id=48789
Comment 144 Leman Bennett [Omega] 2011-03-17 12:16:43 PDT
Maybe a different reviewer should be assigned to possibly get this in for Fx5.0? 

According to the draft plan for the new development process, the 5.0 branch could be cut early(3 weeks).
Comment 145 Brendan Eich [:brendan] 2011-03-17 12:22:39 PDT
(In reply to comment #144)
> Maybe a different reviewer should be assigned to possibly get this in for
> Fx5.0? 

Jeff, you reviewing?

/be
Comment 146 Jeff Muizelaar [:jrmuizel] 2011-03-17 12:45:44 PDT
(In reply to comment #145)
> (In reply to comment #144)
> > Maybe a different reviewer should be assigned to possibly get this in for
> > Fx5.0? 
> 
> Jeff, you reviewing?
> 
> /be

I've completed a large chunk of the review. I haven't had time to get to the rest of it yet. I am hoping to for 5.

-Jeff
Comment 147 Justin Lebar (not reading bugmail) 2011-03-17 16:33:33 PDT
I'll be starting full-time on 3/28, so I'll be able to update the patch and whatnot then.
Comment 148 Jeff Muizelaar [:jrmuizel] 2011-03-26 21:08:53 PDT
Comment on attachment 471713 [details] [diff] [review]
Patch v1.3.1

I did not review the simd code as that's more work and lower risk because it's mostly just optimized versions of existing functions. I did go through the rest of the stuff, hopefully I was thorough enough (there was a lot of change).

A few of these comments are on Mozilla changes, the others are on the upstream code. I don't consider it critical that the upstream stuff be addressed before landing.

>diff --git a/jpeg/MOZCHANGES b/jpeg/MOZCHANGES
>--- a/jpeg/MOZCHANGES
>+++ b/jpeg/MOZCHANGES
>@@ -1,10 +1,42 @@
>+    jchuff.c, jdhuff.c, jdhuff.h
>+  since the versions of these files in libjpeg-turbo are also under the
>+  wxWindows license.  (It would have been nicer to revert them to the new
>+  libjepg-8b code, but that doesn't easily integrate with libjepg-turbo.)
s/jepg/jpeg/ (both of them)

>diff --git a/jpeg/jpegint.h b/jpeg/jpegint.h
>--- a/jpeg/jpegint.h
>+++ b/jpeg/jpegint.h
>@@ -364,17 +364,17 @@ EXTERN(void) jinit_color_deconverter JPP
> /* Utility routines in jutils.c */
> EXTERN(long) jdiv_round_up JPP((long a, long b));
>-EXTERN(long) jround_up JPP((long a, long b));
>+EXTERN(size_t) jround_up JPP((size_t a, size_t b));

This change of type here is sort of interesting. It is extended to 64 bits on
Windows and changed to unsigned. 
The checkin comment for the change was "Bleepin' Windows uses LLP64, not LP64",
however it seems like the only code that needs the new size is code added by to jpeg-6b.
(jmemmgr.c and huffman encoder/decoder not included in this patch)

This change removes the type parallelism between jround_up and its sister function jdiv_round_up,
it's also sort of an abuse of the size_t type because the function is used for more than just
sizes of objects. Some callers haven't been updated and still do jround_up((long) val)

I'd suggest using a less general align-to-power-of-two function in jmemmgr.c, I didn't investigate
why the huffman code needs to round 64bit values

>diff --git a/jpeg/jchuff.c b/jpeg/jchuff.c
>--- a/jpeg/jchuff.c
>+++ b/jpeg/jchuff.c
>@@ -14,16 +14,19 @@
> #include "jchuff.h"		/* Declarations shared with jcphuff.c */
> 
>+/* MOZILLA CHANGE: libjpeg-turbo doesn't define INLINE in its config file, so
>+ * we define it here. */
>+#define INLINE

This doesn't seem like a very good definition of INLINE, using
the one removed by this patch seems like a better idea.

>diff --git a/jpeg/jcphuff.c b/jpeg/jcphuff.c
>--- a/jpeg/jcphuff.c
>+++ b/jpeg/jcphuff.c
>@@ -218,17 +218,16 @@ dump_buffer (phuff_entropy_ptr entropy)
> 
>-INLINE
> LOCAL(void)
> emit_bits (phuff_entropy_ptr entropy, unsigned int code, int size)
> /* Emit some bits, unless we are in gather mode */
>@@ -271,17 +270,16 @@ flush_bits (phuff_entropy_ptr entropy)
> 
>-INLINE
> LOCAL(void)
> emit_symbol (phuff_entropy_ptr entropy, int tbl_no, int symbol)

I'm not sure why INLINE needs to go...

>diff --git a/jpeg/jpeglib.h b/jpeg/jpeglib.h
>--- a/jpeg/jpeglib.h
>+++ b/jpeg/jpeglib.h
>@@ -1,58 +1,41 @@
> /*
>  * jpeglib.h
>  *
>  * Copyright (C) 1991-1998, Thomas G. Lane.
>+ * Copyright (C) 2009, D. R. Commander.
>  * This file is part of the Independent JPEG Group's software.
>  * For conditions of distribution and use, see the accompanying README file.
>  *
>  * This file defines the application interface for the JPEG library.
>  * Most applications using the library need only include this file,
>  * and perhaps jerror.h if they want to know the exact error codes.
>  */
> 
> #ifndef JPEGLIB_H
> #define JPEGLIB_H
> 
>-#ifdef XP_OS2
>-/*
>- * On OS/2, the system will have defined RGB_* so we #undef 'em to avoid warnings
>- * from jmorecfg.h.
>- */
>-#ifdef RGB_RED
>-	#undef RGB_RED
>-#endif
>-#ifdef RGB_GREEN
>-	#undef RGB_GREEN
>-#endif
>-#ifdef RGB_BLUE
>-	#undef RGB_BLUE
>-#endif
>-

This is probably needed for OS/2 but I'd rather it be fixed upstream.

>diff --git a/jpeg/jcdctmgr.c b/jpeg/jcdctmgr.c
>--- a/jpeg/jcdctmgr.c
>+++ b/jpeg/jcdctmgr.c
> /*
>+ * Find the highest bit in an integer through binary search.
>+ */
>+LOCAL(int)
>+flss (UINT16 val)
>+{

It would be good to use compiler intrinsics like BitScanReverse and __builtin_clz here when they're available.

>diff --git a/jpeg/jccolor.c b/jpeg/jccolor.c
>--- a/jpeg/jccolor.c
>+++ b/jpeg/jccolor.c
>+  24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 25, 25,
>+  26, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27,
>+  27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29
>+};

It would be nice to know where these tables come from

>@@ -141,20 +212,20 @@ rgb_ycc_convert (j_compress_ptr cinfo,
> 
>   while (--num_rows >= 0) {
>     inptr = *input_buf++;
>     outptr0 = output_buf[0][output_row];
>     outptr1 = output_buf[1][output_row];
>     outptr2 = output_buf[2][output_row];
>     output_row++;
>     for (col = 0; col < num_cols; col++) {
>-      r = GETJSAMPLE(inptr[RGB_RED]);
>-      g = GETJSAMPLE(inptr[RGB_GREEN]);
>-      b = GETJSAMPLE(inptr[RGB_BLUE]);
>-      inptr += RGB_PIXELSIZE;
>+      r = GETJSAMPLE(inptr[rgb_red[cinfo->in_color_space]]);
>+      g = GETJSAMPLE(inptr[rgb_green[cinfo->in_color_space]]);
>+      b = GETJSAMPLE(inptr[rgb_blue[cinfo->in_color_space]]);
>+      inptr += rgb_pixelsize[cinfo->in_color_space];

If the compiler can't lift the rgb_* out of the loop. It may be worth doing it by hand.

>@@ -183,36 +254,46 @@ rgb_ycc_convert (j_compress_ptr cinfo,
>-      r = GETJSAMPLE(inptr[RGB_RED]);
>-      g = GETJSAMPLE(inptr[RGB_GREEN]);
>-      b = GETJSAMPLE(inptr[RGB_BLUE]);
>-      inptr += RGB_PIXELSIZE;
>+    for (; outptr < maxoutptr; outptr++, inptr += rgbstride) {
>       /* Y */
>-      outptr[col] = (JSAMPLE)
>-		((ctab[r+R_Y_OFF] + ctab[g+G_Y_OFF] + ctab[b+B_Y_OFF])
>-		 >> SCALEBITS);
>+      #if BITS_IN_JSAMPLE == 8
>+      *outptr = red_lut[inptr[rindex]] + green_lut[inptr[gindex]]
>+	    + blue_lut[inptr[bindex]];

It's not clear what the advantage of this is...
Comment 149 Virtual_ManPL [:Virtual] - (ni? me) 2011-03-27 03:38:48 PDT
I strongly suggest updating libjpeg-turbo from 1.0.0 to 1.1.0 before taking time on complete reviewing. It will be wiser because many changes where made.



Significant changes since 1.0.0:

[1] The Huffman decoder will now handle erroneous Huffman codes (for instance, from a corrupt JPEG image.) Previously, these would cause libjpeg-turbo to crash under certain circumstances.

[2] Fixed typo in SIMD dispatch routines which was causing 4:2:2 upsampling to be used instead of 4:2:0 when decompressing JPEG images using SSE2 code.

[3] configure script will now automatically determine whether the INCOMPLETE_TYPES_BROKEN macro should be defined.



Significant changes since 1.0.1"

[1] Added emulation of the libjpeg v7 and v8b APIs and ABIs. See README-turbo.txt for more details. This feature was sponsored by CamTrace SAS.

[2] Created a new CMake-based build system for the Visual C++ and MinGW builds.

[3] TurboJPEG/OSS can now compress from/decompress to grayscale bitmaps.

[4] jpgtest can now be used to test decompression performance with existing JPEG images.

[5] If the default install prefix (/opt/libjpeg-turbo) is used, then 'make install' now creates /opt/libjpeg-turbo/lib32 and /opt/libjpeg-turbo/lib64 sym links to duplicate the behavior of the binary packages.

[6] All symbols in the libjpeg-turbo dynamic library are now versioned, even when the library is built with libjpeg v6b emulation.

[7] Added arithmetic encoding and decoding support (can be disabled with configure or CMake options)

[8] Added a TJ_YUV flag to TurboJPEG/OSS which causes both the compressor and decompressor to output planar YUV images.

[9] Added an extended version of tjDecompressHeader() to TurboJPEG/OSS which allows the caller to determine the type of subsampling used in a JPEG image.

[10] Added further protections against invalid Huffman codes.



Significant changes since 1.0.90 (1.1 beta1):

[1] The algorithm used by the SIMD quantization function cannot produce correct results when the JPEG quality is >= 98 and the fast integer forward DCT is used. Thus, the non-SIMD quantization function is now used for those cases, and libjpeg-turbo should now produce identical output to libjpeg v6b in all cases.

[2] Despite the above, the fast integer forward DCT still degrades somewhat for JPEG qualities greater than 95, so TurboJPEG/OSS will now automatically use the slow integer forward DCT when generating JPEG images of quality 96 or greater. This reduces compression performance by as much as 15% for these high-quality images but is necessary to ensure that the images are perceptually lossless. It also ensures that the library can avoid the performance pitfall created by [1].

[3] Ported jpgtest.cxx to pure C to avoid the need for a C++ compiler.

[4] Fixed visual artifacts in grayscale JPEG compression caused by a typo in the RGB-to-chrominance lookup tables.

[5] The Windows distribution packages now include the libjpeg run-time programs (cjpeg, etc.)

[6] All packages now include jpgtest.

[7] The TurboJPEG dynamic library now uses versioned symbols.

[8] Added two new TurboJPEG API functions, tjEncodeYUV() and tjDecompressToYUV(), to replace the somewhat hackish TJ_YUV flag.
Comment 150 Justin Lebar (not reading bugmail) 2011-03-27 07:46:42 PDT
Per comment 147, I intend to update to the new version before checking in.  That sound good to you, Jeff?
Comment 151 Jeff Muizelaar [:jrmuizel] 2011-03-28 06:30:58 PDT
(In reply to comment #150)
> Per comment 147, I intend to update to the new version before checking in. 
> That sound good to you, Jeff?

That's fine. An interdiff would to have for a quick look over. Also, one thing that I forgot to mention in the review comments. I'd prefer that we only include the files that are needed for actually building libjpeg-turbo and not checking in all of the extra programs and documentation.
Comment 152 Justin Lebar (not reading bugmail) 2011-03-28 07:44:28 PDT
>diff --git a/jpeg/jpeglib.h b/jpeg/jpeglib.h
>-#ifdef XP_OS2
>-/*
>- * On OS/2, the system will have defined RGB_* so we #undef 'em to avoid warnings
>- * from jmorecfg.h.
>- */
>-#ifdef RGB_RED
>-	#undef RGB_RED
>-#endif
>-#ifdef RGB_GREEN
>-	#undef RGB_GREEN
>-#endif
>-#ifdef RGB_BLUE
>-	#undef RGB_BLUE
>-#endif

> This is probably needed for OS/2 but I'd rather it be fixed upstream.

Do you mean upstream of libjpeg-turbo, i.e., in libjpeg?  libjpeg-turbo is a fork of an old version of libjpeg, so there's really no "upstream" to speak of.
Comment 153 Steve Snyder 2011-03-28 08:14:14 PDT
(In reply to comment #148)
> Comment on attachment 471713 [details] [diff] [review]
> Patch v1.3.1
[snip] 
> >diff --git a/jpeg/jcdctmgr.c b/jpeg/jcdctmgr.c
> >--- a/jpeg/jcdctmgr.c
> >+++ b/jpeg/jcdctmgr.c
> > /*
> >+ * Find the highest bit in an integer through binary search.
> >+ */
> >+LOCAL(int)
> >+flss (UINT16 val)
> >+{
> 
> It would be good to use compiler intrinsics like BitScanReverse and
> __builtin_clz here when they're available.
[snip]

https://sourceforge.net/tracker/?func=detail&aid=3249446&group_id=303195&atid=1278160

It's an improvement in both speed and size, but probably not enough to make a compelling case for inclusion of the patch.  We've seen better results in JS & NSPR because they both examine 32 bits.  The libjpeg-turbo code only looks at 16 bits so there are less savings to be had from the BSR instruction.
Comment 154 Jeff Muizelaar [:jrmuizel] 2011-03-28 08:24:23 PDT
(In reply to comment #152)
> Do you mean upstream of libjpeg-turbo, i.e., in libjpeg?  libjpeg-turbo is a
> fork of an old version of libjpeg, so there's really no "upstream" to speak of.

I meant upstream in libjpeg-turbo instead of patching our local copy.
Comment 155 Justin Lebar (not reading bugmail) 2011-03-28 08:33:37 PDT
Oh, I see; that's a change we made, but which isn't in libjpeg-turbo.

Until it's fixed upstream, why don't we just guard the #define's in jmorecfg.h?  We already keep that file out of sync with libjpeg-turbo's mainline.
Comment 156 Jeff Muizelaar [:jrmuizel] 2011-03-28 08:35:37 PDT
(In reply to comment #155)
> Oh, I see; that's a change we made, but which isn't in libjpeg-turbo.
> 
> Until it's fixed upstream, why don't we just guard the #define's in jmorecfg.h?
>  We already keep that file out of sync with libjpeg-turbo's mainline.

That sounds fine.
Comment 157 Justin Lebar (not reading bugmail) 2011-03-28 13:20:54 PDT
Created attachment 522445 [details] [diff] [review]
Patch v2.0.  Part 1: Upgrade to libjpeg-turbo v1.1.0
Comment 158 Justin Lebar (not reading bugmail) 2011-03-28 13:21:34 PDT
Created attachment 522446 [details] [diff] [review]
Patch v2.0.  Part 2: Address review comments.
Comment 159 Justin Lebar (not reading bugmail) 2011-03-28 13:24:06 PDT
Created attachment 522449 [details]
Interdiff v1.3.1 -> v2.0 part 1
Comment 160 Justin Lebar (not reading bugmail) 2011-03-28 13:45:58 PDT
Created attachment 522459 [details]
Interdiff v1.3.1 -> v2.0 part 1 without deleted files

This interdiff is the same as attachment 522449 [details], but it doesn't include the files I deleted, in an attempt to make it easier to read.
Comment 161 Justin Lebar (not reading bugmail) 2011-03-28 13:55:37 PDT
DRC, what would be a good medium for handling Jeff's remaining review comments?

I'm going to push the latest patches to try now.
Comment 162 DRC 2011-03-28 18:01:59 PDT
Created attachment 522544 [details] [diff] [review]
Proposed upstream patch for addressing concern about jround_up() argument types.
Comment 163 DRC 2011-03-28 18:27:21 PDT
I'll address the review comments that are relevant to the upstream code here:

(1) jround_up() argument types:  this was the result of a bad game of Celebrity Whack-a-Mole that I played while trying to port the code to Win64.  Jeff's idea of creating a local, power-of-2-specific round-up function for jmemmgr.c is a better idea, and I've done so in a new patch (attachment 522544 [details] [diff] [review]) (actually, I made it a macro instead of a function.)  Please let me know if that looks like a reasonable solution.

(2) RGB_{RED|GREEN|BLUE} macro conflicts on OS/2 (OS/2?  Really?):  This seems like an innocuous enough patch, although I'd rather have it in jmorecfg.h instead of jpeglib.h.  I'd also like to know more of the back story as to why OS/2 is defining those macros.  Is this a generic problem that anyone building libjpeg-turbo on OS/2 would encounter?  My lingering doubt here is that libjpeg supports OS/2, so I wonder why they haven't encountered this problem on that platform.  Is Mozilla including some header file that causes those macros to be defined?

(3) bitscan instructions:  I tested the proposed patch to make jcdctmgr.c use GCC 3.4+ compiler intrinsics to replace flss() with bitscan instructions, but I cannot observe any measurable speedup from this (tested several different image types and platforms, as well as 32-bit and 64-bit.)  If it were even 2% faster, I would agree to include it, but a performance patch that causes no appreciable increase in performance does nothing but needlessly obfuscate the code.

(4) The tables in jccolor.c are part of the optimized RGB-to-grayscale conversion routines.  These are pre-computed R-to-luminance, G-to-luminance, and B-to-luminance conversion values.  The problem with this approach is that it loses a bit of accuracy due to round-off, so I recently checked in code to the SVN trunk (1.2 working version) that uses new SIMD routines to perform RGB-to-grayscale conversion.  This is faster and eliminates the round-off issues, as well as the need for the tables in jccolor.c and the #ifdef in rgb_gray_convert().

(5) The rest of the concerns regarding jccolor.c have already been addressed in trunk as well (pulling rgb_red, rgb_green, rgb_blue out of the loop, etc.)
Comment 164 Justin Lebar (not reading bugmail) 2011-03-28 18:31:13 PDT
> (2) RGB_{RED|GREEN|BLUE} macro conflicts on OS/2 (OS/2?  Really?):  This seems
> like an innocuous enough patch, although I'd rather have it in jmorecfg.h
> instead of jpeglib.h.

Attachment 522446 [details] [diff] puts the #undef's in jmorecfg.h, right above the #defines.
Comment 165 DRC 2011-03-28 18:32:08 PDT
And just FYI, another thing that I'm currently being contracted to do is investigate rewriting the performance optimizations in j{c|d}huff*.  This will hopefully allow me to re-license those files in libjpeg-turbo 1.2.
Comment 166 Jeff Muizelaar [:jrmuizel] 2011-03-28 19:17:44 PDT
Comment on attachment 522544 [details] [diff] [review]
Proposed upstream patch for addressing concern about jround_up() argument types.


>+#define PAD(size, align) (((size) + (align) - 1) & (~((align) - 1)))

I usually prefer functions over macros for the additional type safety and to avoid evaluating arguments twice (e.g. PAD(x, a++))
Comment 167 Jeff Muizelaar [:jrmuizel] 2011-03-28 19:20:23 PDT
(In reply to comment #163)
> (2) RGB_{RED|GREEN|BLUE} macro conflicts on OS/2 (OS/2?  Really?):  This seems
> like an innocuous enough patch, although I'd rather have it in jmorecfg.h
> instead of jpeglib.h.  I'd also like to know more of the back story as to why
> OS/2 is defining those macros.  Is this a generic problem that anyone building
> libjpeg-turbo on OS/2 would encounter?  My lingering doubt here is that libjpeg
> supports OS/2, so I wonder why they haven't encountered this problem on that
> platform.  Is Mozilla including some header file that causes those macros to be
> defined?

I'd be fine with dropping this altogether. If OS/2 still needs it, we can always add it back.

> 
> (3) bitscan instructions:  I tested the proposed patch to make jcdctmgr.c use
> GCC 3.4+ compiler intrinsics to replace flss() with bitscan instructions, but I
> cannot observe any measurable speedup from this (tested several different image
> types and platforms, as well as 32-bit and 64-bit.)  If it were even 2% faster,
> I would agree to include it, but a performance patch that causes no appreciable
> increase in performance does nothing but needlessly obfuscate the code.

It's probably worth adding a comment to flss saying that it's not performance critical enough to justify using compiler intrinsics.

> (4) The tables in jccolor.c are part of the optimized RGB-to-grayscale
> conversion routines.  These are pre-computed R-to-luminance, G-to-luminance,
> and B-to-luminance conversion values.  The problem with this approach is that
> it loses a bit of accuracy due to round-off, so I recently checked in code to
> the SVN trunk (1.2 working version) that uses new SIMD routines to perform
> RGB-to-grayscale conversion.  This is faster and eliminates the round-off
> issues, as well as the need for the tables in jccolor.c and the #ifdef in
> rgb_gray_convert().

That sounds like a good solution to me.
Comment 168 Jeff Muizelaar [:jrmuizel] 2011-03-28 19:20:53 PDT
(In reply to comment #165)
> And just FYI, another thing that I'm currently being contracted to do is
> investigate rewriting the performance optimizations in j{c|d}huff*.  This will
> hopefully allow me to re-license those files in libjpeg-turbo 1.2.

This sounds great.
Comment 169 DRC 2011-03-28 21:41:53 PDT
Created attachment 522583 [details] [diff] [review]
New proposed upstream patch for addressing concern about jround_up() argument types.
Comment 170 DRC 2011-03-28 21:43:26 PDT
(In reply to comment #166)
> >+#define PAD(size, align) (((size) + (align) - 1) & (~((align) - 1)))
> 
> I usually prefer functions over macros for the additional type safety and to
> avoid evaluating arguments twice (e.g. PAD(x, a++))

My experience has been that the compiler does a good job of optimizing out that duplication of effort, and the lack of overhead of the additional function call usually makes up for it, but in the specific case we're referring to, it doesn't matter enough to worry about, and using a function call is more consistent with the way it worked before.  Re-submitted as attachment 522583 [details] [diff] [review].
Comment 171 Jeff Muizelaar [:jrmuizel] 2011-03-28 21:51:45 PDT
(In reply to comment #170)
> (In reply to comment #166)
> > >+#define PAD(size, align) (((size) + (align) - 1) & (~((align) - 1)))
> > 
> > I usually prefer functions over macros for the additional type safety and to
> > avoid evaluating arguments twice (e.g. PAD(x, a++))
> 
> My experience has been that the compiler does a good job of optimizing out that
> duplication of effort, and the lack of overhead of the additional function call
> usually makes up for it, but in the specific case we're referring to, it
> doesn't matter enough to worry about, and using a function call is more
> consistent with the way it worked before.  Re-submitted as attachment 522583 [details] [diff] [review] 

I was referring to the fact that PAD(x, a++) does a++ twice which is not what the caller is likely to expect. Regardless, I like the new version better.
Comment 172 Justin Lebar (not reading bugmail) 2011-03-29 07:16:58 PDT
There are unfortunately reftest failures on Windows with this latest patch.  Looking into it.

http://tbpl.mozilla.org/?tree=MozillaTry&rev=fceae635d617
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1301360568.1301363938.582.gz
Comment 173 Justin Lebar (not reading bugmail) 2011-03-29 07:22:43 PDT
Ah, the orange is due to unexpected success.  That's not so bad.  :)

I think we can just back out bug 582850.

http://hg.mozilla.org/mozilla-central/rev/4643426a1523
Comment 174 Jeff Muizelaar [:jrmuizel] 2011-03-29 07:44:55 PDT
Yep, and this should close bug 477728
Comment 175 Justin Lebar (not reading bugmail) 2011-03-29 07:48:15 PDT
Created attachment 522674 [details] [diff] [review]
Patch v2.0 Part 3: Re-enable reftests on Windows.
Comment 176 Steve Snyder 2011-03-29 12:23:54 PDT
Why is support for arithmetic encoding/decoding disabled?
Comment 177 Justin Lebar (not reading bugmail) 2011-03-29 12:29:59 PDT
Don't we need to use the libjpeg v8 ABI in order to get this?
Comment 178 Jeff Muizelaar [:jrmuizel] 2011-03-29 12:37:11 PDT
(In reply to comment #176)
> Why is support for arithmetic encoding/decoding disabled?

We haven't decided whether to enabled this yet. This is a separate issue and needs a separate bug as well as a lot more thought/research.
Comment 179 DRC 2011-03-29 12:46:26 PDT
(In reply to comment #177)
> Don't we need to use the libjpeg v8 ABI in order to get this?

No, it works when emulating the v6b ABI as well.  It just adds two additional functions without breaking compatibility with the rest of the ABI.
Comment 180 Justin Lebar (not reading bugmail) 2011-03-29 14:56:42 PDT
http://hg.mozilla.org/mozilla-central/rev/6a8baba56a65
http://hg.mozilla.org/mozilla-central/rev/3eb5a574319f

Please file follow-ups as needed.
Comment 181 Jeff Muizelaar [:jrmuizel] 2011-03-29 15:18:45 PDT
Apparently this causes OOM in yasm 0.8.something we may need to up the min version
Comment 182 :Gavin Sharp [email: gavin@gavinsharp.com] 2011-03-29 15:37:23 PDT
More specifically, with yasm 0.8.0.2194, which is the latest available by default with Ubuntu 10.04. yasm 1.1.0.2352 works fine.
Comment 183 Olli Pettay [:smaug] 2011-03-30 08:13:42 PDT
Fedora 13 has old yasm too
Comment 184 Christopher Aillon (sabbatical, not receiving bugmail) 2011-03-30 13:42:24 PDT
(Though, honestly we'd really prefer to make it buildable with nasm, or gas.  But that's another bug...)
Comment 185 DRC 2011-04-18 12:57:04 PDT
NOTE:  libjpeg-turbo trunk now contains a version of jdhuff.c and jdhuff.h (Huffman decoder) with the optimizations re-written and re-licensed under the same BSD-style license as the rest of libjpeg.  Looking into jchuff* as well, but I figured you were probably more interested in the decoder than the encoder.
Comment 186 Justin Lebar (not reading bugmail) 2011-04-18 13:05:52 PDT
I've filed bug 650899 for updating libjpeg-turbo.
Comment 187 Robert Sayre 2011-04-26 15:08:23 PDT
Jesse, why did you nominate this bug for tracking-firefox5?
Comment 188 Jesse Ruderman 2011-04-26 15:16:07 PDT
Because bsmedberg told me to in http://groups.google.com/group/mozilla.dev.planning/msg/bfc2fabaecacdd97.  Also because I think it should be listed on https://wiki.mozilla.org/Features/Release_Tracking to make sure it gets the right security reviews, license file additions, changelog entries, crash stats scrutiny, and whatever else people follow that page for.
Comment 189 DRC 2011-04-26 17:45:06 PDT
NOTE:  libjpeg-turbo trunk now contains a version of jchuff.c
(Huffman encoder) with the optimizations re-written and re-licensed under the
same BSD-style license as the rest of libjpeg.
Comment 190 Justin Lebar (not reading bugmail) 2011-04-26 17:47:40 PDT
... and the bug for tracking that is bug 650899.
Comment 191 DRC 2011-04-26 17:50:46 PDT
That bug is for tracking the decoder.  I'm talking about the encoder now.
Comment 192 Brandon Sterne (:bsterne) 2011-04-27 14:30:33 PDT
FYI, the Chrome team has apparently done a security review of the library:
http://code.google.com/p/chromium/issues/detail?id=48789#c3
Comment 193 JP Rosevear [:jpr] 2011-05-17 14:39:16 PDT
Security team finished fuzzing as planned, we are good here.

Note You need to log in before you can comment on or make changes to this bug.