Last Comment Bug 698519 - Update to libjpeg-turbo 1.2
: Update to libjpeg-turbo 1.2
Status: RESOLVED FIXED
[secr:cdiehl][fuzzing complete][gfx.r...
:
Product: Core
Classification: Components
Component: ImageLib (show other bugs)
: unspecified
: All All
: -- normal with 5 votes (vote)
: mozilla13
Assigned To: Ryan VanderMeulen [:RyanVM]
:
Mentors:
: 489148 (view as bug list)
Depends on: 771403
Blocks: 759891
  Show dependency treegraph
 
Reported: 2011-10-31 11:20 PDT by Justin Lebar (not reading bugmail)
Modified: 2013-07-30 08:03 PDT (History)
32 users (show)
dveditz: sec‑review+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Checkpoint patch (337.37 KB, patch)
2012-01-14 20:20 PST, Ryan VanderMeulen [:RyanVM]
no flags Details | Diff | Review
Checkpoint patch #2 (340.32 KB, patch)
2012-01-15 13:37 PST, Ryan VanderMeulen [:RyanVM]
no flags Details | Diff | Review
Checkpoint patch #3 (340.26 KB, patch)
2012-01-15 16:14 PST, Ryan VanderMeulen [:RyanVM]
no flags Details | Diff | Review
Checkpoint patch #4 (341.57 KB, patch)
2012-01-17 16:33 PST, Ryan VanderMeulen [:RyanVM]
no flags Details | Diff | Review
Checkpoint patch #5 (421.85 KB, patch)
2012-01-29 17:57 PST, Ryan VanderMeulen [:RyanVM]
no flags Details | Diff | Review
Checkpoint patch #6 (423.65 KB, patch)
2012-01-31 14:34 PST, Ryan VanderMeulen [:RyanVM]
justin.lebar+bug: review-
Details | Diff | Review
Google patches to libjpeg-turbo 1.2.0 (46.21 KB, patch)
2012-02-10 18:58 PST, Ryan VanderMeulen [:RyanVM]
no flags Details | Diff | Review
Upgrade to libjpeg-turbo 1.2.0 (443.48 KB, patch)
2012-02-10 20:16 PST, Ryan VanderMeulen [:RyanVM]
no flags Details | Diff | Review
Upgrade to libjpeg-turbo 1.2.0 (441.57 KB, patch)
2012-02-11 03:34 PST, Ryan VanderMeulen [:RyanVM]
justin.lebar+bug: review+
Details | Diff | Review

Description Justin Lebar (not reading bugmail) 2011-10-31 11:20:16 PDT
libjpeg-turbo 1.2 contains the new, faster, BSD-licensed huffman decoder (bug 650899) and also includes NEON optimizations.

1.2 hasn't been officially released yet, and I think we might as well wait until it's signed off before we take it into the tree.  But we'll want to fuzz the new code, just as we fuzzed the current implementation.  So I'll work on spinning a patch that the security folks can bang on.
Comment 1 Justin Lebar (not reading bugmail) 2011-10-31 11:25:24 PDT
I don't know if this needs a full security review or just a sign-off from the security team.
Comment 2 Jesse Ruderman 2011-10-31 16:07:03 PDT
Christoph fuzzed some version of libjpeg-turbo in May.  I don't know if it was a version that included the improvements in comment 0.

(In general, if I had my way, I'd require security reviews only for *not* updating our upstream code.)
Comment 3 Ryan VanderMeulen [:RyanVM] 2011-12-17 16:32:19 PST
Justin, were you still going to post a patch here to play with?
Comment 4 Justin Lebar (not reading bugmail) 2011-12-19 07:01:20 PST
(In reply to Ryan VanderMeulen from comment #3)
> Justin, were you still going to post a patch here to play with?

I have no recollection of saying I was going to do that, but I'd believe you if you said I had.  :)

I can do this, but it comes at the expense of some B2G things I'm working on atm, which I'd guess are a higher priority.
Comment 5 Ryan VanderMeulen [:RyanVM] 2011-12-19 09:06:58 PST
I was going off what you said in comment #0 :). I'm willing to work on it, but I'm not sure how trivial it will be and am a bit averse to any major troubleshooting that may come out of it.
Comment 6 Justin Lebar (not reading bugmail) 2011-12-19 09:19:39 PST
> I was going off what you said in comment #0 :)

Oh.  :)

If you want to try to spin up a patch, that'd be much appreciated!  It shouldn't be too hard.  The biggest trick will probably be getting the NEON asm to build.

There's a detailed MOZCHANGES file in media/libjpeg which explains some things.  But you'll probably have to reconfigure libjpeg-turbo and merge our changes to jconfig.h and jmorecfg.h.
Comment 7 Curtis Koenig [:curtisk-use curtis.koenig+bzATgmail.com]] 2012-01-11 14:50:00 PST
hold for sec-review until there is a patch
Comment 8 Ryan VanderMeulen [:RyanVM] 2012-01-14 18:36:53 PST
Justin, some questions:
-How do you handle generating jsimdcfg.inc from jsimdcfg.inc.h?
-Now that licensing is no longer an issue, are bmp.c, bmp.h, turbojpeg.h, turbojpegl.c, jchuff.c, jdhuff.c, and jdhuff.h all fair game? I know the huff stuff is good, but I don't know about the rest.
-I have no idea how to modify configure.in to set LIBJPEG_TURBO_NEON_ASM.
-Are the other config files basically generated by trial and error (see what's broken when building and add until the errors go away)?
Comment 9 Ryan VanderMeulen [:RyanVM] 2012-01-14 18:38:50 PST
Gah, I meant to remove the question about configure.in, as I think I figured that out (was making notes here as I went along). I've made a push to try with my first attempt at this. We'll see how badly it breaks :-P
https://tbpl.mozilla.org/?tree=Try&rev=e90f9d076d92
Comment 10 Ryan VanderMeulen [:RyanVM] 2012-01-14 20:20:12 PST
Created attachment 588707 [details] [diff] [review]
Checkpoint patch

Here's what I've got so far. It doesn't build.
../../../media/libjpeg/jdcolext.c:29:1: warning: return type defaults to 'int'
../../../media/libjpeg/jdcolext.c: In function 'LOCAL':
../../../media/libjpeg/jdcolext.c:30:1: error: expected declaration specifiers before 'ycc_rgb_convert_internal'
../../../media/libjpeg/jdcolext.c:81:1: error: expected declaration specifiers before '__attribute__'
../../../media/libjpeg/jdcolext.c:105:1: error: expected '{' at end of input
../../../media/libjpeg/jdcolext.c:105:1: warning: control reaches end of non-void function
Comment 11 Ryan VanderMeulen [:RyanVM] 2012-01-15 06:37:31 PST
One other note, I believe that I got the NEON stuff right. I confirmed in the Android build log that LIBJPEG_TURBO_ARM_ASM is being set to 1 as it should be. From reading the libjpeg-turbo build documentation, it should Just Work for Android GCC. So I basically just ripped off some of the libvpx configure.in magic.

However, it looks like more work is required for iOS support. Is that something we care about?

And of course, due to comment #10, I haven't actually verified that the NEON stuff is correct yet.
Comment 12 fxtech 2012-01-15 07:14:53 PST
I imagine no iOS support is needed becouse there is no iOS build of firefox
Comment 13 DRC 2012-01-15 12:37:02 PST
(In reply to Ryan VanderMeulen from comment #8)
> -Now that licensing is no longer an issue, are bmp.c, bmp.h, turbojpeg.h,
> turbojpegl.c, jchuff.c, jdhuff.c, and jdhuff.h all fair game? I know the

1.2 involved both a refactoring and a relicensing of any code that we had borrowed from VirtualGL (i.e. any files that still bore the wxWindows Library License), so some of the files implementing the TurboJPEG API have different names now.  I don't understand why you would need to include any of those, however, since the Mozilla code doesn't use the TurboJPEG API.
Comment 14 Ryan VanderMeulen [:RyanVM] 2012-01-15 12:40:14 PST
OK, so the bmp.c and bmp.h files are from TurboJPEG. In that case, all that matters are the 3 huff files and the 3 new files jccolext.c, jdcolext.c, and jdmrgext.c. Is that correct?
Comment 15 DRC 2012-01-15 12:42:37 PST
Well, there are additional files for the ARM SIMD stuff as well.  I'm not going to try to provide a list, because I'll probably leave something out.  diff is your friend.
Comment 16 Ryan VanderMeulen [:RyanVM] 2012-01-15 13:37:48 PST
Created attachment 588774 [details] [diff] [review]
Checkpoint patch #2

This still doesn't build, but at least I finally figured out what I was doing wrong with the ext files. Now it's dying in simd code.

yasm -o simd/jcgrymmx.o -f elf32 -rnasm -pnasm -DPIC -DELF -I/builds/slave/try-lnx/build/media/libjpeg/simd/  /builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm
/builds/slave/try-lnx/build/media/libjpeg/simd/jcolsamp.inc:17: error: undefined symbol `RGB_RED' in preprocessor
/builds/slave/try-lnx/build/media/libjpeg/simd/jcolsamp.inc:39: error: undefined symbol `RGB_RED' in preprocessor
/builds/slave/try-lnx/build/media/libjpeg/simd/jcolsamp.inc:61: error: undefined symbol `RGB_RED' in preprocessor
/builds/slave/try-lnx/build/media/libjpeg/simd/jcolsamp.inc:83: error: undefined symbol `RGB_RED' in preprocessor
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:44: error: undefined symbol `jsimd_rgb_gray_convert_mmx' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:44: error:  (Each undefined symbol is reported only once.)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:46: error: instruction expected after label
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:50: error: undefined symbol `SIZEOF_MMWORD' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:54: error: instruction expected after label
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:61: error: instruction expected after label
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:62: error: instruction expected after label
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:64: error: undefined symbol `JDIMENSION' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:66: error: undefined symbol `movpic.return' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:70: error: undefined symbol `JSAMPIMAGE' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:72: error: undefined symbol `JSAMPARRAY' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:72: error: undefined symbol `SIZEOF_JSAMPARRAY' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:73: error: undefined symbol `SIZEOF_JSAMPROW' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:78: error: undefined symbol `INT' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:81: error: instruction expected after label
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:83: error: instruction expected after label
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:88: error: undefined symbol `JSAMPROW' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:90: error: instruction expected after label
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:93: error: undefined symbol `movpic.columnloop' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:94: error: instruction expected after label
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:96: error: undefined symbol `RGB_PIXELSIZE' in preprocessor
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:192: error: undefined symbol `RGB_PIXELSIZE' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:198: error: undefined symbol `MMWORD' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:208: error: instruction expected after label
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:254: error: undefined symbol `BYTE_BIT' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:271: error: instruction expected after label
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:271: error: undefined symbol `GOTOFF' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:300: error: undefined symbol `SCALEBITS' (first use)
/builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm:340: error: undefined symbol `poppic.rowloop' (first use)
Comment 17 Ryan VanderMeulen [:RyanVM] 2012-01-15 16:14:54 PST
Created attachment 588787 [details] [diff] [review]
Checkpoint patch #3

This actually builds, though the NEON code isn't being built for some reason. LIBJPEG_TURBO_ARM_ASM is set to 1 according to the log.
Comment 18 Ryan VanderMeulen [:RyanVM] 2012-01-15 16:15:16 PST
Link to the last Android build log:
https://tbpl.mozilla.org/php/getParsedLog.php?id=8569482&tree=Try&full=1
Comment 19 Ryan VanderMeulen [:RyanVM] 2012-01-15 19:07:05 PST
While it won't fix comment 17, I think I need to nest a HAVE_ARM_NEON check in the new configure.in ARM case to make sure that the NEON optimizations are only compiled on supporting CPUs.
Comment 20 DRC 2012-01-15 21:50:14 PST
I don't follow. Wouldn't that mean that you'd need to release both a NEON and a non-NEON version of FF? The NEON optimizations are always compiled in and automatically enabled/disabled at run time on Linux systems (including Android.) Why is merging creating such difficulties? Seems like you're trying to cherry pick which files to merge rather than just taking the whole 1.2 product and building upon it.
Comment 21 Ryan VanderMeulen [:RyanVM] 2012-01-16 06:04:40 PST
My thought was to maintain the *ability* for a non-NEON supporting build to be compiled should someone choose to do so. It has nothing to do with what Mozilla would or wouldn't release officially. Note that HAVE_ARM_NEON is currently set by configure if it detects a supporting architecture. If they would just be disabled at run time, maybe you're right that it's unnecessary.

Regarding cherry picking (and the merge issues I was having), it was more due to my own mistakes than anything else. I wasn't catching that some of the new files are included but are not to be compiled directly. By going through the upstream makefiles more carefully, I was able to make sense of it and get it compiling. The only "cherry picking" going on is not including the TurboJPEG files. The MOZCHANGES in the patch details exactly what modifications from upstream have been made. Hope that clarifies.
Comment 22 Justin Lebar (not reading bugmail) 2012-01-17 06:03:32 PST
> I think I need to nest a HAVE_ARM_NEON check in the new configure.in ARM case to make sure that the 
> NEON optimizations are only compiled on supporting CPUs.

Do we pass a flag at compile-time which says "this CPU won't support NEON"?  Otherwise, a runtime check seems fine.

> Why is merging creating such difficulties?

The way I set up the libjpeg-turbo build, all files are built using Mozilla's build machinery.  So to do the merge, you need to run libjpeg-turbo's configure.in, then merge the resulting config files, then make Mozilla's build properly compile the libjpeg-turbo source.  That can be somewhat hairy, as Ryan seems to be discovering.

> However, it looks like more work is required for iOS support. Is that something we care about?

Ted maintains an unofficial iOS port, so I'm sure he'd appreciate it if you didn't go out of your way to make this difficult to compile on iOS.  But this isn't a requirement for landing.

> yasm -o simd/jcgrymmx.o -f elf32 -rnasm -pnasm -DPIC -DELF -I/builds/slave/try-lnx/build/media
> /libjpeg/simd/  /builds/slave/try-lnx/build/media/libjpeg/simd/jcgrymmx.asm

Perhaps you should not be assembling this file directly.  It looks like it's %included in jcgrammx.asm.

Maybe it would be worth running a normal libjpeg-turbo build and parsing the output to see exactly which files it compiles.  Or you could just look at the object files generated.

Ryan, anything else you'd like feedback on?
Comment 23 Justin Lebar (not reading bugmail) 2012-01-17 06:05:08 PST
I'm removing the sec-review-needed tag because the security folks seem to be using this flag only when there's a reasonably complete patch available in the bug, such that we might be able to presently schedule the security review.  We should add the flag back once things compile, etc.
Comment 24 Justin Lebar (not reading bugmail) 2012-01-17 06:19:25 PST
> Note that HAVE_ARM_NEON is currently set by configure if it detects a supporting architecture.

Oh, I understand, now that I read configure.in.  HAVE_ARM_NEON checks that the compiler is capable of compiling NEON code.  Your presumption is that if the compiler fails to compile NEON code, it's because we've passed flags which suggest a CPU which doesn't have NEON?  From reading xpcom/glue/arm.h, it seems that this is true for GCC, but not MSVC.  MSVC can't compile inline NEON, but libjpeg-turbo uses only out-of-line asm.

Given that HAVE_ARM_NEON is really a check for compiler and arch support, and given that you only actually want an arch support check, I think it'd probably be OK to ignore this flag and always assemble the NEON code, guarded by a runtime check.  Assuming configure finds a suitable assembler, etc.  There's little harm in having the code sitting around, right?
Comment 25 Justin Lebar (not reading bugmail) 2012-01-17 06:22:49 PST
> My thought was to maintain the *ability* for a non-NEON supporting build to be compiled should 
> someone choose to do so.

--disable-libjpeg-turbo should still work, right?
Comment 26 Justin Lebar (not reading bugmail) 2012-01-17 06:49:10 PST
I discussed this with some folks in #mobile.  Highlights include:

* jbramley: ARMv6 doesn't support NEON.  ARMv7a and newer do, but ARMv7 does not guarantee that NEON is available.

* glandium: It can still be useful to be able to build for armv6 but allow neon at runtime, because you can rum ARMv6 code on an ARMv7 processor.

* mfinkle: We don't currently support building for ARMv6, but we may in the future.  Bug 697205.

Based on this, I think building the NEON code whenever we can (unless --disable-libjpeg-turbo is passed) and doing a runtime check is the right way to go.  What do you think, Ryan?
Comment 27 Curtis Koenig [:curtisk-use curtis.koenig+bzATgmail.com]] 2012-01-17 07:41:09 PST
adding sec-review-needed kw/flag, we may need to put some fuzzing resources on this
Comment 28 Ryan VanderMeulen [:RyanVM] 2012-01-17 09:59:01 PST
(In reply to Justin Lebar [:jlebar] from comment #22)
> Ted maintains an unofficial iOS port, so I'm sure he'd appreciate it if you
> didn't go out of your way to make this difficult to compile on iOS.  But
> this isn't a requirement for landing.
OK, then I won't worry about it for now. It appears that libvpx has already added the needed machinery for gas should iOS support be desired.
> Perhaps you should not be assembling this file directly.  It looks like it's
> %included in jcgrammx.asm.
That was exactly my issue. As I mentioned in comment 21, I ran into that problem in more than one spot :).
(In reply to Justin Lebar [:jlebar] from comment #25)
> --disable-libjpeg-turbo should still work, right?
I don't see why it wouldn't.
(In reply to Justin Lebar [:jlebar] from comment #26)
> Based on this, I think building the NEON code whenever we can (unless
> --disable-libjpeg-turbo is passed) and doing a runtime check is the right
> way to go.  What do you think, Ryan?
Sounds good to me!

At this point, I am still unable to get it to build on ARM. Something about the way I'm implementing LIBJPEG_TURBO_ARM_ASM isn't working correctly. As a result, the right files aren't being included and the build is dying. I will attach an updated checkpoint patch when I get home tonight along with links to the last build log with the hopes that somebody can see what I'm doing wrong.
Comment 29 Ryan VanderMeulen [:RyanVM] 2012-01-17 16:33:06 PST
Created attachment 589346 [details] [diff] [review]
Checkpoint patch #4

Here's the latest checkpoint patch. This builds on all platforms except Android. If anyone can see what I'm doing wrong with LIBJPEG_TURBO_ARM_ASM, please speak up.
https://tbpl.mozilla.org/?tree=Try&rev=e20e88641758

With respect to security fuzzing, my plan was to get this working first, then update the patch to the latest trunk to get the most recent code fuzzed.
Comment 30 Justin Lebar (not reading bugmail) 2012-01-17 17:40:55 PST
I see

/tools/python/bin/python2.5 /builds/slave/try-andrd/build/config/pythonpath.py -I../../config /builds/slave/try-andrd/build/config/expandlibs_gen.py jcomapi.o [...] jcsample.o simd/jsimd_arm.o simd/jsimd_arm_neon.S   > libmozjpeg.a.desc

The .S file at the end looks pretty fishy.  I also don't see the assembler being called on jsimd_arm_neon.S.  Compare this with the x86 build, where yasm gets called for each entry in ASFILES.

It doesn't look like you set LIBJPEG_TURBO_AS; maybe that's the problem?
Comment 31 Zlip792 2012-01-26 02:34:27 PST
I don't want to disrupt the conversation and flow of bug but there is build made by a guy which I think use trunk version of libjpeg-turbo, you can find them here:
http://fbuild.com/
Don't forget to use Translator.. :-)
If my comment is useless then forgive me...
Comment 32 Justin Lebar (not reading bugmail) 2012-01-26 07:02:12 PST
Not useless, but we can't do much with pre-built binaries, since we're an open-source project.
Comment 33 Justin Lebar (not reading bugmail) 2012-01-27 06:41:28 PST
The 1.2 branch was released today!
Comment 34 DRC 2012-01-27 12:50:36 PST
The branch was created, but it has not been "released" yet.  We are awaiting resolution of a downstream issue before releasing 1.2.0.
Comment 35 Ryan VanderMeulen [:RyanVM] 2012-01-29 17:57:46 PST
Created attachment 592562 [details] [diff] [review]
Checkpoint patch #5

Updated to the tip of the 1.2.x branch. Getting closer to a working ARM build, but not there yet. jsimd_arm.c actually builds now (meaning I've finally got those flags working right), but jsimd_arm_neon.S still isn't getting built. Justin, can you please help me out with this? I'm far from an expert in configure/makefiles.

Also, I removed a few more files that weren't necessary for us to build. Probably shouldn't have been included in the original import.

https://tbpl.mozilla.org/?tree=Try&rev=15cfafc6232d
Comment 36 Makoto Kato [:m_kato] 2012-01-29 21:34:57 PST
*.S should add to SSRCS, not ASFILES.  (ex. http://mxr.mozilla.org/mozilla-central/source/gfx/skia/Makefile.in)
Comment 37 Justin Lebar (not reading bugmail) 2012-01-30 07:56:16 PST
With the following patch, I see the .S file being compiled:

diff --git a/media/libjpeg/Makefile.in b/media/libjpeg/Makefile.in
index f89f9e1..0612287 100644
--- a/media/libjpeg/Makefile.in
+++ b/media/libjpeg/Makefile.in
@@ -116,20 +116,18 @@ ifeq ($(AS),yasm)
 endif
 
 # No SIMD support?
 ifeq (,$(LIBJPEG_TURBO_X86_ASM)$(LIBJPEG_TURBO_X64_ASM)$(LIBJPEG_TURBO_ARM_ASM))
   CSRCS += jsimd_none.c
 endif
 
 ifeq (1,$(LIBJPEG_TURBO_ARM_ASM))
-  CSRCS   +=simd/jsimd_arm.c
-  ASFILES += \
-       simd/jsimd_arm_neon.S \
-       $(NULL)
+  CSRCS +=simd/jsimd_arm.c
+  SSRCS += simd/jsimd_arm_neon.S
 endif
 
 ifeq (1,$(LIBJPEG_TURBO_X64_ASM))
   CSRCS   += simd/jsimd_x86_64.c
   ASFILES += \
        simd/jccolss2-64.asm \
        simd/jcgrass2-64.asm \
        simd/jcqnts2f-64.asm \


/usr/local/bin/ccache /home/jlebar/code/moz/android-ndk-r6/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86/bin/arm-linux-androideabi-gcc -o simd/jsimd_arm_neon.o -march=armv7-a -mfpu=neon -I/home/jlebar/code/moz/ff-git/src/media/libjpeg/simd/ -c /home/jlebar/code/moz/ff-git/src/media/libjpeg/simd/jsimd_arm_neon.S
Comment 38 Ryan VanderMeulen [:RyanVM] 2012-01-30 08:04:45 PST
Hah, so close! OK, I'll make the change and push to try tonight with a full test run as well. When this goes final, did you want the Mozilla-specific changes split into a separate patch from the upstream changes?
Comment 39 Justin Lebar (not reading bugmail) 2012-01-30 08:31:48 PST
(In reply to Ryan VanderMeulen from comment #38)
> Hah, so close! OK, I'll make the change and push to try tonight with a full
> test run as well. When this goes final, did you want the Mozilla-specific
> changes split into a separate patch from the upstream changes?

If the Mozilla-specific changes are confined to a few small config files, that's fine with me if you list those files.  But I'm not an imagelib peer, so I can't properly review this without Jeff's leave, and Jeff may want something different.

Can you please also ensure that the autoconf --disable flag works and passes the tests, particularly on Android?
Comment 40 Ryan VanderMeulen [:RyanVM] 2012-01-30 08:35:39 PST
Will do.
Comment 41 Joe Drew (not getting mail) 2012-01-30 10:14:38 PST
(In reply to Justin Lebar [:jlebar] from comment #39)
> (In reply to Ryan VanderMeulen from comment #38)
> > Hah, so close! OK, I'll make the change and push to try tonight with a full
> > test run as well. When this goes final, did you want the Mozilla-specific
> > changes split into a separate patch from the upstream changes?
> 
> If the Mozilla-specific changes are confined to a few small config files,
> that's fine with me if you list those files.  But I'm not an imagelib peer,
> so I can't properly review this without Jeff's leave, and Jeff may want
> something different.

As the imagelib module owner, I declare:

1. Justin may do what he wants wrt libjpeg-turbo in this bug
2. I prefer all changesets to be stand-alone buildable and testable, which means AFAICT that Mozilla-specific changes should be in the same changeset as the libjpeg-turbo changes.
Comment 42 Ryan VanderMeulen [:RyanVM] 2012-01-31 14:08:02 PST
Looks good on Try:
Enabled: https://tbpl.mozilla.org/?tree=Try&rev=5204e965ce03
Disabled: https://tbpl.mozilla.org/?tree=Try&rev=e1df98cb6d61

And I'll just leave it as one big patch.
Comment 43 Justin Lebar (not reading bugmail) 2012-01-31 14:10:02 PST
Is this ready for me to review, then?
Comment 44 Ryan VanderMeulen [:RyanVM] 2012-01-31 14:19:48 PST
If you want to, go ahead.
Comment 45 Justin Lebar (not reading bugmail) 2012-01-31 14:31:21 PST
Actually...we want to update to 1.2 final before landing, right?  So this is ready for preliminary fuzzing, but not review?  Or what do you think?
Comment 46 Ryan VanderMeulen [:RyanVM] 2012-01-31 14:33:46 PST
From the sounds of it, 1.2.0 is pretty close to release. I was thinking that any changes between now and final are likely to be pretty minor, so it wouldn't hurt to start fuzzing and review the current patch.
Comment 47 Ryan VanderMeulen [:RyanVM] 2012-01-31 14:34:44 PST
Created attachment 593220 [details] [diff] [review]
Checkpoint patch #6

Would probably help if I actually attached the patch that builds on all platforms, though.
Comment 48 DRC 2012-01-31 16:26:36 PST
1.2.0 is ready to go from my POV, and binaries have actually already been built from r771.  The reason I'm waiting to upload them is that I got a last-minute bug report from someone within Mozilla regarding compiler warnings, and whoever submitted it hasn't responded yet regarding whether the 11th hour patch I included fixes the warnings.  That obviously isn't a critical issue, but when it was submitted, it made me think that you guys were probably in the middle of integration, so I wanted to give you a day or two to get things building properly within Mozilla, in case a more serious issue was discovered.

If you want me to go ahead and release, give the word.
Comment 49 Justin Lebar (not reading bugmail) 2012-01-31 16:47:53 PST
I can bug him or her; what's the name?
Comment 50 DRC 2012-01-31 16:52:55 PST
Actually, it was from Ryan.  Ryan, did r758 fix the warnings on your end?
Comment 51 Ryan VanderMeulen [:RyanVM] 2012-01-31 16:53:43 PST
That person would be me. Due to timing, I couldn't test your fix last night. It's running on the Try server right now. I'll update your project's bug when the results are in. Frankly, I wouldn't hold up the final release over a few warning fixes, though.

However, I do think fuzzing results could be worth waiting for.
Comment 52 DRC 2012-01-31 17:10:45 PST
Yeah, I'm not holding it up because of those warnings.  I'm just holding it up because that bug report tipped me off that Mozilla was still actively integrating the code.  I agree that waiting until fuzzing is complete would be worthwhile.  Do you think it can be completed this week?
Comment 53 Ryan VanderMeulen [:RyanVM] 2012-01-31 19:02:17 PST
Curtis/Christoph - How long does fuzzing usually take? The attached patch is ready.
Comment 54 Justin Lebar (not reading bugmail) 2012-01-31 21:10:40 PST
Comment on attachment 593220 [details] [diff] [review]
Checkpoint patch #6

I "reviewed" this by looking at the build changes, jpeg config files, and NEON
detection code.  That's about all that's comprehensible to me, anyway.  Fuzzing
is the real test, I think.

DRC, I have a few questions for you in here, if you don't mind.

>+dnl If we're on an ARM system which supports libjpeg-turbo's asm routines and
>+dnl --disable-libjpeg-turbo wasn't passed, use gcc as assembler.

Nit: We use the C compiler as the assembler, right?

>-/* Define if your compiler supports prototypes */
>+/* libjpeg-turbo version */
>+#define LIBJPEG_TURBO_VERSION 0

Huh?

>diff --git a/media/libjpeg/jconfig.h.in b/media/libjpeg/jconfig.h.in
>--- a/media/libjpeg/jconfig.h.in
>+++ b/media/libjpeg/jconfig.h.in

Do we need this file at all?

>+ * Check what SIMD accelerations are supported.
>+ *
>+ * FIXME: This code is racy under a multi-threaded environment.

DRC, can you confirm my understanding of this race?  If multiple threads run
this code, one might read that there's no simd support when there is.
Alternatively, both threads might do some extra work (running the full cpuinfo
parsing), but neither will leak, and they'll both come to the correct
conclusion about whether NEON is supported.  Right?

>+LOCAL(void)
>+init_simd (void)
>+{
>+  char *env = NULL;
>+#if !defined(__ARM_NEON__) && defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)
>+  int bufsize = 1024; /* an initial guess for the line buffer size limit */
>+#endif
>+
>+  if (simd_support != ~0U)
>+    return;
>+
>+  simd_support = 0;
>+
>+#if defined(__ARM_NEON__)
>+  simd_support |= JSIMD_ARM_NEON;
>+#elif defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)
>+  /* We still have a chance to use NEON regardless of globally used
>+   * -mcpu/-mfpu options passed to gcc by performing runtime detection via
>+   * /proc/cpuinfo parsing on linux/android */
>+  while (!parse_proc_cpuinfo(bufsize)) {
>+    bufsize *= 2;
>+    if (bufsize > SOMEWHAT_SANE_PROC_CPUINFO_SIZE_LIMIT)
>+      break;
>+  }
>+#endif
>+
>+  /* Force different settings through environment variables */
>+  env = getenv("JSIMD_FORCE_ARM_NEON");
>+  if ((env != NULL) && (strcmp(env, "1") == 0))
>+    simd_support &= JSIMD_ARM_NEON;

DRC, shouldn't this be "|=", as in the |#if defined(__ARM_NEON__)| branch above?

>+GLOBAL(void)
>+jsimd_idct_islow (j_decompress_ptr cinfo, jpeg_component_info * compptr,
>+                JCOEFPTR coef_block, JSAMPARRAY output_buf,
>+                JDIMENSION output_col)
>+{
>+  if ((simd_support & JSIMD_ARM_NEON))
>+    jsimd_idct_islow_neon(compptr->dct_table, coef_block, output_buf, output_col);
>+}
>+
>+GLOBAL(void)
>+jsimd_idct_ifast (j_decompress_ptr cinfo, jpeg_component_info * compptr,
>+                JCOEFPTR coef_block, JSAMPARRAY output_buf,
>+                JDIMENSION output_col)
>+{
>+  if ((simd_support & JSIMD_ARM_NEON))
>+    jsimd_idct_ifast_neon(compptr->dct_table, coef_block, output_buf, output_col);
>+}

It's not important, but I don't get why these if statements aren't assertions.
Surely it's wrong to call jsimd_idct_islow if neon or vectorized islow is not
supported.  So why guard with an if?

>diff --git a/media/libjpeg/simd/jsimd_i386.c b/media/libjpeg/simd/jsimd_i386.c
>--- a/media/libjpeg/simd/jsimd_i386.c
>+++ b/media/libjpeg/simd/jsimd_i386.c
>  *
>  * FIXME: This code is racy under a multi-threaded environment.
>  */
> LOCAL(void)
> init_simd (void)
> {
>   char *env = NULL;
> 
>-  if (simd_support != ~0)
>+  if (simd_support != ~0U)
>     return;
> 
>   simd_support = jpeg_simd_cpu_support();
> 
>   /* Force different settings through environment variables */
>   env = getenv("JSIMD_FORCEMMX");
>   if ((env != NULL) && (strcmp(env, "1") == 0))
>     simd_support &= JSIMD_MMX;

Okay, maybe I get why this is "&=".  Does "force MMX" mean "use MMX only if the
CPU supports it, but even if the CPU supports SSE"?  If so, the concept of
"force NEON" doesn't make much sense, since there's no alternative.

r- only because I'd like to look at the final patch.
Comment 55 Justin Lebar (not reading bugmail) 2012-01-31 21:12:22 PST
Apparently we need to schedule a security review for this.  I'll get on that.

Ryan, would you like to be involved in that meeting?  I can't imagine it will be particularly interesting, but let me know if you want to try to coordinate a time which works for you.
Comment 56 DRC 2012-01-31 21:45:01 PST
(In reply to Justin Lebar [:jlebar] from comment #54)
> >diff --git a/media/libjpeg/jconfig.h.in b/media/libjpeg/jconfig.h.in
> >--- a/media/libjpeg/jconfig.h.in
> >+++ b/media/libjpeg/jconfig.h.in
> 
> Do we need this file at all?

jconfig.h is used to build some parts of the library (it is included by jinclude.h), so you need jconfig.h for sure.  It is designed to be auto-generated by the build system so that the proper features are enabled based on the system's capabilities, but if your build systems will always be fairly homogeneous, then you could get away with not including jconfig.h.in and simply including a pre-generated version.  We do that on Windows, for instance.


> >+ * Check what SIMD accelerations are supported.
> >+ *
> >+ * FIXME: This code is racy under a multi-threaded environment.
> 
> DRC, can you confirm my understanding of this race?  If multiple threads run
> this code, one might read that there's no simd support when there is.

I don't think that could happen.  The first thread into the function will read that simd_support==0xFF..FF (uninitialized) and will continue to set the proper value of that variable based on the /proc capabilities.  The next thread in will read that simd_support!=0xFF..FF and will not try to set the value.  The only potential danger is if Thread 2 tried to subsequently execute a SIMD routine while Thread 1 was still setting simd_support, and the worst case is that Thread 2 would be briefly unaccelerated until Thread 1 finishes setting simd_support.  simd_support is read every time a SIMD routine is executed, not just during initialization.


> Alternatively, both threads might do some extra work (running the full
> cpuinfo
> parsing), but neither will leak, and they'll both come to the correct
> conclusion about whether NEON is supported.  Right?

See https://sourceforge.net/mailarchive/forum.php?thread_name=4E81F960.2060002%40users.sourceforge.net&forum_name=libjpeg-turbo-devel

In summary:
-- Trying to fix the issue creates much worse potential problems, because it would require explicitly initializing the SIMD routines within the body of some API function (and even then, there is no guarantee that multiple threads won't try to simultaneously call the API function)
-- The potential for encountering the issue is very slight
-- The worst case scenario will not create any observable detriment to stability or performance


> It's not important, but I don't get why these if statements aren't
> assertions.
> Surely it's wrong to call jsimd_idct_islow if neon or vectorized islow is not
> supported.  So why guard with an if?

simd_support is a bitwise feature identifier.  In the future, it may be extended with other types of ARM vector instructions.  jsimd_arm.c is based on jsimd_i386.c, which can handle MMX, SSE, and SSE2.  Even though the ARM code only supports one instruction set at the moment, the code is written with future expansion in mind.


> Okay, maybe I get why this is "&=".  Does "force MMX" mean "use MMX only if
> the
> CPU supports it, but even if the CPU supports SSE"?  If so, the concept of
> "force NEON" doesn't make much sense, since there's no alternative.

See above regarding NEON.  It's there for future expansion.

Regarding MMX, yes.  Forcing a particular instruction set means "use this instruction set only if the CPU supports it and regardless of whether it supports others."
Comment 57 Justin Lebar (not reading bugmail) 2012-01-31 21:57:46 PST
> > DRC, can you confirm my understanding of this race?  If multiple threads run
> > this code, one might read that there's no simd support when there is.
>
> I don't think that could happen.  The first thread into the function will read that 
> simd_support==0xFF..FF (uninitialized) and will continue to set the proper value of that variable 
> based on the /proc capabilities.  The next thread in will read that simd_support!=0xFF..FF and will 
> not try to set the value.

Not necessarily, right?

The first thread could read ~0.  Then the second thread swoops in and, before the first thread sets simd_support = 0, also read ~0.  Now both threads will run the full procinfo parsing function.

I'm not saying this is bad.  It's fine, if there are no leaks.

I think we're on the same page here.

> Regarding MMX, yes.  Forcing a particular instruction set means "use this instruction set only if 
> the CPU supports it and regardless of whether it supports others."

Okay, great.  Thanks for clarifying!
Comment 58 DRC 2012-01-31 22:25:00 PST
(In reply to Justin Lebar [:jlebar] from comment #57)
> The first thread could read ~0.  Then the second thread swoops in and,
> before the first thread sets simd_support = 0, also read ~0.  Now both
> threads will run the full procinfo parsing function.

Yes, but that is the unlikeliest among already unlikely scenarios, and it should have zero ill effects, since the init_simd() function should be idempotent and thread-safe.
Comment 59 Christoph Diehl [:posidron] 2012-02-01 02:54:50 PST
@Ryan VanderMeulen
I will begin with fuzzing today if something is found I will CC you.
Comment 60 Christian Holler (:decoder) 2012-02-01 06:28:23 PST
I'll also see if I can give this a try today :)
Comment 61 Justin Lebar (not reading bugmail) 2012-02-01 07:03:09 PST
Apparently this does not need a security review meeting.  Here's what Curtis said:

"When bugs are marked sec-review-needed we try to determine the best way to do security work. You can tell if we have made a decision by looking in the whiteboard, if you see"[secr:<someusername>]" that means we have made a decision about what to do and the username shown is who is going to do that work. If you see curtisk (me) that usually means a full review, if you see another alias it is that person who is going to do a code review, or a fuzzer or some thing."
Comment 62 Ryan VanderMeulen [:RyanVM] 2012-02-01 07:20:59 PST
(In reply to Justin Lebar [:jlebar] from comment #54)
> Comment on attachment 593220 [details] [diff] [review]
> Checkpoint patch #6
> 
> >+dnl If we're on an ARM system which supports libjpeg-turbo's asm routines and
> >+dnl --disable-libjpeg-turbo wasn't passed, use gcc as assembler.
> 
> Nit: We use the C compiler as the assembler, right?
> 
I'll change "gcc" to "the C compiler" to be more clear.

> >-/* Define if your compiler supports prototypes */
> >+/* libjpeg-turbo version */
> >+#define LIBJPEG_TURBO_VERSION 0
> 
> Huh?
> 
That's how it's set in the upstream jconfig.h included in the /win directory. I can set it to 1.2.0.

> >diff --git a/media/libjpeg/jconfig.h.in b/media/libjpeg/jconfig.h.in
> >--- a/media/libjpeg/jconfig.h.in
> >+++ b/media/libjpeg/jconfig.h.in
> 
> Do we need this file at all?
> 
As DRC said, probably not. I'll remove it.
Comment 63 Siarhei Siamashka 2012-02-01 07:50:20 PST
A comment in the context of the security review. Allegedly Chromium has some security fixes for libjpeg-turbo: http://code.google.com/p/chromium/issues/detail?id=106954#c6

And as it looks like Chromium people don't really care about submitting any of their fixes upstream (which is perfectly legal and not required by the libjpeg-turbo license), the only way is to track what they are doing and take whatever is useful.
Comment 64 DRC 2012-02-01 11:17:49 PST
Can you figure out to what they are referring?  I had this same conversation with someone from Gentoo earlier this week, and I think the "security fixes" are referring to the supposed buffer over-read discovered when running LJT in valgrind.  This wasn't a "real" run-time issue-- only valgrind cared about it-- but it's fixed in the latest code anyhow.
Comment 65 DRC 2012-02-01 11:20:30 PST
(In reply to Ryan VanderMeulen from comment #62)
> That's how it's set in the upstream jconfig.h included in the /win
> directory. I can set it to 1.2.0.

No, there is no "upstream jconfig.h".  jconfig.h is auto-generated by the libjpeg-turbo build system, which populates LIBJPEG_TURBO_VERSION with the correct version number.
Comment 66 Justin Lebar (not reading bugmail) 2012-02-01 11:21:10 PST
At the risk of stating the obvious, it sounds like someone needs to diff the code.
Comment 67 Ryan VanderMeulen [:RyanVM] 2012-02-06 15:56:03 PST
(In reply to Christoph Diehl [:cdiehl] from comment #59)
> @Ryan VanderMeulen
> I will begin with fuzzing today if something is found I will CC you.

(In reply to Christian Holler (:decoder) from comment #60)
> I'll also see if I can give this a try today :)

Is no news good news, guys?
Comment 68 Christoph Diehl [:posidron] 2012-02-06 22:30:23 PST
Yes. I, personally would like to wait till the end of this week. In the whole we have currently a resource of around 300 selected JPEGs to build tests on. Christian made  also some coverage process to reduce that amount to a further minimum. Based on my hardware resources it will still take some amount of time to cycle through all those files and yet get a good coverage from a fuzzing perspective.
Comment 69 Christian Holler (:decoder) 2012-02-07 06:10:42 PST
(In reply to Ryan VanderMeulen from comment #67)
> 
> Is no news good news, guys?

From my side, you're good to go, I haven't found any issues. This week, I'll be busy in Santa Cruz so I won't be working on productive fuzzing (but I will work on some methods that could improve the fuzzing this area :))
Comment 70 Christoph Diehl [:posidron] 2012-02-10 06:23:04 PST
Nothing found.
Comment 71 Justin Lebar (not reading bugmail) 2012-02-10 07:25:34 PST
Ryan, are you up for diff'ing Chrome's libjpeg-turbo to double-check what "security" changes they made?
Comment 72 Ryan VanderMeulen [:RyanVM] 2012-02-10 07:30:17 PST
Sure, no problem.
Comment 73 Ryan VanderMeulen [:RyanVM] 2012-02-10 18:58:28 PST
Created attachment 596257 [details] [diff] [review]
Google patches to libjpeg-turbo 1.2.0

Thankfully, Google makes that easy by way of a ready-made patch included with their source!
Comment 74 Ryan VanderMeulen [:RyanVM] 2012-02-10 19:00:37 PST
From README.chromium:

Same as our copy of libjpeg-6b, this libjpeg-turbo also added a new file
jpeglibmangler.h and included it from jpeglib.h that changes the names of all
externally visible functions to chromium_* so that we can avoid conflicts that
arise when system libraries attempt to use our libjpeg. Also, we applied the
following changes which are not merged to upstream:
* Added the 'private_extern' flags on Mac (or the 'hidden' flags on Linux) to
  all the global symbols in '.asm' files to prevent making them external ones.
* Supported motion-JPEG frames that do not have DHT markers.
* Fixed valgrind errors.
The 'google.patch' file represents our changes from the original
libjpeg-turbo-1.2.
Comment 75 Ryan VanderMeulen [:RyanVM] 2012-02-10 20:16:29 PST
Created attachment 596264 [details] [diff] [review]
Upgrade to libjpeg-turbo 1.2.0

libjpeg-turbo 1.2.0 has been released. One last Try run for good measure:
https://tbpl.mozilla.org/?tree=Try&rev=b2083f3c4fb2
Comment 76 DRC 2012-02-11 01:33:05 PST
I'll comment on the Chromium changes individually:

-- Apparently the out-of-bounds read is no longer included in their patch because we fixed it upstream.  That represented the only potential security issue that I was aware of, although I'm not convinced it was a real issue and not just valgrind being overly anal.  I'll explain-- what libjpeg-turbo was doing was reading data in 16-byte chunks (the size of an SSE2 register), so if the buffer size was not actually a multiple of 16 bytes, the last read technically read some bytes that it wasn't supposed to, but that data was subsequently ignored, and due to memory alignment, I don't think the read ever technically fell outside the allocated region-- or, at least, none of the O/S's ever complained if it did.  However, it's a moot point, because it was fixed anyhow.

-- Motion JPEG:  Would be easy enough to include this, and it might even be something we could use in VirtualGL and TurboVNC.  I'd like to have some way of testing it, though.

-- Making the SIMD symbols private is a good idea.  It mainly benefits Mac, though because we accomplish the same thing on Linux using a version script.

-- The rest of their changes appear to be just fixing compiler warnings.

In short, I don't think there's anything critical in there.
Comment 77 DRC 2012-02-11 01:48:28 PST
One other:

-- Not sure why they hard-coded the PIC definition in jsimdext.inc.  This should be handled by the build system instead (I'm sure Chromium uses their own build system, but in our autotools-based one, nasm_lt.sh will add -DPIC to the NASM command line if the project was configured with --with-pic.)
Comment 78 DRC 2012-02-11 02:04:09 PST
More information.  Apparently Chromium uses a customized version of YASM that supports hiding symbols via a new flag (:private_extern) [reference: http://groups.google.com/a/chromium.org/group/chromium-dev/browse_thread/thread/51a90cb0e84867be?pli=1].  Thus, that patch isn't usable upstream in its current form.
Comment 79 Ryan VanderMeulen [:RyanVM] 2012-02-11 03:34:22 PST
Created attachment 596299 [details] [diff] [review]
Upgrade to libjpeg-turbo 1.2.0

Try pointed out that I forgot a qrefresh before sending it and attaching here...
https://tbpl.mozilla.org/?tree=Try&rev=80d218addf25
Comment 80 Justin Lebar (not reading bugmail) 2012-02-12 08:16:21 PST
I installed this build on my phone, but I can't get about:telemetry to work, so I can't read the decoder's speed.  :-/

I'm going to spin a build on my machine with the relevant printf's added.  Once I've verified an increase in decoder speed, we can check this in!

Ryan, do you have push privileges, do you need me to push for you?
Comment 81 Justin Lebar (not reading bugmail) 2012-02-12 09:38:11 PST
Tests decoding a non-progressive test image [1] on my Nexus-S running ICS.

Vanilla: ~4000 compressed KB / sec == 3.8MP/s
Patched: ~5000 compressed KB / sec == 4.7MP/s

So it's an improvement, but not a particularly large one.  This could be because the operation is memory-bound, or because this drawing time takes into account other, slow operations (e.g. perhaps color management happens on the phone and is not vectorized).  (Do we really do color management on a phone??)

[1] http://en.wikipedia.org/wiki/File:Mandel_zoom_11_satellite_double_spiral.jpg
Comment 82 Zlip792 2012-02-12 09:54:30 PST
Sorry to again poke nose in serious discussion. Sorry in advance if my comment does not worth but in Chromium Snapshot they got performance improvement recently. It is the bug:
https://bugs.webkit.org/show_bug.cgi?id=78323
May be of some worth.
Comment 83 Justin Lebar (not reading bugmail) 2012-02-12 10:23:58 PST
That bug is for changing to IDCT_FAST on some platforms.

We could do that in Firefox too (e.g. bug 649020), but it's separate from libjpeg-turbo.
Comment 84 Zlip792 2012-02-12 10:34:40 PST
Sorry for then wasting your precious time... Thanks for reading it btw.
Comment 85 Justin Lebar (not reading bugmail) 2012-02-12 10:37:43 PST
(In reply to Ahmad Saleem (zlip) from comment #84)
> Sorry for then wasting your precious time... Thanks for reading it btw.

No worries!  You actually led me to discover that bug 649020 had fallen through the cracks.
Comment 86 Ryan VanderMeulen [:RyanVM] 2012-02-12 10:46:08 PST
(In reply to Justin Lebar [:jlebar] from comment #80)
> Ryan, do you have push privileges, do you need me to push for you?

I do not. Please do.
Comment 87 Justin Lebar (not reading bugmail) 2012-02-12 10:51:00 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/2acd68334b0b

\o/
Comment 88 Ted Mielczarek [:ted.mielczarek] 2012-02-12 14:13:11 PST
(In reply to Ryan VanderMeulen from comment #86)
> (In reply to Justin Lebar [:jlebar] from comment #80)
> > Ryan, do you have push privileges, do you need me to push for you?
> 
> I do not. Please do.

Do you have a bug on file to fix that? :)
Comment 89 Olli Pettay [:smaug] (high review load, please consider other reviewers) 2012-02-12 14:52:39 PST
Regression  Dromaeo (DOM) decrease 4.03% on Linux x64 Mozilla-Inbound-Non-PGO
-------------------------------------------------------------------------------
    Previous: avg 229.060 stddev 1.344 of 30 runs up to revision be556c99ef81
    New     : avg 219.831 stddev 1.953 of 5 runs since revision 2acd68334b0b
    Change  : -9.228 (4.03% / z=6.868)
    Graph   : http://mzl.la/xNRAlN

Changeset range: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=be556c99ef81&tochange=2acd68334b0b
Changesets:
  * http://hg.mozilla.org/integration/mozilla-inbound/rev/2acd68334b0b
    : Ryan VanderMeulen <ryanvm@gmail.com> - Bug 698519 - Update libjpeg-turbo to version 1.2.0. r=jlebar
    : http://bugzilla.mozilla.org/show_bug.cgi?id=698519
Comment 90 Justin Lebar (not reading bugmail) 2012-02-12 20:41:54 PST
This looks like a legitimate (although relatively small) regression.  I'll back it out.
Comment 91 Justin Lebar (not reading bugmail) 2012-02-12 20:47:31 PST
Backed out: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=075fe28b4d34
Comment 92 Justin Lebar (not reading bugmail) 2012-02-13 13:14:23 PST
Confirmed that the backout fixed the regression.

Interestingly, there was no corresponding drop the benchmark for PGO builds.

http://goo.gl/75iNw
Comment 93 Ted Mielczarek [:ted.mielczarek] 2012-02-13 13:19:03 PST
Doesn't seem like it ought to block landing this, then, since we're only shipping PGO builds. It would probably be nice to know what caused this, though.
Comment 94 Justin Lebar (not reading bugmail) 2012-02-13 13:25:19 PST
(In reply to Ted Mielczarek [:ted, :luser] from comment #93)
> Doesn't seem like it ought to block landing this, then, since we're only
> shipping PGO builds. It would probably be nice to know what caused this,
> though.

Indeed.  I guess I'm concerned that if this benchmark dropped, other measures might be affected even in PGO builds.
Comment 95 Ryan VanderMeulen [:RyanVM] 2012-02-13 15:02:32 PST
Looking at logs with and without the patch, the regression appears to be from query.html and traverse.html. The mean time for query.html went from about 6300 to 5400. For traverse.html, the mean time went from 187 to 169. Neither test (or anything in the dromaeo test suite, for that matter), includes JPG images.

The fact that the regression disappears with PGO indicates to me that it's something being optimized away. Maybe a code locality issue or something? Mike, any suggestions for how to investigate if that's the case?
Comment 96 Justin Lebar (not reading bugmail) 2012-02-13 15:04:39 PST
>  The mean time for query.html went from about 6300 to 5400. For traverse.html, the mean time went 
> from 187 to 169.

To be clear, these are scores (runs per second, I think), and bigger is better in Dromaeo.
Comment 97 Justin Lebar (not reading bugmail) 2012-02-13 18:14:47 PST
(In reply to Justin Lebar [:jlebar] from comment #81)
> So it's an improvement, but not a particularly large one.  This could be
> because the operation is memory-bound, or because this drawing time takes
> into account other, slow operations (e.g. perhaps color management happens
> on the phone and is not vectorized).  (Do we really do color management on a
> phone??)

We disable color management on the phone.  I wonder why this is so slow.  But that's for another bug.
Comment 98 Mike Hommey [:glandium] 2012-02-16 02:53:25 PST
(In reply to Ryan VanderMeulen from comment #95)
> The fact that the regression disappears with PGO indicates to me that it's
> something being optimized away. Maybe a code locality issue or something?
> Mike, any suggestions for how to investigate if that's the case?

Besides comparing profiling data (perf, oprofile, callgrind, pick your weapon of choice), I'm not sure how we can spot the difference.
Comment 99 Justin Lebar (not reading bugmail) 2012-02-16 07:11:14 PST
Hm, the same test just regressed again, by roughly the same amount.

Perhaps we passed some code-size threshold or something.

Regression :( Dromaeo (DOM) decrease 4.94% on Linux x64 Mozilla-Inbound-Non-PGO
-------------------------------------------------------------------------------
    Previous: avg 221.712 stddev 6.035 of 30 runs up to revision 04caa247e11a
    New     : avg 210.756 stddev 0.742 of 5 runs since revision db2216c57498
    Change  : -10.955 (4.94% / z=1.815)
    Graph   : http://mzl.la/y2Yog2

Changeset range: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=04caa247e11a&tochange=db2216c57498

Changesets:
  * http://hg.mozilla.org/integration/mozilla-inbound/rev/56c8584839b3
    : Robert O'Callahan <robert@ocallahan.org> - Bug 727661. We should only optimize ThebesLayers to ColorLayers or ImageLayers for brand-new layers (ThebesLayers with no currently valid data). Otherwise we risk ColorLayers being a deoptimization if there is valid content in the layer that doesn't happen to be visible but might become visible later, or if the layer contents are only temporarily a solid color and will soon need a buffer again. r=tn
    : http://bugzilla.mozilla.org/show_bug.cgi?id=727661

  * http://hg.mozilla.org/integration/mozilla-inbound/rev/db2216c57498
    : Robert O'Callahan <robert@ocallahan.org> - Bug 727694. Set up new mDecoderStateMachine with any preset initial volume. r=cpearce
    : http://bugzilla.mozilla.org/show_bug.cgi?id=727694

Bugs:
  * http://bugzilla.mozilla.org/show_bug.cgi?id=727661
  * http://bugzilla.mozilla.org/show_bug.cgi?id=727694
Comment 100 Justin Lebar (not reading bugmail) 2012-02-16 07:27:52 PST
So Dromaeo DOM regressed, then recently fixed itself without a backout.  I'm going to reland this patch and see what happens to the benchmark.
Comment 101 Siarhei Siamashka 2012-02-16 07:28:45 PST
There are such things as L1 instruction cache associativity and also the performance may be affected by instructions alignment (getting just 1 cycle worse in a critical loop may be quite measurable). Introducing big changes (such as upgrading internal libjpeg-turbo copy) causes many relative alignment drifts in unrelated parts of code. If this is the case, then it is really a good idea to compare profiling logs to precisely identify the part of code where the performance regresses.
Comment 102 Justin Lebar (not reading bugmail) 2012-02-16 07:37:40 PST
If that is indeed what's happening -- and it looks like there's some kind of alignment issue, as the regression fixed itself -- then we're probably in the clear because our PGO stats look fine.
Comment 103 Justin Lebar (not reading bugmail) 2012-02-16 07:42:50 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ef233849296
Comment 104 Mike Hommey [:glandium] 2012-02-16 07:50:07 PST
(In reply to Siarhei Siamashka from comment #101)
> There are such things as L1 instruction cache associativity and also the
> performance may be affected by instructions alignment (getting just 1 cycle
> worse in a critical loop may be quite measurable). Introducing big changes
> (such as upgrading internal libjpeg-turbo copy) causes many relative
> alignment drifts in unrelated parts of code. If this is the case, then it is
> really a good idea to compare profiling logs to precisely identify the part
> of code where the performance regresses.

I'm almost ready to buy that, except I'm expecting the code being exercised by the regressed tests to live in the main parts of libxul.so. libjpeg, however, is pretty much separate from that, and lives near the end of the library. I wouldn't expect a change in libjpeg to affect code alignment in the main parts of libxul.
Comment 105 Justin Lebar (not reading bugmail) 2012-02-16 14:03:19 PST
Cool, no benchmark movement this time.
Comment 106 Ed Morley [:emorley] 2012-02-17 05:33:36 PST
https://hg.mozilla.org/mozilla-central/rev/5ef233849296
Comment 107 xunxun 2012-10-06 20:28:17 PDT
libjpeg-turbo 1.2.1 is released.
Comment 108 Justin Lebar (not reading bugmail) 2012-10-06 22:24:38 PDT
We're on 1.2.1; see http://hg.mozilla.org/mozilla-central/file/ecd4c4304219/media/libjpeg/MOZCHANGES.

We're not running the absolute latest code, but I don't see anything important that we're missing.
Comment 109 xunxun 2012-10-06 22:35:02 PDT
(In reply to Justin Lebar [:jlebar] from comment #108)
> We're on 1.2.1; see
> http://hg.mozilla.org/mozilla-central/file/ecd4c4304219/media/libjpeg/
> MOZCHANGES.
> 
> We're not running the absolute latest code, but I don't see anything
> important that we're missing.

Well, I see https://bugzilla.mozilla.org/show_bug.cgi?id=771403

Though I searched libjpeg-turbo in bugzilla, there is no the bug771403 url.
Comment 110 Justin Lebar (not reading bugmail) 2012-10-06 22:37:59 PDT
(In reply to xunxun from comment #109)
> Well, I see https://bugzilla.mozilla.org/show_bug.cgi?id=771403
> 
> Though I searched libjpeg-turbo in bugzilla, there is no the bug771403 url.

Sorry, I don't understand what the problem is.  Is there something you think we need to do, or are we good here?
Comment 111 xunxun 2012-10-06 22:53:01 PDT
(In reply to Justin Lebar [:jlebar] from comment #110)
> (In reply to xunxun from comment #109)
> > Well, I see https://bugzilla.mozilla.org/show_bug.cgi?id=771403
> > 
> > Though I searched libjpeg-turbo in bugzilla, there is no the bug771403 url.
> 
> Sorry, I don't understand what the problem is.  Is there something you think
> we need to do, or are we good here?

I think I understand the reason.
If I searched libjpeg-turbo directly, it will return the results which have no RESO status.
I should enter https://bugzilla.mozilla.org/query.cgi to modify Status from Open to All.
Comment 112 Jeff Muizelaar [:jrmuizel] 2013-07-30 08:03:19 PDT
*** Bug 489148 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.