Closed Bug 256919 Opened 20 years ago Closed 13 years ago

Clean up assembly optimizations of libjpeg

Categories

(Core :: Graphics: ImageLib, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: tor, Assigned: tor)

Details

Attachments

(3 files, 5 obsolete files)

The attached patch is a modification of Bernard Alleysson's cleanup of the
assembly modifications to libjpeg.  It includes a later version of
Michael Moy's SSE2 work than is currently in the tree.

My modifications to Bernard's include changes to make it compile on
platforms that can't build the sse2 code, reverting more changes/files to
match the upstream libjpeg, and updating the copyright on the new files.

I'm not sure what version of the SSE2 code is in this patch, but it
does seem to have some problems on some images when run on a SSE2
capable machine.  Example problematic image attached.
Attached patch assembly optimizations cleanup (obsolete) — Splinter Review
Could you elaborate on the problems with this image? I ran the image through
my current code and the image matched that of IE's rendering. Of course this
code is quite a bit different than what I'm running in my development environment.
Looked somewhat like the first pass of a progressive jpeg - low detail,
block boundaries clearly visible.  I'll attach a screenshot tomorrow.
Where are these supposed to be defined?

C:\Program Files\Microsoft Visual Studio .NET
2003\Vc7\PlatformSDK\Include\BaseTsd.h(32) : warning C4142: benign redefinition
of typ
e
f:/Mozilla\mozilla\jpeg\jdintel.c(94) : error C2065: 'INTEL_SSE2' : undeclared
identifier
f:/Mozilla\mozilla\jpeg\jdintel.c(96) : error C2065: 'INTEL_SSE' : undeclared
identifier
f:/Mozilla\mozilla\jpeg\jdintel.c(110) : error C2065: 'INTEL_MMX' : undeclared
identifier
make[3]: *** [jdintel.obj] Error 2
make[3]: Leaving directory `/cygdrive/f/Mozilla/mozilla/jpeg'
make[2]: *** [tier_1] Error 2
make[2]: Leaving directory `/cygdrive/f/Mozilla/mozilla'
make[1]: *** [default] Error 2
make[1]: Leaving directory `/cygdrive/f/Mozilla/mozilla'
make: *** [build] Error 2

F:\Mozilla>grep INTEL_SSE2 jpeg.patch
+        if (intelsupport() & INTEL_SSE2) {
+        if (intelsupport() & INTEL_SSE2) {
+    if (intelsupport() & INTEL_SSE2) {
+    sseavailable |= INTEL_SSE2;
+      if (intel_mnemonics & INTEL_SSE2) {
+          intel_mnemonics &= ~INTEL_SSE2;
+#define INTEL_SSE2 0x04
They're defined in jdintel.h:


+#define INTEL_MMX  0x01
+
+#define INTEL_SSE  0x02
+
+#define INTEL_SSE2 0x04
HAVE_INTEL_MNEMONICS isn't getting defined in my environment. The compile
command doesn't have HAVE_INTEL_MNEMONICS defined so it may be that I don't
have MOZ_WIDGET_TOOLKIT setup. Not sure what IDGET_TOOLKIT is or what it implies.

At any rate, I added -DHAVE_INTEL_MNEMONICS to the optimize line and that appears
to work.
+ifndef GNU_CXX
+ifeq ($(MOZ_WIDGET_TOOLKIT),windows)
+CSRCS  +=	jdintel.c
+CFLAGS +=	-DHAVE_INTEL_MNEMONICS
+endif
+endif
+

I can't see how the jpeg code depends on the choice of widget toolkit, you
probably want to enable assembly for Windows and not GCC (don't know about
mingw), it would look like this:

+ifeq ($(OS_ARCH)_$(GNU_CC),WINNT_)
+CSRCS  +=	jdintel.c
+CFLAGS +=	-DHAVE_INTEL_MNEMONICS
+endif

Also jdintel.c has lot of code inside #if 0 (it is the old assembly code),
Michael should check this out to update the file with his latest code and
eventually remove the old code (or make another jdintel_old.c out of it to
enable decoding comparison)
My build finished and it works just fine with the jpeg image in this bug. I'm 
working off the BRANCH_093 codebase. I'll have to put some debugging stuff in to
ensure that the SSE2 code is actually being used though.

I'll try the other makefile code too. MSVC is the only compiler that supports
this form of inline assembler so it should be MSVC varianted. Not GCC may not
be good enough if Mozilla ever supports the Intel compiler.
Michael: Not sure what's happening on your machine, since jdintel.c is
only added to the build list in the ifdef chunk that also sets
HAVE_INTEL_MNEMONICS.

Bernard: Since we only have the windows toolkit for win32, the two tests
are equivalent.  Yours does look cleaner, though the "NT" bit is a bit
misleading.  I left the #if 0 chunck there for now because the large comment
still references that code.
(In reply to comment #10)
> Michael: Not sure what's happening on your machine, since jdintel.c is
> only added to the build list in the ifdef chunk that also sets
> HAVE_INTEL_MNEMONICS.

rules.mk is wiping out CFLAGS after it gets set.

I see the rendering problem now.
Ah, I didn't attach the latest version of the patch that uses DEFINES
instead of CFLAGS.  That should fix your problem.
Attached patch fix makefile (obsolete) — Splinter Review
Attachment #157004 - Attachment is obsolete: true
Didn't see anything obvious as to why the rendering problems are there. Will try
a buildup approach (start with the minimal patch and add stuff until the problem
shows up).

There are some issues I have with the code as is in that we don't need to do CPU
hardware and software checking. If the instruction doesn't execute in software,
then we don't need to do the hardware check.

Also, a flag test would be preferable to a routine call.

I never had the question about quality and MMX answered. The MMX Fast code says
that there's a compromise in quality. I've never seen this myself and haven't
heard any complaints about MMX IFast but I haven't seen any discussion on this
issue by others.
I got everything working for SSE2 at least. I've copied over some of the other
files like jc* and am rebuilding. I think that none of the jc files are used
by Mozilla and that the compiles could be dispensed with but that's just my
opinion.

One thing that I need to know is how to create patches for the new files. I'm
using cvs diff -u for the existing files but I couldn't get this to work for
the new files. Even with the -N swtch.
I took the currrent patches and redid the SIMD plumbing to simplify and make
it faster.
Attached file jdintel.c source file (obsolete) —
I don't know how to turn this into a patch file.
Attached file jdintel.h c source (obsolete) —
I don't know how to turn a source file into a patch.
The current changes run on my SSE2 machine. I'm going to create a special
build that will display the code used for rendering JPEGS so that I can be
sure that it's using the right code when I test it on SSE and MMX machines.

I don't have access to something like a Pentium I though. I'll need to dig one
up somewhere.
Attachment #157073 - Attachment is obsolete: true
Attachment #157118 - Attachment is obsolete: true
Attachment #157119 - Attachment is obsolete: true
Attachment #157120 - Attachment is obsolete: true
There's a build with this code at:

http://www.pryan.org/mozilla/firefox/mmoy/FireFox-Branch-093-JPEG-SIMD-experimental.exe

It should be run with -console and will spit out the routine used (sse2, sse,
mmx or just islow). It won't be fast because it will spit out a line per block
decoded
so displaying a large jpeg could take a while. I've already posted a request for
a Pentium I tester at MozillaZine.

I didn't use my latest code as it would involve changing a few other modules. I
also have several other optimizations in the JPEG code, some SIMD and some just
plain C code but those can be looked at later on.
Changes are verified on SSE.
The MMX code isn't rendering properly so I'll have to debug this. Looks like
a colormap issue at the moment. Still looking for a Pentium 1 system for testing
though.
It's going to take me a few days to get an MMX machine to look at the problem.

I was wondering if you guys were in a hurry for this or not as I can just take
out the MMX stuff for now. I'll still need to test MMX but I can find someone to
make sure that the code works on MMX. I know someone with a Pentium 133 which I may
be able to borrow.
We're not in that much of a hurry - trunk has lived without sse2 for
quite some time and the alpha freeze is still about three weeks out.
Finding someone to review x86 assembly is going to prove interesting,
though.
(In reply to comment #23)
> The MMX code isn't rendering properly so I'll have to debug this. 

MMX inverse dct is not used in the current code:
http://lxr.mozilla.org/aviarybranch/ident?i=jpeg_idct_ifast_mmx

The else clauses like this

"+	} else if (intel_mnemonics & INTEL_MMX) {
+	  method_ptr = jpeg_idct_ifast_mmx;
+	  method = JDCT_IFAST;
"

can be removed to match the current code and MMX inverse dct debugging can
happen in another bug.

There's also MMX "upsampling merging" and I don't know if it works.
(In reply to comment #26)
> MMX inverse dct is not used in the current code:
> http://lxr.mozilla.org/aviarybranch/ident?i=jpeg_idct_ifast_mmx

In fact MMX inverse dct isn't currently used in the code used by mozilla
(JDCT_ISLOW)

> 
> The else clauses like this
> 
> "+	} else if (intel_mnemonics & INTEL_MMX) {
> +	  method_ptr = jpeg_idct_ifast_mmx;
> +	  method = JDCT_IFAST;
> "
> 
> can be removed

Only the first occurrence please, jpeg_idct_ifast_mmx() is used with JDCT_IFAST

Note:

Mozilla always use JDCT_ISLOW, so some code size could be saved by removing
JDCT_IFAST support, just like it was done with JDCT_IFLOAT at line:
http://lxr.mozilla.org/aviarybranch/source/jpeg/jmorecfg.h#311
I know that it isn't used in the current code but it's been turned on in my
MMX builds since May. Many other MMX builders use it too. It's probably fine
to do it in a separate bug.

I found an MMX machine headed for the trash and need to throw an OS on it.

I've looked at the MMX upsampling code and have made improvements to the regular
sampling code instead. I'd have to do some performance testing with the MMX
stuff to see if it's actually useful. There's a huge amount of code in there
to do some complex stuff and it isn't obvious to me that it's faster.

(In reply to comment #26)
> (In reply to comment #23)
> > The MMX code isn't rendering properly so I'll have to debug this. 
> 
> MMX inverse dct is not used in the current code:
> http://lxr.mozilla.org/aviarybranch/ident?i=jpeg_idct_ifast_mmx
> 
> The else clauses like this
> 
> "+	} else if (intel_mnemonics & INTEL_MMX) {
> +	  method_ptr = jpeg_idct_ifast_mmx;
> +	  method = JDCT_IFAST;
> "
> 
> can be removed to match the current code and MMX inverse dct debugging can
> happen in another bug.
> 
> There's also MMX "upsampling merging" and I don't know if it works.
My first version of the MMX stuff changed the method in the decoder routine
to use IFAST (I did some experiments with FLOAT too but the costs of converting
integers to floats and doing floating point arithmetic and then converting back
to integers proved too expensive). I later found that it was easier to just set
the method type in jddctmgr.c and this is in current use by the unofficial
builders.

(In reply to comment #27)
> (In reply to comment #26)
> > MMX inverse dct is not used in the current code:
> > http://lxr.mozilla.org/aviarybranch/ident?i=jpeg_idct_ifast_mmx
> 
> In fact MMX inverse dct isn't currently used in the code used by mozilla
> (JDCT_ISLOW)
> 
> > 
> > The else clauses like this
> > 
> > "+	} else if (intel_mnemonics & INTEL_MMX) {
> > +	  method_ptr = jpeg_idct_ifast_mmx;
> > +	  method = JDCT_IFAST;
> > "
> > 
> > can be removed
> 
> Only the first occurrence please, jpeg_idct_ifast_mmx() is used with JDCT_IFAST
> 
> Note:
> 
> Mozilla always use JDCT_ISLOW, so some code size could be saved by removing
> JDCT_IFAST support, just like it was done with JDCT_IFLOAT at line:
> http://lxr.mozilla.org/aviarybranch/source/jpeg/jmorecfg.h#311
> 

Turned out to be a Pentium Pro; not MMX. This might take a few days with some
help from the unoffical build testers. At least I have the non-SIMD cases
covered.
I did a build with my old MMX code and a reporter indicated that this worked for
him. So the problem is somewhere either in moving the MMX code over to jdintel.*
or related to moving some other code into jdintel.*.

I've inspected the code several times but haven't found the cause so I'll start
removing parts of the patch until it starts working again.
I've applied the patch (ff aviary branch) and I can't see any rendering problem
with the MMX code on the few sites I went to. (I've commented out SSE/SSE2
detection and to force the use of MMX). The CPU is an Athlon 1700 XP.
> I've applied the patch (ff aviary branch) and I can't see any rendering problem
> with the MMX code on the few sites I went to. (I've commented out SSE/SSE2
> detection and to force the use of MMX). The CPU is an Athlon 1700 XP.

I received multiple reports of problems on MMX processors and I don't have an
MMX machine at the moment, to test on. When I test the MMX code on SSE2 machines
(by disabling the SSE and SSE2 code), I have no problems. I did go from the
other direction and got something working but the turnaround time in getting
other people to test stuff is too slow and I'd like to get this in this year.

My current build is with the current patch with INTEL_MMX turned off. I'll be
able to test this on Pentium 1, 3 and 4 and Athlon 64. I'll find someone to
test on Pentium 2.
I can get you a PII to test on if you need it.
I'm on vacation right now and will be back on Monday.
So, where is it going ?
Michael, can you ask gor reviews with the patch with SSE and without MMX and
move any MMX work to another bug ?
I would like to have SSE back on the trunk like it was before, if it is the plan.

Okay. I'm grabbing the trunk and will apply. I've been trying to get an mmx
machine for a while but no luck. I've been working on a bunch of other performance
improvements in Mozilla (speeding things up is more fun than admin).

I'll see if I can get something for tonight. I have P1, P3, P4 and K8 but no
MMX.
This is changed jdintel.c module to disable the idct MMX processing. I couldn't

figure out how to create a patch on a new file (I did try the -N switch).
I've tested on P3 and K8 (same as P4). Will test P1 later today and I've
requested a test from the folks on MozillaZine and pryan.org for a test on P2 (MMX).
I've got confirmation that the build is fine on an MMX machine.
(In reply to comment #36)
> So, where is it going ?
> Michael, can you ask gor reviews with the patch with SSE and without MMX and
> move any MMX work to another bug ?
> I would like to have SSE back on the trunk like it was before, if it is the plan.
> 

Do you have an email address for gor?
Attachment #167636 - Attachment description: jddctmgr.c, changed to remove MMX flag setting → jdinput.c, changed to remove MMX flag setting
MMoy how does this relate to the other JPEG SSE bugs open?
It should be closed. The bug currently in play addresses color. There should be a bug that's decided on for idct. And the solution should be intrinsics for support on Linux, Windows and Mac OSX. And the 64-bit variants too.
(In reply to comment #43)
> It should be closed. The bug currently in play addresses color. There should be
> a bug that's decided on for idct. And the solution should be intrinsics for
> support on Linux, Windows and Mac OSX. And the 64-bit variants too.
> 

Not my bug - so eitehr you or Tor should close this...
QA Contact: imagelib
There's been talk that this bug should be closed, I'm doing that now. If this is the wrong decision please re-open.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: