Clean up assembly optimizations of libjpeg

RESOLVED WONTFIX

Status

()

Core
ImageLib
RESOLVED WONTFIX
13 years ago
6 years ago

People

(Reporter: tor, Assigned: tor)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 5 obsolete attachments)

(Assignee)

Description

13 years ago
The attached patch is a modification of Bernard Alleysson's cleanup of the
assembly modifications to libjpeg.  It includes a later version of
Michael Moy's SSE2 work than is currently in the tree.

My modifications to Bernard's include changes to make it compile on
platforms that can't build the sse2 code, reverting more changes/files to
match the upstream libjpeg, and updating the copyright on the new files.

I'm not sure what version of the SSE2 code is in this patch, but it
does seem to have some problems on some images when run on a SSE2
capable machine.  Example problematic image attached.
(Assignee)

Comment 1

13 years ago
Created attachment 157004 [details] [diff] [review]
assembly optimizations cleanup
(Assignee)

Comment 2

13 years ago
Created attachment 157005 [details]
example problematic image

Comment 3

13 years ago
Could you elaborate on the problems with this image? I ran the image through
my current code and the image matched that of IE's rendering. Of course this
code is quite a bit different than what I'm running in my development environment.
(Assignee)

Comment 4

13 years ago
Looked somewhat like the first pass of a progressive jpeg - low detail,
block boundaries clearly visible.  I'll attach a screenshot tomorrow.

Comment 5

13 years ago
Where are these supposed to be defined?

C:\Program Files\Microsoft Visual Studio .NET
2003\Vc7\PlatformSDK\Include\BaseTsd.h(32) : warning C4142: benign redefinition
of typ
e
f:/Mozilla\mozilla\jpeg\jdintel.c(94) : error C2065: 'INTEL_SSE2' : undeclared
identifier
f:/Mozilla\mozilla\jpeg\jdintel.c(96) : error C2065: 'INTEL_SSE' : undeclared
identifier
f:/Mozilla\mozilla\jpeg\jdintel.c(110) : error C2065: 'INTEL_MMX' : undeclared
identifier
make[3]: *** [jdintel.obj] Error 2
make[3]: Leaving directory `/cygdrive/f/Mozilla/mozilla/jpeg'
make[2]: *** [tier_1] Error 2
make[2]: Leaving directory `/cygdrive/f/Mozilla/mozilla'
make[1]: *** [default] Error 2
make[1]: Leaving directory `/cygdrive/f/Mozilla/mozilla'
make: *** [build] Error 2

F:\Mozilla>grep INTEL_SSE2 jpeg.patch
+        if (intelsupport() & INTEL_SSE2) {
+        if (intelsupport() & INTEL_SSE2) {
+    if (intelsupport() & INTEL_SSE2) {
+    sseavailable |= INTEL_SSE2;
+      if (intel_mnemonics & INTEL_SSE2) {
+          intel_mnemonics &= ~INTEL_SSE2;
+#define INTEL_SSE2 0x04
(Assignee)

Comment 6

13 years ago
They're defined in jdintel.h:


+#define INTEL_MMX  0x01
+
+#define INTEL_SSE  0x02
+
+#define INTEL_SSE2 0x04

Comment 7

13 years ago
HAVE_INTEL_MNEMONICS isn't getting defined in my environment. The compile
command doesn't have HAVE_INTEL_MNEMONICS defined so it may be that I don't
have MOZ_WIDGET_TOOLKIT setup. Not sure what IDGET_TOOLKIT is or what it implies.

At any rate, I added -DHAVE_INTEL_MNEMONICS to the optimize line and that appears
to work.

Comment 8

13 years ago
+ifndef GNU_CXX
+ifeq ($(MOZ_WIDGET_TOOLKIT),windows)
+CSRCS  +=	jdintel.c
+CFLAGS +=	-DHAVE_INTEL_MNEMONICS
+endif
+endif
+

I can't see how the jpeg code depends on the choice of widget toolkit, you
probably want to enable assembly for Windows and not GCC (don't know about
mingw), it would look like this:

+ifeq ($(OS_ARCH)_$(GNU_CC),WINNT_)
+CSRCS  +=	jdintel.c
+CFLAGS +=	-DHAVE_INTEL_MNEMONICS
+endif

Also jdintel.c has lot of code inside #if 0 (it is the old assembly code),
Michael should check this out to update the file with his latest code and
eventually remove the old code (or make another jdintel_old.c out of it to
enable decoding comparison)

Comment 9

13 years ago
My build finished and it works just fine with the jpeg image in this bug. I'm 
working off the BRANCH_093 codebase. I'll have to put some debugging stuff in to
ensure that the SSE2 code is actually being used though.

I'll try the other makefile code too. MSVC is the only compiler that supports
this form of inline assembler so it should be MSVC varianted. Not GCC may not
be good enough if Mozilla ever supports the Intel compiler.
(Assignee)

Comment 10

13 years ago
Michael: Not sure what's happening on your machine, since jdintel.c is
only added to the build list in the ifdef chunk that also sets
HAVE_INTEL_MNEMONICS.

Bernard: Since we only have the windows toolkit for win32, the two tests
are equivalent.  Yours does look cleaner, though the "NT" bit is a bit
misleading.  I left the #if 0 chunck there for now because the large comment
still references that code.

Comment 11

13 years ago
(In reply to comment #10)
> Michael: Not sure what's happening on your machine, since jdintel.c is
> only added to the build list in the ifdef chunk that also sets
> HAVE_INTEL_MNEMONICS.

rules.mk is wiping out CFLAGS after it gets set.

I see the rendering problem now.
(Assignee)

Comment 12

13 years ago
Ah, I didn't attach the latest version of the patch that uses DEFINES
instead of CFLAGS.  That should fix your problem.
(Assignee)

Comment 13

13 years ago
Created attachment 157073 [details] [diff] [review]
fix makefile
Attachment #157004 - Attachment is obsolete: true

Comment 14

13 years ago
Didn't see anything obvious as to why the rendering problems are there. Will try
a buildup approach (start with the minimal patch and add stuff until the problem
shows up).

There are some issues I have with the code as is in that we don't need to do CPU
hardware and software checking. If the instruction doesn't execute in software,
then we don't need to do the hardware check.

Also, a flag test would be preferable to a routine call.

I never had the question about quality and MMX answered. The MMX Fast code says
that there's a compromise in quality. I've never seen this myself and haven't
heard any complaints about MMX IFast but I haven't seen any discussion on this
issue by others.

Comment 15

13 years ago
I got everything working for SSE2 at least. I've copied over some of the other
files like jc* and am rebuilding. I think that none of the jc files are used
by Mozilla and that the compiles could be dispensed with but that's just my
opinion.

One thing that I need to know is how to create patches for the new files. I'm
using cvs diff -u for the existing files but I couldn't get this to work for
the new files. Even with the -N swtch.

Comment 16

13 years ago
Created attachment 157118 [details] [diff] [review]
JPEG patches (not including jdintel.c, jdintel.h)

I took the currrent patches and redid the SIMD plumbing to simplify and make
it faster.

Comment 17

13 years ago
Created attachment 157119 [details]
jdintel.c source file

I don't know how to turn this into a patch file.

Comment 18

13 years ago
Created attachment 157120 [details]
jdintel.h c source

I don't know how to turn a source file into a patch.

Comment 19

13 years ago
The current changes run on my SSE2 machine. I'm going to create a special
build that will display the code used for rendering JPEGS so that I can be
sure that it's using the right code when I test it on SSE and MMX machines.

I don't have access to something like a Pentium I though. I'll need to dig one
up somewhere.
(Assignee)

Comment 20

13 years ago
Created attachment 157123 [details] [diff] [review]
recombine patch, some whitespace and bracing cleanup
Attachment #157073 - Attachment is obsolete: true
Attachment #157118 - Attachment is obsolete: true
Attachment #157119 - Attachment is obsolete: true
Attachment #157120 - Attachment is obsolete: true

Comment 21

13 years ago
There's a build with this code at:

http://www.pryan.org/mozilla/firefox/mmoy/FireFox-Branch-093-JPEG-SIMD-experimental.exe

It should be run with -console and will spit out the routine used (sse2, sse,
mmx or just islow). It won't be fast because it will spit out a line per block
decoded
so displaying a large jpeg could take a while. I've already posted a request for
a Pentium I tester at MozillaZine.

I didn't use my latest code as it would involve changing a few other modules. I
also have several other optimizations in the JPEG code, some SIMD and some just
plain C code but those can be looked at later on.

Comment 22

13 years ago
Changes are verified on SSE.

Comment 23

13 years ago
The MMX code isn't rendering properly so I'll have to debug this. Looks like
a colormap issue at the moment. Still looking for a Pentium 1 system for testing
though.

Comment 24

13 years ago
It's going to take me a few days to get an MMX machine to look at the problem.

I was wondering if you guys were in a hurry for this or not as I can just take
out the MMX stuff for now. I'll still need to test MMX but I can find someone to
make sure that the code works on MMX. I know someone with a Pentium 133 which I may
be able to borrow.
(Assignee)

Comment 25

13 years ago
We're not in that much of a hurry - trunk has lived without sse2 for
quite some time and the alpha freeze is still about three weeks out.
Finding someone to review x86 assembly is going to prove interesting,
though.

Comment 26

13 years ago
(In reply to comment #23)
> The MMX code isn't rendering properly so I'll have to debug this. 

MMX inverse dct is not used in the current code:
http://lxr.mozilla.org/aviarybranch/ident?i=jpeg_idct_ifast_mmx

The else clauses like this

"+	} else if (intel_mnemonics & INTEL_MMX) {
+	  method_ptr = jpeg_idct_ifast_mmx;
+	  method = JDCT_IFAST;
"

can be removed to match the current code and MMX inverse dct debugging can
happen in another bug.

There's also MMX "upsampling merging" and I don't know if it works.

Comment 27

13 years ago
(In reply to comment #26)
> MMX inverse dct is not used in the current code:
> http://lxr.mozilla.org/aviarybranch/ident?i=jpeg_idct_ifast_mmx

In fact MMX inverse dct isn't currently used in the code used by mozilla
(JDCT_ISLOW)

> 
> The else clauses like this
> 
> "+	} else if (intel_mnemonics & INTEL_MMX) {
> +	  method_ptr = jpeg_idct_ifast_mmx;
> +	  method = JDCT_IFAST;
> "
> 
> can be removed

Only the first occurrence please, jpeg_idct_ifast_mmx() is used with JDCT_IFAST

Note:

Mozilla always use JDCT_ISLOW, so some code size could be saved by removing
JDCT_IFAST support, just like it was done with JDCT_IFLOAT at line:
http://lxr.mozilla.org/aviarybranch/source/jpeg/jmorecfg.h#311

Comment 28

13 years ago
I know that it isn't used in the current code but it's been turned on in my
MMX builds since May. Many other MMX builders use it too. It's probably fine
to do it in a separate bug.

I found an MMX machine headed for the trash and need to throw an OS on it.

I've looked at the MMX upsampling code and have made improvements to the regular
sampling code instead. I'd have to do some performance testing with the MMX
stuff to see if it's actually useful. There's a huge amount of code in there
to do some complex stuff and it isn't obvious to me that it's faster.

(In reply to comment #26)
> (In reply to comment #23)
> > The MMX code isn't rendering properly so I'll have to debug this. 
> 
> MMX inverse dct is not used in the current code:
> http://lxr.mozilla.org/aviarybranch/ident?i=jpeg_idct_ifast_mmx
> 
> The else clauses like this
> 
> "+	} else if (intel_mnemonics & INTEL_MMX) {
> +	  method_ptr = jpeg_idct_ifast_mmx;
> +	  method = JDCT_IFAST;
> "
> 
> can be removed to match the current code and MMX inverse dct debugging can
> happen in another bug.
> 
> There's also MMX "upsampling merging" and I don't know if it works.

Comment 29

13 years ago
My first version of the MMX stuff changed the method in the decoder routine
to use IFAST (I did some experiments with FLOAT too but the costs of converting
integers to floats and doing floating point arithmetic and then converting back
to integers proved too expensive). I later found that it was easier to just set
the method type in jddctmgr.c and this is in current use by the unofficial
builders.

(In reply to comment #27)
> (In reply to comment #26)
> > MMX inverse dct is not used in the current code:
> > http://lxr.mozilla.org/aviarybranch/ident?i=jpeg_idct_ifast_mmx
> 
> In fact MMX inverse dct isn't currently used in the code used by mozilla
> (JDCT_ISLOW)
> 
> > 
> > The else clauses like this
> > 
> > "+	} else if (intel_mnemonics & INTEL_MMX) {
> > +	  method_ptr = jpeg_idct_ifast_mmx;
> > +	  method = JDCT_IFAST;
> > "
> > 
> > can be removed
> 
> Only the first occurrence please, jpeg_idct_ifast_mmx() is used with JDCT_IFAST
> 
> Note:
> 
> Mozilla always use JDCT_ISLOW, so some code size could be saved by removing
> JDCT_IFAST support, just like it was done with JDCT_IFLOAT at line:
> http://lxr.mozilla.org/aviarybranch/source/jpeg/jmorecfg.h#311
> 

Comment 30

13 years ago
Turned out to be a Pentium Pro; not MMX. This might take a few days with some
help from the unoffical build testers. At least I have the non-SIMD cases
covered.

Comment 31

13 years ago
I did a build with my old MMX code and a reporter indicated that this worked for
him. So the problem is somewhere either in moving the MMX code over to jdintel.*
or related to moving some other code into jdintel.*.

I've inspected the code several times but haven't found the cause so I'll start
removing parts of the patch until it starts working again.

Comment 32

13 years ago
I've applied the patch (ff aviary branch) and I can't see any rendering problem
with the MMX code on the few sites I went to. (I've commented out SSE/SSE2
detection and to force the use of MMX). The CPU is an Athlon 1700 XP.

Comment 33

13 years ago
> I've applied the patch (ff aviary branch) and I can't see any rendering problem
> with the MMX code on the few sites I went to. (I've commented out SSE/SSE2
> detection and to force the use of MMX). The CPU is an Athlon 1700 XP.

I received multiple reports of problems on MMX processors and I don't have an
MMX machine at the moment, to test on. When I test the MMX code on SSE2 machines
(by disabling the SSE and SSE2 code), I have no problems. I did go from the
other direction and got something working but the turnaround time in getting
other people to test stuff is too slow and I'd like to get this in this year.

My current build is with the current patch with INTEL_MMX turned off. I'll be
able to test this on Pentium 1, 3 and 4 and Athlon 64. I'll find someone to
test on Pentium 2.

Comment 34

13 years ago
I can get you a PII to test on if you need it.

Comment 35

13 years ago
I'm on vacation right now and will be back on Monday.

Comment 36

13 years ago
So, where is it going ?
Michael, can you ask gor reviews with the patch with SSE and without MMX and
move any MMX work to another bug ?
I would like to have SSE back on the trunk like it was before, if it is the plan.

Comment 37

13 years ago
Okay. I'm grabbing the trunk and will apply. I've been trying to get an mmx
machine for a while but no luck. I've been working on a bunch of other performance
improvements in Mozilla (speeding things up is more fun than admin).

I'll see if I can get something for tonight. I have P1, P3, P4 and K8 but no
MMX.

Comment 38

13 years ago
Created attachment 167636 [details]
jdinput.c, changed to remove MMX flag setting

This is changed jdintel.c module to disable the idct MMX processing. I couldn't

figure out how to create a patch on a new file (I did try the -N switch).

Comment 39

13 years ago
I've tested on P3 and K8 (same as P4). Will test P1 later today and I've
requested a test from the folks on MozillaZine and pryan.org for a test on P2 (MMX).

Comment 40

13 years ago
I've got confirmation that the build is fine on an MMX machine.

Comment 41

13 years ago
(In reply to comment #36)
> So, where is it going ?
> Michael, can you ask gor reviews with the patch with SSE and without MMX and
> move any MMX work to another bug ?
> I would like to have SSE back on the trunk like it was before, if it is the plan.
> 

Do you have an email address for gor?

Updated

13 years ago
Attachment #167636 - Attachment description: jddctmgr.c, changed to remove MMX flag setting → jdinput.c, changed to remove MMX flag setting

Comment 42

10 years ago
MMoy how does this relate to the other JPEG SSE bugs open?

Comment 43

10 years ago
It should be closed. The bug currently in play addresses color. There should be a bug that's decided on for idct. And the solution should be intrinsics for support on Linux, Windows and Mac OSX. And the 64-bit variants too.

Comment 44

10 years ago
(In reply to comment #43)
> It should be closed. The bug currently in play addresses color. There should be
> a bug that's decided on for idct. And the solution should be intrinsics for
> support on Linux, Windows and Mac OSX. And the 64-bit variants too.
> 

Not my bug - so eitehr you or Tor should close this...
QA Contact: imagelib
There's been talk that this bug should be closed, I'm doing that now. If this is the wrong decision please re-open.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.