Closed Bug 256919 Opened 20 years ago Closed 13 years ago

Clean up assembly optimizations of libjpeg

Categories

(Core :: Graphics: ImageLib, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: tor, Assigned: tor)

Details

Attachments

(3 files, 5 obsolete files)

The attached patch is a modification of Bernard Alleysson's cleanup of the assembly modifications to libjpeg. It includes a later version of Michael Moy's SSE2 work than is currently in the tree. My modifications to Bernard's include changes to make it compile on platforms that can't build the sse2 code, reverting more changes/files to match the upstream libjpeg, and updating the copyright on the new files. I'm not sure what version of the SSE2 code is in this patch, but it does seem to have some problems on some images when run on a SSE2 capable machine. Example problematic image attached.
Attached patch assembly optimizations cleanup (obsolete) — Splinter Review
Could you elaborate on the problems with this image? I ran the image through my current code and the image matched that of IE's rendering. Of course this code is quite a bit different than what I'm running in my development environment.
Looked somewhat like the first pass of a progressive jpeg - low detail, block boundaries clearly visible. I'll attach a screenshot tomorrow.
Where are these supposed to be defined? C:\Program Files\Microsoft Visual Studio .NET 2003\Vc7\PlatformSDK\Include\BaseTsd.h(32) : warning C4142: benign redefinition of typ e f:/Mozilla\mozilla\jpeg\jdintel.c(94) : error C2065: 'INTEL_SSE2' : undeclared identifier f:/Mozilla\mozilla\jpeg\jdintel.c(96) : error C2065: 'INTEL_SSE' : undeclared identifier f:/Mozilla\mozilla\jpeg\jdintel.c(110) : error C2065: 'INTEL_MMX' : undeclared identifier make[3]: *** [jdintel.obj] Error 2 make[3]: Leaving directory `/cygdrive/f/Mozilla/mozilla/jpeg' make[2]: *** [tier_1] Error 2 make[2]: Leaving directory `/cygdrive/f/Mozilla/mozilla' make[1]: *** [default] Error 2 make[1]: Leaving directory `/cygdrive/f/Mozilla/mozilla' make: *** [build] Error 2 F:\Mozilla>grep INTEL_SSE2 jpeg.patch + if (intelsupport() & INTEL_SSE2) { + if (intelsupport() & INTEL_SSE2) { + if (intelsupport() & INTEL_SSE2) { + sseavailable |= INTEL_SSE2; + if (intel_mnemonics & INTEL_SSE2) { + intel_mnemonics &= ~INTEL_SSE2; +#define INTEL_SSE2 0x04
They're defined in jdintel.h: +#define INTEL_MMX 0x01 + +#define INTEL_SSE 0x02 + +#define INTEL_SSE2 0x04
HAVE_INTEL_MNEMONICS isn't getting defined in my environment. The compile command doesn't have HAVE_INTEL_MNEMONICS defined so it may be that I don't have MOZ_WIDGET_TOOLKIT setup. Not sure what IDGET_TOOLKIT is or what it implies. At any rate, I added -DHAVE_INTEL_MNEMONICS to the optimize line and that appears to work.
+ifndef GNU_CXX +ifeq ($(MOZ_WIDGET_TOOLKIT),windows) +CSRCS += jdintel.c +CFLAGS += -DHAVE_INTEL_MNEMONICS +endif +endif + I can't see how the jpeg code depends on the choice of widget toolkit, you probably want to enable assembly for Windows and not GCC (don't know about mingw), it would look like this: +ifeq ($(OS_ARCH)_$(GNU_CC),WINNT_) +CSRCS += jdintel.c +CFLAGS += -DHAVE_INTEL_MNEMONICS +endif Also jdintel.c has lot of code inside #if 0 (it is the old assembly code), Michael should check this out to update the file with his latest code and eventually remove the old code (or make another jdintel_old.c out of it to enable decoding comparison)
My build finished and it works just fine with the jpeg image in this bug. I'm working off the BRANCH_093 codebase. I'll have to put some debugging stuff in to ensure that the SSE2 code is actually being used though. I'll try the other makefile code too. MSVC is the only compiler that supports this form of inline assembler so it should be MSVC varianted. Not GCC may not be good enough if Mozilla ever supports the Intel compiler.
Michael: Not sure what's happening on your machine, since jdintel.c is only added to the build list in the ifdef chunk that also sets HAVE_INTEL_MNEMONICS. Bernard: Since we only have the windows toolkit for win32, the two tests are equivalent. Yours does look cleaner, though the "NT" bit is a bit misleading. I left the #if 0 chunck there for now because the large comment still references that code.
(In reply to comment #10) > Michael: Not sure what's happening on your machine, since jdintel.c is > only added to the build list in the ifdef chunk that also sets > HAVE_INTEL_MNEMONICS. rules.mk is wiping out CFLAGS after it gets set. I see the rendering problem now.
Ah, I didn't attach the latest version of the patch that uses DEFINES instead of CFLAGS. That should fix your problem.
Attached patch fix makefile (obsolete) — Splinter Review
Attachment #157004 - Attachment is obsolete: true
Didn't see anything obvious as to why the rendering problems are there. Will try a buildup approach (start with the minimal patch and add stuff until the problem shows up). There are some issues I have with the code as is in that we don't need to do CPU hardware and software checking. If the instruction doesn't execute in software, then we don't need to do the hardware check. Also, a flag test would be preferable to a routine call. I never had the question about quality and MMX answered. The MMX Fast code says that there's a compromise in quality. I've never seen this myself and haven't heard any complaints about MMX IFast but I haven't seen any discussion on this issue by others.
I got everything working for SSE2 at least. I've copied over some of the other files like jc* and am rebuilding. I think that none of the jc files are used by Mozilla and that the compiles could be dispensed with but that's just my opinion. One thing that I need to know is how to create patches for the new files. I'm using cvs diff -u for the existing files but I couldn't get this to work for the new files. Even with the -N swtch.
I took the currrent patches and redid the SIMD plumbing to simplify and make it faster.
Attached file jdintel.c source file (obsolete) —
I don't know how to turn this into a patch file.
Attached file jdintel.h c source (obsolete) —
I don't know how to turn a source file into a patch.
The current changes run on my SSE2 machine. I'm going to create a special build that will display the code used for rendering JPEGS so that I can be sure that it's using the right code when I test it on SSE and MMX machines. I don't have access to something like a Pentium I though. I'll need to dig one up somewhere.
Attachment #157073 - Attachment is obsolete: true
Attachment #157118 - Attachment is obsolete: true
Attachment #157119 - Attachment is obsolete: true
Attachment #157120 - Attachment is obsolete: true
There's a build with this code at: http://www.pryan.org/mozilla/firefox/mmoy/FireFox-Branch-093-JPEG-SIMD-experimental.exe It should be run with -console and will spit out the routine used (sse2, sse, mmx or just islow). It won't be fast because it will spit out a line per block decoded so displaying a large jpeg could take a while. I've already posted a request for a Pentium I tester at MozillaZine. I didn't use my latest code as it would involve changing a few other modules. I also have several other optimizations in the JPEG code, some SIMD and some just plain C code but those can be looked at later on.
Changes are verified on SSE.
The MMX code isn't rendering properly so I'll have to debug this. Looks like a colormap issue at the moment. Still looking for a Pentium 1 system for testing though.
It's going to take me a few days to get an MMX machine to look at the problem. I was wondering if you guys were in a hurry for this or not as I can just take out the MMX stuff for now. I'll still need to test MMX but I can find someone to make sure that the code works on MMX. I know someone with a Pentium 133 which I may be able to borrow.
We're not in that much of a hurry - trunk has lived without sse2 for quite some time and the alpha freeze is still about three weeks out. Finding someone to review x86 assembly is going to prove interesting, though.
(In reply to comment #23) > The MMX code isn't rendering properly so I'll have to debug this. MMX inverse dct is not used in the current code: http://lxr.mozilla.org/aviarybranch/ident?i=jpeg_idct_ifast_mmx The else clauses like this "+ } else if (intel_mnemonics & INTEL_MMX) { + method_ptr = jpeg_idct_ifast_mmx; + method = JDCT_IFAST; " can be removed to match the current code and MMX inverse dct debugging can happen in another bug. There's also MMX "upsampling merging" and I don't know if it works.
(In reply to comment #26) > MMX inverse dct is not used in the current code: > http://lxr.mozilla.org/aviarybranch/ident?i=jpeg_idct_ifast_mmx In fact MMX inverse dct isn't currently used in the code used by mozilla (JDCT_ISLOW) > > The else clauses like this > > "+ } else if (intel_mnemonics & INTEL_MMX) { > + method_ptr = jpeg_idct_ifast_mmx; > + method = JDCT_IFAST; > " > > can be removed Only the first occurrence please, jpeg_idct_ifast_mmx() is used with JDCT_IFAST Note: Mozilla always use JDCT_ISLOW, so some code size could be saved by removing JDCT_IFAST support, just like it was done with JDCT_IFLOAT at line: http://lxr.mozilla.org/aviarybranch/source/jpeg/jmorecfg.h#311
I know that it isn't used in the current code but it's been turned on in my MMX builds since May. Many other MMX builders use it too. It's probably fine to do it in a separate bug. I found an MMX machine headed for the trash and need to throw an OS on it. I've looked at the MMX upsampling code and have made improvements to the regular sampling code instead. I'd have to do some performance testing with the MMX stuff to see if it's actually useful. There's a huge amount of code in there to do some complex stuff and it isn't obvious to me that it's faster. (In reply to comment #26) > (In reply to comment #23) > > The MMX code isn't rendering properly so I'll have to debug this. > > MMX inverse dct is not used in the current code: > http://lxr.mozilla.org/aviarybranch/ident?i=jpeg_idct_ifast_mmx > > The else clauses like this > > "+ } else if (intel_mnemonics & INTEL_MMX) { > + method_ptr = jpeg_idct_ifast_mmx; > + method = JDCT_IFAST; > " > > can be removed to match the current code and MMX inverse dct debugging can > happen in another bug. > > There's also MMX "upsampling merging" and I don't know if it works.
My first version of the MMX stuff changed the method in the decoder routine to use IFAST (I did some experiments with FLOAT too but the costs of converting integers to floats and doing floating point arithmetic and then converting back to integers proved too expensive). I later found that it was easier to just set the method type in jddctmgr.c and this is in current use by the unofficial builders. (In reply to comment #27) > (In reply to comment #26) > > MMX inverse dct is not used in the current code: > > http://lxr.mozilla.org/aviarybranch/ident?i=jpeg_idct_ifast_mmx > > In fact MMX inverse dct isn't currently used in the code used by mozilla > (JDCT_ISLOW) > > > > > The else clauses like this > > > > "+ } else if (intel_mnemonics & INTEL_MMX) { > > + method_ptr = jpeg_idct_ifast_mmx; > > + method = JDCT_IFAST; > > " > > > > can be removed > > Only the first occurrence please, jpeg_idct_ifast_mmx() is used with JDCT_IFAST > > Note: > > Mozilla always use JDCT_ISLOW, so some code size could be saved by removing > JDCT_IFAST support, just like it was done with JDCT_IFLOAT at line: > http://lxr.mozilla.org/aviarybranch/source/jpeg/jmorecfg.h#311 >
Turned out to be a Pentium Pro; not MMX. This might take a few days with some help from the unoffical build testers. At least I have the non-SIMD cases covered.
I did a build with my old MMX code and a reporter indicated that this worked for him. So the problem is somewhere either in moving the MMX code over to jdintel.* or related to moving some other code into jdintel.*. I've inspected the code several times but haven't found the cause so I'll start removing parts of the patch until it starts working again.
I've applied the patch (ff aviary branch) and I can't see any rendering problem with the MMX code on the few sites I went to. (I've commented out SSE/SSE2 detection and to force the use of MMX). The CPU is an Athlon 1700 XP.
> I've applied the patch (ff aviary branch) and I can't see any rendering problem > with the MMX code on the few sites I went to. (I've commented out SSE/SSE2 > detection and to force the use of MMX). The CPU is an Athlon 1700 XP. I received multiple reports of problems on MMX processors and I don't have an MMX machine at the moment, to test on. When I test the MMX code on SSE2 machines (by disabling the SSE and SSE2 code), I have no problems. I did go from the other direction and got something working but the turnaround time in getting other people to test stuff is too slow and I'd like to get this in this year. My current build is with the current patch with INTEL_MMX turned off. I'll be able to test this on Pentium 1, 3 and 4 and Athlon 64. I'll find someone to test on Pentium 2.
I can get you a PII to test on if you need it.
I'm on vacation right now and will be back on Monday.
So, where is it going ? Michael, can you ask gor reviews with the patch with SSE and without MMX and move any MMX work to another bug ? I would like to have SSE back on the trunk like it was before, if it is the plan.
Okay. I'm grabbing the trunk and will apply. I've been trying to get an mmx machine for a while but no luck. I've been working on a bunch of other performance improvements in Mozilla (speeding things up is more fun than admin). I'll see if I can get something for tonight. I have P1, P3, P4 and K8 but no MMX.
This is changed jdintel.c module to disable the idct MMX processing. I couldn't figure out how to create a patch on a new file (I did try the -N switch).
I've tested on P3 and K8 (same as P4). Will test P1 later today and I've requested a test from the folks on MozillaZine and pryan.org for a test on P2 (MMX).
I've got confirmation that the build is fine on an MMX machine.
(In reply to comment #36) > So, where is it going ? > Michael, can you ask gor reviews with the patch with SSE and without MMX and > move any MMX work to another bug ? > I would like to have SSE back on the trunk like it was before, if it is the plan. > Do you have an email address for gor?
Attachment #167636 - Attachment description: jddctmgr.c, changed to remove MMX flag setting → jdinput.c, changed to remove MMX flag setting
MMoy how does this relate to the other JPEG SSE bugs open?
It should be closed. The bug currently in play addresses color. There should be a bug that's decided on for idct. And the solution should be intrinsics for support on Linux, Windows and Mac OSX. And the 64-bit variants too.
(In reply to comment #43) > It should be closed. The bug currently in play addresses color. There should be > a bug that's decided on for idct. And the solution should be intrinsics for > support on Linux, Windows and Mac OSX. And the 64-bit variants too. > Not my bug - so eitehr you or Tor should close this...
QA Contact: imagelib
There's been talk that this bug should be closed, I'm doing that now. If this is the wrong decision please re-open.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: