Closed Bug 428750 Opened 14 years ago Closed 5 years ago

Add PAETH Filter MMX processing back to PNG on Windows


(Core :: ImageLib, defect)

Windows XP
Not set





(Reporter: mmoy, Assigned: mmoy)



(2 files, 1 obsolete file)

Version 1.4 of the PNG code removed the MMX inline assembler code that provided improved PNG performance (by about 25%). Code was provided for GCC-style and MSVC-style inline assembler code. I had a look at the four optimized filter routines (UP, SUB, AVG and PAETH) and found UP and SUB provided no apparent benefit, AVG provides a very small improvement and PAETH provides about a 12% improvement overall using an AMD K8 processor. I think that the rest of the performance comes from the interlace code.

This patch just took the old MSVC MMX code for the PAETH filter, and added a processor check for MMX support.

I tried bringing the GCC-style code for Mac OSX but ran into various problems with GCC-style inline assembler as it is harder to work with compared to MSVC. And I think that the original writers of the MMX code used global variables to get around the problem of accessing local variables. This, in turn, might have resulted in thread-safety issues. This is just a guess on my part as I didn't follow the reasoning for removing this code.
I'm not sure that the test for MMX support is needed. 

According to Firefox 3 requires at minimum a Pentium/233MHz CPU.  That being the case, what Mozilla-supported processor *doesn't* have MMX capabilities?
It's nice to be able to assume at least MMX. This patch removes the MMX sniffer and branching based on capabilities.
Attachment #315333 - Attachment is obsolete: true
Flags: wanted-next+
Attached image Graphic o' doom

This will cause a protection fault in png_read_filter_row_mmx_paeth(), in the 3BPP case.
Occurs both with and without PGO, with VS2005/SP1.

Preliminary investigation:

         mov ecx, FullLength   
         mov eax, ecx
         sub eax, ebx          
         and eax, 0x00000007   
         sub ecx, eax        // underflows to a number just less than 2^32;
         mov MMXLength, ecx  //   then used as brake to terminate looping
   } // end _asm block
   // Now do the math for the rest of the row
   switch ( bpp )
      case 3:
         ActiveMask.use = 0x0000000000ffffff;
         ActiveMaskEnd.use = 0xffff000000000000;
         ShiftBpp.use = 24;    // == bpp(3) * 8
         ShiftRem.use = 40;    // == 64 - 24
dpthgo:                        // EBX == diff == 0x0f
         mov ecx, FullLength   // ECX == FullLength == 0x06
         mov eax, ecx          // EAX == ECX == 0x06
         sub eax, ebx          // EAX == (0x06 - 0x0f) == 0xFFFFFFF7
         and eax, 0x00000007   // EAX == 0x07
         sub ecx, eax          // ECX == (0x06 - 0x07) == 0xFFFFFFFF
         mov MMXLength, ecx    // == increment pointer until 0xFFFFFFFF
   } // end _asm block
   // Now do the math for the rest of the row
   switch ( bpp )
      case 3:
            add ebx, 8
            pand mm1, ActiveMaskEnd
            paddb mm1, [edi + ebx - 8] // add Paeth predictor with Raw(x)

            cmp ebx, MMXLength         // <==== advance past valid addresses
            pxor mm0, mm0              // pxor does not affect flags
            movq [edi + ebx - 8], mm1  // write back updated value
                                 // mm1 will be used as Raw(x-bpp) next loop
                           // mm3 ready to be used as Prior(x-bpp) next loop
            jb dpth3lp
Is this patch a clean-room rewrite?  We cannot use what was in libpng-1.2.x because of a licensing issue.
Flags: needinfo?(mmoy)
Closed: 5 years ago
Resolution: --- → WONTFIX
Flags: needinfo?(mmoy)
You need to log in before you can comment on or make changes to this bug.