User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:18.104.22.168) Gecko/20060527 SUSE/22.214.171.124-1.3 Firefox/126.96.36.199 Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:188.8.131.52) Gecko/20060527 SUSE/184.108.40.206-1.3 Firefox/220.127.116.11 Functions SetImageData() and SetAlphaData() are often called with 4 bytes to copy. A performance gain can be gotten by doing these copies in a single instruction rather than calling memcpy(). Reproducible: Always Steps to Reproduce: 1.Load page containing graphics. 2. 3. Actual Results: Lots of code executed unnecessarily. Expected Results: Code should recognize common data sets. The 2 functions noted above call memcpy() for any non-zero quantity of graphics data to copy. When I load the pages of the 10 most popular websites (http://www.alexa.com/site/ds/top_sites?cc=US&ts_mode=country&lang=none) I see that 13% of the calls to memcpy() are to copy only 4 bytes of data. A 4-byte copy can be done in a single instruction. When building Mozilla with inline memcpy() (-O2 optimization with MSVC compilers) the inline code takes many more CPU cycles than the simple compare-and-move I am suggesting. The savings in execution time should be even more stark when building with -O1 (no inlining of standard functions) as the change avoids overhead of passing params on the stack.
Going to all of those top 10 most popular websites with Seamonkey v1.0.2 involves a combined 25344 calls to memcpy() from SetImageData() and SetAlphaData(). Those calls break down like this: 38407 mticks, sizeKB = 7510, 4byte = 3145, small = 5328, large = 16871 where number of bytes to copy is: 4byte = (data == 4 bytes) small = (4 < data < 64) large = (data >= 64) and sizeKB is to total amount of data (expressed in KBytes) run through memcpy() for these calls. The exact numbers vary slightly from run to run because the advertisements shown on the pages vary. Roughly 13% of the calls to memcpy() are for 4-byte blocks. FYI.
Assignee: general → nobody
Status: UNCONFIRMED → NEW
Component: General → General
Ever confirmed: true
Product: Mozilla Application Suite → Core
QA Contact: general → general
Version: unspecified → Trunk
Steve, do you realize that this would only work on properly align data ? This type of operations can be very slow on many processors, and are even forbidden on some. I've had some algorithms runs 10 times slower because of misalignment (on Alpha). That's the reason that the compiler didn't take the obvious optimizations. Did you test only on Intel or AMD ? While this would work on some processors (Intel mainly), it can't be used on all.
memcpy is usually quite fast; I'd be surprised if you can do much better. Did you actually measure the performance of these changes?
Jo, Hmmm. I hadn't considered non-x86 processors. Alignment (on x86) isn't an issue. The memcpy() also copies the data 4 bytes at a time, without testing alignment, but is much less efficient because it's generalized for any amount of data. Here is the code generated by VS2003/MSVC 7.1. It's compiled at -O2 get get the inlined memcpy. if (aLength == 4) 01C69E5B cmp esi,4 01C69E5E jne gfxImageFrame::SetAlphaData+0AAh (1C69E6Ah) *((PRUint32 *)(alphaData + offset)) = *((PRUint32 *)aData); 01C69E60 mov ecx,dword ptr [ecx] 01C69E62 mov edx,dword ptr [this] 01C69E65 mov dword ptr [edi+edx],ecx else 01C69E68 jmp gfxImageFrame::SetAlphaData+0C5h (1C69E85h) memcpy(alphaData + offset, aData, aLength); 01C69E6A mov eax,dword ptr [this] 01C69E6D mov ecx,esi 01C69E6F mov esi,dword ptr [aData] 01C69E72 mov edx,ecx 01C69E74 shr ecx,2 01C69E77 add edi,eax 01C69E79 rep movs dword ptr [edi],dword ptr [esi] 01C69E7B mov ecx,edx 01C69E7D and ecx,3 01C69E80 rep movs byte ptr [edi],byte ptr [esi]
I'm withdrawing this patch because it really doesn't provide the performance improvement I was hoping for. There is a gain by not calling memcpy() just for 4 bytes, but that gain seems to be negated by the new data size check needed to avoid the memcpy() call. Oh, well.
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.