Open Bug 550314 Opened 14 years ago Updated 2 years ago

NS_CopySegmentToBuffer() could be faster

Tracking

()

Status:

NEW

People

(Reporter: swsnyder, Unassigned)

Details

(Keywords: perf)

Steve Snyder

Reporter

Description

•

14 years ago

NS_CopySegmentToBuffer() is basically just a wrapper around memcpy(), and a lot of data passes through it.  The copying of data would be faster if the source and dest pointers were 16-byte aligned.  At this writing the dest pointer is simply base+offset with no regard to alignment.
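
To make the alignment point concrete, here is a minimal sketch of the pattern described above. It is not the actual XPCOM source: CopySegmentSketch and MisalignmentOf16 are hypothetical names used only for illustration, and the signature is simplified to the base pointer, offset, source, and count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the copy described above: the destination is
// simply base + offset, with no regard to alignment.
inline void CopySegmentSketch(char* base, std::size_t offset,
                              const char* src, std::size_t count) {
  assert(base != nullptr && src != nullptr);
  std::memcpy(base + offset, src, count);
}

// Number of bytes a pointer sits past the previous 16-byte boundary
// (0 means the pointer is 16-byte aligned).
inline std::size_t MisalignmentOf16(const void* p) {
  return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p) & 15u);
}
```

Even when the base buffer is 16-byte aligned, an arbitrary offset such as 5 leaves the destination 5 bytes past a boundary, which is the situation this bug is about.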

Most contemporary x86 implementations of memcpy() use SIMD (SSE2, etc.) instructions to copy data in 16-byte blocks.  This yields better performance than just using integer-sized copies.

VS2005/SP1 is one such compiler that provides SSE2 data copying.  There's a kink in Microsoft's implementation, though: both source and destination pointers MUST have the same alignment.  They apparently didn't think the gains from unaligned SIMD memory accesses were worth supporting, so it's either both pointers equally aligned (allowing adjustment to a 16-byte boundary) or no SIMD at all.

My knowledge of C++ is too sketchy to do the work myself, but I don't think the changes to ensure alignment would be that extensive.

See file: C:\Program Files\Microsoft Visual Studio 8\VC\crt\src\intel\memcpy.asm

Steve Snyder

Reporter

Updated

•

14 years ago

Keywords: perf

Boris Zbarsky [:bzbarsky]

Comment 1

•

14 years ago

1)  Is this actually showing up in profiles?
2)  Does memcpy not handle copying up to alignment and then using SSE?  I'd think
    it does.
3)  Making the alignment work out would mean changing all consumers of this
    function, right?  And possibly various upstream consumers.  Could be done,
    but is it worth it (see point 1)?

Steve Snyder

Reporter

Comment 2

•

14 years ago

1. NS_CopySegmentToBuffer() itself is not featured in profiling, but memcpy(), where the real work is done, is certainly prominent.  How much of that memcpy() CPU use can be attributed to NS_CopySegmentToBuffer() is unknown.

2. If the alignment of the 2 pointers is equal, then memcpy() will adjust up to 16-byte alignment, then use SIMD copying thereafter.  For example, if src=0xXXXXXX04 and dst=0xXXXXXX14 then integer copying is used until src=0xXXXXXX10 and dst=0xXXXXXX20 (both 16-byte aligned).  If we start with, say, src=0xXXXXXX04 and dst=0xXXXXXX18 then SIMD will not be used because the pointers cannot be adjusted until both are 16-byte aligned.
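
The rule in point 2 can be sketched as a small check. This models only the behavior as described in this comment, not the CRT's actual memcpy.asm; CanCoAlign and PrefixBytes are hypothetical helper names, and the addresses are passed as plain integers so the examples above can be plugged in directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when both addresses share the same offset within a 16-byte block,
// i.e. advancing both by the same amount can make both 16-byte aligned.
inline bool CanCoAlign(std::uintptr_t src, std::uintptr_t dst) {
  return ((src ^ dst) & 15u) == 0;
}

// Bytes that would be copied with integer moves before both pointers reach
// a 16-byte boundary. Only meaningful when CanCoAlign(src, dst) holds.
inline std::size_t PrefixBytes(std::uintptr_t src, std::uintptr_t dst) {
  assert(CanCoAlign(src, dst));
  return static_cast<std::size_t>((16u - (src & 15u)) & 15u);
}
```

With low address bytes 0x04 and 0x14, CanCoAlign is true and PrefixBytes is 12, matching the first example. With 0x04 and 0x18 the offsets differ (4 versus 8), so no common adjustment exists and the SIMD path would be skipped.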

3. Is it worth it?  Depends on how much work has to be done to ensure alignment.  This seems like a cheap way to improve performance (because Microsoft has already written the SIMD copying code).  If pointer alignment would be unduly disruptive, maybe it's not so cheap.

Benjamin Smedberg

Comment 3

•

14 years ago

Wouldn't any decent profiler be able to give a hierarchical profile to tell you whether memcpy was being called under NS_CopySegmentToBuffer?

BMO Automation

Updated

•

2 years ago

Severity: normal → S3


Categories

(Core :: XPCOM, defect)
