Closed
Bug 413920
Opened 17 years ago
Closed 5 months ago
Investigate using SSE4 for CRC32 and String operations
Categories
(Core :: General, task)
RESOLVED
INACTIVE
People
(Reporter: mtschrep, Unassigned)
Details
Chips with SSE4 are now shipping, and once we upgrade to VC2008 we get access to the new intrinsics (http://blogs.msdn.com/vcblog/archive/2007/10/18/new-intrinsic-support-in-visual-studio-2008.aspx). Of particular note in SSE4 are the CRC32 calculation and certain 128-bit string compare ops. Not sure if either would be of interest for NSS or JS. It is also worth noting that newer processors are greatly reducing the latency of SSE instructions (http://www.hardwaresecrets.com/fullimage.php?image=6762), which can have a big impact on certain ops.
Just wanted to get this on the radar for Moz2 and beyond.
Comment 1•17 years ago
I'd like to make a few comments on the use of SIMD in general so that folks understand the ramifications of using it.
The customer base will always have a mix of processors, so there has to be a cost/benefit analysis of whether SIMD code will help more than it hurts. For code with loops that do the same thing over and over, it can make sense to have multiple sets of code for different processors, as the cost of determining which code set to run is small compared to the performance gain. For small improvements, the cost of picking among multiple instruction sets can match the gains.
One way to get the best performance for each processor is to put out separate kits for each processor. Many of the unofficial builders have done this in the past, but it is rather exhausting work. The switching costs between releases are minimized, though.
I think that SSE2 is a comfortable base: the Pentium 4 was released in late 2000, so all recently sold computers should be SSE2-capable. SSE3 doesn't really add much. SSSE3 is useful, but it isn't available in any AMD processors and is only available in Core 2 Duo processors. SSE4 will be released by Intel in two parts, with some functionality in Penryn and the remainder to be added in Nehalem (late 2008). Penryn will support SSE4.1, but CRC32 support will arrive in SSE4.2, which will be in Nehalem.
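To make the "multiple sets of code" idea concrete, here is a minimal sketch in C of choosing a CRC implementation once at run time. It is an illustration only, not code from NSS or JS. Assumptions: the names `crc32c_sw`, `crc32c_hw`, and `crc32c_select` are hypothetical; the hardware path relies on GCC/Clang extensions (`__attribute__((target))`, `__builtin_cpu_supports`), so under MSVC or on non-x86 targets this sketch simply falls back to the portable loop. Note that the SSE4.2 CRC32 instruction computes CRC-32C (Castagnoli polynomial), which differs from the CRC-32 used by zlib, so it is not a drop-in replacement there.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the hardware path is only built with GCC/Clang on x86. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <nmmintrin.h> /* SSE4.2 intrinsics, including _mm_crc32_u8 */
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Portable fallback: bitwise CRC-32C, reflected polynomial 0x82F63B78. */
static uint32_t crc32c_sw(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

#if HAVE_X86_DISPATCH
/* SSE4.2 path: one CRC32 instruction per input byte. The target
 * attribute lets this one function use SSE4.2 even when the rest of
 * the file is compiled for a lower baseline such as SSE2. */
__attribute__((target("sse4.2")))
static uint32_t crc32c_hw(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    while (len--)
        crc = _mm_crc32_u8(crc, *buf++);
    return ~crc;
}
#endif

typedef uint32_t (*crc32c_fn)(uint32_t, const unsigned char *, size_t);

/* Pick an implementation. Callers should do this once and cache the
 * pointer, so the selection cost is not paid on every call. */
static crc32c_fn crc32c_select(void)
{
#if HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("sse4.2"))
        return crc32c_hw;
#endif
    return crc32c_sw;
}
```

A caller would store the result of `crc32c_select()` at startup and then call through the pointer, for example `crc = fn(0, data, len)`. Both paths should agree on the standard CRC-32C check value: the nine ASCII bytes "123456789" give 0xE3069283.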
Comment 2•17 years ago
FYI - SSE3 is supported by pretty much all 90nm AMD chips.
There is an SSE4-CRC32 project: https://github.com/Voxer/sse4_crc32
Updated•2 years ago
Severity: normal → S3
Comment 4•5 months ago
No activity for a while, closing
Status: NEW → RESOLVED
Type: defect → task
Closed: 5 months ago
Resolution: --- → INACTIVE