Closed
Bug 438881
Opened 16 years ago
Closed 6 years ago
Significant cache misses for string conversion routines
Categories
(Tamarin Graveyard :: Virtual Machine, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
Future
People
(Reporter: mingqiu.sun, Unassigned)
Details
(Whiteboard: PACMAN)
Attachments
(1 file, 1 obsolete file)
574 bytes,
patch
|
stejohns
:
review+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14 Build Identifier: tamarin-central-228c19321257 We noticed significant cache misses in one of the UTF string conversion routines. By injecting the following prefetch loop, we are able to reduce the cache misses by about 40%: + int i; + uint8 *temp = out; + for (i=0; i< outMax ; i=i+256) + _asm prefetchnta temp+i; As a result, sunspider/string-unpack-code performance is improved by about 10%. This technique could be applied elsewhere too, where cache misses are a problem. Reproducible: Always Steps to Reproduce: 1. run sunspider 2. measure cache read and write misses with vtune 3. identify hot methods wrt cache misses Actual Results: test avm avm2 %sp sunspider\access-binary-trees.as 47.0 47.0 0.0 sunspider\access-fannkuch.as 125.0 125.0 0.0 sunspider\access-nbody.as 171.0 171.0 0.0 sunspider\access-nsieve.as 62.0 62.0 0.0 sunspider\bitops-3bit-bits-in-byte.as 15.0 15.0 0.0 sunspider\bitops-bits-in-byte.as 16.0 16.0 0.0 sunspider\bitops-bitwise-and.as 203.0 203.0 0.0 sunspider\bitops-nsieve-bits.as 46.0 46.0 0.0 sunspider\controlflow-recursive.as 46.0 46.0 0.0 sunspider\crypto-aes.as 62.0 62.0 0.0 sunspider\crypto-md5.as 31.0 31.0 0.0 sunspider\crypto-sha1.as 31.0 31.0 0.0 sunspider\math-cordic.as 93.0 93.0 0.0 sunspider\math-partial-sums.as 203.0 203.0 0.0 sunspider\math-spectral-norm.as 156.0 156.0 0.0 sunspider\s3d-cube.as 109.0 109.0 0.0 sunspider\s3d-morph.as 93.0 93.0 0.0 sunspider\s3d-raytrace.as 125.0 125.0 0.0 sunspider\string-fasta.as 93.0 93.0 0.0 sunspider\string-unpack-code.as 7609.0 6906.0 9.2 sunspider\string-validate-input.as 93.0 93.0 0.0 patch file: diff -Naur ../tamarin-central-copy/core/UnicodeUtils.cpp core/UnicodeUtils.cpp --- ../tamarin-central-copy/core/UnicodeUtils.cpp 2008-03-05 13:54:47.298446000 -0800 +++ core/UnicodeUtils.cpp 2008-05-22 11:36:56.626858500 -0700 @@ -73,6 +73,13 @@ int outLen = 0; if (out) { + #ifdef WIN32 + int i; + uint8 *temp = out; + for (i=0; i< outMax ; i=i+256) + _asm prefetchnta temp+i; + #endif + // Output buffer passed in; actually encode data. while (inLen > 0) {
Reporter | ||
Comment 1•16 years ago
|
||
Comment 2•16 years ago
|
||
I'll take it for now - I'll see if I can apply the same patch to the new strings.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Updated•16 years ago
|
Status: NEW → ASSIGNED
Updated•16 years ago
|
Attachment #324826 -
Flags: review?(stejohns)
Comment 3•16 years ago
|
||
Comment on attachment 324826 [details] [diff] [review] prefetch loop rejecting only because this assumes WIN32 == Intel, which isn't true in all cases (eg various WinCE builds). You probably want something that specifically asks for Visual-C-on-Intel: #if defined(_MSC_VER) && (defined(AVMPLUS_IA32) || defined(AVMPLUS_AMD64))
Attachment #324826 -
Flags: review?(stejohns) → review-
Reporter | ||
Comment 4•16 years ago
|
||
Steven, That is a very good point. Revised patch attached.
Reporter | ||
Comment 5•16 years ago
|
||
Updated•16 years ago
|
Attachment #337946 -
Flags: review+
Flags: flashplayer-qrb? → flashplayer-qrb+
Target Milestone: --- → flash10.x
Updated•15 years ago
|
Attachment #324826 -
Attachment is obsolete: true
Comment 7•15 years ago
|
||
Turns out we can't land this at present in its current form. The Flash Player's minimum x86-32 requirement is MMX, not SSE1, yet SSE1 is required for the PREFETCHNTA instruction. Since they do not build multiple static binaries but detect architectures at run-time this change would create a lot of complication for - at this time - relatively modest gain.
Assignee: lhansen → nobody
Status: ASSIGNED → NEW
Priority: P3 → --
Target Milestone: flash10.1 → Future
Comment 8•15 years ago
|
||
Note however that the optimization may apply to various platforms and builds: - mac intel systems are always sse2 or better - x86-64 systems ditto
Updated•14 years ago
|
Whiteboard: PACMAN
Comment 9•6 years ago
|
||
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Comment 10•6 years ago
|
||
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
You need to log in
before you can comment on or make changes to this bug.
Description
•