Closed Bug 438881 Opened 16 years ago Closed 6 years ago

Significant cache misses for string conversion routines

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

x86
Windows Vista
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX
Future

People

(Reporter: mingqiu.sun, Unassigned)

Details

(Whiteboard: PACMAN)

Attachments

(1 file, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Build Identifier: tamarin-central-228c19321257

We noticed significant cache misses in one of the UTF string conversion routines. By injecting the following prefetch loop, we are able to reduce the cache misses by about 40%:

+			int i;
+			uint8 *temp = out;
+			for (i=0; i< outMax ; i=i+256)  
+				_asm prefetchnta temp+i;

As a result, sunspider/string-unpack-code performance is improved by about 10%. This technique could be applied elsewhere too, where cache misses are a problem.

Reproducible: Always

Steps to Reproduce:
1. run sunspider
2. measure cache read and write misses with vtune
3. identify hot methods wrt cache misses
Actual Results:  
test                                                   avm    avm2     %sp

sunspider\access-binary-trees.as                      47.0    47.0     0.0
sunspider\access-fannkuch.as                         125.0   125.0     0.0
sunspider\access-nbody.as                            171.0   171.0     0.0
sunspider\access-nsieve.as                            62.0    62.0     0.0
sunspider\bitops-3bit-bits-in-byte.as                 15.0    15.0     0.0
sunspider\bitops-bits-in-byte.as                      16.0    16.0     0.0
sunspider\bitops-bitwise-and.as                      203.0   203.0     0.0
sunspider\bitops-nsieve-bits.as                       46.0    46.0     0.0
sunspider\controlflow-recursive.as                    46.0    46.0     0.0
sunspider\crypto-aes.as                               62.0    62.0     0.0
sunspider\crypto-md5.as                               31.0    31.0     0.0
sunspider\crypto-sha1.as                              31.0    31.0     0.0
sunspider\math-cordic.as                              93.0    93.0     0.0
sunspider\math-partial-sums.as                       203.0   203.0     0.0
sunspider\math-spectral-norm.as                      156.0   156.0     0.0
sunspider\s3d-cube.as                                109.0   109.0     0.0
sunspider\s3d-morph.as                                93.0    93.0     0.0
sunspider\s3d-raytrace.as                            125.0   125.0     0.0
sunspider\string-fasta.as                             93.0    93.0     0.0
sunspider\string-unpack-code.as                     7609.0  6906.0     9.2
sunspider\string-validate-input.as                    93.0    93.0     0.0


patch file:

diff -Naur ../tamarin-central-copy/core/UnicodeUtils.cpp core/UnicodeUtils.cpp
--- ../tamarin-central-copy/core/UnicodeUtils.cpp	2008-03-05 13:54:47.298446000 -0800
+++ core/UnicodeUtils.cpp	2008-05-22 11:36:56.626858500 -0700
@@ -73,6 +73,13 @@
 		int outLen = 0;
 		if (out)
 		{
+			#ifdef WIN32
+			int i;
+			uint8 *temp = out;
+			for (i=0; i< outMax ; i=i+256)  
+				_asm prefetchnta temp+i;
+			#endif
+
 			// Output buffer passed in; actually encode data.
 			while (inLen > 0)
 			{
Attached patch prefetch loop (obsolete) — Splinter Review
I'll take it for now - I'll see if I can apply the same patch to the new strings.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Assignee: nobody → daumling
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Attachment #324826 - Flags: review?(stejohns)
Comment on attachment 324826 [details] [diff] [review]
prefetch loop

rejecting only because this assumes WIN32 == Intel, which isn't true in all cases (eg various WinCE builds). You probably want something that specifically asks for Visual-C-on-Intel:

#if defined(_MSC_VER) && (defined(AVMPLUS_IA32) || defined(AVMPLUS_AMD64))
Attachment #324826 - Flags: review?(stejohns) → review-
Steven,

That is a very good point. Revised patch attached.
Attachment #337946 - Flags: review+
Flags: flashplayer-qrb?
Flags: flashplayer-qrb? → flashplayer-qrb+
Target Milestone: --- → flash10.x
Ready to land, I'll take it.
Assignee: daumling → lhansen
Priority: -- → P3
Attachment #324826 - Attachment is obsolete: true
Turns out we can't land this at present in its current form.  The Flash Player's minimum x86-32 requirement is MMX, not SSE1, yet SSE1 is required for the PREFETCHNTA instruction.  Since they do not build multiple static binaries but detect architectures at run-time this change would create a lot of complication for - at this time - relatively modest gain.
Assignee: lhansen → nobody
Status: ASSIGNED → NEW
Priority: P3 → --
Target Milestone: flash10.1 → Future
Note however that the optimization may apply to various platforms and builds:

- mac intel systems are always sse2 or better
- x86-64 systems ditto
Whiteboard: PACMAN
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: