Closed Bug 437136 Opened 16 years ago Closed 15 years ago

Reducing memory operations from prologue & epilogue of JITed code

Tracking

(Not tracked)

Status:

VERIFIED WONTFIX

People

(Reporter: habals, Unassigned)

Details

Attachments

(1 file)

replacing push & pop to sub & add in prologue (patch, 631 bytes, Jungwoo Ha, 16 years ago)

Jungwoo Ha

Reporter

Description

16 years ago

User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.1; MS-RTC LM 8; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Build Identifier: 

To align the stack, the prologue pushes EBP twice, and the epilogue pops twice.
The first PUSH EBP can be replaced with SUB ESP, 4, and the matching POP with ADD ESP, 4.
This increases the prologue instruction size by 3 bytes on x86, but it saves one memory operation. I ran it on Mac OS X with a 2.8 GHz Core 2 Duo and saw a slight performance improvement. Here are the patch and the performance results.

---
diff -r b7fa522d969b nanojit/Nativei386.cpp
--- a/nanojit/Nativei386.cpp    Tue Jun 03 08:54:09 2008 -0700
+++ b/nanojit/Nativei386.cpp    Tue Jun 03 15:38:32 2008 -0700
@@ -90,7 +90,8 @@
         NIns *patchEntry = _nIns;
                MR(FP, SP);
                PUSHr(FP); // push ebp twice to align frame on 8bytes
-               PUSHr(FP);
+               //PUSHr(FP);
+               SUBi(SP, 4);

                for(Register i=FirstReg; i <= LastReg; i = nextreg(i))
                        if (needSaving&rmask(i))
@@ -175,7 +176,8 @@
                for (Register i=UnknownReg; i >= FirstReg; i = prevreg(i))
                        if (restore&rmask(i)) { POP(i); }

-               POP(FP);
+               //POP(FP);
+               ADDi(SP,4);
                POP(FP);
         return  _nIns;
     }
---
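The stack effect of the patch can be sketched with a toy model. This is a hypothetical illustration, not nanojit code: `StackModel` and the four helper functions are invented names, and the model only tracks ESP and counts stack loads and stores. The instruction order follows the description above (the first PUSH becomes SUB ESP, 4 and the last POP becomes ADD ESP, 4).

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a 32-bit x86 stack pointer (hypothetical, for illustration).
// It tracks ESP and counts how many stack stores and loads are performed.
struct StackModel {
    uint32_t esp;
    int stores = 0;
    int loads = 0;

    explicit StackModel(uint32_t start) : esp(start) {
        assert(start % 4 == 0);  // model assumes a 4-byte aligned stack
    }
    void push() { esp -= 4; ++stores; }  // PUSH r32: adjust ESP, store 4 bytes
    void pop()  { ++loads; esp += 4; }   // POP r32: load 4 bytes, adjust ESP
    void sub(uint32_t n) { esp -= n; }   // SUB ESP, n: register-only
    void add(uint32_t n) { esp += n; }   // ADD ESP, n: register-only
};

// Original sequence: PUSH EBP twice on entry, POP twice on exit.
inline void prologue_original(StackModel& s) { s.push(); s.push(); }
inline void epilogue_original(StackModel& s) { s.pop(); s.pop(); }

// Patched sequence: the padding slot is reserved and released
// without touching memory.
inline void prologue_patched(StackModel& s) { s.sub(4); s.push(); }
inline void epilogue_patched(StackModel& s) { s.pop(); s.add(4); }
```

In this model both variants leave ESP at the same value after the prologue and restore it after the epilogue; the patched variant performs one store and one load instead of two of each.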

./runtests.py -i 50 sunspider
Executing tests at 2008-05-31 12:36:50.245578
avm: /Users/habals/tamarin-tracing-unmod/dist/shell/avmshell
avm2: /Users/habals/tamarin-tracing/dist/shell/avmshell


test                                                   avm    avm2     %sp

sunspider/access-binary-trees.as                      84.0    84.0     0.0
sunspider/access-fannkuch.as                         138.0   136.0     1.4
sunspider/access-nbody.as                            160.0   160.0     0.0
sunspider/access-nsieve.as                            60.0    60.0     0.0
sunspider/bitops-3bit-bits-in-byte.as                 14.0    14.0     0.0
sunspider/bitops-bits-in-byte.as                      40.0    40.0     0.0
sunspider/bitops-bitwise-and.as                      206.0   201.0     2.4
sunspider/bitops-nsieve-bits.as                       52.0    52.0     0.0
sunspider/controlflow-recursive.as                    30.0    29.0     3.3
sunspider/crypto-aes.as                              169.0   169.0     0.0
sunspider/crypto-sha1.as                              39.0    39.0     0.0
sunspider/math-cordic.as                              52.0    52.0     0.0
sunspider/math-partial-sums.as                       196.0   194.0     1.0
sunspider/math-spectral-norm.as                       33.0    33.0     0.0
sunspider/s3d-cube.as                                155.0   154.0     0.6
sunspider/s3d-morph.as                                77.0    75.0     2.6
sunspider/string-fasta.as                            159.0   155.0     2.5
---
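The %sp column appears to be the relative reduction of the avm2 time against the avm time, rounded to one decimal place. That formula is an assumption inferred from the rows above, not taken from runtests.py; `speedup_percent` and `round1` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

// Assumed formula (inferred from the table): percentage by which the
// avm2 time is lower than the avm baseline time.
inline double speedup_percent(double avm, double avm2) {
    assert(avm > 0.0);  // baseline time must be positive
    return (avm - avm2) / avm * 100.0;
}

// Round to one decimal place, matching how the table appears to print %sp.
inline double round1(double x) {
    return std::round(x * 10.0) / 10.0;
}
```

For example, access-fannkuch goes from 138.0 to 136.0, which is 2/138, or about 1.4 percent, matching the table.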

I found that after these pushes in the prologue, SUBi SP,40 is executed.
I think removing these pushes and combining the SUB instructions into one would give a better performance improvement.
However, one of the EBP values on the stack is used, so I'm not sure how to get rid of it.
Any comments on whether this is possible or the right way to go?


Reproducible: Always


Jungwoo Ha

Reporter

Comment 1

16 years ago

Attached patch: replacing push & pop to sub & add in prologue

Edwin Smith

Comment 2

16 years ago

the prologue on windows aligns esp to 8 bytes and makes sure ebp is also aligned. because of how the code is organized, the prologue is messy and larger than it should be.

further directions that expand the scope but may have more benefit:
- how about an optimized prolog for windows that does the 8-byte aligning of esp integrated with everything else, rather than two mini-prologs?

- should the pushes of esi, etc. occur after saving ebp, to make the prologue "standard" (aids in debugging)?

- the prolog is only executed when transitioning between the interpreter and traces, but not when jumping from one trace to another. it's exactly the same prolog for every trace. we could handcode the prolog once and jump directly to a no-prologue trace. this would mean having 1 prologue, period, vs. 1 per trace like now. code size then would not matter.

- related: when calling a helper function that takes a floating point value (e.g. fmod) we do PUSH(ECX) twice, then a store to write the fp value. should we instead do sub esp,8? what's the size/speed tradeoff? are there any issues with esp folding?

Ed
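The 8-byte alignment arithmetic discussed in this comment can be sketched as below. This is an illustration under stated assumptions, not the nanojit prologue: `align_down`, `is_aligned`, and `esp_after_slots` are hypothetical helpers, and the stack is modeled as a plain 32-bit address with 4-byte slots.

```cpp
#include <cassert>
#include <cstdint>

// Round an address down to a multiple of `align`, which must be a
// power of two. The stack grows downward, so aligning ESP rounds down.
inline uint32_t align_down(uint32_t addr, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

inline bool is_aligned(uint32_t addr, uint32_t align) {
    return align_down(addr, align) == addr;
}

// ESP after reserving `slots` 4-byte stack slots, whether they are
// reserved by PUSH instructions or by a single SUB ESP, 4*slots.
inline uint32_t esp_after_slots(uint32_t esp, uint32_t slots) {
    return esp - 4 * slots;
}
```

Starting from an 8-byte aligned ESP, reserving two slots keeps the alignment while reserving one slot breaks it, which is why the slots come in pairs. The same arithmetic applies to the helper-call case: two PUSH(ECX) and one sub esp,8 leave ESP at the same address.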

Dan Smith

Updated

15 years ago

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → WONTFIX

Chris Peyer

Updated

15 years ago

Status: RESOLVED → VERIFIED

Categories

(Tamarin Graveyard :: Tracing Virtual Machine, defect)