Closed Bug 465322 Opened 16 years ago Closed 16 years ago

[redux] Shrink the interpreter stack frame

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: lhansen, Assigned: lhansen)

Details

Attachments

(1 file, 2 obsolete files)

Analysis has shown that the interpreter stack frame can be huge, ranging from 550 bytes on armcc/ARM to 2900 bytes on vc++/ARM.  This is a problem for small-stack systems and can even be a performance liability, since the stack grows deep for moderately recursive functions.  The causes are a combination of the compilers' failure to reuse stack slots for block-scoped variables, a large number of local variables, and poor code generation (temps not reused; many spurious temps left in place, probably because the function is so large).
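To illustrate the pattern (this is not the Tamarin interpreter; the opcodes and names below are invented), consider a large switch-based dispatch loop in which every case declares its own block-scoped temporaries.  If the compiler does not overlap the stack slots of temporaries whose lifetimes are disjoint, the frame grows roughly with the number of cases:

  // Sketch only: each case has its own block-scoped locals.  Ideally the
  // compiler overlaps their stack slots (the lifetimes are disjoint), but a
  // weak allocator gives every temporary a distinct slot.
  #include <cstddef>
  #include <cstdint>

  double interp(const uint8_t* pc, double* stack, size_t sp)
  {
      for (;;) {
          switch (*pc++) {
          case 0x01: {                  // hypothetical "add"
              double rhs = stack[--sp]; // temp A
              double lhs = stack[--sp]; // temp B
              stack[sp++] = lhs + rhs;
              break;
          }
          case 0x02: {                  // hypothetical "mul"
              double rhs = stack[--sp]; // temps C and D could reuse the
              double lhs = stack[--sp]; // slots of A and B, but may not
              stack[sp++] = lhs * rhs;
              break;
          }
          // ... hundreds more cases, each with its own temporaries ...
          default:
              return sp ? stack[sp - 1] : 0.0;
          }
      }
  }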

This is a tracking bug for a fix for the problem.
Compiles on many platforms with the following frame sizes (obtained from the generated code):

  Mac 32    668
  Mac 64    376
  Win 32    600
  Win 64    1552 (!)
  ARM VC++  752

The atrocious frame size on Win64 is the result of many more spurious temps than even the Win32 build, and of course they're all 8 bytes.  Groan.

The main open issue is whether the design around the CallStackNode - now relegated to alloca'd memory, which may in turn be shunted onto the heap - is correct, since the object actually has a destructor that won't be called.
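For context, a minimal sketch of the pattern in question (the Node class below is a stand-in, not the real CallStackNode): an object constructed with placement new into alloca'd memory gets no automatic destructor call, so any cleanup in the destructor is skipped unless someone invokes it explicitly.

  // Sketch only; assumes a POSIX-style alloca() (it is _alloca in <malloc.h>
  // on Windows) and a stand-in class instead of the real CallStackNode.
  #include <alloca.h>
  #include <new>

  struct Node {                // stand-in for CallStackNode
      Node()  {}               // e.g. link into a call-stack list
      ~Node() {}               // e.g. unlink; the cleanup that would be skipped
  };

  void enterFrame()
  {
      // alloca() returns raw, uninitialized stack memory, so the object must
      // be created with placement new ...
      void* mem  = alloca(sizeof(Node));
      Node* node = new (mem) Node();

      // ... and nothing ever runs the destructor unless it is called by hand:
      node->~Node();           // omit this and ~Node() never executes
  }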
Assignee: nobody → lhansen
Attachment #348538 - Attachment is obsolete: true
Status: NEW → ASSIGNED
(In reply to comment #1)
>   Mac 32    668
>   Mac 64    376

I'm surprised that Mac 64 has a smaller stack frame size than Mac 32 -- does that gcc backend just do more aggressive temp reuse?  (Bodes well for moving to 64-bit userland on the Mac, though!)
(In reply to comment #2)
> (In reply to comment #1)
> >   Mac 32    668
> >   Mac 64    376
> 
> I'm surprised that Mac 64 has a smaller stack frame size than Mac 32 -- does
> that gcc backend just do more aggressive temp reuse?  (Bodes well for moving to
> 64-bit userland on the Mac, though!)

I'm surprised too.  In fact, 376/8 = 47, which is just two more than the number of local register-allocable variables in that function.  The obvious guess is that the extra registers available are put to good use.

I wouldn't extrapolate too much from this; the interpreter function is hard to compile and far from typical.  But yes, it's encouraging.

My analysis is that gcc for ia32 is not doing a good job.  I've looked at the code emitted by gcc 4.0.1, and it systematically fails to reuse compiler-introduced temporaries.  Any improvement in that area in the 64-bit compiler should have a significant impact.  (gcc 4.2 creates a still larger frame, but I have no analysis of why.)  The problem with VC++ 9.0 for ia32 is that it introduces a large number of spill locations for EAX in the interpreter.  It spills to these but never loads from them, and the net result is a large stack frame and a substantial amount of redundant computation.  I don't yet know why this is happening; it looks like a compiler bug or misfeature.  The code emitted by VC9 for ARM and x64 has exactly the same problem, and judging from the numbers above there may be even more spill locations introduced for x64 (I'm just guessing).
Attached patch Patch (Splinter Review)
Attachment #348816 - Attachment is obsolete: true
Attachment #348990 - Flags: review?(stejohns)
Attachment #348990 - Flags: review?(edwsmith)
Attachment #348990 - Flags: review?(stejohns) → review+
Comment on attachment 348990 [details] [diff] [review]
Patch

In WordcodeEmitter.cpp you could use skipU30() rather than multiple readU30() calls.
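For readers unfamiliar with the ABC bytecode, a u30 is a variable-length unsigned integer: up to five bytes, seven payload bits per byte, with the high bit set on continuation bytes.  The helpers below are hypothetical (the real readU30() and skipU30() in Tamarin may have different signatures); they just show why skipping is cheaper than decoding a value that is then discarded.

  #include <cstdint>

  // Decode a u30: accumulate 7 bits per byte until the high bit is clear.
  static uint32_t readU30(const uint8_t*& p)
  {
      uint32_t result = 0;
      for (int shift = 0; shift < 35; shift += 7) {
          uint8_t b = *p++;
          result |= uint32_t(b & 0x7F) << shift;
          if (!(b & 0x80))
              break;
      }
      return result & 0x3FFFFFFF;       // only 30 bits are significant
  }

  // Skip n consecutive u30 operands without materializing their values.
  static void skipU30(const uint8_t*& p, int n)
  {
      while (n-- > 0)
          while (*p++ & 0x80)           // step past continuation bytes
              ;
  }

  // Usage: instead of decoding and discarding,
  //     (void) readU30(pc); (void) readU30(pc);
  // the operands of an ignored instruction can simply be stepped over:
  //     skipU30(pc, 2);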
Attachment #348990 - Flags: review?(edwsmith) → review+
Pushed to tamarin-redux as changeset 1140:4d56af395653

Compared to the same code without this fix, there are some performance changes (interpreter-only code).  These are the deltas above 4% for GCC 4.0 with -O3 on MacOS 10.5.5, lightly loaded system:

jsbench/FFT.as                   18748   17767     5.5
jsbench/HeapSort.as               8879    9503    -6.6
jsbench/LUFact.as                11974   10995     8.9
jsbench/SOR.as                   71817   65771     9.2
jsbench/typed/HeapSort.as         9580   10606    -9.7
jsbench/typed/RayTracer.as       54300   52110     4.2

I wish I could say there was a trend here, but the main difference between HeapSort and the others is probably that HeapSort conses up a lot fewer doubles.  I'm not going to worry about it at this time.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Resolved fixed engineering / work item that has been pushed.  Setting status to verified.
Status: RESOLVED → VERIFIED