Closed Bug 588978 Opened 14 years ago Closed 14 years ago

TM: Add JSStackFrame* to JSFrameRegs

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bhackett1024, Assigned: bhackett1024)

References

Details

Attachments

(2 files)

It would be nice for JM if both f.fp and f.cx->fp did not have to be modified every time a JSStackFrame is pushed or popped.  If the current frame was instead accessed as cx->regs->fp, only f.regs.fp would need to be modified because cx->regs points directly there.  Accessing cx->fp currently costs about 2.5 million reads (for cx) and 2.5 million writes (for cx->fp) on SS, and 100 million of each on the V8 benchmark.  If updates to cx->fp are avoided at frame changes, as in bug 587707, 1.5 million reads and writes are still required.

It would be faster to move cx->fp into f.regs.fp rather than remove f.fp entirely, as this would avoid the reads of cx when the frame is being updated.  Moreover, register state and frame pointers tend to go hand in hand in the source; register state is not much use without the frame the pc/sp are associated with.
I did exactly this in the first WIP patches for the dual stack (since the benefits were magnified by having both an fp and ftp); I've just been too lazy to revisit since, so awesome!
This patch passes tests and trace-tests.
Assignee: general → bhackett1024
Attachment #467801 - Flags: review?(lw)
Any measured speedup on JM?
I haven't tested, this patch is against TM.  This was definitely a speedup (small, but these add up) when microbenchmarking for bug 587707.
Nice work slogging through all that.

>+ *   3. |ss|'s current frame is |cx->regs|.

cx->regs->sp

>-    /* If this segment is suspended, the top of the segment. */
>-    JSStackFrame        *suspendedFrame;

I'm a little surprised the JS_STATIC_ASSERT(sizeof(StackSegment) % sizeof(Value) == 0) didn't hit.  Have you built on 32-bit?

It seems easy to miss a cx->frame()/cx->maybeFrame() so definitely give it a good try-server'ing.  Before pushing to try, though, I'd try to build a whole browser.  (E.g., mxr shows cx->fp use in /embedding/browser/activex/src/plugin/LegacyPlugin.cpp)
Comment on attachment 467801 [details] [diff] [review]
patch adding JSStackFrame* to JSFrameRegs

r+, but with one suggested renaming: does s/cx->frame()/cx->fp()/ sound good?  In addition to being shorter, borrowing from the CPU arch use of 'fp', it means *top* frame.  If so, I don't know whether "maybefp"/"hasfp" or "maybeFp"/"hasFp" looks better... but one of those?  Brendan?
Attachment #467801 - Flags: review?(lw) → review+
+1 to fp(), shorter & canonical
Those names sound fine to me, no preference for maybefp vs. maybeFp.  I've only tested on 64-bit so far, though the full browser builds on OSX.  That LegacyPlugin.cpp file is windows-only, will grep some more before pushing to try.
Blocks: 587707
(In reply to comment #6)
> Nice work slogging through all that.
> 
> >+ *   3. |ss|'s current frame is |cx->regs|.
> 
> cx->regs->sp

cx->regs->fp, right?

fp, hasfp, maybefp -- any capital F and P hurts, maybeTopFrame is too long (and frame instead of fp connotes struct not ptr).

/be
OTOH I see savedPC and other all caps spellings of canonical "register" names...

/be
(In reply to comment #10)
> cx->regs->fp, right?

D'oh!  Yes

(In reply to comment #11)
> OTOH I see savedPC and other all caps spellings of canonical "register"
> names...

Don't let savedPC be the guide; I did that recently without thinking too hard about capitalization :-/   JSStackFrame::savedpc would have matched JSStackFrame::pc better.
Putting a patch through the try server tonight which is cx->fp(), cx->hasfp(), cx->maybefp().
It looks like this was a slowdown on TM, according to awfy.
I'm able to reproduce this slowdown locally (~5ms), will investigate.
OK, near as I can tell this slowdown is mostly due to whim-of-the-compiler issues in the interpreter.  cx->fp() is only accessed ~700k times by TM in SS, and 700k extra dereferences should cost << 1ms.  I think changing the way fp is accessed, even adding the fp() method, changes how the rest of the code is compiled.  Various weirdness:

- If I take the previous revision (7e5cdd50de10) and wrap accesses to cx->fp in a 'frame() { return fp; }', I get a consistent 2ms speedup on SS, though some benchmarks (e.g. tofte) get a consistent slowdown.

- In the pure interpreter, this patch is consistently 5ms faster on SS than the previous revision, even though it should be strictly slower.

Bottom line, I think it's best to wait and see how JM+TM and pure JM are affected.  If it does turn out to be faster (for whatever reason) to put the current frame on cx, this will be easy to put back as all the fp accesses are now wrapped in (supposedly performance-neutral, but apparently not) accessors.
This fixes a few uses of cx->fp only compiled under options not exercised by release/debug/tinderbox.
Attachment #468408 - Flags: review?(lw)
(In reply to comment #17)
> Bottom line, I think it's best to wait and see how JM+TM and pure JM are
> affected.  If it does turn out to be faster (for whatever reason) to put the
> current frame on cx, this will be easy to put back as all the fp accesses are
> now wrapped in (supposedly performance-neutral, but apparently not) accessors.

I was waiting to merge this change to JM to see if the perf thing got fixed. It sounds like you want me to merge it over to JM so you can see what happens and then fix it. So I will merge it now.
Attachment #468408 - Flags: review?(lw) → review+
Patch for unnecessary loads of f.cx in JM after merge from TM, discussed with dmandelin in IRC.

http://hg.mozilla.org/projects/jaegermonkey/rev/c0ca08abe183
bhackett: dmandelin: ping
[6:16pm] dmandelin: bhackett: pong
[6:16pm] bhackett: hey
[6:16pm] bhackett: bug in the last merge from TM
[6:16pm] bhackett: unnecessary load of f.cx, which for some reason slows things down for me
[6:16pm] bhackett: can I push this patch?
[6:16pm] bhackett: http://pastebin.mozilla.org/774428
[6:19pm] dmandelin: bhackett: i'm not sure what that load is for...are you saying that ArgReg1 never gets used after that?
[6:19pm] bhackett: dmandelin: it used to be used to store to cx->fp
[6:19pm] bhackett: this is immediately before returning, so it shouldn't be used afterwards
[6:19pm] bhackett: I'll test first though
[6:20pm] bhackett: some reason I get a 1ms improvement from the merge as is, but 4ms with this tweak
[6:20pm] dmandelin: bhackett: ok, so i deleted a store right after, but for forgot to remove that load? as long as it passes trace tests, go ahead and push that change. thx for catching!
[6:21pm] bhackett: dmandelin: sounds good
[6:21pm] dmandelin: bhackett: there is another one just like it TrampolineCompiler.cpp. could you get that one too?
[6:21pm] bhackett: ok
Somewhat noisy (revisions from bug 589746 mixed in), but on AWFY it looks like both JM and JM+TM improved by a few ms with this.  I get a consistent 4ms improvement testing locally.
(In reply to comment #18)
> Created attachment 468408 [details] [diff] [review]
> patch for some uncompiled cx->fp
> 
> This fixes a few uses of cx->fp only compiled under options not exercised by
> release/debug/tinderbox.

http://hg.mozilla.org/tracemonkey/rev/a71bd1f59054
this is fixed, right?

http://hg.mozilla.org/mozilla-central/rev/14a90a53ceeb
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: