Closed Bug 412608 Opened 17 years ago Closed 17 years ago

TT: rewrite === and == using CASE

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: edwsmith, Assigned: stejohns)

Details

Attachments

(1 file, 1 obsolete file)

Too many branches with the current approach.  cmp is an example of double dispatching on the arg type to the proper handler.
also redo OP_add with CASE
also possibly :coerce  -- currently uses CASE on builtintype but could probably also use it on boxtype to avoid a lot of redundant isnv/isobject/etc checks
Assignee: nobody → stejohns
Attached patch Patch (obsolete) — Splinter Review
OK, this is a bit of a radical approach, and it comes at a high cost (+400 bytes of Forth bytecode)... but it collapses ==, ===, and OP_add each into a single CASE (in most cases). There may well be a better way to do this in terms of space efficiency (lots of redundant entries), and it's rather a brute-force approach (let's enumerate all the possibilities!), but take a look and see what you think...
Attachment #297905 - Flags: review?(edwsmith)
props for cleverness, for sure!

i have mixed feelings about it though.  i like that it eliminates a CASE and its corresponding guard.  but as you said, we now have some really big cases, and i worry it might obscure expressions that would otherwise be available for matching expressions.

also, cmp uses double dispatch.  if it makes sense to multiply + flatten, should we do it for cmp too?

Often one of the operands has a known type and the other one doesn't.  in that case, one of the branches would be optimized out but the other one wouldn't.  By combining boxtypes into a single value, that branch can't be optimized out anymore, but we still end up with a single branch.  given the same outcome either way i'd want less forth code.

even if neither boxtype expression is const, it might match a previous standalone boxtype expression.  if we have xt(eq(boxtype,c)) (what CASE compiles to) and then we see it again, the second one could be optimized out.  but if one of the expressions is mul(boxtype,...), it wouldnt match the other one unless it also was a mul.

performance is king though... any idea if this is a measureable difference than the double-dispatch approach?
yeah, I have mixed feelings too... this was kind of a harebrained friday-afternoon-experiment :-) 

re: cmp, yes, *if* it makes sense for these, it might make sense there too. (I just ran out of experimentation time.)

any guess as to how often one operand has known type and the other doesn't? ie is this a gut feeling or do we have numbers? what numbers would be the tipping point?

re: less forth, yes, it's definitely more than I want too, but mostly in terms of bytecode rather than forth source. (it's not terse but neither is it hard to understand IMHO) I have a thought about how it could be compressed (in terms of bytecode, not forth source) but need to experiment to see how much might be saved.

re: matching a previous boxtype, yeah, I see your point here. again, the whole crux is how often this happens in practice, but that probably is hard to quantify without running a large ream of benchmarks (and hoping those are typical). 

re: performance, the quick benches I ran showed a pretty miniscule performance gain... my assumption is that the real gain would be not in improved performance but in fewer guards & side exits (and thus more cache space available for "real" JITted code), but I didn't have time to measure this Friday. (this would probably be a good thing to be able to measure IMHO, I may add a debug flag for it and/or add it to -Dstats)
(In reply to comment #5)

> any guess as to how often one operand has known type and the other doesn't? ie
> is this a gut feeling or do we have numbers? what numbers would be the tipping
> point?

It's a gut feeling based on many hours of looking at verbose output.  agree, these kinds of decisions are best to be data driven.  from a memory perspective, 400 bytes increase in code to eliminate many KB of guards, is a win.  but need data.

in terms of dynamic instruction count, its probably a wash.  swap case swap case vs mul add case? not sure. 


Got it. I'll try to gather up more hard numbers today. Like I said, this is more of an experiment than a definite way to go....
yes, and either single or dual dispatch on boxtype is better than the many if's that it replaces.  less code, but also the optimizer is only matching expressions.  so taking the boxtype multiple times is fodder to optimize, but "isnumber" then "isint" is not really optimizeable.

i'm seeing the same pattern for null pointer checks... isnv doesn't combine well with boxtype, so in cases when we'll take the boxtype and CASE anyway, its a win to push the NP check into the same case.  (eg toproto, and others)
the feeling I'm getting is that the vast majority of "is" checks (isnv isobject etc) should be replaced by CASE checks, but this is more a gut feeling than anything backed by numbers at this point...
Comment on attachment 297905 [details] [diff] [review]
Patch

going with the single flat dispatch seems fine since it eliminates the most guards and is reportedly smaller than the multi-dispatch equivalent.
Attachment #297905 - Flags: review?(edwsmith) → review+
I just did some measurement of single-flat-case vs. a nested-case approach, and interestingly, the number of side exits is pretty close to identical between them, with a nested-case approach having slightly fewer (probably due to constant optimization in a few cases). But since every exit is a considerable chunk of code, every bit helps, and IMHO the nested-case is more readable anyway, so I'll resubmit a patch using that approach.
Attached patch PatchSplinter Review
Rewrote OP_add, OP_equals, and OP_strictequals to use CASE rather than repeated IF's to minimize the number of guards/side-exits produced.

Added 2drop0 and 2drop1 words to forth, similar to the existing drop0 and drop1 words.

Added the -Dverbose_live flag; this split part of the behavior formerly in -Dverbose_opt_detail into a separate flag, and emits information about live LIR instructions and exit metrics. Note that this flag is now completely independent of other verbose flags.

Added a readUnalignedInt16() call to Interpreter for the several places we do this: Intel chips will allow an unaligned read in a single instruction so we special-case for them.
Attachment #298340 - Flags: review?(edwsmith)
Attachment #297905 - Attachment is obsolete: true
Comment on attachment 298340 [details] [diff] [review]
Patch

looks good
Attachment #298340 - Flags: review?(edwsmith) → review+
added as changeset:   293:8b4a9bb8c709
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Closing out all TT: transfer bugs that are resolved fixed.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: