Closed Bug 642412 Opened 14 years ago Closed 12 years ago

TI: investigate Kraken crypto performance

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: jandem, Assigned: bhackett1024)

References

Details

Attachments

(1 file)

Standalone pbkdf2 benchmark 14 years ago Jan de Mooij [:jandem] 34.05 KB, application/x-javascript		Details

Jan de Mooij [:jandem]

Reporter

Description

•

14 years ago

TI+JM is slower than trunk JM on all Kraken crypto tests. TI and JM should do very well on crypto tests, so I'd like to investigate what's happening here. TI is also slower on SS crypto-aes, it's probably related.

Jan de Mooij [:jandem]

Reporter

Updated

•

14 years ago

Assignee: general → jandemooij

Status: NEW → ASSIGNED

Jan de Mooij [:jandem]

Reporter

Comment 1

•

14 years ago

I started analyzing pbkdf2. According to Shark we spend at least 8% in stubs::SetElem. Most of the time is spent inside sjcl.hash.sha256.prototype.C. This function has many setelem's, the most important one is inside the loop, something like this: b = d[a & 15] = (int + int) | 0; Type inference output: #35:00285: zero type 0: int #35:00286: bitor type 0: int #35:00287: setelem type 0: int monitored I think this is bug 621937 (JM not generating IC's for monitored setelem)

Depends on: 621937

Jan de Mooij [:jandem]

Reporter

Comment 2

•

14 years ago

Attached file Standalone pbkdf2 benchmark — Details

This is a beautified, single-file version of stanford-crypto-pbkdf2.

Jan de Mooij [:jandem]

Reporter

Comment 3

•

14 years ago

Brian, the type information for function C (line 344) is really bad. I'm not sure why but the type of all locals, arg and this is all "unknown". Better type information may get rid of the monitored setelem and is needed to make this benchmark fast.

Brian Hackett [Laid off!]

Assignee

Comment 4

•

14 years ago

First step, assume that arithmetic involving objects produces integers (rather than unknown values), monitor double/string/XML results dynamically. http://hg.mozilla.org/projects/jaegermonkey/rev/897c257e5405

Brian Hackett [Laid off!]

Assignee

Comment 5

•

14 years ago

Several fixes to inference precision. These mostly fall into the category of not overapproximating to capture the behavior of an unlikely event (calling a non-function, indexing an object with a non-integer), but waiting for it to happen dynamically. http://hg.mozilla.org/projects/jaegermonkey/rev/a900619f0e04 http://hg.mozilla.org/projects/jaegermonkey/rev/bff02e9994e3 http://hg.mozilla.org/projects/jaegermonkey/rev/ac5f63354ab2 http://hg.mozilla.org/projects/jaegermonkey/rev/b48f1d51c6f5 There is still some imprecision because of a few accesses to Math which are JSOP_NAME rather than GNAME. That can be fixed similarly to how we do monitored SETNAME/SETPROP, will file separately. For now adding '| 0' to the calls to Math.round and Math.floor should fix precision. I get these times: JM+TI (old): 691 JM+TI (new): 438 JM: 474 JM+TM: 345 V8: 229 Looking at shark, JM seems to be getting killed by the slow paths it takes for the '(...) | 0' operations in C's inner loop. We try to branchTruncateToInt32 on these, but on average about 1.5 times per iteration this truncation fails (the double does not fit in an int32) and we take I think a very expensive slow path. We could instead call an infallible native like the tracer does, but to get to V8's performance I think we should instead generate inline code which is good enough to capture the large finite doubles being truncated here.

Jan de Mooij [:jandem]

Reporter

Updated

•

14 years ago

Assignee: jandemooij → bhackett1024

Brian Hackett [Laid off!]

Assignee

Updated

•

14 years ago

Depends on: 643037

Brian Hackett [Laid off!]

Assignee

Comment 6

•

14 years ago

The patch in bug 643037 fixes the truncate-double paths and brings the JM+TI time down to 361 (still with the modification to the Math calls). Other things that should help: 1. Know that 'a' is an integer within the loop in the C function. It is initially an array but is then reused as an integer. Could use SSA-style techniques to add some flow sensitivity to let the inference distinguish its uses, or (probably better) make a Crankshaft-style speculation it is an integer within the loop body and hoist a check to the loop header. This is worth about 20ms (easy to test by hand). 2. Hoist array bounds checks and LICM the slots pointer computation on the 'd' array. 3. Regalloc tuning on the loop, get another GPR (heavy register pressure here). 4. Typed natives or inline code for various Array natives. Shark thinks we spend a decent amount of time in these.

Brian Hackett [Laid off!]

Assignee

Comment 7

•

14 years ago

Get precise type information for the results of JSOP_NAME/JSOP_CALLNAME (358ms on the unmodified test). http://hg.mozilla.org/projects/jaegermonkey/rev/9ee17aa5f938 This exposed an existing bug in the recompilation mechanism, where we would convert integer slots to doubles even if the recompiled code knew they were integers. The FrameState can be either more or less precise than the inferred types, and we only require args/locals inferred as int|double to be doubles at join points (the revised algorithm only uses the FrameState, not the inferred types, to figure out what needs repatching, and is a good deal cleaner). http://hg.mozilla.org/projects/jaegermonkey/rev/7154281e487b

Jan de Mooij [:jandem]

Reporter

Comment 8

•

12 years ago

We're still slower than V8 on these tests but the info in this bug is unlikely to help much.

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.

Bugzilla

TI: investigate Kraken crypto performance

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: jandem, Assigned: bhackett1024)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Updated

Updated

Comment 6

Comment 7

Comment 8

Attachment

General

Description

File Name

Content Type