Closed Bug 601794 Opened 14 years ago Closed 13 years ago

Excessive startup time due to quadratic VarTracker behavior

Categories

(Tamarin Graveyard :: Virtual Machine, defect, P3)

defect

Tracking

(Not tracked)

VERIFIED FIXED
Q2 11 - Wasabi

People

(Reporter: wmaddox, Assigned: wmaddox)

Details

(Whiteboard: WE:2700709,fixed-in-salt,fixed-in-spicy,fixed-in-spicier)

Attachments

(5 files, 4 obsolete files)

Time spent in trackForwardEdge() is O(number_of_forward_edges * number_of_vars).
Certain mechanically-generated methods can be pathogical.  One method observed in the wild is 1809481 bytes long with with 149988 variables.
In the motivating example of a lengthy initialization method, the code is all straight-line control flow without explicit branches appearing in the ABC.  The problematic forward edges arise due to null checks, which branch to common code to throw an exception on a null pointer.

This patch is a workaround for this case.  We take advantage of the fact that the branch target immediately throws an exception, and that vartracker state is not propagated to the handler in this case.  (This is actually a bit subtle -- it is only because we throw the exception in the epilogue, rather than at the point the null check is performed, that this is the case.)

Note: This patch is against tamarin-redux-argo, though similar issues apply to TR.
Attachment #480806 - Attachment is patch: true
Whiteboard: WE:2700709
tr-argo/deb/shell/avmshell -Dverifyonly  ~/avmglue.abc ~/es29.swf

Patch reduces time for -Dverifyonly from 4m25s to 28s, tr-argo release-debugger build.  Time with -Dinterp is 1s, so clearly there's still a lot of time going to the JIT.

Applied to fp10-salt, es29.swf starts successfully, albeit slowly.
There are multiple cases of ~4:1 verifyonly/interp ratio in this dataset, but the ~9:1 seen with es29.swf is a bit of an outlier.  Absolute verify times are much lower, as are interpreter times.  It's unclear how much any of this means without looking at how much of the swf size is actually do to AS3 code, and the distribution of the method sizes.
Assignee: nobody → wmaddox
Only lists methods if framesize > 60 or codesize > 10000.  out of 30K swfs, there's only 12K 'large' methods, and only a handful look like outliers.
(In reply to comment #2)

> Applied to fp10-salt, es29.swf starts successfully, albeit slowly.

For release builds, es29.swf starts in 14s with the patch, compared to 24s on fp10-coral, so we're seeing a nice speedup from 10.1 once the regression is mitigated.
TR-argo passes acceptance tests with this patch.
Patch needs to be carefully reviewed for correctness issues.
Attachment #480806 - Attachment is obsolete: true
Attachment #480995 - Flags: review?(edwsmith)
Attachment #480995 - Flags: feedback?(rreitmai)
The code might be slightly cleaner if the trackEdge flag were on CodegenLabel.  We don't need it for this app, but the same flag probably makes sense for the other handlers too (upe, npe, interrupt, mops-range-check).

For the sake of Salt or another maintenance release, we should keep the scope limited, and the patch is probably fine as-is (pending more testing).
Attachment #480995 - Flags: superreview?(edwsmith)
Attachment #480995 - Flags: review?(rreitmai)
Attachment #480995 - Flags: review?(edwsmith)
Attachment #480995 - Flags: feedback?(rreitmai)
Also, not that it really matter much, but just curious why we made it a generic trackEdge flag rather than trackForwardEdge flag, seeing as this is supposed to be a short-lived spot fix, no?
(In reply to comment #8)
> Also, not that it really matter much, but just curious why we made it a generic
> trackEdge flag rather than trackForwardEdge flag, seeing as this is supposed to
> be a short-lived spot fix, no?

It's not clear that it will be short-lived, in the sense that TR will likely need something similar until selective translation / OSR is implemented.  That said, the only edges that are not tracked are known to be forward edges, so it's not strictly necessary.  I had in mind the notion of "not tracking an edge" and implementing it completely, rather than specializing the API to its only use. That would save a few instructions.
Someone earlier had the idea to just punt to the interpreter if a function is too big.  I didn't like the idea because of bug 561368, and its hard to define "too big".  But, there's an easy sub-case we could handle.

if (framesize*8 + K > NJ_MAX_STACK_ENTRY*4), where K is a conservative estimate of the extra space we always need, e.g. sizeof(MethodFrame) and other odds+ends, then the JIT is guaranteed to fail, but not until we reach the Assembler phase.  We could just fail sooner and faster.  Even K=0 is okay.  Here, "conservative" means we only fail early if the assembler is guaranteed to fail anyway.

I still like the earlier patch as a refinement for TR, but failing sooner is so simple/easy we should probably do it, and it will solve the problem in WE:2700709 for Salt.
Status: NEW → ASSIGNED
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.2.x-Spicy
Target Milestone: flash10.2.x-Spicy → flash10.1.x-Salt
VarTracker is not in FP9, no backport required.

Is this an injection?
Yes, the extreme slowness was an Argo injection (see the Watson notes).
Attachment #481611 - Flags: feedback?(edwsmith)
Attachment #481611 - Attachment is patch: true
Comment on attachment 481611 [details] [diff] [review]
Fail-fast for JIT when framesize too large for NanoJIT stack limit

Looks exactly like what I'd imagined.

Does it work? specifically:
 - makes the test app start fast(er)
 - jit doesn't fail any more often than before

Nit: jitMustFail() probably should be a static method on CodegenLIR.
Nit: calculation of jitMustFail could be after if ((core->isJITEnabled()...
Attachment #481611 - Flags: feedback?(edwsmith) → feedback+
Attachment #481611 - Attachment is obsolete: true
Attachment #481721 - Flags: review?(edwsmith)
Attachment #481721 - Flags: feedback?(rreitmai)
Comment on attachment 481721 [details] [diff] [review]
Fail-fast for JIT when framesize too large for NanoJIT stack limit   (v2)

While focusing on testing the assertion, I broke the non-debug case, and was too quick on the trigger to post the revised before retesting the non-debug case.  New patch forthcoming.
Attachment #481721 - Flags: review?(edwsmith)
Attachment #481721 - Flags: feedback?(rreitmai)
Revised per Edwin's comments, with some infrastructure for testing agreement between fast-fail test and actual JIT failure.  Assertion is OK on testcase es29.swf (-Dverifyonly), the cases that JIT fail in the top 100 AS3 swfs (by ABC size), and an artificial test case with lots of local variables showing exactly the point at which framesize-related failure occurs.  It turns out the metric used is rather conservative (K = 0), but it should be safe and robust and is adequate to the purpose at hand.

In fp10-salt with this fix, es29.swf initializes in ~5s.
Attachment #481721 - Attachment is obsolete: true
Attachment #481741 - Flags: review?(edwsmith)
Attachment #481741 - Flags: review?(rreitmai)
Attachment #481743 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 480995 [details] [diff] [review]
Workaround for lengthy initialization methods (v2, with explanatory comment)

(In reply to comment #17)

> Assertion is OK on testcase
> es29.swf (-Dverifyonly), the cases that JIT fail in the top 100 AS3 swfs (by
> ABC size), and an artificial test case with lots of local variables showing
> exactly the point at which framesize-related failure occurs.

When I first read this, I thought you were saying "its okay for the assert to fire" (no its not).  But I reread it and I think you mean "the assert doesn't fire", which is good.

Functionally looks right.  nit: I would have passed in a framesize or MethodSignaturep into jitCanFail() instead of Verifier& (in TR, CodegenLIR never references Verifier directly), but its fine, lets ship it.
Attachment #480995 - Flags: superreview?(edwsmith) → superreview+
Comment on attachment 481741 [details] [diff] [review]
Fail-fast for JIT when framesize too large for NanoJIT stack limit (v3)

Looks good.
Attachment #481741 - Flags: review?(rreitmai) → review+
Attachment #480995 - Flags: review?(rreitmai) → review+
Attachment #481741 - Flags: superreview?(edwsmith)
Attachment #481741 - Flags: review?(edwsmith)
Attachment #481741 - Flags: review+
Attachment #481741 - Flags: review?(rreitmai)
Comment on attachment 480995 [details] [diff] [review]
Workaround for lengthy initialization methods (v2, with explanatory comment)

This patch is obsolete, though something similar is appropriate for TR.
Attachment #480995 - Attachment is obsolete: true
Attachment #481741 - Flags: superreview?(edwsmith) → superreview+
Attachment #481741 - Flags: review?(rreitmai) → review+
submit to salt cl 729458
Whiteboard: WE:2700709 → WE:2700709,fixed-in-salt
This patch adds fast-fail logic to TR.  The semantics follow the following patch to Salt: https://bugzilla.mozilla.org/attachment.cgi?id=484923&action=edit

Specifically, we normally fall back to the interpreter for large frame sizes, but -Djitordie is unaffected, allowing the JIT to execute and possibly fail.
Attachment #486502 - Flags: superreview?(edwsmith)
Attachment #486502 - Flags: review?(rreitmai)
The underlying issue reported in WE:2700709 is present in TR since the Argo release, thus this fix should be applied to all subsequent TR branches to date.
(excepting Salt, per comment #22)
Target Milestone: flash10.1.x-Salt → flash10.2.x-Spicy
Comment on attachment 486502 [details] [diff] [review]
Fail-fast for JIT when framesize too large for NanoJIT stack limit (for TR)

R+ since looks fine; couple of nits.   As its now grown quite complex, splitting the shouldJIT() logic across multiple lines might make parsing easier.

Also outside the scope of this bug but handy nonetheless, having a more verbose description of exec policy might be helpful, e.g. interp due 'to large method' , interp due 'static init method'
Attachment #486502 - Flags: review?(rreitmai) → review+
Comment on attachment 486502 [details] [diff] [review]
Fail-fast for JIT when framesize too large for NanoJIT stack limit (for TR)

Code looks fine.  How is test coverage looking?  This has enough subtlety to demand 100% branch coverage of shouldJit(), I think.
Attachment #486502 - Flags: superreview?(edwsmith) → superreview+
Target Milestone: flash10.2.x-Spicy → flash10.x-Wasabi
Possible future work item for improving vartracker performance moved to bug 631303 .  Closing this bug.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: WE:2700709,fixed-in-salt → WE:2700709,fixed-in-salt,fixed-in-spicy,fixed-in-spicier
Fix submitted to salt with P4 CL: 729458, Verified fixed by player qe with player 10.1.102.57
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: