Closed
Bug 479109
Opened 17 years ago
Closed 17 years ago
TM: Improve blacklist
Categories
(Core :: JavaScript Engine, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla1.9.2a1
People
(Reporter: gal, Assigned: gal)
References
Details
(Keywords: fixed1.9.1, Whiteboard: fixed-in-tracemonkey)
Attachments
(1 file, 17 obsolete files)
|
41.33 KB,
patch
|
graydon
:
review+
|
Details | Diff | Splinter Review |
No description provided.
| Assignee | ||
Updated•17 years ago
|
Assignee: general → gal
| Assignee | ||
Updated•17 years ago
|
Flags: blocking1.9.1?
Priority: -- → P1
Target Milestone: --- → mozilla1.9.1b3
Comment 1•17 years ago
|
||
Hey Andreas, if you get a second, could you add a touch more context here so we can understand sizing/impact for a blocking decision? No rush.
| Assignee | ||
Comment 2•17 years ago
|
||
Sorry for the **** description. I split the original bug in two and didn't fill in the blanks here.
We can't trace some code for various reasons (i.e. we didn't implement a certain opcode or native). In this case we start compiling, hit an abort, and then don't want to compile again. We call this "blacklisting". Blacklisting is not absolute, because the abort point is different from the start point, and we might end up along a different path that we can compile just fine from the same starting point. So we use an exponential backoff.
The current mechanism works somewhat, but not good enough. For some code where we don't trace much and abort most the time, we can incur 50% slowdown over JIT disabled. The goal of this bug is to be smarter about inhibiting tracing after repeated failures, i.e. back off harder than exponential if we keep dying along the same path over and over. In other words we don't want to suck as much if we can't trace to limit our performance downside.
| Assignee | ||
Comment 3•17 years ago
|
||
Using a minimal heuristics that starts at hot == 2 and sets hot = -1<<5 when blacklisting, without any other blacklisting tuning points, we get the following significant aborts on SS:
t/3d-raytrace.js recorder: started(924), aborted(894), completed(179), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(70), breaks(0), returns(4), unstableInnerCalls(22)
t/access-binary-trees.js recorder: started(406), aborted(406), completed(0), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(6)
t/access-fannkuch.js recorder: started(57), aborted(26), completed(57), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(13), breaks(4), returns(0), unstableInnerCalls(21)
t/crypto-aes.js recorder: started(244), aborted(174), completed(76), different header(6), trees trashed(0), slot promoted(0), unstable loop variable(11), breaks(0), returns(0), unstableInnerCalls(8)
t/crypto-md5.js recorder: started(4), aborted(0), completed(5), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(0)
t/crypto-sha1.js recorder: started(5), aborted(0), completed(10), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(0)
t/date-format-tofte.js recorder: started(502), aborted(5995), completed(4), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(1), unstableInnerCalls(0)
t/date-format-xparb.js recorder: started(10766), aborted(10764), completed(8), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(0)
t/string-fasta.js recorder: started(18), aborted(7), completed(21), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(2), breaks(4), returns(0), unstableInnerCalls(4)
t/string-tagcloud.js recorder: started(7504), aborted(7497), completed(7), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(4), breaks(0), returns(0), unstableInnerCalls(0)
| Assignee | ||
Comment 4•17 years ago
|
||
This blacklisting algorithm is greatly simplified. It tracks hit counts and blacklists after N attempts of failing to trace from a tree entry point.
| Assignee | ||
Comment 5•17 years ago
|
||
Attachment #363230 -
Attachment is obsolete: true
Comment 6•17 years ago
|
||
I did a bit of archaeology on the existing tweaks to the blacklisting algorithm. There were 4 places that tweak something, all in js_RecordLoopEdge and relating to inner trees. They were all introduced in http://hg.mozilla.org/tracemonkey/rev/74dd5792d4de without any rationale documented. I found that 3 of the 4 can be removed with no apparent perf loss, and in fact a small perf gain. The last one:
if (old->recordAttempts < MAX_MISMATCH)
oracle.resetHits(old->ip);
is 'necessary' for fannkuch (we get 80ms loss without it) but otherwise costs us about 25 ms on the rest of SS. This is resetHits:
/* Reset the hit count for an jump-target and relax the blacklist count. */
void
Oracle::resetHits(const void* ip)
{
size_t h = hitHash(ip);
if (hits[h] > 0)
hits[h]--;
if (blacklistLevels[h] > 0)
blacklistLevels[h]--;
}
The comment doesn't match the code. It turns out that removing the part that adjusts hits has no effect: the blacklist part is the part that helps fannkuch. The effect of the fannkuch-specific check is thus to undo the blacklisting of an outer tree the first 20 times it is aborted and blacklisted due to the "no compatible inner tree" condition. Presumably in fannkuch it takes about 20 runs of the outer tree before the inner tree becomes available to call, but at that point it typically is available.
| Assignee | ||
Comment 7•17 years ago
|
||
Attachment #363231 -
Attachment is obsolete: true
Comment 8•17 years ago
|
||
(In reply to comment #7)
> Created an attachment (id=363233) [details]
> v2
With this one, we take a 10-15ms bath on 3d-raytrace. But setting MAXPEERS=8 fixes that. I recommend this tune, which gives us an almost 20ms win vs. tip:
#define MAXPEERS 8
inline void
Oracle::blacklist(const void* tree, const void* stop)
{
int32_t& h = hits(tree);
h = -(1<<6);
if (_blacklist[blHash(tree, stop)]++ > 2)
h = INT_MIN;
}
TEST COMPARISON FROM TO DETAILS
=============================================================================
** TOTAL **: 1.015x as fast 1188.8ms +/- 0.1% 1170.9ms +/- 0.2% significant
=============================================================================
3d: *1.018x as slow* 173.7ms +/- 0.6% 176.8ms +/- 0.4% significant
cube: ?? 43.3ms +/- 0.8% 43.6ms +/- 0.8% not conclusive: might be *1.007x as slow*
morph: - 33.7ms +/- 1.0% 33.3ms +/- 1.4%
raytrace: *1.033x as slow* 96.7ms +/- 0.8% 99.9ms +/- 0.7% significant
access: *1.013x as slow* 153.0ms +/- 0.9% 155.0ms +/- 0.5% significant
binary-trees: - 44.1ms +/- 1.8% 43.9ms +/- 0.5%
fannkuch: *1.035x as slow* 66.3ms +/- 0.5% 68.6ms +/- 0.7% significant
nbody: ?? 28.1ms +/- 0.8% 28.4ms +/- 1.3% not conclusive: might be *1.011x as slow*
nsieve: - 14.5ms +/- 2.6% 14.1ms +/- 1.6%
bitops: - 40.6ms +/- 1.5% 40.1ms +/- 1.8%
3bit-bits-in-byte: - 1.8ms +/- 16.7% 1.8ms +/- 16.7%
bits-in-byte: - 9.2ms +/- 3.3% 9.1ms +/- 2.5%
bitwise-and: - 2.8ms +/- 10.8% 2.8ms +/- 10.8%
nsieve-bits: - 26.8ms +/- 1.1% 26.4ms +/- 1.4%
controlflow: - 36.8ms +/- 0.8% 36.4ms +/- 1.0%
recursive: - 36.8ms +/- 0.8% 36.4ms +/- 1.0%
crypto: 1.039x as fast 71.4ms +/- 0.8% 68.7ms +/- 1.1% significant
aes: 1.069x as fast 40.5ms +/- 1.2% 37.9ms +/- 0.6% significant
md5: - 23.1ms +/- 1.0% 23.0ms +/- 2.1%
sha1: - 7.8ms +/- 3.9% 7.8ms +/- 3.9%
date: 1.026x as fast 194.2ms +/- 0.3% 189.2ms +/- 0.3% significant
format-tofte: 1.034x as fast 76.6ms +/- 0.5% 74.1ms +/- 0.7% significant
format-xparb: 1.022x as fast 117.6ms +/- 0.4% 115.1ms +/- 0.2% significant
math: - 46.9ms +/- 1.3% 46.4ms +/- 1.5%
cordic: ?? 21.5ms +/- 1.8% 21.6ms +/- 1.7% not conclusive: might be *1.005x as slow*
partial-sums: 1.029x as fast 17.8ms +/- 1.7% 17.3ms +/- 2.0% significant
spectral-norm: - 7.6ms +/- 4.9% 7.5ms +/- 5.0%
regexp: - 50.6ms +/- 1.0% 50.4ms +/- 0.7%
dna: - 50.6ms +/- 1.0% 50.4ms +/- 0.7%
string: 1.034x as fast 421.6ms +/- 0.2% 407.9ms +/- 0.3% significant
base64: - 18.6ms +/- 2.0% 18.3ms +/- 1.9%
fasta: 1.035x as fast 88.0ms +/- 0.5% 85.0ms +/- 0.7% significant
tagcloud: 1.024x as fast 116.1ms +/- 0.5% 113.4ms +/- 0.6% significant
unpack-code: 1.045x as fast 162.6ms +/- 0.3% 155.6ms +/- 0.3% significant
validate-input: 1.020x as fast 36.3ms +/- 1.0% 35.6ms +/- 1.0% significant
| Assignee | ||
Comment 9•17 years ago
|
||
Attachment #363233 -
Attachment is obsolete: true
| Assignee | ||
Comment 10•17 years ago
|
||
Attachment #363239 -
Attachment is obsolete: true
Comment 11•17 years ago
|
||
On my machine, the best SS tune is
BL_BACKOFF 16
BL_ATTEMPTS 2
HOTEXIT 2
MAXEXIT 2
MAXPEERS 10
Because before you didn't get good results on your machine with BL_ATTEMPTS=2, I fixed it at 4 and got:
BL_BACKOFF 32
BL_ATTEMPTS 4
HOTEXIT 2
MAXEXIT 3
MAXPEERS 8
Comment 12•17 years ago
|
||
v8 appears to be pretty insensitive to these tuning parameters. Either set given in comment 11 looks pretty good and both are about the same.
| Assignee | ||
Comment 13•17 years ago
|
||
Could you please try the parameters with ./bench.sh? I get really bad results with that. Might be oracle collisions. HOTEXIT != 1 seems to be the problem.
Comment 14•17 years ago
|
||
(In reply to comment #2)
> The current mechanism works somewhat, but not good enough. For some code where
> we don't trace much and abort most the time, we can incur 50% slowdown over JIT
> disabled.
And that's why this bug should be a blocker.
/be
| Assignee | ||
Comment 15•17 years ago
|
||
Attachment #363244 -
Attachment is obsolete: true
| Assignee | ||
Comment 16•17 years ago
|
||
Attachment #363433 -
Attachment is obsolete: true
| Assignee | ||
Comment 17•17 years ago
|
||
v5 is a speedup for SS and gets us to around 3% slowdown on the unpatched date example, which used to produce > 2x slowdown. We are doing a little bit more analysis but this should be done very soon.
| Assignee | ||
Updated•17 years ago
|
Attachment #363434 -
Flags: review?(graydon)
| Assignee | ||
Comment 18•17 years ago
|
||
Comment 19•17 years ago
|
||
Here is a slight retune of v5 that gives me slightly better results:
'BL_ATTEMPTS': 7, (was 8)
'HOTEXIT': 1, (was 2)
'MAXPEERS': 9, (was 8)
but it's really a tiny difference. I also suggest blacklisting a trace when the "unsupported opcode" condition is hit, i.e., this code is run near line 4359 is run:
abort_recording:
js_AbortRecordingHard(cx, js_CodeName[op]);
My measurements show recording is almost never successful after this happens. Blacklisting at this point seems to give a slight SS speedup. It would also be reasonable to do a backoff, but a more aggressive one (backoff by more and blacklist after fewer attempts) but that seems unnecessarily complicated.
In my debug builds, I sometimes saw a 'reason' value for an abort as 'EnterFrame' or 'LeaveFrame' but now I can't tell which call sites generate those values. But my measurements showed that 'EnterFrame' is like unsupported op: might as well blacklist at that point, while LeaveFrame is like the other aborts, where a backoff is appropriate. So if that distinction can be made it might help another tiny little bit.
For some reason I don't see an improvement in bench.sh, but it doesn't get worse, either.
v5 with and without these tweaks is about the same on the v8 test, and a definite improvement over tip.
| Assignee | ||
Comment 20•17 years ago
|
||
Attachment #363445 -
Attachment is obsolete: true
| Assignee | ||
Comment 21•17 years ago
|
||
Attachment #363446 -
Attachment is obsolete: true
Comment 22•17 years ago
|
||
The basic twiddling of counts and blacklist formulae are clearly just matters of empirical measurement, no complaints there. The other changes are .. less clear to me.
- blHash is unused, delete it.
- I don't know why it's safe to avoid blacklisting the outer tree, or how things have changed to make the removal ok.
- I don't know why it's ok to remove the reset of hits on unstable exits and branch exits (or why they were there in the first place)
- In attemptToExtendTree, you're using the fragment's internal hit counter rather than the oracle's. I don't know if this is intentional or why it's correct if so; nothing else consults that counter.
I'll r+ if you can explain the last 3 points and agree to remove the dead function in the first.
Comment 23•17 years ago
|
||
I tried to run the 2d physics page with the experimental patch but I crashed:
0 libmozjs.dylib 0x001c6d03 TraceRecorder::snapshot(ExitType) + 279 (jstracer.cpp:2064)
1 libmozjs.dylib 0x001d2deb TraceRecorder::TraceRecorder(JSContext*, VMSideExit*, nanojit::Fragment*, TreeInfo*, unsigned int, unsigned int, unsigned char*, VMSideExit*) + 951
2 libmozjs.dylib 0x001d2f35 js_StartRecorder(JSContext*, VMSideExit*, nanojit::Fragment*, TreeInfo*, unsigned int, unsigned int, unsigned char*, VMSideExit*, nanojit::Fragment*) + 129
3 libmozjs.dylib 0x001d3274 js_RecordTree(JSContext*, JSTraceMonitor*, nanojit::Fragment*, nanojit::Fragment*, unsigned int, Queue<unsigned short>*) + 406
4 libmozjs.dylib 0x001d3601 js_AttemptToStabilizeTree(JSContext*, VMSideExit*, nanojit::Fragment*) + 681
5 libmozjs.dylib 0x001d3da5 js_MonitorLoopEdge(JSContext*, unsigned int&, int) + 991
6 libmozjs.dylib 0x0016ac32 js_Interpret + 53714 (jsinterp.cpp:3721)
Comment 24•17 years ago
|
||
(In reply to comment #22)
> - I don't know why it's safe to avoid blacklisting the outer tree, or how
> things have changed to make the removal ok.
> - I don't know why it's ok to remove the reset of hits on unstable exits and
> branch exits (or why they were there in the first place)
I discussed these issues in comment 6, along with a pointer to the changeset where those heuristics were introduced. I was never able to find any evidence that they did anything useful; it's not even clear they even did what they were originally intended to. Maybe dvander remembers what they were for.
Comment 25•17 years ago
|
||
(In reply to comment #24)
> I discussed these issues in comment 6, along with a pointer to the changeset
> where those heuristics were introduced. I was never able to find any evidence
> that they did anything useful; it's not even clear they even did what they were
> originally intended to. Maybe dvander remembers what they were for.
Ah. Well enough then; didn't realize that's what you meant by "3 of 4 can be removed". Otherwise fine by me if you think the physics page crash is unrelated (and delete the dead function). Maybe run a larger pageset to test? We have some multitree bugs in progress that are sort of pushed out into the light by any twiddling of the oracle, so it's not clearly the "fault" of this patch. But it might be.
Comment 26•17 years ago
|
||
Oh, you're talking about a different patch, "experimental" vs. non. Oops.
Updated•17 years ago
|
Attachment #363434 -
Flags: review?(graydon) → review+
| Assignee | ||
Comment 27•17 years ago
|
||
Not sure whether all concerns were addressed, so asking for r again.
Attachment #363434 -
Attachment is obsolete: true
Attachment #363447 -
Attachment is obsolete: true
Attachment #363590 -
Flags: review?(graydon)
Comment 28•17 years ago
|
||
(In reply to comment #14)
>
> And that's why this bug should be a blocker.
yep yep. this was split off from a blocker we filed.
Flags: blocking1.9.1? → blocking1.9.1+
| Assignee | ||
Comment 29•17 years ago
|
||
Note for graydon: the reason we don't need resetHits any more is that we don't use an exponential backoff. For each attempt we back off by 32 and then blacklist permanently. The resetHits magic was necessary because by the time the outer tree was ready, we would already wait 512 iterations. Now we just wait 32, as long its within the permissible number of attempts.
Comment 30•17 years ago
|
||
Comment on attachment 363590 [details] [diff] [review]
v6
Hmm, I'm still going to fuss a bit here. Happy to r+ when I understand fully, but:
- Why is do js_Blacklist and js_Backoff move one-peer into the peer list? Shouldn't all peers in a peer list have the same ip?
- Why is js_AttemptToExtendTree using c->hits()? this is a per-peer count embedded in the fragment: seems to me it'll be possible for you to build up MAXPEERS * MAXEXIT (27) branches off each possible ip; is this the intent?
| Assignee | ||
Updated•17 years ago
|
Attachment #363590 -
Flags: review?(graydon) → review-
| Assignee | ||
Comment 31•17 years ago
|
||
Comment on attachment 363590 [details] [diff] [review]
v6
I will post a new patch and an explanation in a bit.
| Assignee | ||
Comment 32•17 years ago
|
||
Attachment #363590 -
Attachment is obsolete: true
Attachment #363725 -
Flags: review?(graydon)
Comment 33•17 years ago
|
||
Comment on attachment 363725 [details] [diff] [review]
v7
looks fine
Attachment #363725 -
Flags: review?(graydon) → review+
| Assignee | ||
Comment 34•17 years ago
|
||
Whiteboard: fixed-in-tracemonkey
| Assignee | ||
Comment 35•17 years ago
|
||
New patch with script code patching to avoid calls to MonitorLoopEdge after blacklisting. This patch also eliminates the colliding hash since now we are pretty fast after blacklisting anyway. It seems I need a pretty high BL_ATTEMPTS number to get fasta to go fast, since outer trees need to be attempted a few times until they actually get compiled. This used to be taken care of by resetHits. We might want to add some mechanism back. Maybe after every successful inner-tree compilation allow the outer tree to immediately re-record? Handing the patch over to dmandelin to optimize the parameters. This seem to be faster than the previous patch.
Attachment #363725 -
Attachment is obsolete: true
| Assignee | ||
Comment 36•17 years ago
|
||
Attachment #363807 -
Attachment is obsolete: true
Updated•17 years ago
|
Whiteboard: fixed-in-tracemonkey
Comment 37•17 years ago
|
||
(In reply to comment #35)
> It seems I need a pretty high
> BL_ATTEMPTS number to get fasta to go fast, since outer trees need to be
> attempted a few times until they actually get compiled. This used to be taken
> care of by resetHits. We might want to add some mechanism back.
My study of blacklist events showed that after most of the inner-tree related events we are in fact able to later compile successfully. (The only exception was 'Inner tree took different side exit, abort recording', but I don't know if that is true in general or only incidentally in SS.) This supports the idea that for these events, we want to wait a bit (back off), but don't need to blacklist permanently.
As a first cut, you might want to try just not incrementing the attempts counter for these guys. If we need more complexity, we can have some classes of events count for more progress toward blacklisting than others. So, we can have a counter that increments by A for inner-tree events, B for most other aborts, and C for the bad stuff (recursion and unsupported ops), blacklist permanently when the counter gets to 100, and tune A, B, and C.
Comment 38•17 years ago
|
||
jsfunfuzz seems happy with this patch.
| Assignee | ||
Comment 39•17 years ago
|
||
Attachment #363808 -
Attachment is obsolete: true
| Assignee | ||
Comment 40•17 years ago
|
||
Attachment #364178 -
Attachment is obsolete: true
Attachment #364179 -
Flags: review?(graydon)
| Assignee | ||
Comment 41•17 years ago
|
||
Attachment #364179 -
Attachment is obsolete: true
Attachment #364179 -
Flags: review?(graydon)
| Assignee | ||
Comment 42•17 years ago
|
||
Attachment #364181 -
Attachment is obsolete: true
Attachment #364183 -
Flags: review?(graydon)
Comment 43•17 years ago
|
||
Comment on attachment 364183 [details] [diff] [review]
v10, ready for review
The blacklisting contains an assert *pc==JSOP_LOOP, you suggested in IRC changing this to include || *pc==JSOP_NOP to handle races. Not sure who would be racing us except ourselves, but perhaps some feedback inside another tree helper.
Otherwise looks lovely. Lots of hairballs deleted. If it still works and doesn't regress, I'm happy.
Attachment #364183 -
Flags: review?(graydon) → review+
| Assignee | ||
Comment 44•17 years ago
|
||
Whiteboard: fixed-in-tracemonkey
| Assignee | ||
Comment 45•17 years ago
|
||
Followup fix: http://hg.mozilla.org/tracemonkey/rev/c63cf255ec3b
Comment 46•17 years ago
|
||
Keywords: fixed1.9.1
Updated•17 years ago
|
Comment 47•17 years ago
|
||
Comment 48•17 years ago
|
||
Keywords: fixed1.9.1
Updated•17 years ago
|
Target Milestone: mozilla1.9.1b3 → mozilla1.9.2a1
Comment 49•17 years ago
|
||
The checked-in patch on this bug causes crashes in latest builds on trunk and 1.9.1. This is filed as bug 481060.
Updated•17 years ago
|
Flags: in-testsuite-
You need to log in
before you can comment on or make changes to this bug.
Description
•