Status

()

defect
P1
normal
RESOLVED FIXED
10 years ago
9 years ago

People

(Reporter: gal, Assigned: gal)

Tracking

({fixed1.9.1})

unspecified
mozilla1.9.2a1
x86
macOS
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.9.1 +
in-testsuite -

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 attachment, 17 obsolete attachments)

41.33 KB, patch
graydon
: review+
Details | Diff | Splinter Review
Comment hidden (empty)
(Assignee)

Updated

10 years ago
Assignee: general → gal
(Assignee)

Updated

10 years ago
Flags: blocking1.9.1?
Priority: -- → P1
Target Milestone: --- → mozilla1.9.1b3
Hey Andreas, if you get a second, could you add a touch more context here so we can understand sizing/impact for a blocking decision? No rush.
(Assignee)

Comment 2

10 years ago
Sorry for the shitty description. I split the original bug in two and didn't fill in the blanks here.

We can't trace some code for various reasons (i.e. we didn't implement a certain opcode or native). In this case we start compiling, hit an abort, and then don't want to compile again. We call this "blacklisting". Blacklisting is not absolute, because the abort point is different from the start point, and we might end up along a different path that we can compile just fine from the same starting point. So we use an exponential backoff.

The current mechanism works somewhat, but not good enough. For some code where we don't trace much and abort most the time, we can incur 50% slowdown over JIT disabled. The goal of this bug is to be smarter about inhibiting tracing after repeated failures, i.e. back off harder than exponential if we keep dying along the same path over and over. In other words we don't want to suck as much if we can't trace to limit our performance downside.
(Assignee)

Comment 3

10 years ago
Using a minimal heuristics that starts at hot == 2 and sets hot = -1<<5 when blacklisting, without any other blacklisting tuning points, we get the following significant aborts on SS:

t/3d-raytrace.js recorder: started(924), aborted(894), completed(179), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(70), breaks(0), returns(4), unstableInnerCalls(22)
t/access-binary-trees.js recorder: started(406), aborted(406), completed(0), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(6)
t/access-fannkuch.js recorder: started(57), aborted(26), completed(57), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(13), breaks(4), returns(0), unstableInnerCalls(21)
t/crypto-aes.js recorder: started(244), aborted(174), completed(76), different header(6), trees trashed(0), slot promoted(0), unstable loop variable(11), breaks(0), returns(0), unstableInnerCalls(8)
t/crypto-md5.js recorder: started(4), aborted(0), completed(5), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(0)
t/crypto-sha1.js recorder: started(5), aborted(0), completed(10), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(0)
t/date-format-tofte.js recorder: started(502), aborted(5995), completed(4), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(1), unstableInnerCalls(0)
t/date-format-xparb.js recorder: started(10766), aborted(10764), completed(8), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(0)
t/string-fasta.js recorder: started(18), aborted(7), completed(21), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(2), breaks(4), returns(0), unstableInnerCalls(4)
t/string-tagcloud.js recorder: started(7504), aborted(7497), completed(7), different header(0), trees trashed(0), slot promoted(0), unstable loop variable(4), breaks(0), returns(0), unstableInnerCalls(0)
(Assignee)

Comment 4

10 years ago
This blacklisting algorithm is greatly simplified. It tracks hit counts and blacklists after N attempts of failing to trace from a tree entry point.
(Assignee)

Comment 5

10 years ago
Posted patch patch, this time for real (obsolete) — Splinter Review
Attachment #363230 - Attachment is obsolete: true
I did a bit of archaeology on the existing tweaks to the blacklisting algorithm. There were 4 places that tweak something, all in js_RecordLoopEdge and relating to inner trees. They were all introduced in http://hg.mozilla.org/tracemonkey/rev/74dd5792d4de without any rationale documented. I found that 3 of the 4 can be removed with no apparent perf loss, and in fact a small perf gain. The last one:

            if (old->recordAttempts < MAX_MISMATCH)
                oracle.resetHits(old->ip);

is 'necessary' for fannkuch (we get 80ms loss without it) but otherwise costs us about 25 ms on the rest of SS. This is resetHits:

/* Reset the hit count for an jump-target and relax the blacklist count. */
void
Oracle::resetHits(const void* ip)
{
    size_t h = hitHash(ip);
    if (hits[h] > 0)
        hits[h]--;
    if (blacklistLevels[h] > 0)
        blacklistLevels[h]--;
}

The comment doesn't match the code. It turns out that removing the part that adjusts hits has no effect: the blacklist part is the part that helps fannkuch. The effect of the fannkuch-specific check is thus to undo the blacklisting of an outer tree the first 20 times it is aborted and blacklisted due to the "no compatible inner tree" condition. Presumably in fannkuch it takes about 20 runs of the outer tree before the inner tree becomes available to call, but at that point it typically is available.
(Assignee)

Comment 7

10 years ago
Posted patch v2 (obsolete) — Splinter Review
Attachment #363231 - Attachment is obsolete: true
(In reply to comment #7)
> Created an attachment (id=363233) [details]
> v2

With this one, we take a 10-15ms bath on 3d-raytrace. But setting MAXPEERS=8 fixes that. I recommend this tune, which gives us an almost 20ms win vs. tip:

#define MAXPEERS 8

inline void
Oracle::blacklist(const void* tree, const void* stop)
{
    int32_t& h = hits(tree);
    h = -(1<<6);
    if (_blacklist[blHash(tree, stop)]++ > 2)
        h = INT_MIN;
}

TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           1.015x as fast    1188.8ms +/- 0.1%   1170.9ms +/- 0.2%     significant

=============================================================================

  3d:                  *1.018x as slow*   173.7ms +/- 0.6%    176.8ms +/- 0.4%     significant
    cube:              ??                  43.3ms +/- 0.8%     43.6ms +/- 0.8%     not conclusive: might be *1.007x as slow*
    morph:             -                   33.7ms +/- 1.0%     33.3ms +/- 1.4% 
    raytrace:          *1.033x as slow*    96.7ms +/- 0.8%     99.9ms +/- 0.7%     significant

  access:              *1.013x as slow*   153.0ms +/- 0.9%    155.0ms +/- 0.5%     significant
    binary-trees:      -                   44.1ms +/- 1.8%     43.9ms +/- 0.5% 
    fannkuch:          *1.035x as slow*    66.3ms +/- 0.5%     68.6ms +/- 0.7%     significant
    nbody:             ??                  28.1ms +/- 0.8%     28.4ms +/- 1.3%     not conclusive: might be *1.011x as slow*
    nsieve:            -                   14.5ms +/- 2.6%     14.1ms +/- 1.6% 

  bitops:              -                   40.6ms +/- 1.5%     40.1ms +/- 1.8% 
    3bit-bits-in-byte: -                    1.8ms +/- 16.7%      1.8ms +/- 16.7% 
    bits-in-byte:      -                    9.2ms +/- 3.3%      9.1ms +/- 2.5% 
    bitwise-and:       -                    2.8ms +/- 10.8%      2.8ms +/- 10.8% 
    nsieve-bits:       -                   26.8ms +/- 1.1%     26.4ms +/- 1.4% 

  controlflow:         -                   36.8ms +/- 0.8%     36.4ms +/- 1.0% 
    recursive:         -                   36.8ms +/- 0.8%     36.4ms +/- 1.0% 

  crypto:              1.039x as fast      71.4ms +/- 0.8%     68.7ms +/- 1.1%     significant
    aes:               1.069x as fast      40.5ms +/- 1.2%     37.9ms +/- 0.6%     significant
    md5:               -                   23.1ms +/- 1.0%     23.0ms +/- 2.1% 
    sha1:              -                    7.8ms +/- 3.9%      7.8ms +/- 3.9% 

  date:                1.026x as fast     194.2ms +/- 0.3%    189.2ms +/- 0.3%     significant
    format-tofte:      1.034x as fast      76.6ms +/- 0.5%     74.1ms +/- 0.7%     significant
    format-xparb:      1.022x as fast     117.6ms +/- 0.4%    115.1ms +/- 0.2%     significant

  math:                -                   46.9ms +/- 1.3%     46.4ms +/- 1.5% 
    cordic:            ??                  21.5ms +/- 1.8%     21.6ms +/- 1.7%     not conclusive: might be *1.005x as slow*
    partial-sums:      1.029x as fast      17.8ms +/- 1.7%     17.3ms +/- 2.0%     significant
    spectral-norm:     -                    7.6ms +/- 4.9%      7.5ms +/- 5.0% 

  regexp:              -                   50.6ms +/- 1.0%     50.4ms +/- 0.7% 
    dna:               -                   50.6ms +/- 1.0%     50.4ms +/- 0.7% 

  string:              1.034x as fast     421.6ms +/- 0.2%    407.9ms +/- 0.3%     significant
    base64:            -                   18.6ms +/- 2.0%     18.3ms +/- 1.9% 
    fasta:             1.035x as fast      88.0ms +/- 0.5%     85.0ms +/- 0.7%     significant
    tagcloud:          1.024x as fast     116.1ms +/- 0.5%    113.4ms +/- 0.6%     significant
    unpack-code:       1.045x as fast     162.6ms +/- 0.3%    155.6ms +/- 0.3%     significant
    validate-input:    1.020x as fast      36.3ms +/- 1.0%     35.6ms +/- 1.0%     significant
(Assignee)

Comment 9

10 years ago
Posted patch v3 (obsolete) — Splinter Review
Attachment #363233 - Attachment is obsolete: true
(Assignee)

Comment 10

10 years ago
Posted patch v3, refreshed for tip (obsolete) — Splinter Review
Attachment #363239 - Attachment is obsolete: true
On my machine, the best SS tune is 

BL_BACKOFF 16
BL_ATTEMPTS 2
HOTEXIT 2
MAXEXIT 2
MAXPEERS 10

Because before you didn't get good results on your machine with BL_ATTEMPTS=2, I fixed it at 4 and got:

BL_BACKOFF 32
BL_ATTEMPTS 4
HOTEXIT 2
MAXEXIT 3
MAXPEERS 8
v8 appears to be pretty insensitive to these tuning parameters. Either set given in comment 11 looks pretty good and both are about the same.
(Assignee)

Comment 13

10 years ago
Could you please try the parameters with ./bench.sh? I get really bad results with that. Might be oracle collisions. HOTEXIT != 1 seems to be the problem.
(In reply to comment #2)
> The current mechanism works somewhat, but not good enough. For some code where
> we don't trace much and abort most the time, we can incur 50% slowdown over JIT
> disabled.

And that's why this bug should be a blocker.

/be
(Assignee)

Comment 15

10 years ago
Posted patch v4 (obsolete) — Splinter Review
Attachment #363244 - Attachment is obsolete: true
(Assignee)

Comment 16

10 years ago
Posted patch v5 (obsolete) — Splinter Review
Attachment #363433 - Attachment is obsolete: true
(Assignee)

Comment 17

10 years ago
v5 is a speedup for SS and gets us to around 3% slowdown on the unpatched date example, which used to produce > 2x slowdown. We are doing a little bit more analysis but this should be done very soon.
(Assignee)

Updated

10 years ago
Attachment #363434 - Flags: review?(graydon)
Here is a slight retune of v5 that gives me slightly better results:

    'BL_ATTEMPTS': 7, (was 8)
    'HOTEXIT':     1, (was 2)
    'MAXPEERS':    9, (was 8)

but it's really a tiny difference. I also suggest blacklisting a trace when the "unsupported opcode" condition is hit, i.e., this code is run near line 4359 is run:

  abort_recording:
    js_AbortRecordingHard(cx, js_CodeName[op]);

My measurements show recording is almost never successful after this happens. Blacklisting at this point seems to give a slight SS speedup. It would also be reasonable to do a backoff, but a more aggressive one (backoff by more and blacklist after fewer attempts) but that seems unnecessarily complicated.

In my debug builds, I sometimes saw a 'reason' value for an abort as 'EnterFrame' or 'LeaveFrame' but now I can't tell which call sites generate those values. But my measurements showed that 'EnterFrame' is like unsupported op: might as well blacklist at that point, while LeaveFrame is like the other aborts, where a backoff is appropriate. So if that distinction can be made it might help another tiny little bit.

For some reason I don't see an improvement in bench.sh, but it doesn't get worse, either.

v5 with and without these tweaks is about the same on the v8 test, and a definite improvement over tip.
(Assignee)

Comment 20

10 years ago
Posted patch experimental patch, v2 (obsolete) — Splinter Review
Attachment #363445 - Attachment is obsolete: true
(Assignee)

Comment 21

10 years ago
Posted patch experimental patch, v3 (obsolete) — Splinter Review
Attachment #363446 - Attachment is obsolete: true
The basic twiddling of counts and blacklist formulae are clearly just matters of empirical measurement, no complaints there. The other changes are .. less clear to me.

- blHash is unused, delete it.
- I don't know why it's safe to avoid blacklisting the outer tree, or how things have changed to make the removal ok.
- I don't know why it's ok to remove the reset of hits on unstable exits and branch exits (or why they were there in the first place)
- In attemptToExtendTree, you're using the fragment's internal hit counter rather than the oracle's. I don't know if this is intentional or why it's correct if so; nothing else consults that counter.

I'll r+ if you can explain the last 3 points and agree to remove the dead function in the first.
I tried to run the 2d physics page with the experimental patch but I crashed:

0   libmozjs.dylib                	0x001c6d03 TraceRecorder::snapshot(ExitType) + 279 (jstracer.cpp:2064)
1   libmozjs.dylib                	0x001d2deb TraceRecorder::TraceRecorder(JSContext*, VMSideExit*, nanojit::Fragment*, TreeInfo*, unsigned int, unsigned int, unsigned char*, VMSideExit*) + 951
2   libmozjs.dylib                	0x001d2f35 js_StartRecorder(JSContext*, VMSideExit*, nanojit::Fragment*, TreeInfo*, unsigned int, unsigned int, unsigned char*, VMSideExit*, nanojit::Fragment*) + 129
3   libmozjs.dylib                	0x001d3274 js_RecordTree(JSContext*, JSTraceMonitor*, nanojit::Fragment*, nanojit::Fragment*, unsigned int, Queue<unsigned short>*) + 406
4   libmozjs.dylib                	0x001d3601 js_AttemptToStabilizeTree(JSContext*, VMSideExit*, nanojit::Fragment*) + 681
5   libmozjs.dylib                	0x001d3da5 js_MonitorLoopEdge(JSContext*, unsigned int&, int) + 991
6   libmozjs.dylib                	0x0016ac32 js_Interpret + 53714 (jsinterp.cpp:3721)
(In reply to comment #22)
> - I don't know why it's safe to avoid blacklisting the outer tree, or how
> things have changed to make the removal ok.
> - I don't know why it's ok to remove the reset of hits on unstable exits and
> branch exits (or why they were there in the first place)

I discussed these issues in comment 6, along with a pointer to the changeset where those heuristics were introduced. I was never able to find any evidence that they did anything useful; it's not even clear they even did what they were originally intended to. Maybe dvander remembers what they were for.
(In reply to comment #24)

> I discussed these issues in comment 6, along with a pointer to the changeset
> where those heuristics were introduced. I was never able to find any evidence
> that they did anything useful; it's not even clear they even did what they were
> originally intended to. Maybe dvander remembers what they were for.

Ah. Well enough then; didn't realize that's what you meant by "3 of 4 can be removed". Otherwise fine by me if you think the physics page crash is unrelated (and delete the dead function). Maybe run a larger pageset to test? We have some multitree bugs in progress that are sort of pushed out into the light by any twiddling of the oracle, so it's not clearly the "fault" of this patch. But it might be.
Oh, you're talking about a different patch, "experimental" vs. non. Oops.
Attachment #363434 - Flags: review?(graydon) → review+
(Assignee)

Comment 27

10 years ago
Posted patch v6 (obsolete) — Splinter Review
Not sure whether all concerns were addressed, so asking for r again.
Attachment #363434 - Attachment is obsolete: true
Attachment #363447 - Attachment is obsolete: true
Attachment #363590 - Flags: review?(graydon)

Comment 28

10 years ago
(In reply to comment #14)
> 
> And that's why this bug should be a blocker.

yep yep. this was split off from a blocker we filed.
Flags: blocking1.9.1? → blocking1.9.1+
(Assignee)

Comment 29

10 years ago
Note for graydon: the reason we don't need resetHits any more is that we don't use an exponential backoff. For each attempt we back off by 32 and then blacklist permanently. The resetHits magic was necessary because by the time the outer tree was ready, we would already wait 512 iterations. Now we just wait 32, as long its within the permissible number of attempts.
Comment on attachment 363590 [details] [diff] [review]
v6

Hmm, I'm still going to fuss a bit here. Happy to r+ when I understand fully, but:

- Why is do js_Blacklist and js_Backoff move one-peer into the peer list? Shouldn't all peers in a peer list have the same ip?

- Why is js_AttemptToExtendTree using c->hits()? this is a per-peer count embedded in the fragment: seems to me it'll be possible for you to build up MAXPEERS * MAXEXIT (27) branches off each possible ip; is this the intent?
(Assignee)

Updated

10 years ago
Attachment #363590 - Flags: review?(graydon) → review-
(Assignee)

Comment 31

10 years ago
Comment on attachment 363590 [details] [diff] [review]
v6

I will post a new patch and an explanation in a bit.
(Assignee)

Comment 32

10 years ago
Posted patch v7 (obsolete) — Splinter Review
Attachment #363590 - Attachment is obsolete: true
Attachment #363725 - Flags: review?(graydon)
Comment on attachment 363725 [details] [diff] [review]
v7

looks fine
Attachment #363725 - Flags: review?(graydon) → review+
(Assignee)

Comment 34

10 years ago
http://hg.mozilla.org/tracemonkey/rev/5c7fe2d7c24d
Whiteboard: fixed-in-tracemonkey
(Assignee)

Comment 35

10 years ago
Posted patch v8 (obsolete) — Splinter Review
New patch with script code patching to avoid calls to MonitorLoopEdge after blacklisting. This patch also eliminates the colliding hash since now we are pretty fast after blacklisting anyway. It seems I need a pretty high BL_ATTEMPTS number to get fasta to go fast, since outer trees need to be attempted a few times until they actually get compiled. This used to be taken care of by resetHits. We might want to add some mechanism back. Maybe after every successful inner-tree compilation allow the outer tree to immediately re-record? Handing the patch over to dmandelin to optimize the parameters. This seem to be faster than the previous patch.
Attachment #363725 - Attachment is obsolete: true
(Assignee)

Comment 36

10 years ago
Posted patch v8, for real (obsolete) — Splinter Review
Attachment #363807 - Attachment is obsolete: true

Updated

10 years ago
Whiteboard: fixed-in-tracemonkey
(In reply to comment #35)
> It seems I need a pretty high
> BL_ATTEMPTS number to get fasta to go fast, since outer trees need to be
> attempted a few times until they actually get compiled. This used to be taken
> care of by resetHits. We might want to add some mechanism back. 

My study of blacklist events showed that after most of the inner-tree related events we are in fact able to later compile successfully. (The only exception was  'Inner tree took different side exit, abort recording', but I don't know if that is true in general or only incidentally in SS.) This supports the idea that for these events, we want to wait a bit (back off), but don't need to blacklist permanently. 

As a first cut, you might want to try just not incrementing the attempts counter for these guys. If we need more complexity, we can have some classes of events count for more progress toward blacklisting than others. So, we can have a counter that increments by A for inner-tree events, B for most other aborts, and C for the bad stuff (recursion and unsupported ops), blacklist permanently when the counter gets to 100, and tune A, B, and C.

Comment 38

10 years ago
jsfunfuzz seems happy with this patch.
(Assignee)

Comment 39

10 years ago
Posted patch v9 (obsolete) — Splinter Review
Attachment #363808 - Attachment is obsolete: true
(Assignee)

Comment 40

10 years ago
Attachment #364178 - Attachment is obsolete: true
Attachment #364179 - Flags: review?(graydon)
(Assignee)

Comment 41

10 years ago
Posted patch v10 (obsolete) — Splinter Review
Attachment #364179 - Attachment is obsolete: true
Attachment #364179 - Flags: review?(graydon)
(Assignee)

Comment 42

10 years ago
Attachment #364181 - Attachment is obsolete: true
Attachment #364183 - Flags: review?(graydon)
Comment on attachment 364183 [details] [diff] [review]
v10, ready for review

The blacklisting contains an assert *pc==JSOP_LOOP, you suggested in IRC changing this to include || *pc==JSOP_NOP to handle races. Not sure who would be racing us except ourselves, but perhaps some feedback inside another tree helper.

Otherwise looks lovely. Lots of hairballs deleted. If it still works and doesn't regress, I'm happy.
Attachment #364183 - Flags: review?(graydon) → review+
(Assignee)

Comment 44

10 years ago
http://hg.mozilla.org/tracemonkey/rev/094c2e9de304
Whiteboard: fixed-in-tracemonkey

Updated

10 years ago
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Keywords: fixed1.9.1
Resolution: --- → FIXED
Target Milestone: mozilla1.9.1b3 → mozilla1.9.2a1
The checked-in patch on this bug causes crashes in latest builds on trunk and 1.9.1. This is filed as bug 481060.

Updated

10 years ago
Flags: in-testsuite-
Depends on: 522920
Depends on: 547911
You need to log in before you can comment on or make changes to this bug.