Status: RESOLVED WONTFIX (defect)
Opened: 10 years ago; last updated: 5 years ago

People: (Reporter: gal, Assigned: dvander)

Tracking: Blocks 1 bug; Trunk

Firefox Tracking Flags: (Not tracked)

Attachments: (2 attachments, 5 obsolete attachments)

Reporter

Description

10 years ago
We are currently unable to trace loops with a very branchy interior. Example:

for (...)
  if (...)
    if (...)
      if (...)
        if (...)
          if (...)

Such if cascades cause a combinatorial explosion through the inherent tail splitting of trace recording.

The proposed solution is to estimate when a loop becomes "too branchy" by counting the number of merge points we have come across. We already track merges in cfgMerges whenever we hit an if block (which indicates the merge point in its source note). When we actually hit the merge point, we bump a counter. Between branch traces the counter is handed over through the side exit. If the counter hits a threshold, we terminate the loop with endLoop as if we hit a break; statement, and then immediately restart recording. We call this new form of trace a merge trace.
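As a rough illustration of the counting policy described above (this is a hypothetical sketch, not the actual TraceMonkey recorder; `MERGE_THRESHOLD` and `segmentTrace` are invented names for illustration):

```javascript
// Hypothetical sketch of the merge-counting policy -- not real TraceMonkey
// code. `events` is the stream of points the recorder passes over; once the
// merge counter reaches the threshold we terminate the current trace (as if
// we hit a break;) and immediately restart recording a new merge trace.
const MERGE_THRESHOLD = 3; // assumed cut-off; tunable in practice

function segmentTrace(events) {
  const traces = [];
  let current = [];
  let merges = 0; // conceptually handed over through the side exit
  for (const ev of events) {
    current.push(ev);
    if (ev === "merge" && ++merges >= MERGE_THRESHOLD) {
      traces.push(current); // terminate the trace here...
      current = [];         // ...and restart recording from this point
      merges = 0;
    }
  }
  if (current.length) traces.push(current);
  return traces;
}
```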

During recording, whenever we hit a merge point, we check whether we already have an existing merge trace with matching types, and if so we terminate the trace we are recording, and link it to the merge trace.
(In reply to comment #0)
> The proposed solution is to estimate when a loop becomes "too branchy" by
> counting the number of merge points we have come across. 

This seems like a reasonable strategy but the explanation is kind of unclear to me. In particular, a loop can become too branchy without having any merge points at all (nested ifs 10 level deep -> 1024 paths, with only one merge point). Merge points seem to be more precisely an indicator of how much tail duplication is happening.

> We already track merges in cfgMerges whenever we hit an if block (which
> indicates the merge point in its source note). When we actually hit the merge
> point, we bump a counter. Between branch traces the counter is handed over 
> through the side exit. If the counter hits a threshold, we terminate the loop 
> with endLoop as if we hit a break; statement, and then immediately restart 
> recording. We call these new form of trace a merge traces.

Restating the above (let me know if I got it wrong): we have a positive integer parameter M. We track the number of unique merge points that were passed in the recording of each tree. When recording hits the Mth (or greater) merge point for a tree, recording stops and merge traces are started from that point.

Now for some math. The key construct that leads to tail duplication is just this:

  if (...) {
    ... [A] // straight-line code
  }
  ... [B]   // straight-line code

For [A], we can record 2 traces. If we have both, then in the current design we record 2 copies of [B]. If we have 2 simple if statements following each other, then we get 2 copies of the second if and 4 of the following code. More generally, we have:

  branch [N1,S1] (...) {
  }
  branch [N2,S2] (...) {
  }
  ...

"branch [N,S]" means a block of code that has N paths through it that reach a common point at the end, with the total size of the N paths being S (in arbitrary size units). Currently, the size of the traces recorded here is:

  S1 + N1*S2 + N1*N2*S3 + ... + N1*...*N(k-1)*Sk

Presumably the last term often dominates, but variable values of S could change that. 
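The size expression above can be sketched as a small helper (illustrative only; `recordedSize` is an invented name, and `blocks` uses the "branch [N,S]" notation as pairs):

```javascript
// Illustrative calculator for the trace-size expression above:
//   S1 + N1*S2 + N1*N2*S3 + ... + N1*...*N(k-1)*Sk
// `blocks` is a list of [N, S] pairs in the "branch [N,S]" notation.
function recordedSize(blocks) {
  let total = 0;
  let copies = 1; // running product N1*...*N(i-1): copies of block i recorded
  for (const [n, s] of blocks) {
    total += copies * s;
    copies *= n;
  }
  return total;
}
```

For three simple ifs of unit size (N=2, S=1 each) this gives 1 + 2 + 4 = 7 size units, showing how the last term comes to dominate.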

In the strategy above, we limit the number of terms we get in a row. So if M is 3, for example, we have:

  (S1 + N1*S2 + N1*N2*S3) + (S4 + N4*S5 + N4*N5*S6) + ...

If (N1*N2*S3) (and related terms) are "small enough", then this produces acceptable results. But it seems to me this strategy could fall down if the Ns or the Ss are highly variable. If they are highly variable, I think it would be better to consider instead a measure like this:

  (number of paths coming in to merge point - 1) * (size of paths going out)

This is an estimate of how much tail-duplication gain we achieve by merging at this point. Note that both inputs also have to be estimated: we don't know how many traces we will end up recording that reach the merge point, and we don't necessarily know how long the traces going out are.
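Written out, the proposed measure is trivially small (a sketch; `mergeGain` is an invented name, and both arguments would have to be estimated in a real recorder):

```javascript
// Rough estimate (as proposed above) of the tail duplication saved by
// merging at a given point. Without merging, each of the pathsIn traces
// carries its own copy of the outgoing tail; merging keeps exactly one.
function mergeGain(pathsIn, sizeOut) {
  return (pathsIn - 1) * sizeOut;
}
```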

It would be nice to have some samples of extra branchy stuff to look at when designing the algorithm. That way, we could mathematically model the effect of different strategies relatively easily.

There are a couple of less fundamental points I also wonder about:

 - Merge points at the top level (not contained in any other branch nest) seem different from nested merge points. It's easy to understand the benefit of merging at top-level merge points and easy to see how they will look in the trace tree; nested merge points seem more confusing. I would suggest at first restricting ourselves to top-level merge points, unless someone has an easy way of understanding them.

 - Do we plan to select merge points (for starting merge traces) that already have traces going through them, or will we select merge points only on the first time recording reaches them?
I wonder whether there are cases (e.g. switch) where it might be worthwhile setting M = 1.  That is, always merge after a switch.  I can file a separate bug on this if desired...  It seems like it might be more tractable than the general case and might give us implementation experience and immediate wins on some of these emulator "benchmarks"...
Oh, and one more thing regarding the measure used to determine gain....  What matters is not only the codesize reduction from eliminating duplication but also the branch number reduction (due to the existence of MAX_BRANCHES).  So even if there's not that much code in the tails, if they're executed often we might have a big win from merging just due to being able to trace all the codepaths to start with.
I am thinking aloud in order to clarify what this bug means; let me know if I'm wrong.

The idea, if I get it, has not so much to do with handling *nested* ifs (which only produce depth-of-nesting copies of the tail) but, as dmandelin points out, *sequential* ifs, that produce exponential number of copies of the tail.

(Though if so I do not see why he says "nested ifs 10 level deep -> 1024 paths", as a 10-level-deep nested if block seems to me to only produce 10 copies of its tail; you can only enter the Nth nesting if the (N-1)th nesting was taken. Sequential branching causes *both* branches to get a copy.)

Anyway, assuming that was a typo and we're really only concerned about exponential explosion due to *sequential* branches, in the following code:

if (A) { P }
if (B) { Q }
if (C) { R }
S

We're trying to avoid making 2^3=8 copies of S:

   S
  RS
 P S
 QRS
P  S
P RS
PQ S
PQRS
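
The eight copies above are just the 2^3 subsets of {P, Q, R}, each followed by S. A tiny enumerator (illustrative only; `tailCopies` is an invented name) makes that concrete:

```javascript
// Illustrative: enumerate the tail copies produced by k sequential ifs.
// Each path through `if (A){P} if (B){Q} if (C){R} S` picks a subset of
// [P, Q, R] and always ends in S, giving 2^3 = 8 distinct traces.
function tailCopies(bodies, tail) {
  const paths = [];
  for (let mask = 0; mask < (1 << bodies.length); mask++) {
    let path = "";
    bodies.forEach((b, i) => { if (mask & (1 << i)) path += b; });
    paths.push(path + tail);
  }
  return paths;
}
```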

And instead come up with a means of emitting:

Loop-header +       Merge-point +
branch-traces       branch-traces
------------------------------------
| P  |               | R S |
| Q  | ---exit--->   | S   |
| PQ |

IOW, bound the sequential-branchiness (exponential-size) of any path at some (small) number and *convert* it to nested-branchiness (linear-size) of linked "merge traces".

If so, is the idea that "a tree" will now consist of the 'main' trace, plus all of its direct branch fragments, plus all of the 'main' merge-point fragments, plus all of the branch fragments leading off all the merge-points? You say "call closeLoop and immediately resume recording", but you can't really mean "closeLoop", because it does a whole mess of type-stabilization stuff and it assumes it's actually at a loop edge. Something like closeLoop, though, that produces a new root for subsequent branch traces, but is considered a part of the main tree? What happens on that 'exit' edge? Is it essentially the same as a branch exit?

If so, I'm confused. In the diagram above, I glossed over the fact that only *one side* of a trace like PQ is actually "straight-line" (on-trace) code; the other side P(!Q) actually consists of guard on Q and a branch exit already. If a merge-point-exit is going to be, more or less, identical to a branch exit ... why are we inventing a new idea?

IOW, why manufacture a new mechanism of tracking "merge points" when it's perfectly well represented already by the symmetrical concept "branch points"? Every branch point implies a merge point, right? Are we (logically) doing anything other than "forcing every Nth branch to be a branch-exit on *both* its taken- and not-taken edges"?
I don't understand how merging traces, which happens after the branchiness, will help us avoid hitting our heads on MAX_BRANCHES, which would happen at the branchiness.  Are we proposing to do something at the _head_ of traces, as well as merging the tails?  Am I totally missing it?
Reporter

Comment 6

10 years ago
if 
  if
    (*) if
          a
        else
          b

Think of it as forming trees that do not start in a loop header but inside a loop. From the original tree header individual traces form a tree, but if they touch (*) they just end and cross-jump to that tree. Hence the innermost if with its individual possible branches no longer causes additional tail duplication.
OK, yeah, that's what I thought.  But I don't get comment #3 then, still, because MAX_BRANCHES cares about profusion of "trace heads", not "trace tails", right? For a given location, this merging won't keep us from hitting MAX_BRANCHES where we would otherwise, will it?
Reporter

Comment 8

10 years ago
It will. For the case above, you can fully cover the tree with maxbranches=4, since you have 4 in the tree to the left and 2 in the tree to the right. Otherwise you would need 8.
Mike, simple example of how this can help with max branches is code like this:

switch (cond) {
  /* 10 cases */
}
switch (cond) {
 /* 10 cases */
}

If we merge after the first switch, we get a maximum of 20 branches here, right?  If we don't we get 100.
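The branch-count arithmetic here is just multiply-versus-add (an illustrative sketch; `branchesWithoutMerge` and `branchesWithMerge` are invented names):

```javascript
// Illustrative branch-count arithmetic for the two-switch example above.
// Without merging, every path through switch #1 duplicates switch #2, so
// counts multiply; merging after each switch makes them add instead.
function branchesWithoutMerge(caseCounts) {
  return caseCounts.reduce((acc, n) => acc * n, 1);
}
function branchesWithMerge(caseCounts) {
  return caseCounts.reduce((acc, n) => acc + n, 0);
}
```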
Hm. Let me try again! I was actually hoping for some clarification in comment #4. In particular the last paragraph:

IOW, why manufacture a new mechanism of tracking "merge points" when it's
perfectly well represented already by the symmetrical concept "branch points"?
Every branch point implies a merge point, right? Are we (logically) doing
anything other than "forcing every Nth branch to be a branch-exit on *both* its
taken- and not-taken edges"?

(I understand that even doing so requires new *policy* concerning the formation of branches; I'm wondering about the need for new mechanisms though.)
If I understand comment #10 correctly, we have a limitation in that the register/spill state might not be compatible from a branch-exit to a merge point. We have to spill everything and reload it. It might not be worthwhile for every branch exit to also be a merge point (for example, shape guards) or it might not be worth recording such new traces unless they are very hot. If we could immediately jump into the middle of a trace, maybe there wouldn't be a problem.
Reporter

Comment 12

10 years ago
#11: I am working on improving that situation. We currently (a) spill all registers and (b) move all data from spill slots into the native stack. I will gradually undo both limitations. I don't think we want all merge points as potential branch targets, since in that case we can't do dead code elimination across basic blocks any more.

Comment 13

10 years ago
Why Platform tag is "Mac OS X"? Shouldn't it be "All" as per bug 509986?
Reporter

Comment 14

10 years ago
Bugzilla sets the default platform/architecture based on the browser UA info, and I filed the bug with Firefox running on Mac OS X. Fixed now.
OS: Mac OS X → All
Hardware: x86 → All
recorder: started(1), aborted(0), completed(33), different header(0), trees trashed(1), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(0), blacklisted(0)
monitor: triggered(99205), exits(99205), type mismatch(0), global mismatch(0), flushed(0)

real	0m0.137s
With an experimental patch for merge tracing:

recorder: started(1), aborted(0), completed(24), different header(0), trees trashed(1), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), merged loop exits(1), unstableInnerCalls(0), blacklisted(0)
monitor: exits(25), timeouts(0), type mismatch(0), triggered(25), global mismatch(0), flushed(0)

real	0m0.028s
Posted patch WIP v1 (obsolete) — Splinter Review
This is an extremely hacky patch that I used to experiment. Applies on top of bug 524620 and then bug 525371.
Assignee: general → dvander
Status: NEW → ASSIGNED
fwiw, that crashes on loading http://benfirshman.com/projects/jsnes/ :

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000004
0x003fe8fe in TraceRecorder::dataAlloc (this=0x1c2b23a0) at jstracer.h:1291
1291        VMAllocator& dataAlloc() const { return *traceMonitor->dataAlloc; }

traceMonitor is null.

#0  0x003fe8fe in TraceRecorder::dataAlloc (this=0x1c2b23a0) at jstracer.h:1291
#1  0x003c85f9 in TraceRecorder::outOfMemory (this=0x1c2b23a0) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jstracer.cpp:2596
#2  0x003f89d6 in TraceRecorder::monitorRecording (cx=0x1370e00, tr=0x1c2b23a0, op=JSOP_IFEQ) at /Users/bzbarsky/mozilla/tracemonkey/mozilla/js/src/jstracer.cpp:7164
#3  0x003043af in js_Interpret (cx=0x1370e00) at jsops.cpp:79
(In reply to comment #18)

Thanks, I messed up someone's lifetime somewhere.
Using Date.now() to benchmark --

JIT without merge traces: 1143ms
Interpreter:               729ms
v8:                         81ms

With merge traces, the results are very dependent on the cut-off heuristics. Here's a little table: the first column is how many IFEQs result in a merge point, and the second is the resulting benchmark time.

| IFEQ cutoff | Time  |
|-------------|-------|
| 5           | 710ms |
| 4           | 662ms |
| 3           |  72ms |
| 2           |  83ms |
Posted patch WIP v2 (obsolete) — Splinter Review
The approach used in comment #20 meant merge traces could easily have overlapping regions, so the higher cfgDiversions gets, the more needless merge traces there will be. Now IFEQ will always join to a compatible merge trace if one exists.

So here are new statistics on the same benchmark:
|-------------|------|-------|
| IFEQ cutoff | Time | Exits |
|-------------|------|-------|
| 2           | 73ms | 35    |
| 3           | 66ms | 51    |
| 4           | 58ms | 77    |
| 5           | 55ms | 107   |
|-------------|------|-------|

I am not sure what's going on with the time-versus-exit-count pattern there.

Will test on SunSpider as well to see the effect of merge traces on code with a small number of branches.
Attachment #409836 - Attachment is obsolete: true
An explanation for the disparity is probably that transitioning between traces is slow (comments 11, 12). So with a lower cut-off, the traces are less branchy, but the generated code ends up being not as good.
Posted patch fixes crash bz found (obsolete) — Splinter Review
Attachment #410113 - Attachment is obsolete: true
Posted patch WIP v3 (obsolete) — Splinter Review
It seems like it's not valid to key merge points on the PC, call depth, and typemap - we actually need to add each PC of the call chain into the mix. Otherwise a function could be inlined twice, and for one callsite to merge to the other would be invalid. Please correct me if this thinking is wrong, Andreas.

This patch addresses that issue. The only opcode supported is still JSOP_IFEQ - more will come soon.
Attachment #410338 - Attachment is obsolete: true
Reporter

Comment 25

10 years ago
If we start at the same tree anchor, and maintain merge traces per tree, then if we inline twice callDepth should be different.
(In reply to comment #25)
> If we start at the same tree anchor, and maintain merge traces per tree, then
> if we inline twice callDepth should be different.

I am thinking of a scenario like:

for (;;) {
   f();
   g();
   f();
}

Now f() has been inlined twice, in each case callDepth == 1.
Reporter

Comment 27

10 years ago
Right ... mhm. Yeah, ok, I concede. So it's the hash of all PCs along callDepth.
Posted patch handles more opcodes (obsolete) — Splinter Review
Handles all ifop() and fused-if opcodes.
Attachment #410418 - Attachment is obsolete: true
Posted patch WIP v5Splinter Review
Fixed some serious perf regressions. The heuristics are still not good, or something is missing. SunSpider gets a slight win in 3d-cube (1% overall), but the NES benchmark slows down noticeably. Still investigating.
Attachment #410429 - Attachment is obsolete: true
(In reply to comment #10)
> Hm. Let me try again! I was actually hoping for some clarification in comment
> #4. In particular the last paragraph: [..]

I must say I find the terminology confusing too.  In particular I
don't see why we need to speak about either merge points or branch
points.

My impression from talking to dvander was that this is a simple
mechanism.  As we record, we may at any point decide to truncate the
trace and force a continuation on a common successor trace.  The
reason for the truncation can be completely arbitrary, but presumably
would be along the lines of a judgement that the probability of still
actually being on trace at this point had become unacceptably low.

And the main complication is reconciling the stack frame state for all
the various control flow edges reaching the successor trace.

Is that correct?  Did I miss something?

Updated

10 years ago
Blocks: 535925

Updated

10 years ago
No longer blocks: 535925

Updated

10 years ago
Blocks: 530956

Comment 31

9 years ago
Is this something that will become irrelevant with the JaegerMonkey JS engine?
Not necessarily; it's still an interesting topic, but there really isn't pressure to research & implement it anymore.

Comment 33

9 years ago
Being able to waste some time on JSNES isn't sufficient pressure? ;)

Comment 34

9 years ago
Giant leap in JSNES' fps with the landing of JaegerMonkey, congrats to the developers.

For the record, on my machine (2g ram, intel core 2 CPU 3GHz), Chrome is still faster, running at a constant 58fps, while Firefox varies from 40fps to 58fps.
(In reply to comment #34)
> Giant leap in JSNES' fps with the landing of jaegermonkey, congrats to
> developers.
> 
> For the record, on my machine (2g ram, intel core 2 CPU 3GHz), Chrome is still
> faster, running at a constant 58fps, while Firefox varies from 40fps to 58fps.

What happens if you set in the about:config prefs

  javascript.options.tracejit.content = false

Does the frame rate still vary, or is it steady with that setting?

Comment 36

9 years ago
Frame rate still varies after switching tracejit off. It's also not boosting or negatively impacting the fps.
This bug is obsolete with the removal of the tracejit. Bug 509986 is still open for tracking JSNES performance in SM.
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → WONTFIX

Comment 38

8 years ago
So TM is gone yes ? All that work ! What a shame if things didn't work out. Is there a blog anywhere ?
(In reply to Duncan Loveday from comment #38)
> So TM is gone yes ? All that work ! What a shame if things didn't work out.
> Is there a blog anywhere ?

http://blog.mozilla.com/nnethercote/2011/11/23/memshrink-progress-report-week-23/

It also explains why the work on TM is not (entirely) wasted.