Open Bug 1454586 Opened 6 years ago Updated 2 years ago

Optimize resume point creation in IonBuilder

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)

enhancement

Tracking

()

Performance Impact medium
Tracking Status
firefox61 --- fix-optional

People

(Reporter: jandem, Unassigned)

References

(Blocks 3 open bugs)

Details

(Keywords: perf)

Resume point creation in IonBuilder often shows up in profiles and it scales badly with the number of locals.

We should consider collecting enough data in IonBuilder to move the actual resume point allocation off-thread. Another idea is for resume points to have a pointer to a previous resume point so we only have to record the delta.

I'd like to collect some data on the different resume point types we have.
Priority: -- → P2
I've been thinking about it this week and I think there's a really nice way to do this:

(1) Change MResumePoint to support a sparse/delta representation in addition to the current dense one.

(2) Convert IonBuilder to use the sparse representation as much as possible.

(3) After IonBuilder, immediately "densify" all resume points. This will still be a win because it can be done off-thread. It will use a bit more memory than we use now but I think that's okay.

(4) If this all works out, we can then push this densify step *down* into the compiler: make sure the next optimization pass works fine with delta resume points, then densify after (instead of before) it, and do this for each pass.

(5) Hopefully we will then (incrementally) get to the point where every pass works with delta resume points and we can remove the densify step completely.

So even if we can't use delta resume points everywhere for some reason, we will still be able to use them in IonBuilder.
Overall I think this is a good idea, except for (4), as we depend on the basic block in which the instructions are. So after IonBuilder, the simplest approach is to have the entry resume point be a dense representation, while every other basic block in the loop could be a sparse one.

I do not recall seeing much use for rp->instruction_ link, and I think this is the kind of link which can be replace walking the list of basic block from the entry resume point.
(In reply to Nicolas B. Pierron [:nbp] {backlog: ~36} from comment #2)
> Overall I think this is a good idea, except for (4), as we depend on the
> basic block in which the instructions are.

I'm not sure if this is an unfixable problem - we probably don't have to deoptimize *all* entry resume points. But let's focus on IonBuilder first and then we can see how far we can go :)
In this profile [1] with have an IonCompile which last 144ms with a restartLoop take 35ms, and a lot of time is spent in creating basic blocks and resume points, which is usually a side effects of having tons of live variables within one frame.

This bug might help reduce the time taken on the main thread and potentially move it either off-thread or to make it an incrementally built data structure which is constructed from the diff introduced in each predecessors. (which might be a bit hard to handle with back-edges)

[1] https://perf-html.io/public/af86f00f11f7c7719ed6018a24dab7a86ecc507b/flame-graph/?globalTrackOrder=0-1-2-3-4&hiddenGlobalTracks=1-2-3&localTrackOrderByPid=2573-0~&range=3.3981_8.3281&thread=5&v=3
Keywords: perf
Whiteboard: [qf]
Flags: needinfo?(nicolas.b.pierron)
Whiteboard: [qf] → [qf:p1:64]
Whiteboard: [qf:p1:64] → [qf:p1:f64]
Moving QF Target to qf:p1:f67, we don't believe we can provide a fix in the FF64 time frame.
Whiteboard: [qf:p1:f64] → [qf:p1:f67]
Whiteboard: [qf:p1:f67] → [qf:p2]
Blocks: 1540935
Flags: needinfo?(nicolas.b.pierron)
Priority: P2 → P3
Performance Impact: --- → P2
Whiteboard: [qf:p2]

Iain, is this also a concern for WarpBuilder? Or we can close this issue?

Flags: needinfo?(iireland)

WarpBuilder already moved resume point allocation offthread. There may still be value in having a more compact off-thread representation of resume points, though. It might also open the door to a delta encoding of snapshots, which would reduce steady-state memory usage as well as peak memory usage (during compiles).

I think we should leave this open.

Flags: needinfo?(iireland)
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.