Closed Bug 1553332 Opened 4 years ago Closed 3 years ago

Reduce mark stack memory usage

Categories

(Core :: JavaScript: GC, task, P3)

task

Tracking

()

RESOLVED FIXED
Tracking Status
firefox69 --- wontfix
firefox70 --- wontfix

People

(Reporter: sfink, Assigned: sfink)

References

Details

Attachments

(1 obsolete file)

Bug 1167452 doubled the size of the stack by duplicating it and making a black stack and a gray stack. We should remove the memory overhead.

Blocks: 1556706

Just to get an idea, I dumped out the maximum size reached of the gray and black stacks for an assortment of talos tests:

      Test     Gray    Black
      ----     ----    -----
       bcv     5291     3531
         c    66325    10680
      damp   119669    32390
         d   210519     3514
        g1     5533     3545
        g3     5262     3460
        g4     5272     3460
 tabswitch    13032     5629

I was rather surprised to learn that the gray stack gets larger than the black in the browser.

I have a fix, but it's grown to be kind of complex.

First, I tried SegmentedVector. It wasn't great, because unfortunately the mark stack doesn't just contain individual items; it also has larger structures, and those could get split across segments. BufferList provides more of a raw buffers-of-bytes abstraction, so with some work could be used. But both have the problem that I would need to implement interfaces for sharing unused space between them, which was unappealing.

So I tried a second path -- given that only one color or the other is likely to be really active at once, I maintain two stacks each with its own Vector (just like now). I then juggle between them to try to keep the Vector with higher capacity for use by the active color's stack. How to do that gets a little messy.

The easiest case is when both stacks are empty. Then I can just make the one with bigger capacity the active one.

If the other stack has a larger capacity, then it's still easy: swap the stacks (which is just a pointer swap).

Otherwise, it's more tricky. You have to copy the data in order to swap the stacks. If both are nonempty, then that requires a bunch of copying around. Usually one stack is empty or nearly empty when you're swapping between them, so it's pretty cheap. But it is not rare to have more data built up. So I put a limit on the number of entries to be copied -- if it's more than 64 (currently), I fall back to swapping the stacks and hope that the small-capacity active stack just doesn't see a lot of action before it's swapped back.

I'm doing measurements now.

awsy seems to say that this does improve memory usage by quite a lot [1]. I did not do multiple talos runs, so I don't really have good information about the perf impact yet, but [2] shows some things with substantial gains and some things with even more substantial losses. But it may be noise. Then again, the linux64 and linux64-qr changes are consistent across the board, so there's a decent chance this is real.

One potential solution to excess stack copying would be to do it lazily -- when you change mark colors, don't bother doing anything with the stacks until the first push or pop. That would fix the most common of the known problematic cases where you change mark color, do nothing, then change back. But it adds even more complexity, and a branch during pushing/popping (so far this change has been very isolated). I'll wait for the talos results before attempting anything like that.

[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=b451023528cc93469a49338812f3649ffd2bb6fc&framework=4&selectedTimeRange=172800

[2 ] https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=b451023528cc93469a49338812f3649ffd2bb6fc&framework=1&selectedTimeRange=172800

Note that the push/pop branch would also be required for any segmented stack solution.

A totally different solution for all of this would be to push color change markers onto the stack. You would remember the current color of the top of the stack, so pushing and popping objects would be unaffected. Changing the color would mean pushing a color change token onto the stack. Popping a color change token would toggle between the colors. It would use a little more space for the tokens, but probably less than having a whole separate stack with its own slop and fragmentation.

Hm... right now, this is seeming like the simplest solution. When changing colors, I might need to check whether the top of the stack is already a color change token and pop it instead of pushing a new one. That would prevent the pathological case of lots of color swaps using up an unbounded amount of space while the stack contents are not changing.

And no added push/pop branch! (Well, it does require an expansion of the switch cases, which could theoretically slow things down from either the switch codegen or branch misprediction, but I don't think our marking is light enough to care.)

Assignee: nobody → sphink

This is currently regressing 69. It would be great to get this fixed and backported before that ships.

Ok, I'll post the patch I have.

This uses the color change token implementation rather that what you suggested, which was to tag every entry with its color, because (1) threading the color down to everywhere that now needed it was a big change, but mostly (2) it seems hard to do on 32-bit, where we don't have anyplace to hide another bit. (TaggedPtr uses alignment to free up the bottom 3 bits for a tag. On 64bit, we have plenty of high bits to steal, but not on 32-bit AFAIK.)

The color change token approach does introduce some complexity in order to make color changes infallible. I didn't spell it out in the comments, but one of the problems is that some tests will init the mark stack and reserve the 1 extra slot we need, then will do a minor GC and clear out the stack, then call oomAfterAllocations(0) to oom on the next allocation, and then change the mark color to gray. That'll hit an OOM while just trying to set the mark color. I kind of hacked around that by allowing a single push on an empty stack (the artificial oom handling is done manually, because we will often have a stack backed by a Vector that is already of nonzero size, and we want to pretend to oom even then.)

After implementing 3 different approaches, the best I could do was to eliminate half the regression. So for now, I backed out the offending patch:

changeset: https://hg.mozilla.org/mozilla-central/rev/4ad75b878827
user: Steve Fink
date: Wed Jul 24 14:29:22 2019 -0700
summary: Backed out changeset 617df479fac1 (bug 1167452)

Sadly, it turns out that yet another of the bug 1167452 patches depended on this, so this triggered bug 1570465 and both will need to be backported together. But I'd rather let it bake for a little longer on Nightly first; I wasn't expecting that fuzzbug to result from the backout.

I'm closing this bug and will file another for my next attempt at allowing gray and black marking to be mixed.

The regressor has been backed out, so I no longer have any relevant code in the tree.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED

New attempt in bug 1580888.

Attachment #9079233 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.