I have a fix, but it's grown to be kind of complex.
First, I tried SegmentedVector. It wasn't great, because unfortunately the mark stack doesn't just contain individual items; it also has larger structures, and those could get split across segments. BufferList provides more of a raw buffers-of-bytes abstraction, so with some work could be used. But both have the problem that I would need to implement interfaces for sharing unused space between them, which was unappealing.
So I tried a second path -- given that only one color or the other is likely to be really active at once, I maintain two stacks each with its own Vector (just like now). I then juggle between them to try to keep the Vector with higher capacity for use by the active color's stack. How to do that gets a little messy.
The easiest case is when both stacks are empty. Then I can just make the one with bigger capacity the active one.
If the other stack has a larger capacity, then it's still easy: swap the stacks (which is just a pointer swap).
Otherwise, it's more tricky. You have to copy the data in order to swap the stacks. If both are nonempty, then that requires a bunch of copying around. Usually one stack is empty or nearly empty when you're swapping between them, so it's pretty cheap. But it is not rare to have more data built up. So I put a limit on the number of entries to be copied -- if it's more than 64 (currently), I fall back to swapping the stacks and hope that the small-capacity active stack just doesn't see a lot of action before it's swapped back.
I'm doing measurements now.