Would this be solved by more aggressive tiling of the clip masks? I.e. just split everything into 512px chunks in each dimension, like we do for blob images.
We could even go as far as doing the tile splits according to the available storage in the current target: supposing we need to allocate a 1900x10 piece, we look at the current 2048x2048 surface, see that we can only fit 1000x10 at most, so we cut it there into 2 pieces and draw the other half in the next layer. On second thought, this might be complex to implement: we only know the available RT size when we assign to passes, and only at that point would we know how the instances are built on the dependent passes...
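To make the idea concrete, here is a rough sketch of that split logic. This is purely illustrative, not the real WebRender API: `split_request` and its parameters are made-up names, and real code would have to produce rects with origins and track which layer each piece lands in.

```rust
// Hypothetical sketch: cut a requested allocation into horizontal pieces,
// the first sized to the free width left in the current target row, the
// rest sized to the full target width (i.e. spilled into the next layer).
fn split_request(req_w: u32, req_h: u32, free_w: u32, target_w: u32) -> Vec<(u32, u32)> {
    assert!(req_w > 0 && target_w > 0);
    let mut pieces = Vec::new();
    let mut remaining = req_w;
    let mut available = free_w.min(target_w);
    while remaining > 0 {
        if available == 0 {
            // No room left in the current target: start the next layer.
            available = target_w;
        }
        let w = remaining.min(available);
        pieces.push((w, req_h));
        remaining -= w;
        // Subsequent pieces start on a fresh layer with the full width.
        available = target_w;
    }
    pieces
}
```

For the 1900x10 example above with 1000px of free width in a 2048-wide target, this yields two pieces, 1000x10 and 900x10. The instance-building complexity mentioned above is exactly what this sketch hides: each returned piece would need its own instance data on the dependent pass.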
So, we need to come up with a better strategy for render target allocation. It might make sense to have a heuristic where allocations with one dimension much larger than the other go into a separate, rectangular render target, perhaps?
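The heuristic itself could be as simple as an aspect-ratio threshold. A minimal sketch, with an invented name and an arbitrary threshold value:

```rust
// Hypothetical predicate: route allocations whose long side exceeds the
// short side by more than `ratio` to a dedicated elongated render target.
fn is_elongated(w: u32, h: u32, ratio: u32) -> bool {
    let (long, short) = if w >= h { (w, h) } else { (h, w) };
    // A zero-sized short side is degenerate; treat it as elongated.
    short == 0 || long > short * ratio
}
```

With a ratio of, say, 8, a 1900x10 clip mask would be routed to the rectangular target while a 512x512 tile stays in the square atlas. The right threshold would have to be tuned against real content.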
Not sure how this would work though: if we have a separate target with a separate size, it has to be processed in a separate pass. That means more RT/texture switches, and it's not clear how the pass dependencies would work out, logically, within the current framework. Nicola has been looking into making the RT graph more generalized, so that might come in handy.
Perhaps what we are missing is just more precise tracking of the available texture/target space, i.e. a better allocator?
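For reference, even a basic shelf packer tracks free space much more precisely than a single cursor, by grouping same-height allocations onto rows. A minimal sketch (illustrative only, not what WebRender currently does):

```rust
// One "shelf" is a horizontal row of allocations sharing a height budget.
struct Shelf {
    y: u32,
    height: u32,
    next_x: u32,
}

// Minimal shelf allocator over a single fixed-size target.
struct ShelfAllocator {
    width: u32,
    height: u32,
    shelves: Vec<Shelf>,
    next_y: u32,
}

impl ShelfAllocator {
    fn new(width: u32, height: u32) -> Self {
        ShelfAllocator { width, height, shelves: Vec::new(), next_y: 0 }
    }

    /// Returns the (x, y) origin of the allocation, or None if it doesn't fit.
    fn allocate(&mut self, w: u32, h: u32) -> Option<(u32, u32)> {
        // First, try an existing shelf that is tall enough and has room left.
        for shelf in &mut self.shelves {
            if h <= shelf.height && shelf.next_x + w <= self.width {
                let origin = (shelf.next_x, shelf.y);
                shelf.next_x += w;
                return Some(origin);
            }
        }
        // Otherwise open a new shelf below the existing ones.
        if w <= self.width && self.next_y + h <= self.height {
            let origin = (0, self.next_y);
            self.shelves.push(Shelf { y: self.next_y, height: h, next_x: w });
            self.next_y += h;
            return Some(origin);
        }
        None
    }
}
```

Notably, a shelf scheme handles the elongated case well without a split: a 1900x10 mask occupies one thin shelf instead of wasting a 1900-wide band of a taller row, and the remaining 148px of that shelf is still usable for other short allocations.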