It turns out that the primitive dependency calculation for picture caching ends up having to do almost the same work as the visibility and clip chain generation code.
However, we don't want to run all the primitive preparation code up front, since we can skip that for valid tiles.
We should split the current prepare pass into a visibility pass and a preparation pass. This will reduce the CPU time spent on picture caching considerably, since primitives that only touch valid tiles no longer need to be prepared. It also opens up the possibility of tracking dirty regions instead of a single dirty rect. Finally, it unblocks the work needed to fix an invalidation bug that occurs when scrolling coincides with the arrival of a new display list.
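
The split described above could look roughly like the following. This is a minimal sketch, not the actual WebRender code: `Rect`, `Primitive`, `Tile`, `visibility_pass`, `prepare_pass` and `dirty_region` are all hypothetical names made up for illustration, and tile validity is approximated by comparing each tile's primitive dependencies against those of the previous frame.

```rust
// Hypothetical types for illustration only; these are not WebRender's structs.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    x0: i32,
    y0: i32,
    x1: i32,
    y1: i32,
}

impl Rect {
    // True if the two rects overlap with non-zero area.
    fn intersects(&self, other: &Rect) -> bool {
        self.x0 < other.x1 && other.x0 < self.x1 && self.y0 < other.y1 && other.y0 < self.y1
    }
}

struct Primitive {
    id: u32,
    rect: Rect,
}

struct Tile {
    rect: Rect,
    prev_deps: Vec<u32>, // dependencies recorded for the previous frame
    deps: Vec<u32>,      // dependencies recorded for the current frame
    is_valid: bool,
}

// Pass 1 (cheap, runs for every primitive): compute visibility and record
// which primitives each tile depends on. A tile is treated as valid when
// its dependencies are unchanged from the previous frame (an assumption of
// this sketch; real invalidation compares far more state).
fn visibility_pass(prims: &[Primitive], viewport: &Rect, tiles: &mut [Tile]) -> Vec<bool> {
    for tile in tiles.iter_mut() {
        tile.deps.clear();
    }
    let mut visible = Vec::with_capacity(prims.len());
    for prim in prims {
        let is_visible = prim.rect.intersects(viewport);
        visible.push(is_visible);
        if !is_visible {
            continue;
        }
        for tile in tiles.iter_mut() {
            if prim.rect.intersects(&tile.rect) {
                tile.deps.push(prim.id);
            }
        }
    }
    for tile in tiles.iter_mut() {
        tile.is_valid = tile.deps == tile.prev_deps;
    }
    visible
}

// Pass 2 (expensive): only primitives that are visible and touch at least
// one invalid tile are prepared. Returns the ids that would be prepared.
fn prepare_pass(prims: &[Primitive], visible: &[bool], tiles: &[Tile]) -> Vec<u32> {
    let mut prepared = Vec::new();
    for (prim, &is_visible) in prims.iter().zip(visible) {
        if !is_visible {
            continue;
        }
        let touches_invalid_tile = tiles
            .iter()
            .any(|t| !t.is_valid && prim.rect.intersects(&t.rect));
        if touches_invalid_tile {
            prepared.push(prim.id);
        }
    }
    prepared
}

// With per-tile validity known up front, the dirty area can be reported as
// a list of rects (one per invalid tile) rather than a single union rect.
fn dirty_region(tiles: &[Tile]) -> Vec<Rect> {
    tiles.iter().filter(|t| !t.is_valid).map(|t| t.rect).collect()
}
```

In this sketch a caller would run `visibility_pass` once per frame, then pass its result to `prepare_pass`. The point of the ordering is that validity is known before any expensive preparation starts, so primitives that only cover valid tiles are skipped entirely.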