Combines3DTransformWithAncestors means that we need to include the transform from our parent layer before trying to do 2D coordinate conversions.
This can be set on the (single!) child layer of a perspective layer (a layer for which TransformIsPerspective() returns true), and also on layers whose parent establishes or extends a preserve-3d context.
The case we have here, where the layer has both scroll metadata and Combines3DTransformWithAncestors(), is exclusive to perspective though, since any overflow value other than visible is specified as disabling preserve-3d.
Given that, I think it would be equivalent to use the current logic to find the parent of the layer with metadata, and then, if that parent has TransformIsPerspective(), jump up one level further. My previous suggestion of using Combines3DTransformWithAncestors() is the more idiomatic way of doing it though.
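To make the equivalence concrete, here is a rough sketch of the two lookups. It uses a made-up SketchLayer struct rather than the real Gecko layer classes, and the field and function names are assumptions for illustration only; the two flags just mirror what TransformIsPerspective() and Combines3DTransformWithAncestors() would report.

```cpp
#include <cassert>

// Hypothetical stand-in for a layer tree node; NOT the real Gecko Layer class.
struct SketchLayer {
  SketchLayer* parent = nullptr;
  bool transformIsPerspective = false;   // mirrors TransformIsPerspective()
  bool combines3DWithAncestors = false;  // mirrors Combines3DTransformWithAncestors()
};

// Option A: take the parent of the layer with metadata, and jump up one
// further level if that parent is a perspective layer.
SketchLayer* AncestorSkippingPerspective(SketchLayer* scrolled) {
  SketchLayer* ancestor = scrolled->parent;
  if (ancestor && ancestor->transformIsPerspective) {
    ancestor = ancestor->parent;
  }
  return ancestor;
}

// Option B: ask the layer with metadata whether it combines its transform
// with its parent, and if so jump up one further level.
SketchLayer* AncestorViaCombines3D(SketchLayer* scrolled) {
  SketchLayer* ancestor = scrolled->parent;
  if (ancestor && scrolled->combines3DWithAncestors) {
    ancestor = ancestor->parent;
  }
  return ancestor;
}

// Builds root -> perspective -> scrolled and root -> plain -> unscoped, and
// checks that both options pick the same ancestor in each case.
bool BothLookupsAgree() {
  SketchLayer root;

  SketchLayer perspective;
  perspective.parent = &root;
  perspective.transformIsPerspective = true;

  SketchLayer scrolled;  // the layer carrying scroll metadata
  scrolled.parent = &perspective;
  scrolled.combines3DWithAncestors = true;

  SketchLayer plain;
  plain.parent = &root;

  SketchLayer unscoped;  // scrolled layer with no perspective parent
  unscoped.parent = &plain;

  return AncestorSkippingPerspective(&scrolled) == &root &&
         AncestorViaCombines3D(&scrolled) == &root &&
         AncestorSkippingPerspective(&unscoped) == &plain &&
         AncestorViaCombines3D(&unscoped) == &plain;
}
```

The two only agree because preserve-3d is ruled out for a layer with scroll metadata; a preserve-3d child would set the second flag without its parent being a perspective layer, which is why option B is the more general check.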
--- Discussion of a side-issue that doesn't directly affect this bug follows. ---
Botond also raised the question (on IRC) of why the scroll offset is on the perspective layer, despite that not being the layer with scroll metadata.
The transforms we need to apply are (for this case where the transformed frame scrolls, but the perspective is on the non-scrolling outer scroll frame):
css-transform * offset-from-transform-frame-to-perspective-frame * perspective-transform * offset-from-perspective-frame-to-reference-frame
Gecko currently constructs the transform layer with just the first term of that sequence, and the perspective item with the other three. As a result, the scroll offset (included in the offset between the transform frame and the perspective frame) ends up as part of the transform on the perspective layer.
We could just as easily include the first two terms on the transform layer (and it would probably make more sense to do so), at which point the scroll offset is now on the same frame as the transform. Note that that wouldn't affect this bug, and we still need to combine the perspective transform to get correct coordinate conversion.
APZ will apply any async scrolling to the layer with scroll metadata (the transform layer), so we currently have the weird behaviour where the initial scroll offset is part of the perspective layer (pre-translated), and the async scroll offset is part of the transform layer (post-translated). Again, this doesn't affect behaviour in any way, since we always need to multiply the two together before using them; it's just a bit confusing.
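The claim that the grouping of the four terms doesn't matter is just associativity of matrix multiplication. A minimal sketch, assuming a hand-rolled 4x4 matrix with the row-vector convention (so products apply left to right) and made-up values; none of these helpers are Gecko APIs, and a Y rotation stands in for the CSS transform:

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical 4x4 matrix type, row-vector convention (point * M).
using Mat4 = std::array<std::array<double, 4>, 4>;

Mat4 Identity() {
  Mat4 m{};  // zero-initialised
  for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
  return m;
}

Mat4 Multiply(const Mat4& a, const Mat4& b) {
  Mat4 out{};
  for (int r = 0; r < 4; ++r)
    for (int c = 0; c < 4; ++c)
      for (int k = 0; k < 4; ++k) out[r][c] += a[r][k] * b[k][c];
  return out;
}

Mat4 Translation(double x, double y) {
  Mat4 m = Identity();
  m[3][0] = x;
  m[3][1] = y;
  return m;
}

Mat4 Perspective(double depth) {
  Mat4 m = Identity();
  m[2][3] = -1.0 / depth;
  return m;
}

// Stand-in for the CSS transform; the sign convention is not important here.
Mat4 RotateY(double radians) {
  Mat4 m = Identity();
  m[0][0] = std::cos(radians);
  m[0][2] = -std::sin(radians);
  m[2][0] = std::sin(radians);
  m[2][2] = std::cos(radians);
  return m;
}

bool NearlyEqual(const Mat4& a, const Mat4& b) {
  for (int r = 0; r < 4; ++r)
    for (int c = 0; c < 4; ++c)
      if (std::fabs(a[r][c] - b[r][c]) > 1e-9) return false;
  return true;
}

bool GroupingDoesNotMatter() {
  const Mat4 css = RotateY(0.5);
  const Mat4 toPerspective = Translation(0.0, -100.0);  // includes the scroll offset
  const Mat4 perspective = Perspective(500.0);
  const Mat4 toReference = Translation(20.0, 30.0);

  // Current split: first term on the transform layer, other three on the perspective layer.
  const Mat4 transformLayerA = css;
  const Mat4 perspectiveLayerA = Multiply(Multiply(toPerspective, perspective), toReference);

  // Alternative split: first two terms on the transform layer.
  const Mat4 transformLayerB = Multiply(css, toPerspective);
  const Mat4 perspectiveLayerB = Multiply(perspective, toReference);

  return NearlyEqual(Multiply(transformLayerA, perspectiveLayerA),
                     Multiply(transformLayerB, perspectiveLayerB));
}
```

Either split gives the same combined matrix (up to floating-point rounding), which is why moving the scroll offset between the two layers is purely a bookkeeping change.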