Here's a Talos comparison with the pref off versus on; click "Show only important changes" to get something that's easier to read.
It shows a number of wins across the board and one regression.
The regression is probably not real: the glvideo test is already strongly bimodal, and I think the "pref on" jobs just happened to hit the "high" values more often.
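To illustrate why a bimodal test can show a regression without any underlying change, here is a minimal sketch. The mode values and run counts are hypothetical, chosen only for illustration; they are not taken from the Talos data.

```python
def mean_runtime(high_runs, total_runs, low_ms, high_ms):
    """Mean of a two-mode sample where `high_runs` of `total_runs` hit the high mode."""
    low_runs = total_runs - high_runs
    return (low_runs * low_ms + high_runs * high_ms) / total_runs


# Hypothetical mode values; both configurations share the same two modes.
LOW_MS, HIGH_MS = 100.0, 150.0

pref_off = mean_runtime(2, 5, LOW_MS, HIGH_MS)  # 2 of 5 runs landed in the high mode
pref_on = mean_runtime(4, 5, LOW_MS, HIGH_MS)   # 4 of 5 runs landed in the high mode

apparent_regression = (pref_on - pref_off) / pref_off
print(f"off={pref_off:.1f} ms, on={pref_on:.1f} ms, change={apparent_regression:+.1%}")
```

With these made-up numbers the means differ by about 17% even though neither mode moved, which is the kind of effect I suspect here.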
The improvements are partially real and partially a change in what's being tested.
BasicCompositor is a lot more efficient with the CoreAnimation path because one or two window-sized copies are eliminated, so the 20% win on basic_compositor_video is probably real.
Window resizing also becomes a lot more efficient with CoreAnimation: Without CoreAnimation, the window has a main memory buffer behind the OpenGL context which is never shown on the screen but which gets copied around a bit during window resizing. There's also some extra surface synchronization overhead. CoreAnimation gets rid of that copy and overhead. So the 17% win on tresize is probably also real.
The other wins are not real; they're caused by a difference in how ASAP mode works.
"ASAP" mode is enabled by setting layout.frame-rate to 0. This causes us to refresh the window as many times as we can, more frequently than vsync if possible, and to turn off refresh synchronization on the OpenGL context.
With the non-CoreAnimation path, this has the effect that our window contents are presented to the screen as soon as possible, potentially incurring "tearing" artifacts. My hypothesis for how this works is that every call to SwapBuffers forces the window server to do a present in the window's rectangle and copy the window contents to the screen's front buffer. I also think that SwapBuffers waits for the previous frame's window server copy to finish. So our measurements included one window server copy per frame.
With CoreAnimation, we no longer have a way to force an "immediate window server copy". We draw as frequently as we can, but the window server will only copy our window contents to the screen once per vsync interval. So the ASAP mode measurements now include less window server work, maybe one window server copy every 10 drawn frames or so, depending on how many composites we do within one vsync interval.
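As a rough back-of-the-envelope model of this difference (a sketch under stated assumptions, not a measurement): assume a 60 Hz display, a fixed per-composite cost, and a fixed per-copy window server cost. The function names and all numbers below are hypothetical.

```python
def composites_per_vsync(composite_ms, refresh_hz=60.0):
    """How many composites fit into one vsync interval (at least one)."""
    vsync_ms = 1000.0 / refresh_hz
    return max(1.0, vsync_ms / composite_ms)


def measured_ms_per_frame(composite_ms, copy_ms, core_animation, refresh_hz=60.0):
    """Simplified model of what ASAP mode ends up measuring per drawn frame.

    Without CoreAnimation: assume every frame pays for one window server copy.
    With CoreAnimation: assume one copy per vsync interval, amortized over all
    composites drawn within that interval.
    """
    if not core_animation:
        return composite_ms + copy_ms
    return composite_ms + copy_ms / composites_per_vsync(composite_ms, refresh_hz)


# Hypothetical costs: 1.6 ms per composite, 2 ms per window server copy.
print(composites_per_vsync(1.6))
print(measured_ms_per_frame(1.6, 2.0, core_animation=False))
print(measured_ms_per_frame(1.6, 2.0, core_animation=True))
```

With a composite cost of about 1.6 ms at 60 Hz, roughly 10 composites fit into one vsync interval, which matches the "one copy every 10 drawn frames or so" estimate. The per-frame number drops in this model even though our own drawing work is unchanged.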
I'm not concerned about this change in measurement. We will still measure our own drawing work just as we were before. It would be nice to get a good sense of how much window server work each composite is causing, but it's not the end of the world if we can't get that data.