It will be fairly straightforward to measure the impact on load time, and any large impacts on retention.
It will be difficult to measure the impact on user perception of performance, and the impact on page breakage. If either of these effects is really big then it might impact retention (positively or negatively) and be visible that way. But there's definitely potential for this change to make users happier or less happy in a way that we care about but can't detect through the load time or retention metrics.
If there are performance metrics other than page load (e.g. something that measures jank) then including them would help.
Anti-tracking and I have been developing an add-on for detecting page breakage caused by the change being tested. If you really want to be sure that you're not breaking more pages, then we should consider re-using that methodology here. That would of course delay things, so it might not be an option?
If you're not too worried about the risks, then we could just go with the simple experiment and roll the change out if the results are not alarming. We would need to be prepared to roll it back if the breakage turns out to be too small to show up in the experiment's retention metrics but large enough to generate bug reports once the feature is fully rolled out.
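To get a rough feel for how much breakage could hide below the retention metrics, here is a minimal sketch of a minimum-detectable-difference calculation for a two-branch experiment. It uses the standard normal approximation for comparing two proportions. The baseline retention of 80% and the branch sizes are hypothetical placeholders, not numbers from this experiment, and the function name is just for illustration.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_difference(baseline, n_per_branch, alpha=0.05, power=0.8):
    """Approximate smallest absolute difference in retention that a
    two-branch experiment can detect (two-sided test, normal approximation).

    baseline      -- assumed retention rate in the control branch (0..1)
    n_per_branch  -- number of users enrolled in each branch
    alpha         -- false positive rate we are willing to accept
    power         -- probability of detecting a real difference of this size
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Standard error of the difference between two proportions,
    # assuming both branches sit close to the baseline rate.
    standard_error = sqrt(2 * baseline * (1 - baseline) / n_per_branch)
    return (z_alpha + z_power) * standard_error


if __name__ == "__main__":
    # Hypothetical branch sizes and baseline, for illustration only.
    for n in (10_000, 100_000, 1_000_000):
        mde = min_detectable_difference(baseline=0.80, n_per_branch=n)
        print(f"{n:>9} users per branch: ~{mde * 100:.2f} percentage points")
```

Any effect on retention smaller than the value this returns is likely to go unnoticed in the experiment, which is the range where we would be relying on bug reports after rollout instead.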