Several months ago I wrote a script to measure the latency between the test host and the device under test. It showed significant variation in command round-trip times, depending on the hardware/VM the tests are run from and on the device itself, and that variation is likely to grow once we start running tests against different device hardware. Because of this latency, we often see different test results between individuals running the same tests. Faster is not necessarily better: a fast host can issue commands faster than Gaia can keep up with. We see the same effect between hardware on WebDriver/desktop browser tests too, though to a lesser extent.

I don't really know of a good solution, so I'm raising this as an enhancement for discussion: can we somehow normalise, or at least make more consistent, the effective speed across different hardware combinations? All I can propose so far is adding a slight lag to Marionette commands to slow fast systems down to the level of the slower ones, though I know that's pretty vague. Since we're doing functional testing here, outright speed is less of a priority than reliable functional testing. Perhaps someone more familiar with writing test frameworks has dealt with something like this before?
We're seeing this same issue when looking at tests on b2g desktop builds. I think the only solution is to add more timing awareness to the tests. That is, after performing an action that has some effect on the UI, we should have code that explicitly waits for that state to appear. We already have some of that, but in many other cases, we don't, because most of the time on a real device, the latency between the host and the device means that the expected state appears before the next Marionette command gets executed. If the latency drops, we can run into trouble. We need to adapt the tests to run well on b2g desktop builds, since we're going to get them going on TBPL on that platform. I think doing this will improve the stability of the tests on all platforms, since we'll have to address these exact sorts of timing issues there.
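A minimal sketch of the kind of explicit wait described above, in plain Python (the `wait_until` name and signature are illustrative, not the actual Marionette client API; the real client wraps a device condition such as an element becoming displayed):

```python
import time

def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Returns the truthy value, or raises TimeoutError. This is the explicit-wait
    idea: the test blocks until the expected UI state actually appears, instead
    of relying on host/device latency to make the state "probably there" by the
    time the next command runs.
    """
    end = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= end:
            raise TimeoutError("condition not met within %.1fs" % timeout)
        time.sleep(interval)
```

After an action that changes the UI, a test would call something like `wait_until(lambda: element.is_displayed())` rather than asserting the state immediately, which is what makes it robust to latency dropping on b2g desktop builds.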
Yes, I realise that, but coding around timing issues at the test level is very time consuming, and it's rarely the same fix twice, so there's little opportunity to make the process faster. A tweak in Marionette at the top of the tree would help at so many points below it, saving a lot of effort, in the same way that implicit waits for elements increase reliability. I know I don't have much of a solution, but even a couple of days spent here could produce large long-term gains. I daren't think how many hours I've spent chasing differences in test timing between CI and local runs... definitely more than days!
Technically, it would be quite easy to add some latency setting to Marionette, but I fear that would basically just shift the problem, rather than fixing it. Right now, we observe problems because sometimes the latency is less than the usual amount, and this exposes timing problems in the test. Adding some artificial latency to Marionette won't fix this. Instead, the "usual amount of latency" will just be greater than it is now. It will still be variable, for the various reasons you mentioned, and it is very likely that new tests that are written will (unwittingly) take advantage of the "new normal", and break when the actual latency drops below that new normal. When the tests are run on B2G desktop builds, they have very low latency in general. I think fixing the tests for this target, which we intend to do, will fix this problem across the board, and that is a much more robust solution.
That's only useful for the small subset of tests that can run on desktop. I understand your point about it just shifting the perceived latency, but you're describing a fixed amount of latency added at all times. What I want is to reduce the variation, not to raise the latency uniformly: if we can determine that a host/device combination is slow, add no latency; if it's fast, add some. The aim is mostly to give test authors confidence that if the timing is OK on their device, it will be OK on desktop or other devices too. That would reduce the pressure to test on every combination of devices/desktop, which of course delays progress.
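A rough sketch of that normalisation idea, assuming we can measure the round-trip latency of a setup up front (the class name, the target value, and the `before_command` hook are all hypothetical; nothing like this exists in Marionette):

```python
import time

class LatencyNormaliser:
    """Pad fast host/device combinations up to a target round-trip time.

    If the measured latency already meets or exceeds the target, no delay is
    added; otherwise each command is delayed by the difference, so fast and
    slow setups see roughly the same command pacing.
    """

    def __init__(self, measured_latency, target_latency):
        # Seconds of artificial delay to insert before each command.
        self.pad = max(0.0, target_latency - measured_latency)

    def before_command(self):
        if self.pad:
            time.sleep(self.pad)
```

The design choice here matches the comment above: a slow setup (measured >= target) gets `pad == 0` and runs untouched, while only fast setups are slowed down toward the target.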
Created attachment 745223 [details] Marionette 'ping' script

Here is a Marionette ping script I started and that Dave fine-tuned. Running the tests from a VM against an Unagi, the average response is 36, with individual responses ranging from 25 to 68 (a spread of 43). I'd be interested to see results for desktop builds, and for tests run from something other than a VM.
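The attachment isn't reproduced here, but the measurement such a ping script performs can be sketched as follows (the `send_ping` callable is a stand-in for a real Marionette round trip, e.g. executing a trivial command against the device):

```python
import time

def measure_latency(send_ping, samples=50):
    """Time `samples` round trips and return (min, mean, max) in milliseconds.

    `send_ping` should perform one host-to-device round trip; any cheap
    Marionette command would do in practice.
    """
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        send_ping()
        times.append((time.perf_counter() - start) * 1000.0)
    return min(times), sum(times) / len(times), max(times)
```

Reporting min and max alongside the mean is the point of the exercise: as the comment above notes, it's the spread between the fastest and slowest round trips, not the average, that exposes timing-sensitive tests.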
Closing out B2G-related bugs. If these still happen and are still valuable, please reopen with fresh details and an explanation of how this affects Desktop/Fennec.