We are getting lots of intermittent failures in the mediasource web-platform tests. With our existing MSE architecture, anything touching MSE was subject to a lot of variation. In particular, we would create and initialize a hardware decoder whenever an init segment was received. Depending on the load of the machine, this could take some time. The new MSE architecture is about to be enabled and made the default. Yet I'm still getting intermittent failures. With the new MSE, I can never reproduce any of those timeouts locally, nor in any of my VMs. However, I attempted to roughly simulate the Mac try machines, which are 2010 Mac minis. They have a Core 2 Duo at 2GHz, little memory and a mechanical 2.5" hard drive. Hardly a beast speed-wise. Using an Apache config, I set up the web-platform tests and ran them locally. When the files being loaded by the JS aren't in the cache, and depending on the load of the machine, I would often get a timeout. Once the files were in the cache (e.g. on any follow-up run), everything was fine. In 10s we have to fetch multiple files, play them (typically 4s long), seek into them, and perform various operations. I found that it would often take several seconds just for the Apache server to serve the files. We do not fail the tests. It just sometimes takes longer than the time the scripts allow us to complete things: 10s. When I bumped the default timeout to 20s, I never got the timeout. Again, we do not fail the tests per se. The code behaves according to spec: events are fired in the proper order, media is played. I suggest that we bump the default timeout value to 20s to cater for our struggling try platforms.
While the intermittent failures are more common on Macs, they also happen on the Windows VMs.
Created attachment 8635198 [details] [diff] [review] Increase default timeout from 10s to 20s. We could make the timeout change just for the mediasource tests. However it would require bigger changes. And that would mean diverging from upstream.
Attachment #8635198 - Flags: review?(james)
I would prefer that we just set the mediasource tests to have a long timeout (add <meta name=timeout content=long> and regenerate the test manifest). Many tests currently time out, so increasing the timeout across the board could be a significant performance regression.
Could you point me to some documentation on how to do this? I have no idea how to generate a manifest. We have three tests that we genuinely fail with a TIMEOUT; I feel that waiting a full minute every time is quite long. What about adding a new value, like "medium", which would be 20s?
There is an intentional decision to limit the number of timeout choices since it's generally hard to tell how test performance will vary with different environments (e.g. hardware, compile options). I don't think that having three choices would make this substantially easier to get right than having only two. To modify the timeouts the easiest thing to do is add the <meta name=timeout content=long> to the tests, before the <script src="/resources/testharness.js"></script> and then run ./mach web-platform-tests --manifest-update. All being well it should just add some timeout="long" keys to the manifest.
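For anyone applying this to several tests at once, the edit described above can be sketched in a few lines of Python. This is only an illustration, not part of mach or the web-platform-tests tooling: the helper name add_long_timeout and both regular expressions are assumptions made for this example. It inserts the meta tag just before the testharness.js script tag and leaves files that already declare a timeout untouched.

```python
import re

# The tag quoted in the comment above.
META_TAG = "<meta name=timeout content=long>"

# Hypothetical patterns for this sketch: the testharness.js script tag
# (quotes optional) and any existing timeout meta tag.
HARNESS_RE = re.compile(
    r'<script\s+src=["\']?/resources/testharness\.js["\']?\s*>\s*</script>',
    re.IGNORECASE,
)
TIMEOUT_RE = re.compile(r'<meta\s+name=["\']?timeout["\']?', re.IGNORECASE)


def add_long_timeout(html: str) -> str:
    """Return html with the long-timeout meta tag placed before testharness.js.

    The input is returned unchanged if it already declares a timeout or
    does not load testharness.js.
    """
    if TIMEOUT_RE.search(html):
        return html
    match = HARNESS_RE.search(html)
    if match is None:
        return html
    # Reuse the script line's indentation when it is whitespace only.
    line_start = html.rfind("\n", 0, match.start()) + 1
    indent = html[line_start:match.start()]
    if indent.strip():
        indent = ""
    return html[:match.start()] + META_TAG + "\n" + indent + html[match.start():]


if __name__ == "__main__":
    sample = (
        "<!DOCTYPE html>\n"
        "<title>example test</title>\n"
        '<script src="/resources/testharness.js"></script>\n'
    )
    print(add_long_timeout(sample))
```

After rewriting the test files this way, the manifest still needs to be regenerated with ./mach web-platform-tests --manifest-update as described above.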
But I thought we couldn't modify the tests themselves as any changes would then be uploaded upstream. I'm not sure we want that...
Sure we do. If these are intermittent for us because of timing issues it seems quite likely that they will be for other people too.
Comment on attachment 8635198 [details] [diff] [review] Increase default timeout from 10s to 20s. Review of attachment 8635198 [details] [diff] [review]: (this patch wouldn't work in any case because the harness will kill the browser after 15s).
Attachment #8635198 - Flags: review?(james) → review-
So I'm guessing I'm no longer seeing intermittent failures locally because all tests complete within 15s, then. I bumped the value for just the test that is currently exhibiting intermittent failures. Will see how that goes on try.
Component: Audio/Video → Audio/Video: Playback
Seems like we're not getting to this.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → WONTFIX
The timeout value was bumped in another bug and was applied upstream.