Closed Bug 1142501 Opened 10 years ago Closed 7 years ago

Intermittent Linux TEST-UNEXPECTED-TIMEOUT | /webvtt/rendering/cues-with-video/processing-model/basic.html | expected FAIL

Categories

(Testing :: web-platform-tests, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: cbook, Unassigned)

References


Details

(Keywords: intermittent-failure, Whiteboard: [leave open][test disabled])

Ubuntu VM 12.04 x64 mozilla-central pgo test web-platform-tests-reftests https://treeherder.mozilla.org/logviewer.html#?job_id=1155300&repo=mozilla-central 05:10:21 INFO - TEST-UNEXPECTED-TIMEOUT | /webvtt/rendering/cues-with-video/processing-model/basic.html | expected FAIL
Summary: Intermittent basic.html | expected FAIL → Intermittent TEST-UNEXPECTED-TIMEOUT | /webvtt/rendering/cues-with-video/processing-model/basic.html | expected FAIL
Test disabled on Linux, I think: https://hg.mozilla.org/integration/mozilla-inbound/rev/06562fa26db3 since this was becoming one of the top sources of orange. I'd appreciate post-facto review. It would also be good to follow up on figuring out what made this start to time out, although it's not clear to me that the test is well-written, nor is it clear to me how much we should care about a failing test timing out.
Flags: needinfo?(james)
Keywords: leave-open
Summary: Intermittent TEST-UNEXPECTED-TIMEOUT | /webvtt/rendering/cues-with-video/processing-model/basic.html | expected FAIL → Intermittent Linux TEST-UNEXPECTED-TIMEOUT | /webvtt/rendering/cues-with-video/processing-model/basic.html | expected FAIL
comment 3 appears to be the earliest starred occurrence, though I suspect there were prior unstarred ones.
I did some retriggers on: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=b6329532e4e9 https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=58c9d079f318 to isolate when this reached mozilla-central. (We know it was on central for the pgo build for the latter; I just want to confirm it's on the 64-bit opt build as well, since that's what I was retriggering on the previous merge.)
That looks like the right fix to disable it. I did wonder if this was related to the harness changes in http://hg.mozilla.org/integration/mozilla-inbound/rev/d4a480e3bf65 but it looks like that landed on inbound on 6th March which is too early to explain this bug.
Flags: needinfo?(james)
It was present on mozilla-central on b6329532e4e9, so I need to go back further.
It seems like I can retrigger builds arbitrarily far back in the past: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=Ubuntu VM 12.04 x64 mozilla-central opt test web-platform-tests-reftests and this bug will occur 50%-75% of the time on Linux 64-bit opt runs. Yet it was not occurring at the time when the original runs for those changesets happened. This suggests one of two (I think) things:
(a) The factor that caused the regression is external to the repositories being tested.
(b) The test runs are not using the tests or code for the changeset that they claim to be (e.g., because they're downloading a "latest" build or tests instead of the package for the right revision).
Either one is a serious problem. See, e.g., these retriggers for a mozilla-central push from March 6: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=3c7cbc756a6a where, before the retriggers I did today, none of the Linux 64-bit opt runs other than the ones already in this bug showed the problem.
Flags: needinfo?(james)
Note that (a) violates the first point in https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#Must_avoid_patterns_known_to_cause_non_deterministic_failures and I think (b) probably ought to violate something in that rule as well, although it seems less explicit to me.
As far as I know, everything in the test and harness is deterministic to the same extent that our other test harnesses are (i.e. up to the limit of changes caused by differences in the underlying OS / hardware). In particular, the tests are all copied in-tree and packaged into tests.zip along with the harness, in the same way as e.g. mochitests. The harness also configures Firefox to crash if the tests try to access any resources outside the local network.
I also can't see anything obviously suspicious about this particular test. From the log it appears that when it times out, the reftest-wait class is never removed from the root element. That ought to happen only if a playing event is never dispatched to the <video> element, which seems like a plausible kind of bug, but it doesn't obviously explain the pattern you see above, and it would presumably affect all the other WebVTT tests that are written with the same structure. So, I guess I don't have any concrete ideas. I'm away next week, so I won't have time to debug it in more detail until I get back.
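For context, the wait mechanism described above follows the standard reftest-wait convention: the harness holds off on the screenshot until the reftest-wait class is removed from the root element. The sketch below is a hypothetical reconstruction of that structure, not the actual basic.html source; the file paths and markup are assumptions for illustration.

```html
<!-- Hypothetical sketch of the reftest-wait pattern described in the
     comment above; not the actual contents of basic.html. -->
<html class="reftest-wait">
<video autoplay>
  <!-- source and track paths are assumed, not from the real test -->
  <source src="support/video.webm">
  <track default src="support/cues.vtt">
</video>
<script>
  var video = document.querySelector('video');
  // The harness waits until the reftest-wait class is removed before
  // taking the screenshot. If the "playing" event never fires on the
  // <video> element, the class is never removed and the test times out.
  video.addEventListener('playing', function () {
    document.documentElement.classList.remove('reftest-wait');
  });
</script>
</html>
```

Under this structure, any environment where video playback never starts (e.g. a misbehaving VM without working media decoding or timers) would produce exactly the observed timeout.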
Flags: needinfo?(james)
Whiteboard: [leave open][test disabled]
I still think comment 177 is suspicious.
Flags: needinfo?(james)
Indeed comment 177 was suspicious, but what it missed was that (a) includes the most obvious factor external to the repository being tested: Amazon's ability to provide us with VMs that consistently have the same capabilities and that are not sometimes broken. Conveniently, Amazon seems to have become somewhat better behaved, and someone at some point reenabled this test.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(james)
Resolution: --- → WORKSFORME