Open Bug 1544465 Opened 2 years ago Updated 2 years ago

osx Mojave (10.14) -dom/media/test/test_bug1113600.html - gizmo-short.mp4-10 timed out!

Categories

(Core :: Audio/Video, defect, P2)

defect

Tracking

()

People

(Reporter: jmaher, Unassigned)

References

(Blocks 1 open bug)

Details

we are looking to upgrade the OSX OS version from 10.10 to 10.14 in our automation. There are a few failures and we don't know if these are related to the OS installation/config, our tooling needs, or the browser.

you can see failures on some try pushes here:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher%40mozilla.com&fromchange=96a96f509d98f90ce3f8e35990df75153755b877&tochange=b9ef9517732230f87398e4b26d30ed5c2e34adce&searchStr=media

02:05:48 INFO - TEST-PASS | dom/media/test/test_bug1113600.html | [started gizmo-short.mp4-10 t=3.391] Length of array should match number of running tests
02:05:49 INFO - gizmo-short.mp4 capture started at 0.13725. Duration=0.27
02:05:49 INFO - GECKO(882) | [Child 884, Main Thread] WARNING: Cannot query channel count on a AudioSegment with no chunks.: '!mChunks.IsEmpty()', file /builds/worker/workspace/build/src/dom/media/AudioSegment.h, line 408
02:05:55 INFO - GECKO(882) | --DOMWINDOW == 12 (0x113dec020) [pid = 884] [serial = 314] [outer = 0x0] [url = about:blank]
02:05:55 INFO - GECKO(882) | --DOCSHELL 0x113eab800 == 3 [pid = 884] [id = {762c6115-f240-9c4b-9411-6a7f34c4d106}] [url = about:blank]
02:05:59 INFO - GECKO(882) | --DOMWINDOW == 11 (0x10d326c00) [pid = 884] [serial = 316] [outer = 0x0] [url = http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html]
02:05:59 INFO - GECKO(882) | --DOMWINDOW == 10 (0x113e13c00) [pid = 884] [serial = 315] [outer = 0x0] [url = about:blank]
02:05:59 INFO - GECKO(882) | --DOMWINDOW == 9 (0x10d31cc00) [pid = 884] [serial = 313] [outer = 0x0] [url = http://mochi.test:8888/tests/dom/media/test/test_buffered.html]
02:06:06 INFO - GECKO(882) | --DOMWINDOW == 8 (0x113e1c000) [pid = 884] [serial = 318] [outer = 0x0] [url = http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html]
02:06:55 INFO - GECKO(882) | --DOMWINDOW == 7 (0x113e14400) [pid = 884] [serial = 317] [outer = 0x0] [url = http://mochi.test:8888/tests/dom/media/test/test_bug1018933.html]
02:08:31 INFO - GECKO(882) | [Parent 882, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/toolkit/components/antitracking/AntiTrackingCommon.cpp, line 400
02:08:48 INFO - TEST-INFO | started process screencapture
02:08:48 INFO - TEST-INFO | screencapture: exit 0
02:08:48 INFO - Buffered messages logged at 19:05:45
02:08:48 INFO - TEST-PASS | dom/media/test/test_bug1113600.html | A valid string reason is expected
02:08:48 INFO - TEST-PASS | dom/media/test/test_bug1113600.html | Reason cannot be empty
02:08:48 INFO - Buffered messages finished
02:08:48 INFO - TEST-UNEXPECTED-FAIL | dom/media/test/test_bug1113600.html | Test timed out!
02:08:48 INFO - SimpleTest.ok@SimpleTest/SimpleTest.js:275:18
02:08:48 INFO - onTimeout@dom/media/test/manifest.js:1778:7
02:08:48 INFO - setTimeout handlerMediaTestManager/this.started@dom/media/test/manifest.js:1785:57
02:08:48 INFO - startTest@dom/media/test/test_bug1113600.html:22:11
02:08:48 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1834:12
02:08:48 INFO - MediaTestManager/this.finished@dom/media/test/manifest.js:1813:12
02:08:48 INFO - startTest/v.onended@dom/media/test/test_bug1113600.html:39:13
02:08:48 INFO - EventHandlerNonNull
startTest@dom/media/test/test_bug1113600.html:36:3
02:08:48 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1834:12
02:08:48 INFO - MediaTestManager/this.finished@dom/media/test/manifest.js:1813:12
02:08:48 INFO - startTest/v.onended@dom/media/test/test_bug1113600.html:39:13
02:08:48 INFO - EventHandlerNonNullstartTest@dom/media/test/test_bug1113600.html:36:3
02:08:48 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1834:12
02:08:48 INFO - MediaTestManager/this.finished@dom/media/test/manifest.js:1813:12
02:08:48 INFO - startTest/v.onended@dom/media/test/test_bug1113600.html:39:13
02:08:48 INFO - EventHandlerNonNull
startTest@dom/media/test/test_bug1113600.html:36:3
02:08:48 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1834:12
02:08:48 INFO - MediaTestManager/this.finished@dom/media/test/manifest.js:1813:12
02:08:48 INFO - startTest/v.onended@dom/media/test/test_bug1113600.html:39:13
02:08:48 INFO - EventHandlerNonNullstartTest@dom/media/test/test_bug1113600.html:36:3
02:08:48 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1834:12
02:08:48 INFO - MediaTestManager/this.finished@dom/media/test/manifest.js:1813:12
02:08:48 INFO - startTest/v.onended@dom/media/test/test_bug1113600.html:39:13
02:08:48 INFO - EventHandlerNonNull
startTest@dom/media/test/test_bug1113600.html:36:3
02:08:48 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1834:12
02:08:48 INFO - MediaTestManager/this.finished@dom/media/test/manifest.js:1813:12
02:08:48 INFO - startTest/v.onended@dom/media/test/test_bug1113600.html:39:13
02:08:48 INFO - EventHandlerNonNullstartTest@dom/media/test/test_bug1113600.html:36:3
02:08:48 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1834:12
02:08:48 INFO - MediaTestManager/this.finished@dom/media/test/manifest.js:1813:12
02:08:48 INFO - startTest/v.onended@dom/media/test/test_bug1113600.html:39:13
02:08:48 INFO - EventHandlerNonNull
startTest@dom/media/test/test_bug1113600.html:36:3
02:08:48 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1834:12
02:08:48 INFO - MediaTestManager/this.finished@dom/media/test/manifest.js:1813:12
02:08:48 INFO - startTest/v.onended@dom/media/test/test_bug1113600.html:39:13
02:08:48 INFO - EventHandlerNonNullstartTest@dom/media/test/test_bug1113600.html:36:3
02:08:48 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1834:12
02:08:48 INFO - MediaTestManager/this.finished@dom/media/test/manifest.js:1813:12
02:08:48 INFO - startTest/v.onended@dom/media/test/test_bug1113600.html:39:13
02:08:48 INFO - EventHandlerNonNull
startTest@dom/media/test/test_bug1113600.html:36:3
02:08:48 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1834:12
02:08:48 INFO - MediaTestManager/this.finished@dom/media/test/manifest.js:1813:12
02:08:48 INFO - startTest/v.onended@dom/media/test/test_bug1113600.html:39:13
02:08:48 INFO - EventHandlerNonNull*startTest@dom/media/test/test_bug1113600.html:36:3
02:08:48 INFO - MediaTestManager/this.nextTest@dom/media/test/manifest.js:1834:12
02:08:48 INFO - MediaTestManager/this.runTests/<@dom/media/test/manifest.js:1754:12
02:08:48 INFO - gizmo-short.mp4-10 timed out!
02:08:48 INFO - [finished gizmo-short.mp4-10] remaining=
02:08:48 INFO - TEST-PASS | dom/media/test/test_bug1113600.html | [finished gizmo-short.mp4-10 t=183.41] Length of array should match number of running tests
02:08:48 INFO - TEST-PASS | dom/media/test/test_bug1113600.html | [started flac-s24.flac-11 t=183.411] Length of array should match number of running tests
02:08:50 INFO - flac-s24.flac capture started at 1.088435. Duration=4.04

:pehrsons, would you have more context on this failure? Do let me know if there is more information needed or potential OS/tooling changes to fix this. It could be that the timing is changing or something different happens in OSX 10.14.

Flags: needinfo?(apehrson)

Looking at the test code, I don't think there's something inherently wrong with it, we should have a guarantee that these mid-playback-captured sessions end. I suspect implementation bugs, and that timing changes on OSX 10.14 trigger these.

Can you try again with the patches over at bug 1536766? I fixed a lot of timing related things that are fairly close to landing. If you still have these failures with those patches, I'd like to know whether you have failures with a revision prior to bug 1423253.

Flags: needinfo?(apehrson) → needinfo?(jmaher)

ok, same failure (and even more) with the patches from bug 1536766:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4760cf9de87b7e58f2d26771bbb0dfa859603e9f

next up is an older tree prior to bug 1423253.

Flags: needinfo?(jmaher)

Interesting!

I'll try to repro locally. In any case, I don't think this is related to OS or tooling - it wouldn't be intermittent if it were. It's likely that some kind of race comes into effect.

Prioritized based on blocking bug 1530474, whether this means disabling test or otherwise.

Priority: -- → P2

I tried to repro this on try in [1] but it was all green.

Is there something that landed on m-c in the last week or so that I need in order to run this on the proper machines?

I'll try to rebase to the same parent you had as well, as soon as I've gone through review comments and made updates for those patches that are in flight.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=c844be29a4706ff042ef290ec41a3242286c4f23

Flags: needinfo?(jmaher)

these are not in production yet, if all tests were passing, we would be ready to run these in production. I have to have a local diff applied to my tree when pushing to try:
https://hg.mozilla.org/try/rev/398ba0468f7699ece8b981b35bf70f1649b3ca02

Flags: needinfo?(jmaher)

I had that: https://hg.mozilla.org/try/rev/322c0e685f25b907c05fcd21204fbd09b8529c1f

I take it no other differences are needed then. Well, at least something is making them green in that case, I just need to figure out what. None of my patches affect test_bug1113600.html, only underlying code, but you had all those changes in your push too.

I did have some logging enabled, but I wouldn't expect such a large fallout to become completely green with this logging. I shouldn't rule it out completely; either the logging made it green or we need to bisect from [1] (my parent) to [2] (your parent).

[1] https://hg.mozilla.org/try/rev/d6988a918ee4
[2] https://hg.mozilla.org/try/rev/6c9e7cba261d

I did fresh try pushes, and did a --no-artifact build but I get the same results. I did this without your patch set applied. I did look at the logs and it is running on the right device, so I believe you are doing the right stuff.

As a note, osx 10.14 isn't going live this week or soon; hopefully end of the month I can convert some tests to it and gradually get most if not all done by end of May.

I did more try pushes with my patch set applied on a much more recent m-c, in various constellations (some with logs, some without, some disabling RDD or AV1, some pure).

It's no longer green, but it's also not test_bug1113600.html failing anymore, instead it seems to be test_videoPlaybackQuality_totalFrames.html that's failing most of the time.

I don't know if this means the patches for bug 1536766 fixes the issue, or if something on the machines is not triggering it anymore. I guess we'll let it land, and you can check what's still failing.

The try runs are as follows,
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0ad9c4b7bc7334fab34e0331db9f6d3f2cc813ff
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d02955d18aee00e09602b3345b6b1fe171aced36
https://treeherder.mozilla.org/#/jobs?repo=try&revision=98f1c93cc16e78fed1562e331ed86ffa0f3c5cae
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d2ea1a583e9476f9c14cd18e34ff6eedcdc4d851
https://treeherder.mozilla.org/#/jobs?repo=try&revision=99c5e4734ed24cad9088563613d4526c60d5dbad
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2b80b48e4ebdec1e2adf2ca12123580adea7ad29

Depends on: 1536766

In bug 1536766, for test_streams_element_capture.html, on the mojave machines only, I've found that decoding sometimes fails with NS_ERROR_DOM_MEDIA_DECODE_ERR (0x806e0004) - mozilla::MediaResult mozilla::AppleATDecoder::DecodeSample(mozilla::MediaRawData *): Error decoding audio sample: 1650549857 @ 16000000

This is also for an mp4 file, as this timeout is. This could be the reason we're seeing timeouts here. We could add an error event handler to the media elements, but that will just clarify the error to automation and not fix the root problem. jya, any thoughts on what's up here?

Flags: needinfo?(jyavenard)

I've also seen some occasions of the same file (ambisonics.mp4) timing out without an error, like [1] which is on linux. Perhaps the linux version of the same underlying issue as in comment 12.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=1380ca01961bf72e71a68c054fc77730d4ffa890&selectedJob=242062302

(In reply to Andreas Pehrson [:pehrsons] from comment #12)

In bug 1536766, for test_streams_element_capture.html, on the mojave machines only, I've found that decoding sometimes fails with NS_ERROR_DOM_MEDIA_DECODE_ERR (0x806e0004) - mozilla::MediaResult mozilla::AppleATDecoder::DecodeSample(mozilla::MediaRawData *): Error decoding audio sample: 1650549857 @ 16000000

This is also for an mp4 file, as this timeout is. This could be the reason we're seeing timeouts here. We could add an error event handler to the media elements, but that will just clarify the error to automation and not fix the root problem. jya, any thoughts on what's up here?

I can't understand why this would ever happen. You would think that when decoding that file everything is pre-deterministic. We always feed the exact same content to the decoder

I wonder if it could be with some underlying machines, could we get a list of the machines where those failures occurred at some stage and determine if there's a pattern?

Flags: needinfo?(jyavenard)
You need to log in before you can comment on or make changes to this bug.