Closed Bug 1242774 Opened 8 years ago Closed 8 years ago

Seeking in media with video overhang can crash browser

Categories

(Core :: Audio/Video: Playback, defect, P1)

46 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla47
Tracking Status
firefox45 --- unaffected
firefox46 + verified
firefox47 + verified

People

(Reporter: bryce, Assigned: alwu)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(5 files, 6 obsolete files)

Attached video VideoOverhang.webm
When seeking in a media with video longer than the audio a null pointer can occur. This can be done manually by rapidly seeking around where the audio finishes in such a file.

To do this with the attached file rapidly seek around the 4 second mark (as the 5 is displayed). This appears to be reproducible if the file is streamed or read from disk.

This appears to affect the nightly (46+), but not Aurora or Beta.
Can't repro on Linux nightly. What is your platform?
Flags: needinfo?(bvandyk)
(In reply to JW Wang [:jwwang] from comment #1)
> Can't repro on Linux nightly. What is your platform?

OSX
Flags: needinfo?(bvandyk)
Attached file Crash stack
The crash stack points to a MozPromise crash in a cubeb callback:

Thread 71 Crashed:: com.apple.audio.IOThread.client
0   XUL                           	0x0000000102a95251 void mozilla::MozPromise<bool, nsresult, false>::Private::Resolve<bool&>(bool&&&, char const*) + 17 (Mutex.h:69)
1   XUL                           	0x0000000102a8f667 mozilla::media::DecodedAudioDataSink::Drained() + 103 (RefPtr.h:40)
2   XUL                           	0x00000001029b86e7 mozilla::AudioStream::StateCallback(cubeb_state) + 199 (Mutex.h:75)
3   XUL                           	0x0000000103a2c739 audiounit_output_callback + 201 (cubeb_audiounit.c:133)
...
Keywords: crash
Assignee: nobody → alwu
Attached file Log
[Crash reason]
As the audio backend in OSX calls drain() twice, then we tried to resolve the promise which had already been resolved and released in previous drain().

[Solution]
Replace Resolve() with ResolveIfExists().
Attached file Part 2 : Add the crash test (obsolete) —
We have an easier reproducing way, and it's also used in my crash test.

STR.
1. Play video
2. Pause video when it becomes silent (blue screen)
3. Play video again 
4. *Crash*
Attached patch Part 2 : Add the crash test (obsolete) — Splinter Review
Only add crash test to other platforms, 
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2898d3b3e797
Attachment #8715722 - Attachment is obsolete: true
Comment on attachment 8715721 [details] [diff] [review]
Part 1 : Don't resolve promise again when the callback calls drain()

Hi, JW,
Could you help me review these patches?
I'll modify the comment in Drained() after getting the try-server result in comment9, then I would file a new bug and mention what platform has this problem. 
Very appreciate!
Attachment #8715721 - Flags: review?(jwwang)
Attachment #8716145 - Flags: review?(jwwang)
Attachment #8715721 - Attachment is patch: true
Attachment #8716145 - Attachment is patch: true
Comment on attachment 8715721 [details] [diff] [review]
Part 1 : Don't resolve promise again when the callback calls drain()

Review of attachment 8715721 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/mediasink/DecodedAudioDataSink.cpp
@@ +278,5 @@
>  void
>  DecodedAudioDataSink::Drained()
>  {
>    SINK_LOG("Drained");
> +  // In osx, the audio backend would triggled Drained() twice, then it would

s/would triggled Drained/could trigger Drained/.

'would' means it will happen for sure.

@@ +280,5 @@
>  {
>    SINK_LOG("Drained");
> +  // In osx, the audio backend would triggled Drained() twice, then it would
> +  // cause the crash because the promise had already been resolve and free.
> +  // We'll need to solve it in bugXXXXXX.

'bugXXXXXX' means nothing. You can use 'FIXME' or 'TODO' to indicate a future work item.
Attachment #8715721 - Flags: review?(jwwang) → review+
Comment on attachment 8716145 [details] [diff] [review]
Part 2 : Add the crash test

Review of attachment 8716145 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/test/crashtests/audio-drained.html
@@ +1,1 @@
> +<html class="reftest-wait">

You need to call document.documentElement.removeAttribute("class") to finish the test. See oscillator-ended-1.html for an example.

@@ +1,3 @@
> +<html class="reftest-wait">
> +<head>
> +  <title>Bug 1242774 : video crashed if replay after audio drained</title>

Avoid using a term like 'drained' which is a term used by underlying implementation. You can say something like "video crashed if pause and play again after audio track ends".

@@ +9,5 @@
> +    dump("### Error : " + msg + "\n");
> +  }
> +}
> +
> +var SILENT_PART_START_TIME = 4.5;

Let's call this "AUDIO_END_TIME".

@@ +11,5 @@
> +}
> +
> +var SILENT_PART_START_TIME = 4.5;
> +var video = document.createElement('video');
> +video.src = "videoCrash.webm";

You need to "hg add" this file.

@@ +16,5 @@
> +video.play();
> +
> +video.ontimeupdate = function () {
> +  assert(SILENT_PART_START_TIME < video.duration,
> +         "Silent start time shouldn't beyond the duration!");

should be smaller than the duration.
Attachment #8716145 - Flags: review?(jwwang) → review-
(In reply to JW Wang [:jwwang] from comment #12)
> @@ +11,5 @@
> > +}
> > +
> > +var SILENT_PART_START_TIME = 4.5;
> > +var video = document.createElement('video');
> > +video.src = "videoCrash.webm";
> 
> You need to "hg add" this file.
> 
It's already added, but it doesn't be showed in "review" mode.
Choose "overview" then you would see that file.
(In reply to JW Wang [:jwwang] from comment #12)
>
> You need to call document.documentElement.removeAttribute("class") to finish
> the test. See oscillator-ended-1.html for an example.
> 

I think that is only needed when you add some css attribute in the element. [1]

[1] removeAttribute : http://www.w3schools.com/jsref/met_element_removeattribute.asp
Do you mean the test can finish correctly without calling document.documentElement.removeAttribute("class") like oscillator-ended-1.html?
Now I am waiting the result on window, now only OSX crashed.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e58d3d1d2543&group_state=expanded
Attached patch Part 2 : Add the crash test (obsolete) — Splinter Review
Attachment #8716145 - Attachment is obsolete: true
Attachment #8716194 - Flags: review?(jwwang)
Comment on attachment 8716194 [details] [diff] [review]
Part 2 : Add the crash test

Review of attachment 8716194 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/media/test/crashtests/video-replay-after-audio-end.html
@@ +11,5 @@
> +}
> +
> +var AUDIO_END_TIME = 4.5;
> +var video = document.createElement('video');
> +video.src = "videoCrash.webm";

The convention is lowercase like "video-crash.webm".
Attachment #8716194 - Flags: review?(jwwang) → review+
The test also crashed on the Window, but the crash point isn't the same [1].

In the crash stack, we can find the following stack
> 2  xul.dll!mozilla::AudioStream::Resume() [AudioStream.cpp:e58d3d1d2543 : 471 + 0x8]
> 3  xul.dll!mozilla::media::DecodedAudioDataSink::SetPlaying(bool)

Is it correct to resume the AudioStream after drained?

[1] https://treeherder.mozilla.org/logviewer.html#?job_id=16373175&repo=try
Flags: needinfo?(jwwang)
Try-server result for all patches, we can see whether the crash happens on Window after applying the patch.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8f91bbf7a22f
It looks like to me wasapi should handle the case where wasapi_stream_start() is called after drained correctly.
Flags: needinfo?(jwwang)
See Also: → 1246104
See Also: → 1246108
Attachment #8716194 - Attachment is obsolete: true
Attachment #8716233 - Flags: review+
Attachment #8716232 - Flags: review+
Keywords: checkin-needed
Attachment #8716233 - Attachment is patch: true
https://hg.mozilla.org/mozilla-central/rev/b1c27c182412
https://hg.mozilla.org/mozilla-central/rev/561d4d620aa3
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Backed out the crashtest in https://hg.mozilla.org/integration/mozilla-inbound/rev/2914d1a8153c - apparently Linux32 thinks it's a little aggressive with memory, or leaves things around, or something: caused at least bug 1247122 and bug 1247506 and I think there are some other flavors of the same OOMish thing that we didn't yet get around to filing.
Blocks: 1247122, 1247506
Flags: in-testsuite?
per comment27, skip this test on Linux.
Attachment #8716233 - Attachment is obsolete: true
Attachment #8719359 - Flags: review+
Attachment #8719359 - Attachment description: Add the crash test. r=jwwang. → Part 2 : Add the crash test. r=jwwang.
Attachment #8719359 - Attachment is patch: true
Keywords: checkin-needed
(In reply to Alastor Wu [:alwu] from comment #28)
> Created attachment 8719359 [details] [diff] [review]
> Part 2 : Add the crash test. r=jwwang.
> 
> per comment27, skip this test on Linux.

Please file follow-up bugs to ensure they will be fixed on Linux and Windows.
Blocks: 1248313
Open the bug1248313 to track it.
Keywords: checkin-needed
backed out for causing https://treeherder.mozilla.org/logviewer.html#?job_id=21739600&repo=mozilla-inbound and others. please conside a try run before  next landing try
Flags: needinfo?(alwu)
All the tests are green light now.
Flags: needinfo?(alwu)
Keywords: checkin-needed
MozReview-Commit-ID: EBRigaPn5Du
Attachment #8719359 - Attachment is obsolete: true
Comment on attachment 8720324 [details] [diff] [review]
Part 2 : Add the crash test. r=jwwang.

Sorry, I forgot to update the new patch....
Attachment #8720324 - Attachment description: add crash test → Part 2 : Add the crash test. r=jwwang.
Flags: needinfo?(alwu)
Attachment #8720324 - Flags: review+
The try result is in the comment34, the reason of the back-out is I forgot to update the new patch.
Sorry!
Keywords: checkin-needed
Comment on attachment 8716232 [details] [diff] [review]
Part 1 : Don't resolve promise again when the callback calls drain(). r=jwwang.

Approval Request Comment
[Feature/regressing bug #]:948267
[User impact if declined]:crash might happen during playback when video duration is longer than audio duration.
[Describe test coverage new/current, TreeHerder]:TreeHerder
[Risks and why]: low, the change is simple and has been Aurora and Central for quite a while.
[String/UUID change made/needed]:none
Attachment #8716232 - Flags: approval-mozilla-beta?
Comment on attachment 8716232 [details] [diff] [review]
Part 1 : Don't resolve promise again when the callback calls drain(). r=jwwang.

Crash fix, recent regression (46), includes a test. 
Yay, put it on beta too please so we don't ship this.
Attachment #8716232 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Blocks: 948267
This crash accounted for 1539 of 4941 of crashes recorded under signature mozilla::MozPromise<T>::Private::Resolve<T> in the period 2016-03-01 -> 2016-03-24 in 46 beta.
Crash Signature: [@ mozilla::MozPromise<T>::Private::Resolve<T>]
I have 3037 crashes in beta 2 with this signature. This fix landed the day we shipped beta 2; there wasn't a beta 3, so it should be in beta 4. We don't have a ton of data yet for beta 4 but I'd love for us to verify this fix in a couple of days.
Flags: qe-verify+
This signature has disappeared past beta 2. Marking verified.
Status: RESOLVED → VERIFIED
Crash volume for signature 'mozilla::MozPromise<T>::Private::Resolve<T>':
 - nightly (version 50): 1 crash from 2016-06-06.
 - aurora  (version 49): 0 crashes from 2016-06-07.
 - beta    (version 48): 4 crashes from 2016-06-06.
 - release (version 47): 35 crashes from 2016-05-31.
 - esr     (version 45): 1 crash from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       0       0       0       0       1       0       0
 - aurora        0       0       0       0       0       0       0
 - beta          0       0       0       1       1       1       1
 - release       5       4       6       7       5       5       0
 - esr           1       0       0       0       0       0       0

Affected platform: Windows
So this is fixed in 47 but we're still seeing it in 48 and 50?
Flags: needinfo?(lhenry)
Yes, I think that here, the bot should not be reporting on this bug.... The threshhold of having a few crashes like this doesn't seem worth reporting.  Calixte what do you think? I'm seeing a lot of this kind of thing (Months untouched, fixed bugs with a few crashes on release channel, getting a bot update)
Flags: needinfo?(lhenry) → needinfo?(cdenizet)
I think it is noise (and would like to be able to explain why but this is a different story...)
Flags: needinfo?(cdenizet)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: