Closed Bug 702870 Opened 8 years ago Closed 2 years ago

stalled event does not fire if stall occurs before metadata has loaded

Categories

(Core :: Audio/Video: Playback, defect)

defect
Not set

Tracking

()

RESOLVED INACTIVE

People

(Reporter: kinetik, Unassigned)

References

(Blocks 1 open bug, )

Details

(Keywords: dev-doc-needed)

Attachments

(1 file, 2 obsolete files)

The stalled event is fired from the progress event's timer callback if data has not been received in the last STALL_MS milliseconds.  The progress event's timer is started from nsBuiltinDecoder::MetadataLoaded.  Any stall occurring before metadata loading completes will not result in a stall event being fired.

The linked testcase triggers this bug.  The server sends only the HTTP headers before suspending the request for 3.5s.  See here: https://github.com/soundcloud/areweplayingyet/blob/master/server.js#L30
Arun, this bug might interest you. The relevant code is in http://mxr.mozilla.org/mozilla-central/source/content/media/nsBuiltinDecoder.cpp. Matthew, I looked at the testcase, but I find it difficult to tell what I was supposed to be looking at to see a result.
(In reply to Josh Matthews [:jdm] from comment #1)
> Matthew, I looked at the testcase, but I find it difficult to tell what
> I was supposed to be looking at to see a result.

Click run in the top left, after a few seconds you'll see a green tick or red cross replace the run button to indicate success or failure.
https://bugzilla.mozilla.org/show_bug.cgi?id=490705 seem to address some audio api features. Do I have to work on these lines?

I checked the code I have got with the entries in the post above. They appear quite different. Maybe the code is available in a different branch. Should I work on a branch other than mozilla-central for solving this issue?

\\ Regards,

Arun
nsMediaDecoder::Progress is the only place where stall function is called (function called DownloadStalled).

content/media/nsMediaDecoder.cpp is the file that is studied.

Please let me know if I am looking in the correct place.

Also do let me know if there is any specific traces that I can activate to do my debugging faster?

\\ Thanks,

Arun
Yes, that is the correct file. nsBuiltinDecoder::MetadataLoaded (which calls StartProgress) is invoked from http://mxr.mozilla.org/mozilla-central/source/content/media/nsBuiltinDecoderStateMachine.cpp#1803, which you should also look at.
(In reply to Arun [:arunsl] from comment #3)
> https://bugzilla.mozilla.org/show_bug.cgi?id=490705 seem to address some
> audio api features. Do I have to work on these lines?

That bug implemented an API to read/write audio directly with <audio> elements (see https://wiki.mozilla.org/Audio_Data_API); it's not related to this bug.

(In reply to Arun [:arunsl] from comment #4)
> nsMediaDecoder::Progress is the only place where stall function is called
> (function called DownloadStalled).
> 
> content/media/nsMediaDecoder.cpp is the file that is studied.
> 
> Please let me know if I am looking in the correct place.

Yes, plus what Josh pointed out, and content/html/content/src/nsHTMLMediaElement.cpp.

> Also do let me know if there is any specific traces that I can activate to
> do my debugging faster?

Set the environment variable NSPR_LOG_MODULES to nsMediaElement:5,nsMediaElementEvents:5 to enable the logging in nsHTMLMediaElement.cpp, which will log state changes and when events are fired.  This general mechanism is used all over the Firefox code, look for calls to the function PR_NewLogModule in the code you're interested in to find out the name of the logging module that's used for that code.  Also see http://www.mozilla.org/projects/nspr/reference/html/prlog.html.

The core problem in this bug is that the progress timer is started once the media's metadata loads (from nsBuiltinDecoder::MetadataLoaded), but it should be started when we first start loading the resource (when the nsHTMLMediaElement's network state changes to NETWORK_LOADING), as there may be quite a delay between starting the load and successfully loading the metadata, as in this testcase where the test webserver deliberately delays the data transfer.
areweplayingyet.org/sounds/long.ogg/stall is the media used, pasting for reference
I have tried to see what is happening currently and have commented out StartProgress in nsBuiltinDecoder::MetadataLoaded to test the scenario.

Then I had found that the case still fails. But interesting find was that nsIDOMHTMLMediaElement::NETWORK_LOADING is already set from nsHTMLMediaElement::FinishDecoderSetup before StartProgress was called from nsBuiltinDecoder::MetadataLoaded

And Matthew has told me that the element is destroyed by the test when it either passes (the stalled event is fired) or the test times out. So when StartProgress is commented eventually event is aborted failing the test. 

The printout of the abort section is enclosed (before this happened the state was nsIDOMHTMLMediaElement::NETWORK_LOADING):

287704864[7ffa1005d260]: Stream 7ff9e723d048 reading since there are free blocks
287704864[7ffa1005d260]: Stream 7ff9e723d048 DataReceived at 3622721 count=2896
287704864[7ffa1005d260]: Stream 7ff9e723d048 reading since there are free blocks
287704864[7ffa1005d260]: 7ff9e7095fa0 Changed state to SHUTDOWN
287704864[7ffa1005d260]: 7ff9ee998020 Dispatching event abort
287704864[7ffa1005d260]: nsIDOMHTMLMediaElement::NETWORK_EMPTY from nsHTMLMediaElement::AbortExistingLoads
287704864[7ffa1005d260]: 7ff9ee998020 Dispatching event emptied
287704864[7ffa1005d260]: nsIDOMHTMLMediaElement::NETWORK_NO_SOURCE from nsHTMLMediaElement::QueueSelectResourceTask
287704864[7ffa1005d260]: nsIDOMHTMLMediaElement::NETWORK_EMPTY from nsHTMLMediaElement::AbortExistingLoads
287704864[7ffa1005d260]: 7ff9ee998020 Dispatching event emptied
287704864[7ffa1005d260]: nsIDOMHTMLMediaElement::NETWORK_NO_SOURCE from nsHTMLMediaElement::QueueSelectResourceTask
287704864[7ffa1005d260]: 7ff9ee998020 ChangeDelayLoadStatus(1) doc=0x7ff9e9d61000
287704864[7ffa1005d260]: nsIDOMHTMLMediaElement::NETWORK_LOADING from nsHTMLMediaElement::SelectResource
287704864[7ffa1005d260]: 7ff9ee998020 Queuing event loadstart
287704864[7ffa1005d260]: nsIDOMHTMLMediaElement::NETWORK_NO_SOURCE from nsHTMLMediaElement::NoSupportedMediaSourceError
287704864[7ffa1005d260]: 7ff9ee998020 Queuing event error
287704864[7ffa1005d260]: 7ff9ee998020 ChangeDelayLoadStatus(0) doc=0x7ff9e9d61000
287704864[7ffa1005d260]: Destroying media state machine thread
287704864[7ffa1005d260]: Stream 7ff9e723d048 closed
287704864[7ffa1005d260]: 7ff9ee998020 Dispatching event loadstart



I am finding an appropriate place to fix the issue as it seems not related to state nsIDOMHTMLMediaElement::NETWORK_LOADING after all.
stacktrace from gdb shows the Shutdown called from load function.

 nsBuiltinDecoder::Shutdown() <- nsHTMLMediaElement::ShutdownDecoder() <- nsHTMLMediaElement::AbortExistingLoads() <- nsHTMLMediaElement::Load()

this may be in line with http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#concept-media-load-algorithm as suggested by Matthew.

But then eventually going to nsHTMLMediaElement::NoSupportedMediaSourceError may not be the desired flow.

Will try to see that area, why its happening.
Added the following log in nsHTMLMediaElement::SelectResource


      LOG(PR_LOG_DEBUG, ("%p Failed Trying load from src=%s", this, NS_ConvertUTF16toUTF8(src).get()));
      ReportLoadError("MediaLoadInvalidURI", params, ArrayLength(params));
    }
    NoSupportedMediaSourceError();


The log (debug logs with some extra printouts to pinpoint the issue) was as follows:

570722080[7f3620e5d260]: 7f35fef8e140 ChangeDelayLoadStatus(1) doc=0x7f35fab81800
570722080[7f3620e5d260]: nsIDOMHTMLMediaElement::NETWORK_LOADING from nsHTMLMediaElement::SelectResource
570722080[7f3620e5d260]: 7f35fef8e140 Queuing event loadstart
570722080[7f3620e5d260]: 7f35fef8e140 Failed Trying load from src=
570722080[7f3620e5d260]: nsIDOMHTMLMediaElement::NETWORK_NO_SOURCE from nsHTMLMediaElement::NoSupportedMediaSourceError
570722080[7f3620e5d260]: 7f35fef8e140 Queuing event error
570722080[7f3620e5d260]: 7f35fef8e140 ChangeDelayLoadStatus(0) doc=0x7f35fab81800
570722080[7f3620e5d260]: 7f35fef8e140 Dispatching event loadstart
570722080[7f3620e5d260]: 7f35fef8e140 Dispatching event error


It seems the src was blank, which is leading to this issue. Let me know your expert suggestions on the observation.
Sorry for the delay in getting back to you with a plan for this.  I've had a closer look now, and the basics of a plan follow:

Because this testcase stalls so early in the load (after sending HTTP headers, but before sending any data), very little of the decoder load/setup process has run, because it all hinges on when MediaLoadListener::OnStartRequest fires, which happens too late to start the timer and pass this test.

We'll need to set up the progress/stalled timer from nsHTMLMediaElement in all the places where the "loadstart" event is fired.  This will involve moving StartProgress, StopProgress, the Progress callback function, and the timer from nsMediaDecoder to nsHTMLMediaElement, and redirecting calls to nsMediaDecoder::{Start,Stop}Progress to the element.  Remove the existing call to StartProgress() from nsBuiltinDecoder, add new ones whereever "loadstart" is fired in nsHTMLMediaElement, then run the content/media mochitests and see what breaks.
Assignee: nobody → arun.sukumaran.latha
Status: NEW → ASSIGNED
Attached patch Patch version 1 (obsolete) — Splinter Review
The issue seem to be happening because of the network I was working in. I had tested now that the patch had worked in alternate network. Please confirm if this is working at your side.

PS: The debug level printouts are still on the file.
SetAudioCaptured is not the right place to call StartProgress, it's not logically related to SetAudioCaptured's responsibility.  Additionally, calling it from there (which, in turn, is called from FinishDecoderSetup) will start the progress/stalled timer too late to fix the bug revealed by this bug's testcase as I described in comment 11.
Attached patch Patch version 2 (obsolete) — Splinter Review
Progress is now monitored from HTMLmediaElement rather than decoder.

Please let me know if this patch helped.
Attachment #647113 - Attachment is obsolete: true
Attachment #647453 - Flags: feedback?(kinetik)
Comment on attachment 647453 [details] [diff] [review]
Patch version 2

This looks like the right approach, thanks.  Progress(false) needs to be called when StartProgress() is called at the start of the load to initialize mDataTime, so that the stalled time calculation works.  With that change, the linked testcase passes.  The content/media mochitests fail, though, because test_info_leak expects to see the loadstart event before a progress event.  We'll need to check the spec to verify that the test is making the right assumptions and then either change the test or move StartProgress/Progress to after loadstart.  I quickly made the second change, but the mochitests started failing elsewhere and I haven't had time to investigate.

You can run the mochitests yourself (as described here: https://developer.mozilla.org/en/Mochitest#Running_select_tests) and continuing investigating.  You only need to run the tests in content/media/tests at this stage.
Attachment #647453 - Flags: feedback?(kinetik) → feedback+
Progress function is moved back and code is cleaned of unwanted comments.
Attachment #647453 - Attachment is obsolete: true
Attachment #647843 - Flags: review?(kinetik)
Attachment #647843 - Flags: feedback?(kinetik)
As I explained in comment 15, the stalled timer needs to be started when the load is started for the linked testcase to pass.

The tricky issue with test_info_leak is that it starts loads for known 404 URIs and expects to see only two events: loadstart, and error.  With the change required to fire the stall event, it's possible for progress and stalled events to be delivered before the error event arrives, because the error event is raised from MediaLoadListener::OnStartRequest.  Obviously, we can't know the URIs are 404 before attempting to open them.

The relevant piece of the spec regarding starting the progress/stalled timer is: http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#stall-timeout

I *think* it'd be acceptable to change the test to accept progress and stalled events, since I don't believe they'll leak anything useful.  It'd be interesting to look at what other UAs do here.
Comment on attachment 647843 [details] [diff] [review]
Feddback comments incorporated

Per comment 15 and comment 17, this bug is not addressed until the testcases passes, which it can't do until the changes I mentioned are made.

(In reply to Matthew Gregan [:kinetik] from comment #17)
> I *think* it'd be acceptable to change the test to accept progress and
> stalled events, since I don't believe they'll leak anything useful.  It'd be
> interesting to look at what other UAs do here.

progress events wouldn't make sense:

"While the load is not suspended (see below), every 350ms (±200ms) or for every byte received, whichever is least frequent, queue a task to fire a simple event named progress at the element."

So the test needs to be updated to accept stalled events, and the logic of the Progress callback needs to be tweaked so that stalled can fire during initial load setup, but progress events can't fire until the channel starts receiving data.  That requires initializing mDataTime but not hitting the mProgressTime.IsNull() test.
Attachment #647843 - Flags: review?(kinetik)
Attachment #647843 - Flags: review-
Attachment #647843 - Flags: feedback?(kinetik)
Duplicate of this bug: 875689
Assignee: arun.sukumaran.latha → nobody
Status: ASSIGNED → NEW
Duplicate of this bug: 891287
Component: Audio/Video → Audio/Video: Playback
Mass closing do to inactivity.
Feel free to re-open if still needed.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INACTIVE
You need to log in before you can comment on or make changes to this bug.