Closed Bug 910903 Opened 11 years ago Closed 9 years ago

Intermittent test_mediarecorder_avoid_recursion.html | Test timed out

Categories

(Core :: Audio/Video: Recording, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox37 --- wontfix
firefox38 --- fixed
firefox-esr31 --- unaffected

People

(Reporter: RyanVM, Assigned: jwwang)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, regression)

Attachments

(2 files, 2 obsolete files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=27191476&tree=B2g-Inbound

b2g_emulator_vm b2g-inbound opt test mochitest-3 on 2013-08-29 13:09:51 PDT for push 47b18e2a60ed
slave: tst-linux64-ec2-040

13:23:47     INFO -  System JS : ERROR chrome://specialpowers/content/SpecialPowersObserverAPI.js:183
13:23:47     INFO -                       NS_ERROR_UNEXPECTED: Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED) [nsIPrefBranch.getBoolPref]
13:23:47     INFO -  979 ERROR TEST-UNEXPECTED-FAIL | /tests/content/media/test/test_mediarecorder_avoid_recursion.html | Test timed out.
13:23:47     INFO -  980 INFO TEST-END | /tests/content/media/test/test_mediarecorder_avoid_recursion.html | finished in 304441ms
If this happens again, we should build a patch here that adds some info statements into this test to see if we understand where this is failing.
Assignee: nobody → rlin
Okay, that's a second failure then.

Randy - Can you create a patch that adds a bunch of info statements to this mochitest?

When we get another failure with this patch landed, that will help give us more information on what's failing here.
Keywords: regression
Interesting enough, this appears to be only failing on b2g emulator.
Ya, add some logs and run tryserver to check if something wrong. 
https://tbpl.mozilla.org/?tree=Try&rev=997f8f3ffc1a
Attached patch add logs (obsolete) — Splinter Review
Attachment #800603 - Flags: review?(jsmith)
Comment on attachment 800603 [details] [diff] [review]
add logs

Review of attachment 800603 [details] [diff] [review]:
-----------------------------------------------------------------

This is technically okay, although I might add a few more info statements when the media recorder starts and when requestData is called.
Attachment #800603 - Flags: review?(jsmith) → review+
One other nit - you need fix the commit message to have r=jsmith at the end of it.
Attached patch check-in patchSplinter Review
Fix nits, carry reviewer.
Whiteboard: checkin-needed
Keywords: checkin-needed
Whiteboard: checkin-needed → [leave open]
Attachment #800603 - Attachment is obsolete: true
Attachment #801123 - Flags: checkin+
Ah ha. That log gives good information on what the problem here is. It looks like what's happening here is that calling stop() on the media stream is failing to fire onstop intermittently on the media recorder using that stream.
Hmm. The logs seem so. I will try to write some test script to capture this symptom on my local emulator build.
Compare to normal log, it seems encoder return the first header blob but miss the 2nd one.
After 909670 fix, I think this bug should disappear.

Before that patch, MediaRecorder object might be destroyed before onstop event sent.

This statement ensures stop send to js eventhandler before session destroy.
http://dxr.mozilla.org/mozilla-central/source/content/media/MediaRecorder.cpp#l141

I would suggest close this issue unless we can repro it again
Flags: needinfo?(rlin)
Flags: needinfo?(jsmith)
Not sure about that - the TBPL failures seen here occurred after the landing of bug 909670, which means that the patch in bug 909670 may have not fixed this issue. The issue filed here was last seen in TBPL on 10/15/2013 on inbound, where as the patch in bug 909670 landed on 9/26/2013 on inbound.
Flags: needinfo?(jsmith)
This bug was found when the encoder fail to output encoded data to UA.
So it doesn't related to the cannot close browser timeout.
Flags: needinfo?(rlin)
Can't found this issue around two months. Switch to worksforme.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
No longer blocks: MediaRecording
It looks like the root cause is thread starving on b2g emulator. When timeouts happened, data available size was always zero which could be a result that the encoder thread never got a chance to run. I will come back at this bug after bug 916135 is fixed and content/media/test/* are enabled again.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Stop requesting data when count >= 30 to prevent the main thread from starving the encoding thread.
Assignee: rlin → jwwang
Attachment #8365828 - Flags: review?(jsmith)
Comment on attachment 8365828 [details] [diff] [review]
Fix thread starving in encoding thread for the main thread keeps queuing tasks.

Review of attachment 8365828 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with nit

::: content/media/test/test_mediarecorder_avoid_recursion.html
@@ +28,1 @@
>        }

The concept looks fine, but let's minimize the returns here. Let's try something like:

if (count++ === 30) {
  stream.stop();
} else if (count < 30 && mediaRecorder.state === 'recording') {
  info('requestData again');
  mediaRecorder.requestData();
}
Attachment #8365828 - Flags: review?(jsmith) → review+
https://tbpl.mozilla.org/?tree=Try&rev=d5045e5816b7

if (count++ == 30) {
  info("stream.stop");
  stream.stop();
}

It is weird that the if statement is never true and the stream is not stopped and causes timeout. I have to say

++count;
if (count == 30) { ...}

to pass the test.

I wonder if this is a bug in Javascript.
Hi Ryan,
Please check in part1_fixThreadStarving-v3.patch and leave this bug open to see if this issue will occur again. Thansk.
Keywords: checkin-needed
https://hg.mozilla.org/integration/b2g-inbound/rev/0195888bb20e

I guess my concern with this is that I feel like we're working around an underlying problem in the platform by changing the test. Is this testcase contrived enough that we feel that thread starvation won't happen outside our automation?
Keywords: checkin-needed
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #42)
> https://hg.mozilla.org/integration/b2g-inbound/rev/0195888bb20e
> 
> I guess my concern with this is that I feel like we're working around an
> underlying problem in the platform by changing the test. Is this testcase
> contrived enough that we feel that thread starvation won't happen outside
> our automation?

Let's keep the bug open then focus on fixing the actual bug here. The patch landed here can just help remove the intermittent failure risk with the test.
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #42)
> https://hg.mozilla.org/integration/b2g-inbound/rev/0195888bb20e
> 
> I guess my concern with this is that I feel like we're working around an
> underlying problem in the platform by changing the test. Is this testcase
> contrived enough that we feel that thread starvation won't happen outside
> our automation?

No, it is totally valid for a user to write code like that. The problem is that the main thread keeps enqueing tasks (by calling requestData in the callback of ondataavailable) such that the lower priority threads (ex: encoding thread in this case) will not have a chance to acquire CPU for main thread is busy, as is worse in a single-core architecture (ex: emulator).

There are 2 follow-up work items here:
1. when requestData is called, there should be at least some progress (such that data size is not 0 again) in the encoding thread before calling ondataavailable again.
2. there should be a mechanism/tool to detect such thread starving conditions to help us find out more timeouts in mochitest.

Item 2 should deserve a new bug for sure. For item 1, is it preferred to fix it in this bug or open a new one to trace it?
That's really your decision to make. In general, we like to track one issue per bug. However, since comment 43 was a workaround to the real problem, then I think it's OK to continue working on #1 here (with comment 43 being reverted in whatever final patch lands).

And like you said, #2 definitely needs a new bug. Thanks again for all your work on this. Progress is definitely being made!
Component: Video/Audio → Video/Audio: Recording
No longer blocks: MediaRecording
Looks like this is fixed. Should it happen again, we can file another bug for it.
Status: REOPENED → RESOLVED
Closed: 10 years ago9 years ago
Resolution: --- → FIXED
Whiteboard: [leave open]
Target Milestone: --- → mozilla38
Blocks: 1228605
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: