Closed Bug 1006530 Opened 6 years ago Closed 6 years ago

[RTSP][V2.0] Closing a audio RTSP streaming via tab page causes system crash

Categories

(Firefox OS Graveyard :: RTSP, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(firefox29 wontfix, firefox30 fixed, firefox31 fixed, firefox32 fixed, b2g-v1.4 fixed, b2g-v2.0 fixed)

VERIFIED FIXED
2.0 S1 (9may)
Tracking Status
firefox29 --- wontfix
firefox30 --- fixed
firefox31 --- fixed
firefox32 --- fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: whsu, Assigned: ethan)

References

Details

(Keywords: crash, Whiteboard: [p=2][b2g-crash])

Crash Data

Attachments

(2 files, 2 obsolete files)

Attached video WP_20140506_013.mp4
* Description:
  Closing a RTSP streaming via tag page of browser. It will cause a system crash.
  Here are the CrashID
  - https://crash-stats.mozilla.com/report/index/0149e0eb-4003-455a-849c-972ca2140506
  Attach the demo video (WP_20140506_013.mp4)
  

* Reproduction steps:
  1. Launch the following page via browser
     - http://goo.gl/FyHFNs
  2. Tap the "AMX test file"
  3. Switch to tag page after RTSP streaming is ready to play.
  4. Tap "X" to close the web page

* Expected result:
  Web page can be closed

* Actual result:
  A system crash happened

* Reproduction build: V2.0 (Buri)
  - Gaia      e8a08a3f7a608993f0b302371e016e73faceea70
  - Gecko     https://hg.mozilla.org/mozilla-central/rev/2897fb554990
  - BuildID   20140505160203
  - Version   32.0a1
William,
What is the reproduction rate of this bug?
I cannot reproduce it on my Unagi.
Assignee: nobody → ettseng
Severity: normal → critical
Priority: -- → P1
The crash report shows that the object is destroyed at an android thread, not at main thread.

The class SendOnDisconnectedTask might break the rule "release on main thread" because it holds a reference to the object.

http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/controller/RtspController.cpp?from=RtspController.cpp#338

We should try debug build and modify line338 to sync dispatch. In this situation, the crash should be easily reproduced.
Duplicate of this bug: 1005867
OOPS~ I always can reproduce it on my buri device.
I can show you tomorrow. Thanks!
(In reply to Benjamin Chen [:bechen] from comment #2)
> We should try debug build and modify line338 to sync dispatch. In this
> situation, the crash should be easily reproduced.
I tried to run a debug build.
It did crash! But after the crash, my Unagi cannot boot anymore and I re-flashed it to a normal build to boot it.
However, I think this result reveals the root cause could quite possibly be this line.
(In reply to William Hsu [:whsu] from comment #4)
> OOPS~ I always can reproduce it on my buri device.
> I can show you tomorrow. Thanks!
Never mind. I think we are pretty close to the root cause. :)
BTW, I will also try to reproduce this bug on a Buri phone.
Hi Steve,

We think the root cause of this crash is a threading/timing issue.
Could you help to clarify our thoughts?

The crash report reveals the crash occurs at this line:
http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/controller/RtspController.cpp?from=RtspController.cpp#338

327 RtspController::OnDisconnected(uint8_t index,
328                                nsresult reason)
329 {
330   LOG(("RtspController::OnDisconnected() for track %d reason = 0x%x", index, reason));
331   mState = DISCONNECTED;
332   if (mListener) {
333     nsRefPtr<SendOnDisconnectedTask> task =
334       new SendOnDisconnectedTask(mListener, index, reason);
335     // Break the cycle reference between the Listener (RtspControllerParent) and
336     // us.
337     mListener = nullptr;
338     return NS_DispatchToMainThread(task);
339   }
340   return NS_ERROR_NOT_AVAILABLE;
341 }

The member "mURI" of RtspControllerParent is declared as "nsCOMPtr<nsIURI>" and cannot be released on an off-main-thread.
I imagine line 338 will do the following things:
1) NS_DispatchToMainThread(task) executes SendOnDisconnectedTask::Run() on the main thread, which calls RtspControllerParent's OnDisconnected() that triggers a sequence of upstream release of RTSP objects.
2) After NS_DispatchToMainThread() returns, the local variable "task" leaves its scope and is being destructed, which in turn releases its reference count to mListener (RtspControllerParent).

So I guess if 1) is completed before 2), 2) decreases the ref-count of RtspControllerParent to zero and releases it here, which is running on an off-main-thread!
And the reason why we don't encounter this crash previously is because 2) usually completes before 1).

If my guess is true, I can think of two possible solutions.
a. Declare SendOnDisconnectedTask::mListener as nsMainThreadPtrHandle instead of nsCOMPtr.
b. Use forget() mechanism, such as NS_DispatchToMainThread(task.forget());

How do you think?
Is our guess and solution sane?
Flags: needinfo?(sworkman)
Wow - this is a fun one!

I think the race condition you describe sounds plausible. But I think the solution should include nsStandardURL.

task.forget() might help, or else just declaring "new SendOnDisconnectedTask" within the function call:
NS_DispatchToMainThread(new SendOn....);

But this only fixes this instance of RtspControllerParent being released off main thread.

I think it is better to do a proxy release of RtspControllerParent::mURI. Try NS_ProxyRelease in ~RtspControllerParent. This way, you avoid mURI causing crashes in the future, because it can't be released off main thread. I think it's worth trying this anyway.
Flags: needinfo?(sworkman)
blocking-b2g: --- → 2.0?
Status: NEW → ASSIGNED
Steve,
We found a good example in MediaResource::Destroy(), which is added by bug 975858.
I just applied the same solution as there.
This patch would ensure RtspControllerParent is released on the main thread.

Could you help to review it?
Attachment #8419242 - Flags: review?(sworkman)
Attachment #8419242 - Flags: feedback-
Comment on attachment 8419242 [details] [diff] [review]
Using proxy release to ensure RtspControllerParent is released on the main thread

Hi William,

Please help to apply this patch and verify whether it fixes the crash or not.
Thanks!
Attachment #8419242 - Flags: feedback- → feedback?(whsu)
Whiteboard: [p=2]
Target Milestone: --- → 2.0 S2 (23may)
Hi, Ethan,

Roger that! Thanks for the prompt help! :)
(In reply to Ethan Tseng [:ethan] from comment #9)
> Steve,
> This patch would ensure RtspControllerParent is released on the main thread.
> Could you help to review it?
I just filed bug 1007577 in order to remove nsIInterfaceRequestor from RtspControllerParent.
I will rebase either the patch in bug 1007577 or in this bug depending on which one is landed first.
(In reply to Ethan Tseng [:ethan] from comment #9)
> Created attachment 8419242 [details] [diff] [review]
> Using proxy release to ensure RtspControllerParent is released on the main
> thread
> 
> Steve,
> We found a good example in MediaResource::Destroy(), which is added by bug
> 975858.
> I just applied the same solution as there.
> This patch would ensure RtspControllerParent is released on the main thread.
If the patch ensures the RtspControllerParent is released on the main thread,
you might need to remove the nsMainThreadPtrHandle reference to RtspControllerParent?
http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/controller/RtspController.h?from=rtspcontroller.h#40
(In reply to Benjamin Chen [:bechen] from comment #13)
> If the patch ensures the RtspControllerParent is released on the main thread,
> you might need to remove the nsMainThreadPtrHandle reference to
> RtspControllerParent?
> http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/
> controller/RtspController.h?from=rtspcontroller.h#40

Yes! Thanks for your reminder!
Comment on attachment 8419242 [details] [diff] [review]
Using proxy release to ensure RtspControllerParent is released on the main thread

Review of attachment 8419242 [details] [diff] [review]:
-----------------------------------------------------------------

Ooo. That's a nice fix. Good job. r=me.

Please remove the nsMainThreadPtrs as Benjamin suggests.
Attachment #8419242 - Flags: review?(sworkman) → review+
1. Address comment 14.
2. Refresh commit message "r=sworkman"
Attachment #8419242 - Attachment is obsolete: true
Attachment #8419242 - Flags: feedback?(whsu)
Attachment #8419867 - Flags: review+
Fix a careless compile error.
Attachment #8419867 - Attachment is obsolete: true
Attachment #8419887 - Flags: review+
The result of TBPL:
https://tbpl.mozilla.org/?tree=Try&rev=5b72159fe3b7

Trychooser syntax:
try: -b do -p all -u none -t none
This patch should not affect any test cases. Just run builds for all platform.
Keywords: checkin-needed
(In reply to Steve Workman [:sworkman] from comment #15)
> Review of attachment 8419242 [details] [diff] [review]:
> -----------------------------------------------------------------
> Ooo. That's a nice fix. Good job. r=me. 
> Please remove the nsMainThreadPtrs as Benjamin suggests.

Thanks! Steve. :)
https://hg.mozilla.org/mozilla-central/rev/fcfc9fedb6b4
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: 2.0 S2 (23may) → 2.0 S1 (9may)
Duplicate of this bug: 1009032
Ethan, a duplicate of this was seen on v28 - does it need to be uplifted?
Flags: needinfo?(ettseng)
(In reply to Steve Workman [:sworkman] from comment #24)
> Ethan, a duplicate of this was seen on v28 - does it need to be uplifted?
Yes! I think so.
I didn't uplift a bug before. Could you do it for me? I will learn it.
Flags: needinfo?(ettseng)
Ah, it's very simple actually, and you need to answer some questions, so better if you do it.

Click on 'Details' beside the attached patch, and then select approval-aurora and approval-beta - choose '?'. Some questions will be auto-filled in the comment box - just answer those and the release engineering will decide if it can be approved or not. Then, RyanVM or another code sheriff will do the landing.

At least, that's how I did it before :)
Comment on attachment 8419887 [details] [diff] [review]
Using proxy release to ensure RtspControllerParent is released on the main thread

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 1009032
User impact if declined: occasional system crash on B2G
Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): none
String or IDL/UUID changes made by this patch: none
Attachment #8419887 - Flags: approval-mozilla-beta?
Attachment #8419887 - Flags: approval-mozilla-aurora?
(In reply to Steve Workman [:sworkman] from comment #26)
> Ah, it's very simple actually, and you need to answer some questions, so
> better if you do it.

Just did it.
Thanks for taking the time to explain this. :)
Thanks for the help!
I verified this patch.
No crash happened during verification.
Good Job!

* Build Information:
 - Gaia      2f89c43e798ccba631025bedc47a1fb24e830cf2
 - Gecko     https://hg.mozilla.org/mozilla-central/rev/4b6d63b05a0a
 - BuildID   20140512160204
 - Version   32.0a1
Status: RESOLVED → VERIFIED
Crash Signature: [@ nsStandardURL::Release() ]
Keywords: crash
Whiteboard: [p=2] → [p=2][b2g-crash]
(In reply to William Hsu [:whsu] from comment #29)
> Thanks for the help!
> I verified this patch.
> No crash happened during verification.
> Good Job!

So glad to hear this!
Thanks for your verification. :)
Comment on attachment 8419887 [details] [diff] [review]
Using proxy release to ensure RtspControllerParent is released on the main thread

low risk, crash fix approving for uplift. This has existed since fx28.
Attachment #8419887 - Flags: approval-mozilla-beta?
Attachment #8419887 - Flags: approval-mozilla-beta+
Attachment #8419887 - Flags: approval-mozilla-aurora?
Attachment #8419887 - Flags: approval-mozilla-aurora+
blocking-b2g: 2.0? → ---
Blocks: b2g-RTSP-2.0
You need to log in before you can comment on or make changes to this bug.