Closed Bug 1121855 Opened 7 years ago Closed 6 years ago

[FFOS7715 v2.1][camera]monkey test crash at libxul.so!mozilla::DOMMediaStream::Destroy() [DOMMediaStream.cpp : 156 + 0x2]

Categories

(Firefox OS Graveyard :: Gaia::Camera, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(blocking-b2g:2.1S+, firefox39 fixed, b2g-v2.1S fixed, b2g-v2.2 fixed, b2g-v2.2r fixed, b2g-master fixed)

RESOLVED FIXED
2.2 S8 (20mar)
blocking-b2g 2.1S+
Tracking Status
firefox39 --- fixed
b2g-v2.1S --- fixed
b2g-v2.2 --- fixed
b2g-v2.2r --- fixed
b2g-master --- fixed

People

(Reporter: jinchao.wang, Assigned: behsieh)

References

Details

(Whiteboard: [SPRD 392127](17times_3rd))

Attachments

(3 files, 6 obsolete files)

Description:monkey test crash at libxul.so!mozilla::DOMMediaStream::Destroy() [DOMMediaStream.cpp : 156 + 0x2]

Device: SPRD 7715ea

Steps to Reproduce:
1) after 11 hours monkey test

Actual Result:no crash

Expected Result:happen crash

Repro frequency:3/10

Analysis: It will happen the same crash in camera about 3/10.It's easy to reproduce in monkey test.
Whiteboard: [sprd 392127]
Can you have a look about this bug?
Flags: needinfo?(wilsonpage)
Attached file backtrace
Whiteboard: [sprd 392127] → [SPRD 392127]
I'm a front-end engineer, not Gecko. Perhaps Andrew can make sense of the logs you have attached.
Flags: needinfo?(wilsonpage) → needinfo?(aosmond)
Hi Steven, Shawn, this issue happens a lot during the monkey test hence it blocks the test to be complete. Could you find someone to check this one?

Thanks
blocking-b2g: --- → 2.1S?
Flags: needinfo?(styang)
Flags: needinfo?(sku)
full log:http://pan.baidu.com/s/1gdxPZMV
hi jinchao:
thanks for your information.
some questions needs your help
1. did you run it on branch v2.1s
2. Can you provide your commit ID of gecko and gaia?
3. in your repo frequency , you reproduce this issue in 3 times, can you also attached these 3 crash backtrace?
thanks
Flags: needinfo?(jinchao.wang)
(In reply to becker hsieh{:behsieh} from comment #6)
> hi jinchao:
> thanks for your information.
> some questions needs your help
> 1. did you run it on branch v2.1s
> 2. Can you provide your commit ID of gecko and gaia?
> 3. in your repo frequency , you reproduce this issue in 3 times, can you
> also attached these 3 crash backtrace?
> thanks

Hi becker hsieh:
1. did you run it on branch v2.1s
-- Although mozilla has had v2.1s branch, but spreadtrum didn't build v2.1s branch. So we didn't run it on branch v2.1s.
2. Can you provide your commit ID of gecko and gaia?
-- B2G-flash-tool-master$ ./check_versions.sh 
Gaia-Rev        836e6d74cb8b7016df555f85445893b3ff9aac12
Gecko-Rev       1a13affb168d9d2810f9ea9a2ec9c5cd188bbfec
Build-ID        20150113141649
Version         34.0
Device-Name     scx15_sp7715ea
FW-Release      4.4.2
FW-Incremental  68
FW-Date         Tue Jan 13 14:12:57 CST 2015
3.in your repo frequency , you reproduce this issue in 3 times, can you also attached these 3 crash backtrace?
-- As in comment 5, the log is provided by our test group, the result of log is complete. But the most time it is reproduced by RD, the RD's log is not complete, need we do some processing,and you can see comment 2. I don't tend to provide RD's log. If our test group reproduce this bug, I will provide you with the latest log as soon as possible.
Flags: needinfo?(jinchao.wang)
Just to clarify that although SPRD is not building from v2.1s yet, they definitely work on 2.1 branch for now since they start to work on 7715/FFOS V2.1 long before the 2.1s is branched out
Per comment 7, keep ni on jinchao.
Once backtrace/log are ready, please feel free to ni Backer or me.
Flags: needinfo?(sku) → needinfo?(jinchao.wang)
Blocks: 1123554
Attached patch Camera_debug.patch (obsolete) — Splinter Review
add debug log
hi jinchao:
we add some debug logs in gecko
can you please run monkey test base on this patch?
thanks
Hi Jinchao, Siiaceon, I would suggest you put this debug code into your local repository such that all the software you build moving forward will contain this patch. I think this can help us gather information while running the Monkey test

Thanks
Flags: needinfo?(siiaceon.cao)
(In reply to becker hsieh{:behsieh} from comment #11)
> hi jinchao:
> we add some debug logs in gecko
> can you please run monkey test base on this patch?
> thanks

(In reply to Vance Chen [:vchen][vchen@mozilla.com] from comment #12)
> Hi Jinchao, Siiaceon, I would suggest you put this debug code into your
> local repository such that all the software you build moving forward will
> contain this patch. I think this can help us gather information while
> running the Monkey test
> 
> Thanks


Hi  becker hsieh & Vance:
   I will apply this patch to our code today and  run monkey test base on this patch. Thanks for your feedback.
Flags: needinfo?(jinchao.wang)
Assignee: nobody → behsieh
Attached patch enable_gecko_log.patch (obsolete) — Splinter Review
hi jinchao:
i also enable camera gecko log.
can you please help to apply this patch and built libxul.so
here is step
1. cd B2G/gecko
2. patch -p1 < enable_gecko_log.patch
3. cd B2G/scripts
4. ./fastxul.sh
please must be sure the device that you run monkey test include these 2 patches.
thanks
(In reply to becker hsieh{:behsieh} from comment #14)
> Created attachment 8552852 [details] [diff] [review]
> enable_gecko_log.patch
> 
> hi jinchao:
> i also enable camera gecko log.
> can you please help to apply this patch and built libxul.so
> here is step
> 1. cd B2G/gecko
> 2. patch -p1 < enable_gecko_log.patch
> 3. cd B2G/scripts
> 4. ./fastxul.sh
> please must be sure the device that you run monkey test include these 2
> patches.
> thanks

Hi becker hsieh:
    I will do it.
Flags: needinfo?(siiaceon.cao)
Flags: needinfo?(styang)
Jinchao, fastxul.sh is a script under B2G repos. 

If you are sync form https://github.com/sprd-ffos/B2G.git or your internal repo.

Please remember to merge the update form https://github.com/mozilla-b2g/B2G
jinchao please have a quick feedback once has the result, thanks for your hard work
Flags: needinfo?(jinchao.wang)
(In reply to siiaceon from comment #17)
> jinchao please have a quick feedback once has the result, thanks for your
> hard work

Hi becker hsieh:

   Our test group has reproduced this crash problem under the patch.
   I uploaded the log to this address: http://pan.baidu.com/s/1dDEjvEh
   In this address,repoduce1_394969.tar.bz2 is the log. bug394969_gecko_add_log_for_camera_crash_1.patch is the patch I apply.

   PS:
   I didn't use "./fastxul.sh", I apply this patch to our code, then full build, then flash the full pac to the phone. Is it effective?
Flags: needinfo?(jinchao.wang) → needinfo?(behsieh)
hi jimchao:
per discussion , since there is no gecko_camera key word inside full log.
can you reattached new version of log once you reproduce the issue?
thanks
Flags: needinfo?(behsieh)
(In reply to becker hsieh{:behsieh} from comment #19)
> hi jimchao:
> per discussion , since there is no gecko_camera key word inside full log.
> can you reattached new version of log once you reproduce the issue?
> thanks

Hi becker hsieh:
    Our test group has reproduced this issue again. It seems that it's easily to reproduce. Please help to solve this bug. I will cooperate with you.

    log address: http://pan.baidu.com/s/1sjx6aC1
    The log contain "gecko_camera" in CameraCommon.h and CameraDebug in DOMMediaStream.cpp.
    I am sorry that our test group don't sync to our local time every time.
Flags: needinfo?(aosmond) → needinfo?(behsieh)
Whiteboard: [SPRD 392127] → [SPRD 392127](17times_3rd)
Frequently hit. blocking...
blocking-b2g: 2.1S? → 2.1S+
Attached patch Camera_Crash_debug_0202.patch (obsolete) — Splinter Review
Hi jinchao:
per discussion , attached file which debug for camera crash , i already integrate CameraCommon.h inside this patch,so please use this patch only will be ok.
thanks
Flags: needinfo?(behsieh)
jinchao pls inform once result bring out
Flags: needinfo?(jinchao.wang)
(In reply to becker hsieh{:behsieh} from comment #22)
> Created attachment 8557826 [details] [diff] [review]
> Camera_Crash_debug_0202.patch
> 
> Hi jinchao:
> per discussion , attached file which debug for camera crash , i already
> integrate CameraCommon.h inside this patch,so please use this patch only
> will be ok.
> thanks

Hi becker:
    Using Camera_Crash_debug_0202.patch, We reproduced this crash on our clean version, full log is :  http://pan.baidu.com/s/1o6DlsQu
    please, thanks very much.
Flags: needinfo?(jinchao.wang) → needinfo?(behsieh)
If a nsDOMCameraControl instance is referenced by no one, unlink will not happen before deleting it.
Attached file debug_log.txt
Hi kyle:
we need your help on this issue.
We found there is a crash case which is not process unlink and went to destruct directly.

the normal behavior like below
construct A-> construct B-> unlink A->destruct A

in fail case we found behavior like below
construct A->construct B-> unlink A->construct C->construct D->destruct C
object C which is not process unlink before destruct and object A not process destruct. 
below is log ,I put [MozDEBUG] comment for easily to check.

please check attached file
thanks
Flags: needinfo?(behsieh) → needinfo?(khuey)
I don't entirely understand your question, but it might be relevant that destruction of cycle-collected objects happens asynchronously after unlinking.  That is, it's likely that the sequence of events in the program is

construct A->construct B-> unlink A->construct C->construct D->destruct C->CRASH->destruct A

Where if we didn't crash the destructor of A would run sometime later.

If you want more help than that please be more specific about what A, B, etc are.  I'm finding this hard to follow.
Flags: needinfo?(khuey)
in normal case
when 1st camera launch aDOMCameraControl will be register like below
gecko_camera: >>> Register ( aDOMCameraControl = 0xb1805e00 ) mWindowId = 0x4
leave camera aDOMCameraControl will not destruct immediately(destruct is control by cyclecollect)

2nd launch camera
gecko_camera: >>> Register ( aDOMCameraControl = 0xb1805d00 ) mWindowId = 0x4
2nd leave camera
do unlink
Gecko   : CameraDebug DOMMediaStream::Destroy E DOMMediastream=0xb1805e00 GetHintContents:0x2 <--[MozDEBUG]unlink DOMCameraControl 0xb1805e00
https://dxr.mozilla.org/mozilla-central/source/dom/media/DOMMediaStream.cpp?from=DOMMediastream.cpp&case=true#108
do destruct 
gecko_camera: virtual mozilla::nsDOMCameraControl::~nsDOMCameraControl():235 : this=0xb1805e00 <--[MozDEBUG]destruct DOMCameraControl 0xb1805e00
https://dxr.mozilla.org/mozilla-central/source/dom/media/DOMMediaStream.cpp?from=DOMMediastream.cpp&case=true#151


in fail case causes crash
1st launch camera
gecko_camera: >>> Register( aDOMCameraControl = 0xb1a11f00 ) mWindowId = 0x7 <--[MozDEBUG]construct aDOMCameraControl 0xb1a11f00

2nd lauch camera
gecko_camera: >>> Register( aDOMCameraControl = 0xb2e48600 ) mWindowId = 0x7 <-- [MozDEBUG]construct aDOMCameraControl 0xb2e48600
leave camera 
do unlink first
Gecko   : CameraDebug DOMMediaStream::Destroy E this=0xb1a11f00 GetHintContents:0x2 <-- [MozDEBUG]unlink aDOMCameraControl 0xb1a11f00
after that , should destruct 0xb1a11f00
but destruct dosent trigger by cyclecollect.
and next , there are a lot of incorrect operate.

3rd launch camera
gecko_camera: >>> Register( aDOMCameraControl = 0xb0d31600 ) mWindowId = 0x7 <--[MozDEBUG]construct aDOMCameraControl 0xb0d31600

4th launch camera
gecko_camera: >>> Register( aDOMCameraControl = 0xb0d93500 ) mWindowId = 0x7 <--[MozDEBUG]construct aDOMCameraControl 0xb0d93500
leave camera 
gecko_camera: virtual mozilla::nsDOMCameraControl::~nsDOMCameraControl():235 : this=0xb0d31600 <--[MozDEBUG]destruct 0xb0d31600 (fail should be destruct 0xb1a11f00)
without unlink 0xb0d31600 before destruct

my question is
cyclecollector should be do unlink and destruct object.
in fail case it seems not process unlink and went destruct directly.
thanks
Flags: needinfo?(khuey)
As Ting-Yu points out in comment 26, an object is only unlinked if it is part of a cycle.  If it is not part of a cycle it will be destroyed without unlinking.

Similarly, if the unlink does not break whatever cycles the object is a part of, it will not be destroyed.  Unlike the above case, which is expected behavior, this would point to a bug in the unlinking code of one of the objects in the cycle.
Flags: needinfo?(khuey)
Assignee: behsieh → administration
I don't think you meant to do that reassignment ...
Flags: needinfo?(behsieh)
There might be a possibility that similar thing like Bug 1135304 happened.
Hi Robert:
in comment 28, we found a crash happened in monkey test, DOMMediastream been destruct without release mStream and mListener first.
here is current code flow.
https://dxr.mozilla.org/mozilla-central/source/dom/camera/DOMCameraControl.cpp#268
mInput is declare as  nsRefPtr<CameraPreviewMediaStream> as below
https://dxr.mozilla.org/mozilla-central/source/dom/camera/DOMCameraControl.h#218
but in InitStreamCommon
https://dxr.mozilla.org/mozilla-central/source/dom/media/DOMMediaStream.cpp#266
mInput become as regular pointer and assigned to mStream

in case if NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN not process before DOMCameraControl destruct 
mStream will be release and its address will become illegal.
after process DOMMediaStream::~DOMMediaStream to DOMMediaStream::Destroy
mStream is not null but address is illegal
in process mStream->Destroy(); it will crash.
Can you please help on this issue?
thanks
Flags: needinfo?(roc)
It looks like the destructor of nsDOMCameraControl is running twice (since the pattern of mStream is 0x5a5a5a5, which is freed memory).
hi kyle:
thanks for reminding , i will keep following this bug for sure.
but looks like this bug needs more people's help to make ffox more robust.
so I reassignment it to nobody and keep tracking on this bug  :)
Flags: needinfo?(behsieh)
Attached patch camera_crash_fix0226.patch (obsolete) — Splinter Review
Hi jinchao:
can you try attached patch and run monkey test?
thanks
Flags: needinfo?(jinchao.wang)
(In reply to becker hsieh{:behsieh} from comment #35)
> Created attachment 8569594 [details] [diff] [review]
> camera_crash_fix0226.patch
> 
> Hi jinchao:
> can you try attached patch and run monkey test?
> thanks

Hi becker:
   Thanks for your feedback very much! I will apply this patch to run monkey test on our side.
Flags: needinfo?(jinchao.wang)
Set assignee as Becker

--
Keven
Assignee: administration → behsieh
Hi becker:
   Using your patch, we didn't reproduce this CR yet , and we didn't find other issue. I think your patch has effect. So we can fix this CR now. Thanks for your help.
Flags: needinfo?(behsieh)
hi jinchao:
how many devices and how many days you put it to run monkey test?
thanks
Flags: needinfo?(behsieh)
(In reply to becker hsieh{:behsieh} from comment #39)
> hi jinchao:
> how many devices and how many days you put it to run monkey test?
> thanks

Hi Becker:
   Ten devices and tow times, 11 hours every time. We will continue to run monkey test these days. If we found any other issue, we will tell you at the first time.
If I understand correctly, the attachment 8569594 [details] [diff] [review] will destroy the stream only from CC unlinking. Won't that cause memory leak once the issue occur, when unlink does not happen and destruct directly?
I modify nsDOMCameraControl to be nsRefPtr instead of nsWeakPtr. 
nsDOMCameraControl will never be released.
ting yu
i need to know why nsDOMCameraControl didnt process unlink.
do you have any idea i could add log in somewhere and provide patch to jimchao and run monkey test?
or do you have any suggest right owner i could ask for help?
thanks
Flags: needinfo?(janus926)
As was pointed out in comment 26 and 29, unlink does not happen before destruction if the object is not part of a cycle.  If nsDOMCameraControl does not work properly if it is not unlinked, that is a bug.
Kyle has explained the things clearly, clear NI.
Flags: needinfo?(janus926)
Becker just came to me, I repeated what Kyle said in comment 44 and made sure he got it. Also suggested him to figure out what is the cycle the nsDOMCameraControl expected to be in, if the code relies on unlink to finalize the object properly.
Another way is to fix the nsDOMCameraControl/DOMMediaStream finalization so unlink does not affect it. Becker said he's working on it (comment 32).
Attached patch camera_crash_fix0304.patch (obsolete) — Splinter Review
Hi jinchao:
can you run monkey test with attached file
thanks
Flags: needinfo?(jinchao.wang)
(In reply to becker hsieh{:behsieh} from comment #48)
> Created attachment 8572468 [details] [diff] [review]
> camera_crash_fix0304.patch
> 
> Hi jinchao:
> can you run monkey test with attached file
> thanks

Hi  becker:
    I will apply this patch to run monkey test, today. thanks for your feedback.
Flags: needinfo?(jinchao.wang)
I've tested camera_crash_fix0304.patch in combination with my changes in bug 1139721 which cause this to reproduce (for me) 100% when running the ICS emulator dom/camera mochitests. Resolves the crash.
Blocks: 1139721
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f8dfd1c1d831
this is my mochitest running on ICS emulator
i didnt found any camera crash.
Flags: needinfo?(aosmond)
(In reply to becker hsieh{:behsieh} from comment #51)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=f8dfd1c1d831
> this is my mochitest running on ICS emulator
> i didnt found any camera crash.

Hi  becker:
    Using patch in comment 48 ,we run monkey test on our side. The test group didn't found any crash about 10 devices / 25 hours. And RD of us didn't found any crash too.
    thanks.
(In reply to becker hsieh{:behsieh} from comment #51)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=f8dfd1c1d831
> this is my mochitest running on ICS emulator
> i didnt found any camera crash.

I realize in retrospect what I wrote was confusing. My changes resulted in always hitting this crash; with your patch, the crash was resolved for that reproduction method too :).
Flags: needinfo?(aosmond)
Comment on attachment 8572468 [details] [diff] [review]
camera_crash_fix0304.patch

Review of attachment 8572468 [details] [diff] [review]:
-----------------------------------------------------------------

r+ for the addition of ::Destroy in ~nsDOMCameraControl, assuming all of the debug prints are removed.
Attachment #8572468 - Flags: review+
Attached patch bug1121855Splinter Review
remove debug log and add reviewer name.
Attachment #8572468 - Attachment is obsolete: true
Attachment #8571790 - Attachment is obsolete: true
Attachment #8569594 - Attachment is obsolete: true
Attachment #8557826 - Attachment is obsolete: true
Attachment #8552852 - Attachment is obsolete: true
Attachment #8552261 - Attachment is obsolete: true
Attachment #8577061 - Flags: review+
Attachment #8577061 - Attachment is patch: true
https://hg.mozilla.org/mozilla-central/rev/ae105bf20177
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.2 S8 (20mar)
This landed on master but never got backported even though it's marked as blocking v2.1S. Do we still want it at this point?
Flags: needinfo?(vchen)
Flags: needinfo?(jocheng)
In fact, partner already cherry-pick this one into their local branch. So it would be nice if we can land it onto 2.1s as well

Thanks
Flags: needinfo?(vchen)
Hi Ryan,
Not sure the patch is also good for 2.1S. Can you help to check?
Thanks!
Flags: needinfo?(jocheng) → needinfo?(ryanvm)
Flags: needinfo?(ryanvm)
You need to log in before you can comment on or make changes to this bug.