Closed Bug 1125418 Opened 10 years ago Closed 10 years ago

[l-gonk] plugin-container can't connect to media server

Categories

(Core :: Audio/Video, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

RESOLVED WONTFIX
blocking-b2g 2.2+

People

(Reporter: ggrisco, Assigned: sotaro)

References

Details

(Keywords: crash, Whiteboard: [b2g-crash][caf-crash 433][caf priority: p1][CR 785241])

Crash Data

Attachments

(29 files, 4 obsolete files)

147.93 KB, text/plain
Details
24.13 KB, text/plain
Details
97.60 KB, text/x-log
Details
7.48 KB, patch
Details | Diff | Splinter Review
746.31 KB, text/x-log
Details
667.75 KB, text/x-log
Details
4.76 KB, patch
Details | Diff | Splinter Review
7.35 KB, patch
Details | Diff | Splinter Review
1.53 MB, text/x-log
Details
733 bytes, text/plain
Details
143.06 KB, text/plain
Details
733 bytes, text/plain
Details
143.06 KB, text/plain
Details
214.64 KB, text/plain
Details
161.74 KB, text/plain
Details
89.83 KB, text/plain
Details
149.38 KB, text/plain
Details
82.96 KB, text/plain
Details
164.91 KB, text/plain
Details
83.29 KB, text/plain
Details
157.70 KB, text/plain
Details
76.31 KB, text/plain
Details
163.39 KB, text/plain
Details
7.48 KB, patch
Details | Diff | Splinter Review
85.74 KB, text/plain
Details
165.71 KB, text/plain
Details
85.85 KB, text/plain
Details
157.92 KB, text/plain
Details
4.69 KB, patch
Details | Diff | Splinter Review
With v2.2, We have seen this crash signature hundreds of times over the past few weeks during monkey test.  No STR available.  cafbot will upload minidump and .extra file.

[@ mozalloc_abort | __android_log_assert | android::OMXClient::connect | android::OmxDecoder::AllocateMediaResources ]
Attached file EXTRA file attachment - (obsolete) —
Attached file decoded minidump - (obsolete) —
Bhavana, here's another frequent crash we're seeing.
Flags: needinfo?(bbajaj)
Whiteboard: [CR 785241]
Whiteboard: [CR 785241] → [caf priority: p1][CR 785241]
Whiteboard: [caf priority: p1][CR 785241] → [b2g-crash][caf-crash 433][caf priority: p1][CR 785241]
Observed on: 

Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.044
Moz BuildID: 20150120002507
B2G Version: 2.2
Gecko Version: 37.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=f5b3d1b6cfa3e702033f613915ae637cb735cbfb
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=bb6c4d3fc51281f1a2ee0ff471221dbc1d0d1035
Patches: bug 1067629, bug 1117862, bug 1119980, bug 1091307, bug 1122162
copying Sotaro and Chris Pearce
Component: Gaia::Video → Video/Audio
Product: Firefox OS → Core
Thanks Hema!

Greg, hoping sotaro or chris can help start investigation here or atleast re-direct to the right owner to help here.
blocking-b2g: 2.2? → 2.2+
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(cpearce)
Flags: needinfo?(bbajaj)
Greg, can we have logcat logs when the problem happens?
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(ggrisco)
Greg, can we have logcat logs when the problem happens? And from when, does the problem happen?
OMXClient did not have a change that seems to affect to this. Other places change seems to be affected.
(In reply to Sotaro Ikeda [:sotaro] from comment #9)
> OMXClient did not have a change that seems to affect to this. Other places
> change seems to be affected.

correction:
OMXClient did not have a change that seems to affect to this.
In attachment 8554036 [details], ASSERT happens at OMXClient::connect(). It says that media server did not exist because of a crash. Therefore, attachment 8554036 [details] does not have an information except media server crash.

Therefore, logcat log that catch the crash is necessary to investigate more.
(In reply to Sotaro Ikeda [:sotaro] from comment #10)
> (In reply to Sotaro Ikeda [:sotaro] from comment #9)
> > OMXClient did not have a change that seems to affect to this. Other places
> > change seems to be affected.
> 
> correction:
> OMXClient did not have a change that seems to affect to this.

Sorry, OmxDecoder is correct workd...
Flags: needinfo?(ggrisco)
(In reply to Sotaro Ikeda [:sotaro] from comment #8)
> Greg, can we have logcat logs when the problem happens? And from when, does
> the problem happen?

Greg, can you also get kernel log(dmesg)?
Flags: needinfo?(ggrisco)
Attached file dmesg.log (obsolete) —
Flags: needinfo?(ggrisco)
Attached file main.log (obsolete) —
Ok, uploaded dmesg and android logs.
Flags: needinfo?(sotaro.ikeda.g)
Greg, were attachment 8554768 [details] and attachment 8554771 [details] got same time to logs of Comment 1 and Comment 2?
And can you provide the hardware and RAM config information?
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(ggrisco)
attachment 8554771 [details] says that b2g process was restarted during testing. It seems to be triggered by slogger error. slogger seems not mozilla's module.
 

The following is from attachment 8554771 [details].
--------------------------------------------------------------------------------------

01-01 00:03:31.358   960   962 E slogger : Raw Log write error. Peer has disconnected
01-01 00:03:31.360   960   962 E slogger : Raw Log shutdown. Waiting on reload signal...
01-01 00:03:31.363   960   960 W slogger : Active: 0
01-01 00:03:31.363   960   960 W slogger : Log cleanup: ref.log
01-01 00:03:31.366   960   960 W slogger : Log cleanup: dmesg.log
01-01 00:03:31.370   960   960 W slogger : Log cleanup: main.log
01-01 00:03:31.372   960   960 W slogger : Log cleanup: radio.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: events.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: meminfo.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: battery.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: ion.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: kgsl.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: buddyinfo.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: procrank.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: top.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: lsof.log
01-01 00:03:31.375   960   960 W slogger : Log cleanup: b2g-info.log
01-01 00:03:31.375   960   960 W slogger : Log cleanup: b2g-ps.log
01-01 00:03:31.375   960   960 W slogger : Log cleanup: b2g-procrank.log
01-01 00:03:31.375   960   960 W slogger : Log cleanup: b2gmemreport.log
01-01 00:03:32.360   960   962 W slogger : Raw Log stop
01-01 00:03:34.434   246   876 D QLOC-OSAgent: Subscriber already registerd XTWiFi-PE
01-01 00:03:35.880   246   876 D QLOC-OSAgent: Subscriber already registerd XTWiFi-PE
01-01 00:03:37.221   237   857 D AudioFlinger: setProcessedAudioState(trackId=55, processed=0)


The following says that b2g process was killed/re-started.

01-01 00:03:37.239   217   217 I ServiceManager: service 'media.resource_manager' died
01-01 00:03:37.239   217   217 I ServiceManager: service 'SurfaceFlinger' died
01-01 00:03:37.239   217   217 I ServiceManager: service 'permission' died
01-01 00:03:37.239   217   217 I ServiceManager: service 'display.qservice' died
Greg, do you know if slogger is codeaurora's module?
(In reply to Sotaro Ikeda [:sotaro] from comment #19)
> Greg, do you know if slogger is codeaurora's module?

Hi Sotaro, yes, slogger is an internal CAF utility.  Let me see if I can figure out why this is happening.  I'll close this for now and re-open if needed.  Thanks.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(ggrisco)
Resolution: --- → INVALID
no problem :-)
Attachment #8554035 - Attachment is obsolete: true
Attachment #8554036 - Attachment is obsolete: true
Attachment #8554768 - Attachment is obsolete: true
Attachment #8554771 - Attachment is obsolete: true
Attached file dmesg.log
I reproduced this by first removing our slogger utility.  Then all I did was start Video app with an sdcard inserted (that contained some videos). The app opened but complained that there was another instance of the app running (which shouldn't be the case).  So I closed the app and that's when I saw the crash.  It's pretty repeatable, so please let me know if there's any extra information I can gather for you.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
(In reply to Greg Grisco from comment #25)
> I reproduced this by first removing our slogger utility.  Then all I did was
> start Video app with an sdcard inserted (that contained some videos). The
> app opened but complained that there was another instance of the app running
> (which shouldn't be the case).  So I closed the app and that's when I saw
> the crash.  It's pretty repeatable, so please let me know if there's any
> extra information I can gather for you.

Greg, can you explain more explicitly or can you upload the screen shot when "there was another instance of the app running". It seems different problem than the crash.
Flags: needinfo?(ggrisco)
(In reply to Greg Grisco from comment #25)
> I reproduced this by first removing our slogger utility.  Then all I did was
> start Video app with an sdcard inserted (that contained some videos).

Greg, Is the message is like the following?
- "Cannot play video: Another app is currently using the video player"

Video app actually does not have a such capability. Therefore, the video app just emulate it just by some timer. Similar bug to it is Bug 1112412.
Assignee: nobody → sotaro.ikeda.g
(In reply to Greg Grisco from comment #25)
> I reproduced this by first removing our slogger utility.  Then all I did was
> start Video app with an sdcard inserted (that contained some videos). The
> app opened but complained that there was another instance of the app running
> (which shouldn't be the case).  So I closed the app and that's when I saw
> the crash.  It's pretty repeatable, so please let me know if there's any
> extra information I can gather for you.

Greg, can you provide a set of logs when removing our slogger utility?
(In reply to Sotaro Ikeda [:sotaro] from comment #28)
> (In reply to Greg Grisco from comment #25)
> > I reproduced this by first removing our slogger utility.  Then all I did was
> > start Video app with an sdcard inserted (that contained some videos). The
> > app opened but complained that there was another instance of the app running
> > (which shouldn't be the case).  So I closed the app and that's when I saw
> > the crash.  It's pretty repeatable, so please let me know if there's any
> > extra information I can gather for you.
> 
> Greg, can you provide a set of logs when removing our slogger utility?

Sorry, it is not necessary. logs of Comment 22, 23, 24 seems new log and does not have b2g process restart.
attachment 8555442 [details] have the following. It seems to say that the device was oom.

> 01-01 02:52:14.148  1398  1452 E Gecko   : mozalloc_abort: Redirecting call to abort() to mozalloc_abort
(In reply to Sotaro Ikeda [:sotaro] from comment #30)
> attachment 8555442 [details] have the following. It seems to say that the
> device was oom.
> 
> > 01-01 02:52:14.148  1398  1452 E Gecko   : mozalloc_abort: Redirecting call to abort() to mozalloc_abort

One strange thing is that the crash log(attachment 8555441 [details] ) say that there is no mediaserver.
(In reply to Sotaro Ikeda [:sotaro] from comment #27)
> (In reply to Greg Grisco from comment #25)
> 
> Greg, Is the message is like the following?
> - "Cannot play video: Another app is currently using the video player"

Yes, that is the exact message.  I just reproduced and saw the message.  The app crashes right after that.  Now I have been able to reproduce only 50% of the time.  Other times I am able to play video (although no audio is being played).
Flags: needinfo?(ggrisco)
(In reply to Greg Grisco from comment #25)
> I reproduced this by first removing our slogger utility.  Then all I did was
> start Video app with an sdcard inserted (that contained some videos). The
> app opened but complained that there was another instance of the app running
> (which shouldn't be the case).  So I closed the app and that's when I saw
> the crash.  It's pretty repeatable, so please let me know if there's any
> extra information I can gather for you.

Greg, did you reproduce the crash just by opening one video app? Can you write more detailed step(STR) to reproduce the crash?
Flags: needinfo?(ggrisco)
Here are the steps I used to reproduce:

1. reboot phone
2. select Video app

video images are displayed

3. select a video to play
4. if the video starts to play, go to step 1.  But after only two iterations of this I saw the "Cannot play video" message followed by crashs.
Flags: needinfo?(ggrisco) → needinfo?(sotaro.ikeda.g)
Greg, thanks. I have an additional question. Can you answer the following questions?

- In which version of gonk is used for testing? (Gonk KitKat?).
- On which hardware does the test is done?
- Can you upload OMXClient.cpp file used for the ROM?
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(ggrisco)
(In reply to Greg Grisco from comment #34)
> Here are the steps I used to reproduce:
> 
> 1. reboot phone
> 2. select Video app

Greg, in the above "1", how did you reboot the phone. Did you do power off and on? Or, did you just reboot b2g?
(In reply to Sotaro Ikeda [:sotaro] from comment #35)
> Greg, thanks. I have an additional question. Can you answer the following
> questions?
> 
> - In which version of gonk is used for testing? (Gonk KitKat?).
> - On which hardware does the test is done?
> - Can you upload OMXClient.cpp file used for the ROM?

This was on L-based build on reference hardware.  Gonk version is listed in comment 4.

(In reply to Sotaro Ikeda [:sotaro] from comment #36)
> (In reply to Greg Grisco from comment #34)
> > Here are the steps I used to reproduce:
> > 
> > 1. reboot phone
> > 2. select Video app
> 
> Greg, in the above "1", how did you reboot the phone. Did you do power off
> and on? Or, did you just reboot b2g?

The crash occurs in both scenarios.
Flags: needinfo?(ggrisco)
Wired thing about this bug is that logcat says that the crash happened because of memory allocation failure.

> Gecko   : mozalloc_abort: Redirecting call to abort() to mozalloc_abort

But crash log says that android::OMXClient::connect() failure calls abort.
(In reply to Sotaro Ikeda [:sotaro] from comment #38)
> Used OMXClient.cpp seems like the following.
> https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/tree/media/
> libstagefright/OMXClient.cpp?h=mozilla/b2g-5.0.0_r6

If the above code is correct, when abort was called in OMXClient::connect(), it succeeded to get IServiceManager and IMediaPlayerService. But failed to get IOMX. IOMX should always succeed when IMediaPlayerService exists.
Another wired thing is that if the crash happened by OMXClient::connect() failure, the following log should exit in logcat log.

> F OMXClient: frameworks/av/media/libstagefright/OMXClient.cpp:406 CHECK(mOMX.get() != NULL) failed.
Maybe mOMX is null?
(In reply to Sotaro Ikeda [:sotaro] from comment #39)
> Wired thing about this bug is that logcat says that the crash happened
> because of memory allocation failure.
> 
> > Gecko   : mozalloc_abort: Redirecting call to abort() to mozalloc_abort
> 
> But crash log says that android::OMXClient::connect() failure calls abort.

There seems no inconsistency about the above. In b2g and b2g container process, abort() call is redirected to mozalloc_abort() call. attachment 8555441 [details] shows mozalloc_abort() call. And mozalloc_abort() prints the following log.

> Gecko   : mozalloc_abort: Redirecting call to abort() to mozalloc_abort
(In reply to Greg Grisco from comment #42)
> Maybe mOMX is null?

There is such possibility. But MediaPlayerService::getOMX() always return valid OMX.

https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/tree/media/libmediaplayerservice/MediaPlayerService.cpp?h=LA.AF.1.1.1#n357
Add some logs around OMXClient and OMX.
(In reply to Sotaro Ikeda [:sotaro] from comment #45)
> Created attachment 8556062 [details] [diff] [review]
> log patch - framework/av
> 
> Add some logs around OMXClient and OMX.

I checked the patch on nexus5-l. Some how, I rarely saw log around OMXClient, although log around MediaPlayerService almost always seems to be printed to logcat.
Greg, can you test again with attachment 8556062 [details] [diff] [review]? some log seems not always printed to logcat, but the additional log could help for debugging. Thanks.
Flags: needinfo?(ggrisco)
Attached file video_works.log
This is what the log looks like when I'm able to play video without crash.  Extra logging from attachment 8556062 [details] [diff] [review] is applied.
Flags: needinfo?(ggrisco)
Attached file video_crashes.log
This is what the log looks like when Video crashes.  Extra logging from attachment 8556062 [details] [diff] [review] is applied.
Thanks! When the problem happens, somehow, an application seems to failed to call MediaPlayerService::getOMX(). The function call log exist only on video_works.log(attachment 8556165 [details]).

But it is still not clear why it failed.
I re-checked decoded minidump of the crash attachment 8555441 [details]. I recognized one wired thing. It has all threads stacks in the crash process. But it does not have IPC Pool threads. They are created by android::ProcessState::self()->startThreadPool() call. The function is called during application process initialization.
The problem seems to happen around android binder ipc.
If we have msm8909 device, it might becomes easier to investigate this problem...
Sotaro, is there any other information that I collect for you since you don't have 8909?
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(cpearce)
(In reply to Greg Grisco from comment #54)
> Sotaro, is there any other information that I collect for you since you
> don't have 8909?

Thanks, I will prepare a new log patch tomorrow(Thursday).
Flags: needinfo?(sotaro.ikeda.g)
Greg, can you get a log with the following patches applied?

  - log patch - gecko: attachment 8559996 [details] [diff] [review]
  - log patch - framework/native: attachment 8559997 [details] [diff] [review]
  - log patch - framework/av: attachment 8556062 [details] [diff] [review]
Flags: needinfo?(ggrisco)
Hi Sotaro, some of these patches no longer apply.  I'm trying to fix them up but will have to finish it tomorrow.  I hope to have some logs for you to look at tomorrow morning, thanks.
Attached file music_video.log
Now it's getting hard to reproduce this bug.  We were seeing hundreds of these crashes in past couple of weeks, but last two days automation hasn't produced any crashes.  However, I can see the "Another app is currently using the video player." message after I follow these steps:

1. Start Music app
2. Select music track to play

With some content, I'm seeing the Music app freeze (bug 1129685)

3. Kill the Music app
4. Start the Video app

Now I see the message.

I collected logs for this with logging patches applied.
Flags: needinfo?(ggrisco)
(In reply to Greg Grisco from comment #60)
> Created attachment 8560553 [details]
> music_video.log
> 
> Now it's getting hard to reproduce this bug.  We were seeing hundreds of
> these crashes in past couple of weeks, but last two days automation hasn't
> produced any crashes.  However, I can see the "Another app is currently
> using the video player." message after I follow these steps:
> 
> 1. Start Music app
> 2. Select music track to play
> 
> With some content, I'm seeing the Music app freeze (bug 1129685)
> 
> 3. Kill the Music app
> 4. Start the Video app
> 
> Now I see the message.
> 
> I collected logs for this with logging patches applied.

This seems like a different problem/bug. The log is not optimum for this symptom. But from the log, there seems to have a problem to "input data read" of OMXCodec from DataSource. It might be cause by same problem as bug 1129685.
(In reply to Sotaro Ikeda [:sotaro] from comment #62)
> (In reply to Greg Grisco from comment #60)
> > Created attachment 8560553 [details]
> > music_video.log
> > 
> > Now it's getting hard to reproduce this bug.  We were seeing hundreds of
> > these crashes in past couple of weeks, but last two days automation hasn't
> > produced any crashes.  However, I can see the "Another app is currently
> > using the video player." message after I follow these steps:

Video tag does not expose such capability. Therefore, the above status is detected by heuristics by using event response time. Therefore, the above message normally say, gecko MediaDecoder is not emitting the event withing 2 second or so.
Although I have trouble reproducing this manually now, it still reproduces in monkey test.  I'm trying to get a log with the logging patches applied.  Will share those when I get them.
Greg, Comment 67 and Comment 68 does not have logcat data. Can we have the logcat log?
Flags: needinfo?(ggrisco)
It seems this exact crash is not reproducing anymore with the logging patch applied.  Could just be a coincidence, but I'll leave the patch in for a while longer to see if we get anything.
Flags: needinfo?(ggrisco)
attachment 8571002 [details] has the followings. From them, mediaserver seems not started yet.

> ServiceManager: Waiting for service media.camera...

> ServiceManager: Waiting for service media.player...
From comment 80, there might be the following possibilities.

- [1] mediaserver's initialization takes too long time
- [2] mediaserver's initialization became stuck
BTW, I had removed the logging patch this weekend and now we're seeing crashes again.  It seems that the extra logging was preventing the crash from happening.
(In reply to Sotaro Ikeda [:sotaro] from comment #81)
> From comment 80, there might be the following possibilities.
> 
> - [1] mediaserver's initialization takes too long time
> - [2] mediaserver's initialization became stuck

There is also another possibility.
- [3] Binder IPC sometimes does not work correctly from content process.
Hi Sotaro,

(In reply to Sotaro Ikeda [:sotaro] from comment #86)
> (In reply to Sotaro Ikeda [:sotaro] from comment #81)
> ...
>
> There is also another possibility.
> - [3] Binder IPC sometimes does not work correctly from content process.

Are you testing that possibility? What next steps can be taken to move this issue forward?

Thanks,
Mike
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Mike Lee [:mlee] from comment #93)
> 
> Are you testing that possibility? What next steps can be taken to move this
> issue forward?

We do not have a environment for testing. Recently, only device was received, but correct software environment was not provided yet. Therefore, only thing is ask code aurora to add some logs to there environment. But adding a lot of log could cause comment 85.
Flags: needinfo?(sotaro.ikeda.g)
I am going to create a different patch to check mediaserver's service start up.
Add log around mediaserver's service start up.
Greg, can you add log patch attachment 8572867 [details] [diff] [review] to codeaurora's build? It add log around mediasercices' start up. It seems not add too much log.
Flags: needinfo?(ggrisco)
(In reply to Sotaro Ikeda [:sotaro] from comment #97)
> Greg, can you add log patch attachment 8572867 [details] [diff] [review] to
> codeaurora's build? It add log around mediasercices' start up. It seems not
> add too much log.

Ok, it should get built today and monkey will run this weekend.  If there are crashes, I (or cafbot) will send you the logs.
Flags: needinfo?(ggrisco)
Thank you!
Hi Sotaro, we're able to reproduce more reliably now and think this might be due to race condition at startup.  We're going to debug this now.  I marked as [POVB] for now, will loop you back in when we know more.  Thanks!
Whiteboard: [b2g-crash][caf-crash 433][caf priority: p1][CR 785241] → [POVB][b2g-crash][caf-crash 433][caf priority: p1][CR 785241]
Thanks for the notification:-)
It turns out that this doesn't really have much to do with media/OMX.  It now looks like on some (all?) L gonks, /system/b2g/plugin-container is unable to connect to the mediaserver.  This can also be reproduced in the Dialer app and Built-in Keyboard app, when they try to play audio.

Example using the Phone app instead of Video app, since it's easier to demonstrate the issue:

BAD
---
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      260   1     273796 89472 ffffffff b6f43d08 S /system/b2g/b2g
(Nuwa)           0 root      541   260   67472  24452 ffffffff b6f43d08 S /system/b2g/b2g
Homescreen       2 u0_a2728  2728  541   167484 37608 ffffffff b6f43d08 S /system/b2g/b2g
(Preallocated a  2 u0_a3579  3579  541   70720  17596 ffffffff b6f43d08 S /system/b2g/b2g

$ adb shell kill 3579
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      260   1     268924 88952 ffffffff b6f43d08 S /system/b2g/b2g
(Nuwa)           0 root      541   260   67472  24452 ffffffff b6f43d08 S /system/b2g/b2g
Homescreen       2 u0_a2728  2728  541   166444 38040 ffffffff b6f43d08 S /system/b2g/b2g

<<< Launch Phone app >>>

$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      260   1     244048 90124 ffffffff b6f43d08 S /system/b2g/b2g
(Nuwa)           0 root      541   260   67472  23772 ffffffff b6f43d08 S /system/b2g/b2g
Homescreen       2 u0_a2728  2728  541   98428  30132 ffffffff b6f43d08 S /system/b2g/b2g
Communications   2 u0_a3661  3661  260   91080  37004 ffffffff b6f3f070 R /system/b2g/plugin-container

$ adb logcat | grep Waiting
I/ServiceManager( 3661): Waiting for service media.audio_flinger...
I/ServiceManager( 3661): Waiting for service media.audio_flinger...
I/ServiceManager( 3661): Waiting for service media.audio_flinger...

<<<  The Phone app is completely wedged here >>

GOOD
----
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      260   1     257484 91604 ffffffff b6f43d08 S /system/b2g/b2g
(Nuwa)           0 root      541   260   67472  23592 ffffffff b6f43d08 S /system/b2g/b2g
Homescreen       2 u0_a2728  2728  541   131876 35916 ffffffff b6f43d08 S /system/b2g/b2g
(Preallocated a  2 u0_a3694  3694  541   70720  17224 ffffffff b6f43d08 S /system/b2g/b2g

<<< Launch Phone app >>>

$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      260   1     246800 91928 ffffffff b6f43d08 S /system/b2g/b2g
(Nuwa)           0 root      541   260   67472  23404 ffffffff b6f43d08 S /system/b2g/b2g
Homescreen       2 u0_a2728  2728  541   98428  29672 ffffffff b6f43d08 S /system/b2g/b2g
Communications   2 u0_a3694  3694  541   89372  28908 ffffffff b6f43d08 S /system/b2g/b2g

$ adb logcat | grep Waiting
null

<<<  The Phone app is working fine here >>
Summary: Assertion failure in OMXDecoder::AllocateMediaResources during monkey test → [l-gonk] plugin-container can't connect to media server
Whiteboard: [POVB][b2g-crash][caf-crash 433][caf priority: p1][CR 785241] → [b2g-crash][caf-crash 433][caf priority: p1][CR 785241]
Once bug 1053634 is fixed plugin-container should never be used and perhaps this bug just disappears.

It would be interesting if the STR from comment 108 reproduce the issue on N5 though, I don't have one to try here.
Depends on: 1053634
No longer blocks: CAF-v3.0-FL-metabug
Shawn,

Can someone on the Taipei team confirm whether this issue reproduces on the Nexus 5? CAF is trying to understand if this is only an issue with their device.

Thanks,
Mike
Flags: needinfo?(sku)
(In reply to Mike Lee [:mlee] from comment #110)
> Shawn,
> 
> Can someone on the Taipei team confirm whether this issue reproduces on the
> Nexus 5? CAF is trying to understand if this is only an issue with their
> device.
> 
> Thanks,
> Mike


Mike,
There is no such BAD symptom as comment 108 said. 

Blake,
 Once you got time, please can you try check this issue as well?



$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      190   1     314028 117936 ffffffff b6e5b490 S /system/b2g/b2g
(Nuwa)           0 root      385   190   66652  22216 ffffffff b6e5b490 S /system/b2g/b2g
Homescreen       2 u0_a822   822   385   105224 28620 ffffffff b6e5b490 S /system/b2g/b2g
Built-in Keyboa  2 u0_a969   969   190   89904  31744 ffffffff b6e5e490 S /system/b2g/plugin-container
Find My Device   2 u0_a1049  1049  385   74152  22112 ffffffff b6e5b490 S /system/b2g/b2g
(Preallocated a  2 u0_a1776  1776  385   70932  17800 ffffffff b6e5b490 S /system/b2g/b2g

<< Launch communication app >>
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      190   1     364464 119004 ffffffff b6e5b490 S /system/b2g/b2g
(Nuwa)           0 root      385   190   66652  22216 ffffffff b6e5b490 S /system/b2g/b2g
Homescreen       2 u0_a822   822   385   108304 27896 ffffffff b6e5b490 S /system/b2g/b2g
Built-in Keyboa  2 u0_a969   969   190   89904  31744 ffffffff b6e5e490 S /system/b2g/plugin-container
Find My Device   2 u0_a1049  1049  385   74152  22112 ffffffff b6e5b490 S /system/b2g/b2g
Communications   2 u0_a1776  1776  385   98516  28224 ffffffff b6e5b490 S /system/b2g/b2g

<< Kill communication app >>
$ adb shell kill 1776
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      190   1     357232 118708 ffffffff b6e5b490 S /system/b2g/b2g
(Nuwa)           0 root      385   190   66652  22216 ffffffff b6e5b490 S /system/b2g/b2g
Homescreen       2 u0_a822   822   385   130172 34664 ffffffff b6e5b490 S /system/b2g/b2g
Built-in Keyboa  2 u0_a969   969   190   89904  31744 ffffffff b6e5e490 S /system/b2g/plugin-container
Find My Device   2 u0_a1049  1049  385   74152  22112 ffffffff b6e5b490 S /system/b2g/b2g
(Preallocated a  2 u0_a2019  2019  385   70932  17996 ffffffff b6e5b490 S /system/b2g/b2g

<< Re-launch communication app >>
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      190   1     366676 120680 ffffffff b6e5b490 S /system/b2g/b2g
(Nuwa)           0 root      385   190   66652  22216 ffffffff b6e5b490 S /system/b2g/b2g
Homescreen       2 u0_a822   822   385   112492 39960 ffffffff b6e5b490 S /system/b2g/b2g
Built-in Keyboa  2 u0_a969   969   190   89904  31744 ffffffff b6e5e490 S /system/b2g/plugin-container
Find My Device   2 u0_a1049  1049  385   74152  22112 ffffffff b6e5b490 S /system/b2g/b2g
Communications   2 u0_a2019  2019  385   97492  28392 ffffffff b6e5b490 S /system/b2g/b2g

<< Launch video app >>
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      190   1     367636 126220 ffffffff b6e5b490 S /system/b2g/b2g
(Nuwa)           0 root      385   190   66652  22216 ffffffff b6e5b490 S /system/b2g/b2g
Homescreen       2 u0_a822   822   385   108304 27636 ffffffff b6e5b490 S /system/b2g/b2g
Built-in Keyboa  2 u0_a969   969   190   89904  31744 ffffffff b6e5e490 S /system/b2g/plugin-container
Find My Device   2 u0_a1049  1049  385   74152  22112 ffffffff b6e5b490 S /system/b2g/b2g
Communications   2 u0_a2019  2019  385   101576 28216 ffffffff b6e5b490 S /system/b2g/b2g
Video            2 u0_a2210  2210  385   92732  25072 ffffffff b6e5b490 S /system/b2g/b2g
Flags: needinfo?(sku) → needinfo?(bwu)
Currently I don't have nexus 5 at my hand. mchiang should be able to help check it. :)
Flags: needinfo?(bwu) → needinfo?(mchiang)
Hi,

I follow the steps in comment 34 and try to reproduce the bug with Nexus 5.
Cannot reproduce it for 10 times.
Since Greg also cannot reproduce the bug manually now, we need to come out a monkey & stress test to hit this problem.

Hi Greg,

According to the log, seems the problem is related to the binder/IPC.
Does your cafbot also record kernel log?
Could you help do the favor:
1) Set cafbot to record all logs from the boot up (We expect to see the log like "MediaPlayerService::instantiate()", which ensures mediaplayerservice indeed instantiate successfully)
2) Provide both logcat log and kernel log.
3) Ask your system team to assist analyzing the kernel log and see if there is any IPC/binder issue. Is that possible there is a binder leakage?

Regards,
Munro
Flags: needinfo?(mchiang)
Flags: needinfo?(ggrisco)
This log patch is a part of Attachment 8559997 [details] [diff]. Limit log to servicemanager.
I do not think that "plugin-container can't connect to media server". Just before video decoder instantiation, content process access mediaserver by getMediaResourceManagerService().

getMediaResourceManagerService() access to the media server. IMediaResourceManagerService is used to reserve video decoder resource. The instance of IMediaResourceManagerService exits in b2g process and it is registered to mediaserver. The content process get the interface via mediaserver.

https://dxr.mozilla.org/mozilla-central/source/dom/media/omx/mediaresourcemanager/IMediaResourceManagerDeathNotifier.cpp#45
Greg, can you add attachment 8576783 [details] [diff] [review]? It make sure if content process access mediasercver.
Flags: needinfo?(ggrisco)
Another reason of comment 115 is in attachment 8554771 [details] in comment 15. Only this log has b2g process reboot log. During b2g process reboot. ServiceManager always emit some service died messages. But it does not have a died message of mediaserver's services.

Died logs of attachment 8554771 [details] was the following.
--------------------------------------------
01-01 00:03:37.239   217   217 I ServiceManager: service 'media.resource_manager' died
01-01 00:03:37.239   217   217 I ServiceManager: service 'SurfaceFlinger' died
01-01 00:03:37.239   217   217 I ServiceManager: service 'permission' died
01-01 00:03:37.239   217   217 I ServiceManager: service 'display.qservice' died

But we normally expect the followig died logs. b2g process reboot also trigger mediaserver prorocess reboot.
-------------------------------------------------------------------

03-12 13:06:41.790   175   175 I ServiceManager: service 'android.security.keystore' died
03-12 13:06:41.790   175   175 I ServiceManager: service 'media.resource_manager' died
03-12 13:06:41.790   175   175 I ServiceManager: service 'SurfaceFlinger' died
03-12 13:06:41.790   175   175 I ServiceManager: service 'permission' died
03-12 13:06:41.790   175   175 I ServiceManager: service 'display.qservice' died
03-12 13:06:41.791   193   193 D nfcd    :  4001536 of bytes to be sent... data=0x0 ret=0
03-12 13:06:41.893   175   175 I ServiceManager: service 'media.sound_trigger_hw' died
03-12 13:06:41.893   175   175 I ServiceManager: service 'media.audio_flinger' died
03-12 13:06:41.893   175   175 I ServiceManager: service 'media.player' died
03-12 13:06:41.893   175   175 I ServiceManager: service 'media.camera' died
03-12 13:06:41.893   175   175 I ServiceManager: service 'media.audio_policy' died
Flags: needinfo?(mvines)
Flags: needinfo?(ggrisco)
(In reply to mchiang from comment #113)
> I follow the steps in comment 34 and try to reproduce the bug with Nexus 5.
> Cannot reproduce it for 10 times.
 
Thanks for checking!

> Since Greg also cannot reproduce the bug manually now, we need to come out a
> monkey & stress test to hit this problem.

This can now be reproduced 100% over here, please see comment 108 for how.   It's something to do with fd management in the plugin-container process as the issue does not reproduce if plugin-container was not needed and the Video/Dialer/Keyboard app run in a b2g process and certainly may be a POVB.

However I think we should focus our attention on fixing bug 1053634 right now, as this will remove the plugin-container process entirely and as a fringe benefit should resolve this bug (fingers crossed).
Flags: needinfo?(mvines)
Bug 1053634 avoids this issue as hoped so we don't need a fix here.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: