[l-gonk] plugin-container can't connect to media server

RESOLVED WONTFIX

Status

()

defect
--
critical
RESOLVED WONTFIX
5 years ago
4 years ago

People

(Reporter: ggrisco, Assigned: sotaro)

Tracking

({crash})

unspecified
ARM
Gonk (Firefox OS)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:2.2+)

Details

(Whiteboard: [b2g-crash][caf-crash 433][caf priority: p1][CR 785241], crash signature)

Attachments

(29 attachments, 4 obsolete attachments)

147.93 KB, text/plain
Details
24.13 KB, text/plain
Details
97.60 KB, text/x-log
Details
7.48 KB, patch
Details | Diff | Splinter Review
746.31 KB, text/x-log
Details
667.75 KB, text/x-log
Details
4.76 KB, patch
Details | Diff | Splinter Review
7.35 KB, patch
Details | Diff | Splinter Review
1.53 MB, text/x-log
Details
733 bytes, text/plain
Details
143.06 KB, text/plain
Details
733 bytes, text/plain
Details
143.06 KB, text/plain
Details
214.64 KB, text/plain
Details
161.74 KB, text/plain
Details
89.83 KB, text/plain
Details
149.38 KB, text/plain
Details
82.96 KB, text/plain
Details
164.91 KB, text/plain
Details
83.29 KB, text/plain
Details
157.70 KB, text/plain
Details
76.31 KB, text/plain
Details
163.39 KB, text/plain
Details
7.48 KB, patch
Details | Diff | Splinter Review
85.74 KB, text/plain
Details
165.71 KB, text/plain
Details
85.85 KB, text/plain
Details
157.92 KB, text/plain
Details
4.69 KB, patch
Details | Diff | Splinter Review
Reporter

Description

5 years ago
With v2.2, We have seen this crash signature hundreds of times over the past few weeks during monkey test.  No STR available.  cafbot will upload minidump and .extra file.

[@ mozalloc_abort | __android_log_assert | android::OMXClient::connect | android::OmxDecoder::AllocateMediaResources ]
Posted file EXTRA file attachment - (obsolete) —
Posted file decoded minidump - (obsolete) —
Reporter

Comment 3

5 years ago
Bhavana, here's another frequent crash we're seeing.
Flags: needinfo?(bbajaj)
Whiteboard: [CR 785241]
Whiteboard: [CR 785241] → [caf priority: p1][CR 785241]
Whiteboard: [caf priority: p1][CR 785241] → [b2g-crash][caf-crash 433][caf priority: p1][CR 785241]
Observed on: 

Device: msm8909
Gonk Version: AU_LINUX_GECKO_LF.BR.1.2.3.00.00.00.000.044
Moz BuildID: 20150120002507
B2G Version: 2.2
Gecko Version: 37.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=f5b3d1b6cfa3e702033f613915ae637cb735cbfb
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=bb6c4d3fc51281f1a2ee0ff471221dbc1d0d1035
Patches: bug 1067629, bug 1117862, bug 1119980, bug 1091307, bug 1122162

Comment 5

5 years ago
copying Sotaro and Chris Pearce
Component: Gaia::Video → Video/Audio
Product: Firefox OS → Core
Thanks Hema!

Greg, hoping sotaro or chris can help start investigation here or atleast re-direct to the right owner to help here.
blocking-b2g: 2.2? → 2.2+
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(cpearce)
Flags: needinfo?(bbajaj)
Assignee

Comment 7

5 years ago
Greg, can we have logcat logs when the problem happens?
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(ggrisco)
Assignee

Comment 8

5 years ago
Greg, can we have logcat logs when the problem happens? And from when, does the problem happen?
Assignee

Comment 9

5 years ago
OMXClient did not have a change that seems to affect to this. Other places change seems to be affected.
Assignee

Comment 10

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #9)
> OMXClient did not have a change that seems to affect to this. Other places
> change seems to be affected.

correction:
OMXClient did not have a change that seems to affect to this.
Assignee

Comment 11

5 years ago
In attachment 8554036 [details], ASSERT happens at OMXClient::connect(). It says that media server did not exist because of a crash. Therefore, attachment 8554036 [details] does not have an information except media server crash.

Therefore, logcat log that catch the crash is necessary to investigate more.
Assignee

Comment 12

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #10)
> (In reply to Sotaro Ikeda [:sotaro] from comment #9)
> > OMXClient did not have a change that seems to affect to this. Other places
> > change seems to be affected.
> 
> correction:
> OMXClient did not have a change that seems to affect to this.

Sorry, OmxDecoder is correct workd...
Assignee

Updated

5 years ago
Flags: needinfo?(ggrisco)
Assignee

Comment 13

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #8)
> Greg, can we have logcat logs when the problem happens? And from when, does
> the problem happen?

Greg, can you also get kernel log(dmesg)?
Flags: needinfo?(ggrisco)
Reporter

Comment 14

5 years ago
Posted file dmesg.log (obsolete) —
Flags: needinfo?(ggrisco)
Reporter

Comment 15

5 years ago
Posted file main.log (obsolete) —
Reporter

Comment 16

5 years ago
Ok, uploaded dmesg and android logs.
Flags: needinfo?(sotaro.ikeda.g)
Assignee

Comment 17

5 years ago
Greg, were attachment 8554768 [details] and attachment 8554771 [details] got same time to logs of Comment 1 and Comment 2?
And can you provide the hardware and RAM config information?
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(ggrisco)
Assignee

Comment 18

5 years ago
attachment 8554771 [details] says that b2g process was restarted during testing. It seems to be triggered by slogger error. slogger seems not mozilla's module.
 

The following is from attachment 8554771 [details].
--------------------------------------------------------------------------------------

01-01 00:03:31.358   960   962 E slogger : Raw Log write error. Peer has disconnected
01-01 00:03:31.360   960   962 E slogger : Raw Log shutdown. Waiting on reload signal...
01-01 00:03:31.363   960   960 W slogger : Active: 0
01-01 00:03:31.363   960   960 W slogger : Log cleanup: ref.log
01-01 00:03:31.366   960   960 W slogger : Log cleanup: dmesg.log
01-01 00:03:31.370   960   960 W slogger : Log cleanup: main.log
01-01 00:03:31.372   960   960 W slogger : Log cleanup: radio.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: events.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: meminfo.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: battery.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: ion.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: kgsl.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: buddyinfo.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: procrank.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: top.log
01-01 00:03:31.374   960   960 W slogger : Log cleanup: lsof.log
01-01 00:03:31.375   960   960 W slogger : Log cleanup: b2g-info.log
01-01 00:03:31.375   960   960 W slogger : Log cleanup: b2g-ps.log
01-01 00:03:31.375   960   960 W slogger : Log cleanup: b2g-procrank.log
01-01 00:03:31.375   960   960 W slogger : Log cleanup: b2gmemreport.log
01-01 00:03:32.360   960   962 W slogger : Raw Log stop
01-01 00:03:34.434   246   876 D QLOC-OSAgent: Subscriber already registerd XTWiFi-PE
01-01 00:03:35.880   246   876 D QLOC-OSAgent: Subscriber already registerd XTWiFi-PE
01-01 00:03:37.221   237   857 D AudioFlinger: setProcessedAudioState(trackId=55, processed=0)


The following says that b2g process was killed/re-started.

01-01 00:03:37.239   217   217 I ServiceManager: service 'media.resource_manager' died
01-01 00:03:37.239   217   217 I ServiceManager: service 'SurfaceFlinger' died
01-01 00:03:37.239   217   217 I ServiceManager: service 'permission' died
01-01 00:03:37.239   217   217 I ServiceManager: service 'display.qservice' died
Assignee

Comment 19

5 years ago
Greg, do you know if slogger is codeaurora's module?
Reporter

Comment 20

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #19)
> Greg, do you know if slogger is codeaurora's module?

Hi Sotaro, yes, slogger is an internal CAF utility.  Let me see if I can figure out why this is happening.  I'll close this for now and re-open if needed.  Thanks.
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(ggrisco)
Resolution: --- → INVALID
Assignee

Comment 21

5 years ago
no problem :-)
Reporter

Comment 22

5 years ago
Attachment #8554035 - Attachment is obsolete: true
Attachment #8554036 - Attachment is obsolete: true
Attachment #8554768 - Attachment is obsolete: true
Attachment #8554771 - Attachment is obsolete: true
Reporter

Comment 24

5 years ago
Posted file dmesg.log
Reporter

Comment 25

5 years ago
I reproduced this by first removing our slogger utility.  Then all I did was start Video app with an sdcard inserted (that contained some videos). The app opened but complained that there was another instance of the app running (which shouldn't be the case).  So I closed the app and that's when I saw the crash.  It's pretty repeatable, so please let me know if there's any extra information I can gather for you.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Assignee

Comment 26

5 years ago
(In reply to Greg Grisco from comment #25)
> I reproduced this by first removing our slogger utility.  Then all I did was
> start Video app with an sdcard inserted (that contained some videos). The
> app opened but complained that there was another instance of the app running
> (which shouldn't be the case).  So I closed the app and that's when I saw
> the crash.  It's pretty repeatable, so please let me know if there's any
> extra information I can gather for you.

Greg, can you explain more explicitly or can you upload the screen shot when "there was another instance of the app running". It seems different problem than the crash.
Flags: needinfo?(ggrisco)
Assignee

Comment 27

5 years ago
(In reply to Greg Grisco from comment #25)
> I reproduced this by first removing our slogger utility.  Then all I did was
> start Video app with an sdcard inserted (that contained some videos).

Greg, Is the message is like the following?
- "Cannot play video: Another app is currently using the video player"

Video app actually does not have a such capability. Therefore, the video app just emulate it just by some timer. Similar bug to it is Bug 1112412.
Assignee

Updated

5 years ago
Assignee: nobody → sotaro.ikeda.g
Assignee

Comment 28

5 years ago
(In reply to Greg Grisco from comment #25)
> I reproduced this by first removing our slogger utility.  Then all I did was
> start Video app with an sdcard inserted (that contained some videos). The
> app opened but complained that there was another instance of the app running
> (which shouldn't be the case).  So I closed the app and that's when I saw
> the crash.  It's pretty repeatable, so please let me know if there's any
> extra information I can gather for you.

Greg, can you provide a set of logs when removing our slogger utility?
Assignee

Comment 29

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #28)
> (In reply to Greg Grisco from comment #25)
> > I reproduced this by first removing our slogger utility.  Then all I did was
> > start Video app with an sdcard inserted (that contained some videos). The
> > app opened but complained that there was another instance of the app running
> > (which shouldn't be the case).  So I closed the app and that's when I saw
> > the crash.  It's pretty repeatable, so please let me know if there's any
> > extra information I can gather for you.
> 
> Greg, can you provide a set of logs when removing our slogger utility?

Sorry, it is not necessary. logs of Comment 22, 23, 24 seems new log and does not have b2g process restart.
Assignee

Comment 30

5 years ago
attachment 8555442 [details] have the following. It seems to say that the device was oom.

> 01-01 02:52:14.148  1398  1452 E Gecko   : mozalloc_abort: Redirecting call to abort() to mozalloc_abort
Assignee

Comment 31

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #30)
> attachment 8555442 [details] have the following. It seems to say that the
> device was oom.
> 
> > 01-01 02:52:14.148  1398  1452 E Gecko   : mozalloc_abort: Redirecting call to abort() to mozalloc_abort

One strange thing is that the crash log(attachment 8555441 [details] ) say that there is no mediaserver.
Reporter

Comment 32

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #27)
> (In reply to Greg Grisco from comment #25)
> 
> Greg, Is the message is like the following?
> - "Cannot play video: Another app is currently using the video player"

Yes, that is the exact message.  I just reproduced and saw the message.  The app crashes right after that.  Now I have been able to reproduce only 50% of the time.  Other times I am able to play video (although no audio is being played).
Flags: needinfo?(ggrisco)
Assignee

Comment 33

5 years ago
(In reply to Greg Grisco from comment #25)
> I reproduced this by first removing our slogger utility.  Then all I did was
> start Video app with an sdcard inserted (that contained some videos). The
> app opened but complained that there was another instance of the app running
> (which shouldn't be the case).  So I closed the app and that's when I saw
> the crash.  It's pretty repeatable, so please let me know if there's any
> extra information I can gather for you.

Greg, did you reproduce the crash just by opening one video app? Can you write more detailed step(STR) to reproduce the crash?
Flags: needinfo?(ggrisco)
Reporter

Comment 34

5 years ago
Here are the steps I used to reproduce:

1. reboot phone
2. select Video app

video images are displayed

3. select a video to play
4. if the video starts to play, go to step 1.  But after only two iterations of this I saw the "Cannot play video" message followed by crashs.
Flags: needinfo?(ggrisco) → needinfo?(sotaro.ikeda.g)
Assignee

Comment 35

5 years ago
Greg, thanks. I have an additional question. Can you answer the following questions?

- In which version of gonk is used for testing? (Gonk KitKat?).
- On which hardware does the test is done?
- Can you upload OMXClient.cpp file used for the ROM?
Flags: needinfo?(sotaro.ikeda.g) → needinfo?(ggrisco)
Assignee

Comment 36

5 years ago
(In reply to Greg Grisco from comment #34)
> Here are the steps I used to reproduce:
> 
> 1. reboot phone
> 2. select Video app

Greg, in the above "1", how did you reboot the phone. Did you do power off and on? Or, did you just reboot b2g?
Reporter

Comment 37

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #35)
> Greg, thanks. I have an additional question. Can you answer the following
> questions?
> 
> - In which version of gonk is used for testing? (Gonk KitKat?).
> - On which hardware does the test is done?
> - Can you upload OMXClient.cpp file used for the ROM?

This was on L-based build on reference hardware.  Gonk version is listed in comment 4.

(In reply to Sotaro Ikeda [:sotaro] from comment #36)
> (In reply to Greg Grisco from comment #34)
> > Here are the steps I used to reproduce:
> > 
> > 1. reboot phone
> > 2. select Video app
> 
> Greg, in the above "1", how did you reboot the phone. Did you do power off
> and on? Or, did you just reboot b2g?

The crash occurs in both scenarios.
Flags: needinfo?(ggrisco)
Assignee

Comment 39

5 years ago
Wired thing about this bug is that logcat says that the crash happened because of memory allocation failure.

> Gecko   : mozalloc_abort: Redirecting call to abort() to mozalloc_abort

But crash log says that android::OMXClient::connect() failure calls abort.
Assignee

Comment 40

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #38)
> Used OMXClient.cpp seems like the following.
> https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/tree/media/
> libstagefright/OMXClient.cpp?h=mozilla/b2g-5.0.0_r6

If the above code is correct, when abort was called in OMXClient::connect(), it succeeded to get IServiceManager and IMediaPlayerService. But failed to get IOMX. IOMX should always succeed when IMediaPlayerService exists.
Assignee

Comment 41

5 years ago
Another wired thing is that if the crash happened by OMXClient::connect() failure, the following log should exit in logcat log.

> F OMXClient: frameworks/av/media/libstagefright/OMXClient.cpp:406 CHECK(mOMX.get() != NULL) failed.
Reporter

Comment 42

5 years ago
Maybe mOMX is null?
Assignee

Comment 43

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #39)
> Wired thing about this bug is that logcat says that the crash happened
> because of memory allocation failure.
> 
> > Gecko   : mozalloc_abort: Redirecting call to abort() to mozalloc_abort
> 
> But crash log says that android::OMXClient::connect() failure calls abort.

There seems no inconsistency about the above. In b2g and b2g container process, abort() call is redirected to mozalloc_abort() call. attachment 8555441 [details] shows mozalloc_abort() call. And mozalloc_abort() prints the following log.

> Gecko   : mozalloc_abort: Redirecting call to abort() to mozalloc_abort
Assignee

Comment 44

5 years ago
(In reply to Greg Grisco from comment #42)
> Maybe mOMX is null?

There is such possibility. But MediaPlayerService::getOMX() always return valid OMX.

https://www.codeaurora.org/cgit/quic/la/platform/frameworks/av/tree/media/libmediaplayerservice/MediaPlayerService.cpp?h=LA.AF.1.1.1#n357
Assignee

Comment 45

5 years ago
Add some logs around OMXClient and OMX.
Assignee

Comment 46

5 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #45)
> Created attachment 8556062 [details] [diff] [review]
> log patch - framework/av
> 
> Add some logs around OMXClient and OMX.

I checked the patch on nexus5-l. Some how, I rarely saw log around OMXClient, although log around MediaPlayerService almost always seems to be printed to logcat.
Assignee

Comment 47

5 years ago
Greg, can you test again with attachment 8556062 [details] [diff] [review]? some log seems not always printed to logcat, but the additional log could help for debugging. Thanks.
Flags: needinfo?(ggrisco)
Reporter

Comment 48

5 years ago
Posted file video_works.log
This is what the log looks like when I'm able to play video without crash.  Extra logging from attachment 8556062 [details] [diff] [review] is applied.
Flags: needinfo?(ggrisco)
Reporter

Comment 49

5 years ago
Posted file video_crashes.log
This is what the log looks like when Video crashes.  Extra logging from attachment 8556062 [details] [diff] [review] is applied.
Assignee

Comment 50

5 years ago
Thanks! When the problem happens, somehow, an application seems to failed to call MediaPlayerService::getOMX(). The function call log exist only on video_works.log(attachment 8556165 [details]).

But it is still not clear why it failed.
Assignee

Comment 51

5 years ago
I re-checked decoded minidump of the crash attachment 8555441 [details]. I recognized one wired thing. It has all threads stacks in the crash process. But it does not have IPC Pool threads. They are created by android::ProcessState::self()->startThreadPool() call. The function is called during application process initialization.
Assignee

Comment 52

5 years ago
The problem seems to happen around android binder ipc.
Assignee

Comment 53

5 years ago
If we have msm8909 device, it might becomes easier to investigate this problem...
Reporter

Comment 54

4 years ago
Sotaro, is there any other information that I collect for you since you don't have 8909?
Reporter

Updated

4 years ago
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(cpearce)
Assignee

Comment 55

4 years ago
(In reply to Greg Grisco from comment #54)
> Sotaro, is there any other information that I collect for you since you
> don't have 8909?

Thanks, I will prepare a new log patch tomorrow(Thursday).
Assignee

Comment 56

4 years ago
Flags: needinfo?(sotaro.ikeda.g)
Assignee

Comment 58

4 years ago
Greg, can you get a log with the following patches applied?

  - log patch - gecko: attachment 8559996 [details] [diff] [review]
  - log patch - framework/native: attachment 8559997 [details] [diff] [review]
  - log patch - framework/av: attachment 8556062 [details] [diff] [review]
Flags: needinfo?(ggrisco)
Reporter

Comment 59

4 years ago
Hi Sotaro, some of these patches no longer apply.  I'm trying to fix them up but will have to finish it tomorrow.  I hope to have some logs for you to look at tomorrow morning, thanks.
Reporter

Comment 60

4 years ago
Posted file music_video.log
Now it's getting hard to reproduce this bug.  We were seeing hundreds of these crashes in past couple of weeks, but last two days automation hasn't produced any crashes.  However, I can see the "Another app is currently using the video player." message after I follow these steps:

1. Start Music app
2. Select music track to play

With some content, I'm seeing the Music app freeze (bug 1129685)

3. Kill the Music app
4. Start the Video app

Now I see the message.

I collected logs for this with logging patches applied.
Flags: needinfo?(ggrisco)
Assignee

Comment 62

4 years ago
(In reply to Greg Grisco from comment #60)
> Created attachment 8560553 [details]
> music_video.log
> 
> Now it's getting hard to reproduce this bug.  We were seeing hundreds of
> these crashes in past couple of weeks, but last two days automation hasn't
> produced any crashes.  However, I can see the "Another app is currently
> using the video player." message after I follow these steps:
> 
> 1. Start Music app
> 2. Select music track to play
> 
> With some content, I'm seeing the Music app freeze (bug 1129685)
> 
> 3. Kill the Music app
> 4. Start the Video app
> 
> Now I see the message.
> 
> I collected logs for this with logging patches applied.

This seems like a different problem/bug. The log is not optimum for this symptom. But from the log, there seems to have a problem to "input data read" of OMXCodec from DataSource. It might be cause by same problem as bug 1129685.
Assignee

Comment 63

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #62)
> (In reply to Greg Grisco from comment #60)
> > Created attachment 8560553 [details]
> > music_video.log
> > 
> > Now it's getting hard to reproduce this bug.  We were seeing hundreds of
> > these crashes in past couple of weeks, but last two days automation hasn't
> > produced any crashes.  However, I can see the "Another app is currently
> > using the video player." message after I follow these steps:

Video tag does not expose such capability. Therefore, the above status is detected by heuristics by using event response time. Therefore, the above message normally say, gecko MediaDecoder is not emitting the event withing 2 second or so.
Reporter

Comment 65

4 years ago
Although I have trouble reproducing this manually now, it still reproduces in monkey test.  I'm trying to get a log with the logging patches applied.  Will share those when I get them.
Assignee

Comment 69

4 years ago
Greg, Comment 67 and Comment 68 does not have logcat data. Can we have the logcat log?
Flags: needinfo?(ggrisco)
Reporter

Comment 76

4 years ago
It seems this exact crash is not reproducing anymore with the logging patch applied.  Could just be a coincidence, but I'll leave the patch in for a while longer to see if we get anything.
Flags: needinfo?(ggrisco)
Assignee

Comment 80

4 years ago
attachment 8571002 [details] has the followings. From them, mediaserver seems not started yet.

> ServiceManager: Waiting for service media.camera...

> ServiceManager: Waiting for service media.player...
Assignee

Comment 81

4 years ago
From comment 80, there might be the following possibilities.

- [1] mediaserver's initialization takes too long time
- [2] mediaserver's initialization became stuck
Reporter

Comment 85

4 years ago
BTW, I had removed the logging patch this weekend and now we're seeing crashes again.  It seems that the extra logging was preventing the crash from happening.
Assignee

Comment 86

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #81)
> From comment 80, there might be the following possibilities.
> 
> - [1] mediaserver's initialization takes too long time
> - [2] mediaserver's initialization became stuck

There is also another possibility.
- [3] Binder IPC sometimes does not work correctly from content process.

Comment 93

4 years ago
Hi Sotaro,

(In reply to Sotaro Ikeda [:sotaro] from comment #86)
> (In reply to Sotaro Ikeda [:sotaro] from comment #81)
> ...
>
> There is also another possibility.
> - [3] Binder IPC sometimes does not work correctly from content process.

Are you testing that possibility? What next steps can be taken to move this issue forward?

Thanks,
Mike
Flags: needinfo?(sotaro.ikeda.g)
Assignee

Comment 94

4 years ago
(In reply to Mike Lee [:mlee] from comment #93)
> 
> Are you testing that possibility? What next steps can be taken to move this
> issue forward?

We do not have a environment for testing. Recently, only device was received, but correct software environment was not provided yet. Therefore, only thing is ask code aurora to add some logs to there environment. But adding a lot of log could cause comment 85.
Flags: needinfo?(sotaro.ikeda.g)
Assignee

Comment 95

4 years ago
I am going to create a different patch to check mediaserver's service start up.
Assignee

Comment 96

4 years ago
Add log around mediaserver's service start up.
Assignee

Comment 97

4 years ago
Greg, can you add log patch attachment 8572867 [details] [diff] [review] to codeaurora's build? It add log around mediasercices' start up. It seems not add too much log.
Flags: needinfo?(ggrisco)
Reporter

Comment 101

4 years ago
(In reply to Sotaro Ikeda [:sotaro] from comment #97)
> Greg, can you add log patch attachment 8572867 [details] [diff] [review] to
> codeaurora's build? It add log around mediasercices' start up. It seems not
> add too much log.

Ok, it should get built today and monkey will run this weekend.  If there are crashes, I (or cafbot) will send you the logs.
Flags: needinfo?(ggrisco)
Assignee

Comment 102

4 years ago
Thank you!
Reporter

Comment 103

4 years ago
Hi Sotaro, we're able to reproduce more reliably now and think this might be due to race condition at startup.  We're going to debug this now.  I marked as [POVB] for now, will loop you back in when we know more.  Thanks!
Whiteboard: [b2g-crash][caf-crash 433][caf priority: p1][CR 785241] → [POVB][b2g-crash][caf-crash 433][caf priority: p1][CR 785241]
Assignee

Comment 104

4 years ago
Thanks for the notification:-)
It turns out that this doesn't really have much to do with media/OMX.  It now looks like on some (all?) L gonks, /system/b2g/plugin-container is unable to connect to the mediaserver.  This can also be reproduced in the Dialer app and Built-in Keyboard app, when they try to play audio.

Example using the Phone app instead of Video app, since it's easier to demonstrate the issue:

BAD
---
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      260   1     273796 89472 ffffffff b6f43d08 S /system/b2g/b2g
(Nuwa)           0 root      541   260   67472  24452 ffffffff b6f43d08 S /system/b2g/b2g
Homescreen       2 u0_a2728  2728  541   167484 37608 ffffffff b6f43d08 S /system/b2g/b2g
(Preallocated a  2 u0_a3579  3579  541   70720  17596 ffffffff b6f43d08 S /system/b2g/b2g

$ adb shell kill 3579
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      260   1     268924 88952 ffffffff b6f43d08 S /system/b2g/b2g
(Nuwa)           0 root      541   260   67472  24452 ffffffff b6f43d08 S /system/b2g/b2g
Homescreen       2 u0_a2728  2728  541   166444 38040 ffffffff b6f43d08 S /system/b2g/b2g

<<< Launch Phone app >>>

$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      260   1     244048 90124 ffffffff b6f43d08 S /system/b2g/b2g
(Nuwa)           0 root      541   260   67472  23772 ffffffff b6f43d08 S /system/b2g/b2g
Homescreen       2 u0_a2728  2728  541   98428  30132 ffffffff b6f43d08 S /system/b2g/b2g
Communications   2 u0_a3661  3661  260   91080  37004 ffffffff b6f3f070 R /system/b2g/plugin-container

$ adb logcat | grep Waiting
I/ServiceManager( 3661): Waiting for service media.audio_flinger...
I/ServiceManager( 3661): Waiting for service media.audio_flinger...
I/ServiceManager( 3661): Waiting for service media.audio_flinger...

<<<  The Phone app is completely wedged here >>

GOOD
----
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      260   1     257484 91604 ffffffff b6f43d08 S /system/b2g/b2g
(Nuwa)           0 root      541   260   67472  23592 ffffffff b6f43d08 S /system/b2g/b2g
Homescreen       2 u0_a2728  2728  541   131876 35916 ffffffff b6f43d08 S /system/b2g/b2g
(Preallocated a  2 u0_a3694  3694  541   70720  17224 ffffffff b6f43d08 S /system/b2g/b2g

<<< Launch Phone app >>>

$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      260   1     246800 91928 ffffffff b6f43d08 S /system/b2g/b2g
(Nuwa)           0 root      541   260   67472  23404 ffffffff b6f43d08 S /system/b2g/b2g
Homescreen       2 u0_a2728  2728  541   98428  29672 ffffffff b6f43d08 S /system/b2g/b2g
Communications   2 u0_a3694  3694  541   89372  28908 ffffffff b6f43d08 S /system/b2g/b2g

$ adb logcat | grep Waiting
null

<<<  The Phone app is working fine here >>
Summary: Assertion failure in OMXDecoder::AllocateMediaResources during monkey test → [l-gonk] plugin-container can't connect to media server
Whiteboard: [POVB][b2g-crash][caf-crash 433][caf priority: p1][CR 785241] → [b2g-crash][caf-crash 433][caf priority: p1][CR 785241]
Once bug 1053634 is fixed plugin-container should never be used and perhaps this bug just disappears.

It would be interesting if the STR from comment 108 reproduce the issue on N5 though, I don't have one to try here.
Depends on: 1053634

Updated

4 years ago

Updated

4 years ago
No longer blocks: CAF-v3.0-FL-metabug
Shawn,

Can someone on the Taipei team confirm whether this issue reproduces on the Nexus 5? CAF is trying to understand if this is only an issue with their device.

Thanks,
Mike
Flags: needinfo?(sku)

Comment 111

4 years ago
(In reply to Mike Lee [:mlee] from comment #110)
> Shawn,
> 
> Can someone on the Taipei team confirm whether this issue reproduces on the
> Nexus 5? CAF is trying to understand if this is only an issue with their
> device.
> 
> Thanks,
> Mike


Mike,
There is no such BAD symptom as comment 108 said. 

Blake,
 Once you got time, please can you try check this issue as well?



$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      190   1     314028 117936 ffffffff b6e5b490 S /system/b2g/b2g
(Nuwa)           0 root      385   190   66652  22216 ffffffff b6e5b490 S /system/b2g/b2g
Homescreen       2 u0_a822   822   385   105224 28620 ffffffff b6e5b490 S /system/b2g/b2g
Built-in Keyboa  2 u0_a969   969   190   89904  31744 ffffffff b6e5e490 S /system/b2g/plugin-container
Find My Device   2 u0_a1049  1049  385   74152  22112 ffffffff b6e5b490 S /system/b2g/b2g
(Preallocated a  2 u0_a1776  1776  385   70932  17800 ffffffff b6e5b490 S /system/b2g/b2g

<< Launch communication app >>
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      190   1     364464 119004 ffffffff b6e5b490 S /system/b2g/b2g
(Nuwa)           0 root      385   190   66652  22216 ffffffff b6e5b490 S /system/b2g/b2g
Homescreen       2 u0_a822   822   385   108304 27896 ffffffff b6e5b490 S /system/b2g/b2g
Built-in Keyboa  2 u0_a969   969   190   89904  31744 ffffffff b6e5e490 S /system/b2g/plugin-container
Find My Device   2 u0_a1049  1049  385   74152  22112 ffffffff b6e5b490 S /system/b2g/b2g
Communications   2 u0_a1776  1776  385   98516  28224 ffffffff b6e5b490 S /system/b2g/b2g

<< Kill communication app >>
$ adb shell kill 1776
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      190   1     357232 118708 ffffffff b6e5b490 S /system/b2g/b2g
(Nuwa)           0 root      385   190   66652  22216 ffffffff b6e5b490 S /system/b2g/b2g
Homescreen       2 u0_a822   822   385   130172 34664 ffffffff b6e5b490 S /system/b2g/b2g
Built-in Keyboa  2 u0_a969   969   190   89904  31744 ffffffff b6e5e490 S /system/b2g/plugin-container
Find My Device   2 u0_a1049  1049  385   74152  22112 ffffffff b6e5b490 S /system/b2g/b2g
(Preallocated a  2 u0_a2019  2019  385   70932  17996 ffffffff b6e5b490 S /system/b2g/b2g

<< Re-launch communication app >>
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      190   1     366676 120680 ffffffff b6e5b490 S /system/b2g/b2g
(Nuwa)           0 root      385   190   66652  22216 ffffffff b6e5b490 S /system/b2g/b2g
Homescreen       2 u0_a822   822   385   112492 39960 ffffffff b6e5b490 S /system/b2g/b2g
Built-in Keyboa  2 u0_a969   969   190   89904  31744 ffffffff b6e5e490 S /system/b2g/plugin-container
Find My Device   2 u0_a1049  1049  385   74152  22112 ffffffff b6e5b490 S /system/b2g/b2g
Communications   2 u0_a2019  2019  385   97492  28392 ffffffff b6e5b490 S /system/b2g/b2g

<< Launch video app >>
$ adb shell b2g-ps
APPLICATION    SEC USER     PID   PPID  VSIZE  RSS     WCHAN    PC        NAME
b2g              0 root      190   1     367636 126220 ffffffff b6e5b490 S /system/b2g/b2g
(Nuwa)           0 root      385   190   66652  22216 ffffffff b6e5b490 S /system/b2g/b2g
Homescreen       2 u0_a822   822   385   108304 27636 ffffffff b6e5b490 S /system/b2g/b2g
Built-in Keyboa  2 u0_a969   969   190   89904  31744 ffffffff b6e5e490 S /system/b2g/plugin-container
Find My Device   2 u0_a1049  1049  385   74152  22112 ffffffff b6e5b490 S /system/b2g/b2g
Communications   2 u0_a2019  2019  385   101576 28216 ffffffff b6e5b490 S /system/b2g/b2g
Video            2 u0_a2210  2210  385   92732  25072 ffffffff b6e5b490 S /system/b2g/b2g
Flags: needinfo?(sku) → needinfo?(bwu)
Currently I don't have nexus 5 at my hand. mchiang should be able to help check it. :)
Flags: needinfo?(bwu) → needinfo?(mchiang)
Hi,

I follow the steps in comment 34 and try to reproduce the bug with Nexus 5.
Cannot reproduce it for 10 times.
Since Greg also cannot reproduce the bug manually now, we need to come out a monkey & stress test to hit this problem.

Hi Greg,

According to the log, seems the problem is related to the binder/IPC.
Does your cafbot also record kernel log?
Could you help do the favor:
1) Set cafbot to record all logs from the boot up (We expect to see the log like "MediaPlayerService::instantiate()", which ensures mediaplayerservice indeed instantiate successfully)
2) Provide both logcat log and kernel log.
3) Ask your system team to assist analyzing the kernel log and see if there is any IPC/binder issue. Is that possible there is a binder leakage?

Regards,
Munro
Flags: needinfo?(mchiang)
Reporter

Updated

4 years ago
Flags: needinfo?(ggrisco)
Assignee

Comment 114

4 years ago
This log patch is a part of Attachment 8559997 [details] [diff]. Limit log to servicemanager.
Assignee

Comment 115

4 years ago
I do not think that "plugin-container can't connect to media server". Just before video decoder instantiation, content process access mediaserver by getMediaResourceManagerService().

getMediaResourceManagerService() access to the media server. IMediaResourceManagerService is used to reserve video decoder resource. The instance of IMediaResourceManagerService exits in b2g process and it is registered to mediaserver. The content process get the interface via mediaserver.

https://dxr.mozilla.org/mozilla-central/source/dom/media/omx/mediaresourcemanager/IMediaResourceManagerDeathNotifier.cpp#45
Assignee

Comment 116

4 years ago
Greg, can you add attachment 8576783 [details] [diff] [review]? It make sure if content process access mediasercver.
Flags: needinfo?(ggrisco)
Assignee

Comment 117

4 years ago
Another reason of comment 115 is in attachment 8554771 [details] in comment 15. Only this log has b2g process reboot log. During b2g process reboot. ServiceManager always emit some service died messages. But it does not have a died message of mediaserver's services.

Died logs of attachment 8554771 [details] was the following.
--------------------------------------------
01-01 00:03:37.239   217   217 I ServiceManager: service 'media.resource_manager' died
01-01 00:03:37.239   217   217 I ServiceManager: service 'SurfaceFlinger' died
01-01 00:03:37.239   217   217 I ServiceManager: service 'permission' died
01-01 00:03:37.239   217   217 I ServiceManager: service 'display.qservice' died

But we normally expect the followig died logs. b2g process reboot also trigger mediaserver prorocess reboot.
-------------------------------------------------------------------

03-12 13:06:41.790   175   175 I ServiceManager: service 'android.security.keystore' died
03-12 13:06:41.790   175   175 I ServiceManager: service 'media.resource_manager' died
03-12 13:06:41.790   175   175 I ServiceManager: service 'SurfaceFlinger' died
03-12 13:06:41.790   175   175 I ServiceManager: service 'permission' died
03-12 13:06:41.790   175   175 I ServiceManager: service 'display.qservice' died
03-12 13:06:41.791   193   193 D nfcd    :  4001536 of bytes to be sent... data=0x0 ret=0
03-12 13:06:41.893   175   175 I ServiceManager: service 'media.sound_trigger_hw' died
03-12 13:06:41.893   175   175 I ServiceManager: service 'media.audio_flinger' died
03-12 13:06:41.893   175   175 I ServiceManager: service 'media.player' died
03-12 13:06:41.893   175   175 I ServiceManager: service 'media.camera' died
03-12 13:06:41.893   175   175 I ServiceManager: service 'media.audio_policy' died
Flags: needinfo?(mvines)
Flags: needinfo?(ggrisco)
(In reply to mchiang from comment #113)
> I follow the steps in comment 34 and try to reproduce the bug with Nexus 5.
> Cannot reproduce it for 10 times.
 
Thanks for checking!

> Since Greg also cannot reproduce the bug manually now, we need to come out a
> monkey & stress test to hit this problem.

This can now be reproduced 100% over here, please see comment 108 for how.   It's something to do with fd management in the plugin-container process as the issue does not reproduce if plugin-container was not needed and the Video/Dialer/Keyboard app run in a b2g process and certainly may be a POVB.

However I think we should focus our attention on fixing bug 1053634 right now, as this will remove the plugin-container process entirely and as a fringe benefit should resolve this bug (fingers crossed).
Flags: needinfo?(mvines)
Bug 1053634 avoids this issue as hoped so we don't need a fix here.
Status: REOPENED → RESOLVED
Closed: 5 years ago4 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.