Closed Bug 853977 Opened 7 years ago Closed 7 years ago

3rd party app Poppit crashes or hangs after launching

Categories

(Core :: Audio/Video, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

VERIFIED FIXED
blocking-b2g leo+
Tracking Status
b2g18 --- verified
b2g18-v1.0.1 --- unaffected

People

(Reporter: adora, Assigned: sotaro)

References

Details

(Keywords: crash, regression, reproducible, Whiteboard: [b2g-crash][apps watch list1] )

Crash Data

Attachments

(2 files, 2 obsolete files)

Crash reports from initial test:
https://crash-stats.mozilla.com/report/index/bp-482a634b-e776-4fe9-86f2-6885a2130322

Initially unable to duplicate crash after rebooting, app just hangs indefinitely on the blue EA logo.  After hard exiting the app and launching again, then able to duplicate crash.

Link to v1.0.1 in review: https://marketplace.firefox.com/reviewers/apps/review/poppit

Build: 20130319070203
Whiteboard: [apps watch list]
Crash Signature: [@ __libc_android_abort | __android_log_assert | android::OMXClient::connect]
Keywords: crash
Looks like a crash in the OMX Decoder. Looks media related.

Chris - Any ideas? Is this a driver-level crash or gecko crash?
Flags: needinfo?(chris.double)
Sotaro, Crash report here indicates OMXClient::connect. Is this in the area of the OMXClient refactoring that you did? Any thoughts on what the crash/hang might be?
Flags: needinfo?(chris.double) → needinfo?(sotaro.ikeda.g)
I'm going to go ahead and guess this is likely a regression.
Keywords: regression
I would be interested in knowing if it works with a build before bug 803471 landed.
(In reply to Chris Double (:doublec) from comment #4)
> I would be interested in knowing if it works with a build before bug 803471
> landed.

Good point. I'm putting regressionwindow-wanted to check for this.

leo? nom defensively if this is a regression from bug 803471. Although it's probably worth noming anyways cause this is breaking basic usage of a partner app.
blocking-b2g: --- → leo?
Component: General → Video/Audio
Product: Boot2Gecko → Core
Version: unspecified → Trunk
Lisa - Is this crash consistently reproducible with Poppit?
(In reply to Jason Smith [:jsmith] from comment #6)
> Lisa - Is this crash consistently reproducible with Poppit?

IRC - Lisa mentions this is reproducible with this app.
Keywords: reproducible
I'm grabbing this for regression window analysis.
QA Contact: jsmith
(In reply to Lisa Brewster [:adora] from comment #0)
> Crash reports from initial test:
> https://crash-stats.mozilla.com/report/index/bp-482a634b-e776-4fe9-86f2-
> 6885a2130322
> 
The crash happens when it failed to receive IOMX from MediaPlayerService in mediaserver process. But from the thread info, there are already some OMXCodec. From the info, it seem that, mediaserver process chrash happened.
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Lisa Brewster [:adora] from comment #0)
> 
> Link to v1.0.1 in review:
> https://marketplace.firefox.com/reviewers/apps/review/poppit
> 
> Build: 20130319070203

I do not have a permission to access the site. How can I get the permission?
(In reply to Sotaro Ikeda [:sotaro] from comment #10)
> (In reply to Lisa Brewster [:adora] from comment #0)
> > 
> > Link to v1.0.1 in review:
> > https://marketplace.firefox.com/reviewers/apps/review/poppit
> > 
> > Build: 20130319070203
> 
> I do not have a permission to access the site. How can I get the permission?

Looks like it's a non-privileged app. I can host this on my site to allow you to install it.
Whiteboard: [apps watch list] → [b2g-crash][apps watch list]
Confirmed I can reproduce the crash on startup on a 3/26 build. Let me work on getting that regression range now.
FYI - this app is loaded with lots of mp3 sounds, so this is definitely starting to sound like it's a regression from bug 803471.
I added the app here - http://mozqa.com/webapi-permissions-tests/ under Packaged App Test Case 20. I confirmed I'm getting the crash still too from installing and launching that app.
Here's what I got so far:

3/26 - Busted
3/19 - Busted
3/11 - Working
Talking with Chris in IRC, comment 15 confirms his findings that this is indeed a regression from bug 803471.
blocking-b2g: leo? → leo+
Can we find out if v1.0.1 is affected as well?
Assignee: nobody → chris.double
Keywords: qawanted
(In reply to Alex Keybl [:akeybl] from comment #17)
> Can we find out if v1.0.1 is affected as well?

1.01 won't be affected here - the regressing bug (bug 803471) is a leo+ bug that has only landed on b2g18, not v1.01.
Assigning to Sotaro as it's deeply involved with the refactorings done in bug 803471.
Assignee: chris.double → sotaro.ikeda.g
I confirmed that media server crash happened during app start up. I am going investigate about it more.
I found that Binder ipc failed with following log. It is not clear why this happens.

-------------------------------------------------
03-27 09:32:28.989   116   894 E IMemory : binder=0x156c530 transaction failed fd=-2147483647, size=0, err=-2147483646 (Unknown error: 2147483646)
03-27 09:32:28.989   116   894 E IMemory : cannot dup fd=-2147483647, size=0, err=-2147483646 (Bad file number)
03-27 09:32:28.989   116   894 E IMemory : cannot map BpMemoryHeap (binder=0x156c530), size=0, fd=-1 (Bad file number)
03-27 09:32:28.989   584   969 V OMXCodec: [OMX.google.mp3.decoder] allocated buffer 0x1560a50 on output port
03-27 09:32:28.989   116   698 E IMemory : binder=0x156c530 transaction failed fd=-2147483647, size=0, err=-2147483646 (Unknown error: 2147483646)
03-27 09:32:28.989   116   698 E IMemory : cannot dup fd=-2147483647, size=0, err=-2147483646 (Bad file number)
03-27 09:32:28.989   116   698 E IMemory : cannot map BpMemoryHeap (binder=0x156c530), size=0, fd=-1 (Bad file number)
03-27 09:32:28.989   584   969 V OMXCodec: [OMX.google.mp3.decoder] allocated buffer 0x1574e20 on output port
03-27 09:32:28.989   116   717 E IMemory : binder=0x156c530 transaction failed fd=-2147483647, size=0, err=-2147483646 (Unknown error: 2147483646)
03-27 09:32:28.989   116   717 E IMemory : cannot dup fd=-2147483647, size=0, err=-2147483646 (Bad file number)
03-27 09:32:28.989   116   717 E IMemory : cannot map BpMemoryHeap (binder=0x156c530), size=0, fd=-1 (Bad file number)
03-27 09:32:28.989   584   969 V OMXCodec: [OMX.google.mp3.decoder] allocated buffer 0x1574f88 on output port
During the app load, 30 OMXCodec instances are created by the app.
Hi Sotaro,

The log seems to show that "share memory for audio buffer in audiotrack is not valid". So you might check whether the code for creating audiotrack returned any error or not.
It fails to map MemoryHeap at some point during initialization. 
http://androidxref.com/4.0.4/xref/frameworks/base/libs/binder/IMemory.cpp#318

The Binder driver failed to deliver file descriptor.
cess)(In reply to Sotaro Ikeda [:sotaro] from comment #24)
> It fails to map MemoryHeap at some point during initialization. 
> http://androidxref.com/4.0.4/xref/frameworks/base/libs/binder/IMemory.cpp#318
> 
> The Binder driver failed to deliver file descriptor.

It seems that file descriptor hit rlimit. The number of system's default file descriptor's rlimit is 1024. And a descroptor of last success BpMemoryHeap::assertReallyMapped() is always 1022 or 1023. like following 

>04-02 09:16:37.801   486   870 I GonkMp3 : BnMemoryHeap::onTransact() getHeapID() 243 getSize() 32768 getFlags() 0 getOffset() 0
>04-02 09:16:37.801   115   685 I GonkMp3 : BpMemoryHeap::assertReallyMapped()  binder=0x4411f8 fd=1023, size=32768, err=0 (Success)
When I manually increase rlimit of app's process and mediaserver process from 1024 to 2048, the app starts normally.
It is better to change rlimit in init.rc like following. 
http://androidxref.com/4.0.4/xref/system/core/rootdir/init.rc#199

But init.rc is not updated by "./flash.sh" It is included in boot.img.
:mwu, to fix the bug, it is necessary to increase rlimit of file descriptor. Where is a good place to do it? At "init.rc" ?
Flags: needinfo?(mwu)
If you https://github.com/mozilla-b2g/gonk-misc/blob/master/init.b2g.rc loads early enough for you, I recommend putting it there.
Flags: needinfo?(mwu)
(In reply to Michael Wu [:mwu] from comment #29)
> If you https://github.com/mozilla-b2g/gonk-misc/blob/master/init.b2g.rc
> loads early enough for you, I recommend putting it there.

:mwu, thanks for the quick response! I have 2 more questions.

- how can I update init.b2g.rc? Is there a document about how to?
- how cam I build boot.img? IIRC, init.b2g.rc is included in boot.img.
Flags: needinfo?(mwu)
(In reply to Sotaro Ikeda [:sotaro] from comment #30)
> (In reply to Michael Wu [:mwu] from comment #29)
> > If you https://github.com/mozilla-b2g/gonk-misc/blob/master/init.b2g.rc
> > loads early enough for you, I recommend putting it there.
> 
> :mwu, thanks for the quick response! I have 2 more questions.
> 
> - how can I update init.b2g.rc? Is there a document about how to?

The question is about how to update init.b2g.rc on a phone like unagi.
(In reply to Sotaro Ikeda [:sotaro] from comment #30)
> (In reply to Michael Wu [:mwu] from comment #29)
> > If you https://github.com/mozilla-b2g/gonk-misc/blob/master/init.b2g.rc
> > loads early enough for you, I recommend putting it there.
> 
> :mwu, thanks for the quick response! I have 2 more questions.
> 
> - how can I update init.b2g.rc? Is there a document about how to?

Send a pull request. Then we'll need to make sure that it gets uplifted to v1-train.

> - how cam I build boot.img? IIRC, init.b2g.rc is included in boot.img.

We usually use https://github.com/mozilla-b2g/unbootimg/ to repack the boot image. Since we only care about fixing this for v1-train, we only need repack the devices where that matters. I can update the boot images for you in this case.
Flags: needinfo?(mwu)
(In reply to Michael Wu [:mwu] from comment #32)
> (In reply to Sotaro Ikeda [:sotaro] from comment #30)
> > (In reply to Michael Wu [:mwu] from comment #29)
> > > If you https://github.com/mozilla-b2g/gonk-misc/blob/master/init.b2g.rc
> > > loads early enough for you, I recommend putting it there.
> > 
> > :mwu, thanks for the quick response! I have 2 more questions.
> > 
> > - how can I update init.b2g.rc? Is there a document about how to?
> 
> Send a pull request. Then we'll need to make sure that it gets uplifted to
> v1-train.
> 
> > - how cam I build boot.img? IIRC, init.b2g.rc is included in boot.img.
> 
> We usually use https://github.com/mozilla-b2g/unbootimg/ to repack the boot
> image. Since we only care about fixing this for v1-train, we only need
> repack the devices where that matters. I can update the boot images for you
> in this case.

Thanks. I could repack boot.img from your info. But setrlimit in "init.b2g.rc" do not work as expected. I am going to investigate about it more.
(In reply to Sotaro Ikeda [:sotaro] from comment #27)
> It is better to change rlimit in init.rc like following. 
> http://androidxref.com/4.0.4/xref/system/core/rootdir/init.rc#199
> 
> But init.rc is not updated by "./flash.sh" It is included in boot.img.

The above point in init.rc works for the bug.
I reconfirmed "init.rc" and "init.b2g.rc". "init.b2g.rc" is too late for the bug. Need to call setrlimit in "init.rc".
Comment on attachment 732534 [details] [diff] [review]
patch - increase RLIMIT_NOFILE's soft limit to 2048 in init.rc

:mwu, can you review the patch?

It changes rlimit in init.rc. I tried in init.b2g.rc, but it did not work. Is it correct to change "init.rc.patch" if I want to change "init.rc"? It seems that "init.rc.patch" is not used for actual "init.rc". "init.rc" that is built from "init.rc.patch" has "import /init.b2g.rc" in the first lime of the "init.rc". But in prebuilt boot.img, the line is under "on early-init" in 9th lime.
Attachment #732534 - Flags: review?(mwu)
By applying the change on unagi, I checked mediaserver process' file descriptor number.

After starting the Poppit, the number becomes 1112.
>adb shell ls  /proc/115/fd | wc -w
>1112

After kill the Poppit, the number decreases to 13.

>adb shell ls  /proc/115/fd | wc -w
>13
The app loads 30 mp3 file. It seems like each mp3 file requirs 36 file descriptor.
Attached file stack info of BpMemory constructor (obsolete) —
get stack information of BpMemory constructor in mediaserver process during the Poppit start up.
(In reply to Sotaro Ikeda [:sotaro] from comment #40)
> Created attachment 732942 [details]
> stack info of BpMemory constructor
> 
> get stack information of BpMemory constructor in mediaserver process during
> the Poppit start up.

From the stak info, file descriptors are created for each IMemory. Actual heap is created for each OMXCodec, but the file descriptor for the heap is created for each buffer in mediaserver process.
So you put setrlimit on "on boot" in init.b2g.rc? Have you tried other events like "on early-boot"?
(In reply to Michael Wu [:mwu] from comment #42)
> So you put setrlimit on "on boot" in init.b2g.rc? Have you tried other
> events like "on early-boot"?

Sorry, I misunderstood how init.b2g.rc works. I put setrlimit in init.rc.
Right now, I conformed that setrlimit on "on boot" in init.b2g.rc. I am going to create a new patch soon.
Confirmed the patch works on unagi.
Attachment #732534 - Attachment is obsolete: true
Attachment #732534 - Flags: review?(mwu)
Comment on attachment 732988 [details] [diff] [review]
patch - increase RLIMIT_NOFILE's soft limit to 2048 in init.b2g.rc

mwu, I created a new patch. Can you review the patch?
Attachment #732988 - Flags: review?(mwu)
Comment on attachment 732988 [details] [diff] [review]
patch - increase RLIMIT_NOFILE's soft limit to 2048 in init.b2g.rc

Review of attachment 732988 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #732988 - Flags: review?(mwu) → review+
Carry "mwu: review+"
Attachment #733048 - Flags: review+
Attachment #732942 - Attachment is obsolete: true
mwu, can you create a latest boot.img that apply the change? It is necessary to confirm the fix by other person. thanks.
Flags: needinfo?(mwu)
Whiteboard: [b2g-crash][apps watch list] → [b2g-crash][apps watch list1]
Uplifted to v1-train.

A new boot image will be available to verify this bug.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(mwu)
Resolution: --- → FIXED
Tony - Did someone end up verifying this?

This requires a kernel change per the email I got from Michael, so a verification would be helpful here.

To the person who does the verification, you just need to get a build that contains the kernel fix and bug fix here, and run the following STR:

1. Go to http://mozqa.com/webapi-permissions-tests/
2. Install "Packaged App Test Case 20"
3. Launch it

If you don't crash and the app loads correctly, then you can say this is marked verified.
Flags: needinfo?(tchung)
Aaron, can you take a stab at this?  see email for location of new kernel.
Flags: needinfo?(tchung) → needinfo?(aaron.train)
(In reply to Jason Smith [:jsmith] from comment #50)
> If you don't crash and the app loads correctly, then you can say this is
> marked verified.

Based on this, verified fixed (kernel update #6, latest-b2g18-04/12)
Status: RESOLVED → VERIFIED
Flags: needinfo?(aaron.train)
I've just flashed with the latest nightly build (2013-04-15) and re-installed this app, and the loading progress bar stalls indefinitely at about 95%.

Is there anything else I need to do to suck in this kernel fix so I can continue with app review?
(In reply to Lisa Brewster [:adora] from comment #53)
> I've just flashed with the latest nightly build (2013-04-15) and
> re-installed this app, and the loading progress bar stalls indefinitely at
> about 95%.
> 
> Is there anything else I need to do to suck in this kernel fix so I can
> continue with app review?

Just update kernel is enough to fix the bug.
So stupid question...plain old flashing updates the kernel, too?

Aaron, were you able to fully launch the app and actually play the game?
Flags: needinfo?(aaron.train)
(In reply to Lisa Brewster [:adora] from comment #55)
> So stupid question...plain old flashing updates the kernel, too?
> 
> Aaron, were you able to fully launch the app and actually play the game?

Updated kernel and latest build at the time of commenting, launched and was playable.
Flags: needinfo?(aaron.train)
This seems to hang at ~95% on mozilla-b2g18 (04/16) using the str in comment #50. This at one point (04/12) was fixed and has regressed again.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Did you install the kernel after installing the build? The build has its own kernel which you need to replace after flashing it.
(In reply to Michael Wu [:mwu] from comment #58)
> Did you install the kernel after installing the build?

Yes. Re-testing on mozilla-b2g18 (04/18), applied kernel upgrade after flash, loading Poppit™ and it still 'hangs' at roughly ~95% of the game load on the EA screen.
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Whoops didn't mean to revert flags back.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I confirmed that comment #57 happens in Firefox OS v1.1(04/21) on unagi. It happens by a different reason. There is no problem of file descriptors.
OMXCodec failed to load some mp3 fibes because of timeout happens with following log. 

> [OMX.google.mp3.decoder] Timed out waiting for output buffers: 0/0

I assume the timeout happens by slow data input to OMXCodec. Need to confirm it. If it becomes clear. It is better to be handled in a different bug.
The app's initialization was failed by timeout at OMXCodec. The timeout happened at OMXCodec::waitForBufferFilled_l(). By extending the OMXCodec's kBufferFilledEventTimeOutNs, the app can be loaded.
  http://androidxref.com/4.0.4/xref/frameworks/base/media/libstagefright/OMXCodec.cpp#3427

I am going to split the problems to two bugs.
- move mp3 software codec from mediasercver process to app's process
- extend OMXCodec's kBufferFilledEventTimeOutNs of OMXCodec
Depends on: 864180
Depends on: 864181
I've not had any problems launching and playing this game for the past several days.  Is it safe to publish now, or should I wait for the "depends on" bugs to get fixed?
Flags: needinfo?(jsmith)
First, I'm closing this bug because we've already landed the patches here. The two bugs that are dependencies here are followups based on the testing here.

Note that this bug only reproduces on 1.1 or later and it's associated . It does *not* reproduce on 1.01. So you should be good to approve with or without this bug right now anyways.
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Flags: needinfo?(jsmith)
Resolution: --- → FIXED
Verified since testing was done here with two followups.
Status: RESOLVED → VERIFIED
Flags: in-moztrap?
Flags: in-moztrap? → in-moztrap+
Added Apps Suite Test Case #8610 Verify that the user can launch a 3rd party app
You need to log in before you can comment on or make changes to this bug.