Closed Bug 965369 Opened 11 years ago Closed 6 years ago

[B2G][Poppit] App freezes after changing difficulty and starting new game

Categories

(Tech Evangelism Graveyard :: Preinstalled B2G Apps, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: bzumwalt, Assigned: jmunck)

References

Details

(Keywords: regression, Whiteboard: [c=memory p= s= u=])

Attachments

(2 files)

Description:
While playing the game Poppit freezing occurs on through a combination of new games and difficulty changes after playing for a few minutes. When game is frozen, animations may still play, but user input is not accepted. Repros 100% of time, may need to repeat STR multiple times.

Repro Steps:
1) Open Marketplace app and install Poppit
2) Launch Poppit app
3) Play game normally for ~2 minutes
4) Tap menu
5) Select options and change difficulty from current setting
6) Return to menu and start new game

Actual:
Poppit freezes after a few minutes of play when changing difficulty or starting new game.

Expected:
User can play Poppit and change settings of game without incident.

Environmental Variables:
Device: Buri v1.2 COM RIL
BuildID: 20140128004004
Gaia: 539a25e1887b902b8b25038c547048e691bd97f6
Gecko: d10e1f965d0c
Ril: 01.02.00.019.102
Version: 26.0
Firmware Version: V1.2-device.cfg

Notes:
Repro frequency: 3/3, 100%
Repro steps 3 - 6 may need to be repeated in order for bug to manifest
We need a logcat here with console enabled.
blocking-b2g: --- → 1.3?
Keywords: qawanted, regression
I saw this a few times while working on another poppit bug.  It seemed to happen the most for me if I started poppit then went to homescreen and back to poppit multiple times.  Locking and unlocking the phone was another way to trigger it.  Anything to try to force poppit, system, and homescreen to run at once.

I did not fully investigate but it seemed to be related to homescreen getting killed while trying to transition between apps.  Thats just a guess at this point, though.
Attached file Bug_965369_Logcat
Please see log cat attached, also I was able to get the app to OOM 

<4>[ 1279.076656] select 2784 (Poppit!"), adj 10, size 19868, to kill
<4>[ 1279.076679] send sigkill to 2784 (Poppit!"), adj 10, size 19868
Keywords: qawanted
Andrew - Can you find an assignee for this?
blocking-b2g: 1.3? → 1.3+
Flags: needinfo?(overholt)
Can we get an about:memory dump?
Assignee: nobody → overholt
Flags: needinfo?(overholt)
Keywords: qawanted
(In reply to Andrew Overholt [:overholt] from comment #5)
> Can we get an about:memory dump?

Note for QA Wanted request - https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#Getting_About_Memory_Logs_for_a_Bug for more info how to do this btw.
QA Contact: pbylenga
Attached file about-memory-0.zip
Please see about memory report attached.
Keywords: qawanted
Ugh, in the poppit process:

62.68 MB (100.0%) -- explicit
├──29.07 MB (46.37%) ── heap-unclassified

Kyle, is that expected? Do we have any known large missing memory reporters on b2g, or do we need to dig into why that's so high here?
Also:

├──12.75 MB (20.34%) -- media
│  ├──12.75 MB (20.34%) ── decoded-audio
│  └───0.00 MB (00.00%) ── decoded-video
├───5.69 MB (09.08%) -- images
│   ├──5.69 MB (09.08%) -- content
│   │  ├──5.69 MB (09.08%) -- used
│   │  │  ├──4.21 MB (06.72%) ── uncompressed-heap
│   │  │  ├──1.48 MB (02.36%) ── raw
│   │  │  └──0.00 MB (00.00%) ── uncompressed-nonheap
│   │  └──0.00 MB (00.00%) ++ unused
│   └──0.00 MB (00.00%) ++ chrome
├───5.59 MB (08.92%) -- js-non-window
│   ├──3.47 MB (05.53%) -- runtime
│   │  ├──1.63 MB (02.59%) -- code
│   │  │  ├──0.89 MB (01.43%) ── ion
│   │  │  └──0.73 MB (01.17%) ++ (4 tiny)
│   │  ├──1.10 MB (01.76%) ++ (10 tiny)
│   │  └──0.74 MB (01.18%) ── script-sources
│   ├──1.56 MB (02.49%) -- zones
│   │  ├──0.79 MB (01.26%) ++ (2 tiny)
│   │  └──0.77 MB (01.23%) ++ zone(0x403b3800)
│   └──0.57 MB (00.90%) ++ gc-heap
├───5.47 MB (08.73%) -- window-objects/top(app://{82d88f9d-e744-4274-a570-3725df17d6f1}/index.html, id=1)
(In reply to Johnny Stenback (:jst, jst@mozilla.com) from comment #8)
> Ugh, in the poppit process:
> 
> 62.68 MB (100.0%) -- explicit
> ├──29.07 MB (46.37%) ── heap-unclassified
> 
> Kyle, is that expected? Do we have any known large missing memory reporters
> on b2g, or do we need to dig into why that's so high here?

https://bugzilla.mozilla.org/show_bug.cgi?id=957346#c8 and bug 960873.
Looks like maybe you need bug 961441 to accurately report decoded-audio memory.  That's only in m-c at the moment.

Also, I'm working on bug 960873 to help reduce the decoded-audio impact.  We need the app to tune some settings or make code changes to have the biggest impact here.  See bug 959603.

There is still about 18MB of heap-unclassified after those changes, though.  Probably worth further investigation.
Looking at a DMD dump on b2g desktop running poppit it seems like the biggest chunk of heap-unclassified comes from media kinda stuff (gstreamer mostly).
Probably similar, but on a real device we use OMX instead of gstreamer.
I'm taking this since it will likely be greatly improved, if not fixed, when I land bug 960873.

That being said, beyond improving memory usage it would be good to know why the app freezes under memory pressure.  That seems like a more subtle bug that we should not ignore.

Due to other blockers I'm working, I probably won't get to this until later in the week.
Assignee: overholt → bkelly
Status: NEW → ASSIGNED
Priority: -- → P1
Whiteboard: [c=memory p= s= u=1.3]
Can we get ETA to fix this bug? Thank you.
Has anyone been able to reproduce the freeze on v1.3?  I've been playing poppit for a while trying to follow the steps in comment 0.  Was the logcat in comment 3 for an actual freeze and was it for 1.3?

I can't get the freeze to occur.  (I'm getting decent at poppit, though.)

That being said I can easily get it to OOM by hitting the home button or lock button.

I also triggered an annoying e.me bug by accident, but I wrote up bug 968628 to track that.
Keywords: qawanted
I reproduced the freeze on today's Buri v1.3.  The logcat provided in comment 3 was for the freeze and was on v1.3 as well.

I've gotten the freeze to occur twice now, once following the STRs in Comment 0 and another time the freeze occurred on the menu screen with these STRs:

1. Play a game of Poppit till no more moves after using special items
2. Tap menu
3. Start a new game and again play for approx. 2-3 minutes
4. Tap menu and attempt to interact with Menu buttons

Actual results: Freeze occurs
Expected result: App does not freeze and reacts to user input

v1.3 Environmental Variables:
Device: Buri v1.3 MOZ
BuildID: 20140206004002
Gaia: 467ef8c9145d9a57d35b0619db541d23b522b958
Gecko: a1fa925c40c2
Version: 28.0
Firmware Version: v1.2-device.cfg
Keywords: qawanted
Hmm, I'm using an older version of OEM firmware.  Let me try upgrading to v1.2-device.cfg.
I was finally able to reproduce the freeze with v1.2-device.cfg and MOZ RIL.

It looks like the Poppit app was spinning or something:

User 65%, System 20%, IOW 13%, IRQ 0%
User 10 + Nice 194 + Sys 65 + Idle 0 + IOW 41 + IRQ 0 + SIRQ 0 = 310

  PID PR CPU% S  #THR     VSS     RSS PCY UID      Name
  534  0  78% R   339 522228K  96928K  fg app_534  /system/b2g/plugin-container
  136  0   6% S    42 172332K  53840K  fg root     /system/b2g/b2g

I'm not sure if this is memory related.  If it is, its not a normal OOM.
I tried to take a profile while it was spinning, but it OOM'd the poppit app.
I was able to reproduce again after bug 960873 landed.  Now that we are using less memory I was able to get a profile:

http://people.mozilla.org/~bgirard/cleopatra/#report=c9bd050638195943d6724595d0077462fd80bdec

This doesn't have the finest resolution or full stacks, but it suggests that we might be getting:

  IPDL::PHal::RecvNotifySensorChange

Every 50ms which is causing the game engine to recompute as if the user is rapidly touching the screen.

Considering I could never reproduce with the older OEM firmware, I think there might be an issue with the V1.2-device.cfg image.

Jason, I'm going to be traveling shortly, so won't be able to look at this again really until Monday.  Do you have any devices with the old OEM firmware you could test with v1.3 gaia/gecko?  Who should we ask about this at the OEM?
Flags: needinfo?(jsmith)
Which firmware in particular are you looking for testing on? We've got access to some old base images for Buri.
Flags: needinfo?(jsmith)
The image dated 11/14 or 11/15.  Thanks!
Group: kddi-confidential
Okay - marking qawanted to test this with the 11/15 buri base image with the latest 1.3 gaia/gecko on top.
Group: kddi-confidential → mozilla-corporation-confidential
Keywords: qawanted
I did not mean to mark the bug confidential.  That was a mis-click while editing on my phone.
Group: mozilla-corporation-confidential
Tested this with the 11-15-2013 base and latest v1.3 buri gaia/gecko, eventually the bug does reproduce.  Took about 4x as long.  On the v1.2-device.cfg I can get it to freeze within 1-3 games doing the repro steps.  On the 11-15-2013 base it was around game 7 or 8.

v1.3 Environmental Variables:
Device: Buri v1.3 MOZ
BuildID: 20140207004002
Gaia: 1527d1e450364e383eeb95ff898dca2042e2b4b5
Gecko: 0a6d83aabb02
Version: 28.0
Firmware Version: 20131115
Keywords: qawanted
Thank you Peter.  What gecko revision did you have for v1.2-device.cfg testing in comment 26?  Was it also gecko rev 0a6d83aabb02?

The landing of bug 960873 might also have made it harder to trigger, so wondering if the v1.2-device.cfg has older gecko.
Flags: needinfo?(pbylenga)
My gecko rev was Gecko: a1fa925c40c2 from Comment 17 which is where I tested on the v1.2-device.cfg base.
Flags: needinfo?(pbylenga)
Thanks Peter.  If its easy to test v1.2-device.cfg on gecko 0a6d83aabb02 it would be interesting to know if there is a difference.  I think I have enough to continue investigating without that, though, if its painful.  Thanks again.
I hit the freeze similar to my repro rate in Comment 17.  It seemed to last a bit longer but definitely not as long as it did in Comment 26.  Hope that helps!

v1.3 Environmental Variables:
Device: Buri v1.3 MOZ
BuildID: 20140207004002
Gaia: 1527d1e450364e383eeb95ff898dca2042e2b4b5
Gecko: 0a6d83aabb02
Version: 28.0
Firmware Version: v1.2-device.cfg
Ok, I think I understand what is happening here now.

1) impact.js has an ig.input.initAccelerometer() function which does addEventListener('devicemotion')
2) poppit calls initAccelerometer() every time you win a "prize" is unlocked. (Doll floating down from a pop, etc.)
3) This results in one 'devicemotion' event listener for every prize you have unlocked in this instance of the game.
4) The 'devicemotion' event is fired from gonk every 50ms which triggers all these event listeners.
5) When the event listener processing takes up more than 50ms of time, you essentially enter a live-lock state where other input cannot be processed.

This is only a memory related issue in that memory pressure will cause garbage collection to fire more often, thus using up more of the 50ms time window.

I spoke to Domenic and he has put a fix into the impact.js repo so it will only register its 'devicemotion' event listener once.  This is currently unreleased, though.

In any case we need the poppit app developer to upgrade to the git version of impact.js or modify their code to workaround this issue by only calling initAccelerometer() once.  Without doing this poppit will gradually slow down every time the user wins a prize in the game.

Harald, can you reach out to the poppit developer about this?  Note, we also have a request out to them to upgrade to impact.js git version in bug 959603.
Flags: needinfo?(hkirschner)
Should this still be blocking since we need the app to be modified?
blocking-b2g: 1.3+ → 1.3?
No longer depends on: 969114
I'm leaving bug 960873 as a dependency here since Peter's testing demonstrated that it helped mitigate the issue.

Note, I also think Peter is just a better poppit player than me as he seems to get more prizes in the same amount of time.  :-)
Blocks: b2g-poppit
blocking-b2g: 1.3? → ---
Component: General → Preinstalled B2G Apps
Product: Firefox OS → Tech Evangelism
Harald, since you have bug 959603, I'm going to take the liberty of setting you as the assignee for this one as well.  Let me know if you have any issues with that.  Thanks!
Assignee: bkelly → hkirschner
Priority: P1 → P2
Whiteboard: [c=memory p= s= u=1.3] → [c=memory p= s= u=]
@Harald - Are there any updates regarding this bug?  Do we have a date this may be fixed by?
Julia, please review and provide fix date.  Thank you
Assignee: hkirschner → 2008
Flags: needinfo?(jmunck)
Julia - are there any updates regarding this bug?
Hi Nicole,
Who can I speak to regarding Poppit on FireFox OS? Need to understand what the roadmap and roll out plan looks like for Poppit so that we can determine priority of getting these bugs fixed. 
Please let me know.
Thanks,
Julia
Flags: needinfo?(jmunck)
BD will jump in here offline to answer all of Julian's questions.
Flags: needinfo?(hkirschner)
(In reply to Nicole Fong from comment #36)
> Assignee: hkirschner@mozilla.com2008@jmunch.dk

You must think I'm someone else.
Assignee: 2014 → nobody
Status: ASSIGNED → NEW
Harald - has BD gotten in contact with Julia?

Julia - do you have all the information you need in order to fix this bug?  If so, when is the potential fix date?
Assignee: nobody → jmunck
Status: NEW → ASSIGNED
Mass closing on Tech Evangelism::Preinstalled B2G App as Firefox OS is no longer a thing.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Mass closing on Tech Evangelism::Preinstalled B2G App as Firefox OS is no longer a thing.
Mass closing on Tech Evangelism::Preinstalled B2G App as Firefox OS is no longer a thing.
Product: Tech Evangelism → Tech Evangelism Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: