Poppit abnormally terminates quickly after trying to play the game for a few seconds

RESOLVED FIXED

Status

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: jsmith, Assigned: snorp)

Tracking

({regression})

unspecified
ARM
Gonk (Firefox OS)
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:koi+, firefox27 verified, b2g-v1.2 fixed)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(4 attachments)

Posted file Logcat
Build: 1.2 9/20/2013
Device: Buri

STR

1. Install Poppit from marketplace
2. Launch it
3. Try playing the game for a few seconds

Expected

The game stays up and running.

Actual

After playing the game for a few seconds, the app abnormally gets killed.
blocking-b2g: --- → koi?
Posted file dmesg log
I can also re-generate the crash on master hamachi. From dmesg log, poppit process is killed by low memory killer.
This bug seems very similar to Bug 899907.
Whiteboard: [MemShrink]
moving to koi+ as it is a regression issue.
blocking-b2g: koi? → koi+
QA Contact: dwatson
The window for the bug appears to be between 09/06 and 09/09:

Occurs on -
Buri Build ID: 20130909114657
Gecko: http://hg.mozilla.org/mozilla-central/rev/218d4334d29e
Gaia: aa4180e9286d385fa6b62d236f30fb24cd8b93e9
Platform Version: 26.0a1

Does not occur -
Build ID: 20130906040204
Gecko: http://hg.mozilla.org/mozilla-central/rev/ab5f29823236
Gaia: 94e5f269874b02ac0ea796b64ab995fce9efa4b3
Platform Version: 26.0a1

In build 09/06 the Poppit application does not experience the crash while playing, but in 09/09 the application crashes while playing.
I revert the following, by this change I can play poppit by using ROM built from same source in Comment 1.

- Bug 907745 - Enable the new textures on B2G 
   https://hg.mozilla.org/integration/b2g-inbound/rev/5174b38b4153
(In reply to Sotaro Ikeda [:sotaro] from comment #5)
> I revert the following, by this change I can play poppit by using ROM built
> from same source in Comment 1.
> 
> - Bug 907745 - Enable the new textures on B2G 
>    https://hg.mozilla.org/integration/b2g-inbound/rev/5174b38b4153

This revert seems to take the lifetime of poppit longer. But the app is killed during play at last.
(In reply to Sotaro Ikeda [:sotaro] from comment #6)
> (In reply to Sotaro Ikeda [:sotaro] from comment #5)
 
> This revert seems to take the lifetime of poppit longer. But the app is
> killed during play at last.

When reverted, I can play poppit. But ui touch and home button often became not work at all.
Blocks: GFXB2G1.2
Make sure we test on 1.2 going forward, not on master which has different graphics API in use (should not be related here though, if the problem was introduced on 9/6.)
Assignee: nobody → nical.bugzilla
Nothing obvious while running this on B2G emulator:

 - about:memory says for the Poppit process 140 M resident, 121 M resident-unique, which is enough to explain OOM on device, but offers very little clue into how that memory is used. 101 M explicit, of which 70M unclassified. Nothing special gfx-wise, with 4 M gralloc, 0.6 M canvas-2d-pixels, 7.57 images...

 - logcat keep spewing errors like this:

E/OMXCodec(  192): [OMX.google.mp3.decoder] Timed out waiting for output buffers: 4/0

 - in refgraph, nothing special graphics-wise either. Few layers, few Gralloc buffers... The only thing that seemed to be growing was the number of MediaDecoderStateMachine objects on the heap. It was 2 at the beginning of the game, and continuously grew to 6. These objects are part of a cycle of strong references that is not declared to the CC, reported as follows ([1]) by refgraph. However, these cycles remained referenced by outside, they weren't leaked.

 - the next step should be to run this in DMD to figure what the heap usage is.



[1] :

Cycle 45 has 2 vertices.

Is NOT traversed by the CC!

Types involved: mozilla::MediaDecoder / mozilla::MediaDecoderStateMachine

Vertex 0:

type name: mozilla::MediaDecoder
address: 0x4435b160
size: 296 bytes
belongs to Cycle 45
Strong references:
  strong reference 0
    address: 0x402695e0
    reference type name: nsCOMPtr<mozilla::MediaDecoderStateMachine>
    NON-traversed by the CC
  strong reference 1
    address: 0x435db020
    reference type name: nsRefPtr<mozilla::MediaResource>
    NON-traversed by the CC
Weak references:
  weak reference 0
    <unknown> at 0x4022f040
  weak reference 1
    <unknown> at 0x445c7ca0
  weak reference 2
    <unknown> at 0x445c8520
  weak reference 3
    FragmentOrElement (xhtml) audio app://{588c9b49-9c87-4985-b938-a5ff51cd78a9}/index.html at 0x44cbac20
Back strong references:
  back strong reference 0
    mozilla::MediaDecoderStateMachine at 0x402695e0
  back strong reference 1
    FragmentOrElement (xhtml) audio app://{588c9b49-9c87-4985-b938-a5ff51cd78a9}/index.html at 0x44cbac20
  back strong reference 2
    nsTArrayHeader at 0x4592c020
Back weak references:
  back weak reference 0
    mozilla::MediaDecoderStateMachine at 0x402695e0
  back weak reference 1
    <unknown> at 0x43392e20
  back weak reference 2
    <unknown> at 0x433f8a20
  back weak reference 3
    mozilla::MediaResource at 0x435db020
  back weak reference 4
    nsTArrayHeader at 0x44b99020
  back weak reference 5
    <unknown> at 0x4546f8e0


Vertex 1:

type name: mozilla::MediaDecoderStateMachine
address: 0x402695e0
size: 376 bytes
belongs to Cycle 45
Strong references:
  strong reference 0
    address: 0x4435b160
    reference type name: nsRefPtr<mozilla::MediaDecoder>
    NON-traversed by the CC
  strong reference 1
    address: 0x459a7810
    reference type name: nsCOMPtr<nsITimer>
    NON-traversed by the CC
Weak references:
  weak reference 0
    <unknown> at 0x4022f040
  weak reference 1
    mozilla::MediaDecoderStateMachine at 0x402695e0
  weak reference 2
    <unknown> at 0x43392e20
  weak reference 3
    mozilla::MediaDecoder at 0x4435b160
  weak reference 4
    <unknown> at 0x44391cf0
  weak reference 5
    <unknown> at 0x44bddf20
Back strong references:
  back strong reference 0
    mozilla::MediaDecoder at 0x4435b160
Back weak references:
  back weak reference 0
    mozilla::MediaDecoderStateMachine at 0x402695e0
  back weak reference 1
    nsTimerImpl at 0x459a7810
  back weak reference 2
    <unknown> at 0x55466020
  back weak reference 3
    <unknown> at 0x554d3020
  back weak reference 4
    <unknown> at 0x5a007020
  back weak reference 5
    <unknown> at 0x5a070020
Sotaro / Chris D / Paul might know about whether the MediaDecoder situation here is normal.
(In reply to Benoit Jacob [:bjacob] from comment #10)
> Sotaro / Chris D / Paul might know about whether the MediaDecoder situation
> here is normal.

Sounds like they use a lot of audio objects, each of which would result in a number of threads being created. Do we have the source to Poppit?
Posted file poppit-1.0.5.zip
(In reply to Chris Double (:doublec) from comment #11)
> (In reply to Benoit Jacob [:bjacob] from comment #10)
> > Sotaro / Chris D / Paul might know about whether the MediaDecoder situation
> > here is normal.
> 
> Sounds like they use a lot of audio objects, each of which would result in a
> number of threads being created. Do we have the source to Poppit?

I just attached the zip file from marketplace with the source.
(In reply to Sotaro Ikeda [:sotaro] from comment #5)
> I revert the following, by this change I can play poppit by using ROM built
> from same source in Comment 1.
> 
> - Bug 907745 - Enable the new textures on B2G 
>    https://hg.mozilla.org/integration/b2g-inbound/rev/5174b38b4153

I am going to re-check if the above is correct.
Posted file hg change log
"hg log" got from the following command. Tags are based on comment 4.
  - hg log  -r ab5f29823236:218d4334d29e
From attachment 810747 [details], I picked up the suspicious changes.

----------------------------------------------------------
changeset:   145870:949ea52cb8ed
user:        Nicolas Silva <nical.bugzilla@gmail.com>
date:        Fri Sep 06 11:03:12 2013 +0200
summary:     Bug 911941 - Prevent TextureClients to be shared by several ImageLayers. r=Bas

changeset:   145871:af7a69606aa8
user:        Nicolas Silva <nical.bugzilla@gmail.com>
date:        Fri Sep 06 11:04:50 2013 +0200
summary:     Bug 912040 - Make sure deallocating shared texture data doesn't race with TextureClient access. r=sotaro

changeset:   145872:5174b38b4153
user:        Nicolas Silva <nical.bugzilla@gmail.com>
date:        Fri Sep 06 11:08:55 2013 +0200
summary:     Bug 907745 - Enable the new textures on B2G

-------------------------------------------------------

changeset:   145961:194d051724a8
user:        Stephen Pohl <spohl.mozilla.bugs@gmail.com>
date:        Fri Sep 06 12:30:17 2013 -0400
summary:     Bug 817700 - Make <canvas>.toBlob run asynchronously. r=seth,roc

changeset:   145962:db43e949245c
user:        Stephen Pohl <spohl.mozilla.bugs@gmail.com>
date:        Fri Sep 06 12:30:21 2013 -0400
summary:     Bug 817700 - Make <canvas>.toBlob run asynchronously - canvas changes. r=roc

changeset:   145961:194d051724a8
user:        Stephen Pohl <spohl.mozilla.bugs@gmail.com>
date:        Fri Sep 06 12:30:17 2013 -0400
summary:     Bug 817700 - Make <canvas>.toBlob run asynchronously. r=seth,roc

-------------------------------------------------------

changeset:   145971:2f0686edebb5
user:        Benoit Jacob <bjacob@mozilla.com>
date:        Fri Sep 06 08:02:04 2013 -0400
summary:     Bug 905227 - Introduce 128 px threshold for Skia/GL, overridable by gfx.canvas.min-size-for-skia-gl preference - r=nical

changeset:   145972:d9e41df7445c
user:        Benoit Jacob <bjacob@mozilla.com>
date:        Fri Sep 06 08:02:12 2013 -0400
summary:     Bug 905227 - Turn on Skia/GL on B2G - r=nical

changeset:   145973:8e91e51a5c19
user:        Benoit Jacob <bjacob@mozilla.com>
date:        Fri Sep 06 08:03:08 2013 -0400
summary:     Bug 905227 - On Android Emulator, assume that OES_rgb8_rgba8 is available even though it's not in the GL_EXTENSIONS string - r=nical

changeset:   145974:79c8b5e214a9
user:        Benoit Jacob <bjacob@mozilla.com>
date:        Fri Sep 06 08:03:53 2013 -0400
summary:     Bug 905227 - Adjust WebGL reftest.list for UNEXPECTED-PASS - r=nical

--------------------------------------------------------------

changeset:   146103:a20a9a879347
user:        Chris Pearce <cpearce@mozilla.com>
date:        Mon Sep 09 09:47:29 2013 +1200
summary:     Bug 910996 - Move MP3FrameParser out of B2G specific code, so it can be used on other platforms. r=padenot

------------------------------------------------------------

changeset:   146106:353aa004fed8
user:        Matt Woodrow <mwoodrow@mozilla.com>
date:        Mon Sep 09 14:35:20 2013 +1200
summary:     Bug 912173 - Always AddRef and Release the GLContext when sharing a SurfaceStream to avoid mismatches. r=nrc
I feel the enabling the SkiaGL change seems most suspicious. Disabled it by commenting out the following. Since then the poppit did not get killed by low memory killer. 

// pref("gfx.canvas.azure.backends", "skia");
// pref("gfx.canvas.azure.accelerated", true);

------------------------------------------
changeset:   145972:d9e41df7445c
user:        Benoit Jacob <bjacob@mozilla.com>
date:        Fri Sep 06 08:02:12 2013 -0400
summary:     Bug 905227 - Turn on Skia/GL on B2G - r=nical
That makes sense - when we use SkiaGL, we use their cache, and the memory usage would go up.  I don't think the fix is to disable SkiaGL, and there are some thoughts for getting the cache picture more optimal, though I doubt that is a 1.2 timeframe.
Flags: needinfo?(bjacob)
During debugging, I saw another problem. When SkiaGL was enabled, poppit UI rendering sometimes drawn upside down. When I disabled the SkiaGL, I did not see this.
(In reply to Milan Sreckovic [:milan] from comment #18)
> That makes sense - when we use SkiaGL, we use their cache, and the memory
> usage would go up.  I don't think the fix is to disable SkiaGL, and there
> are some thoughts for getting the cache picture more optimal, though I doubt
> that is a 1.2 timeframe.

The Skia/GL cache is really an aspect of Skia/GL that I have never worked on, so you'll get a better answer from :snorp or :gw280. Forwarding the needinfo.
Flags: needinfo?(snorp)
Flags: needinfo?(gwright)
Flags: needinfo?(bjacob)
Note: we really need to look into this carefully with DMD to make sure that we understand exactly what, around Skia, is using much memory. Or are we already certain that it is the cache?
(In reply to Benoit Jacob [:bjacob] from comment #21)
> Note: we really need to look into this carefully with DMD to make sure that
> we understand exactly what, around Skia, is using much memory. Or are we
> already certain that it is the cache?

Given that things don't go bad when we use software Skia, I'd say the cache seems like the culprit.

The current Skia texture cache behavior is not great for mobile. Bug 920239 will help.

In the mean time we could try reducing the Skia max cache size for B2G. This value can be changed here: http://mxr.mozilla.org/mozilla-central/source/gfx/2d/DrawTargetSkia.cpp#91

I guess I would suggest 32 or even 16 here, given how resource-constrained some B2G devices can be. Obviously we need to get to a better solution, though.
Flags: needinfo?(snorp)
Another thing we need to do is purge the Skia texture cache entirely when there is a memory-pressure event. I had a patch for this a while back but I have no idea what I did with it.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #22)
> I guess I would suggest 32 or even 16 here, given how resource-constrained
> some B2G devices can be. Obviously we need to get to a better solution,
> though.

I locally changed to 16 on v1.2 hamachi. By this change, the poppit did not killed by low memory killer until now, when it is shown in forground. When the app is set to background, the app is kill by low memory killer. 16MiB is still huge memory for hamachi device.

When SkiaGL is disabled, the app was not get killed by low memory killer. But the change to 16 made big improvement.
(In reply to Sotaro Ikeda [:sotaro] from comment #24)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #22)
> > I guess I would suggest 32 or even 16 here, given how resource-constrained
> > some B2G devices can be. Obviously we need to get to a better solution,
> > though.
> 
> I locally changed to 16 on v1.2 hamachi. By this change, the poppit did not
> killed by low memory killer until now, when it is shown in forground. When
> the app is set to background, the app is kill by low memory killer. 16MiB is
> still huge memory for hamachi device.
> 
> When SkiaGL is disabled, the app was not get killed by low memory killer.
> But the change to 16 made big improvement.

Alright, thanks for giving that a try. How much memory does the Hamachi have total? 256? We should probably just disable SkiaGL there.
Hamachi have 256 in total. But actual system memory is 180MB in total.
The following is memory info after stating the poppit with SkiaGL enabled.
----------------------------------------------
    NAME PID NICE  USS  PSS  RSS VSIZE OOM_ADJ USER   
     b2g 144    0 48.7 51.8 55.3 152.2       0 root   
Poppit!" 521    1 87.2 90.5 94.0 494.4       2 app_521

System memory info:

            Total 180.4 MB
     Used - cache 156.9 MB
  B2G procs (PSS) 142.3 MB
    Non-B2G procs  14.6 MB
     Free + cache  23.5 MB
             Free   7.6 MB
            Cache  16.0 MB
It is memory info after starting the poppit with SkiaGL disabled.

----------------------------------------------------------
                  |     megabytes    |
    NAME PID NICE  USS  PSS  RSS VSIZE OOM_ADJ USER   
     b2g 146    0 52.2 56.1 60.3 153.8       0 root   
Poppit!" 554    1 70.6 74.5 78.7 460.8       2 app_554

System memory info:

            Total 180.4 MB
     Used - cache 138.4 MB
  B2G procs (PSS) 130.6 MB
    Non-B2G procs   7.8 MB
     Free + cache  42.0 MB
             Free  21.5 MB
            Cache  20.5 MB
Looks like the 16MB texture cache is the exact difference. The cache is what gives SkiaGL most of its speed (with drawImage at least), so if we can't have a decently sized one we should just fallback to software.
Assignee: nical.bugzilla → snorp
We can just build the check for memory and stuff into the patch for bug 920160, I think.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 920160
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #30)
> We can just build the check for memory and stuff into the patch for bug
> 920160, I think.
> 
> *** This bug has been marked as a duplicate of bug 920160 ***

This bug has already been triaged, so the dupe needs to go the other direction.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Depends on: 920160
Or make a dependency and nominate the dependency.
Let me clarify my wording in the above comments:

This bug is deemed as a blocker, but I don't know about the other. Until the other bug becomes a blocker itself, this bug needs to be fixed independently of bug 920160.
I can run Poppit with a 4MB cache on hamachi, but it's still super slow. Profiling indicates that it's doing software blits for some reason. The same setup (4MB cache) works fine on Android, so I'm looking into that now.
Whiteboard: [MemShrink] → [MemShrink:P2]
Have seen this issue today on the latest Buri aurora build.
Repros exactly as in comment 0, Poppit works fine after rebooting.

BuildID: 20131016004005
Gaia: 5ef3535021286ccab7af639897feaaf5955720a0
Gecko: 75b0b968f3ed
Version: 26.0a2

*have the logcat saved if needed
snorp, this is a fairly hard blocker for release. What's the latest? What do you need to expedite your investigation and fix?
Flags: needinfo?(snorp)
(In reply to Alex Keybl [:akeybl] from comment #36)
> snorp, this is a fairly hard blocker for release. What's the latest? What do
> you need to expedite your investigation and fix?

The fix for bug 920160 fixes this, just waiting on an additional review to land that.
Flags: needinfo?(snorp)
Closing as fixed by bug 920160.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Keywords: verifyme
Resolution: --- → FIXED
This issue still reproduces but far less often on Buri 1.2 Build ID: 20131107004003.

The game closes intermittently when starting a new level, changing the skill level, or just by tapping on balloons during a stage. 

Gaia   590eb598aacf1e2136b2b6aca5c3124557a365ca
SourceStamp 26f1e160e696
BuildID 20131107004003
Version 26.0

This issue does not reproduce on Buri 1.3 Build ID: 20131107040200

Gaia   42bbe26a72e030faf07a6fc297f61a3a8ccda25b
SourceStamp 70de5e24d79b
BuildID 20131107040200
Version 28.0a1

Game no longer terminates after trying to play Poppit on 1.3.
Sarah - Can you file a followup bug for the issue you found on 1.2?
A new bug has been created.  See Bug 936285

(In reply to Jason Smith [:jsmith] from comment #40)
> Sarah - Can you file a followup bug for the issue you found on 1.2?
Keywords: verifyme
Flags: needinfo?(gwright)
You need to log in before you can comment on or make changes to this bug.