Closed Bug 982237 Opened 6 years ago Closed 6 years ago

[B2G][Buri]Firefox game app (cupcakes vs. veggies) fails to render and exits unexpectedly

Categories

(Core :: Canvas: 2D, defect, P1)

28 Branch
ARM
Gonk (Firefox OS)
defect

Tracking

()

VERIFIED FIXED
mozilla31
blocking-b2g 1.3+
Tracking Status
firefox29 --- wontfix
firefox30 --- fixed
firefox31 --- fixed
b2g-v1.3 --- verified
b2g-v1.3T --- fixed
b2g-v1.4 --- verified
b2g-v2.0 --- verified

People

(Reporter: njpark, Assigned: bkelly)

References

Details

(Keywords: perf, regression, Whiteboard: [c=effect p=3 s=2014.03.28 u=1.3] [MemShrink:P2])

Attachments

(5 files, 3 obsolete files)

Attached file veggies.txt
The game that is featured on Firefox OS Marketplace "Cupcakes vs. Veggies" exits unexpectedly when executed.  The screen goes dark, and a blue dotted line appears in the corner, then the app exits to the finder without any error message.  This game does run in Firefox OS 1.1 (Platform Version 18)

Reproduction Steps:
1. Download Cupcakes vs. Veggies from FFOS Marketplace and install
2. Execute.

Expected:
App should run and display graphics

Actual:
Shows blank screen with a single line, then exits

Device = Buri/Hamachi
Gaia      a351fe62c11737c722ad33aaff438f6ccd00bd4a
Gecko     https://hg.mozilla.org/mozilla-central/rev/41d962d23e81
BuildID   20140311040203
Version   30.0a1
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013

Attached is the adb log.  Shortly after the app is launched, following error is logged:

I/Gecko   (  136): Event handler expection [Exception... "[JavaScript Error: "Permission denied for <app://system.gaiamobile.org> to create wrapper for object of class UnnamedClass" {file: "app://system.gaiamobile.org/js/app_install_manager.js" line: 390}]'[JavaScript Error: "Permission denied for <app://system.gaiamobile.org> to create wrapper for object of class UnnamedClass" {file: "app://system.gaiamobile.org/js/app_install_manager.js" line: 390}]' when calling method: [nsIDOMEventListener::handleEvent]"  nsresult: "0x80570021 (NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS)"  location: "native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0"  data: yes]

When it is supposed to render the image, it shows a series of following errors:

E/msm7627a.hwcomposer(  136): hwc_set: Unable to render by hwc due to non-pmem memory
Going to bump to general first to do more analysis before we conclude which bug is the root problem here. I'm also adding qawanted to check behavior on 1.3.
Component: Graphics → General
Keywords: qawanted
Product: Core → Firefox OS
Version: 30 Branch → unspecified
Attached file veggies_13.txt
adb output when running under 1.3
It fails in 1.3 build as well.  attached is the adb output.  the error

E/msm7627a.hwcomposer(  136): hwc_set: Unable to render by hwc due to non-pmem memory

is still present, but no javascript error.  Perhaps it was irrelevant?
Keywords: qawanted
(In reply to npark from comment #3)
> It fails in 1.3 build as well.  attached is the adb output.  the error
> 
> E/msm7627a.hwcomposer(  136): hwc_set: Unable to render by hwc due to
> non-pmem memory
> 
> is still present, but no javascript error.  Perhaps it was irrelevant?

It seems not related. It typically happen when gralloc buffers are falledbacked to ashmem. It could easily happen on hamachi/buri.
I locally confirmed that the problem seems to happen because of out of memory. "adb shell dmesg" log had the following log.

<3>[  185.638031] Out of memory: Kill process 347 (Homescreen) score 621 or sacrifice child
<3>[  185.638051] Killed process 347 (Homescreen) total-vm:70496kB, anon-rss:15996kB, file-rss:0kB
<7>[  187.760969]  msm_batt_update_psy_status>>>>>>>> ygg  626  
<7>[  187.761759] ygg **** msm_batt_get_batt_chg_status 473 is_charging_complete 0  is_charging 0
<6>[  187.761774]  msm_batt_update_psy_status 653  4200 
<4>[  187.761788] BATT: rcvd: 0, 2, 0, 2; 4200, 31
<7>[  187.761799] ygg *** full off led msm_batt_update_psy_status 846  
<6>[  187.761813] ygg *******led_on 483 on 0
<7>[  187.761826] ygg *** full off led msm_batt_update_led_status 500  
<6>[  187.761836] ygg *******led_on 483 on 0
<6>[  187.761849] resume update  msm_batt_update_psy_status 858  4200
<6>[  187.762493]   msm_batt_update_psy_status 0  0
<7>[  187.762514]  msm_batt_update_psy_status <<<<<<<<< ygg  874  
<4>[  188.006503] select 616 ((Preallocated a), adj 10, size 2839, to kill
<4>[  188.006523] send sigkill to 616 ((Preallocated a), adj 10, size 2839
<4>[  189.894286] select 575 (Cupcakes vs. Ve), adj 2, size 9543, to kill
<4>[  189.894308] send sigkill to 575 (Cupcakes vs. Ve), adj 2, size 9543
<4>[  213.756886] SSL Cert #3 used greatest stack depth: 4728 bytes left
Easy to reproduce too. If it's working on 1.1, then we've got a serious performance regression here with memory if we're managing to OOM the app this easily.
blocking-b2g: --- → 1.3?
Whiteboard: [MemShrink]
Mike

Please review and reassign.
blocking-b2g: 1.3? → 1.3+
Flags: needinfo?(mlee)
Looks like its allocating a large layer with system memory:

I/Gecko   (  761): SharedSurface_Gralloc::Create: success -- surface 0x4449de80, GraphicBuffer 0x442e3000.
I/Gecko   (  761): SharedSurface_Gralloc::Create -------
E/memalloc(  136): /dev/pmem: No more pmem available
W/memalloc(  136): Falling back to ashmem
I/Gecko   (  761): SharedSurface_Gralloc::Create: success -- surface 0x444fd980, GraphicBuffer 0x442e3880.
I/Gecko   (  136): 
I/Gecko   (  136): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
Priority: -- → P1
Hardware: x86 → ARM
Whiteboard: [MemShrink] → [MemShrink][c=effect p= s= u=]
Whiteboard: [MemShrink][c=effect p= s= u=] → [MemShrink][c=effect p= s= u=1.3]
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(mlee)
Component: General → Performance
Jason are we sure this reproduces on v1.3?  I just installed, launched, and played the game on my v1.3 buri.

  gaia:  df157ce3a3f841850a1cce8f057f8e7f547fb9f8
  gecko: df253201ad89

It looks like comment 0 was with gecko 30.
Flags: needinfo?(jsmith)
Actually... it worked launched from marketplace, but fails if I launch from homescreen.  So I can reproduce now.
Flags: needinfo?(jsmith)
Yea, we are allocating some very large gralloc buffers which are falling back to system memory:

03-12 15:01:02.979 E/memalloc( 7796): ### ### ashmem allocting 5017600 bytes
03-12 15:01:03.089 E/memalloc( 7796): ### ### ashmem allocting 16777216 bytes
03-12 15:01:03.789 E/memalloc( 7796): ### ### ashmem allocting 16777216 bytes

Thats 38MB of system memory just for graphics layers.  Add in normal app memory and audio resources and I could very easily see this OOMing.

If I disable APZC we only seem to allocate around 2MB buffers.

Kats, any ideas on what this could be?  Any debug I can add to help isolate it?
Flags: needinfo?(bugmail.mozilla)
Sounds like their app uses a set of frames that end up allocating large displayports. The usual APZC logging should confirm that; look for the "requesting content repaint" output from AsyncPanZoomController.cpp and see what the displayport is set to.
Flags: needinfo?(bugmail.mozilla)
QA Contact: mvaughan
Component: Performance → Panning and Zooming
Product: Firefox OS → Core
QA Contact: mvaughan
Version: unspecified → 28 Branch
Looks like 480x5000?  I'm having trouble looking at its code because it appears to be browser+appcache based.

03-12 23:57:09.869 I/Gecko   (11516): APZC: 0x45aeb400 got a NotifyLayersUpdated with aIsFirstPaint=0: i=(2 2) cb=(0 0 480 3
20) dp=(0.000 0.000 480.000 320.000) v=(0.000 0.000 480.000 320.000) s=(0.000 0.000) sr=(0.000 0.000 480.000 5000.000) z=(1.
000 1.000 1.000 1.000) 0
03-12 23:57:09.979 I/Gecko   (11830): SharedSurface_Gralloc::Create -------
03-12 23:57:09.989 I/Gecko   (11830): SharedSurface_Gralloc::Create: success -- surface 0x434d13c0, GraphicBuffer 0x44494d80
.
03-12 23:57:10.039 I/Gecko   (11830): SharedSurface_Gralloc::Create -------
03-12 23:57:10.039 E/memalloc(11516): /dev/pmem: No more pmem available
03-12 23:57:10.039 E/memalloc(11516): ### ### ashmem allocting 16777216 bytes
Here is a full log.  Anything jump out at you Kats?  Thanks for your help!
Flags: needinfo?(bugmail.mozilla)
From the log the only repaint request I see is this one:

03-13 00:08:38.889 I/Gecko   (11516): APZC: 0x45e45c00 requesting content repaint: i=(2 2) cb=(0 0 480 320) dp=(0.000 0.000 480.000 320.000) v=(0.000 0.000 480.000 320.000) s=(0.000 0.000) sr=(0.000 0.000 480.000 320.000) z=(1.000 1.000 1.000 1.000) 1

which has a normal-sized displayport (480x320). Towards the end of the log, like in the line you pasted in comment 13, the scrollable rect (i.e. the full size of the content) expands to be 5000 pixels tall, but the displayport (the dp=(...) value) is still only 320 pixels tall.

I don't think the displayport is the problem here.
Flags: needinfo?(bugmail.mozilla)
I spoke with Kats on IRC.  Later today I'll try triggering a layers dump when I see a large allocation in:

  gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp
Specifically I think you should log all the calls to [1] (this is on the root process side) and see what sort of stuff is getting created. If you find something is requesting allocation of a buffer that's way too big, then you can add a condition on the child side [2] and get a backtrace or something to see where it's coming from.

[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp#283
[2] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp#416
QA Contact: mvaughan
This issue appears to have started reproducing on the 09/09/13 1.2 build. The game will crash when you attempt to launch it from the Homescreen. You can launch it from the Marketplace, but it will cause the app UI and Home button to become unresponsive, forcing a device restart.

- Last Working -
Device: Buri v1.2 MOZ RIL
BuildID: 20130906040204
Gaia: 94e5f269874b02ac0ea796b64ab995fce9efa4b3
Gecko: ab5f29823236
Version: 26.0a1
Firmware Version: V1.2-device.cfg

- First Broken -
Device: Buri v1.2 MOZ RIL
BuildID: 20130909114657
Gaia: aa4180e9286d385fa6b62d236f30fb24cd8b93e9
Gecko: 218d4334d29e
Version: 26.0a1
Firmware Version: V1.2-device.cfg

**This looks to possibly be a gecko issue**
last working gaia/first broken gecko = Launches but causes an unresponsive app UI and Home button, forcing a device restart.
Gaia: 94e5f269874b02ac0ea796b64ab995fce9efa4b3
Gecko: 218d4334d29e

first broken gaia/last working gecko = NO REPRO
Gaia: aa4180e9286d385fa6b62d236f30fb24cd8b93e9
Gecko: ab5f29823236

Push log: http://hg.mozilla.org//mozilla-central/pushloghtml?fromchange=ab5f29823236&tochange=218d4334d29e

NOTE: The game launches and plays just fine on the 9/06 build, but if you attempt to terminate the app via the card view, the OS will crash.
The window above heavily implies this is a SkiaGL regression.
Component: Panning and Zooming → Graphics: Layers
Can we get a memory dump?
Adding qawanted to get an about:memory log.
Keywords: qawanted
Attached file about memory report
Here is the requested about memory report.
Keywords: qawanted
I had trouble getting a debug build to work on the device so I hacked up ShadowLayersUtilGralloc.cpp to use a PROFILER_MARKER() and sleep(1) when it sees a 2048x2048 gralloc allocation.

This is the profile I got:

  http://people.mozilla.org/~bgirard/cleopatra/#report=101d8a315f7ac568a35f2bc4ef076be4a416fdf4

As you can see it is indeed related to canvas and not APZC.  Interestingly, this is being triggered by the app's font loading code.
And I tried replicating my success with APZC off from comment 11.  It does still OOM with APZC off.  I think I just got lucky when I tried earlier.  It seems the app can survive maybe 20% of the time or something.

Sorry for the noise Kats!
It appears this is the source of the font.min.js file that is creating the canvas:

  https://github.com/Pomax/Font.js/blob/master/Font.js
And here is where the 2048x2048 canvas comes from:

      // If this is a remote font, we rely on the indicated quad size
      // for measurements. If it's a system font there will be no known
      // quad size, so we simply fix it at 1000 pixels.
      var quad = this.systemfont ? 1000 : this.metrics.quadsize;

      // Because we need to 'preload' a canvas with this
      // font, we have no idea how much surface area
      // we'll need for text measurements later on. So
      // be safe, we assign a surface that is quad² big,
      // and then when measureText is called, we'll
      // actually build a quick <span> to see how much
      // of that surface we don't need to look at.
      var canvas = document.createElement("canvas");
      canvas.width = quad;
      canvas.height = quad;

From here:

  https://github.com/Pomax/Font.js/blob/master/Font.js#L496
Mike, do you think it would be reasonable to cap the size of this canvas in font.js?  It seems the code already has to handle analyzing the font piecemeal for the system font case.  In this case quadsize appears to be 2048, which is equating to a 16MB buffer (2048x2048x4bits).

Also, I'm not 100% sure from looking at the code, but it appears font.js keeps the canvas around?  Is this necessary?  Would it be possible to create the canvas only when its used in |measureText| and set its width/height back to 0 when not measuring to free memory?

Just looking for ways to make this more memory efficient for mobile.  Thanks!
Flags: needinfo?(pomax)
Maybe font.js should use willReadFrequently when creating the canvas to force software canvases instead of opengl.

  var context = canvas.getContext('2d', { willReadFrequently: true });
Alternatively, should we perhaps automatically fallback to software canvas context if the user requests a size that is too large?  Otherwise we are always at risk of gralloc blowing up on us like this.

Jeff, James, what do you think?
Flags: needinfo?(snorp)
Flags: needinfo?(jmuizelaar)
Component: Graphics: Layers → Canvas: 2D
(In reply to Ben Kelly [:bkelly] from comment #29)
> Alternatively, should we perhaps automatically fallback to software canvas
> context if the user requests a size that is too large?  Otherwise we are
> always at risk of gralloc blowing up on us like this.
> 
> Jeff, James, what do you think?

It's not clear to me from this bug how gralloc is blowing up. Can you elaborate that part?
Flags: needinfo?(jmuizelaar)
The real question is what Font.js is being used for. I'm working on a completely different font related library at the moment, and there's now also the rather useful opentype.js available (http://nodebox.github.io/opentype.js/) so depending on what's required Font.js might not be the best thing to go for. Font.js was designed mostly to find out metrics without having to actually do any cmap glyph lookups; if you can spare the cycles to perform those lookups, the resulting metrics are actually more accurate than what Font.js can give you, and probably doesn't require a canvas at all.

(that said, I don't know if opentype.js does general metrics based on full strings rather than individual glyphs - I know there's no GPOS implementation yet, so advanced kerning might not be taken into account atm).
  
needinfo me again if more information's required =)
Flags: needinfo?(pomax)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #30)
> It's not clear to me from this bug how gralloc is blowing up. Can you
> elaborate that part?

Well, if an app tries to create a canvas of 2048x2048 we currently as gralloc for a 16MB buffer.  This results in it used 16MB of mlocked system memory.

In theory I could craft an app that asked for enough mlocked system memory to OOM kill the b2g parent process itself.  I've seen this behavior in other bugs when APZC was accidentally asking for too large a gralloc buffer.
Whiteboard: [MemShrink][c=effect p= s= u=1.3] → [MemShrink][c=effect p=3 s= u=1.3]
I'd like to try cap'ing our gralloc allocations at a reasonably high value.  For example N times the number of pixels on the screen or something.
That exactly is the problem here? Is the 2048x2048 canvas too big for gralloc? or is this an OOM issue?
(In reply to Andreas Gal :gal from comment #34)
> That exactly is the problem here? Is the 2048x2048 canvas too big for
> gralloc? or is this an OOM issue?

Its an OOM, but I think its directly related to the 2048x2048 canvases.  The gralloc system dutifully provides 16MB of memory in mlocked system memory for each of these canvases.  This causes a huge spike in memory usage.  If it can survive that, then the app seems to load ok.

I think this is concerning because, in theory, content could craft a canvas of a certain size that causes the b2g parent process to OOM.

A perhaps simple fix would be to return nullptr from AllocGrallocBuffer() if the request is larger than a configured threshold.
(In reply to Ben Kelly [:bkelly] from comment #32)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #30)
> > It's not clear to me from this bug how gralloc is blowing up. Can you
> > elaborate that part?
> 
> Well, if an app tries to create a canvas of 2048x2048 we currently as
> gralloc for a 16MB buffer.  This results in it used 16MB of mlocked system
> memory.
> 
> In theory I could craft an app that asked for enough mlocked system memory
> to OOM kill the b2g parent process itself.  I've seen this behavior in other
> bugs when APZC was accidentally asking for too large a gralloc buffer.

If the b2g process is giving out too much gralloc memory it should have a per-process limit that it will give to other processes. Once the allocation fails the regular failure path should just make this work.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #36)
> If the b2g process is giving out too much gralloc memory it should have a
> per-process limit that it will give to other processes. Once the allocation
> fails the regular failure path should just make this work.

Yeah. Although allocating a 2048x2048 software canvas is still pretty gross.
Flags: needinfo?(snorp)
Also: with the latest SkiaGL, if the gralloc allocations fail, we'll still use SkiaGL -- it will just be slow as hell due to using the basic (readback) surfaces. That's not good.
Here's a patch that limits total gralloc memory permitted for the system.  I used DPI to try to make it scale appropriately for low-end and high-end devices.  We get a ~16MB limit for buri up to about ~44MB limit for nexus-5.

It was not clear to me how to track things on a per-process basis given the somewhat confusing life-cycle of gralloc buffers.  This at least prevents content from denial-of-servicing the parent process.

Its unclear to me if we still need the 4096 height and width check just below it if this goes in.

Note, we also need bug 962278 uplifted to v1.3 in order for the DPI to be set properly.
Attachment #8391467 - Flags: feedback?(jmuizelaar)
Attachment #8391467 - Flags: feedback?(bugmail.mozilla)
Comment on attachment 8391467 [details] [diff] [review]
b2g28_v1_3 patch:  Limit total gralloc memory based on device DPI.

Review of attachment 8391467 [details] [diff] [review]:
-----------------------------------------------------------------

This seems like a reasonable thing to do to me. It might be worth wrapping this in a pref or something to easily disable it just in case. I don't know how easy that is to do here.

::: gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp
@@ +261,5 @@
> +  // Calculate the maximum gralloc memory to permit on the device to avoid
> +  // using up too much mlocked system memory.  Otherwise content could use
> +  // canvases to OOM the parent process. Tuned to ~16MB on 160dpi screens
> +  // like the hamachi.  On nexus-5 this is ~44MB.
> +  size_t maxGrallocLimit = APZCTreeManager::GetDPI() * 1024 * 102;

You could get rid of the "102" magic number by writing this as:

(APZCTreeManager::GetDPI() / 160) * (16 * 1024 * 1024)

That way it maps more directly to the comment.
Attachment #8391467 - Flags: feedback?(bugmail.mozilla) → feedback+
Oh, and with that patch I can launch the game without OOMing.  (5 out of 5)

On a side note, the game vendor should just use a locked landscape setting in their manifest instead of the buggy notification window they have right now.
(In reply to (PTO|Mar15-Mar19) Kartikaya Gupta (email:kats@mozilla.com) from comment #40)
> This seems like a reasonable thing to do to me. It might be worth wrapping
> this in a pref or something to easily disable it just in case. I don't know
> how easy that is to do here.

I looked at doing a pref, but it was a bit annoying.  I think I need to read the pref on the main thread and this is the IPC thread.  Also, the gfx prefs seem to have been refactored between 1.3 and m-c.

That being said, I would be happy to a pref if someone can show me where they belong.  Is it gfxPref:: in m-c and gfxPlatform:: in 1.3?
So I converted this to a mozilla-central patch and the app is still OOM'ing for some reason.  I'd like to investigate what is going on differently in 1.4.
I am seeing this on 1.4 which I have not seen before:

  W/Adreno200-ES20( 1446): <qgl2DrvAPI_glFramebufferRenderbuffer:2569>: GL_OUT_OF_MEMORY
I'm going to test 1.4 with a pristine OEM firmware image to see if that problem occurs normally.  If so I plan to open a new 1.4? bug for that issue and will try to land my gralloc limiting patch here.
So the 1.4 crash happens on a clean install, so I think its unrelated.  Its also a different crash because instrumentation shows that we are not hitting the large gralloc that is the issue here.  We appear to die earlier in the app launch.

I wrote bug 985122 to track that problem.

I'm going to pursue landing my gralloc patch for this bug in m-c, m-a, and b2g28_v1_3.
See Also: → 985122
mozilla-central patch to limit total gralloc usage for the system.  Note, I chose not to do a pref right now to make uplifting easier since it appears the gfx prefs were refactored between v1.3 and v1.4.
Attachment #8393130 - Flags: review?(jmuizelaar)
Try build.  I did not have time for a local build after my hg pull -u, so I hope this works. :-)

  https://tbpl.mozilla.org/?tree=Try&rev=04dc653b43f9
Comment on attachment 8393130 [details] [diff] [review]
Limit total gralloc memory based on device DPI. (v1)

Review of attachment 8393130 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp
@@ +297,5 @@
> +
> +  // Calculate the maximum gralloc memory to permit on the device to avoid
> +  // using up too much mlocked system memory.  Otherwise content could use
> +  // canvases to OOM the parent process. Tuned to ~16MB on 160dpi screens
> +  // like the hamachi.  On nexus-5 this is ~44MB.

Why does it matter that the memory is mlocked? We don't have swap so does it make a difference?

@@ +298,5 @@
> +  // Calculate the maximum gralloc memory to permit on the device to avoid
> +  // using up too much mlocked system memory.  Otherwise content could use
> +  // canvases to OOM the parent process. Tuned to ~16MB on 160dpi screens
> +  // like the hamachi.  On nexus-5 this is ~44MB.
> +  size_t maxGrallocLimit = (16 * 1024 * 1024) * APZCTreeManager::GetDPI() / 160;

Can we use something that corresponds a little better with when we'll actually OOM?
Whiteboard: [MemShrink][c=effect p=3 s= u=1.3] → [MemShrink:P2][c=effect p=3 s= u=1.3]
(In reply to Jeff Muizelaar [:jrmuizel] from comment #49)
> Why does it matter that the memory is mlocked? We don't have swap so does it
> make a difference?

Well, it requires the app to page it in immediately.  I believe ashmem (their fallback allocator) uses contiguous memory, but I'm not 100% sure about that.

At the end of the day its just a huge immediate increase in memory usage in a very short time frame.  The linux memory killer algorithm works well with gradual changes, but can respond to spikes by OOM'ing the parent process.

I can remove the "mlocked" if you wish.
 
> Can we use something that corresponds a little better with when we'll
> actually OOM?

This will depend on the memory resources of the device.  I was trying to use the device resolution/screen size as a proxy for this.

I can't think of a legitimate reason why a buri with a 320x480 screen would need more than 16MB of graphics memory.  It only has 8MB of pmem anyway.

Note, even without this check you are likely to hit Kats's 4096 height/width limit for large allocations.  That crashes the child process completely instead of falling back to shared memory.  Personally I don't like that check as it seems bad form to crash an app that asks for a large canvas when we could instead use shmem.  Kats asked that we leave it in, though.

There's just a lot of room to cause mischief with a 4095x4095 canvas (~64MB) before hitting that hard limit.

I'm open to suggestions on how better to proceed here.  I just don't have many ideas as I'm not a graphics expert.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #49)
> Can we use something that corresponds a little better with when we'll
> actually OOM?

I was thinking about this some more last night and I think it would require some more extensive infrastructure.  Since we're multi-process we would need to understand the memory usage of multiple apps simultaneously.

We would also need to understand the device's provided memory resources better.  As far as I know we would need to access /proc/meminfo for this sort of thing.  I don't know that resources like pmem or ion memory are reported anywhere accessible to us.

In the end I think we could add a lot of elaborate infrastructure here that doesn't work much better than a simple threshold.

I'd like to address your concern if I can, but I think its just hard to pin down exactly what will cause an OOM or not at any given time.  My solution here was simply to prevent unbounded usage of a scarce system resource like gralloc to try to help the situation.

Any input on how you would like me to proceed is appreciated.  Thanks!
Comment on attachment 8391467 [details] [diff] [review]
b2g28_v1_3 patch:  Limit total gralloc memory based on device DPI.

Be nicer to Jeff's flag queue as I already have him on r? for the m-c patch.
Attachment #8391467 - Flags: feedback?(jmuizelaar)
Hmm, the try build shows some very unhappy emulator builds.  Lots of orange.  I'll need to look closer at that.
> Since we're multi-process we would need
> to understand the memory usage of multiple apps simultaneously.
> 
> We would also need to understand the device's provided memory resources
> better.  As far as I know we would need to access /proc/meminfo for this
> sort of thing.  I don't know that resources like pmem or ion memory are
> reported anywhere accessible to us.

The system memory reporter (it shows up in about:memory under the "System" section) reports lots of measurements from /proc, including pmem.
New try build since bug 980027 was backed out.  Hoping that was the source of the orange in the previous try:

  https://tbpl.mozilla.org/?tree=Try&rev=fffc28925d9a
Hi Milan, can you help me with some direction on this bug?

I believe the root problem here is that the app is requesting too much gralloc memory via canvas elements.  The way the firmware falls back to system memory for gralloc makes it very easy to OOM the app and even the parent process.

I think we need to provide some kind of limit on the overall gralloc resource to avoid content DoS'ing the device with large canvases; either accidentally or maliciously.

From Jeff's review feedback, though, it sounds like my simple threshold may not be adequate or desirable.  However, if we want something more involved the gfx team probably needs to drive what that's going to look like.

How would you like to proceed here?  Would you like someone from gfx to take this over?

Thanks!

(And sorry if this is just moving slow due to GDC this week.  I figured it would be better to ask than to wait and potentially have it fall off the radar.)
Flags: needinfo?(milan)
(In reply to Ben Kelly [:bkelly] from comment #55)
> New try build since bug 980027 was backed out.  Hoping that was the source
> of the orange in the previous try:
> 
>   https://tbpl.mozilla.org/?tree=Try&rev=fffc28925d9a

No, no.  This is my crash.  Let me dig into the emulator issue.
Comment on attachment 8393130 [details] [diff] [review]
Limit total gralloc memory based on device DPI. (v1)

Drop the review flag until I can investigate the emulator crashes on try.
Attachment #8393130 - Flags: review?(jmuizelaar)
By the way, either the 1.3 patch or the trunk patch is off by a factor of 1024.

I'd like to see if we can get this one green and then worry about minor review issues. Since 1.4 is different, that seems a separate regression, could be related to tiling, we'll take a look.
(In reply to Milan Sreckovic [:milan] from comment #59)
> By the way, either the 1.3 patch or the trunk patch is off by a factor of
> 1024.

Hmm, I think they are equivalent.  I was refactoring the math to express the comment better as suggested by Kats.

  16 * 1024 * 1024 / 160 ~= 102 * 1024

Or may be I'm missing another calculation.  Can you clarify?
(In reply to Ben Kelly [:bkelly] from comment #60)
> ...
> Or may be I'm missing another calculation.  Can you clarify?

Can't clarify.  Brain fart.  Sorry for the noise.
Flags: needinfo?(milan)
I saw this crash on device while testing contacts app image changes.  I was "stretching" the contact detail view down.  Still need to investigate the crash further.
(In reply to Ben Kelly [:bkelly] from comment #51)
> 
> I'd like to address your concern if I can, but I think its just hard to pin
> down exactly what will cause an OOM or not at any given time.  My solution
> here was simply to prevent unbounded usage of a scarce system resource like
> gralloc to try to help the situation.

Do you have an explanation for what makes gralloc more scarse than regular system memory? i.e why would using shmem even help here?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #63)
> (In reply to Ben Kelly [:bkelly] from comment #51)
> > 
> > I'd like to address your concern if I can, but I think its just hard to pin
> > down exactly what will cause an OOM or not at any given time.  My solution
> > here was simply to prevent unbounded usage of a scarce system resource like
> > gralloc to try to help the situation.
> 
> Do you have an explanation for what makes gralloc more scarse than regular
> system memory? i.e why would using shmem even help here?

I believe it typically allocates pages that are not managed by the MMU, so they must be physically contiguous. That means any amount of fragmentation could result in failed allocations.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #64)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #63)
> > (In reply to Ben Kelly [:bkelly] from comment #51)
> > > 
> > > I'd like to address your concern if I can, but I think its just hard to pin
> > > down exactly what will cause an OOM or not at any given time.  My solution
> > > here was simply to prevent unbounded usage of a scarce system resource like
> > > gralloc to try to help the situation.
> > 
> > Do you have an explanation for what makes gralloc more scarse than regular
> > system memory? i.e why would using shmem even help here?
> 
> I believe it typically allocates pages that are not managed by the MMU, so
> they must be physically contiguous. That means any amount of fragmentation
> could result in failed allocations.

This limitation only applies to pmem and is only needed for use with components that don't support page tables like the hwcomposer. The adreno gpu does have an mmu and can use non-pmem gralloc memory.
Making a gralloc allocation that falls back to ashmem does this:

  int prot = PROT_READ | PROT_WRITE;
  fd = ashmem_create_region(name, data.size);
  ashmem_set_prot_region(fd, prot);
  base = mmap(0, data.size, prot, MAP_SHARED|MAP_POPULATE|MAP_LOCKED, fd, 0);
  memset((char*)base + offset, 0, data.size);

Whereas the skia ashmem allocator does this:

  fd = ashmem_create_region(fName, size);
  ashmem_set_prot_region(fd, PROT_READ | PROT_WRITE);
  addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

So the gralloc fallback to ashmem uses MAP_LOCKED|MAP_POPULATE and then zero's the entire buffer.  I believe this forces all pages to be faulted in immediately.

I can't definitively say why this leads to more OOMs, but my belief is that the system cannot gracefully handle a sudden spike in memory like this when already close to its memory limits.  It does not have time to page out unused code, kill low priority apps, etc.  Instead, the LMK finds the system suddenly very close to overcommit failure and starts killing very aggressively to keep the OS running.
(In reply to Ben Kelly [:bkelly] from comment #66)
> Making a gralloc allocation that falls back to ashmem does this:
> 
>   int prot = PROT_READ | PROT_WRITE;
>   fd = ashmem_create_region(name, data.size);
>   ashmem_set_prot_region(fd, prot);
>   base = mmap(0, data.size, prot, MAP_SHARED|MAP_POPULATE|MAP_LOCKED, fd, 0);
>   memset((char*)base + offset, 0, data.size);
> 
> Whereas the skia ashmem allocator does this:
> 
>   fd = ashmem_create_region(fName, size);
>   ashmem_set_prot_region(fd, PROT_READ | PROT_WRITE);
>   addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
> 
> So the gralloc fallback to ashmem uses MAP_LOCKED|MAP_POPULATE and then
> zero's the entire buffer.  I believe this forces all pages to be faulted in
> immediately.
> 
> I can't definitively say why this leads to more OOMs, but my belief is that
> the system cannot gracefully handle a sudden spike in memory like this when
> already close to its memory limits.  It does not have time to page out
> unused code, kill low priority apps, etc.  Instead, the LMK finds the system
> suddenly very close to overcommit failure and starts killing very
> aggressively to keep the OS running.

So it's actually the MAP_POPULATE that causes the difference?
I'll investigate further and see if I can get a more definitive answer.
Commenting out the MAP_POPULATE did not help.

I need to investigate further, but it seems perhaps that the ashmem allocated by gralloc is counted against both the child and parent processes by the low memory killer.

Here is some instrumentation showing the change in RSS in both the parent and child process from allocating the large 2048x2048 canvas here:

E/memalloc( 8017): /dev/pmem: No more pmem available
W/memalloc( 8017): Falling back to ashmem
E/memalloc( 8017): ### ### ashmem allocting 16777216 bytes
I/Gecko   ( 8017): ### ### PARENT 2048x2048:  RSS delta: 13209600
I/Gecko   ( 8186): ### ### CHILD 2048x2048:  RSS delta: 15564800

In theory these should be increases to PSS, while the USS would stay low.  The memory killer uses get_mm_rss(), though.  I believe this means the memory killer is just looking at RSS.

I still need to look and see how the vm system triggers the memory killer in the first place to determine if the memory is double-counted there.  Maybe this is all correct, but it would seem to make it easier to OOM.
After thinking about it more, it seems perfectly reasonable for the OOM killer to use RSS.

I switched tacks and am looking at what the differences are between gralloc and the fallback path.  I think I was looking at the wrong code when I wrote about the skia ashmem allocator in comment 66.  I'm trying to focus on figuring out how this side of the code works now.

So far it appears that we create a similar sized memory buffer using the IPC generated AllocUnsafeShmem() function.  Interestingly, if I force the large allocations to use this path, they do not trigger an immediate increase in the RSS of the app:

I/Gecko   (13767): ### ### short circuit 2048x2048
I/Gecko   (13943): ### ### CHILD 2048x2048:  RSS delta: 0
I/Gecko   (13943): ### ### AllocSurfaceDescriptorWithCaps() call AllocSharedImageSurface for width:2048 height:2048
I/Gecko   (13943): ### ### gfxBaseSharedMemorySurface::Create() stride:8192 size:16777232
I/Gecko   (13943): ### ### Android SharedMemoryBasic::Create() 16781312 bytes
I/Gecko   (13943): ### ### Android SharedMemoryBasic::Map() 16781312 bytes
I/Gecko   (13767): ### ### Android SharedMemoryBasic::Map() 16781312 bytes
I/Gecko   (13943): ### ### gfxBaseSharedMemorySurface::Create() RSS delta: 0


This is eventually performing an open() on an ashmem device and then using an ioctl to set the size here:

  http://mxr.mozilla.org/mozilla-central/source/ipc/glue/SharedMemoryBasic_android.cpp#60

And then an mmap() here:

  http://mxr.mozilla.org/mozilla-central/source/ipc/glue/SharedMemoryBasic_android.cpp#78

We then perform the same map on the child process.

This is not doing any kind of MAP_POPULATE or memset() zero'ing like gralloc does, though.  So it may still be the issue of paging in the large buffer too fast.

While I removed MAP_POPULATE before to test its effects on the gralloc, I did not try removing or slowing the memset().  I see what that does next.
First, let me say thanks to Jeff for pushing me to dig deeper here.  There does seem to be something else going on beyond just the gralloc buffer itself.

I want to provide an update on my investigation, but there are a lot of logs so far.  Here is the short, TLDR version:

  gralloc case:
    rss after 1st 2048x2048 canvas:  ~75MB
    rss after 2nd 2048x2048 canvas:  ~90MB with a hard OOM via failed page fault

  shmem case:
    rss after 1st 2048x2048 canvas:  ~54MB
    rss after 2nd 2048x2048 canvas:  ~67MB

I see the allocations increasing RSS by close to 16MB even in the shmem case, so something else in the gralloc path seems to be using more memory.


Here is a longer version:

When we use gralloc allocations that fallback to ashmem for the 2048x2048 canvases we see the first allocation succeed and then everything blows up on the second:

E/memalloc( 5416): /dev/pmem: No more pmem available
W/memalloc( 5416): Falling back to ashmem
E/memalloc( 5416): ### ### ashmem allocting 16777216 bytes
I/Gecko   ( 5416): ### ### PARENT 2048x2048:  RSS delta: 16752640
I/Gecko   ( 5741): ### ### CHILD 2048x2048:  RSS delta: 16756736 new total: 75190272
I/Gecko   ( 5741): ### ### CHILD 2048x2048:  after sleep RSS delta: 0 new total: 75190272
I/Gecko   ( 5741): SharedSurface_Gralloc::Create: success -- surface 0x432b51c0, GraphicBuffer 0x43256500.
I/Gecko   ( 5741): ### ### CanvasRenderingContext2D::EnsureTarget() 2048x2048 END rss delta:22708224 total:75452416
I/Gecko   ( 5741): ### ### CanvasRenderingContact2D::FillRect() x:-1.000000 y:-1.000000 w:2050.000000 h:2050.000000 END rss total:75452416
I/Gecko   ( 5741): ### ### CanvasRenderingContact2D::FillText() x:50.000000 y:1024.000000 START rss total:75452416
I/Gecko   ( 5741): ### ### CanvasRenderingContact2D::FillText() x:50.000000 y:1024.000000 END rss total:75452416
I/Gecko   ( 5741): ### ### CanvasRenderingContext2D::EnsureTarget() 960x536 START rss total:75722752
I/Gecko   ( 5741): ### ### CanvasRenderingContext2D::EnsureTarget() 2048x2048 START rss total:75722752
I/Gecko   ( 5416):
I/Gecko   ( 5416): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv

The last we see of the client process here it has an RSS of ~75MB.

The next allocation causes us to get killed by the linux OOM killer.  (Note, there are two OOM killers.  The android OOM killer which tries to respect LRU order and things, and then the less discriminate linux OOM killer.)

<4>[70658.190381] select 5759 ((Preallocated a), adj 10, size 4254, to kill
<4>[70658.190401] send sigkill to 5759 ((Preallocated a), adj 10, size 4254
<4>[70658.333849] Cupcakes vs. Ve invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=2, oom_score_adj=134
<4>[70658.333914] [<c00497f4>] (unwind_backtrace+0x0/0x12c) from [<c0135760>] (dump_header.clone.1+0x6c/0x18c)
<4>[70658.333941] [<c0135760>] (dump_header.clone.1+0x6c/0x18c) from [<c01358c0>] (oom_kill_process.clone.0+0x40/0x21c)
<4>[70658.333964] [<c01358c0>] (oom_kill_process.clone.0+0x40/0x21c) from [<c0135cf8>] (out_of_memory+0x25c/0x330)
<4>[70658.333991] [<c0135cf8>] (out_of_memory+0x25c/0x330) from [<c0139728>] (__alloc_pages_nodemask+0x4fc/0x6ac)
<4>[70658.334018] [<c0139728>] (__alloc_pages_nodemask+0x4fc/0x6ac) from [<c014ec30>] (handle_pte_fault+0x188/0x818)
<4>[70658.334041] [<c014ec30>] (handle_pte_fault+0x188/0x818) from [<c014f8e0>] (handle_mm_fault+0x1c0/0x1dc)
<4>[70658.334066] [<c014f8e0>] (handle_mm_fault+0x1c0/0x1dc) from [<c05b8d10>] (do_page_fault+0x164/0x314)
<4>[70658.334094] [<c05b8d10>] (do_page_fault+0x164/0x314) from [<c003d240>] (do_DataAbort+0x34/0x94)
<4>[70658.334121] [<c003d240>] (do_DataAbort+0x34/0x94) from [<c05b7160>] (ret_from_exception+0x0/0x10)

[Trim...]

<3>[70658.337946] Out of memory: Kill process 5741 (Cupcakes vs. Ve) score 626 or sacrifice child
<3>[70658.337966] Killed process 5741 (Cupcakes vs. Ve) total-vm:171476kB, anon-rss:57880kB, file-rss:32744kB

Here we see that the linux OOM killer is running because it tried to map a page, failed due to lack of resources, and then triggered do_page_fault() and ultimately the oom killer.  At this point the child process has an RSS ~90MB.


In comparison this is what we see when we force the 2048x2048 canvases to fallback to shared memory:

I/Gecko   ( 6496): ### ### short circuit 2048x2048
I/Gecko   ( 6670): ### ### CHILD 2048x2048:  RSS delta: 0 new total: 42037248
I/Gecko   ( 6670): ### ### CHILD 2048x2048:  after sleep RSS delta: 0 new total: 42037248
I/Gecko   ( 6670): ### ### AllocSurfaceDescriptorWithCaps() call AllocSharedImageSurface for width:2048 height:2048
I/Gecko   ( 6670): ### ### gfxBaseSharedMemorySurface::Create() stride:8192 size:16777232
I/Gecko   ( 6670): ### ### Android SharedMemoryBasic::Create() 16781312 bytes
I/Gecko   ( 6670): ### ### Android SharedMemoryBasic::Map() 16781312 bytes
I/Gecko   ( 6496): ### ### Android SharedMemoryBasic::Map() 16781312 bytes
I/Gecko   ( 6670): ### ### gfxBaseSharedMemorySurface::Create() RSS delta: 0
I/Gecko   ( 6670): ### ### CanvasRenderingContext2D::EnsureTarget() 2048x2048 END rss delta:13987840 total:55336960
I/Gecko   ( 6670): ### ### CanvasRenderingContact2D::FillRect() x:-1.000000 y:-1.000000 w:2050.000000 h:2050.000000
END rss total:55336960
I/Gecko   ( 6670): ### ### CanvasRenderingContact2D::FillText() x:50.000000 y:1024.000000 START rss total:55365632
I/Gecko   ( 6670): ### ### CanvasRenderingContact2D::FillText() x:50.000000 y:1024.000000 END rss total:55365632
I/Gecko   ( 6670): ### ### CanvasRenderingContact2D::FillRect() x:-1.000000 y:-1.000000 w:2050.000000 h:2050.000000 START rss total:54824960
I/Gecko   ( 6670): ### ### CanvasRenderingContext2D::EnsureTarget() 2048x2048 START rss total:54824960
I/Gecko   ( 6670): SharedSurface_Gralloc::Create -------
I/Gecko   ( 6496): ### ### short circuit 2048x2048
I/Gecko   ( 6670): ### ### CHILD 2048x2048:  RSS delta: 0 new total: 55496704
I/Gecko   ( 6670): ### ### CHILD 2048x2048:  after sleep RSS delta: 0 new total: 55496704
I/Gecko   ( 6670): ### ### AllocSurfaceDescriptorWithCaps() call AllocSharedImageSurface for width:2048 height:2048
I/Gecko   ( 6670): ### ### gfxBaseSharedMemorySurface::Create() stride:8192 size:16777232
I/Gecko   ( 6670): ### ### Android SharedMemoryBasic::Create() 16781312 bytes
I/Gecko   ( 6670): ### ### Android SharedMemoryBasic::Map() 16781312 bytes
I/Gecko   ( 6670): ### ### gfxBaseSharedMemorySurface::Create() RSS delta: 0
I/Gecko   ( 6496): ### ### Android SharedMemoryBasic::Map() 16781312 bytes
I/Gecko   ( 6670): ### ### CanvasRenderingContext2D::EnsureTarget() 2048x2048 END rss delta:12500992 total:67325952
I/Gecko   ( 6670): ### ### CanvasRenderingContact2D::FillRect() x:-1.000000 y:-1.000000 w:2050.000000 h:2050.000000 END rss total:67325952
I/Gecko   ( 6670): ### ### CanvasRenderingContact2D::FillText() x:50.000000 y:1024.000000 START rss total:67325952
I/Gecko   ( 6670): ### ### CanvasRenderingContact2D::FillText() x:50.000000 y:1024.000000 END rss total:67325952
I/Gecko   ( 6496):
I/Gecko   ( 6496): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
I/Gecko   ( 6496):
I/Gecko   ( 6496): [Parent 6496] WARNING: waitpid failed pid:6542 errno:10: file /srv/mozilla-b2g28_v1_3/ipc/chromium/src/base/process_util_posix.cc, line 254
I/Gecko   ( 6670): ### ### CanvasRenderingContext2D::EnsureTarget() 960x536 START rss total:69509120

Here at the end of the first 2048x2048 allocation the child process is only using ~54MB.  After the second 2048x2048 allocation we are using ~67MB.
(In reply to Ben Kelly [:bkelly] from comment #71)
> I see the allocations increasing RSS by close to 16MB even in the shmem
> case, so something else in the gralloc path seems to be using more memory.

Well, to be more clear, CanvasRenderingContact2D::EnsureTarget() increases RSS much more in the gralloc case compared to the shmem case.

  gralloc EnsureTarget increase:  22708224
  shmem EnsureTarget increase:    13987840

That's just over 8MB more RSS in the gralloc case.  I'll try to figure out why that is next.
(In reply to Ben Kelly [:bkelly] from comment #72)
> (In reply to Ben Kelly [:bkelly] from comment #71)
> > I see the allocations increasing RSS by close to 16MB even in the shmem
> > case, so something else in the gralloc path seems to be using more memory.
> 
> Well, to be more clear, CanvasRenderingContact2D::EnsureTarget() increases
> RSS much more in the gralloc case compared to the shmem case.
> 
>   gralloc EnsureTarget increase:  22708224
>   shmem EnsureTarget increase:    13987840
> 
> That's just over 8MB more RSS in the gralloc case.  I'll try to figure out
> why that is next.

Before bug 939276, EnsureTarget would create a GLContext. That could probably explain the extra memory usage. It should be better in master (and I think 1.4?)
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #73)
> Before bug 939276, EnsureTarget would create a GLContext. That could
> probably explain the extra memory usage. It should be better in master (and
> I think 1.4?)

I wonder if this might explain why we fail in a different way on 1.4 as well. (bug 985122)

I'll take a look at this today.  Still, 8+MB seems like a lot even if we're creating a new context, no?
(In reply to Ben Kelly [:bkelly] from comment #74)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #73)
> > Before bug 939276, EnsureTarget would create a GLContext. That could
> > probably explain the extra memory usage. It should be better in master (and
> > I think 1.4?)
> 
> I wonder if this might explain why we fail in a different way on 1.4 as
> well. (bug 985122)
> 
> I'll take a look at this today.  Still, 8+MB seems like a lot even if we're
> creating a new context, no?

It does seem a little higher than I would expect, yeah.
Over in bug 985122 the discussion turned towards de-optimizing these extreme cases in order to be more resource efficient.  In particular, James suggested disabling skia for any canvas larger than the device screen.

This patch implements this suggestion and allows the game to load on both m-c and v1.3.  So this patch addresses bug 985122 as well.
Attachment #8391467 - Attachment is obsolete: true
Attachment #8393130 - Attachment is obsolete: true
Attachment #8398066 - Flags: review?(snorp)
Comment on attachment 8398066 [details] [diff] [review]
Do not use skia for canvas larger than device screen. (v1)

Review of attachment 8398066 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good overall, just a couple tweaks

::: content/canvas/src/CanvasRenderingContext2D.cpp
@@ +137,5 @@
>  static int64_t gCanvasAzureMemoryUsed = 0;
>  
> +// Cache the number of pixels available on the primary screen to use when
> +// determining if a canvas should be skia accelerated or not.
> +static int32_t gScreenPixels = -1;

I think you can just move this inside CheckSizeForSkiaGL

@@ +833,5 @@
> +  if (size.width < minsize || size.height < minsize) {
> +    return false;
> +  }
> +
> +  // Cache the number of pixels on the primary screen

I think we want to have a pref for this max-value, similar to the min-value above. If < 0, then use screen size, 0 is unlimited, otherwise use the pref value. If you don't feel like adding a pref, I think this should at least be #ifdef MOZ_B2G
Attachment #8398066 - Flags: review?(snorp) → review-
Thanks for the quick review!

Here's a version with the pref you suggested.  Default is set to unlimited max and b2g.js enabled screen-based limit.

https://tbpl.mozilla.org/?tree=Try&rev=06e65f9bbff8
Attachment #8398066 - Attachment is obsolete: true
Attachment #8398117 - Flags: review?(snorp)
Attachment #8398117 - Flags: review?(snorp) → review+
Duplicate of this bug: 985122
https://hg.mozilla.org/mozilla-central/rev/facbc33656d5
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Whiteboard: [MemShrink:P2][c=effect p=3 s= u=1.3] → [c=effect p=3 s=2014.03.28 u=1.3] [MemShrink:P2]
Please request approval-mozilla-b2g28 on this patch if it's ready for uplift to v1.3.
Flags: needinfo?(bkelly)
Who can grant approval while Fabrice is on PTO this week?  Jason, do you know?
Flags: needinfo?(bkelly) → needinfo?(jsmith)
(In reply to Ben Kelly [:bkelly] from comment #84)
> Who can grant approval while Fabrice is on PTO this week?  Jason, do you
> know?

Rel man should be able to do it - either Preeti or Bhavana.
Flags: needinfo?(jsmith)
Comment on attachment 8398117 [details] [diff] [review]
Do not use skia for canvas larger than device screen. (v2)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Skia-accelerated canvases
User impact if declined: Apps that previously worked in 1.1 now OOM more easily due to the graphics resources needed for large canvases.
Testing completed: Verified on device for v1.3 and v1.5.
Risk to taking this patch (and alternatives if risky): Low.  Provided preference makes it easy to disable if a problem occurs.
String or UUID changes made by this patch:  None
Attachment #8398117 - Flags: approval-mozilla-b2g28?
Flags: needinfo?(praghunath)
Flags: needinfo?(bbajaj)
Sheriffs, I assume you will merge the actual commit rev, but please note the updated commit message is not reflected in the attachment here.
Flags: needinfo?(bbajaj)
Keywords: verifyme
Comment on attachment 8398117 [details] [diff] [review]
Do not use skia for canvas larger than device screen. (v2)

Requesting QA verification once this lands.
Attachment #8398117 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
Flags: needinfo?(praghunath)
Verified as fixed, v1.5, I am able to launch the game and the app runs and display graphics. Verified fixed on the latest Master Buri build v1.5:

v1.5 4/2 Environmental Variables:
Device: Buri 1.5 MOZ RIL
BuildID: 20140402040201
Gaia: 6274b3645b3d41df4aa5a55398b5d106edf1c528
Gecko: 4941a2ac0786
Version: 31.0a1
Firmware Version: V1.2-device.cfg

-

Verified as fixed, v1.4, I am able to launch the game and the app runs and display graphics. Verified fixed on the latest Buri build v1.4:

v1.4 4/2 Environmental Variables:
Device: Buri 1.4 MOZ RIL
BuildID: 20140402000202
Gaia: 1db7a4857faec427799de0827964128b445e80ea
Gecko: b1b159c885cc
Version: 30.0a2
Firmware Version: V1.2-device.cfg

-

Verified as fixed, v1.3, I am able to launch the game and the app runs and display graphics. Verified fixed on the latest Buri build v1.3:

v1.3 4/2 Environmental Variables:
Device: Buri 1.3 MOZ RIL
BuildID: 20140402164004
Gaia: c5cd3a11e91339163b351d50769eaca0067da860
Gecko: c6c7b01cdb8e
Version: 28.0
Firmware Version: V1.2-device.cfg

-

1) Verified against Master v1.5, changing status to verified.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Blocks: 999841
You need to log in before you can comment on or make changes to this bug.