Closed Bug 941160 Opened 8 years ago Closed 8 years ago

[B2G][MMS] Crash will occur when attaching images to an MMS

Categories

(Core :: Graphics, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED DUPLICATE of bug 939962
blocking-b2g koi+
Tracking Status
b2g-v1.2 --- affected
b2g-v1.3 --- affected

People

(Reporter: pfield, Assigned: snorp)

References

Details

(Keywords: memory-footprint, Whiteboard: burirun4 [MemShrink:P1])

Attachments

(2 files)

Attached file log.txt
Description:
When adding multiple images to an MMS a crash will occur, sending the user back to the home screen.

NOTE: This issue does not reproduce on the Buri 20131104 Base Image, on the latest Buri 1.2 build.

Repro Steps:
1) Updated Buri to V1.2 Com RIL Build ID: 20131118004001
2) Open the Messenger app, create a new message.
3) Select the attachment icon and select gallery.
4) Select an Image and tap on the check box to add to the MMS.
5) Repeat step 3-4 two or three times.

Actual:
Crash occurred when adding MMS

Expected:
Crash never occurs when adding MMS

Environmental Variables
Device: Buri ver 1.2 COM RIL
Build ID: 20131118004001
Gecko: http://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/3e8b854a992e
Gaia: 7a23f8c53ba97da9c63a7275b36d155b4526a639
Platform Version: 26.0
RIL Version: 01.02.00.019.102 
Firmware Version: V1.2_US_20131115

Notes:Does not generate crash report but app does crash and force close. 
Always takes more than one MMS to cause crash.

Repro frequency: 70% 
Link to failed test case: https://moztrap.mozilla.org/manage/case/8511/
See attached: Logcat
This looks graphics related based on the log.

Milan - Can someone on your team look into this to see why this is happening?

I'm going to assume defensively that this is a bug on our side, as the 1.1 base image doesn't trigger the correct gfx workflows anyways. This is likely a bug only seen when we're using the correct base image. I actually don't know how this could be a vendor bug, so I'd like input from gfx to see what they think is going on here.
blocking-b2g: --- → koi?
Component: Gaia::SMS → Graphics
Flags: needinfo?(milan)
Keywords: perf
Product: Firefox OS → Core
Whiteboard: burirun4 → burirun4 [MemShrink]
Sotaro, could you take a look if this is us or vendor issue?
Assignee: nobody → sotaro.ikeda.g
Flags: needinfo?(milan) → needinfo?(sotaro.ikeda.g)
(In reply to Milan Sreckovic [:milan] from comment #2)
> Sotaro, could you take a look if this is us or vendor issue?

I take a look.
Flags: needinfo?(sotaro.ikeda.g)
I tested on master b2g with pmem fallback ROM built from today's latest source. I also see the problem. from 'adb shell dmesg', Message app was killed by low memory killer.

<4>[  251.902831] select 636 ((Preallocated a), adj 10, size 4828, to kill
<4>[  251.902854] send sigkill to 636 ((Preallocated a), adj 10, size 4828
<4>[  253.421836] idle_stats_device: Overwriting samples
<4>[  288.474128] select 559 (Gallery), adj 10, size 7091, to kill
<4>[  288.474153] send sigkill to 559 (Gallery), adj 10, size 7091
<4>[  289.216199] select 424 (Homescreen), adj 8, size 5112, to kill
<4>[  289.216226] send sigkill to 424 (Homescreen), adj 8, size 5112
<4>[  307.115284] select 718 (Gallery), adj 10, size 8536, to kill
<4>[  307.115306] send sigkill to 718 (Gallery), adj 10, size 8536
<4>[  309.359411] select 759 (plugin-containe), adj 10, size 124, to kill
<4>[  309.359453] send sigkill to 759 (plugin-containe), adj 10, size 124
<4>[  333.999933] select 762 (Settings), adj 10, size 8153, to kill
<4>[  333.999956] send sigkill to 762 (Settings), adj 10, size 8153
<4>[  334.605773] select 894 (plugin-containe), adj 10, size 496, to kill
<4>[  334.605788] send sigkill to 894 (plugin-containe), adj 10, size 496
<4>[  338.606279] select 820 (Gallery), adj 10, size 7736, to kill
<4>[  338.606298] send sigkill to 820 (Gallery), adj 10, size 7736
<4>[  339.820094] select 765 (Homescreen), adj 8, size 4855, to kill
<4>[  339.820114] send sigkill to 765 (Homescreen), adj 8, size 4855
<4>[  340.338408] select 565 (Messages), adj 2, size 25995, to kill
<4>[  340.338429] send sigkill to 565 (Messages), adj 2, size 25995
(In reply to Jason Smith [:jsmith] from comment #1)
This is likely
> a bug only seen when we're using the correct base image. 

If we do not use correct base image (no fallback from pmem to ashmem), memory allocation failed because of out of pmem.
> 
> Notes:Does not generate crash report but app does crash and force close. 
> Always takes more than one MMS to cause crash.

This bug seems not crash, but the MMS killed by low memory killer because of out of memory.
pfield, can you attach also "adb shell dmesg" log?
Flags: needinfo?(pfield)
Recently, memory usage of 2d canvas was shrunk by Bug 920160 and Bug 927254. They decreases memory usage a lot. If there is no gralloc fallback from pmem to ashmem, gralloc usage hit the limit when the device became out of pmem. But the fallback enables more memory usage.
Attached file $ adb shell dmesg.txt
Flags: needinfo?(pfield)
From attachment 8336378 [details], Messages was killed by low memory killer.
When disable SkiaGL by disabling following, the problem did not happen.
 >pref("gfx.canvas.azure.backends", "skia");
 >pref("gfx.canvas.azure.accelerated", true);
(In reply to Sotaro Ikeda [:sotaro] from comment #11)
> When disable SkiaGL by disabling following, the problem did not happen.
>  >pref("gfx.canvas.azure.backends", "skia");
>  >pref("gfx.canvas.azure.accelerated", true);

That concludes this is a regression from Skia GL, which I think makes this a cut and dry blocker.
Sotaro, what happens with these settings:

pref("gfx.canvas.azure.backends", "skia");
pref("gfx.canvas.azure.accelerated", false);

as in, Skia, but not SkiaGL.

In general, I would expect us to run out of memory sooner with SkiaGL - after all, we have the caching enabled, and caching means more memory usage.  This is not necessarily a bug, just "we're using more memory".

It may be worth testing with caching turned off completely (or down to that 2MB minimum).  We're supposed to get back 2MB from bug 939392 anyway :)

George, Snorp, let's see if we can keep Skia/GL in 1.2, we know it gives us a lot of performance improvement.
Flags: needinfo?(snorp)
Flags: needinfo?(gwright)
(In reply to Milan Sreckovic [:milan] from comment #13)
> Sotaro, what happens with these settings:
> 
> pref("gfx.canvas.azure.backends", "skia");
> pref("gfx.canvas.azure.accelerated", false);

By the above setting, the STR was passed without the crash(not killed). GL/GLCache part seems to be a problem.
(In reply to Milan Sreckovic [:milan] from comment #13)
> Sotaro, what happens with these settings:
> 
> pref("gfx.canvas.azure.backends", "skia");
> pref("gfx.canvas.azure.accelerated", false);
> 
> as in, Skia, but not SkiaGL.
> 
> In general, I would expect us to run out of memory sooner with SkiaGL -
> after all, we have the caching enabled, and caching means more memory usage.
> This is not necessarily a bug, just "we're using more memory".
> 
> It may be worth testing with caching turned off completely (or down to that
> 2MB minimum).  We're supposed to get back 2MB from bug 939392 anyway :)
> 
> George, Snorp, let's see if we can keep Skia/GL in 1.2, we know it gives us
> a lot of performance improvement.

Right now 2MB is the minimum Skia texture cache size, but it's shared between all canvas instances. I'd recommend against making that much lower (and certainly not zero), as performance would be much worse than software because of constant uploads.

Assuming the patch from bug 927254 was in the build, I don't know what we can really do to reduce memory usage much more. One idea I've had in the past is to back immutable Skia bitmaps with ashmem gralloc buffers, which would remove the tex2d copy, but that's going to be significant work with a lot of edge cases I'd imagine.
Flags: needinfo?(snorp)
Whiteboard: burirun4 [MemShrink] → burirun4 [MemShrink:P1]
Milan, I might be better to re-assign this bug to a engineer knows well about SkiaGL. How do you thinks about it?
Flags: needinfo?(milan)
Severity: normal → major
Priority: -- → P1
Whiteboard: burirun4 [MemShrink:P1] → [c=memory p= s= u=] burirun4 [MemShrink:P1]
blocking-b2g: koi? → koi+
We need product to make the call on SkiaGL in 1.2.  See Bug 939962 as it is the same underlying cause.
Flags: needinfo?(clee)
Whiteboard: [c=memory p= s= u=] burirun4 [MemShrink:P1] → [c=memory p= s= u=1.2] burirun4 [MemShrink:P1]
Severity: major → blocker
We have a product decision - SkiaGL stays in.  James, I gave you this one and :gw280 bug 939962, though they are probably the same.  I don't want to make that assumption though.  Can you two please work together to see what the alternatives are.
Assignee: sotaro.ikeda.g → snorp
Flags: needinfo?(milan)
Does the patch for bug 946541 fix this?
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #19)
> Does the patch for bug 946541 fix this?

Putting qawanted to test this in a build with the patch from bug 946541 included (i.e. test this on a 12/6 1.2 build or later).
Keywords: qawanted
Keywords: perffootprint
Whiteboard: [c=memory p= s= u=1.2] burirun4 [MemShrink:P1] → burirun4 [MemShrink:P1]
As Milan guessed, this bug is probably related to bug 939962.  When the MMS app picks an image from the Gallery app, the gallery app offers the user the chance to crop the image.  And image cropping is done by the same part of the Gallery app that does image editing.

For bug 935273 I've made this more efficient, so that the full-size image is not decoded unless the user actually does a crop.

Now that 935273 has landed, I would expect that the OOM will now only occur if you actually crop the image after picking it.

If you can't solve the underlying issue, a possible workaround would be to modify the MMS app to set nocrop:true in the activity request it uses to pick an attachment.  This will tell the gallery to not allow cropping. That would be a UX change, however, so you'd want to get UX approval first, of course.
Actually, after reading the STR more carefully, I'm not sure whether the MMS app is OOMing because the gallery app is using too much memory or whether it OOMs when it tries to resize the image it just received. Resizing the image requires decoding it at full size which allocates a bunch of memory.

Another possible (last-resort) workaround would be to modify the pick activity so the gallery returns both the full-size image and the preview image.  The preview image is at least as big as the screen, and that might be about the size that is desired here.  It would probably only make sense with the nocrop:true fix mentioned above, however.
I can still repro this issue on Buri 1.2 Build ID: 20131206004002

Gaia   d48df33cbdae2063f12b09e3dca07ff7a2f2a3cc
SourceStamp fd0302696c57
BuildID 20131206004002
Version 26.0


(In reply to Jason Smith [:jsmith] from comment #20)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #19)
> > Does the patch for bug 946541 fix this?
> 
> Putting qawanted to test this in a build with the patch from bug 946541
> included (i.e. test this on a 12/6 1.2 build or later).
Keywords: qawanted
Peter

Please pitch in for product.

Moving to koi? till product decision is made.
blocking-b2g: koi+ → koi?
Flags: needinfo?(pdolanjski)
Based on comment #18, it sounds like product chimed in.  Milan, who made the SkiaGL decision?
Flags: needinfo?(pdolanjski) → needinfo?(milan)
(In reply to Peter Dolanjski [:pdol] from comment #25)
> Based on comment #18, it sounds like product chimed in.  Milan, who made the
> SkiaGL decision?

Andreas did. This was already discussed on b2g-release-drivers that we are moving forward with blocking on fixing the memory issues for Skia GL. Please move this back to koi+.
blocking-b2g: koi? → koi+
Flags: needinfo?(milan)
I commented in another bug that we could consider demoting to a software canvas under memory pressure. This would eliminate the GLContext and friends (SurfaceStream, etc) and also whatever GPU stuff Skia has. The problem with this is that in order to do the demotion, we have to allocate a software DrawTarget and then copy the contents from original one. This will obviously use even more memory, which is clearly not a great idea when we are already in a low memory situation. I thought about having an "emergency" demotion mode that would skip this copy, but I don't think it's a very good idea. Content would have no idea when this occurred, so it wouldn't have an opportunity to recover. We would need to add something similar to the "lost context" event that WebGL has.
Duplicate of this bug: 945774
Can we check this again on today's 1.2 build? The automation report is saying this is passing now, so I want to double check if this reproduces manually.
Keywords: qawanted
I can still repro this issue on the Buri 1.2 Build ID: 20131211004007

Gaia   096722a9e2510ecdfe45ba7382d7d50826b82feb
SourceStamp 43d7b300241a
BuildID 20131211004007
Version 26.0
Keywords: qawanted
We're at a bit of an end of the road here.  George is going to take the upstream Skia and see if it fixes the issue (they handle cache clearing better in the newer versions); either way, I don't think we have a 1.2 solution to this issue, short of disabling SkiaGL, at startup, on any phone < 256mb (or perhaps even < 384mb)
Flags: needinfo?(gwright)
James will try adding a flag that forces the canvas used as a part of the process to software instead, see if that saves us enough memory to survive the operation.
(In reply to Milan Sreckovic [:milan] from comment #32)
> James will try adding a flag that forces the canvas used as a part of the
> process to software instead, see if that saves us enough memory to survive
> the operation.

Hi Milan, Snorp.  any progress on the patch here?
There is a patch that adds a flag to force a software context on bug 884226. Just need one more r+ there and we can put that in. Once that is in we'll need an additional patch to set the 'gfx.canvas.willReadFrequently.enable' pref, and then the Gallery app will need to use that flag in its canvas.
Duplicate of this bug: 952094
Given Milan's email, this is a dupe of bug 939962 (same problem).
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 939962
Flags: needinfo?(clee)
You need to log in before you can comment on or make changes to this bug.