Closed Bug 939962 Opened 6 years ago Closed 6 years ago

Gallery app will get OOM process killed upon saving an edited photo after flashing or resetting the device

Categories

(Firefox OS Graveyard :: Gaia::Gallery, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:koi+, firefox27 wontfix, firefox28 fixed, firefox29 fixed, b2g-v1.2 fixed, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

VERIFIED FIXED
1.3 C2/1.4 S2(17jan)
blocking-b2g koi+
Tracking Status
firefox27 --- wontfix
firefox28 --- fixed
firefox29 --- fixed
b2g-v1.2 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: nhirata, Assigned: djf)

References

Details

(Keywords: perf, regression, Whiteboard: [c=memory p= s= u=1.2] [MemShrink:P1])

Attachments

(4 files, 4 obsolete files)

Gaia:     1fb5d3168c95201d90bdb8f3c3d9023f77fffbed
Gecko:    41970e80c89c64def89f25f46b312dedadeafbf8
BuildID   20131116094609
Version   26.0
base build : v1.2_US_20131115.cfg
device : Buri

1. flash with new image or reset the device
2. launch the gallery app
3. edit a photo
4. try to save the edited photo

Expected: photo will save
Actual: OOM process killed before the save occurs.

adb shell dmesg shows:
<4>[ 1398.088629] select 758 (Homescreen), adj 8, size 4463, to kill
<4>[ 1398.088651] send sigkill to 758 (Homescreen), adj 8, size 4463
<4>[ 1398.786859] select 857 (Gallery), adj 2, size 27172, to kill
<4>[ 1398.786879] send sigkill to 857 (Gallery), adj 2, size 27172
blocking-b2g: --- → koi?
Note: 
1.1 OEM (Eng build) build :  US_20131104_root.cfg
1.2 OEM build : V1.2_US_20131115.cfg

** 1.2 OEM build : can't test; you can't reset with it and boot
** 1.2 OEM base build + custom 1.2 build : happens
** 1.1 OEM build : does not happen
** 1.1 OEM base build + same custom 1.2 build : doesn't save but does not OOM.
QA Contact: sparsons
This issue started to occur on the Buri 1.2 Build ID: 20130929004004

BuildID: 20130929004004
Gecko: 48faa2668dd8
Gaia: N/A
Version: 26.0a2


Last working Buri 1.2 Build ID: 20130928004001

Gaia   1e9470b9b6df630eddf1c4c8b25b3170ee786b0e
SourceStamp af05b2a644b8
BuildID 20130928004001
Version 26.0a2
Keywords: perf
Whiteboard: [MemShrink]
I was going to guess something related to a larger file from a leo device but this is on a buri.

David, you seem to always have good insight with these sorts of problems :)
blocking-b2g: koi? → koi+
Flags: needinfo?(dflanagan)
Whiteboard: [MemShrink] → [c=memory p= s= u=1.2] [MemShrink]
The regression range has nothing in it on the gaia side that landed on 9/28/2013.
The regression range isn't pointing out anything that appears to be the cause of this.
The regression range is around the time that bug 900399 landed, making photo size configurable. After that bug landed we started taking much larger photos. So if the test scripts that found this bug depends on first taking a photo with the camera and then rebooting and then editing the photo, the issue may just be that the photo is larger and uses more memory.

As a general thing, for image-related OOMs, it is important to include image size information in the bug report (not file size, but megapixels. Editing a 5megapixel image requires 40megabytes of memory.)

I'm not positive that I understand the bug report here. Editing an image after a reboot causes an OOM.  But if you try again after that it works?  What if you reboot and then wait a minute and then edit the photo. Does that work? If you launch another app and then launch gallery and edit the photo does that work?

I don't know anything about our startup process or about how the low memory killer works, but from the described symptoms, I'm guessing that after reboot there is some unkillable process running using up some of the available memory.  Maybe that plus the larger default photo size combine to cause this OOM?

So that is to say that I don't think this is a Gallery bug, but something about our startup process.

I recommend first thinking about how common this issue will be in the wild. If it only occurs within the first minute of a reboot, will users actually be affected by it?  If not, maybe we can WONTFIX and just say that the low memory killer is doing its job just like it is supposed to.

Second, I'd suggest that someone take a look at what processes are running right after reboot and how much memory they are using (adb shell b2g-ps does this, I think).  Is there something big using a lot of memory that could be made killable?

Finally, if this really is a koi+ bug that can't be solved by reducing memory usage elsewhere, then we may want to change the image editor so that it reduces the size of large images when it edits them. Maybe we start with a 5 megapixel image, but always save edited versions as 2 megapixels or less.
Flags: needinfo?(dflanagan)
Thinking about this a bit more, there is something I can try that might or might not help.

I said in the comment about that a 5 mp image takes 40mb to edit. Each pixel takes 4 byte, and when editing we have the original image and the edited copy in memory at the same time.

I could modify my code so that the in-memory copy of the original image was discarded as soon as a copy was made for in-placed editing.  There would be a memory spike when that copy was made and we would still need 8 bytes per pixel for a short time.  But for most of the editing and saving operations we would only be using 4 bytes per pixel.  Maybe that change would be enough to avoid the OOM.
I don't have a device that I can reproduce this crash on, so I can't test whether the attached patch actually fixes the bug. Jason or Sarah: could you check to see if this resolves the issue? (The patch is for master, and I'm assuming that the OOM reproduces there as well as on 1.2.  If you need a 1.2 pull request to test against, I can create one for you.)

John: I think it is probably worth applying this (small) patch to master even if it doesn't fix this particular bug, so I'd appreciate your review.
Attachment #8335558 - Flags: review?(johu)
Flags: needinfo?(sparsons)
Comment on attachment 8335558 [details] [review]
link to github pull request

Cool Patch! We should definitely land this patch to master.
Attachment #8335558 - Flags: review?(johu) → review+
David,

Assigning to you since you seem actively engaged on the bug.
Assignee: nobody → dflanagan
I meant to set needinfo for Jason as well to see if the patch resolves the bug.  Jason: please see comment 9.
Flags: needinfo?(jsmith)
Duplicate of this bug: 941738
Sarah mentioned in IRC that she was going to look into testing the patch.
Flags: needinfo?(jsmith)
I tested this patch on today's master and I can still reproduce the bug. 

BuildID: 20131121040202
Gecko: cf378dddfac8
Flags: needinfo?(sparsons)
(In reply to Sarah Parsons from comment #15)
> I tested this patch on today's master and I can still reproduce the bug. 
> 
> BuildID: 20131121040202
> Gecko: cf378dddfac8

Bummer!  Maybe the work I do to prevent OOM in bug 935273 will help here, but I don't want to count on that.

Sarah, could you answer this question from comment #7, please: "I'm not positive that I understand the bug report here. Editing an image after a reboot causes an OOM.  But if you try again after that it works?  What if you reboot and then wait a minute and then edit the photo. Does that work? If you launch another app and then launch gallery and edit the photo does that work?"

Sarah: also, it is critical to know how large (in pixels, not in bytes) the image you are trying to edit is. A memory-related bug report isn't complete without knowing how much memory is being used!

Mike: is there anyone from your team who could figure out what is going on at boot time that makes us get an OOM on this only after rebooting?

Without a working buri device (unless someone can help me root mine) it is going to be hard for me to reproduce this and get it fixed.  I still think we may need to consider WONTFIX for this bug, or if a fix is required, we may have to re-impose some draconian limits on the maximum image size for the camera and gallery apps.
Flags: needinfo?(sparsons)
Flags: needinfo?(mlee)
Hi David,

To answer the question in comment #7: Basically, this issue can be reproduced even without a fresh flash, reset or rebooting, although it is more intermittent (see bug 941738). Simply by tapping on an image from the gallery, editing, then attempting to save will prompt the OOM. I have seen it repro again right after getting the issue to occur once before it does not happen 100%. I did try to reproduce after opening another app first (Music and Calendar App) then the gallery and was able to get the issue to occur.

To answer your second question: The pictures were taken on using the Buri camera so the images are:
Width: 1536 pixels 
Height: 2048 pixels 
Total size: 237.5 kb - 371.8 kb

Hopefully this will help! Let me know if I can provide more info. 

(In reply to David Flanagan [:djf] from comment #16)
> (In reply to Sarah Parsons from comment #15)
> > I tested this patch on today's master and I can still reproduce the bug. 
> > 
> > BuildID: 20131121040202
> > Gecko: cf378dddfac8
> 
> Bummer!  Maybe the work I do to prevent OOM in bug 935273 will help here,
> but I don't want to count on that.
> 
> Sarah, could you answer this question from comment #7, please: "I'm not
> positive that I understand the bug report here. Editing an image after a
> reboot causes an OOM.  But if you try again after that it works?  What if
> you reboot and then wait a minute and then edit the photo. Does that work?
> If you launch another app and then launch gallery and edit the photo does
> that work?"
> 
> Sarah: also, it is critical to know how large (in pixels, not in bytes) the
> image you are trying to edit is. A memory-related bug report isn't complete
> without knowing how much memory is being used!
> 
> Mike: is there anyone from your team who could figure out what is going on
> at boot time that makes us get an OOM on this only after rebooting?
> 
> Without a working buri device (unless someone can help me root mine) it is
> going to be hard for me to reproduce this and get it fixed.  I still think
> we may need to consider WONTFIX for this bug, or if a fix is required, we
> may have to re-impose some draconian limits on the maximum image size for
> the camera and gallery apps.
Flags: needinfo?(sparsons)
I don't think a WONTFIX is a possible path forward here if this is a confirmed regression. We should back up & figure out what regressed this patch to cause this.

Note that moving to a 1.2 base image is triggering different gfx workflows, so there's always the possibility that this is a gfx fallout.
Milan - Can someone from the gfx team assist here to see if this is an issue on the gfx side or not?
Flags: needinfo?(milan)
Also - qawanted to get a logcat & dmesg log included on this bug
Keywords: qawanted
dmesg log: 

<4>[  157.738854] BATT: rcvd: 0, 2, 0, 2; 4197, 28
<6>[  157.738869] ygg *******led_on 481 on 1
<6>[  157.738884] resume update  msm_batt_update_psy_status 862  4197
<6>[  157.739528] nearly must  msm_batt_update_once 578	4197
<7>[  157.739553]  msm_batt_update_psy_status <<<<<<<<< ygg  878  
<4>[  163.561668] select 375 (Usage), adj 11, size 6585, to kill
<4>[  163.561688] select 483 (Video), adj 11, size 7001, to kill
<4>[  163.561704] send sigkill to 483 (Video), adj 11, size 7001
<4>[  164.581451] select 375 (Usage), adj 11, size 6301, to kill
<4>[  164.581476] send sigkill to 375 (Usage), adj 11, size 6301
<4>[  173.909819] select 517 (Settings), adj 10, size 6331, to kill
<4>[  173.909843] send sigkill to 517 (Settings), adj 10, size 6331
<4>[  177.011583] ls used greatest stack depth: 4584 bytes left
<4>[  187.709844] select 635 ((Preallocated a), adj 10, size 4414, to kill
<4>[  187.709864] send sigkill to 635 ((Preallocated a), adj 10, size 4414
<4>[  206.221246] select 432 (Homescreen), adj 8, size 5021, to kill
<4>[  206.221273] send sigkill to 432 (Homescreen), adj 8, size 5021
<4>[  207.441813] select 588 (Gallery), adj 2, size 25105, to kill
<4>[  207.441833] send sigkill to 588 (Gallery), adj 2, size 25105
<6>[  213.678638] android_work: android_work: sent missed DISCONNECT event
<7>[  213.681268] diag: USB disconnected
Attached file 939962_Logcat
Keywords: qawanted
Because the memory handling changes with the latest image, is this being tested with the image that supports fallback from pmem, or one of the older ones?  If it's with the newer one, it is to be expected now that we run out of memory more often - in the past, we would fail getting pmem, and things wouldn't draw correctly, but that would be the end of it.  Now, as we run out of pmem, we would start using "regular" memory, and that could lead to more OOM situations.
Flags: needinfo?(milan)
(In reply to Milan Sreckovic [:milan] from comment #23)
> Because the memory handling changes with the latest image, is this being
> tested with the image that supports fallback from pmem, or one of the older
> ones?  If it's with the newer one, it is to be expected now that we run out
> of memory more often - in the past, we would fail getting pmem, and things
> wouldn't draw correctly, but that would be the end of it.  Now, as we run
> out of pmem, we would start using "regular" memory, and that could lead to
> more OOM situations.

It's the new base image for 1.2. So that would explain why we're OOMing more in Gallery app more generally now.
We will never have a quality gallery app until we get bug 854795 fixed.  I can't emphasize that enough.
(In reply to Jason Smith [:jsmith] from comment #18)
> I don't think a WONTFIX is a possible path forward here if this is a
> confirmed regression. We should back up & figure out what regressed this
> patch to cause this.
> 

Now that Sarah has clarified that this is not just a transient issue after a reboot, I agree that WONTFIX is not an option.

> Note that moving to a 1.2 base image is triggering different gfx workflows,
> so there's always the possibility that this is a gfx fallout.

I'm probably not the right one to be working on it if that is the issue.

I have a working Buri now, but can't reproduce the bug. Probably I've got an old base image, since the buri has been sitting on my desk for a couple of months (just figured out how to root it today).

I can edit an image once, but then if I try to edit another image, the image editor is completely broken with this error in the console: Error: WebGL: Can't get a usable WebGL context" {file: "app://gallery.gaiamobile.org/js/ImageEditor.js

I've got some webgl patches in my review queue from benoit jacob. I'll try applying them. If they help with my bug, maybe they'll help with the crash others are seeing on their Buris.

And if that doesn't help, I'll probably just unassign myself from this.
Looking over Benoit's webgl patches, they appear to only improve execution speed of the image editor. I don't think they would change the memory usage. And because I can't reproduce the bug to begin with, I have not given them a try.

I'm unassigning myself from this bug based on the hypothesis in comment #23. If this is a low-level graphics issue, I wouldn't know how to fix it.  I can't think of any way to reduce the memory used when saving an edited image, so I don't think I can fix this in Gaia.

I'd recommend that someone try reverting the patch from bug 915001 to see if this bug still occurs with that removed.  (That was an 1.1hd+ bug that made a big change to the way edited images were saved.  Perhaps it caused this regression?)

Also, if anyone wants to try applying Benoit's patches in bug 934790 and bug 934813, maybe they would make a difference here.  (But I doubt it).
Assignee: dflanagan → nobody
Milan,

We will need your team's help to look into this bug. 

Thanks a lot!
Hema
Flags: needinfo?(milan)
Flags: needinfo?(mlee)
Sotaro, can you reproduce this problem?  Jerry, what about you? Should we follow David's advice and try backing out the patch from bug 915001 and see if it caused this problem?  It's late in the day in Toronto - Jerry, can you first test for the problem and try backing out bug 915001 and let us know how it went in the bug, and we can pick it up tomorrow in Toronto?  Thanks.
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(milan)
Flags: needinfo?(hshih)
ok, I found a buri device.
I will back out the bug 915001 first and check this issue.
Flags: needinfo?(hshih)
It might be better to compare SkiaGL ON and OFF.
Flags: needinfo?(sotaro.ikeda.g)
Component: Gaia::Gallery → Graphics
Product: Firefox OS → Core
Per discussion on a different bug, this could be another memory regression involving Skia GL.
(In reply to Milan Sreckovic [:milan] from comment #29)
> Sotaro, can you reproduce this problem? 

Yes, I reproduced the problem.
(In reply to Jason Smith [:jsmith] from comment #32)
> Per discussion on a different bug, this could be another memory regression
> involving Skia GL.

Yes, when I disabled SkiaGL, the crash did not happen.
Milan, who on your team can investigate this SkiaGL regression?
Flags: needinfo?(milan)
Whiteboard: [c=memory p= s= u=1.2] [MemShrink] → [c=memory p= s= u=1.2] [MemShrink:P1]
This test is base on gecko v1.2(with SkiaGL on) and Gaia:1fb5d3168c95201d90bdb8f3c3d9023f77fffbed.
The test device is buri.

Buri only has 8MB pmem and 180MB main memory.
Helix has 30MB pmem and 372MB main memory.

The test image is 1536*2048 pixel.

When we press the save button in gallery editor, we both get OOM in current and previous version of gallery.
As Sotaro said in comment 34, We might get OOM because SkiaGL use a lot of pmem(for SkiaGL cache??).


This is the memory footprint:
1.for current version gallery:
I/Gecko   ( 1283):  GrallocBufferActor::alloc size(1536,2048)  => 2d canvas(for jpeg encode)
E/memalloc( 1283):  Out of PMEM. Dumping PMEM stats for debugging
E/memalloc( 1283):  ------------- PRINT PMEM STATS --------------
E/memalloc( 1283):  Node 0 -> Start Address : 0 Size 19200 Free info 0
E/memalloc( 1283):  Node 1 -> Start Address : 19200 Size 19200 Free info 0
E/memalloc( 1283):  Node 2 -> Start Address : 38400 Size 12800 Free info 0
E/memalloc( 1283):  Node 3 -> Start Address : 51200 Size 12800 Free info 0
E/memalloc( 1283):  Node 4 -> Start Address : 64000 Size 10240 Free info 0
E/memalloc( 1283):  Node 5 -> Start Address : 74240 Size 2560 Free info 1
E/memalloc( 1283):  Node 6 -> Start Address : 76800 Size 19200 Free info 0
E/memalloc( 1283):  Node 7 -> Start Address : 96000 Size 19200 Free info 0
E/memalloc( 1283):  Node 8 -> Start Address : 115200 Size 12800 Free info 0
E/memalloc( 1283):  Node 9 -> Start Address : 128000 Size 134144 Free info 1
E/memalloc( 1283):  Total Allocated: 125440 Total Free: 136704
E/memalloc( 1283): ----------------------------------------------
E/memalloc( 1283): /dev/pmem: No more pmem available
W/memalloc( 1283): Falling back to ashmem
I/Gecko   ( 1283): GrallocBufferActor::alloc size(1024,1024)	=> webgl canvas(for tile image processing)
E/memalloc( 1283): /dev/pmem: Allocated buffer base:0x4a524000 size:4194304 offset:4096000 fd:141
E/memalloc( 1414): /dev/pmem: Mapped buffer base:0x495d5000 size:8290304 offset:4096000 fd:58
E/memalloc( 1283): /dev/pmem: Freeing buffer base:0x4a718000 size:327680 offset:2048000 fd:131
E/memalloc( 1414): /dev/pmem: Unmapping buffer base:0x45aed000 size:2375680 offset:2048000
I/Gecko   ( 1283): GrallocBufferActor::alloc size(320,426)
E/memalloc( 1283):  Out of PMEM. Dumping PMEM stats for debugging
E/memalloc( 1283):  ------------- PRINT PMEM STATS --------------
E/memalloc( 1283):  Node 0 -> Start Address : 0 Size 19200 Free info 0
E/memalloc( 1283):  Node 1 -> Start Address : 19200 Size 19200 Free info 0
E/memalloc( 1283):  Node 2 -> Start Address : 38400 Size 12800 Free info 0
E/memalloc( 1283):  Node 3 -> Start Address : 51200 Size 12800 Free info 0
E/memalloc( 1283):  Node 4 -> Start Address : 64000 Size 12800 Free info 1
E/memalloc( 1283):  Node 5 -> Start Address : 76800 Size 19200 Free info 0
E/memalloc( 1283):  Node 6 -> Start Address : 96000 Size 19200 Free info 0
E/memalloc( 1283):  Node 7 -> Start Address : 115200 Size 12800 Free info 0
E/memalloc( 1283):  Node 8 -> Start Address : 128000 Size 131072 Free info 0
E/memalloc( 1283):  Node 9 -> Start Address : 259072 Size 3072 Free info 1
E/memalloc( 1283):  Total Allocated: 246272 Total Free: 15872
E/memalloc( 1283): ----------------------------------------------
E/memalloc( 1283): /dev/pmem: No more pmem available
W/memalloc( 1283): Falling back to ashmem


2.for previous version gallery(back out bug 915001):
I/Gecko   ( 1559): GrallocBufferActor::alloc size(1536,2048)    => webgl canvas(processing whole image)
E/memalloc( 1559):  Out of PMEM. Dumping PMEM stats for debugging
E/memalloc( 1559):  ------------- PRINT PMEM STATS --------------
E/memalloc( 1559):  Node 0 -> Start Address : 0 Size 19200 Free info 0
E/memalloc( 1559):  Node 1 -> Start Address : 19200 Size 19200 Free info 0
E/memalloc( 1559):  Node 2 -> Start Address : 38400 Size 12800 Free info 0
E/memalloc( 1559):  Node 3 -> Start Address : 51200 Size 12800 Free info 0
E/memalloc( 1559):  Node 4 -> Start Address : 64000 Size 10240 Free info 0
E/memalloc( 1559):  Node 5 -> Start Address : 74240 Size 2560 Free info 1
E/memalloc( 1559):  Node 6 -> Start Address : 76800 Size 19200 Free info 0
E/memalloc( 1559):  Node 7 -> Start Address : 96000 Size 19200 Free info 0
E/memalloc( 1559):  Node 8 -> Start Address : 115200 Size 12800 Free info 0
E/memalloc( 1559):  Node 9 -> Start Address : 128000 Size 134144 Free info 1
E/memalloc( 1559):  Total Allocated: 125440 Total Free: 136704
E/memalloc( 1559): ----------------------------------------------
E/memalloc( 1559): /dev/pmem: No more pmem available
W/memalloc( 1559): Falling back to ashmem
E/memalloc( 1559): /dev/pmem: Freeing buffer base:0x4a718000 size:327680 offset:2048000 fd:131
E/memalloc( 1685): /dev/pmem: Unmapping buffer base:0x45842000 size:2375680 offset:2048000
I/Gecko   ( 1559): GrallocBufferActor::alloc size(320,426)
E/memalloc( 1559): /dev/pmem: Allocated buffer base:0x4a524000 size:573440 offset:4096000 fd:105
E/memalloc( 1685): /dev/pmem: Mapped buffer base:0x47200000 size:4669440 offset:4096000 fd:38
E/memalloc( 1559): /dev/pmem: Freeing buffer base:0x4a90c000 size:573440 offset:4096000 fd:105
E/memalloc( 1685): /dev/pmem: Unmapping buffer base:0x47200000 size:4669440 offset:4096000
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #35)
> Milan, who on your team can investigate this SkiaGL regression?

It's a product decision at this point.  SkiaGL uses more memory, as we cache.  This is why canvas faster with it.

Similar conversation in bug 941160 - this is the same underlying issue.
Where is that decision being made?  We already use a ton of memory for images ... not really thrilled about adding more unless we're winning a ton of perf.
Chris, we need a product decision on SkiaGL see Milan's comment 37.
Flags: needinfo?(clee)
Priority: -- → P1
See Also: → 94116
We have a product decision - SkiaGL stays in.  George, I gave you this one and :snorp bug 941160, though they are probably the same.  I don't want to make that assumption though.  Can you two please work together to see what the alternatives are.

Also, can somebody confirm that we're killing the gallery app because of low memory pressure, and that is what ultimately causes the symptom?
Assignee: nobody → gwright
Flags: needinfo?(milan)
(In reply to Milan Sreckovic [:milan] from comment #40)
> Also, can somebody confirm that we're killing the gallery app because of low
> memory pressure, and that is what ultimately causes the symptom?

See comment 0?
See Also: 94116
I wonder if I should have tried applying Benoit's patch from bug 934813 to see if it resolves this.  That patch makes the webgl more performant.  I don't know if it has any impact on memory or not.  Perhaps one of the people working on this bug now will know whether it would make a difference.
Preeti reminded me about the Gaia patch from comment #9.

I've landed it to master: https://github.com/mozilla-b2g/gaia/commit/08d96ee9678bef48a4540470ed96d32e962ff557

This does not reduce peak memory usage, but it prevents us from holding an extra copy of the image data while an editing file is being saved, so it might do some good.  The bug remains open, however.
And uplifted to v1.2: https://github.com/mozilla-b2g/gaia/commit/de86074dae1500dcdf2ff1f3ffdf535188527c80

I still can't reproduce this bug (I think my Buri device has old firmware or something) so I can't test whether this makes any difference.  Based on comment #15, however, I'm assuming that this will not fix the issue.  

Once we do fix the issue, however, the patch I just landed might prevent other apps from being killed, I suppose.
Status: NEW → ASSIGNED
Because 1.2 is going to be finished, should we mark this bug from koi+ to 1.3+? Thanks.
QA Wanted - Can someone retest this to see if the recent skia gl & above patch stops the reproduction of this bug?
Keywords: qawanted
This issue still reproduces on Buri 1.2 Build ID: 20131211004007

Gaia   096722a9e2510ecdfe45ba7382d7d50826b82feb
SourceStamp 43d7b300241a
BuildID 20131211004007
Version 26.0

This issue still reproduce on the Buri 1.3 Build ID:20131211004003

Gaia   7a2ccae2a546ac4d981d250272dafa630c926422
SourceStamp 6bb84d0bc170
BuildID 20131211004003
Version 28.0a2
Keywords: qawanted
With a trunk build on a Hamachi device (I believe it's 256MB) I can't reproduce this bug using our updated skia from a week or two ago. I don't have a Buri device to test on.
(In reply to George Wright (:gw280) from comment #48)
> With a trunk build on a Hamachi device (I believe it's 256MB) I can't
> reproduce this bug using our updated skia from a week or two ago. I don't
> have a Buri device to test on.

Note - Hamachi and Buri are the same device.

What happens if you test this using the latest changes?
By latest changes you mean v1.2 branch or master branch?
So I just tested with master branch and also can't reproduce. I will now test with 1.2...
I can't reproduce with v1.2 branch either, so I have no idea what's going on :(
Perhaps your hamachi/buri has an old base image that doesn't support fallback?
How do I update it?
Normal ROM update recommend to update only gecko and gaia. It can not update correctly to new one. To update correctly, you need to use vendor proprietary tool. I can update to a new base image by using it.
Hi all,
Can we add a "off-screen" option for off-screen image processing when we create 2d or 3d canvas context?
If we use off-screen option, the canvas will not show on the screen.
We can only use single buffer for optimizing the memory usage.
Flags: needinfo?(snorp)
Jerry,

Please reach out to a QA contact to help with set up.
Flags: needinfo?(hshih)
Hi Preeti,
Do I need to discuss the "off-screen" option with QA?
Flags: needinfo?(hshih)
I think some wires got crossed here.  Preeti, if you need any clarification on this, let me know.  Jerry, work with Snorp and Vlad directly on this, and let's just post the conclusions to the bug.
I add the log at:
http://dxr.mozilla.org/mozilla-central/source/gfx/gl/SharedSurfaceGralloc.cpp#57

In gallery, we will just create "one buffer" for 2d canvas(using SkiaGL) and 3d canvas.

I have found some addition buffer alloc in gallery. I will midify the gallery code and review the OOM problem.
Flags: needinfo?(snorp)
(In reply to Jerry Shih[:jerry] from comment #61)
> I add the log at:
> http://dxr.mozilla.org/mozilla-central/source/gfx/gl/SharedSurfaceGralloc.
> cpp#57
> 
> In gallery, we will just create "one buffer" for 2d canvas(using SkiaGL) and
> 3d canvas.
> 
> I have found some addition buffer alloc in gallery. I will midify the
> gallery code and review the OOM problem.

The addition two buffer allocation that were related to WebGL came from below lines.

http://dxr.mozilla.org/mozilla-central/source/content/canvas/src/WebGLContext.cpp#397
buffer allocation caused by publishframe()

http://dxr.mozilla.org/mozilla-central/source/content/canvas/src/WebGLContext.cpp#400
buffer allocation caused by Resizeoffscreen()

Need to confirm we still have OOM problem if we save above redundant buffer allocation.
Attached patch 0001-add-debug-log.patch (obsolete) — Splinter Review
gallery app debug message
remove unnecessary canvas size assignment.

The assignment operation will call resize() in b2g, and create a new buffer in this function.
We should not create a new buffer in this situation.
This is the gallery app image processing canvas memory footprint. 
The test image size is 384*640.

If we remove the unnecessary zero size assignment, we will not see the 1024*1 and 1024*1024 buffer alloc.
It juse reduces a little memory usage.
We need to check whether we still get OOM problem.
We also need to fix the resize() in b2g.

1.
Apply 0001-add-debug-log.patch:

I/Gecko   ( 8138): bignose gralloc surface:(354,395)
I/Gecko   ( 8138): bignose gralloc surface:(354,395)
I/Gecko   ( 8138): bignose gralloc surface:(237,395)
I/Gecko   ( 8138): bignose gralloc surface:(354,395)
I/Gecko   ( 8138): bignose gralloc remove surface:(237,395)
I/GeckoDump( 8138): bignose start image processing
I/Gecko   ( 8138): bignose gralloc surface:(384,640)
I/Gecko   ( 8138): bignose gralloc surface:(1024,1024)
I/GeckoDump( 8138): bignose start tile processing
I/GeckoDump( 8138): bignose finish tile processing
I/Gecko   ( 8138): bignose gralloc surface:(1024,1024)              //unnecessary alloc
I/Gecko   ( 8138): bignose gralloc surface:(1024,1)                 //unnecessary alloc
I/GeckoDump( 8138): bignose end image processing
I/GeckoDump( 8138): bignose save encoded file
I/Gecko   ( 8138): bignose gralloc remove surface:(384,640)
I/Gecko   ( 8138): bignose gralloc surface:(285,285)

2.
Apply 0001-add-debug-log.patch and 0002-remove-unnecessary-assignment.patch
remove unnecessary canvas size assignment(it cause Resize() in b2g)

I/Gecko   ( 9537): bignose gralloc surface:(354,395)
I/Gecko   ( 9537): bignose gralloc surface:(354,395)
I/Gecko   ( 9537): bignose gralloc surface:(237,395)
I/Gecko   ( 9537): bignose gralloc surface:(354,395)
I/GeckoDump( 9537): bignose start image processing
I/Gecko   ( 9537): bignose gralloc surface:(384,640)
I/Gecko   ( 9537): bignose gralloc surface:(1024,1024)
I/GeckoDump( 9537): bignose start tile processing
I/GeckoDump( 9537): bignose finish tile processing
I/GeckoDump( 9537): bignose end image processing
I/GeckoDump( 9537): bignose save encoded file
I/Gecko   ( 9537): bignose gralloc remove surface:(237,395)
I/Gecko   ( 9537): bignose gralloc surface:(285,285)
We can use "webgl_context.getExtension('WEBGL_lose_context').loseContext()" to destroy the webgl context explicitly instead of waiting for the GC.

But I get crash when I call this function.
What kind of crash are you getting? OOM or something worse?
Group: gfx-core-security
(In reply to Andreas Gal :gal from comment #68)
> What kind of crash are you getting? OOM or something worse?

The crash is come from bug 893304.
We can use "webgl_context.getExtension('WEBGL_lose_context').loseContext()" now.
I'm using master branch for testing. It is not affect the koi code base.
Should we remove the gfx-core-security label?


This is the alloc footprint with webgl_context.getExtension('WEBGL_lose_context').loseContext().
We destroy the webgl context explicitly. 

I/Gecko   (18213): bignose gralloc surface:(354,395)
I/Gecko   (18213): bignose gralloc surface:(354,395)
I/Gecko   (18213): bignose gralloc surface:(237,395)
I/Gecko   (18213): bignose gralloc surface:(354,395)
I/GeckoDump(18213): bignose start image processing
I/Gecko   (18213): bignose gralloc surface:(384,640)
I/Gecko   (18213): bignose gralloc surface:(1024,1024)
I/GeckoDump(18213): bignose start tile processing
I/GeckoDump(18213): bignose finish tile processing
I/GeckoDump(18213): bignose we have handler
I/Gecko   (18213): bignose gralloc remove surface:(1024,1024)         //for tile processing
I/GeckoDump(18213): bignose end image processing
I/GeckoDump(18213): bignose save encoded file
I/GeckoDump(18213): bignose we have handler
I/Gecko   (18213): bignose gralloc remove surface:(354,395)           //for editing preview
I/Gecko   (18213): bignose gralloc remove surface:(354,395)
I/Gecko   (18213): bignose gralloc remove surface:(354,395)
I/Gecko   (18213): bignose gralloc remove surface:(237,395)
I/Gecko   (18213): bignose gralloc surface:(285,285)
use webgl lose_context extension
Group: gfx-core-security
I create bug 951812 for reducing memory usage in gallery app.
Hi Jerry, did you need any help here from others on this bug?  We're still looking for this to land cleanly before tomorrow, 12/20.  So if you need engineering managers to loan you a resource, please just ask.  This is a high priority bug.  thanks!
Flags: needinfo?(hshih)
(In reply to Tony Chung [:tchung] from comment #72)
> Hi Jerry, did you need any help here from others on this bug?  We're still
> looking for this to land cleanly before tomorrow, 12/20.  So if you need
> engineering managers to loan you a resource, please just ask.  This is a
> high priority bug.  thanks!

Bug 951812 just reduce a little memory usage.(comment 66)
I am not sure that will solve this OOM issue.
Flags: needinfo?(hshih)
Duplicate of this bug: 941160
If a gaia person wants to try the patch in bug 884226 (which I just pushed to inbound), that would be helpful.

You'll need to set the 'gfx.canvas.willReadFrequently.enable' pref and also modify the getContext() call for your image processing canvas to be something like: canvas.getContext('2d', { willReadFrequently: true })

This will force the creation of a software canvas, which may save us enough memory to avoid crashing.
Let's move this to Gaia now, and take the Comment 75 and run with it.
Assignee: gwright → nobody
Component: Graphics → Gaia::Gallery
Product: Core → Firefox OS
(In reply to Milan Sreckovic [:milan] from comment #76)
> Let's move this to Gaia now, and take the Comment 75 and run with it.

We can't do that. bug 884226 isn't part of 1.2, so we can't use that solution as it stands right now.
Component: Gaia::Gallery → Graphics
Product: Firefox OS → Core
Version: unspecified → 26 Branch
Discussed in meeting - we need to + bug 884226 & fix this in the gallery by making use of the work in bug 884226.
Component: Graphics → Gaia::Gallery
Product: Core → Firefox OS
Version: 26 Branch → unspecified
Hema - The relevant gecko patches are ready to be used by the gallery. Can you find someone on the media apps side to implement the relevant gaia patch here?
Flags: needinfo?(hkoka)
Assignee: nobody → dflanagan
Flags: needinfo?(hkoka)
Taking this bug to see if I can take it from here.  I'm worried about the "may" in "may save us enough memory to avoid crashing" part of comment 75, but I guess we need to try it and see.

IIUC, I need a gecko patch to enable the pref for b2g, and then a Gaia patch to set the willReadFrequently property when I create the 2d canvas context.
(In reply to David Flanagan [:djf] from comment #80)
> ...
> 
> IIUC, I need a gecko patch to enable the pref for b2g, and then a Gaia patch
> to set the willReadFrequently property when I create the 2d canvas context.

Correct.
James,

I'm asking for your review of this to make sure I'm setting the pref correctly
Attachment #8356306 - Flags: review?(snorp)
James,

Is this one-line patch all that is needed in Gaia?  I'm setting your new flag on the one big 2D canvas used when saving an edited image.

[Also marking Jerry's various attachments as obsolete, since he has moved those to bug 951812]
Attachment #8349266 - Attachment is obsolete: true
Attachment #8349267 - Attachment is obsolete: true
Attachment #8349269 - Attachment is obsolete: true
Attachment #8349545 - Attachment is obsolete: true
Attachment #8356307 - Flags: review?(snorp)
I've attached a gecko patch to enable the new willReadFrequently flag for b2g and a gaia patch to use that new flag when saving an image.

Unfortunately, I am unable to test my patches because, as noted above, I've never been able to reproduce this bug.  For me, saving an edited image always fails with these lines:

[JavaScript Warning: "Error: WebGL: Can't get a usable WebGL context" {file: "app://gallery.gaiamobile.org/js/ImageEditor.js" line: 1396}]
[JavaScript Error: "TypeError: gl is null" {file: "app://gallery.gaiamobile.org/js/ImageEditor.js" line: 1399}]

That is, I get a webgl error in the console before I get an OOM. I'm assuming that this is because my buri (or is it a hamachi?) has out of date firmware, and I have no way to update it.

So unassigning myself and setting needinfo for Jason. Jason: is there anyone in QA that can test these two patches to see if they fix the bug?
Assignee: dflanagan → nobody
Flags: needinfo?(jsmith)
Sure.

For the tester doing this, you need to do the following on the latest 1.4 build flashed from pvtbuilds:

1. git clone https://github.com/davidflanagan/gaia.git
2. cd gaia
3. git checkout bug939962
4. MOZILLA_OFFICIAL=1 make production

After setting up this build, you then need to add the preference on your device (https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#Changing_preferences):

pref("gfx.canvas.willReadFrequently.enable", true);

Then, run the STR seen in this bug & the dupe.
Flags: needinfo?(jsmith)
Keywords: qawanted
Comment on attachment 8356306 [details] [diff] [review]
gecko patch to set the pref in b2g.js

Yup, that looks right.
Attachment #8356306 - Flags: review?(snorp) → review+
Comment on attachment 8356307 [details] [review]
one-line gaia patch to use willReadFrequently flag

Looks good, but is it for sure the only place you need the flag?
Attachment #8356307 - Flags: review?(snorp) → review+
(In reply to Jason Smith [:jsmith] from comment #85)
> Sure.
> 
> For the tester doing this, you need to do the following on the latest 1.4
> build flashed from pvtbuilds:
> 
> 1. git clone https://github.com/davidflanagan/gaia.git
> 2. cd gaia
> 3. git checkout bug939962
> 4. MOZILLA_OFFICIAL=1 make production
> 
> After setting up this build, you then need to add the preference on your
> device
> (https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#Changing_preferences):
> 
> pref("gfx.canvas.willReadFrequently.enable", true);
> 
> Then, run the STR seen in this bug & the dupe.


I tested using the steps in this comment using a Buri device and the following build:

Gaia   22c9d906d78ab9945d92eef6a7973a5a0d56f397
SourceStamp 14ac61461f2a
BuildID 20140106040201
Version 29.0a1
base build : v1.2_US_20131115.cfg

I performed a number of manipulations editing various photos in the gallery, some taken with the camera and some that were not, and I did not see the OOM issue noted in the initial comment. I also tested attaching various gallery images to MMS messages and did not encounter any issues.
Looks like this is good to land then.
Keywords: qawanted
David - We're good to land here, right? If so, can you land the gecko patch, then the gaia patch?
Flags: needinfo?(dflanagan)
(In reply to Jason Smith [:jsmith] from comment #90)
> David - We're good to land here, right? If so, can you land the gecko patch,
> then the gaia patch?

David is out this morning. If we are good from the testing front, he can help land his changes once he is back this afternoon (EST)

Thanks
Hema
I'll put checkin-needed then to get this on the sheriff's radar for landing then.
Keywords: checkin-needed
Attachment #8356306 - Attachment is patch: true
Adding leave open since the gaia patch will need to land after the gecko patch
Whiteboard: [c=memory p= s= u=1.2] [MemShrink:P1] → [c=memory p= s= u=1.2] [MemShrink:P1][leave open]
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #87)
> Comment on attachment 8356307 [details] [review]
> one-line gaia patch to use willReadFrequently flag
> 
> Looks good, but is it for sure the only place you need the flag?

That's the only large 2D canvas involved in the 'save edited image' flow, so probably the only place directly relevant to this bug.

It could be that there are other places that would benefit from the use of this flag, but since I have basically no idea what it does, I don't really know for sure.  We use offscreen canvases all over the place in Gaia for resizing images.  Basically canvas.createContext(); context.drawImage(); canvas.toBlob();  Would 'willReadFrequently' make that image resizing process faster or less memory intensive? If so, there are probably at least 10 other places I'd use it.  (But in a separate bug).
Flags: needinfo?(dflanagan) → needinfo?(snorp)
Landed gaia patch to master: https://github.com/mozilla-b2g/gaia/commit/bb2d680da76b31fa4e33e00b07af0df3d0285fa2

It still needs uplift to 1.3 and 1.2, but I'll let a sheriff do this after the gecko patch gets uplifted.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Whiteboard: [c=memory p= s= u=1.2] [MemShrink:P1][leave open] → [c=memory p= s= u=1.2] [MemShrink:P1]
https://hg.mozilla.org/releases/mozilla-aurora/rev/55197a1aa5af
https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/31392cf7e185

Leaving status flags set to affected for the Gaia patches.
Assignee: nobody → dflanagan
Target Milestone: --- → 1.3 C2/1.4 S2(17jan)
(In reply to David Flanagan [:djf] from comment #96)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #87)
> > Comment on attachment 8356307 [details] [review]
> > one-line gaia patch to use willReadFrequently flag
> > 
> > Looks good, but is it for sure the only place you need the flag?
> 
> That's the only large 2D canvas involved in the 'save edited image' flow, so
> probably the only place directly relevant to this bug.
> 
> It could be that there are other places that would benefit from the use of
> this flag, but since I have basically no idea what it does, I don't really
> know for sure.  We use offscreen canvases all over the place in Gaia for
> resizing images.  Basically canvas.createContext(); context.drawImage();
> canvas.toBlob();  Would 'willReadFrequently' make that image resizing
> process faster or less memory intensive? If so, there are probably at least
> 10 other places I'd use it.  (But in a separate bug).

You're only using it for scaling? It's a little less clear cut there, but it could be better to use this flag for that as well.
Flags: needinfo?(snorp)
This bug was partially uplifted.

Uplifted bb2d680da76b31fa4e33e00b07af0df3d0285fa2 to:
v1.3: d578331502a31a52d4a3f356ab5e96c5b1e847a8

Commit bb2d680da76b31fa4e33e00b07af0df3d0285fa2 didn't uplift to branch v1.2
Duplicate of this bug: 940729
Duplicate of this bug: 940729
Duplicate of this bug: 956226
tested and working
1.5 
Gecko fa7c5ad
Gaia 7c73c66
Status: RESOLVED → VERIFIED
Flags: needinfo?(clee)
You need to log in before you can comment on or make changes to this bug.