Closed Bug 805114 Opened 12 years ago Closed 12 years ago

[gallery] generating thumbnails for huge images causes OOM

Categories

(Firefox OS Graveyard :: Gaia::Gallery, defect, P3)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-basecamp:+)

RESOLVED FIXED
blocking-basecamp +

People

(Reporter: ttaubert, Assigned: djf)

References

Details

Attachments

(1 file)

I just did a 'make reset-gaia' and there's not too many photos on my sdcard. Open the gallery and wait 10s for it to parse the media files, it exits with an OOM. Pretty sure that worked last week with a lot more pictures.
blocking-basecamp: --- → ?
If the gallery loads successfully, swipe some images around in picture mode. It'll go OOM after before the 10th.
I found following log in dmesg, looks like Gallery app uses too much memory and get killed by kswapd.

<4>[10-26 03:41:05.482] [2521: kworker/u:1]huangjinyu temp_t9_7:46,time:7
<4>[10-26 03:41:05.932] [2880: Gallery]select 2860 (Homescreen), adj 6, size 6909, to kill
<4>[10-26 03:41:05.932] [2880: Gallery]send sigkill to 2860 (Homescreen), adj 6, size 6909
<3>[10-26 03:41:05.982] [17: kworker/0:1]TAOS: lux_timep->saturation=8100 hi=1445 lo=867
<4>[10-26 03:41:06.523] [25: kswapd0]select 5973 ((App)), adj 6, size 3895, to kill
<4>[10-26 03:41:06.523] [25: kswapd0]send sigkill to 5973 ((App)), adj 6, size 3895
<4>[10-26 03:41:07.824] [25: kswapd0]select 2880 (Gallery), adj 1, size 18982, to kill
<4>[10-26 03:41:07.824] [25: kswapd0]send sigkill to 2880 (Gallery), adj 1, size 18982
Per triage: blocking, as this will be a relatively common case.  There are likely related bugs on general slowness for this app, which may be related.
blocking-basecamp: ? → ---
ttaubert, or anyone else who has seen this bug...  Was it on otoro or unagi?

I wonder if this is related to #803049 and #803020?

This is a regression, obviously. What has changed? The switch from otoro to unagi.  Is there less free memory on the new device?  Maybe device storage changes?  Imagelib changes in gecko? Or, given that the camera is now producing rotated images, maybe that is what has changed, and we're getting much larger images (or much larger thumbnails) from the camera.
Assignee: nobody → dflanagan
blocking-basecamp: --- → +
Still reproducible on Unagi here. I have no idea what changed, would be good to have a couple of other people chime in that can reproduce it. My device is artificially limited to the 256mb the target device will have.

> $ adb shell cat /proc/meminfo | grep MemTotal
> MemTotal:         184096 kB
I can reproduce this bug on Otoro and the memory information in below.
> adb shell cat /proc/meminfo | grep MemTotal
> MemTotal:         172880 kB
Shih-Chiang, make sure you have the latest otoro kernel.  The amount of MemTotal should be approximately the same as Tim's unagi in comment 5.
@cjones, thanks for the reminder, I'll update my otoro. I happened to take a look at this bug, gallery app curently loads all the thumbnails from indexDB and append these thumbnails as inline <img>. This OOM problem will happened eventually as user storing more and more pictures on his phone.
I'll analysis the memory profile of gallery app and address the memory consumption problem.
Well I've got much more memory on my unagi, so I'm not going to be reproducing this bug: 

adb shell cat /proc/meminfo | grep MemTotal
MemTotal:         416468 kB

I thought there was a kernel update recently to cripple half the memory or something. If I flash a nightly, don't I get the latest kernel? 

If I can't get this issue sorted out I'll unassign myself from the bug, and let someone who has a chance of reproducing it take a crack.
(In reply to Shih-Chiang Chien [:schien] from comment #8)
> @cjones, thanks for the reminder, I'll update my otoro. I happened to take a
> look at this bug, gallery app curently loads all the thumbnails from indexDB
> and append these thumbnails as inline <img>. This OOM problem will happened
> eventually as user storing more and more pictures on his phone.
> I'll analysis the memory profile of gallery app and address the memory
> consumption problem.

You're right that the gallery stores all of its thumbnails in a single list and creates one img element for each one, without any attempt to load thumbnails on demand.  

However, there is something else going on here. The app is now OOMing with a smallish number of photos that used to work fine.  So we need to figure out whether there is less memory available for the app, or if the images are now taking up more memory than they used to.  I've taken this bug to investigate in case it is actually a bug in the app, but I suspect it is a gecko regression of some sort.
So here's a big part of the problem, I think:  the camera is capturing images at 480x640, and then embedding a 384x512 thumbnail inside that image. Almost half of each jpeg file we're saving is its EXIF thumbnail. And its more than 9 times bigger than what the gallery app wants.

I'll file a bug about the camera hardware or device driver, because we really shouldn't bother saving a thumbnail at all if it is going to be almost full size.

And meanwhile, I'll convert the gallery app to ignore the EXIF thumbnail data and just create its own, and we can see if that helps prevent this bug.
The linear list needs to be fixed too though.
Does it? Or is gecko smart enough to not decompress the jpeg data for images that are offscreen?  

I was wrong in #10: the app does not use individual <img> elements. These are just <li> tags with CSS background-image set.  I don't know whether background-image vs img element actually makes any difference.
I think we have a known bug where we decode even off screen images. At the very least we fetch them all. I would recommend doing this lazily. Generating all those img tags is never a good idea.
I've filed the big thumbnail bug as https://bugzilla.mozilla.org/show_bug.cgi?id=806055
There is an attempt at a workaround at https://github.com/mozilla-b2g/gaia/pull/6034

Tim: would you try out that patch and see if it resolves the bug for you?

I'm also going to ask you to review the patch for me.
Attachment #675817 - Flags: review?(ttaubert)
Tim: I forgot to mention that you'll have to do a make reset-gaia, or manually blow away the indexedDB database for the gallery app before you can test this patch.  It changes the way image thumbnails are generated, but if they're already stored in the database you won't see any change.
I just tried your patch with reset-gaia, the app still gets closed unfortunately (while scanning).
I can also do some investigations on my side if that helps. I'll do it later when I'm back.
@David, I got a SIGSEGV after applying your patch, here is the error I found in logcat.
 I/Gecko   (14060): [Child 14060] ###!!! ABORT: [PBlobStreamChild] abort()ing as a result: file /home/hellfire/workspace/github/B2G-dev/objdir-gecko/ipc/ipdl/PBlobStreamChild.cpp, line 364
Shih-Chiang: Wow! I wonder if doing a make-reset (which forces the gallery to rescan all photos at startup) would have caused a crash for you even without applying my patch...

Anyway, if the crash is reproducible, that is probably a new bug to report.  Or there are crasher flags to set on this one, I think.

Shih-Chiang: if you can reproduce the crash with my patch but cannot crash without, then I'd like to know that. Because the patch is supposed to reduce memory use, and I'd like to apply it even if it doesn't entirely fix this bug.

Tim: I'm sorry the patch didn't fix it for you. Would you mind reviewing that patch anyway? If we can determine that the patch does not cause Shih-Chiang's crash, I'd like to land it even if it does not fully resolve the OOM problem.

Meanwhile, I really need to find out why my unagi has so much memory and what to do about that.  Tim: did you build your own kernel?  Any idea why just flashing from nightly wouldn't be enough to get me the latest kernel version?
@David: SIGSEGV will disappear after removing the patch for gallery app, and re-applying this patch will make the SIGSEGV show up again.
Here is the memory profile of gallery app during first phone db initialization. Images actually consume a lot of memory. The STR on my otoro (with latest kernel) is to swipe the thumbnail view while gallery app is loading the thumbnails.

67.51 MB (100.0%) -- explicit
├──54.85 MB (81.26%) -- images
│  ├──54.85 MB (81.26%) -- content
│  │  ├──54.85 MB (81.26%) -- used
│  │  │  ├──48.01 MB (71.12%) ── uncompressed-heap
│  │  │  ├───6.84 MB (10.14%) ── raw
│  │  │  └───0.00 MB (00.00%) ── uncompressed-nonheap
│  │  └───0.00 MB (00.00%) ++ unused
│  └───0.00 MB (00.00%) ++ chrome
├───5.14 MB (07.62%) ── heap-unclassified
├───4.16 MB (06.16%) -- js-non-window
│   ├──2.87 MB (04.26%) -- compartments
│   │  ├──2.50 MB (03.71%) ++ non-window-global
│   │  └──0.37 MB (00.55%) ++ no-global/compartment(atoms)
│   └──1.28 MB (01.90%) ++ (2 tiny)
├───1.62 MB (02.40%) -- window-objects/top(app://gallery.gaiamobile.org/index.html, id=1)/active/window(app://gallery.gaiamobile.org/index.html)
│   ├──0.93 MB (01.38%) ++ js/compartment(app://gallery.gaiamobile.org/index.html)
│   └──0.69 MB (01.02%) ++ (4 tiny)
├───1.10 MB (01.63%) ── xpti-working-set
└───0.63 MB (00.93%) ++ (8 tiny)
Priority: -- → P3
(In reply to David Flanagan [:djf] from comment #13)
> Does it? Or is gecko smart enough to not decompress the jpeg data for images
> that are offscreen?  

Gecko is smart enough that it won't keep the decompressed data in memory for offscreen images. Though the images are discarded off a timer so the decompressed data can likely result in OOMs unless we have hooked up all the relevant memory pressure observers and notifications (I don't know the state of that work).

However even the compressed data can easily add up to a lot of data. Are we creating thumbnails for all images or are we actually loading the full image from the sdcard directly and just resizing on screen? Gecko is *not* smart enough to only keep the resized data in memory and fixing that is likely too hard to do in the v1 timeframe.
Jonas,

The gallery stores a thumbnail for each image in indexed db and uses that.  This particular thread of this bug has gotten off track, however: this bug is about the first run, which is when the gallery has to scan all images and either extract their EXIF thumbnails, or actually load the image into an img element and create a thumbnail.  

Something has changed in gecko's image memory management, because this process of creating thumbnails did not OOM in the past
Jonas,

Has gecko actually gotten better recently about discarding decompressed image data when images go offscreen?  If so that would explain #803049.
Whiteboard: [MemShrink]
(In reply to David Flanagan [:djf] from comment #27)
> Has gecko actually gotten better recently about discarding decompressed
> image data when images go offscreen?  If so that would explain #803049.

To the best of my knowledge no changes have happened here.
Attachment #675817 - Flags: review?(ttaubert) → review+
I just landed https://github.com/mozilla-b2g/gaia/pull/6034 to address the big thumbnail issue.  

It sounds like this will not fix this issue, but it does fix two other bugs.

Tim and Shih-Chiang, are you still seeing the OOM on startup?  Note that the version of the patch I landed no longer requires you to make reset-gaia.  And I switched from canvas.mozGetAsFile to the asynchronous canvas.toBlob(), so that might have changed the memory issues you were seeing before.
Still a very early OOM :(
Even when commenting out line 244 and 262 the OOM still occurs which is really odd because we don't show/render any thumbnails at all.

https://github.com/ttaubert/gaia/blob/f3fd8deab037c110e4e918894c227c279f6b0e9a/apps/gallery/js/gallery.js#L244
> Gecko is smart enough that it won't keep the decompressed data in memory for offscreen 
> images.

Just to be clear, this is bug 689623, which is definitely not fixed.
This needs to be worked around in Gaia, we can't get a Gecko fix for image downscaling on Aurora.  Generating thumbnails via canvas and using those is the solution.
Blocks: slim-fast
Whiteboard: [MemShrink]
Seems it's not displaying the images but parsing the meta data that makes us run OOM. This single piece (without all its even handlers) is enough to get the app killed:

> photodb = new MediaDB('pictures', metadataParser, {
>   mimeTypes: ['image/jpeg', 'image/png'],
>   version: 2
> });

Replacing the metadataParser with a dummy one "fixes" it.
This is really a bug in the mediadb design. I discussed this with djf earlier today. Mediadb should _never_ iterate over a large set of images and process all of them. Thats not how responsive web content should be structured. We need a continuation pattern here that processes image by image posting timeouts. This should happen in the background and the visible state is updated as changes are detected. djf, can you take this one?
@djf, OOM will not happened again on my otoro with the workaround in comment 29. However, it took about 2 minutes to construct mediadb for the first execution of gallery app (with 234 jpg images in my sdcard).
Here is the memory profile for gallery app, as you can see the |images| section is downsized to 3.65 MB. 
19.03 MB (100.0%) -- explicit
├───5.40 MB (28.37%) ++ js-non-window
├───3.65 MB (19.19%) -- images
│   ├──3.65 MB (19.19%) -- content
│   │  ├──3.65 MB (19.19%) -- used
│   │  │  ├──3.23 MB (16.98%) ── raw
│   │  │  ├──0.42 MB (02.20%) ── uncompressed-heap
│   │  │  └──0.00 MB (00.00%) ── uncompressed-nonheap
│   │  └──0.00 MB (00.00%) ++ unused
│   └──0.00 MB (00.00%) ++ chrome
├───3.36 MB (17.66%) ── heap-unclassified
├───3.00 MB (15.74%) -- dom/memory-file-data
│   ├──2.72 MB (14.31%) ++ large
│   └──0.27 MB (01.44%) ── small
├───1.94 MB (10.21%) -- window-objects/top(app://gallery.gaiamobile.org/index.html, id=1)/active/window(app://gallery.gaiamobile.org/index.html)
│   ├──1.05 MB (05.53%) -- js/compartment(app://gallery.gaiamobile.org/index.html)
│   │  ├──0.55 MB (02.89%) -- gc-heap
│   │  │  ├──0.34 MB (01.78%) ++ (6 tiny)
│   │  │  └──0.21 MB (01.12%) ── unused-gc-things
│   │  ├──0.29 MB (01.51%) ++ (6 tiny)
│   │  └──0.21 MB (01.13%) ── analysis-temporary
│   ├──0.64 MB (03.39%) -- layout
│   │  ├──0.38 MB (02.00%) ++ (6 tiny)
│   │  └──0.26 MB (01.39%) ── pres-shell
│   └──0.25 MB (01.30%) ++ (3 tiny)
├───1.10 MB (05.78%) ── xpti-working-set
├───0.34 MB (01.80%) ++ (7 tiny)
└───0.24 MB (01.25%) ++ layout
Andreas, 

You misunderstood my explanation of mediadb.  Everything is async.  When the gallery enumerates the known images, that is an async enumeration of IndexedDB, using a cursor, and it handles one image at a time.

And when mediadb scans for new files, that is an async enumeration of DeviceStorage.  Again, one file at a time, and I am careful to serialize the metadata parsing, so that mediadb does not end up generating thumbnails for multiple new photos at once.

What behavior are you seeing that makes you think there is a structural issue here? #803049 was a bug related to the display of thumbnails, and it made the app seem very sluggish because it just wasn't displaying the thumbnail images reliably. I've fixed that, however.

Now, the bug here is that the app crashes when first scanning Tim's photos.  As far as we know, no one else can reproduce it.
Tim:

Since you're the only one that can reproduce this, I wonder if it is specific to the images on your sdcard.  I could imagine a corrupt .jpeg file causing an OOM, for example, if it didn't have valid EXIF data.  Or maybe you just have many more files than others who have tested?

Are you able to bisect your set of images to find one that causes the OOM?  Or are you willing to zip up your images and email them to me?
Oh gosh, I feel a little stupid. To test the gallery I copied some pictures from my desktop to the sdcard and there was one image in there which should pretty obviously cause an OOM. It's a 3508x2480 PNG, 12.7MB in size. How do we guard against files like these? Ideally we should be able to display them but that probably wouldn't be a blocker as that's a very uncommon use case, no?
Easy to reproduce, just put this image on your sdcard:
http://upload.wikimedia.org/wikipedia/commons/b/b9/Mona_Lisa.PNG
Summary: [gallery] opening the gallery (first load) causes OOM → [gallery] generating thumbnails for huge images causes OOM
It's a little worse, even. All it takes to go OOM is a HQ image I took with my Galaxy S2 - 3264x2448, JPEG, 3.4MB.
Yeah, we can't crash on too-large images.  The best thing to do here is add a size limit.  We can tune the actual value of the limit over time.
Bug 796533 is about exactly that.  But I filed that bug before we'd actually encountered the problem. I'm going to close that one as a dupe of this one.
So in addition to never opening images that are too big to fit in memory, there are some other things I can do to reduce memory usage.

1) right now, since the embedded thumbnail is too big, I think I'm opening the full-size image and reducing that to the smaller size.  But now that the camera resolution is higher, it is worth finding the embedded thumbnail and reducing it to the micro-thumbnail that gets stored in mediadb.

2) If the embedded thumbnail in the file is as big as the screen, I should use it when displaying a photo, and only switch to the full-size image when the user zooms in. That should help a lot with memory usage while panning through images.  See bug 810105
Mike H reminds me that #2 above isn't possible. With our current camera driver the thumbnail size is fixed at 512x384. The API to change it exists but is not hooked up.
(In reply to David Flanagan [:djf] from comment #46)
> Mike H reminds me that #2 above isn't possible. With our current camera
> driver the thumbnail size is fixed at 512x384. The API to change it exists
> but is not hooked up.

I'm not thinking straight. That thumbnail size is bigger than the screen, so I could use it.
This bug was marked qawanted, but it's not clear what needs qa attention.  Removing qawanted.  If there needs to be qa done on this, please retag and remark what needs to be done.
Keywords: qawanted
Component: Gaia → Gaia::Gallery
David, what else needs to happen here to close this out?
The bug that was originally reported has been fixed.

The interesting discussion in this bug is about displaying the thumbnails in some kind of lazy list so they are not all decoded at once, and that is something that I'm working on in #809782, so we can close this now.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: