Closed Bug 1163225 Opened 9 years ago Closed 9 years ago

[Gallery] in low memory situations, thumbnail creation can fail and produce a black square

Categories

(Firefox OS Graveyard :: Gaia::Gallery, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(b2g-v2.2 affected, b2g-master affected)

RESOLVED WORKSFORME
Tracking Status
b2g-v2.2 --- affected
b2g-master --- affected

People

(Reporter: djf, Assigned: djf)

References

Details

While investigating bug 1161734, I found and filed bug 1163102 which described a low-memory situation where Gallery could fail to create a thumbnail for an image, and therefore fail to ever display that image.

But 1163102 was quickly closed because a patch for a related bug had just landed. Seth believes that his patch for that other bug will also fix the exception that 1163102 was about.

Unfortunately, if that exception just goes away, but the memory problems remain, we'll end up with images where the thumbnail is just an empty black rectangle.

Sicne 1163102 was closed, I'm filing this bug so we don't forget about the possibility.  Once 1161734 is resolved, we'll need to check to see whether we can reproduce this thumbnail creation problem. If gallery can generate blank thumbnails, then that is kind of a big problem and we'll need to figure something out.
Taking this bug so I don't forget about it, and setting needinfo so that No-Jun is aware of the possibility that bug 1163102 might still be lurking in a different form.
Assignee: nobody → dflanagan
Flags: needinfo?(npark)
(In reply to David Flanagan [:djf] from comment #1)
> Taking this bug so I don't forget about it, and setting needinfo so that
> No-Jun is aware of the possibility that bug 1163102 might still be lurking
> in a different form.

What would be the sure way to instigate this bug if it is still present?  will check it out on the next nightly test
Flags: needinfo?(dflanagan)
No-Jun,

With today's nightly, if bug 1163102 still reproduces (i.e. screenshots of the gallery app don't show up in the gallery when we're in a low-memory state) then that bug should probably be re-opened.

My concern is that if we follow the STR in bug 1163102 today, what we'll find is that instead of getting no thumbnail at all, we'll now get an incorrect blank thumbnail for the screenshot.

If that is happening, I really want to know about it.  Seth says that this kind of blank thumbnail has always been a possibility. I think that in the past the Gallery app would have OOMed instead of giving a blank thumbnail. Now, however, I'm worried that we might actually start seeing blank thumbnails.

If it is happening, I don't know if we can fix it, but I want to know about it. If we fix bug 1161739, then maybe it will also solve this problem.  Or maybe there isn't even really a problem here at all.  (I filed this bug preemptively even before I knew whether there would be a problem because I didn't want to forget to check for it.)
Flags: needinfo?(dflanagan)
Adding qawanted to verify this bug with following STR:

Load 100+ images into the phone over USB
Launch gallery, and check whether every image has thumbnails views properly decoded.  look out for black/white thumbnail views.
Exit Gallery app
Take 10+ screenshots by Power+VolDown combo outside the gallery app.
Launch Gallery app
Verify the screenshots are shown with proper thumbnails
Flags: needinfo?(npark)
Keywords: qawanted
When used STR in comment 4, grey thumbnails were shown for some of the images (174 images were loaded)

When those grey thumbnails were clicked, most displayed proper image, but after a few times, it was showing grey screen. upon exiting to the list view, previous grey thumbnails displayed icons. Repeating this process eventually killed the gallery app with OOM.

re-opening Bug 1163012 per Comment 2.

Version Info:
Build ID               20150511010202
Gaia Revision          6089234ace8b294a8feef064387604bae16254e3
Gaia Date              2015-05-10 13:57:12
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/d8420a541d1c
Gecko Version          40.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150511.042644
Firmware Date          Mon May 11 04:26:55 EDT 2015
Bootloader             L1TC000118D0
Sorry, that should be Bug 1163102
I closed bug 1163102 again. It does indeed look like that bug has been resolved. 

But it leaves us with this bug. In low-memory situations, I've verified that we can get completely blank thumbnails for screenshots.

STR:

1) put 100+ images on the phone
2) launch gallery and scroll quickly around until you see gray and missing thumbanils (the symptom of bug 1161734).  
3) While is this (low-memory) gray-thumbnail state, take screenshots *of the gallery app*
4) Scroll back to the top of the gallery and look at the thumbnail for the new screenshot.

Expected: the thumbnail will not be a solid black square. Or if it is it is because of bug 1161734 and scrolling around some will make it appear like a real thumbnail.

Actual: sometimes, the thumbnail image is actually just a permanently black square, because gecko did not have enough image memory to create the thumbnail.

This happened to me on 3 out of 8 attempts.
This is bad.

Seth says that there is no way that Gecko can guarantee that drawImage() will always succeed and we'll always get a thumbnail. The problem is that we don't have any way to detect failure, so we can't even try to create the thumbnail again some time. Once we've generated that all-black thumbnail, the user is stuck with it forever.

Maybe this has always been a possibility but in the past we would OOM before this occurred? I have never seen it before.

My hope is that if we resolve 1161734, then this will no longer be reproducible.

Even if this stops reproducing, this is an interesting bug because it highlights a shortcoming of the current Canvas API: I either need drawImage to guarantee that it will succeed, or I need it to OOM instead of failing, or I need to to tell me when it fails. As it stands I cannot be sure that I have generated a thumbnail.
Summary: [Gallery] make sure we always get a thumbnail → [Gallery] in low memory situations, thumbnail creation can fail and produce a black square
Keywords: qawanted
(In reply to David Flanagan [:djf] from comment #8)
> Maybe this has always been a possibility but in the past we would OOM before
> this occurred? I have never seen it before.

It has always been possible, but without image locking we'd discard all visible images if possible to free up memory, so it'd be harder to observe in practice.

We're currently holding on to way too many images in the Gallery app - it's confusing our image locking heuristics somehow - and I suspect this will again become hard to reproduce once this is fixed.

> Even if this stops reproducing, this is an interesting bug because it
> highlights a shortcoming of the current Canvas API: I either need drawImage
> to guarantee that it will succeed,

That's impossible. =(

> or I need it to OOM instead of failing,

We can't do that, either.

> or I need to to tell me when it fails.

This is the best bet. I think we should consider throwing an exception when drawImage fails due to low memory. It's not trivial to implement, though, as we need to pipe that information from quite a way away.
Depends on: 1166134
Depends on: 1166136
This is happening in practice on master (I haven't tested in 2.2). If I push 300 images to the phone with ./generateImages.sh 300 and then try to scan them, I'll find that many have permanently black thumbnails. The test images are PNGs, so the gallery also generates screen-size preview images for them. These preview images are also solid black. So the user sees a black thumbnail. When they click on it, they see a black image.  And when they zoom in they finally see the actual image.

I experimented by removing will-change:scroll-position from gallery.css to see if that would stop gecko from holding on to all of the images, but it didn't seem to help.

Here are things that would be interesting to explore:

1) modify the metadata parser so it creates and saves thumbnails for new images but does not display those new thumbnails.  That might tell us if we're running out of image memory because of the onscreen images or because of the offscreen images.

2) force the thumbanils to be created from newest to oldest order instead of oldest to newest like we often seem to get and see if creating them that way is easier on memory. The hypothesis here is that creating a thumbanil onscreen, then pushing it off screen with new thumbnails isn't allowing the images memory to be freed.  Making this change is something I'd like to do anyway to improve startup. It would probably have to be in mediadb and would probably be a change that was too risky to uplift to 2.2.
Noting that this affects 2.2 and 3.0.  In 2.2, however we don't get black thumbnail, we just never see any thumbnails at all because drawImage() throws an exception instead of failing silently.
I modified the gallery metadata parser to use a new img element for each thumbnail instead of reusing the same one, just to see if that would somehow affect memory management. It made no difference at all. 

I still need to try the ideas in comment #10
blocking-b2g: --- → 2.2?
Actually, I've already got bug 1164164 filed for this basic issue (with a different manifestation) on 2.2, and it has been made a 2.2+ blocker. So this one does not need to be nominated.
blocking-b2g: 2.2? → ---
David, as I suggested elsewhere, can you just read back from the canvas after calling drawImage() and see if all the pixels are black? For any thumbnail that isn't affected, you'll almost certainly only need to read one pixel to do this check. It's obviously not ideal, but this would be a route to fixing this on 2.2 and guaranteeing that at least the thumbnails will get regenerated the next time the user opens the gallery app.
I've does experiment #2 from comment 10, and have found that if I always force new photos to be added at the bottom of the document, then this bug never occurs.  And if I force new thumbnails to be added at the top, then it does occur.

If images are are decoded when inserted, then apparently gecko isn't good about releasing their memory when they get pushed out of the visible viewport.

I filed bug 1168985 about the gecko issue. As part of that bug, I found that I could workaround the issue by using window.scrollBy() to scroll down a pixel and then back up a pixel after inserting each image.

I'll see if that same workaround helps here.
The window.scrollBy() workaround does not seem to help.
Actually, the window isn't scrollable. But adjusting thumbnails.scrollTop() doesn't solve this bug either.
(In reply to David Flanagan [:djf] from comment #15)
> I filed bug 1168985 about the gecko issue. As part of that bug, I found that
> I could workaround the issue by using window.scrollBy() to scroll down a
> pixel and then back up a pixel after inserting each image.

We may actually be able to uplift a fix for bug 1168985 if Timothy agrees with the current patch in that bug. It's so simple that it should be safe to uplift, even at this late stage.
There is a patch attached to bug 1164164 that goes part way toward resolving this issue by changing the order in which mediadb processes newly found files.
Today's nightly includes Seth's patches for bugs 1169879, 1169880 and 1169881, and using that nightly I can no longer reproduce this bug.
Peter, could you (or other Qanalysts) whether this reproduces, and if not can we close the bug?  Thanks!
Flags: needinfo?(pbylenga)
Keywords: verifyme
This issue no longer reproduces on today's flame for me, adding qawanted to verify my testing to ensure it's not a one off.

Environmental Variables:
Device: Flame 3.0
Build ID: 20150602055237
Gaia: 6d477a7884273886605049b20f60af5c1583a150
Gecko: 9eae3880b132
Gonk: 040bb1e9ac8a5b6dd756fdd696aa37a8868b5c67
Version: 41.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:41.0) Gecko/41.0 Firefox/41.0
Flags: needinfo?(pbylenga)
Keywords: qawanted
This issue is no longer reproducing on Flame 3.0 or 2.2. I generated 300 images and the app created thumbnails for all the images.

Device: Flame (KK, full flashed, 319MB)
BuildID: 20150605010203
Gaia: 65369b217faac7d70c1a29100c4208c6d1db16e3
Gecko: 0496b5b3e9ef
Gonk: 040bb1e9ac8a5b6dd756fdd696aa37a8868b5c67
Version: 41.0a1 (3.0 Master) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:41.0) Gecko/41.0 Firefox/41.0

Device: Flame (KK, full flashed, 319MB)
BuildID: 20150604002503
Gaia: b96e657ce2822df5da5da1a8ba91c38ad3281bc9
Gecko: 273f8ee45c88
Gonk: bd9cb3af2a0354577a6903917bc826489050b40d
Version: 37.0 (2.2) 
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0

Closing bug as works for me.
Status: NEW → RESOLVED
Closed: 9 years ago
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Keywords: qawanted, verifyme
Resolution: --- → WORKSFORME
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
You need to log in before you can comment on or make changes to this bug.