Last Comment Bug 862970 - don't lock images on the active page on B2G
: don't lock images on the active page on B2G
Status: VERIFIED FIXED
[MemShrink]
:
Product: Core
Classification: Components
Component: ImageLib (show other bugs)
: unspecified
: x86 Mac OS X
-- normal (vote)
: mozilla23
Assigned To: Joe Drew (not getting mail)
:
: Milan Sreckovic [:milan]
Mentors:
Depends on: 866947 868066
Blocks: 852643 860818
  Show dependency treegraph
 
Reported: 2013-04-17 11:24 PDT by Andreas Gal :gal
Modified: 2014-03-04 04:09 PST (History)
16 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
tef+
wontfix
wontfix
fixed
fixed
wontfix
fixed


Attachments
patch (799 bytes, patch)
2013-04-17 11:25 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
patch (802 bytes, patch)
2013-04-17 11:34 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
Keep sending low-memory-ongoing notifications every 5s which flush caches but don't GC. (6.66 KB, patch)
2013-04-18 03:35 PDT, Andreas Gal :gal
justin.lebar+bug: review+
Details | Diff | Splinter Review
port to b2g18 (6.64 KB, patch)
2013-04-18 13:55 PDT, Joe Drew (not getting mail)
no flags Details | Diff | Splinter Review
updated patch, decode early, but still don't lock (7.26 KB, patch)
2013-04-19 03:16 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
Patch that builds on mozilla-b2g18 (7.25 KB, patch)
2013-04-19 04:50 PDT, Josh Matthews [:jdm]
no flags Details | Diff | Splinter Review
patch (8.76 KB, patch)
2013-04-19 06:54 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
WIP, measure available system memory before allocating decoded image data. (9.87 KB, patch)
2013-04-19 12:34 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Limit decoded image cache to 30MB (8.74 KB, patch)
2013-04-20 06:38 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
updated patch, pref to limit eager image decoding on load (9.41 KB, patch)
2013-04-22 12:42 PDT, Andreas Gal :gal
no flags Details | Diff | Splinter Review
remove some debug junk (7.92 KB, patch)
2013-04-22 13:03 PDT, Andreas Gal :gal
dflanagan: review-
Details | Diff | Splinter Review
disable image locking on b2g, handle ongoing OOM condition better, and only eagerly decode first 24 images in a document (8.00 KB, patch)
2013-04-23 19:09 PDT, Andreas Gal :gal
joe: review+
Details | Diff | Splinter Review
patch for inbound (11.41 KB, patch)
2013-04-24 08:35 PDT, Andreas Gal :gal
joe: review+
Details | Diff | Splinter Review
patch for inbound (12.60 KB, patch)
2013-04-26 11:04 PDT, Joe Drew (not getting mail)
no flags Details | Diff | Splinter Review
b2g18 patch (12.44 KB, patch)
2013-04-29 07:14 PDT, Joe Drew (not getting mail)
no flags Details | Diff | Splinter Review

Description User image Andreas Gal :gal 2013-04-17 11:24:57 PDT
We OOM on image-rich pages with B2G. We currently lock images on the current tab. Instead, we will disable image locking and set the timeout and max decode memory to very high values, and we will rely on the OOM handler to clear the decoded image cache.
Comment 1 User image Andreas Gal :gal 2013-04-17 11:25:56 PDT
Created attachment 738643 [details] [diff] [review]
patch
Comment 2 User image Andreas Gal :gal 2013-04-17 11:26:23 PDT
we have to check that bug 807143 doesn't reappear with this fix
Comment 3 User image Joe Drew (not getting mail) 2013-04-17 11:29:19 PDT
I understand the locking change, but why the timeout & max decoded size change?
Comment 4 User image Andreas Gal :gal 2013-04-17 11:34:49 PDT
Created attachment 738647 [details] [diff] [review]
patch
Comment 5 User image Andreas Gal :gal 2013-04-17 11:57:26 PDT
I see some issues with the patch. Decoding gets stuck half way through the image on pintrest.
Comment 6 User image Andreas Gal :gal 2013-04-17 11:58:22 PDT
In general it works pretty well though. Decoding is fast enough for normal reading/scrolling.
Comment 7 User image Andreas Gal :gal 2013-04-17 11:59:16 PDT
however, on GP I don't think the OOM handler kicks in
Comment 8 User image Joe Drew (not getting mail) 2013-04-17 12:00:57 PDT
(In reply to Andreas Gal :gal from comment #5)
> I see some issues with the patch. Decoding gets stuck half way through the
> image on pintrest.

That sounds like bug 807143.
Comment 9 User image Joe Drew (not getting mail) 2013-04-17 12:02:31 PDT
(Bug 807143 was not just about flickering, but about images not redrawing/not redrawing fully, for reasons that was never explored.)
Comment 10 User image Joe Drew (not getting mail) 2013-04-17 12:06:23 PDT
(In reply to Andreas Gal :gal from comment #7)
> however, on GP I don't think the OOM handler kicks in

With lock disabled, I'd be a little surprised if we stored enough decoded image memory to reach OOM, fwiw.
Comment 11 User image Andreas Gal :gal 2013-04-17 12:45:14 PDT
probably because the GP has more memory. this needs testing on a production device, but otherwise it looks promising
Comment 12 User image Andreas Gal :gal 2013-04-17 12:46:38 PDT
Sounds like we need someone to debug the "never explored" part of comment 9. Joe, are you available to do that?

As for comment 10, look at the patch. We never discard anything except via the OOM handler.
Comment 13 User image Joe Drew (not getting mail) 2013-04-17 12:55:55 PDT
(In reply to Andreas Gal :gal from comment #12)
> Sounds like we need someone to debug the "never explored" part of comment 9.
> Joe, are you available to do that?

Not immediately, but in a couple of days I think I will be.

> As for comment 10, look at the patch. We never discard anything except via
> the OOM handler.

That's why I asked in comment 3 why you were doing this :), but I think you're right that this will make us not discard except via OOM or exceeding 512 MB. There are occasions when we discard immediately, but that's mostly in corner cases.
Comment 14 User image Justin Lebar (not reading bugmail) 2013-04-17 15:30:44 PDT
Please note that it's not an out-of-memory hook, it's a low-memory notification.  These are throttled to run no more than once every 5s (gonk.systemMemoryPressureRecoveryPollMS"), and even then only when we transition between a low-memory and a not-low-memory state.

So suppose we have the following situation:

 * Use your phone.  Lots of processes are loaded and running in the background.
 * Load the browser.  This causes us to start up a new process, which causes the amount of free memory on the system to go under 10mb.  We fire a low-memory notification.
 * Load Pintrest in the browser.

At this point, we will not throw away any images until we free up enough memory so the system has more than 10mb free (because we must leave the low-memory state before we fire the low-memory notification again.)  That's unlikely to happen; I expect we'll crash instead.

We could fire the low-memory notification more often, but it's not clear how to do that correctly.  We do not want every process on the system to GC every 5s while we're under memory pressure.
Comment 15 User image Joe Drew (not getting mail) 2013-04-17 15:53:30 PDT
Andreas, given comment 14, if we didn't have the redrawing problem, could we ship with the image flashing that happens without the long timeout and huge decoded data limit?
Comment 16 User image Justin Lebar (not reading bugmail) 2013-04-17 16:00:19 PDT
Other things you almost certainly want to do here:

 * Avoid the 5ms sync decode after the first time we draw an image.  In fact, we should probably avoid the sync decode the first time we draw an image, too.  This will break reftests, but leaving in the sync decode will probably lead to serious jank when scrolling.

 * Discard all images in a document when it is destroyed.  Images can be shared between documents, so this isn't totally correct, but it's probably close enough.

If relying on the low-memory notifications as-is is good enough, then it's good enough.  I just think we may want to tweak them somehow, particularly if they're a primary means of notifying us that we should drop memory.  It's a very delicate balance; you can imagine us dropping all images every 5s while we're under memory pressure (but not GC'ing etc), but that would be a bad experience if we're low on memory for some reason other than images.
Comment 17 User image Andreas Gal :gal 2013-04-17 16:01:05 PDT
I have never seen the state of comment 15 but I can go and reproduce that. In the meantime, we need to diagnose the partial decoding either way.
Comment 18 User image Justin Lebar (not reading bugmail) 2013-04-17 16:03:15 PDT
There are plenty of other ways to get into the state from comment 14.  The point is that after we get into a low-memory state, we will not discard any images until we leave the low-memory state.  That may be undesirable.
Comment 19 User image Milan Sreckovic [:milan] 2013-04-17 16:05:03 PDT
Let's keep this bug up to date tomorrow as well, Joe can get further involved, but note the six hour time difference.
Comment 20 User image Joe Drew (not getting mail) 2013-04-17 16:06:39 PDT
(In reply to Justin Lebar [:jlebar] from comment #16)
> leaving in the sync decode will
> probably lead to serious jank when scrolling.

Really? I thought scrolling was done in a different thread from the main Gecko thread.

>  * Discard all images in a document when it is destroyed.  Images can be
> shared between documents, so this isn't totally correct, but it's probably
> close enough.

Note that nsDocument::DeleteShell() already does this. I don't know whether that's good enough for the b2g case, though.
Comment 21 User image Joe Drew (not getting mail) 2013-04-17 16:07:41 PDT
(In reply to Andreas Gal :gal from comment #17)
> I have never seen the state of comment 15 but I can go and reproduce that.
> In the meantime, we need to diagnose the partial decoding either way.

I'm pretty sure it's partial *redrawing,* but regardless I will try to diagnose it tomorrow.
Comment 22 User image Andreas Gal :gal 2013-04-17 16:10:42 PDT
I still feel that the low memory notification approach is the right way to go. jlebar, can we somehow tell that we are "still in low memory" state? In that case I would prefer to fire of a timer that keeps throwing away images until we no longer are in low memory state.
Comment 23 User image Andreas Gal :gal 2013-04-17 16:11:04 PDT
Partial redrawing is fine. We should schedule another paint after that when we have the full image right?
Comment 24 User image Justin Lebar (not reading bugmail) 2013-04-17 16:14:36 PDT
> jlebar, can we somehow tell that we are "still in low memory" state?

Absolutely; hopefully it's clear in GonkMemoryPressureMonitoring.cpp how to do this, but let me know if it's not.

> Really? I thought scrolling was done in a different thread from the main Gecko thread.

If I'm wrong, that's awesome.  I didn't think we had async scrolling everywhere in b2g.  Maybe in the browser but not in apps, or vice versa?

> Note that nsDocument::DeleteShell() already does this.

That should probably be sufficient.
Comment 25 User image Joe Drew (not getting mail) 2013-04-17 16:15:10 PDT
You'd think so! Obviously something's going wrong, though; I wouldn't want to speculate as to what in particular.
Comment 26 User image Andreas Gal :gal 2013-04-17 23:11:34 PDT
Any updates here Joe? Did you debug the cause for the partial paints?
Comment 27 User image Andreas Gal :gal 2013-04-18 03:35:04 PDT
Created attachment 738966 [details] [diff] [review]
Keep sending low-memory-ongoing notifications every 5s which flush caches but don't GC.
Comment 28 User image Andreas Gal :gal 2013-04-18 03:38:47 PDT
Comment on attachment 738966 [details] [diff] [review]
Keep sending low-memory-ongoing notifications every 5s which flush caches but don't GC.

This is completely untested. Justin, can you please ask Lucas and Faramarz to find you someone who can take this and finish this today? Also, please ask Jeff for help with the rendering glitch (partial image rendering) I am seeing on pintrest. Looks like no updates from Joe, so we should do our own investigation.
Comment 29 User image Milan Sreckovic [:milan] 2013-04-18 04:24:54 PDT
(In reply to Andreas Gal :gal from comment #28)
> Comment on attachment 738966 [details] [diff] [review]
> Keep sending low-memory-ongoing notifications every 5s which flush caches
> but don't GC.
> 
> This is completely untested. Justin, can you please ask Lucas and Faramarz
> to find you someone who can take this and finish this today? Also, please
> ask Jeff for help with the rendering glitch (partial image rendering) I am
> seeing on pintrest. Looks like no updates from Joe, so we should do our own
> investigation.

If we can cover it in Madrid great, but note that Joe's updates not being there between 7pm and 7am local time is not an indication that he's not going to look at it.  I expect more information in a few hours, so if anything happens in the meantime, let's make sure this bug is up to date.
Comment 30 User image Josh Matthews [:jdm] 2013-04-18 06:33:44 PDT
I'm going to play around with this on an unagi.
Comment 31 User image Justin Lebar (not reading bugmail) 2013-04-18 06:59:56 PDT
Comment on attachment 738966 [details] [diff] [review]
Keep sending low-memory-ongoing notifications every 5s which flush caches but don't GC.

Changes to GonkMemoryPressureMonitoring look good.  But certainly the pref changes should be reviewed separately (and need tests).

I'm not familiar with this code, but CompositorChild sends a message to the parent on memory pressure (CompositorChild.cpp::MemoryPressureObserver), and then the parent does stuff on behalf of the child.  We may not want to do this, since it encourages priority inversion (low-priority bg processes making the main process do work on their behalf) when we're under long-lasting memory pressure.

It's also not clear to me how expensive nsFontCache::Compact() is.  Similarly, memory pressure causes us to run gfxFont.h::FlushShapedWordCaches, which looks like something we don't want to do every 5s.

I'm most worried about CompositorChild, followed by FlushShapedWordCaches.

Better safe than sorry?

P.S. If you use git-bz [1], it will automatically attach -U8 patches.  :)

[1] https://github.com/jlebar/moz-git-tools
Comment 32 User image Justin Lebar (not reading bugmail) 2013-04-18 07:00:10 PDT
> But certainly the pref changes should be reviewed separately (and need tests).

s/tests/testing/
Comment 33 User image Milan Sreckovic [:milan] 2013-04-18 07:35:53 PDT
Are we considering this for tef?
Comment 34 User image Joe Drew (not getting mail) 2013-04-18 07:54:05 PDT
(In reply to Andreas Gal :gal from comment #26)
> Any updates here Joe? Did you debug the cause for the partial paints?

No, because of the 6 hour time difference. I'm starting on it now.
Comment 35 User image Andreas Gal :gal 2013-04-18 07:54:47 PDT
1.0.1 is crashing on pintrest and many other popular sites, so yes, if we can swing a decent fix, I will go and beg for an uplift.
Comment 36 User image Milan Sreckovic [:milan] 2013-04-18 08:06:12 PDT
(In reply to Andreas Gal :gal from comment #35)
> 1.0.1 is crashing on pintrest and many other popular sites, so yes, if we
> can swing a decent fix, I will go and beg for an uplift.

JST and I just had a mini-triage, tef+ this.
Comment 37 User image Joe Drew (not getting mail) 2013-04-18 08:21:26 PDT
(In reply to Andreas Gal :gal from comment #5)
> I see some issues with the patch. Decoding gets stuck half way through the
> image on pintrest.

So I'm clear: This happens when loading m.pinterest.com in the browser, right?
Comment 38 User image Andrew Overholt [:overholt] 2013-04-18 08:55:41 PDT
(In reply to Joe Drew (:JOEDREW! \o/) from comment #37)
> (In reply to Andreas Gal :gal from comment #5)
> > I see some issues with the patch. Decoding gets stuck half way through the
> > image on pintrest.
> 
> So I'm clear: This happens when loading m.pinterest.com in the browser,
> right?

Yes.
Comment 39 User image Andreas Gal :gal 2013-04-18 11:18:47 PDT
Joe, any updates here?
Comment 40 User image Joe Drew (not getting mail) 2013-04-18 11:25:10 PDT
I am currently debugging why bug 807143 happens with image locking disabled, with the hope that fixing that bug (very easily reproduced) will fix pinterest partial display problems (difficult to reproduce, at least for me on an Unagi phone).
Comment 41 User image Andreas Gal :gal 2013-04-18 11:44:48 PDT
Cool, thanks. I will focus on this patch then. On the TCL device loading pintrest kills the browser and takes down the system. Looks like the OOM killer doesn't work there.
Comment 42 User image Andreas Gal :gal 2013-04-18 11:49:09 PDT
I/Gecko   (  144): 
I/Gecko   (  144): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
I/Gecko   (  144): 
E/GeckoConsole(  144): Content JS WARN at http://connect.facebook.net/en_US/all.js:52 in i: The "fb-root" div has not been created, auto-creating
I/Gecko   (  144): 
I/Gecko   (  144): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
I/Gecko   (  144): 
I/Gecko   (  144): 
I/Gecko   (  144): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
I/Gecko   (  144): 

I get this while pintrest is sucking up all the memory and the device dies. Wonderful. This is a device with a kernel that might not be up to date.
Comment 43 User image Andreas Gal :gal 2013-04-18 11:59:30 PDT
Joe, here is an easy test case with this patch. Applied it and then hold down the finger on an icon on the homescreen. We go into icon edit mode and all the icons get a little "x" on the top right and a second later all icons disappear (only the red x remains for each icon).
Comment 44 User image Andreas Gal :gal 2013-04-18 12:02:56 PDT
Yeah, memory pressure triggering definitely doesn't work on my device but with this patch things on pintrest are much much better (still running OOM eventually though, because memory pressure purge never comes). Patch also slightly changes some perceived painting so definitely not zero risk (we don't decode on load any more).
Comment 45 User image Joe Drew (not getting mail) 2013-04-18 12:10:21 PDT
FWIW: Even after this works, I have some reservations about what the UX is going to be like for blessed apps if their icons, etc can be discarded from under them. There's going to be a bunch of flashing.

I have no idea if it's possible to have different prefs for those apps, but if so, it'd be a good idea to enable locking for them.
Comment 46 User image Joe Drew (not getting mail) 2013-04-18 12:11:13 PDT
We are definitely re-decoding these images after discard, but for some reason we're never redrawing them. Still trying to track down how the invalidation is getting lost.
Comment 47 User image Andreas Gal :gal 2013-04-18 12:12:20 PDT
Yeah, lets see what it looks like. I am using a device of the other vendor now. That device properly kills the browser and the system survives. I will try my patch now.
Comment 48 User image Andreas Gal :gal 2013-04-18 12:13:47 PDT
Note: flashing would only occur when we are in memory pressure, which basically means "omg omg we are about to die", so I think its ok to trade off a little flashing vs not dying. Outside memory pressure we should never flash since we never discard images.
Comment 49 User image Joe Drew (not getting mail) 2013-04-18 12:35:39 PDT
So: we redecode an image and mark it dirty, but because we're in memory pressure or hit our limit or whatever, by the time we get back around to actually reading the image, it's been discarded again. 

This repeats in a continuous loop forever.
Comment 50 User image Andreas Gal :gal 2013-04-18 12:49:02 PDT
Cool. That problem should not happen with this patch. I am trying to get my device build to work to test with the kernel that actually properly sends memory pressure events. Thanks for the diagnosis Joe.

Here is what I expect to happen:
- we go into memory pressure
- we throw away all decoded images
- we start painting, re-decoded whatever is visible
- we don't throw away, as long we are no longer in memory pressure
- since we no longer lock all the images, we should come out of memory pressure, and everything is happy

A corner case might be when lots of apps are open and we are surfing the memory pressure threshold. We will see in a few.
Comment 51 User image Joe Drew (not getting mail) 2013-04-18 12:51:15 PDT
I could definitely see some other corner cases where having a giant image on-screen causes us to go into memory pressure again, or it takes a little while for memory pressure to "clear", but hopefully in practice we should be OK.
Comment 52 User image Andreas Gal :gal 2013-04-18 13:15:43 PDT
Ok, I need someone else to test this. My proto device is borked. Genlock is taking a mental health day.
Comment 53 User image Joe Drew (not getting mail) 2013-04-18 13:28:32 PDT
Does jdm have a device that knows how to fire memory-pressure? (I don't. :( )
Comment 54 User image Joe Drew (not getting mail) 2013-04-18 13:55:42 PDT
Created attachment 739256 [details] [diff] [review]
port to b2g18
Comment 55 User image Andreas Gal :gal 2013-04-18 13:57:11 PDT
The main issue I am seeing are disappearing icons when long-tapping on the home screen (red x appears, icons disappear). This might be related to drawImage path. We should focus on this. This is unrelated to memory pressure and pintrest. I will work with jlebar on the actual OOM testing tomorrow.
Comment 56 User image Joe Drew (not getting mail) 2013-04-18 14:10:18 PDT
Unfortunately I can't reproduce this problem at all. Andreas tells me that it happens on first boot, even.

I need someone who *can* reproduce to break in CanvasRenderingContext2D::DrawImage and find out what happens in SurfaceForElement. If that fails to decode the image for whatever reason, it'd explain this behaviour, and we'd be able to move on from there.

(Note that my attempts are all on b2g18.)
Comment 57 User image Justin Lebar (not reading bugmail) 2013-04-18 15:07:43 PDT
> I get this while pintrest is sucking up all the memory and the device dies. Wonderful.

FWIW we need dmesg output in order to understand OOMs.  logcat doesn't tell us anything helpful.
Comment 58 User image Joe Drew (not getting mail) 2013-04-18 15:16:23 PDT
I tried to build B2G against trunk mozilla-central with Andreas' patch, but it just ended up with a build that wouldn't boot. I blew away all objdirs and am trying again, but now I'm out of time until tomorrow.
Comment 59 User image Justin Lebar (not reading bugmail) 2013-04-18 15:51:05 PDT
I suspect you'll have much better luck building b2g18 than m-c.  Nobody tests m-c; it's perpetually broken.
Comment 60 User image Joe Drew (not getting mail) 2013-04-18 18:33:42 PDT
Yeah, okay. Hopefully someone can tell me what is happening in CanvasRenderingContext2D::DrawImage; if I have that information I can figure it out.
Comment 61 User image Andreas Gal :gal 2013-04-19 00:13:42 PDT
milan, can we get some help today here?
Comment 62 User image Milan Sreckovic [:milan] 2013-04-19 00:48:19 PDT
Sure, just waiting for people to show up.
Comment 63 User image Andreas Gal :gal 2013-04-19 01:46:37 PDT
Ok, so I think this is actually a blob that gets rendered that we snapshotted from a canvas. We read img.naturalWidth and if that is zero we don't paint it.
Comment 64 User image Andreas Gal :gal 2013-04-19 02:22:34 PDT
So I think we fire onload before .width and .height is available. The gaia homescreen seems to rely on that being available and before it was because we start decode early.
Comment 65 User image Andreas Gal :gal 2013-04-19 03:16:06 PDT
We might be revoking the blob url after onload and we set src to that url but the image isn't decoded yet (not locked!) so it can't be decoded later when we paint it.
Comment 66 User image Andreas Gal :gal 2013-04-19 03:16:42 PDT
Created attachment 739508 [details] [diff] [review]
updated patch, decode early, but still don't lock
Comment 67 User image Jeff Muizelaar [:jrmuizel] 2013-04-19 03:37:20 PDT
(In reply to Andreas Gal :gal from comment #66)
> Created attachment 739508 [details] [diff] [review]
> updated patch, decode early, but still don't lock

This seems reasonable if it works.
Comment 68 User image Milan Sreckovic [:milan] 2013-04-19 03:46:30 PDT
With this patch, I don't get the problem described in comment 43, but can't test pinterest.com itself, I'm running into a "can't turn wifi on" known bug.
Comment 69 User image Milan Sreckovic [:milan] 2013-04-19 03:56:49 PDT
On pinterest.com I managed to get the "oh, how embarrassing, this is taking too long" once after some scrolling, then within 5s on the reload.  OK the second time.
Comment 70 User image Josh Matthews [:jdm] 2013-04-19 04:50:57 PDT
Created attachment 739540 [details] [diff] [review]
Patch that builds on mozilla-b2g18

No idea if Andreas' patch builds on other trees, but it certainly didn't on b2g18. Here's a version that does.
Comment 71 User image Joe Drew (not getting mail) 2013-04-19 05:27:26 PDT
(In reply to Andreas Gal :gal from comment #64)
> So I think we fire onload before .width and .height is available. The gaia
> homescreen seems to rely on that being available and before it was because
> we start decode early.

Barring a bug, this definitely isn't the case - that's the whole point of onload.

(In reply to Andreas Gal :gal from comment #65)
> We might be revoking the blob url after onload and we set src to that url
> but the image isn't decoded yet (not locked!) so it can't be decoded later
> when we paint it.

What do you mean by "revoking the blob url"? drawImage should always synchronously decode the image as long as it's loaded.
Comment 72 User image Joe Drew (not getting mail) 2013-04-19 05:29:27 PDT
This patch looks reasonable. I don't think it should cause much (if any!) main-thread jank.

I still want to know what's happening in drawImage, though; the RequestDecode part of this patch shouldn't be necessary.
Comment 73 User image Justin Lebar (not reading bugmail) 2013-04-19 05:30:29 PDT
What seems to happen is that

 * We have an image,
 * We drawImage it to a canvas
 * We pull the canvas out as a blob
 * /that blob/ is not decoded.
Comment 74 User image Joe Drew (not getting mail) 2013-04-19 05:43:14 PDT
Reading page.js in the homescreen app, it's entirely possible that the Object URL *is* being revoked before it can be painted, meaning that if we don't explicitly requestDecode as the image is added to the document (like Andreas' patch does), we can never decode it because the blob URL is gone.
Comment 75 User image Joe Drew (not getting mail) 2013-04-19 05:45:00 PDT
Specifically, displayRenderedIcon calls revokeObjectURL during onload, which is basically guaranteed to be too early; onload often fires before any painting is done, and painting is the only thing *guaranteed* to start the image decoding process.
Comment 76 User image Jeff Muizelaar [:jrmuizel] 2013-04-19 05:47:29 PDT
(In reply to Joe Drew (:JOEDREW! \o/) from comment #75)
> Specifically, displayRenderedIcon calls revokeObjectURL during onload, which
> is basically guaranteed to be too early; onload often fires before any
> painting is done, and painting is the only thing *guaranteed* to start the
> image decoding process.

That sounds like a reasonable explanation for why this only happens with blobs.
Comment 77 User image Jeff Muizelaar [:jrmuizel] 2013-04-19 06:01:38 PDT
(In reply to Jeff Muizelaar [:jrmuizel] from comment #76)
> (In reply to Joe Drew (:JOEDREW! \o/) from comment #75)
> > Specifically, displayRenderedIcon calls revokeObjectURL during onload, which
> > is basically guaranteed to be too early; onload often fires before any
> > painting is done, and painting is the only thing *guaranteed* to start the
> > image decoding process.
> 
> That sounds like a reasonable explanation for why this only happens with
> blobs.

But there's still some weird stuff going on.
Comment 78 User image Andreas Gal :gal 2013-04-19 06:54:14 PDT
Created attachment 739591 [details] [diff] [review]
patch
Comment 79 User image Milan Sreckovic [:milan] 2013-04-19 07:21:14 PDT
Assigning to me to drive it, it still needs the full effort by everybody involved to actually resolve it :)
Comment 80 User image Jeff Muizelaar [:jrmuizel] 2013-04-19 07:24:34 PDT
One thing I've thought about. With this patch, won't we accumulate decoded images from pages that we've navigated away from until we hit OOM?
Comment 81 User image Justin Lebar (not reading bugmail) 2013-04-19 07:28:27 PDT
(In reply to Jeff Muizelaar [:jrmuizel] from comment #80)
> One thing I've thought about. With this patch, won't we accumulate decoded
> images from pages that we've navigated away from until we hit OOM?

It depends on when nsDocument::DeleteShell is called.  If that's called when we navigate away from a page, we're fine.
Comment 82 User image Milan Sreckovic [:milan] 2013-04-19 07:49:14 PDT
What was the status of "mImageLocking is false" before?  We never bothered decoding the images in that case.
Comment 83 User image Joe Drew (not getting mail) 2013-04-19 08:00:22 PDT
We would decode them if they were painted.
Comment 84 User image Joe Drew (not getting mail) 2013-04-19 08:28:39 PDT
(In reply to Jeff Muizelaar [:jrmuizel] from comment #80)
> One thing I've thought about. With this patch, won't we accumulate decoded
> images from pages that we've navigated away from until we hit OOM?

Presumably we'd hit a low memory notification first, though.
Comment 85 User image Justin Lebar (not reading bugmail) 2013-04-19 09:06:44 PDT
To put in writing here what we've already discussed: We were observing a low-memory notification in the master process, but the child wasn't getting it before it was killed.  jdm tried increasing the threshold at which we get notified, and then the child gets the low-memory notification and then immediately dies.
Comment 86 User image Milan Sreckovic [:milan] 2013-04-19 09:25:23 PDT
(In reply to Milan Sreckovic [:milan] from comment #79)
> Assigning to me to drive it

This time for real :)
Comment 87 User image Andreas Gal :gal 2013-04-19 10:19:46 PDT
jlebar, how is it possible that the child process OOMs so quickly? Can we tell remaining physical memory from user space so we avoid getting so close?
Comment 88 User image Joe Drew (not getting mail) 2013-04-19 10:21:13 PDT
Our current idea is to do what we used to do on Maemo, and check for memory pressure when we're trying to create a new image frame; if we're currently under pressure, we'll just abort the decode.
Comment 89 User image Justin Lebar (not reading bugmail) 2013-04-19 10:26:56 PDT
> jlebar, how is it possible that the child process OOMs so quickly?

I don't see how it should be impossible.  Note that allocations in the main process (for example, Necko buffers) can cause us to OOM in the child.

> Can we tell remaining physical memory from user space so we avoid getting so close?

Yes; working on it.
Comment 90 User image Justin Lebar (not reading bugmail) 2013-04-19 12:34:41 PDT
Created attachment 739742 [details] [diff] [review]
WIP, measure available system memory before allocating decoded image data.

This compiles, but it doesn't run for me, probably because the b2g18 base rev I
pulled is broken.  As a result, I haven't tested this; it probably doesn't work
right.

I need to go for the evening, and I'm on a plane tomorrow, but maybe this will
help someone.
Comment 91 User image Joe Drew (not getting mail) 2013-04-19 13:18:12 PDT
It looks like this isn't enough, which isn't *too* surprising - with overcommit it's possible for memory to go from fine to too-low without calling EnsureFrame.

We're going to need a hook in FrameUpdated too, but one that forcibly kills the decoder and discards the frames.
Comment 92 User image Justin Lebar (not reading bugmail) 2013-04-19 15:51:16 PDT
> It looks like this isn't enough

I'm really surprised that my patch worked perfectly on the first try.  Did you
check that it's reading the amount of free memory correctly?  Sorry I haven't
been able to test it myself.  It sucks that our trees are so unstable.

> with overcommit it's possible for memory to go from fine to too-low without calling 
> EnsureFrame.

Yeah, that's a good point.  I can imagine us kicking off N decodes all at about the same time and saying "yep, plenty of space for you" for each of them.  I can also imagine us discarding all images and still falling over due to the images currently being decoded.

Before someone asks, I don't think we can disable overcommit without first modifying jemalloc and possibly the JS engine.  I suspect thirdy-party code like drivers may also rely on overcommit.

jdm and I talked for a while, and we think we still have some options here.
Not all of these are mutually-exclusive.

1. As Joe suggested, we could cancel decodes on memory pressure, or we could
   even read the amount of free memory on the system while we decode.

2. We could fire the low-memory event at a threshold above 10mb.  jdm observed
   that when the threshold was 15mb, the child process at least received the
   low-memory notification from the kernel before it was killed.  Maybe we
   need to go to 20mb.

3. We could fire the memory-pressure event faster after receiving the
   notification from the kernel.  Right now we dispatch an event to the main
   thread, but we could use nsThread.cpp::ScheduleMemoryPressureEvent, which
   cuts the line in the event loop.

4. We could allocate our images in discardable memory.  This would be a pretty
   heavyweight change.  There's a sketch of an API and implementation in bug
   748598.  I suspect it may be hard to get the right discarding policy in the
   kernel, though.
Comment 93 User image Andreas Gal :gal 2013-04-20 01:37:50 PDT
The discardable memory stuff looks like exactly what we need. I have a sick day in Madrid, so I will grab that bug and go for it.
Comment 94 User image Justin Lebar (not reading bugmail) 2013-04-20 01:51:32 PDT
Not to be Debbie Downer, but an important part of using ashmem will be ensuring that processes get "charged" correctly for memory that they own, and that we free memory owned by low-priority processes before we free memory owned by high-priority processes.

Since ashmem is a shared memory system, I suspect it may not do this well.
Comment 95 User image Andreas Gal :gal 2013-04-20 04:47:17 PDT
Ok. In that case I think the next best thing is to take your patch and in the alloc path purge the discard tracker and then trying to allocate again. We want to avoid failing in that path.
Comment 96 User image Andreas Gal :gal 2013-04-20 04:50:54 PDT
jlebar, I have a nice DiscardableMemory abstraction patch. I guess we could still use that in the parent, especially in necko.
Comment 97 User image Andreas Gal :gal 2013-04-20 06:06:55 PDT
jlebar, as far as I can tell ashmem properly accounts ashmem towards the "owner" process and when that process dies, ashmem gets deallocated. ashmem's purge policy is LRU and not bound to priorities. I think thats ok for cache-like use, so I want to start using it for the image cache and memcache and what not.
Comment 98 User image Andreas Gal :gal 2013-04-20 06:38:55 PDT
Created attachment 739963 [details] [diff] [review]
Limit decoded image cache to 30MB

This patch actually works very well and the browser stays at a very manageable process size with pintrest. Flashing is minimal/not noticable after the page fully loaded. Until loading however there is an issue. I am kicking off eager decoding, and that eager decoding keeps kicking out images on top of the page while images further down load. We have to do eager decoding only until the image tracker quota is full. The next patch will do that.
Comment 99 User image Andreas Gal :gal 2013-04-20 09:01:34 PDT
Ok, after a few hours of debugging I think the problem is that we don't start tracking images in the discard tracker until _after_ decoding is complete and we fire off too many eager decoding requests and they thresh the decoded image cache, which is why the first images on the page don't show up (they keep getting evicted by decoding requests from further down in the document).
Comment 100 User image Justin Lebar (not reading bugmail) 2013-04-20 13:29:29 PDT
(In reply to Andreas Gal :gal from comment #96)
> jlebar, I have a nice DiscardableMemory abstraction patch. I guess we could
> still use that in the parent, especially in necko.

Upon reflection, I have a worse concern than the one from comment 94.

Linux has an OOM killer component that kills processes when the system is (very nearly) out of memory.  Android adds a second component, the low-memory killer (LMK), which usually acts before the OOM killer.  The LMK can follow a set of strict priorities, whereas the priorities we give to the OOM killer are more like guidelines.  The LMK can also be configured to kill certain classes of apps when we reach certain thresholds of free memory, and we use this.  For example, we kill fg apps when there's less than 5mb of free memory on the system.  This seems to keep the main process from OOMing itself.

So the difficulty with discardable memory is, we need ashmem to work together with the LMK.  If a fg app has an ashmem mapping, we want that to be discarded before the fg app is killed by the LMK, otherwise ashmem isn't very useful.

Looking through the kernel source, it appears that maybe we will try to free ashmem segments before we invoke the LMK; I wasn't expecting even this level of coordination.  There doesn't appear to be any coordination causing us to kill low-priority tasks before we free a high-priority task's ashmem segments, which might cause us to favor keeping background tasks alive at the expense of being able to show images in a foreground task.

Anyway, if we went with this approach, we'd need to test it out and see what happens.  Perhaps we can tune our LMK params so that it works well enough.  Like low-memory notifications, ashmem may be a helpful tool as part of an ensemble of approaches, but I doubt it will fix all the problems.
Comment 101 User image Joe Drew (not getting mail) 2013-04-22 10:44:23 PDT
Justin, Andreas: do you need Jeff or me to do anything at this point?
Comment 102 User image Andreas Gal :gal 2013-04-22 12:42:59 PDT
Created attachment 740421 [details] [diff] [review]
updated patch, pref to limit eager image decoding on load
Comment 103 User image David Flanagan [:djf] 2013-04-22 13:01:17 PDT
The version of the patch from earlier this morning (not the comment #102 one) did not fix my problems in bug 861965. See https://bugzilla.mozilla.org/show_bug.cgi?id=861965#c14 and https://bugzilla.mozilla.org/attachment.cgi?id=740429

I haven't tried this latest version yet.
Comment 104 User image Andreas Gal :gal 2013-04-22 13:03:07 PDT
Created attachment 740432 [details] [diff] [review]
remove some debug junk
Comment 105 User image Justin Lebar (not reading bugmail) 2013-04-22 13:17:50 PDT
I have a gun up to my head to finish other blockers by Wednesday.  I'll get to this when I can, but you'll probably find a more timely review elsewhere.
Comment 106 User image David Flanagan [:djf] 2013-04-22 16:19:19 PDT
Comment on attachment 740432 [details] [diff] [review]
remove some debug junk

Review of attachment 740432 [details] [diff] [review]:
-----------------------------------------------------------------

I'm taking liberty of offering r- on this patch, not because I have any idea what it does but because it does not compile.

::: content/base/src/nsDocument.cpp
@@ +8444,5 @@
> +  if (oldCount == 0) {
> +    if (mLockingImages)
> +      rv = aImage->LockImage();
> +    if (NS_SUCCEEDED(rv) && sOnloadDecodeLimit &&
> +        mLockingImages.Length() < sOnloadDecodeLimit)

This line fails to compile: 

The latest patch does not build for me:

/Users/djf/mozilla/mozilla-b2g18/content/base/src/nsDocument.cpp: In member function 'virtual nsresult nsDocument::AddImage(imgIRequest*)':
/Users/djf/mozilla/mozilla-b2g18/content/base/src/nsDocument.cpp:8448: error: request for member 'Length' in '((nsDocument*)this)->nsDocument::mLockingImages', which is of non-class type 'bool'
Comment 107 User image Andreas Gal :gal 2013-04-23 19:09:50 PDT
Created attachment 741121 [details] [diff] [review]
disable image locking on b2g, handle ongoing OOM condition better, and only eagerly decode first 24 images in a document

pinterest no longer crashes for me with this patch, and memory stays around 70MB instead of eventually sucking up all the memory. Also, initial rendering seems faster since we have fewer images competing for CPU time.
Comment 108 User image Andreas Gal :gal 2013-04-23 19:13:48 PDT
jlebar already reviewed the OOM pieces but I think jeff should in addition review the change to the image load behavior. This is disabled for anything but FFOS, so I think this is safe to land on trunk.
Comment 109 User image Andreas Gal :gal 2013-04-24 04:26:51 PDT
I made a testcase that contains 200 images on a page. This is probably a bit more reliable to test than pinterest:

http://people.mozilla.org/~gal/imagelock/
Comment 110 User image Joe Drew (not getting mail) 2013-04-24 07:45:39 PDT
Comment on attachment 741121 [details] [diff] [review]
disable image locking on b2g, handle ongoing OOM condition better, and only eagerly decode first 24 images in a document

Review of attachment 741121 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/nsDocument.cpp
@@ +8442,5 @@
>    // If this is the first insertion and we're locking images, lock this image
>    // too.
> +  if (oldCount == 0) {
> +    if (mLockingImages)
> +      rv = aImage->LockImage();

I want this change in general, actually, so can you propose a separate patch to do it on mozilla-central?
Comment 111 User image Andreas Gal :gal 2013-04-24 08:30:49 PDT
Sure, happy to do that. I was actually planning to land this on m-c and then uplift. Do you see a problem with applying this to m-c? It won't change behavior for Firefox just yet, but we can start playing with the new prefs (I think we should limit eager decoding on desktop too).
Comment 112 User image Andreas Gal :gal 2013-04-24 08:35:35 PDT
Created attachment 741339 [details] [diff] [review]
patch for inbound
Comment 113 User image Joe Drew (not getting mail) 2013-04-24 08:59:45 PDT
Comment on attachment 741339 [details] [diff] [review]
patch for inbound

Review of attachment 741339 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't review anything regarding the memory notifications, so if the merge wasn't trivial you might want to get jlebar to look at them.

::: b2g/app/b2g.js
@@ +272,5 @@
>  pref("image.mem.decodeondraw", true);
> +pref("content.image.allow_locking", false); /* don't allow image locking */
> +pref("image.mem.min_discard_timeout_ms", 86400000); /* 24h, we rely on the out of memory hook */
> +pref("image.mem.max_decoded_image_kb", 30000); /* 30MB seems reasonable */
> +pref("image.onload.decode.limit", 24); /* don't decode more than 24 images eagerly */

image.onload.decode.limit should be added to all.js too, initially with a value of 0. It's probably best to leave it disabled like that for now, just so we limit our scope in this bug; I'm sure mobile in particular will want a value on it too, but I just don't know what that value will be.

Post bug 847223, we won't need this on desktop; if and when bug 691169 gets fixed, it won't matter for Fennec either. (I don't see an equivalent to making bug 689623 work on B2G, but I might have just missed it.)
Comment 114 User image Andreas Gal :gal 2013-04-24 09:33:37 PDT
This needs someone to land it. Assigning to faramarz to find an owner land the patches (m-c patch needs a tryserver run) and also add the all.js pref.
Comment 115 User image [:fabrice] Fabrice Desré 2013-04-25 09:21:36 PDT
Pushed to try with the new pref in all.js:
https://tbpl.mozilla.org/?tree=Try&rev=2e5cd94c2d96
Comment 116 User image [:fabrice] Fabrice Desré 2013-04-25 12:23:20 PDT
So, we have a reftests failing because of this patch. Joe, any chance you can take a look?
Comment 117 User image Justin Lebar (not reading bugmail) 2013-04-25 12:26:14 PDT
I'd be surprised if reftests weren't failing.  I think we should just disable them, or alternatively disable this behavior during reftests.

The right fix is bug 695763.
Comment 118 User image Joe Drew (not getting mail) 2013-04-25 12:33:32 PDT
This fails reftests on desktop platforms, even, though. The code change shouldn't have modified desktop behaviour!
Comment 119 User image Joe Drew (not getting mail) 2013-04-25 12:37:51 PDT
Argh.

    2.50 +    if (NS_SUCCEEDED(rv) && sOnloadDecodeLimit &&
    2.51 +        mImageTracker.Count() < sOnloadDecodeLimit)
    2.52        rv = aImage->StartDecoding();

should be

    2.50 +    if (NS_SUCCEEDED(rv) && (!sOnloadDecodeLimit ||
    2.51 +        mImageTracker.Count() < sOnloadDecodeLimit))
    2.52        rv = aImage->StartDecoding();
Comment 120 User image Joe Drew (not getting mail) 2013-04-25 12:38:07 PDT
(but indented correctly)
Comment 121 User image David Flanagan [:djf] 2013-04-25 12:55:18 PDT
When I apply the patch (without Joe's comment 119 fix) to m-c, and try it as a fix for bug 861965, the phone reboots.
Comment 122 User image Andreas Gal :gal 2013-04-25 12:58:18 PDT
Joe's fix makes sense. I can take a look at comment #121. I didn't check that use case yet. 5mb images might exceed the 24 picture limit. I didn't find an easy way to limit total size, only count.
Comment 123 User image David Flanagan [:djf] 2013-04-25 13:40:30 PDT
With Joe's patch, I still crash, with a segfault.  Here are what look like the relevant lines from logcat, starting with the last debugging message from the gallery. At this point, the gallery app has successfully loaded and created previews for 3 5mp images, and is just starting to load the 4th image:

E/GeckoConsole( 3219): Content JS LOG at app://gallery.gaiamobile.org/js/metadata_scripts.js:1152 in createThumbnailAndPreview: createThumbnailAndPreview 0/IMG_20121006_171351.jpg
I/Gecko   ( 3075): 
I/Gecko   ( 3075): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
I/Gecko   ( 3075): 
I/Gecko   ( 3075): 
I/Gecko   ( 3075): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
I/Gecko   ( 3075): 
D/memalloc( 3075): /dev/pmem: Freeing buffer base:0x48955000 size:40960 offset:8048640 fd:137
D/memalloc( 3075): /dev/pmem: Freeing buffer base:0x4836b000 size:614400 offset:1847296 fd:127
I/Gecko   ( 3075): [Parent 3075] ###!!! ABORT: actor has been |delete|d: file /Users/djf/mozilla/builds/obj-mc-gonk/ipc/ipdl/PGrallocBufferParent.cpp, line 392
E/Gecko   ( 3075): mozalloc_abort: [Parent 3075] ###!!! ABORT: actor has been |delete|d: file /Users/djf/mozilla/builds/obj-mc-gonk/ipc/ipdl/PGrallocBufferParent.cpp, line 392
F/libc    ( 3075): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
I/Gecko   ( 3075): [Parent 3075] WARNING: waitpid failed pid:3255 errno:10: file /Users/djf/mozilla/mozilla-central/ipc/chromium/src/base/process_util_posix.cc, line 260
I/DEBUG   ( 3073): debuggerd committing suicide to free the zombie!
I/DEBUG   ( 3257): debuggerd: Apr 15 2013 10:15:46

And here's an excerpt from dmesg output:

<4>[04-25 20:30:12.421] [3219: Gallery]select 3125 (Usage), adj 6, size 5445, to kill
<4>[04-25 20:30:12.421] [3219: Gallery]send sigkill to 3125 (Usage), adj 6, size 5445
<4>[04-25 20:30:13.632] [3219: Gallery]select 3255 (plugin-containe), adj 6, size 197, to kill
<4>[04-25 20:30:13.632] [3219: Gallery]send sigkill to 3255 (plugin-containe), adj 6, size 197
<4>[04-25 20:30:15.554] [28: kswapd0]select 3133 (Homescreen), adj 4, size 4019, to kill
<4>[04-25 20:30:15.554] [28: kswapd0]send sigkill to 3133 (Homescreen), adj 4, size 4019
<4>[04-25 20:30:16.224] [3219: Gallery]select 3219 (Gallery), adj 2, size 29238, to kill
<4>[04-25 20:30:16.224] [3219: Gallery]send sigkill to 3219 (Gallery), adj 2, size 29238
Comment 124 User image Andreas Gal :gal 2013-04-25 13:48:44 PDT
Yeah, so sounds like those images are so big that the 24 image limit doesn't help. david, try setting the image decode limit to 1 and run the experiment again. If that works, I can probably make some changes to the patch.
Comment 125 User image [:fabrice] Fabrice Desré 2013-04-25 18:23:00 PDT
It looks like reftests are still failing with the fix from comment 119 :
https://tbpl.mozilla.org/?tree=Try&rev=7d43ad4a2ed7
Comment 126 User image Joe Drew (not getting mail) 2013-04-25 18:29:45 PDT
You missed the brackets around (!sOnloadDecodeLimit || mImageTracker.Count() < sOnloadDecodeLimit)
Comment 127 User image Joe Drew (not getting mail) 2013-04-25 18:39:58 PDT
Although in the very common case that NS_SUCCEEDED(rv) is true that won't actually modify whether the if body is executed...
Comment 128 User image [:fabrice] Fabrice Desré 2013-04-25 21:15:59 PDT
Not much better with the https://tbpl.mozilla.org/?tree=Try&rev=7b69b561e4d6

Joe, can you check the tests locally? I'm playing blind there.
Comment 129 User image Joe Drew (not getting mail) 2013-04-26 07:07:23 PDT
Yeah, doing so now.
Comment 130 User image Andreas Gal :gal 2013-04-26 07:32:17 PDT
This is no longer a good use of fabrice time. Sorry for giving you a faulty patch to land fabrice. Joe, want to grab this? Of all of us here you are definitely most qualified.
Comment 131 User image Joe Drew (not getting mail) 2013-04-26 10:17:22 PDT
Hooray, I can finally reproduce. Debugging!
Comment 132 User image Joe Drew (not getting mail) 2013-04-26 11:03:40 PDT
The below patch seems to fix it. Hooray! Trying: https://tbpl.mozilla.org/?tree=Try&rev=0fb01134417c

diff --git a/content/base/src/nsDocument.cpp b/content/base/src/nsDocument.cpp
--- a/content/base/src/nsDocument.cpp
+++ b/content/base/src/nsDocument.cpp
@@ -8895,18 +8895,18 @@ nsDocument::AddImage(imgIRequest* aImage
 
   nsresult rv = NS_OK;
 
   // If this is the first insertion and we're locking images, lock this image
   // too.
   if (oldCount == 0) {
     if (mLockingImages)
       rv = aImage->LockImage();
-    if (NS_SUCCEEDED(rv) && (sOnloadDecodeLimit || 
-        mImageTracker.Count() < sOnloadDecodeLimit))
+    if (NS_SUCCEEDED(rv) && (!sOnloadDecodeLimit ||
+                             mImageTracker.Count() < sOnloadDecodeLimit))
       rv = aImage->StartDecoding();
   }
 
   // If this is the first insertion and we're animating images, request
   // that this image be animated too.
   if (oldCount == 0 && mAnimatingImages) {
     nsresult rv2 = aImage->IncrementAnimationConsumers();
     rv = NS_SUCCEEDED(rv) ? rv2 : rv;
Comment 133 User image Joe Drew (not getting mail) 2013-04-26 11:04:03 PDT
Created attachment 742468 [details] [diff] [review]
patch for inbound
Comment 134 User image Andreas Gal :gal 2013-04-26 11:41:40 PDT
I will test on device after you land on inbound. Thanks!
Comment 135 User image Joe Drew (not getting mail) 2013-04-26 13:59:44 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/fe381d10084f
Comment 136 User image Ryan VanderMeulen [:RyanVM] 2013-04-27 18:34:05 PDT
https://hg.mozilla.org/mozilla-central/rev/fe381d10084f
Comment 137 User image Andreas Gal :gal 2013-04-28 15:41:23 PDT
This still needs to be uplifted to b2g18.
Comment 138 User image Joe Drew (not getting mail) 2013-04-28 18:26:21 PDT
How does one do such a thing? I haven't had to in the past.
Comment 139 User image Joe Drew (not getting mail) 2013-04-29 07:14:57 PDT
Created attachment 743053 [details] [diff] [review]
b2g18 patch
Comment 140 User image Ryan VanderMeulen [:RyanVM] 2013-04-29 08:41:23 PDT
Your friendly neighborhood merge maven takes care of the B2G uplifts ;). Thanks for the branch patch, though. Makes the job easier!

https://hg.mozilla.org/releases/mozilla-b2g18/rev/61b65b06cba0
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/10772f6fb11b
Comment 141 User image Jason Smith [:jsmith] 2013-04-29 19:32:33 PDT
See https://bugzilla.mozilla.org/show_bug.cgi?id=860818#c45 for the QA analysis study here. Good news is that the original bug filed about "failing to load" Pinterest actually was a wfm, but the issue about stacking too many images to be decoded still occurs with and without this patch.

I'm waiting for Justin's input on what's the right followup to take here.

Marking verified to indicate this was tested though.
Comment 142 User image Justin Lebar (not reading bugmail) 2013-04-29 21:49:38 PDT
Assuming that the relevant process even receives the memory pressure event before it OOMs (this is easy to check in logcat), it sounds like we should consider dropping some or all ongoing decodes on memory pressure, in addition to everything else we're doing.  I know we talked about this at Madrid...
Comment 143 User image Justin Lebar (not reading bugmail) 2013-04-30 11:03:28 PDT
^ This is bug 867264 now.
Comment 144 User image Jason Smith [:jsmith] 2013-05-06 18:06:26 PDT
(In reply to Andreas Gal :gal from comment #2)
> we have to check that bug 807143 doesn't reappear with this fix

Bad news. I think bug 807143 did reappear. See bug 868066.

Note You need to log in before you can comment on or make changes to this bug.