Provide a hard limit for decoded image buffer

RESOLVED FIXED in mozilla31

Status

Product: Core
Component: ImageLib
Status: RESOLVED FIXED

People

(Reporter: schien, Assigned: schien)

Tracking

Version: unspecified
Target Milestone: mozilla31
Hardware: ARM
OS: Gonk (Firefox OS)
Points: ---
Bug Flags: in-testsuite +

Firefox Tracking Flags

(Not tracked)

Attachments

(3 attachments, 10 obsolete attachments)

24.38 KB, image/png
12.65 KB, patch (schien: review+)
56.66 KB, patch (schien: review+)
Created attachment 757138 [details]
Possible memory usage while all background apps are killed by mem killer

For a device with very limited memory, we could provide a hard limit for the decoded image buffer. Bug 860818 and Bug 867264 show that the oom killer might wake up and kill the foreground app if image buffers are allocated faster than the lowmem killer can discard them (since we only have a 5MB margin between the lowmem and oom killers for the foreground process).
The hard limit can be calculated per process; it'll act like a global limitation since all the background processes will be killed first under a low-memory scenario. It should be a preference, so that device manufacturers can adjust the value along with the lowmem/oom/DiscardTracker prefs based on the hardware capability of each device.

The attachment shows the memory usage in a scenario where a giant foreground app consumes all available memory. Normally the memory used for anything other than the image buffer is less than 40MB, so I think 40MB is a reasonable hard limit on a 256MB handset.
Created attachment 8347988 [details] [diff] [review]
setup hard limit for decoded image buffer size

Setting a hard limit for the decoded image buffer size can prevent the content process from killing itself by displaying too many images at the same time. Displaying an image placeholder is more user-friendly than a process crash.

Here is the test case:
http://people.mozilla.org/~schien/image_memory_presure.html
Unagi with this patch can survive without killing the content process.
Assignee: nobody → schien
Attachment #8347988 - Flags: feedback?(seth)
Attachment #8347988 - Flags: feedback?(khuey)
Comment on attachment 8347988 [details] [diff] [review]
setup hard limit for decoded image buffer size

Review of attachment 8347988 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good! I like where you're going with this. I'm marking this f- because I think some details need to be reworked, but the general approach is great.

::: image/src/DiscardTracker.cpp
@@ +148,5 @@
> +  if (!sHardLimitDecodedImageKB) {
> +    return true;
> +  }
> +
> +  PR_Lock(sAllocationLock);

This approach is not thread-safe, because you do not check whether we're under the hard memory limit and "allocate" the memory atomically. (I realize that we presently only allocate imgFrame memory on the main thread, but I don't think this is a safe assumption for the future.)

I'd recommend going ahead and updating sCurrentDecodedImageBytes inside the lock if this function would return true. The caller should then be responsible for calling InformAllocation with a negative value to *remove* the allocated bytes if it later fails to successfully allocate the surface.

You'll need to modify the control flow in imgFrame::Init to make this work, but it would make me much more comfortable with this design.
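The reserve-then-rollback pattern described above can be sketched like this (a minimal illustration, not the actual DiscardTracker code: the std::mutex, the names, and the limit value are assumptions for the example):

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Illustrative stand-ins for the real statics; names are assumptions.
static std::mutex sAllocationMutex;
static uint64_t sHardLimitBytes = 64 * 1024;  // 0 would mean "no limit"
static uint64_t sCurrentBytes = 0;

// Check the limit and record the allocation in one critical section,
// so the check and the bookkeeping happen atomically.
bool TryAllocation(uint64_t aBytes) {
  std::lock_guard<std::mutex> lock(sAllocationMutex);
  if (sHardLimitBytes != 0 &&
      sHardLimitBytes - sCurrentBytes < aBytes) {  // subtract to avoid overflow
    return false;
  }
  sCurrentBytes += aBytes;  // caller now owes a matching InformDeallocation
  return true;
}

void InformDeallocation(uint64_t aBytes) {
  std::lock_guard<std::mutex> lock(sAllocationMutex);
  assert(aBytes <= sCurrentBytes);
  sCurrentBytes -= aBytes;
}
```

If the subsequent surface allocation fails, the caller rolls back the reservation by calling InformDeallocation with the same byte count, matching the control-flow change suggested for imgFrame::Init.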

@@ +150,5 @@
> +  }
> +
> +  PR_Lock(sAllocationLock);
> +  bool enoughSpace =
> +    (sHardLimitDecodedImageKB * 1024) >= (sCurrentDecodedImageBytes + bytes);

Regarding this expression, I'd prefer "(sHardLimitDecodedImageKB * 1024 - sCurrentDecodedImageBytes) >= bytes" to reduce the risk of overflow. That could probably be made more readable by separating it into two statements:

> int64_t availableBytes = sHardLimitDecodedImageKB * 1024 - sCurrentDecodedImageBytes;
> bool enoughSpace = availableBytes >= bytes;
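A contrived demonstration of the overflow risk (illustrative 32-bit values; the real counters are int64_t, where wrap is much harder to hit, but the subtraction form is still the safer habit):

```cpp
#include <cassert>
#include <cstdint>

// Addition on the right-hand side can wrap, making a huge request
// look like it fits; subtracting first cannot wrap as long as the
// current usage never exceeds the limit.
bool fitsAdd(uint32_t limit, uint32_t current, uint32_t bytes) {
  return limit >= current + bytes;  // current + bytes may wrap to a small value
}

bool fitsSub(uint32_t limit, uint32_t current, uint32_t bytes) {
  return limit - current >= bytes;  // safe while current <= limit
}
```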
Attachment #8347988 - Flags: feedback?(seth) → feedback-
Created attachment 8351282 [details] [diff] [review]
setup hard limit for decoded image buffer size

update patch according to review comment.
Attachment #8347988 - Attachment is obsolete: true
Attachment #8347988 - Flags: feedback?(khuey)
Attachment #8351282 - Flags: review?(seth)
Attachment #8351282 - Flags: feedback?(khuey)
Comment on attachment 8351282 [details] [diff] [review]
setup hard limit for decoded image buffer size

Review of attachment 8351282 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/DiscardTracker.cpp
@@ +139,5 @@
>    DisableTimer();
>  }
>  
> +/* static */ bool
> +DiscardTracker::TryAllocation(int64_t bytes)

aBytes is Gecko style

@@ +161,5 @@
> +  return enoughSpace;
> +}
> +
> +/* static */ void
> +DiscardTracker::InformDeallocation(int64_t bytes)

here too.

@@ +173,1 @@
>    MOZ_ASSERT(sCurrentDecodedImageBytes >= 0);

After we change the signedness this should just become an assertion at the beginning of this function that aBytes <= sCurrentDecodedImageBytes

::: image/src/DiscardTracker.h
@@ +81,5 @@
>       */
>      static void DiscardAll();
>  
>      /**
> +     * Inform the discard tracker that we are going to allocat some memory

allocate

@@ +84,5 @@
>      /**
> +     * Inform the discard tracker that we are going to allocat some memory
> +     * for a decoded image. We use this to determine when we've allocated
> +     * too much memory and should discard some images.  This function can
> +     * be called from any thread and is thread-safe.

Also note that if this function succeeds the caller is now responsible for ensuring that InformDeallocation is called.

@@ +89,2 @@
>       */
> +    static bool TryAllocation(int64_t bytes);

These functions should both take unsigned integers.

@@ +122,5 @@
>      static nsCOMPtr<nsITimer> sTimer;
>      static bool sInitialized;
>      static bool sTimerOn;
>      static mozilla::Atomic<int32_t> sDiscardRunnablePending;
>      static int64_t sCurrentDecodedImageBytes;

And sCurrentDecodedImageBytes should become unsigned too.
Attachment #8351282 - Flags: feedback?(khuey) → feedback+
Comment on attachment 8351282 [details] [diff] [review]
setup hard limit for decoded image buffer size

Review of attachment 8351282 [details] [diff] [review]:
-----------------------------------------------------------------

Looks much better! Almost there.

I agree with everything khuey said, as well. Please rerequest review when you've addressed both of our concerns.

::: b2g/app/b2g.js
@@ +302,5 @@
>  pref("image.mem.decodeondraw", true);
>  pref("image.mem.allow_locking_in_content_processes", false); /* don't allow image locking */
>  pref("image.mem.min_discard_timeout_ms", 86400000); /* 24h, we rely on the out of memory hook */
>  pref("image.mem.max_decoded_image_kb", 30000); /* 30MB seems reasonable */
> +pref("image.mem.hard_limit_decoded_image_kb", 60000); /* 60MB seemes reasonable */

"seems"

::: image/src/DiscardTracker.cpp
@@ +147,5 @@
> +  PR_Lock(sAllocationLock);
> +  bool enoughSpace =
> +    (!sHardLimitDecodedImageKB) ? true
> +                                : (sHardLimitDecodedImageKB << 10 - sCurrentDecodedImageBytes)
> +                                    >= bytes;

This means the same thing as:

!sHardLimitDecodedImageKB || (sHardLimitDecodedImageKB << 10) - sCurrentDecodedImageBytes >= bytes

which I'd prefer.

Also, "+" and "-" have higher precedence than "<<" and ">>", IIRC, so I think you need parens around the left shift expression. I've corrected that in my suggested replacement. What might be even better is to compute it in a separate statement.

Also note that we use "* 1024" rather than "<< 10" all over the place in this file, and it's probably better to be consistent.
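A quick check of the precedence point (values are illustrative; in C++ the additive operators do bind tighter than the shift operators):

```cpp
#include <cassert>
#include <cstdint>

// "kb << 10 - used" parses as "kb << (10 - used)", not "(kb << 10) - used".
int64_t limitMinusUsed(int64_t kb, int64_t used) {
  return (kb << 10) - used;  // the intended meaning
}

int64_t whatActuallyParses(int64_t kb, int64_t used) {
  return kb << (10 - used);  // what the unparenthesized code computes
}
```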

::: image/src/imgFrame.cpp
@@ +173,5 @@
>      if (!mPalettedImageData)
>        NS_WARNING("moz_malloc for paletted image data should succeed");
>      NS_ENSURE_TRUE(mPalettedImageData, NS_ERROR_OUT_OF_MEMORY);
>    } else {
> +    // Inform the discard tracker that we are going to allocated some memory.

"allocate"

::: modules/libpref/src/init/all.js
@@ +4012,5 @@
>  // The maximum amount of decoded image data we'll willingly keep around (we
>  // might keep around more than this, but we'll try to get down to this value).
>  pref("image.mem.max_decoded_image_kb", 51200);
>  
> +// Hard limit for the amount of decoded image data, 0 meas we don't have the

"means"
Attachment #8351282 - Flags: review?(seth) → review-
Created attachment 8362412 [details] [diff] [review]
setup hard limit for decoded image buffer size

Update according to review comments.
Attachment #8351282 - Attachment is obsolete: true
Attachment #8362412 - Flags: review?(seth)
Attachment #8362412 - Flags: review?(khuey)
I found it's pretty hard to test a preference that is only loaded at start-up. I'm trying to test it by changing the pref and opening a remote browser that runs the actual test code. Do you have any suggestions on how we can test it?
How are we going to maintain this hardcoded limit of 60MB?  Should it be a percentage of the total size, rather than a fixed value?  Is this going to limit our performance on better phones?
(In reply to Shih-Chiang Chien [:schien] from comment #7)
> I found it's pretty hard to test a preference that only loads at start-up.
> I'm trying to test it by changing the pref and opening a remote browser that
> runs the actual test code. Do you guys have any suggestion on how we can
> test it?

It's not really possible to test preferences that are not "live".

(In reply to Milan Sreckovic [:milan] from comment #8)
> How are we going to maintain this hardcoded limit of 60MB?  Should it be a
> percentage of the total size, rather than a fixed value?  Is this going to
> limit our performance on better phones?

OEMs will set the pref based on how much memory the device has, presumably.
Comment on attachment 8362412 [details] [diff] [review]
setup hard limit for decoded image buffer size

Review of attachment 8362412 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not a peer, but it looks good to me!
Attachment #8362412 - Flags: review?(khuey) → feedback+
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #9)
> (In reply to Milan Sreckovic [:milan] from comment #8)
> > How are we going to maintain this hardcoded limit of 60MB?  Should it be a
> > percentage of the total size, rather than a fixed value?  Is this going to
> > limit our performance on better phones?
> 
> OEMs will set the pref based on how much memory the device has, presumably.

Yes, OEMs might fine-tune this configuration during their internal QC process. We should maintain a wiki page for this kind of device-specific preference.
It would be nice to provide a heuristic for guessing the hard limit instead of a hard-coded one, but I think we should file a follow-up bug for it rather than block the entire feature.
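As a rough sketch of what such a heuristic might look like (entirely hypothetical: the function name, the 25% fraction, and the fallback value are illustrative assumptions, not anything decided in this bug):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: derive the hard limit from total device memory instead
// of a hard-coded pref, falling back to a fixed value when the total
// is unknown (reported as 0).
uint64_t GuessHardLimitKB(uint64_t totalMemKB) {
  const uint64_t kFallbackKB = 60000;  // matches the hard-coded b2g pref
  if (totalMemKB == 0) {
    return kFallbackKB;
  }
  return totalMemKB / 4;  // e.g. 25% of RAM; tuning would be up to OEMs
}
```

On a 256MB handset this illustrative 25% rule would yield 64MB, in the same ballpark as the values discussed in this bug.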
With this patch, Gecko will now abort the image decoding when it exceeds the buffer limit. We should be able to suspend the image decoding and resume it once enough buffer is available. I think that should also be a follow-up bug, to improve the user experience.
(In reply to Shih-Chiang Chien [:schien] from comment #12)
> With this patch, Gecko will now abort the image decoding when it exceeds
> the buffer limit. We should be able to suspend the image decoding and
> resume it once enough buffer is available. I think that should also be a
> follow-up bug, to improve the user experience.

That's a good idea! Worth filing.
Comment on attachment 8362412 [details] [diff] [review]
setup hard limit for decoded image buffer size

Review of attachment 8362412 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8362412 - Flags: review?(seth) → review+
One reftest on the emulator is broken by this patch because we exceed the image buffer limit. http://dxr.mozilla.org/mozilla-central/source/layout/reftests/bugs/370629-1.html embeds a 4000*4000 PNG which slightly exceeds the 60000KB buffer limit. I think we should disable the buffer limit in our test environment in general. We can do this by setting a user preference in http://dxr.mozilla.org/mozilla-central/source/testing/profiles/prefs_b2g_unittest.js. In that case, I'll try to add some test cases for the hard limit even if we need to do some hacky things in the testcase.
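For reference, the arithmetic behind that reftest failure (assuming the decoder allocates 4 bytes per pixel for a 32-bit surface): a 4000*4000 image needs 62500KB of decoded buffer, just over the 60000KB limit, consistent with the later finding that roughly 62MB is needed to pass the reftest.

```cpp
#include <cassert>
#include <cstdint>

// Decoded size in KB for a width*height image at 4 bytes per pixel.
uint64_t decodedSizeKB(uint64_t width, uint64_t height) {
  return width * height * 4 / 1024;
}
```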
(In reply to Seth Fowler [:seth] from comment #14)
> (In reply to Shih-Chiang Chien [:schien] from comment #12)
> > With this patch, Gecko will now abort the image decoding when it exceeds
> > the buffer limit. We should be able to suspend the image decoding and
> > resume it once enough buffer is available. I think that should also be a
> > follow-up bug, to improve the user experience.
> 
> That's a good idea! Worth filing.

I've filed Bug 962384 for it.
(In reply to Shih-Chiang Chien [:schien] from comment #16)
> One reftest on emulator is broken by this patch because we exceed the image
> buffer limit.
> http://dxr.mozilla.org/mozilla-central/source/layout/reftests/bugs/370629-1.
> html embeded a 4000*4000 png which is slightly exceed the 60000KB buffer
> limit. I think we should disable the buffer limit in our test environment in
> general. We can do this by setting up a user preference in
> http://dxr.mozilla.org/mozilla-central/source/testing/profiles/
> prefs_b2g_unittest.js. In that case, I'll try to add some test cases for the
> hard limit even if we need to some hacky things in the testcase.

Not testing what we run seems wrong. Changing the limit to account for the extra size (you'd need <62000KB?) seems like a much better approach.
(In reply to Milan Sreckovic [:milan] from comment #18)
> Not testing what we run seems wrong. Changing the limit to account for the
> extra size (you'd need <62000KB?) seems like a much better approach.
I tried increasing the hard limit for Firefox OS; however, it still showed intermittent orange on try. My guess is Gecko cannot guarantee all the image buffers are released before a test case starts.
Created attachment 8366595 [details] [diff] [review]
Part 2 - testcase.patch

Create a content process with a buffer size limit and use the mozbrowser API as the IPC channel to verify the result.
Attachment #8366595 - Flags: feedback?(seth)
Attachment #8366595 - Flags: feedback?(khuey)
(In reply to Shih-Chiang Chien [:schien] from comment #19)
> (In reply to Milan Sreckovic [:milan] from comment #18)
> > Not testing what we run seems wrong. Changing the limit to account for the
> > extra size (you'd need <62000KB?) seems like a much better approach.
> I tried increasing the hard limit for Firefox OS, however, it still showed
> intermittent orange on try. My guess is gecko cannot guarantee all the image
> buffers are released before starting a test case.

If we can't figure out a solution for this, in my opinion we'd be better off marking that one test random for affected platforms.

I concur with Milan that I'd prefer the testing environment mirror the actual runtime environment as closely as possible.
Comment on attachment 8366595 [details] [diff] [review]
Part 2 - testcase.patch

Review of attachment 8366595 [details] [diff] [review]:
-----------------------------------------------------------------

Very nice!

::: image/test/mochitest/file_image_buffer_limit.html
@@ +69,5 @@
> +      document.body.removeChild(img_8m);
> +      document.body.removeChild(another_img_8M);
> +
> +      // Spin the event loop a few times to give the image in the display:none
> +      // iframe a chance to be discarded.

This is the only part that makes me a bit nervous. It seems like a recipe for intermittent oranges.

Is one spin definitely not enough?

If not, I'd prefer to just do this (in pseudocode):

while(true) {
  spinEventLoop();
  retryImageLoad();
}

This way we retry an infinite number of times until we succeed. This is more robust than hard-coding either a timeout or a set number of event loop spins. We already use this approach in a lot of the layout tests that have to deal with nondeterminism, so there's plenty of precedent.

::: testing/profiles/prefs_b2g_unittest.js
@@ +7,5 @@
>  user_pref("dom.ipc.browser_frames.oop_by_default", false);
>  user_pref("dom.mozBrowserFramesWhitelist","app://test-container.gaiamobile.org,http://mochi.test:8888");
>  user_pref("marionette.force-local", true);
>  user_pref("dom.testing.datastore_enabled_for_hosted_apps", true);
> +user_pref("image.mem.hard_limit_decoded_image_kb", 0);

I don't understand why this line is in the patch. Maybe I'm just missing something?
Attachment #8366595 - Flags: feedback?(seth) → feedback+
(In reply to Seth Fowler [:seth] from comment #22)
> Comment on attachment 8366595 [details] [diff] [review]
> Part 2 - testcase.patch
> 
> Review of attachment 8366595 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Very nice!
> 
> ::: image/test/mochitest/file_image_buffer_limit.html
> @@ +69,5 @@
> > +      document.body.removeChild(img_8m);
> > +      document.body.removeChild(another_img_8M);
> > +
> > +      // Spin the event loop a few times to give the image in the display:none
> > +      // iframe a chance to be discarded.
> 
> This is the only part that makes me a bit nervous. It seems like a recipe
> for intermittent oranges.
> 
> Is one spin definitely not enough?
I just copied this code from image/test/mochitest/test_drawDiscardedImage.html. One spin works on my local machine and I'll push it to the try server to see if we can get a stable green.
> 
> If not, I'd prefer to just do this (in pseudocode):
> 
> while(true) {
>   spinEventLoop();
>   retryImageLoad();
> }
> 
> This way we retry an infinite number of times until we succeed. This is more
> robust than hard-coding either a timeout or a set number of event loop
> spins. We already use this approach in a lot of the layout tests that have
> to deal with nondeterminism, so there's plenty of precedent.
> 
> ::: testing/profiles/prefs_b2g_unittest.js
> @@ +7,5 @@
> >  user_pref("dom.ipc.browser_frames.oop_by_default", false);
> >  user_pref("dom.mozBrowserFramesWhitelist","app://test-container.gaiamobile.org,http://mochi.test:8888");
> >  user_pref("marionette.force-local", true);
> >  user_pref("dom.testing.datastore_enabled_for_hosted_apps", true);
> > +user_pref("image.mem.hard_limit_decoded_image_kb", 0);
> 
> I don't understand why this line is in the patch. Maybe I'm just missing
> something?
This line is for turning off the hard limit in emulator mochitest/reftest. I'll remove this line because we already have a consensus on testing with the hard limit enabled. I did an experiment today and found we need 62M of decoded buffer to pass the reftest.
(In reply to Shih-Chiang Chien [:schien] from comment #23)
> I just copy these code from
> image/test/mochitest/test_drawDiscardedImage.html. One spin works on my
> local machine and I'll push it to try server to see if we can have a stable
> green.

Sounds great! One spin is the ideal solution.

> I did an experiment today and found we need 62M of decoded buffer to
> pass the reftest.

Excellent. It'd probably be a good idea to add a comment for the user_pref stating that layout/reftests/bugs/370629-1.html is the limiting factor and that you experimentally determined 62M was enough.
Comment on attachment 8366595 [details] [diff] [review]
Part 2 - testcase.patch

I think seth's review is sufficient here.
Attachment #8366595 - Flags: feedback?(khuey)
Unagi with 256MB RAM can survive with a 65MB decoded buffer limit. I'll use this value for now and file a follow-up bug for the smart hard limit suggested by @milan in comment #8.
Created attachment 8371314 [details] [diff] [review]
Part 1 - setup hard limit for decoded image buffer size

Set buffer size limit to 65M on Firefox OS.
Attachment #8362412 - Attachment is obsolete: true
Attachment #8371314 - Flags: review?(seth)
Created attachment 8371317 [details] [diff] [review]
Part 2 - testcase.patch

Directly test the buffer limit on the b2g emulator. It took me some time to find out that imgFrame is not locked on B2G, which causes imgFrames to be created and released repeatedly for the 10M-pixel image.

try result: https://tbpl.mozilla.org/?tree=Try&rev=33026c682e03
Attachment #8366595 - Attachment is obsolete: true
Attachment #8371317 - Flags: review?(seth)
Created attachment 8372105 [details] [diff] [review]
Part 2 - testcase.patch

I forgot that TBPL doesn't include the "image" folder on b2g-emulator.

https://tbpl.mozilla.org/?tree=Try&rev=76ccbe398f59
Attachment #8371317 - Attachment is obsolete: true
Attachment #8371317 - Flags: review?(seth)
Attachment #8372105 - Flags: review?(seth)
Comment on attachment 8371314 [details] [diff] [review]
Part 1 - setup hard limit for decoded image buffer size

Review of attachment 8371314 [details] [diff] [review]:
-----------------------------------------------------------------

Nice.
Attachment #8371314 - Flags: review?(seth) → review+
Comment on attachment 8372105 [details] [diff] [review]
Part 2 - testcase.patch

Review of attachment 8372105 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good!

Still seems like there's a failure on try, though. I'm going to assume it's a trivial issue and r+ this. If you have to make big changes, please rerequest review.
Attachment #8372105 - Flags: review?(seth) → review+
This is the failure I'm talking about, on B2G ICS Emulator Opt, from the try job in comment 29:

> 621 ERROR TEST-UNEXPECTED-FAIL | /tests/image/test/mochitest/test_image_buffer_limit.html | should fail to load due to image buffer size limit
Created attachment 8373020 [details] [diff] [review]
Part 2 - testcase.patch, v2

Loading a single image exceeding "image.mem.max_decoded_image_kb" (30000KB on b2g) will cause all the images to be discarded at the next DiscardNow() cycle [1]. If the second huge image decoding task comes in later than DiscardNow(), it'll have a chance to allocate the requested decoded buffer, causing the intermittent failure.
I worked around this problem by loading a smaller image as the first image, so DiscardNow() is guaranteed not to be activated after the first image.

We can observe the images blinking in http://people.mozilla.org/~schien/image_memory_presure.html on B2G because the imgFrames are continuously being discarded and recreated. We could fix the discarding algorithm in a separate bug. @seth, what do you think?

try result: https://tbpl.mozilla.org/?tree=Try&rev=c3a4eea422fe

[1] http://dxr.mozilla.org/mozilla-central/source/image/src/DiscardTracker.cpp?from=DiscardTracker.cpp#277
Attachment #8372105 - Attachment is obsolete: true
Attachment #8373020 - Flags: review?(seth)
@seth, any feedback for comment #33?
Flags: needinfo?(seth)
Sorry for the slow response time! I got backed up and I'm digging my way out now.

(In reply to Shih-Chiang Chien [:schien] (UTC+8) from comment #33)
> I workaround this problem by loading a smaller image as the first image, so
> the DiscardNow() won't be activate after the first image for sure.

Yeah, that makes sense. Sounds like a good approach.

> We can observe the images are blinking in
> http://people.mozilla.org/~schien/image_memory_presure.html on B2G because
> the imgFrames are continuously been discarded and recreated. We could fix
> the discarding algorithm in a separate bug. @seth, how do you think?

We definitely do not want that behavior, as that sounds like a battery killer in addition to being a poor experience for the user. Possibly we should have an exponential backoff on redecoding discarded images? I agree that we should save that for a separate bug, though; please CC me on any related bugs you create.
Flags: needinfo?(seth)
Comment on attachment 8373020 [details] [diff] [review]
Part 2 - testcase.patch, v2

Review of attachment 8373020 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8373020 - Flags: review?(seth) → review+
Created attachment 8403734 [details] [diff] [review]
Part 1 - hard-limit-for-discard-tracker.patch

rebase to latest m-c, carry r+.
Attachment #8371314 - Attachment is obsolete: true
Attachment #8403734 - Flags: review+
Created attachment 8403739 [details] [diff] [review]
Part 2 - testcase.patch

rebase to latest m-c, carry r+.

try result: https://tbpl.mozilla.org/?tree=Try&rev=f3b11007343d
Attachment #8373020 - Attachment is obsolete: true
Attachment #8403739 - Flags: review+
Looks good! Ready to land.
Keywords: checkin-needed

(This failure occurred on 3 out of the 5 try jobs of the same type)
Hmm... it's weird that enabling |test_drawDiscardedImage.html| on the b2g emulator causes the connection loss. I'll find out the root cause and post a new patch later today.
Bug 997034 is filed for the test timeout mentioned in comment #41.
Created attachment 8408990 [details] [diff] [review]
Part 1 - hard-limit-for-discard-tracker.patch

rebase to latest m-c, carry r+.
Attachment #8403734 - Attachment is obsolete: true
Attachment #8408990 - Flags: review+
Created attachment 8408991 [details] [diff] [review]
Part 2 - testcase.patch

rebase to latest m-c, carry r+.

try result is stable: https://tbpl.mozilla.org/?tree=Try&rev=db08b202d308
Attachment #8403739 - Attachment is obsolete: true
Attachment #8408991 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/2fcd144a6282
https://hg.mozilla.org/mozilla-central/rev/dc2154549c66
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31

Depends on: 1008646