Last Comment Bug 664659 - All image surfaces are counted on the heap, even though some don't live there. This can cause about:memory's heap-unclassified number to go negative.
: All image surfaces are counted on the heap, even though some don't live there...
Status: RESOLVED FIXED
[MemShrink:P2][inbound]
:
Product: Core
Classification: Components
Component: ImageLib (show other bugs)
: unspecified
: x86_64 Linux
: -- normal with 1 vote (vote)
: mozilla8
Assigned To: Justin Lebar (not reading bugmail)
:
Mentors:
Depends on: 671280
Blocks: 563701
  Show dependency treegraph
 
Reported: 2011-06-15 21:57 PDT by Nicholas Nethercote [:njn]
Modified: 2011-08-05 08:59 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Part 1, v1 (1.04 KB, patch)
2011-07-11 15:52 PDT, Justin Lebar (not reading bugmail)
joe: review+
Details | Diff | Splinter Review
Part 2, v1 (7.88 KB, patch)
2011-07-11 15:54 PDT, Justin Lebar (not reading bugmail)
n.nethercote: review+
justin.lebar+bug: review-
Details | Diff | Splinter Review
Windows about:memory before/after discard (8.01 KB, text/plain)
2011-07-11 18:21 PDT, Justin Lebar (not reading bugmail)
no flags Details
Part 2, v2 (10.39 KB, patch)
2011-07-12 06:56 PDT, Justin Lebar (not reading bugmail)
no flags Details | Diff | Splinter Review
Rolled up patch v3 (28.50 KB, patch)
2011-07-12 13:39 PDT, Justin Lebar (not reading bugmail)
n.nethercote: review+
Details | Diff | Splinter Review
Patch v4 (27.65 KB, patch)
2011-07-13 10:27 PDT, Justin Lebar (not reading bugmail)
joe: review+
Details | Diff | Splinter Review
KIND_MAPPED comment change, v1 (1.65 KB, patch)
2011-07-15 15:34 PDT, Justin Lebar (not reading bugmail)
n.nethercote: review+
Details | Diff | Splinter Review
Patch v5 (27.72 KB, patch)
2011-07-15 15:47 PDT, Justin Lebar (not reading bugmail)
n.nethercote: feedback+
Details | Diff | Splinter Review
Fix for Mac, option (3) from comment 50 (1.41 KB, patch)
2011-07-25 11:54 PDT, Justin Lebar (not reading bugmail)
joe: review-
Details | Diff | Splinter Review
Fix for Mac, v2 (2.26 KB, patch)
2011-07-28 14:29 PDT, Justin Lebar (not reading bugmail)
joe: review+
Details | Diff | Splinter Review
Fix for Linux (8.45 KB, patch)
2011-08-01 14:13 PDT, Justin Lebar (not reading bugmail)
joe: review+
Details | Diff | Splinter Review

Description Nicholas Nethercote [:njn] 2011-06-15 21:57:04 PDT
Attachment 539718 [details] has this (edited):

548,421,696 B (100.0%) -- explicit
├──483,880,622 B (88.23%) -- images
│  ├──482,935,630 B (88.06%) -- content
│  │  ├──482,935,630 B (88.06%) -- used
│  │  │  ├──407,744,396 B (74.35%) -- uncompressed
│  │  │  └───75,191,234 B (13.71%) -- raw
│  │  └────────────0 B (00.00%) -- unused
│  │               ├──0 B (00.00%) -- raw
│  │               └──0 B (00.00%) -- uncompressed
│  └──────944,992 B (00.17%) -- chrome
│         ├──944,992 B (00.17%) -- used
│         │  ├──944,992 B (00.17%) -- uncompressed
│         │  └────────0 B (00.00%) -- raw
│         └────────0 B (00.00%) -- unused
│                  ├──0 B (00.00%) -- raw
│                  └──0 B (00.00%) -- uncompressed
├──231,311,348 B (42.18%) -- js
├───38,942,768 B (07.10%) -- storage
├────8,249,005 B (01.50%) -- layout
└──-213,962,047 B (-39.01%) -- heap-unclassified

Other Measurements
1,423,396,864 B -- vsize
  734,003,200 B -- heap-committed
  593,592,320 B -- resident
  496,779,328 B -- heap-used
  237,223,006 B -- heap-unused
    2,502,656 B -- heap-dirty
    1,758,176 B -- canvas-2d-pixel-bytes
       82,144 B -- gfx-surface-image

The images total is almost as big as the heap-used total.  This is probably because:

(a) some of those images are not actually stored on the heap (ie. in memory allocated with malloc/new)

(b) there's just a bug of some kind in the image reporters.
Comment 1 Boris Zbarsky [:bz] 2011-06-15 22:29:26 PDT
Are we hitting the size=0 case in imgFrame::EstimateMemoryUsed here perchance?
Comment 2 Justin Lebar (not reading bugmail) 2011-06-20 08:13:50 PDT
(In reply to comment #1)
> Are we hitting the size=0 case in imgFrame::EstimateMemoryUsed here
> perchance?

Yes.  It the mOptSurface branch, but the reported size is 0, so it falls through to the if(size == 0) branch.
Comment 3 Justin Lebar (not reading bugmail) 2011-06-20 10:09:02 PDT
Maybe we're double-counting somewhere.  The heap-unclassified value is pretty close to half the uncompressed image value.
Comment 4 Nicholas Nethercote [:njn] 2011-06-20 18:07:43 PDT
(In reply to comment #2)
> > Are we hitting the size=0 case in imgFrame::EstimateMemoryUsed here
> > perchance?
> 
> Yes.  It the mOptSurface branch, but the reported size is 0, so it falls
> through to the if(size == 0) branch.

(In reply to comment #3)
> Maybe we're double-counting somewhere.  The heap-unclassified value is
> pretty close to half the uncompressed image value.

I got the impression from bz's comment 2 that the size=0 case overestimates.  Do we hit that case a lot?
Comment 5 Justin Lebar (not reading bugmail) 2011-06-20 18:11:59 PDT
(In reply to comment #4)
> I got the impression from bz's comment 2 that the size=0 case overestimates.
> Do we hit that case a lot?

I don't know the code, but I think it may overestimate on Linux in the sense that the image data is not stored within the application's memory at all, and the correct value to report is 0.  See bug 664642 comment 18.
Comment 6 Boris Zbarsky [:bz] 2011-06-20 20:43:43 PDT
> Yes.  It the mOptSurface branch, but the reported size is 0

Ah, indeed.  I bet the same thing can happen on Windows in the mWinSurface case, actually, because for non-image surfaces (which both mWinSurface and Linux mOptSurface are, at least) KnownMemoryUsed() always returns 0.

I have no idea how much memory mWinSurface actually uses or which address space it lives in, nor what mOptSurface looks like on non-Linux, but I'm pretty sure that on Linux if mOptSurface is not an image surface that means that the image actually lives in the X server's memory...
Comment 7 Andreas Jung 2011-07-07 13:22:11 PDT
I coincidentally encountered this bug.
The only thing I did to reproduce the bug was opening a page with an extreme high number of images.

The page I can reliably reproduce on is:
http://bbs.kyohk.net/thread-380828-1-1.html

WARNING: Opening this page will require a lot of RAM and will possibly slow down Firefox so that it becomes almost unusable. Save all unsaved data before opening the page.

The numbers I got are:

Explicit Allocations
555.60 MB (100.0%) -- explicit
├──1,001.28 MB (180.22%) -- images
│  ├──1,000.83 MB (180.13%) -- content
│  │  ├──1,000.82 MB (180.13%) -- used
│  │  │  ├────920.54 MB (165.68%) -- uncompressed
│  │  │  └─────80.28 MB (14.45%) -- raw
│  │  └──────0.01 MB (00.00%) -- (1 omitted)
│  └──────0.45 MB (00.08%) -- (1 omitted)
├──253.20 MB (45.57%) -- js
│  ├───71.60 MB (12.89%) -- compartment([System Principal])
│  │   ├──31.49 MB (05.67%) -- gc-heap
│  │   │  ├──15.00 MB (02.70%) -- objects
│  │   │  ├───8.81 MB (01.59%) -- arena-unused
│  │   │  ├───6.08 MB (01.09%) -- shapes
│  │   │  └───1.60 MB (00.29%) -- (4 omitted)
│  │   ├──26.69 MB (04.80%) -- mjit-code
│  │   ├───6.08 MB (01.09%) -- (4 omitted)
│  │   ├───3.96 MB (00.71%) -- object-slots
│  │   └───3.37 MB (00.61%) -- scripts
│  ├───22.54 MB (04.06%) -- gc-heap-chunk-unused

[snip...]

│  ├────2.97 MB (00.54%) -- compartment(http://bbs.kyohk.net/thread-380828-1-1.h...)
│  │    └──2.97 MB (00.54%) -- (8 omitted)
│  └────2.79 MB (00.50%) -- compartment(http://s7.addthis.com/static/r07/sh45.ht...)
│       └──2.79 MB (00.50%) -- (8 omitted)
├───40.55 MB (07.30%) -- storage
│   └──40.55 MB (07.30%) -- sqlite
│      ├──28.38 MB (05.11%) -- places.sqlite
│      │  ├──28.02 MB (05.04%) -- cache-used
│      │  └───0.36 MB (00.06%) -- (2 omitted)
│      ├───8.16 MB (01.47%) -- urlclassifier3.sqlite
│      │   ├──8.07 MB (01.45%) -- cache-used
│      │   └──0.08 MB (00.01%) -- (2 omitted)
│      └───4.01 MB (00.72%) -- (12 omitted)
├───10.04 MB (01.81%) -- layout
│   ├──10.04 MB (01.81%) -- all
│   └───0.00 MB (00.00%) -- (1 omitted)
└── -749.46 MB (-134.89%) -- (2 omitted) -> xpti-working-set + heap-unclassified

Other Measurements
1,506.64 MB -- vsize
1,503.13 MB -- private
  923.26 MB -- gfx-surface-win32
  742.61 MB -- resident
  522.48 MB -- heap-committed
  497.55 MB -- heap-used
  107.00 MB -- js-gc-heap
   69.45 MB -- heap-unused
    2.65 MB -- heap-dirty
    2.42 MB -- shmem-allocated
    2.42 MB -- shmem-mapped
    1.35 MB -- canvas-2d-pixel-bytes
    0.01 MB -- gfx-surface-image
    0.00 MB -- gfx-d2d-surfacecache
    0.00 MB -- gfx-d2d-surfacevram

This was on:
Mozilla/5.0 (Windows NT 5.1; rv:8.0a1) Gecko/20110706 Firefox/8.0a1
Windows XP SP3

Platform should therefore be set to "All" or (if you think the fix for Linux and Windows is different) be split in one Windows and one Linux bug.


PS: It is also worth noting that on the above site Firefox uses over 1 GB of memory, but Internet Explorer 8 uses only about 750 MB. Also, Firefox becomes basically unusable (until you switch tabs and Firefox frees the memory) while IE stays responsive.
Is there a bug where I can add a comment or should I file an additional bug for this?
Comment 8 Nicholas Nethercote [:njn] 2011-07-07 16:33:59 PDT
(In reply to comment #7)
>
> PS: It is also worth noting that on the above site Firefox uses over 1 GB of
> memory, but Internet Explorer 8 uses only about 750 MB. Also, Firefox
> becomes basically unusable (until you switch tabs and Firefox frees the
> memory) while IE stays responsive.
> Is there a bug where I can add a comment or should I file an additional bug
> for this?

Bug 660577 is open for excessive memory usage on image-heavy sites.  As for the part about it making Firefox unusable, there's a good chance that's due to paging -- could you tell if that was the case?  Eg. was your hard disk going crazy?
Comment 9 Justin Lebar (not reading bugmail) 2011-07-07 16:54:52 PDT
You can also view hard page faults from the system monitor. start -> run -> perfmon.msc.  Hard page faults (aka major page faults) is under the "memory" category.
Comment 10 Andreas Jung 2011-07-08 15:03:17 PDT
(In reply to comment #8)
> Bug 660577 is open for excessive memory usage on image-heavy sites.  As for
> the part about it making Firefox unusable, there's a good chance that's due
> to paging -- could you tell if that was the case?  Eg. was your hard disk
> going crazy?

Answer is in bug 660577 comment #92 (it seemed too off-topic in this bug).
Comment 11 Justin Lebar (not reading bugmail) 2011-07-11 15:52:25 PDT
Created attachment 545273 [details] [diff] [review]
Part 1, v1
Comment 12 Justin Lebar (not reading bugmail) 2011-07-11 15:54:31 PDT
Created attachment 545274 [details] [diff] [review]
Part 2, v1
Comment 13 Justin Lebar (not reading bugmail) 2011-07-11 15:55:34 PDT
This works on my Linux box.  I'll start looking at other platforms now.
Comment 14 Nicholas Nethercote [:njn] 2011-07-11 16:51:53 PDT
Comment on attachment 545274 [details] [diff] [review]
Part 2, v1

Review of attachment 545274 [details] [diff] [review]:
-----------------------------------------------------------------

This seems ok, though I don't understand the *XlibSurface.* changes.  I'd like to see patches for other platforms before it lands, though.

::: gfx/thebes/gfxASurface.cpp
@@ +542,5 @@
>  
>  /** Memory reporting **/
>  
> +static const char *sDefaultSurfaceDescription =
> +    "Memory used by gfx surface of the given type";

Period needed at the end of the sentence.

@@ +545,5 @@
> +static const char *sDefaultSurfaceDescription =
> +    "Memory used by gfx surface of the given type";
> +
> +// {reporter name, reporter description (or null for default) }
> +static const char *sSurfaceMemoryReporterAttrs[][2] = {

Using a two-element array is a bit gross.  I'd prefer a struct with "name" and "description" fields.

@@ +553,5 @@
> +    {"gfx-surface-xlib",
> +     "Memory used by xlib surfaces to store pixmaps. This memory lives in "
> +     "the X server's process rather than in this application, so the bytes "
> +     "accounted for here aren't counted in vsize, resident, explicit, or any of "
> +     "the other measurements on this page."},

This description only makes sense if gfx-surface-xlib only occurs on Linux.  I guess that is the case?
Comment 15 Justin Lebar (not reading bugmail) 2011-07-11 16:59:05 PDT
> This seems ok, though I don't understand the *XlibSurface.* changes.

Yeah, a gfx peer needs to have a look.  I don't pretend to know what I'm doing.  :)

>> +static const char *sDefaultSurfaceDescription =
>> +    "Memory used by gfx surface of the given type";

> Period needed at the end of the sentence.

It's not a sentence.  :)  But I guess all the other reporters start with a sentence fragment followed by a period, so this one should too.
Comment 16 Justin Lebar (not reading bugmail) 2011-07-11 17:21:33 PDT
> This description only makes sense if gfx-surface-xlib only occurs on Linux.  I 
> guess that is the case?

Why is that?  gfx-surface-xlib occurs only where X is present, of course, but does it have to be specifically Linux?
Comment 17 Nicholas Nethercote [:njn] 2011-07-11 17:33:39 PDT
(In reply to comment #16)
> 
> Why is that?  gfx-surface-xlib occurs only where X is present, of course,
> but does it have to be specifically Linux?

I'm assuming that X is only (and always) present on Linux.
Comment 18 Justin Lebar (not reading bugmail) 2011-07-11 17:43:11 PDT
> This description only makes sense if gfx-surface-xlib only occurs on Linux.  I 
> guess that is the case?

> I'm assuming that X is only (and always) present on Linux.

Okay.  Jut to be clear: By confirming that this reporter only appears when X is present, have I addressed your concern?

I just checked on Mac, and confirmed that we don't have this problem there.  So it's just Windows to go.
Comment 19 Nicholas Nethercote [:njn] 2011-07-11 17:57:33 PDT
(In reply to comment #18)
> 
> > I'm assuming that X is only (and always) present on Linux.
> 
> Okay.  Jut to be clear: By confirming that this reporter only appears when X
> is present, have I addressed your concern?

Yes.
Comment 20 Justin Lebar (not reading bugmail) 2011-07-11 18:21:42 PDT
Created attachment 545301 [details]
Windows about:memory before/after discard

I think part 1 may be all that's needed to get this working on Windows.  I've attached about:memory before and after discarding about 100mb worth of images.
Comment 21 Justin Lebar (not reading bugmail) 2011-07-11 18:24:20 PDT
Comment on attachment 545274 [details] [diff] [review]
Part 2, v1

This causes us to double-count on Linux.  I didn't realize that imgFrame::EstimateMemoryUsed was listening to my calls to gfxASurface::RecordMemoryUsed.
Comment 22 Justin Lebar (not reading bugmail) 2011-07-12 06:56:27 PDT
Created attachment 545372 [details] [diff] [review]
Part 2, v2

This doesn't double-count any longer on Linux, at the cost of a bit of ugliness in the API.  njn, what do you think about gating on MemoryInProcess()?

On Windows, I'm finding that this bug doesn't happen in my unpatched debug builds, but I still can with the nightly builds.  I'm getting set up now to build with jemalloc, to see if that changes anything.
Comment 23 Justin Lebar (not reading bugmail) 2011-07-12 10:15:35 PDT
On Windows, it looks like the images are stored in-process (*).  But the problem is that images/content/used/raw has kind HEAP when it doesn't live in the heap.  It should have kind MAPPED.

(*) At least, on my machine, without D2D.
Comment 24 Justin Lebar (not reading bugmail) 2011-07-12 10:34:22 PDT
And with D2D forced on, the memory reporter works properly?  That's kind of surprising.

It reports the memory under images/content/used/uncompressed and gfx-surface-image.
Comment 25 Joe Drew (not getting mail) 2011-07-12 12:31:43 PDT
Comment on attachment 545273 [details] [diff] [review]
Part 1, v1

I'm a little surprised that gfxXlibSurface doesn't know how to ask X about this properly.
Comment 26 Justin Lebar (not reading bugmail) 2011-07-12 12:55:25 PDT
(In reply to comment #25)
> I'm a little surprised that gfxXlibSurface doesn't know how to ask X about
> this properly.

Indeed.  It looks like we're reporting more memory usage than xrestop blames us for, so probably the Right Thing is to ask X how much memory we're actually using.  I'll try and see how hard that is to do, although I'm not really bothered if we over-report.
Comment 27 Justin Lebar (not reading bugmail) 2011-07-12 13:27:11 PDT
XRestop uses XResQueryClientPixmapBytes().  See 

http://www.manpagez.com/man/3/XResQueryClientPixmapBytes/

I have no idea if this is something we can use in our code, though.  I kind of think it's not worth worrying about.
Comment 28 Joe Drew (not getting mail) 2011-07-12 13:28:44 PDT
Yeah, it's no big deal.
Comment 29 Justin Lebar (not reading bugmail) 2011-07-12 13:39:54 PDT
Created attachment 545465 [details] [diff] [review]
Rolled up patch v3

I'm not so good about keeping my changes in logically-separate patches, so here's a rolled up patch which seems to work on Windows and Linux.  It's not the prettiest patch, but I'm not sure how to improve it...

This patch adds a whole bunch of memory reporters which will usually be 0, but there are already plenty of image reporters which are usually 0.  They get hidden in about:memory anyway.
Comment 30 Justin Lebar (not reading bugmail) 2011-07-12 13:41:25 PDT
Comment on attachment 545465 [details] [diff] [review]
Rolled up patch v3

This will need Joe's review too, but most of the changes are related to about:memory.  Would you like to look first, Nick?
Comment 31 Nicholas Nethercote [:njn] 2011-07-12 19:47:54 PDT
Comment on attachment 545465 [details] [diff] [review]
Rolled up patch v3

Review of attachment 545465 [details] [diff] [review]:
-----------------------------------------------------------------

The structure of all this seems reasonable.  I worry if the classifications of where images live for different platforms are brittle -- are they likely to change?  But I guess things are already broken so they can hardly get worse...

::: gfx/thebes/gfxASurface.h
@@ +215,5 @@
> +     *
> +     * As a default, we say that the surface's memory lives in the heap.
> +     */
> +    virtual PRBool MemoryIsHeap();
> +    virtual PRBool MemoryIsNonheap();

It would be better to exclude the invalid true/true combination by having a single function that return a tri-valued enum with names like inProcessHeap/inProcessMapped/outOfProcess.  (And see below for my point about "mapped" and "non-heap".)

::: modules/libpr0n/src/RasterImage.cpp
@@ +756,5 @@
> +    if (aHeap) {
> +      val += frame->EstimateHeapMemoryUsed();
> +    }
> +    else {
> +      val += frame->EstimateNonheapMemoryUsed();

I'm a fan of conditional expressions for this sort of thing:

val += aHeap
     ? frame->EstimateHeapMemoryUsed()
     : frame->EstimateNonheapMemoryUsed();

::: modules/libpr0n/src/imgFrame.cpp
@@ -810,5 @@
>  
> -  // fall back to pessimistic/approximate size
> -  if (size == 0) {
> -    size = mSize.width * mSize.height * 4;
> -  }

You had a comment in an earlier patch where you explained why estimating a size of zero was the right thing here.  Is that comment no longer valid?  I liked it...

::: modules/libpr0n/src/imgLoader.cpp
@@ +278,5 @@
>    {
>      if (mType == ChromeUsedRaw) {
>        desc.AssignLiteral("Memory used by in-use chrome images (compressed data).");
> +    } else if (mType == ChromeUsedUncompressedHeap) {
> +      desc.AssignLiteral("Heap memory used by in-use chrome images (uncompressed data).");

If you look at the tool-tips in about:memory you'll see that it prepends "(Heap)" or "(Mapped)" to the start of all descriptions depending on the kind.  So you can replace "Heap memory..." with "Memory...".

Also, the pre-existing dichotomy was "heap" vs "mapped", and this exists both in the code (KIND_MAPPED, KIND_HEAP) and in the user-facing presentation (i.e. those tooltips in about:memory).  But you've used "heap" vs "non-heap" throughout this patch.  I'd luke you to use "heap" vs "mapped" throughout for consistency.
Comment 32 Justin Lebar (not reading bugmail) 2011-07-12 21:26:39 PDT
> Also, the pre-existing dichotomy was "heap" vs "mapped", and this exists both 
> in the code (KIND_MAPPED, KIND_HEAP) and in the user-facing presentation (i.e. 
> those tooltips in about:memory).  But you've used "heap" vs "non-heap" 
> throughout this patch.

How would you feel if I changed about:memory to "heap" and "non-heap"?  I don't really like "mapped" because:

 * The heap itself is mapped.
 * I have no idea how memory for these images is allocated.  Presumably Windows does some kind of mmap'ing behind the scenes, but who knows.  It doesn't feel right to call these allocations "mappings", when all that I know is they're not heap allocations.
 * I don't really care whether memory is mapped.  What's relevant for about:memory is whether the allocation was recorded by the heap allocator or not.  (For instance, are thread stacks "mapped" memory?)
Comment 33 Nicholas Nethercote [:njn] 2011-07-12 21:34:05 PDT
(In reply to comment #32)
> 
> How would you feel if I changed about:memory to "heap" and "non-heap"?

Often it's easier to leave things the way they are, even if they aren't exactly the way you would have done them.  But if your dissatisfaction with the naming is preventing you from making progress on changes that will improve the lives of users, then by all means file a bug and write a patch.
Comment 34 Justin Lebar (not reading bugmail) 2011-07-13 08:25:50 PDT
It's more a matter of: If I have to rewrite something (my patch or the existing code), I'd rather improve the overall state of the code.  I filed bug 671280.
Comment 35 Justin Lebar (not reading bugmail) 2011-07-13 10:22:26 PDT
> You had a comment in an earlier patch where you explained why estimating a size 
> of zero was the right thing here.  Is that comment no longer valid?  I liked 
> it...

I think using an enum as you suggested made this whole thing much clearer -- GetMemoryLocation() can return either MEMORY_IN_PROCESS_HEAP, MEMORY_IN_PROCESS_NONHEAP, or MEMORY_OUT_OF_PROCESS.  So of course EstimateHeapMemoryUsed() or EstimateNonheapMemoryUsed() might return 0.
Comment 36 Justin Lebar (not reading bugmail) 2011-07-13 10:27:00 PDT
Created attachment 545696 [details] [diff] [review]
Patch v4
Comment 37 Joe Drew (not getting mail) 2011-07-13 12:28:14 PDT
Comment on attachment 545696 [details] [diff] [review]
Patch v4

Review of attachment 545696 [details] [diff] [review]:
-----------------------------------------------------------------

oh my goodness this should have been like 4 different patches

::: modules/libpr0n/src/RasterImage.cpp
@@ +746,5 @@
>    return rv;
>  }
>  
> +static PRUint32
> +GetDecodedSize(const nsTArray<imgFrame *> &aFrames, PRBool aHeap)

I'd rather this last parameter be an enum so it's obvious what we're asking for.

@@ +749,5 @@
> +static PRUint32
> +GetDecodedSize(const nsTArray<imgFrame *> &aFrames, PRBool aHeap)
> +{
> +  PRUint32 val = 0;
> +  for (PRUint32 i = 0; i < aFrames.Length(); i++) {

++i
Comment 38 Justin Lebar (not reading bugmail) 2011-07-15 15:34:48 PDT
Created attachment 546247 [details] [diff] [review]
KIND_MAPPED comment change, v1
Comment 39 Justin Lebar (not reading bugmail) 2011-07-15 15:47:35 PDT
Created attachment 546248 [details] [diff] [review]
Patch v5
Comment 40 Justin Lebar (not reading bugmail) 2011-07-15 15:49:56 PDT
Comment on attachment 546248 [details] [diff] [review]
Patch v5

Asking for feedback re bug 671280 comment 5: Are you OK with leaving the image code as "nonheap" and the nsIMemoryReporter code as "mapped"?  It's unfortunately inconsistent, but I think it's worse to call memory that's allocated via CreateDIBSection "mapped".
Comment 41 Nicholas Nethercote [:njn] 2011-07-18 18:34:39 PDT
Comment on attachment 546248 [details] [diff] [review]
Patch v5

Review of attachment 546248 [details] [diff] [review]:
-----------------------------------------------------------------

I think we've reached an agreement on the naming here, right?
Comment 42 Nicholas Nethercote [:njn] 2011-07-18 18:35:35 PDT
Comment on attachment 546247 [details] [diff] [review]
KIND_MAPPED comment change, v1

Review of attachment 546247 [details] [diff] [review]:
-----------------------------------------------------------------

This change is fine, but maybe it's not necessary any more?
Comment 43 Justin Lebar (not reading bugmail) 2011-07-19 08:43:47 PDT
(In reply to comment #41)
> I think we've reached an agreement on the naming here, right?

Indeed!  Pushed to inbound: http://hg.mozilla.org/integration/mozilla-inbound/rev/7b49a2857e18
Comment 45 Justin Lebar (not reading bugmail) 2011-07-20 07:07:24 PDT
Looks like The RSS/Private Bytes regression went away after I backed out, so this was real.
Comment 46 Justin Lebar (not reading bugmail) 2011-07-20 07:47:23 PDT
The RSS increased from 167.2MB to 206.3MB on TP5, and persisted at that level until I backed out.  But whatever is causing this doesn't affect Linux.

Very weird...
Comment 47 Justin Lebar (not reading bugmail) 2011-07-20 08:17:46 PDT
As a wild guess, I moved the definitions of some virtual functions out of the header files.  I'd be pretty surprised if that was the problem, though...

(Also, I noticed that the code I'd written was bugged -- it declared GetMemoryType() on gfxASurface but then declared MemoryType() on gfxWindowsSurface and gfxXlibSurface.  I fixed that too...)

http://tbpl.mozilla.org/?tree=Try&rev=dd48d67a347b
Comment 48 Justin Lebar (not reading bugmail) 2011-07-20 12:32:58 PDT
> http://tbpl.mozilla.org/?tree=Try&rev=dd48d67a347b

Nope, this didn't fix it.
Comment 49 Justin Lebar (not reading bugmail) 2011-07-21 11:49:32 PDT
I ran this through Valgrind on my Mac, but didn't see anything obvious leading to an extra 30mb of memory used.

I'm going to spin builds on my machine and see if I can reproduce this regression and if about:memory yields any insights.

Other ideas would be appreciated; I'm kind of shooting in the dark.
Comment 50 Justin Lebar (not reading bugmail) 2011-07-25 10:43:19 PDT
Okay, I think I understand what's going on here now with the Mac RSS regression.

The problem is that we're reporting images as having 0 size on Mac.  This causes the image cache to be lazy about throwing away images (why not keep them around if they don't take up any space?), resulting in the RSS increase.

The problem arises when imgFrame::mOptSurface is a gfxQuartzSurface (this appears to happen most of the time).  In that case, the QuartzSurface says that it doesn't use any memory -- I think this is because the memory is actually owned by a gfxImageSurface, and the QuartzSurface is a thin wrapper around it.

I can think of a couple of solutions, but I'm not sure which is right.

1) When a QuartzSurface is created wrapping an ImageSurface, set the QuartzSurface's memory used to the ImageSurface's memory used, and then set the ImageSurface's memory used to 0.  This works only if the QuartzSurface's lifetime is the same as the ImageSurface's.

2) When a QuartzSurface is created wrapping an ImageSurface, set the QuartzSurface's memory used to the ImageSurface's memory used, and then decrement the aggregate count of memory used by ImageSurfaces by the ImageSurface's size.  When the QuartzSurface dies, increment the aggregate ImageSurface size.

This has the advantage that it'll work correctly if the QuartzSurface dies before the ImageSurface.  It won't work correctly if the ImageSurface dies first, but presumably that's OK, since the ImageSurface owns the memory.  It also won't work if two QuartzSurfaces can be created from one ImageSurface.

3) Hack imgFrame::EstimateHeapMemoryUsed -- if mOptSurface has type QuartzImage, say that we're using 4 * size * width bytes on the heap.  This is ugly, and like (2) it will over-count if two QuartzSurfaces are created from the same ImageSurface.

Joe or Jeff, do you have thoughts on what's the right way to proceed here?
Comment 51 Boris Zbarsky [:bz] 2011-07-25 10:46:12 PDT
Seems like the number we use for the image cache stuff should NOT match the about:memory number.  The latter needs to be in-process heap memory, while the latter should be how much space the image really takes up in RAM no matter where it happens to live....
Comment 52 Justin Lebar (not reading bugmail) 2011-07-25 10:47:41 PDT
Which "latter" should be a "former"?
Comment 53 Justin Lebar (not reading bugmail) 2011-07-25 10:53:05 PDT
The number in explicit/images/content/used/raw needs to be in-process (heap or nonheap), but I think the number for the image cache should be in-process heap/nonheap plus out-of-process.  That's currently the intent of the patch.
Comment 54 Boris Zbarsky [:bz] 2011-07-25 10:54:09 PDT
The latter "latter" (the second one) should be a "former".  ;)
Comment 55 Boris Zbarsky [:bz] 2011-07-25 10:54:36 PDT
And yes, comment 53 is what I was after.
Comment 56 Justin Lebar (not reading bugmail) 2011-07-25 11:54:22 PDT
Created attachment 548240 [details] [diff] [review]
Fix for Mac, option (3) from comment 50
Comment 57 Joe Drew (not getting mail) 2011-07-28 11:33:53 PDT
Comment on attachment 548240 [details] [diff] [review]
Fix for Mac, option (3) from comment 50

Review of attachment 548240 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry I didn't respond before now.

I'd much rather we fix this in gfxQuartzImageSurface. I think maybe the best thing to do here is to make gfxASurface::KnownMemoryUsed virtual, and override it in gfxQuartzImageSurface to call KnownMemoryUsed on its contained surface (call gfxASurface::GetSurfaceWrapper on the cairo surface).
Comment 58 Justin Lebar (not reading bugmail) 2011-07-28 11:50:12 PDT
That would unfortunately mess up the gfx-*-surface reporters in about:memory -- If gfxQuargzImageSurface wraps X mb of memory, then we'll report those X bytes  for both gfx-quartz-image-surface and gfx-image-surface.
Comment 59 Joe Drew (not getting mail) 2011-07-28 14:01:14 PDT
Are you sure? RecordMemoryUsed seems unrelated to KnownMemoryUsed, except in that they both affect the same variable. KnownMemoryUsed seems to only be called in imgFrame.cpp.
Comment 60 Justin Lebar (not reading bugmail) 2011-07-28 14:08:26 PDT
> Are you sure?

I was, but I'm not anymore.  :)  I think you're right.  I'll spin up a patch.
Comment 61 Justin Lebar (not reading bugmail) 2011-07-28 14:29:00 PDT
Created attachment 549226 [details] [diff] [review]
Fix for Mac, v2

This seems to work, although I'll push to try before burning m-i again.
Comment 62 Justin Lebar (not reading bugmail) 2011-07-28 14:38:46 PDT
Tryserver push: http://tbpl.mozilla.org/?tree=Try&rev=3f8797b3747f
Comment 63 Joe Drew (not getting mail) 2011-07-28 15:50:22 PDT
Comment on attachment 549226 [details] [diff] [review]
Fix for Mac, v2

It's only a WARNING for GetAsImageSurface() to return null, so you might want to do a null-check there.
Comment 64 Justin Lebar (not reading bugmail) 2011-07-29 11:02:04 PDT
The try push looked good.  Inbound: http://hg.mozilla.org/integration/mozilla-inbound/rev/984e656becec
Comment 65 Justin Lebar (not reading bugmail) 2011-07-29 13:04:08 PDT
FWIW, inbound push's TP5 memory numbers also look good.
Comment 66 Marco Bonardo [::mak] 2011-08-01 01:22:52 PDT
Backed out due to regressions in Tp5 XRES (100%) and Tp5 RSS (3-5%) on Linux. Perf-o-matic shows clearly the two regressions on this changeset.
Comment 67 Justin Lebar (not reading bugmail) 2011-08-01 08:17:13 PDT
Hm, it looks like the same thing which was happening on Mac is happening on Linux.  I thought this hunk would have taken care of Linux, but something else must be going on.

>  PRUint32
> Image::GetDataSize()
> {
>   if (mError)
>     return 0;
>   
>-  return GetSourceDataSize() + GetDecodedDataSize();
>+  return GetSourceHeapSize() + GetDecodedHeapSize() + GetDecodedNonheapSize();
> }
Comment 68 Justin Lebar (not reading bugmail) 2011-08-01 11:50:10 PDT
Oy; no, of course that hunk doesn't fix things.  It only counts in-process memory.  Fix coming.
Comment 69 Justin Lebar (not reading bugmail) 2011-08-01 14:13:14 PDT
Created attachment 549918 [details] [diff] [review]
Fix for Linux

Here's the fix.  I'm sorry for the churn, Joe; I should have seen this.
Comment 70 Justin Lebar (not reading bugmail) 2011-08-01 14:41:23 PDT
Try build: http://tbpl.mozilla.org/?tree=Try&rev=484e129465a9
Comment 71 Justin Lebar (not reading bugmail) 2011-08-02 07:13:56 PDT
Compare-talos results: http://goo.gl/Br0Ij

This latest patch looks good to me.  Joe, do you mind looking at it once more?
Comment 72 Joe Drew (not getting mail) 2011-08-04 14:30:06 PDT
Comment on attachment 549918 [details] [diff] [review]
Fix for Linux

Review of attachment 549918 [details] [diff] [review]:
-----------------------------------------------------------------

fingers crossed
Comment 73 Justin Lebar (not reading bugmail) 2011-08-04 14:35:22 PDT
> fingers crossed

Indeed.  http://hg.mozilla.org/integration/mozilla-inbound/rev/effa390ec9c5

Thanks for the review.
Comment 74 Marco Bonardo [::mak] 2011-08-05 08:59:33 PDT
http://hg.mozilla.org/mozilla-central/rev/effa390ec9c5

Note You need to log in before you can comment on or make changes to this bug.