Add a memory reporter for in-memory blobs

RESOLVED FIXED in Firefox 18

Status

()

defect
RESOLVED FIXED
7 years ago
a month ago

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

Tracking

(Blocks 1 bug)

Trunk
mozilla19
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(blocking-basecamp:+, firefox18 fixed)

Details

Attachments

(2 attachments)

We need a memory reporter for e.g. the memory returned by canvas.toBlob.

I have one cooked up here; I'll attach it in a moment.
(Assignee)

Updated

7 years ago
Blocks: 802647
Depends on: 648610
(Assignee)

Updated

7 years ago
Depends on: 803692
Posted patch Patch, v1Splinter Review
Attachment #673413 - Flags: review?(khuey)
Comment on attachment 673413 [details] [diff] [review]
Patch, v1

I imagine njn will have some nits here.
Attachment #673413 - Flags: feedback?(n.nethercote)
I included the sha1's because

 a) It lets us identify duplicate blobs
 b) It would be easy to (at some later point in time) dump all blobs.  Then we could use their SHA1s to identify them (e.g. "blob X, which I happen to know has these contents, disappears after I take action Y").
Comment on attachment 673413 [details] [diff] [review]
Patch, v1

Review of attachment 673413 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/library/Makefile.in
@@ +138,5 @@
>    chromium_s \
>    snappy_s \
>    $(NULL)
>  
> +OS_LIBS += $(call EXPAND_LIBNAME_PATH,mfbt,$(DEPTH)/mfbt)

I am, uh, surprised, that this was not there before ...

Also it should probably be LIBS instead of OS_LIBS.
Attachment #673413 - Flags: review?(khuey) → review+
Comment on attachment 673413 [details] [diff] [review]
Patch, v1

Review of attachment 673413 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, just a few nits.

What kind of numbers are you seeing from this?

::: content/base/public/nsDOMFile.h
@@ +383,5 @@
>      ~DataOwner() {
> +      remove();
> +      if (sDataOwners->isEmpty()) {
> +        // Free the linked list if it's empty.
> +        sDataOwners = nullptr;

You free the list if it's not empty but not the memory reporter.  I don't mind, just thought I'd mention the inconsistency.

::: content/base/src/nsDOMFile.cpp
@@ +680,5 @@
> +  NS_DECL_ISUPPORTS
> +
> +  NS_IMETHOD GetName(nsACString& aName)
> +  {
> +    aName.AssignASCII("nsDOMMemoryFileDataOwner");

Nit: Multi-reporter names usually follow the path conventions, e.g. "dom-memory-file-data-owner".

@@ +711,5 @@
> +
> +      if (size < LARGE_OBJECT_MIN_SIZE) {
> +        smallObjectsTotal += size;
> +      }
> +      else {

Is this standard Gecko "else" style?  I can't remember for sure, but I suspect it's a jlebar special :)

@@ +714,5 @@
> +      }
> +      else {
> +        SHA1Sum sha1;
> +        sha1.update(owner->mData, owner->mLength);
> +        uint8_t digest[20];

Where does the 20 come from?  Should it be a constant, or commented on?  The loop below assumes it's a multiple of 4.

@@ +774,5 @@
> +  }
> +
> +  nsRefPtr<nsDOMMemoryFileDataOwnerMemoryReporter> reporter = new
> +    nsDOMMemoryFileDataOwnerMemoryReporter();
> +  NS_RegisterMemoryMultiReporter(reporter);

You can just inline the |new| call and avoid the local var altogether.  It's done that way in some other places.
Attachment #673413 - Flags: feedback?(n.nethercote) → feedback+
> You free the list if it's not empty but not the memory reporter.  I don't mind, just thought I'd 
> mention the inconsistency.

Yeah; I free the list only because of our annoying leak-checking.  The memory reporter gets freed on shutdown.

(I could ClearOnShutdown this list, but it's not clear how that would interact with blobs we create or leak after shutdown.)

> Is this standard Gecko "else" style?  I can't remember for sure, but I suspect it's a jlebar special 
> :)

It's what the rest of the file does!  (I looked.)  Also, I don't like cuddle-else.  :)

> Where does the 20 come from? 

Yeah, that should be SHA1Sum::HashSize.  And we can statically-assert that it's a multiple of 4 or something.  It's not going to change, anyway.

> You can just inline the |new| call and avoid the local var altogether.  It's done that way in some 
> other places.

That is an /extremely/ dangerous violation of XPCOM calling conventions, in general.  We've had critical use-after-free bugs caused by exactly this sort of thing.  (You have a bug if the callee ever does at least one addref and then drops the net added refcount to 0, by balancing its addrefs and releases.  In that case, we free the object!)

Maybe we should fix those other callers, so we don't inspire anyone to follow our lead.
> What kind of numbers are you seeing from this?

In a vanilla FF build, I believe I saw 0 bytes until I ran the testcase attached.  I haven't looked in B2G yet, but I'd expect only to see our screenshots, for ~5mb in the parent process.
> > You can just inline the |new| call and avoid the local var altogether.  It's done that way in some 
> > other places.
> 
> That is an /extremely/ dangerous violation of XPCOM calling conventions, in
> general.  We've had critical use-after-free bugs caused by exactly this sort
> of thing.  (You have a bug if the callee ever does at least one addref and
> then drops the net added refcount to 0, by balancing its addrefs and
> releases.  In that case, we free the object!)
> 
> Maybe we should fix those other callers, so we don't inspire anyone to
> follow our lead.

Thanks for the correction!  grep tells me we do it a lot:

./gfx/thebes/gfxASurface.cpp:        NS_RegisterMemoryMultiReporter(new SurfaceMemoryReporter());
./gfx/thebes/gfxFont.cpp:    NS_RegisterMemoryMultiReporter(new MemoryReporter);
./gfx/thebes/gfxPlatformFontList.cpp:    NS_RegisterMemoryMultiReporter(new MemoryReporter);
./gfx/thebes/gfxWindowsPlatform.cpp:    NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(D2DCache));
./gfx/thebes/gfxWindowsPlatform.cpp:    NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(D2DVram));
./gfx/thebes/gfxWindowsPlatform.cpp:    NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(D2DVRAMDT));
./gfx/thebes/gfxWindowsPlatform.cpp:    NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(D2DVRAMSS));
./image/src/imgLoader.cpp:  NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(ImagesContentUsedUncompressed));
./ipc/glue/SharedMemory.cpp:    NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(ShmemAllocated));
./ipc/glue/SharedMemory.cpp:    NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(ShmemMapped));
./js/xpconnect/src/XPCJSRuntime.cpp:    NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(XPConnectJSGCHeap));
./js/xpconnect/src/XPCJSRuntime.cpp:    NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(XPConnectJSSystemCompartmentCount));
./js/xpconnect/src/XPCJSRuntime.cpp:    NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(XPConnectJSUserCompartmentCount));
./js/xpconnect/src/XPCJSRuntime.cpp:    NS_RegisterMemoryMultiReporter(new JSCompartmentsMultiReporter);
./toolkit/components/places/History.cpp:  NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(HistoryService));
./xpcom/base/AvailableMemoryTracker.cpp:  NS_RegisterMemoryReporter(new NumLowCommitSpaceEventsMemoryReporter());
./xpcom/base/AvailableMemoryTracker.cpp:  NS_RegisterMemoryReporter(new NumLowPhysicalMemoryEventsMemoryReporter());
./xpcom/base/AvailableMemoryTracker.cpp:    NS_RegisterMemoryReporter(new NumLowVirtualMemoryEventsMemoryReporter());
./xpcom/base/nsCycleCollector.cpp:        NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(CycleCollector));
./xpcom/base/nsMemoryReporterManager.cpp:#define REGISTER(_x)  RegisterReporter(new NS_MEMORY_REPORTER_NAME(_x))
./xpcom/reflect/xptinfo/src/xptiInterfaceInfoManager.cpp:    NS_RegisterMemoryReporter(new NS_MEMORY_REPORTER_NAME(xptiWorkingSet));


I guess these cases are safe precisely because NS_RegisterMemoryReporter doesn't do the ref balancing you mentioned, though I can see that assuming that isn't good.
See also bug 800187 comment 5 and 6 -- do you think mReporter should be a nsCOMPtr in that patch?  IIRC I suggested that a while back but Ehsan said it wasn't necessary (though that recollection could well be wrong).
(In reply to Nicholas Nethercote [:njn] from comment #10)
> See also bug 800187 comment 5 and 6 -- do you think mReporter should be a
> nsCOMPtr in that patch?  IIRC I suggested that a while back but Ehsan said
> it wasn't necessary (though that recollection could well be wrong).

Yeah, it's the same thing.  If you want to follow XPCOM calling conventions, you must hold a strong ref to all refcounted arguments before you make a function call.

You could even assert in RegisterMemory{,Multi}Reporter that mRefcount > 0, if you wanted to be really paranoid.  (Calling AddRef() manually will return the new refcount.)
[basecamp-blocking request]
Follow the dependency chain back to a blocking-basecamp+ bug.
blocking-basecamp: --- → ?
> Also it should probably be LIBS instead of OS_LIBS.

It doesn't link with LIBS.  Same error as without any change:

/home/jlebar/code/moz/ff-git/debug/_virtualenv/bin/python ../../../src/config/pythonpath.py -I../../config ../../../src/config/expandlibs_exec.py --depend .deps/libxul.so.pp --target libxul.so --uselist --  /usr/local/bin/ccache c++  -pedantic -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Wtype-limits -Wempty-body -Wno-ctor-dtor-privacy -Wno-overlength-strings -Wno-invalid-offsetof -Wno-variadic-macros -Wcast-align -Wno-long-long -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -std=gnu++0x -pthread -pipe  -DDEBUG -D_DEBUG -DTRACING -g -fno-omit-frame-pointer  -fPIC -shared -Wl,-z,defs -Wl,--gc-sections -Wl,-h,libxul.so -o libxul.so  nsStaticXULComponents.o nsUnicharUtils.o nsBidiUtils.o nsSpecialCasingData.o nsUnicodeProperties.o nsRDFResource.o    -lpthread  -Wl,-z,noexecstack -Wl,--icf=safe   -Wl,-rpath-link,/home/jlebar/code/moz/ff-git/debug/dist/bin -Wl,-rpath-link,/usr/local/lib  ../../toolkit/components/osfile/libosfile_s.a ../../toolkit/xre/libxulapp_s.a  ../../staticlib/components/libnecko.a ../../staticlib/components/libuconv.a ../../staticlib/components/libi18n.a ../../staticlib/components/libchardet.a ../../staticlib/components/libjar50.a ../../staticlib/components/libstartupcache.a ../../staticlib/components/libpref.a ../../staticlib/components/libhtmlpars.a ../../staticlib/components/libidentity.a ../../staticlib/components/libimglib2.a ../../staticlib/components/libmediasniffer.a ../../staticlib/components/libgkgfx.a ../../staticlib/components/libgklayout.a ../../staticlib/components/libdocshell.a ../../staticlib/components/libembedcomponents.a ../../staticlib/components/libwebbrwsr.a ../../staticlib/components/libnsappshell.a ../../staticlib/components/libtxmgr.a ../../staticlib/components/libcommandlines.a ../../staticlib/components/libtoolkitcomps.a ../../staticlib/components/libpipboot.a ../../staticlib/components/libpipnss.a ../../staticlib/components/libappcomps.a ../../staticlib/components/libjsreflect.a ../../staticlib/components/libcomposer.a ../../staticlib/components/libtelemetry.a ../../staticlib/components/libjsinspector.a ../../staticlib/components/libjsdebugger.a ../../staticlib/components/libstoragecomps.a ../../staticlib/components/librdf.a ../../staticlib/components/libwindowds.a ../../staticlib/components/libjsctypes.a ../../staticlib/components/libjsperf.a ../../staticlib/components/libgkplugin.a ../../staticlib/components/libunixproxy.a ../../staticlib/components/libjsd.a ../../staticlib/components/libautoconfig.a ../../staticlib/components/libauth.a ../../staticlib/components/libcookie.a ../../staticlib/components/libpermissions.a ../../staticlib/components/libuniversalchardet.a ../../staticlib/components/libfileview.a ../../staticlib/components/libplaces.a ../../staticlib/components/libtkautocomplete.a ../../staticlib/components/libsatchel.a ../../staticlib/components/libpippki.a ../../staticlib/components/libwidget_gtk2.a ../../staticlib/components/libimgicon.a ../../staticlib/components/libprofiler.a ../../staticlib/components/libaccessibility.a ../../staticlib/components/libremoteservice.a ../../staticlib/components/libspellchecker.a ../../staticlib/components/libzipwriter.a ../../staticlib/components/libservices-crypto.a ../../staticlib/components/libgkdebug.a ../../staticlib/components/libpeerconnection.a ../../staticlib/libjsipc_s.a ../../staticlib/libdomipc_s.a ../../staticlib/libdomplugins_s.a ../../staticlib/libmozipc_s.a ../../staticlib/libmozipdlgen_s.a ../../staticlib/libipcshell_s.a ../../staticlib/libgfxipc_s.a ../../staticlib/libhal_s.a ../../staticlib/libdombindings_s.a ../../staticlib/libxpcom_core.a ../../staticlib/libucvutil_s.a ../../staticlib/libchromium_s.a ../../staticlib/libsnappy_s.a ../../staticlib/libgtkxtbin.a ../../staticlib/libthebes.a ../../staticlib/libgl.a ../../staticlib/libycbcr.a  -L../../dist/bin -L../../dist/lib /home/jlebar/code/moz/ff-git/debug/dist/lib/libjs_static.a -L../../dist/bin -L../../dist/lib -lcrmf -lsmime3 -lssl3 -lnss3 -lnssutil3   -lXrender  ../../dist/lib/libmozsqlite3.a  /home/jlebar/code/moz/ff-git/debug/modules/zlib/src/libmozz.a ../../dist/lib/libgkmedias.a ../../media/mtransport/build/libmtransport.a ../../media/webrtc/signaling/signaling_ecc/libecc.a ../../media/webrtc/signaling/signaling_sipcc/libsipcc.a  -lasound   -lrt -L../../dist/bin -L../../dist/lib  -L/home/jlebar/code/moz/ff-git/debug/dist/lib -lnspr4 -lplc4 -lplds4 ../../dist/lib/libmozalloc.a -ldbus-glib-1 -ldbus-1 -lgobject-2.0 -lglib-2.0    -lX11  -lXext  -lpangoft2-1.0 -lfreetype -lfontconfig -lpangocairo-1.0 -lpango-1.0 -lcairo -lgobject-2.0 -lglib-2.0   -lfontconfig   -lgtk-x11-2.0 -latk-1.0 -lgio-2.0 -lpangoft2-1.0 -lfreetype -lfontconfig -lgdk-x11-2.0 -lpangocairo-1.0 -lgdk_pixbuf-2.0 -lpango-1.0 -lcairo -lgobject-2.0 -lglib-2.0   -lXt -lgthread-2.0 -lfreetype    -ldl  -lrt    
/usr/bin/ld.gold.real: warning: hidden symbol 'pixman_add_triangles' in /home/jlebar/code/moz/ff-git/debug/toolkit/library/../../gfx/cairo/libpixman/src/pixman-trap.o is referenced by DSO /usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/libcairo.so
/usr/bin/ld.gold.real: error: /home/jlebar/code/moz/ff-git/debug/toolkit/library/../../content/base/src/nsDOMFile.o: requires dynamic R_X86_64_PC32 reloc against '_ZN7mozilla7SHA1SumC1Ev' which may overflow at runtime; recompile with -fPIC
../../../../src/content/base/src/nsDOMFile.cpp:716: error: undefined reference to 'mozilla::SHA1Sum::SHA1Sum()'
../../../../src/content/base/src/nsDOMFile.cpp:717: error: undefined reference to 'mozilla::SHA1Sum::update(void const*, unsigned int)'
../../../../src/content/base/src/nsDOMFile.cpp:719: error: undefined reference to 'mozilla::SHA1Sum::finish(unsigned char (&) [20])'
I don't really understand how adding a memory reporter could block bug 802647. This bug doesn't really improve the end-user experience, so I wouldn't hold release for this bug. So not a blocker.

That said, I'd still take a safe patch since it'll help us find memory problems which is important.
> I don't really understand how adding a memory reporter could block bug 802647. This bug doesn't 
> really improve the end-user experience, so I wouldn't hold release for this bug. So not a blocker.

I am not comfortable landing bug 802647 without this bug.

The reason is that without this bug, we have no insight into how much memory blob screenshots use.  Screenshots are currently one of the highest single pieces of memory consumption in B2G.  Moreover, it's easier to leak a blob than a string, because blobs require manual addref/release semantics.

I think part of the problem here is structural: I can't ask for approval to land without asking for blocking or aurora approval, and those two are completely different mechanisms with different rules.  As a result, we waste a lot of time debating whether a patch may land under X set of rules, when it's going to land one way or another.

I'd like a blocking+ on this bug, because I'm not going to land a blocking+ change without this change.
Alternatively, if you just want to aurora+ my patch, that's fine too.
blocking-basecamp: - → ?
Comment on attachment 673413 [details] [diff] [review]
Patch, v1

[Approval Request Comment]
I think this should go in under blocking-basecamp, but to cover all my bases, the aurora request is: Memory-reporter only change.  Relatively simple code addition, so should be low risk.  (We're not messing with tricky code here.)
Attachment #673413 - Flags: approval-mozilla-aurora?
Whatever we end up deciding on blocking+/aurora+ here should also be made on bug 803692.
>> Also it should probably be LIBS instead of OS_LIBS.
>
> It doesn't link with LIBS.  Same error as without any change:

khuey, is OS_LIBS OK, or do we need to do something else?
(Assignee)

Updated

7 years ago
Flags: needinfo?(khuey)
OS_LIBS is fine.
Flags: needinfo?(khuey)
Blocking based on comment #15.
blocking-basecamp: ? → +
(Assignee)

Updated

7 years ago
Attachment #673413 - Flags: approval-mozilla-aurora?
(Assignee)

Updated

7 years ago
Whiteboard: [eventual-checkin-needed:aurora]
Backed out on inbound due to Windows red (linking with mfbt).

http://hg.mozilla.org/integration/mozilla-inbound/rev/d5ca3152fcde

We're currently discussing what's the right way to fix this.
Whiteboard: [eventual-checkin-needed:aurora]
Assignee: nobody → justin.lebar+bug
I've got bug queries that look for bb+ bugs that are FIXED but haven't landed on Aurora yet, so I'll get this after it lands on m-c. No need to request checkin on it :)
Whiteboard: [eventual-checkin-needed:aurora]
https://hg.mozilla.org/mozilla-central/rev/acbfd1eebb3b
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
And backed out of Aurora due to bustage on Windows, because whatever mfbt build glue we were relying on on m-c isn't on Aurora.

https://hg.mozilla.org/releases/mozilla-aurora/rev/b0240133c2c8
https://hg.mozilla.org/releases/mozilla-aurora/rev/27e470b01865

Glandium, do you know what build glue is missing?

https://tbpl.mozilla.org/php/getParsedLog.php?id=16457960&tree=Mozilla-Aurora
(Assignee)

Updated

7 years ago
Flags: needinfo?(mh+mozilla)
It's very subtle. The problem comes from that:
evaluation from e:\builds\moz2_slave\m-aurora-w32-dbg\build\config\rules.mk:1540:4:4:0$ nsinstall nsinstall -t -m 644 mozglue.dll ../../dist/bin
evaluation from e:\builds\moz2_slave\m-aurora-w32-dbg\build\config\rules.mk:1540:4:4:0$ nsinstall nsinstall -t -m 644 mozglue.lib ../../dist/sdk/lib

mozglue.lib is not copied in dist/lib.

And that's due to bug 787184, which is on aurora, and bug 795204 fixed that problem... on m-c. So, instead of a backout, you needed a clobber.

I'll request an uplift for bug 795204.
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #29)
> mozglue.lib is not copied in dist/lib.

With more verbosity: the newly built mozglue.lib does not replace the old mozglue.lib from an older build, and when xul.dll is being linked, the old mozglue.lib is picked and the build fails.
(Assignee)

Updated

7 years ago
Depends on: 806243

Updated

4 years ago
Blocks: 1133911
Component: DOM → DOM: Core & HTML
Product: Core → Core
You need to log in before you can comment on or make changes to this bug.