Bug 740162 - Large heap-unclassified (>4GB) on jenkins landing page due to XHR strings
Status: RESOLVED DUPLICATE of bug 826521
[MemShrink:P2]
Product: Core
Classification: Components
Component: General
Version: 14 Branch
Platform: x86_64 Linux
Importance: -- normal
Target Milestone: ---
Assigned To: Justin Lebar (not reading bugmail)
Blocks: DarkMatter 791695
Reported: 2012-03-28 14:38 PDT by Trev
Modified: 2013-01-28 10:30 PST


Attachments
about:memory (136.14 KB, text/plain)
2012-03-28 14:38 PDT, Trev
about:memory without any addons (29.79 KB, text/plain)
2012-03-29 09:54 PDT, Trev
DMD output (516.11 KB, application/octet-stream)
2012-03-30 08:55 PDT, Trev

Description Trev 2012-03-28 14:38:48 PDT
Created attachment 610308 [details]
about:memory

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:14.0) Gecko/20120328 Firefox/14.0a1
Build ID: 20120328050845

Steps to reproduce:

This is similar to bug 712822, except it looks like an instance of jenkins is causing the problem.  I had 4 tabs open when I noticed the browser eating up memory (I have the memchaser extension installed and I could see resident memory >2GB).  I closed all the tabs except jenkins and the memory still didn't go away.  I had previously seen this, and when I closed jenkins the memory did go away.  This time I left jenkins open and opened up about:memory.  I'll attach it, but it shows heap-unclassified at around 4.3 GB.  Clicking the clear memory buttons at the bottom didn't seem to do anything.  I usually keep jenkins open as the first or second tab, and it's been a day or two since I last saw it eat up memory, so it seems to be a rare situation.  I also rarely have jenkins as the main tab, and in both cases it was not the main tab.

I'll see if I can duplicate it again.
Comment 1 Marco Castelluccio [:marco] 2012-03-28 14:46:19 PDT
I can't access http://jenkins.eng.soleranetworks.com/
Comment 2 Justin Lebar (not reading bugmail) 2012-03-28 14:48:52 PDT
This is integer underflow.  I don't see something negative in about:memory, so it could be a race condition in how we calculate about:memory.
Comment 3 Trev 2012-03-28 15:03:31 PDT
Marco - jenkins is only accessible internally.  We are running 1.431 from http://jenkins-ci.org/ if you want to set it up yourself, but I'm not sure that would help either.

Justin - It was definitely using up a lot of memory.  htop showed firefox using 85% of the memory.  So, I don't think it is just mistaken calculation.
Comment 4 Justin Lebar (not reading bugmail) 2012-03-28 15:17:46 PDT
Taking a closer look, you're right, it looks like it's probably not underflowing.  Sorry!  RSS is 3+ gb.

Perhaps this is orphaned DOM nodes, bug 704623.  Hard to tell without running DMD on the page, though!
Comment 5 Nicholas Nethercote [:njn] 2012-03-28 15:23:17 PDT
Some of the numbers are close to 4.2GB, which is what you'd see if a small negative 32-bit integer was interpreted as a positive integer.  However, we have these:

4,456,817,720 B ── heap-allocated
4,520,681,472 B ── heap-committed

which are reported directly by jemalloc and should be trustworthy.
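
To make the arithmetic above concrete, here is a minimal sketch (the helper names `AsUnsigned32` and `FitsIn32Bits` are made up for illustration, not anything in the tree).  It shows what a small negative 32-bit value looks like when read as unsigned, and why the two jemalloc numbers cannot be that kind of artifact: they are larger than anything a 32-bit counter can hold.

```cpp
#include <cstdint>

// Reinterpret a signed 32-bit value as unsigned.  This is what a byte counter
// that went slightly negative looks like when printed as a positive number.
constexpr std::uint32_t AsUnsigned32(std::int32_t value) {
  return static_cast<std::uint32_t>(value);
}

// A reported value can only be an underflowed 32-bit counter if it fits in
// 32 bits in the first place.
constexpr bool FitsIn32Bits(std::uint64_t bytes) {
  return bytes <= UINT32_MAX;
}

// -1 wraps to 2^32 - 1, i.e. just under 4.3 GB.
static_assert(AsUnsigned32(-1) == 4294967295u, "wraps to 2^32 - 1");

// heap-allocated and heap-committed from the attached about:memory both
// exceed 2^32 - 1, so they are not wrapped 32-bit values.
static_assert(!FitsIn32Bits(4456817720ull), "heap-allocated needs > 32 bits");
static_assert(!FitsIn32Bits(4520681472ull), "heap-committed needs > 32 bits");
```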

The fact that you're seeing this on an internal installation of jenkins is unfortunate, as it makes reproduction much harder.  I can believe that jenkins does stuff quite unlike normal web pages, which would explain why heap-unclassified is so high.
Comment 6 Trev 2012-03-29 09:27:30 PDT
I just saw it again.  memchaser showed >2GB in resident memory.  I closed all the other tabs that I had open and then opened about:memory.  Clicking "minimize memory" actually minimized memory this time, though.  One interesting thing about this memory problem is that it is continually climbing.  I had htop open once I noticed it and the resident memory was climbing about 6M every second.  I would close a tab and that would clear out some memory but within a few seconds it would all be back.  I can't imagine anything running in the background jenkins tab that is eating up 6M every second.
Comment 7 Trev 2012-03-29 09:38:13 PDT
And I just saw it again.  It was climbing more than 6M every second, more like 20M every second.  Then after a while (1-2 minutes), it just went away.
Comment 8 Trev 2012-03-29 09:54:34 PDT
Created attachment 610577 [details]
about:memory without any addons

I think I can duplicate this consistently now.  I've disabled all my addons and duplicated it again.  The attached about:memory is the result of that.  I'll try to get DMD working and attach the output of that.
Comment 9 Trev 2012-03-30 08:55:17 PDT
Created attachment 610900 [details]
DMD output

I'm attaching the DMD output.

It is definitely looking like Jenkins.  We have an internal site that has a button that when clicked causes the machine to make a curl request to jenkins to grab a file.  If I click that button with Jenkins open, the memory jumps.  If I click that button through a different browser on a different machine, the memory jumps.  If jenkins is closed, the memory doesn't jump.  It appears as if jenkins sees the curl request and decides to send the same data to the browser.  I'll just have to stop leaving jenkins open.
Comment 10 Justin Lebar (not reading bugmail) 2012-03-30 10:27:27 PDT
Jenkins may be doing something wrong, but this may still be a valid bug -- we shouldn't have such high heap-unclassified.

It's XHR strings:

==7574==  2,147,483,648 bytes (2,147,483,644 requested / 4 slop)
==7574==  97.10% of the heap (97.10% cumulative unreported)
==7574==    at 0x4C2813B: realloc (vg_replace_malloc.c:632)
==7574==    by 0x67E31AD: moz_realloc (mozalloc.cpp:145)
==7574==    by 0x961DA1C: nsStringBuffer::Realloc(nsStringBuffer*, unsigned long) (nsSubstring.cpp:239)
==7574==    by 0x961DCAB: nsAString_internal::MutatePrep(unsigned int, unsigned short**, unsigned int*) (nsTSubstring.cpp:135)
==7574==    by 0x961E9BB: nsAString_internal::SetCapacity(unsigned int) (nsTSubstring.cpp:542)
==7574==    by 0x874FE01: nsXMLHttpRequest::AppendToResponseText(char const*, unsigned int) (nsXMLHttpRequest.cpp:879)
==7574==    by 0x8753800: nsXMLHttpRequest::StreamReaderFunc(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*) (nsXMLHttpRequest.cpp:1977)
==7574==    by 0x95D6C50: nsPipeInputStream::ReadSegments(unsigned int (*)(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*) (nsPipe3.cpp:799)
==7574==    by 0x8753E08: nsXMLHttpRequest::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned int, unsigned int) (nsXMLHttpRequest.cpp:2072)
==7574==    by 0x8685D2F: nsCORSListenerProxy::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned int, unsigned int) (nsCrossSiteListenerProxy.cpp:656)
==7574==    by 0x8202FA3: nsHttpChannel::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned int, unsigned int) (nsHttpChannel.cpp:4608)
==7574==    by 0x8118341: nsInputStreamPump::OnStateTransfer() (nsInputStreamPump.cpp:514)
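
The arithmetic in the first line of that trace, written out as a minimal sketch.  `HeapBlock`, `SlopBytes` and `Utf16UnitsIn` are made-up names for illustration; the numbers are the ones DMD printed.  The code-unit count assumes the buffer holds 16-bit characters (the `unsigned short**` in the MutatePrep frame suggests so) and ignores any header bytes, so treat it as a rough figure.

```cpp
#include <cstdint>

// One heap block as DMD describes it: what the caller asked for and what the
// allocator actually handed back.
struct HeapBlock {
  std::uint64_t requestedBytes;
  std::uint64_t usableBytes;
};

// "Slop" is the part of the block the caller did not ask for.
constexpr std::uint64_t SlopBytes(const HeapBlock& block) {
  return block.usableBytes - block.requestedBytes;
}

// Rough number of 16-bit code units that fit in a buffer of this size
// (assumption: 2 bytes per unit, no header accounted for).
constexpr std::uint64_t Utf16UnitsIn(std::uint64_t bytes) {
  return bytes / 2;
}

// The single realloc'd block from the trace: exactly 2 GiB with 4 bytes slop.
constexpr HeapBlock kXhrBlock{2147483644ull, 2147483648ull};
static_assert(kXhrBlock.usableBytes == (1ull << 31), "block is exactly 2^31 bytes");
static_assert(SlopBytes(kXhrBlock) == 4, "matches the '4 slop' in the trace");
```

So a single XHR response string accounts for the whole 2 GiB block, which is on the order of a billion characters of response text.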
Comment 11 Justin Lebar (not reading bugmail) 2012-03-31 16:27:30 PDT
Ben, do you have any idea how hard it would be to write a memory reporter for this?  It's easy to get SizeOf an nsXMLHttpRequest, but is there a way I can go from a window to its list of nsXMLHttpRequests?
Comment 12 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-03-31 17:18:43 PDT
I don't think we have a nice way of doing this currently... You could probably do something hacky with smaug's EventTarget weak hash table, though, since all XHRs are EventTargets. Let me know if you want details on that.

Otherwise, maybe sicking has some ideas here?
Comment 13 Justin Lebar (not reading bugmail) 2012-03-31 22:19:55 PDT
We could certainly keep a list of all XHRs and iterate over that -- but is there any way to go from an XHR object to its document/window?
Comment 14 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-03-31 22:29:57 PDT
Yes, but... In comment 11 you wanted to go from window to XHR, and then in comment 13 you want to go from XHR to window? Both are possible, XHR keeps a weak ref to its owner window.
Comment 15 Justin Lebar (not reading bugmail) 2012-03-31 23:05:36 PDT
(In reply to ben turner [:bent] from comment #14)
> Yes, but... In comment 11 you wanted to go from window to XHR, and then in
> comment 13 you want to go from XHR to window? Both are possible, XHR keeps a
> weak ref to its owner window.

Either way works for the purposes of memory reporters.
Comment 16 Justin Lebar (not reading bugmail) 2012-04-01 18:26:25 PDT
What does mOwner() == NULL mean?
Comment 17 Justin Lebar (not reading bugmail) 2012-04-01 19:50:08 PDT
(This may not make sense to anyone other than njn): Doing this memory report by keeping a global list of XHR objects and then going from XHR to window object is complicated by the ghost windows split.  When we're running the XHR memory reporter, we don't know whether its window is a ghost, so we don't know what the path should be!

I can conceive of machinery which would allow us to use a list of XHR elements to generate this memory report: We'd basically register a "window memory sub-reporter" with the window memory reporter.  When we're doing the window memory report, we call this sub-reporter, which gives us a series of reports per-window.  We'd then attach each of these reports to the actual window report.
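
A rough sketch of what that sub-reporter machinery could look like.  Everything here (`WindowReporter`, `SubReport`, `RegisterSubReporter`, `MarkGhost`, the `ghost/` and `active/` path prefixes) is hypothetical and only meant to show the shape of the idea: the sub-reporter says how many bytes belong to which window, and only the window reporter, which knows which windows are ghosts, decides the final path.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// All names in this sketch are hypothetical; none of this is Gecko's real API.
using WindowId = int;

struct SubReport {
  WindowId window;    // window the measurement belongs to
  std::string name;   // leaf name, e.g. "xhr"
  std::size_t bytes;  // measured size
};

// A sub-reporter knows nothing about ghost windows; it only yields
// (window, name, bytes) triples.
using SubReporter = std::function<std::vector<SubReport>()>;

class WindowReporter {
 public:
  void RegisterSubReporter(SubReporter reporter) {
    assert(reporter && "sub-reporter must be callable");
    mSubReporters.push_back(std::move(reporter));
  }

  void MarkGhost(WindowId id) { mGhosts.insert(id); }

  // Run every sub-reporter and attach its reports under the right window
  // path.  The ghost/active decision is made here, not in the sub-reporter.
  std::map<std::string, std::size_t> Collect() const {
    std::map<std::string, std::size_t> out;
    for (const auto& sub : mSubReporters) {
      for (const auto& report : sub()) {
        const bool ghost = mGhosts.count(report.window) != 0;
        const std::string prefix = ghost ? "ghost/" : "active/";
        out[prefix + "window-" + std::to_string(report.window) + "/" +
            report.name] += report.bytes;
      }
    }
    return out;
  }

 private:
  std::vector<SubReporter> mSubReporters;
  std::set<WindowId> mGhosts;
};
```

The XHR code would register one sub-reporter that walks its global XHR list and emits one `SubReport` per object; it never has to know whether a window is a ghost.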

Anyway, I think it's doable, but pretty complicated.  I'm curious what comment 12 would entail.
Comment 18 Jet Villegas (:jet) 2012-04-03 16:09:48 PDT
P2 per Memshrink
Comment 19 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-04-18 18:58:04 PDT
(In reply to Justin Lebar [:jlebar] from comment #16)
> What does mOwner() == NULL mean?

mOwner == NULL means (possibly among other things) that this is a C++ created XHR.
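
For readers following along, a toy model of the XHR-to-window direction discussed in comment 14 and here.  It uses `std::weak_ptr` as a stand-in for the weak reference; the real XHR code does not use `std::weak_ptr`, and `Window`, `Xhr` and `ReportPathFor` are invented names.  In this model a null owner covers both "never had a window" (e.g. created from C++) and "window already gone", which a memory reporter would have to handle somehow.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy stand-ins; not the real Gecko classes.
struct Window {
  std::string label;
};

class Xhr {
 public:
  Xhr() = default;  // no owner window, e.g. an XHR created from C++
  explicit Xhr(const std::shared_ptr<Window>& owner) : mOwner(owner) {}

  // Returns null if there never was an owner or the owner has gone away.
  std::shared_ptr<Window> GetOwner() const { return mOwner.lock(); }

 private:
  std::weak_ptr<Window> mOwner;  // weak: the XHR does not keep its window alive
};

// Pick a report path for an XHR, falling back to a catch-all bucket when
// there is no window to attribute it to.
std::string ReportPathFor(const Xhr& xhr) {
  if (std::shared_ptr<Window> owner = xhr.GetOwner()) {
    return "window(" + owner->label + ")/xhr";
  }
  return "xhr/no-owner";
}
```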
Comment 20 Trev 2013-01-23 07:53:45 PST
The memory is now being classified with bug 826521.  I see a large event-target line item under the jenkins window item.
Comment 21 Nicholas Nethercote [:njn] 2013-01-23 14:07:46 PST
Thanks, Trev!

jlebar, I think I avoided the problems you mentioned in comment 17 because I did what bent suggested in comment 12 -- I iterate over the event targets table and measure all the event targets that are XHRs.
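
In outline, that approach looks something like the sketch below.  This is a simplified model, not the actual patch: `EventTarget`, `Xhr`, `Window`, `MeasureBytes` and `MeasureEventTargets` are stand-in names, the per-window table is modelled as a plain vector, and only the character payload is counted (a real measurement would ask the allocator for the block size).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Base class: event targets report 0 bytes unless they opt in.
class EventTarget {
 public:
  virtual ~EventTarget() = default;
  virtual std::size_t MeasureBytes() const { return 0; }
};

// An XHR opts in and reports its buffered response text.
class Xhr final : public EventTarget {
 public:
  explicit Xhr(std::u16string responseText)
      : mResponseText(std::move(responseText)) {}

  std::size_t MeasureBytes() const override {
    return mResponseText.size() * sizeof(char16_t);
  }

 private:
  std::u16string mResponseText;
};

// Stand-in for a window and its table of event targets (non-owning).
struct Window {
  std::vector<const EventTarget*> eventTargets;
};

// Walk the window's event targets while the window itself is being reported,
// so the result lands under whatever path that window already has.
std::size_t MeasureEventTargets(const Window& window) {
  std::size_t total = 0;
  for (const EventTarget* target : window.eventTargets) {
    assert(target != nullptr);
    total += target->MeasureBytes();
  }
  return total;
}
```

Because the walk starts from the window, the ghost/non-ghost question from comment 17 never comes up: the window reporter already knows which path it is writing to.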

*** This bug has been marked as a duplicate of bug 826521 ***
Comment 22 Jonas Sicking (:sicking) PTO Until July 5th 2013-01-25 00:27:04 PST
Note that for worker-xhr there is both an xhr object on the main thread and one on the worker thread (they are entirely different C++ classes). Both can hold significant amounts of memory.
Comment 23 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-01-25 08:47:58 PST
(In reply to Jonas Sicking (:sicking) from comment #22)
> Note that for worker-xhr there is both an xhr object on the main thread and
> one on the worker thread (they are entirely different C++ classes). Both can
> hold significant amounts of memory.

Well, they both share the same nsString, so presumably the actual buffer is not duplicated.
Comment 24 Nicholas Nethercote [:njn] 2013-01-25 12:56:37 PST
> Well, they both share the same nsString, so presumably the actual buffer is
> not duplicated.

https://hg.mozilla.org/mozilla-central/rev/de2ab911692d is the patch that added the measurement of nsXHR::mResponseText.  It is a shared string, and so I did some non-typical stuff to measure it in nsXMLHttpRequest::SizeOfEventTargetIncludingThis -- normally we don't even try to measure shared strings because of the risk of double-counting.  If that code isn't valid, please let me know!
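
To illustrate the double-counting risk with a shared buffer, here is a small model using `std::shared_ptr` in place of a refcounted string buffer.  The function names (`SizeIfUnshared`, `SizeEvenIfShared`, `SizeOnce`) are invented for this sketch and are not claimed to match what the patch does; they just show three policies a reporter could choose between.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>

// Model of a refcounted, shareable text buffer.
using SharedText = std::shared_ptr<const std::u16string>;

std::size_t PayloadBytes(const std::u16string& text) {
  return text.size() * sizeof(char16_t);
}

// Conservative policy: only count the buffer when this is the sole owner.
// Never double-counts, but a shared buffer is not counted by anyone.
std::size_t SizeIfUnshared(const SharedText& text) {
  return (text && text.use_count() == 1) ? PayloadBytes(*text) : 0;
}

// Count regardless of sharing.  Only safe if at most one holder of a given
// buffer is ever measured; otherwise the same bytes are reported twice.
std::size_t SizeEvenIfShared(const SharedText& text) {
  return text ? PayloadBytes(*text) : 0;
}

// Alternative: remember which buffers were already counted in this pass.
std::size_t SizeOnce(const SharedText& text,
                     std::unordered_set<const void*>& seen) {
  if (!text || !seen.insert(text.get()).second) {
    return 0;  // null, or this buffer was already measured
  }
  return PayloadBytes(*text);
}
```

With a main-thread and a worker-thread holder of the same text, the first policy reports nothing, the second reports the buffer twice if both holders are measured, and the third reports it once.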
Comment 25 Jonas Sicking (:sicking) PTO Until July 5th 2013-01-28 01:57:51 PST
I'm not sure we're always able to do sharing. Especially in the case when .response returns an ArrayBuffer or a JSON object.
Comment 26 Ben Turner (not reading bugmail, use the needinfo flag!) 2013-01-28 10:30:09 PST
(In reply to Jonas Sicking (:sicking) from comment #25)
> I'm not sure we're always able to do sharing. Especially in the case when
> .response returns an ArrayBuffer or a JSON object.

In those cases we're definitely not sharing. It's just when response is text.
