Closed Bug 1249739 Opened 9 years ago Closed 8 years ago

investigate high memory usage (4GB+) on Google Spreadsheets due to XHRs

Categories

(Core :: DOM: Core & HTML, defect)

Unspecified
Windows 7
defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla51
Tracking Status
firefox51 --- fixed

People

(Reporter: froydnj, Assigned: baku)

References

Details

(Whiteboard: [MemShrink:P1] dom-triaged btpp-active)

Attachments

(5 files, 5 obsolete files)

+++ This bug was initially created as a clone of Bug #1233481 +++

I'm splitting this bug off to specifically look at bug 1233481 comment 7's spreadsheet.  I've been playing around with it for the past couple of days and here are some of the initial findings:

* Don't even try dealing with this thing on a 32-bit build.  I can't get the spreadsheet to load in a 32-bit (Windows) build.  Use a 64-bit build if you want to see it.

* Loading the spreadsheet can easily peak at 4GB+ on a 64-bit Windows build (according to Task Manager), but it's not so easy to tell from about:memory or a DMD run where all that 4GB+ is going.  That memory usage is very transient, and even trying to ensure that a DMD log gets triggered when the memory usage is particularly high doesn't seem to be worthwhile.

I modified DMD so it will automatically dump a report when ~4GB or so of memory is live.  The far-and-away winner for space allocated is, surprisingly:

Unreported {
  228 blocks in heap block record 1 of 8,481
  4,007,497,728 bytes (4,007,132,822 requested / 364,906 slop)
  Individual block sizes: [elided due to space; 11-22MB apiece]
  93.61% of the heap (93.61% cumulative)
  93.61% of unreported (93.61% cumulative)
  Allocated at {
    #01: nsStringBuffer::Alloc (c:\m-c\xpcom\string\nssubstring.cpp:218)
    #02: nsAString_internal::MutatePrep (c:\m-c\xpcom\string\nstsubstring.cpp:133)
    #03: nsAString_internal::SetCapacity (c:\m-c\xpcom\string\nstsubstring.cpp:665)
    #04: nsXMLHttpRequest::AppendToResponseText (c:\m-c\dom\base\nsxmlhttprequest.cpp:608)
    #05: nsXMLHttpRequest::StreamReaderFunc (c:\m-c\dom\base\nsxmlhttprequest.cpp:1732)
    #06: nsStringInputStream::ReadSegments (c:\m-c\xpcom\io\nsstringstream.cpp:249)
    #07: nsXMLHttpRequest::OnDataAvailable (c:\m-c\dom\base\nsxmlhttprequest.cpp:1814)
    #08: mozilla::net::nsHTTPCompressConv::do_OnDataAvailable (c:\m-c\netwerk\streamconv\converters\nshttpcompressconv.cpp:493)
    #09: mozilla::net::nsHTTPCompressConv::OnDataAvailable (c:\m-c\netwerk\streamconv\converters\nshttpcompressconv.cpp:414)
    #10: mozilla::net::HttpChannelChild::DoOnDataAvailable (c:\m-c\netwerk\protocol\http\httpchannelchild.cpp:799)
    #11: mozilla::net::HttpChannelChild::OnTransportAndData (c:\m-c\netwerk\protocol\http\httpchannelchild.cpp:727)
    #12: mozilla::net::HttpChannelChild::RecvOnTransportAndData (c:\m-c\netwerk\protocol\http\httpchannelchild.cpp:629)
    #13: mozilla::net::PHttpChannelChild::OnMessageReceived (c:\m-c\obj-dmd64\ipc\ipdl\phttpchannelchild.cpp:611)
    #14: mozilla::dom::PContentChild::OnMessageReceived (c:\m-c\obj-dmd64\ipc\ipdl\pcontentchild.cpp:6116)
    #15: mozilla::ipc::MessageChannel::DispatchAsyncMessage (c:\m-c\ipc\glue\messagechannel.cpp:1445)
    #16: mozilla::ipc::MessageChannel::DispatchMessageW (c:\m-c\ipc\glue\messagechannel.cpp:1385)
    #17: mozilla::ipc::MessageChannel::OnMaybeDequeueOne (c:\m-c\ipc\glue\messagechannel.cpp:1355)
    #18: RunnableMethod<SoftwareDisplay,void (__cdecl SoftwareDisplay::*)(void) __ptr64,mozilla::Tuple<> >::Run (c:\m-c\ipc\chromium\src\base\task.h:308)
    #19: MessageLoop::RunTask (c:\m-c\ipc\chromium\src\base\message_loop.cc:365)
    #20: MessageLoop::DeferOrRunPendingTask (c:\m-c\ipc\chromium\src\base\message_loop.cc:375)
    #21: MessageLoop::DoWork (c:\m-c\ipc\chromium\src\base\message_loop.cc:459)
    #22: mozilla::ipc::DoWorkRunnable::Run (c:\m-c\ipc\glue\messagepump.cpp:221)
    #23: nsThread::ProcessNextEvent (c:\m-c\xpcom\threads\nsthread.cpp:1018)
    #24: NS_ProcessNextEvent (c:\m-c\xpcom\glue\nsthreadutils.cpp:297)
  }
}

There's another ~32MB allocation around the same place, but it's not paired up with the above.

There's also:

Unreported {
  ~821 blocks in heap block record 2 of 8,481
  ~41,121,289 bytes (~41,121,289 requested / ~0 slop)
  Individual block sizes: 131,072 x 96; 65,536 x 211; 32,768 x 421; 16,384 x 37; 8,192 x 23; 4,096 x 27; 2,048 x 4; 1,024; ~521
  0.96% of the heap (94.57% cumulative)
  0.96% of unreported (94.57% cumulative)
  Allocated at {
    #01: calloc_impl (c:\m-c\memory\build\replace_malloc.c:183)
    #02: js::MallocProvider<js::ExclusiveContext>::maybe_pod_calloc<js::ShapeTable::Entry> (c:\m-c\js\src\vm\mallocprovider.h:65)
    #03: js::MallocProvider<js::ExclusiveContext>::pod_calloc<js::ShapeTable::Entry> (c:\m-c\js\src\vm\mallocprovider.h:136)
    #04: js::ShapeTable::init (c:\m-c\js\src\vm\shape.cpp:55)
    #05: js::Shape::hashify (c:\m-c\js\src\vm\shape.cpp:165)
    #06: js::Shape::search<0> (c:\m-c\js\src\vm\shape-inl.h:58)
    #07: js::NativeObject::lookup (c:\m-c\js\src\vm\nativeobject.cpp:237)
    #08: js::frontend::LexicalLookup<js::frontend::ParseContext<js::frontend::FullParseHandler> > (c:\m-c\js\src\frontend\parser.cpp:1796)
    #09: js::frontend::Parser<js::frontend::FullParseHandler>::noteNameUse (c:\m-c\js\src\frontend\parser.cpp:4160)
    #10: js::frontend::Parser<js::frontend::FullParseHandler>::identifierName (c:\m-c\js\src\frontend\parser.cpp:8746)
    #11: js::frontend::Parser<js::frontend::FullParseHandler>::primaryExpr (c:\m-c\js\src\frontend\parser.cpp:9317)
    #12: js::frontend::Parser<js::frontend::FullParseHandler>::memberExpr (c:\m-c\js\src\frontend\parser.cpp:8568)
    #13: js::frontend::Parser<js::frontend::FullParseHandler>::unaryExpr (c:\m-c\js\src\frontend\parser.cpp:8089)
    #14: js::frontend::Parser<js::frontend::FullParseHandler>::assignExpr (c:\m-c\js\src\frontend\parser.cpp:7779)
    #15: js::frontend::Parser<js::frontend::FullParseHandler>::expr (c:\m-c\js\src\frontend\parser.cpp:7480)
    #16: js::frontend::Parser<js::frontend::FullParseHandler>::expressionStatement (c:\m-c\js\src\frontend\parser.cpp:5710)
    #17: js::frontend::Parser<js::frontend::FullParseHandler>::statement (c:\m-c\js\src\frontend\parser.cpp:7373)
    #18: js::frontend::Parser<js::frontend::FullParseHandler>::statements (c:\m-c\js\src\frontend\parser.cpp:3544)
    #19: js::frontend::Parser<js::frontend::FullParseHandler>::blockStatement (c:\m-c\js\src\frontend\parser.cpp:4554)
    #20: js::frontend::Parser<js::frontend::FullParseHandler>::statement (c:\m-c\js\src\frontend\parser.cpp:7293)
    #21: js::frontend::Parser<js::frontend::FullParseHandler>::ifStatement (c:\m-c\js\src\frontend\parser.cpp:5744)
    #22: js::frontend::Parser<js::frontend::FullParseHandler>::statement (c:\m-c\js\src\frontend\parser.cpp:7377)
    #23: js::frontend::Parser<js::frontend::FullParseHandler>::statements (c:\m-c\js\src\frontend\parser.cpp:3544)
    #24: js::frontend::Parser<js::frontend::FullParseHandler>::globalBody (c:\m-c\js\src\frontend\parser.cpp:1117)
  }
}

and some other related stacks around ShapeTable::Init; I feel like these are immaterial compared to the nsXMLHttpRequest bit.

Filing this in DOM, at least initially.  Anybody want to hazard guesses as to what the page is doing to get 4GB of XHRs?  There is at least one worker on the page that gets reported in about:memory...could this be service workers going bonkers somehow?
I should note as well that I'm not on a fast enough internet connection to download 4GB+ in a matter of seconds, so I feel like these XHRs must be hitting...something else.
bkelly might know if this could be a service worker thing.
(In reply to Olli Pettay [:smaug] (HIGH REVIEW LOAD) from comment #2)
> bkelly might know if this could be a service worker thing.

ni? to ensure he sees it.
Flags: needinfo?(bkelly)
Those stacks suggest the data is coming from the parent process.  This would imply service workers are not intercepting here.

It seems like maybe this is "double memory on capacity increase" getting out of hand?
Flags: needinfo?(bkelly)
(In reply to Ben Kelly [:bkelly] from comment #4)
> Those stacks suggest the data is coming from the parent process.  This would
> imply service workers are not intercepting here.

These stacks are definitely from the content process, FWIW.
(In reply to Nathan Froyd [:froydnj] from comment #5)
> (In reply to Ben Kelly [:bkelly] from comment #4)
> > Those stacks suggest the data is coming from the parent process.  This would
> > imply service workers are not intercepting here.
> 
> These stacks are definitely from the content process, FWIW.

Yes, but:

    #12: mozilla::net::HttpChannelChild::RecvOnTransportAndData (c:\m-c\netwerk\protocol\http\httpchannelchild.cpp:629)

It's receiving the data from the parent process.
Whiteboard: [MemShrink] → [MemShrink] dom-triaged btpp-triaged
(In reply to Nathan Froyd [:froydnj] from comment #1)
> I should note as well that I'm not on a fast enough internet connection to
> download 4GB+ in a matter of seconds, so I feel like these XHRs must be
> hitting...something else.

Devtools/wireshark it?
(In reply to Ben Kelly [:bkelly] from comment #6)
> It's receiving the data from the parent process.

Sorry, I completely missed the "from the" part in there!  </egg-face>
(In reply to Ben Kelly [:bkelly] from comment #4)
> 
> It seems like maybe this is "double memory on capacity increase" getting out
> of hand?

Comment 0 says there are 228 blocks allocated and their sizes range from 11 MiB to 22 MiB. In the worst case half the space would be wasted, which still indicates 2 GB of data.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #7)
> (In reply to Nathan Froyd [:froydnj] from comment #1)
> > I should note as well that I'm not on a fast enough internet connection to
> > download 4GB+ in a matter of seconds, so I feel like these XHRs must be
> > hitting...something else.
> 
> Devtools/wireshark it?

Debug logging says all of the large XHR allocations are hitting the same URL (${LONG_HEX_FINGERPRINT}.cache.js), so perhaps we're serving it from cache.  From the debug log and the DMD log, it looks like some of the allocations are being kept alive even as the string is being resized, so perhaps somebody is grabbing .responseText somehow?  I haven't dived into the JS to find out how XHRs are being used in this case.  Need more logging...
Assigning to Nathan since he's investigating. Feel free to unassign, Nathan.
Assignee: nobody → nfroyd
Whiteboard: [MemShrink] dom-triaged btpp-triaged → [MemShrink] dom-triaged btpp-active
OK, so my best guess based on logging and reading some code is that there's a script, cache.js, that's doing something like the following:

  var xhr = /* open an XHR request to a multi-megabyte file */
  while (...) {
    // Get whatever we've accumulated so far.
    var rt = xhr.responseText;
    // do something with rt
  }

This sort of loop seems like a really odd way to do things, but I'm sure there's a good reason for it.  Do we have contacts at Google that we can ask about stuff like this?

(Note that cache.js is the thing I thought we were making XHRs for in comment 10; it's actually the script that's making the XHR request because I am incapable of reading code.  cache.js itself is ~1MB of minified JS, and I haven't had the stomach to try tracing through the minified JS to find the loop.  Naturally, things like `responseText' and `XMLHttpRequest' don't appear literally in the minified JS.)

nsXMLHttpRequest accumulates the data it receives from the networking stack into an nsString member, mResponseText.  The only place we hand out copies of mResponseText is through the .responseText accessor shown above.  Because we're interested in efficiency, handing out .responseText doesn't actually copy the string into JS engine memory; instead, we allocate an "external string" in JS parlance that shares the data with mResponseText.  Our string types support this sort of sharing by using reference-counted string buffers.  Before .responseText is called, the buffer of mResponseText has a reference count of 1; after we hand the string out to JS, this buffer has a reference count of 2 to indicate JS's partial ownership of the string.

Except that mResponseText isn't static; it's constantly changing as new data arrives from the network.  Which means that the next time data arrives, we're probably going to reallocate mResponseText, leaving JS with the lone reference to the string buffer.

The only problem with this scheme is that the JS GC doesn't know anything about how much memory this external string is actually holding.  The GC probably believes the string is rather small (just a basic JS object), when in reality not GC'ing that string means that several megabytes of data is being kept alive.  After going through the above example loop hundreds (!) of times (as logging suggests), JS is holding on to hundreds or thousands of megabytes of strings and doesn't realize it.
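
To make the accounting problem concrete, here is a purely illustrative C++ sketch (stand-in types only; this is not the real nsStringBuffer or the real SpiderMonkey string cell): the JS-visible object is tiny, while the shared buffer it pins is not.

  #include <atomic>
  #include <cstddef>
  #include <cstdint>

  struct SharedStringBuffer {        // stands in for nsStringBuffer
    std::atomic<uint32_t> mRefCnt{1};
    size_t mCapacityBytes;           // e.g. ~20MB of accumulated UTF-16 data
    // the character data follows this header in the real thing
  };

  struct ExternalJSString {          // stands in for a JS external string cell
    SharedStringBuffer* mBuffer;     // shares the DOM-side buffer; no copy
  };

  // Roughly what the GC "sees" for scheduling purposes unless the embedder
  // tells it about the external bytes: just the small cell, not the buffer.
  size_t GCVisibleSize(const ExternalJSString& aStr) {
    return sizeof(aStr);             // a couple of words, not mBuffer->mCapacityBytes
  }
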

This effect is exacerbated because nsXMLHttpRequest is receiving new data from the network stack in 16KB chunks (!) due to compression (presumably the block size used on the other end, or something?).

The end result is that peak memory usage is really high, and eventually the GC will get triggered to remove these external strings...but in-between those strings getting allocated and those strings getting deallocated, the user experience is rather awful.

The suggested solution from #content is to alert the GC to the external string's size...but this is a little dicey if our string buffer is actually shared when we construct that external string, as it is in the description above.  We might badly overestimate the amount of memory consumed by the GC if somebody's constantly obtaining references to the same large shared string.  I'm not completely convinced that telling the GC about the string even if it's shared is a bad thing; it's possible we might trigger more GCs in the worst case, but that seems better than waiting until we have used up several GB of memory before we GC.  Or maybe we only tell the GC about it if the refcount is <= 2, on the theory that one of the references might go away soon?

ni? to overholt for the Google contacts bit at the beginning, and ni? Terrence for what to do about the GC heuristics.
Flags: needinfo?(terrence)
Flags: needinfo?(overholt)
I think the right solution for this particular situation is that XHR should update the GC malloc counter whenever it updates mResponseText.  That's a good idea anyway, because the XHR object itself might become collectable, so we want to account for its possibly-large internal memory, just like we do for canvas rendering contexts.  We would want to be sure to update the counter by only the amount we're appending if we have a writable string, and by our entire new length if we have a readonly one.
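
A minimal sketch of that accounting, assuming a hypothetical helper called from AppendToResponseText (the aAppendedChars/aStringWasShared wiring is my assumption; JS_updateMallocCounter and AutoJSAPI are the real APIs the patch below ends up using):

  static void
  ReportResponseTextGrowth(nsIGlobalObject* aGlobal,
                           size_t aAppendedChars,
                           size_t aTotalChars,
                           bool aStringWasShared)
  {
    if (!aGlobal) {
      return;
    }
    mozilla::dom::AutoJSAPI jsapi;
    if (!jsapi.Init(aGlobal)) {
      return;
    }
    // If the old buffer was shared (readonly to us) we had to reallocate the
    // whole string, so charge the full length; otherwise only the appended
    // part.  Scale by the character size: these are UTF-16 code units.
    size_t chars = aStringWasShared ? aTotalChars : aAppendedChars;
    JS_updateMallocCounter(jsapi.cx(), chars * sizeof(char16_t));
  }
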
bz volunteered to write a patch, so I guess these ni?'s are unnecessary.
Flags: needinfo?(terrence)
Flags: needinfo?(overholt)
I agree. It's too bad that we might overcount the string data, but currently we only schedule off of allocations since the last GC anyway, so every time you do a GC you forget the past history and all is forgiven.

Only... it looks like we already *do* account for external string data upon creation? http://dxr.mozilla.org/mozilla-central/source/js/src/vm/String-inl.h#323

Is it not going through that code somehow?
I would expect this case to go through the code linked to in comment 43.
I mean comment 15.  It clearly became 43 because you have to multiply by 3 and subtract 2 this week.
Comment on attachment 8722721 [details] [diff] [review]
Something to test

Review of attachment 8722721 [details] [diff] [review]:
-----------------------------------------------------------------

This patch does seem to make peak memory usage better...but according to Task Manager on Windows, we still get over 1GB of memory usage fairly regularly and even up in the 1.5-2GB+ range.  It's also not triggering my automatic "do a DMD analysis when we've allocated 4GB of memory", so that's good.

I'll try doing some more DMD runs with this patch tomorrow to get a better idea of max memory usage and to see if there's some transient memory we're missing along the way.  I won't be able to think straight about the code tonight.

::: dom/base/nsXMLHttpRequest.cpp
@@ +635,5 @@
> +    nsIGlobalObject* global = DOMEventTargetHelper::GetParentObject();
> +    if (global) {
> +      AutoJSAPI jsapi;
> +      if (jsapi.Init(global)) {
> +        JS_updateMallocCounter(jsapi.cx(), capacityIncrease);

|capacityIncrease| is just how many extra characters the string has grown, correct?  So we really need to multiply this by sizeof(decltype(mResponseText)::char_type)?

Sadly, making that change doesn't seem to improve max memory usage. =/
> So we really need to multiply this by sizeof(decltype(mResponseText)::char_type)?

Er, yes we do.  But given comment 15, it seems like this is not the only issue at play...
Whiteboard: [MemShrink] dom-triaged btpp-active → [MemShrink:P1] dom-triaged btpp-active
OK, better theory about this after staring at refcount logs for a while.

1. We have a worker that opens an XHR.
2. Worker XHRs actually rely on the main thread to process the XHR data; that's where we see the string being allocated in the first place.
3. The string actually gets passed back to the worker thread (!) somewhere around the beginning of EventRunnable::PreDispatch:

http://mxr.mozilla.org/mozilla-central/source/dom/workers/XMLHttpRequest.cpp#1185

4. The last Release() shown in the log isn't coming from ~Heap; it's actually coming from EventRunnable::~EventRunnable, when EventRunnable::mResponseText gets destructed.

The funny thing is that according to the refcount log, most (nearly all?) of these strings aren't making it out to JS; they're getting stuck in some sort of worker event queue.  That would explain the skyrocketing memory usage; shoving a bunch of these events into the worker event queue and not processing them until...what?...is not a recipe for happiness.  If something like this is true, it means our story for using XHR from workers is...pretty bad.

I'll take a shot at grokking the worker event queue blockage tomorrow.
It looks like whenever any of the events defined in sEventStrings in dom/workers/XMLHttpRequest.cpp fires on the main thread, we post a runnable to the worker to fire the corresponding event there.  And this runnable brings along a snapshot of the responseText (and some other state, but the other state is small until the load is done, at which point we move the response over in the non-text cases) at the point when the event fired.

I tried doing some archeology, and it looks like this behavior has been around for a good long while (e.g. predates bug 649537).  I guess the idea is that we should avoid an extra roundtrip to the main thread if the worker wants to poke at XHR state from the event listener.

But in this case what that means is that for each progress event we end up creating a copy of the then-current responseText that hangs out in the EventRunnable.  And if the worker is busy with something, those can pile up.
Can we just send the delta since the last runnable?
We can try; we'd need to store the thing from the last runnable on the Proxy, I guess, so we'd have three copies of the response text around (in the mainthread XHR, in the Proxy, in the worker-side XHR) instead of the best-case two copies we have right now.

Really, what it feels to me like we should do is send deltas, when sending each delta clear out the response text on the main thread (so basically steal the text into the runnable), and do the non-string "response" (for GetResponse) creation on the _worker_ thread instead of doing it on the main thread and transferring it over as we do now.  We'd need to check that the main-thread XHR isn't using its responseText length for anything of interest, which I think should be true anyway because we support the moz-chunked-text response type...
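
A minimal sketch of the delta idea (Proxy, EventRunnable's constructor, and mForwardedLength are hypothetical names here, not the existing worker XHR code; Substring() is the real XPCOM string helper): the proxy remembers how much responseText it has already forwarded, and each progress runnable carries only the new tail.

  void
  Proxy::ForwardProgressEvent(const nsAString& aResponseText)
  {
    // Only the characters that arrived since the last event we forwarded.
    RefPtr<EventRunnable> runnable =
      new EventRunnable(this, Substring(aResponseText, mForwardedLength));
    mForwardedLength = aResponseText.Length();

    // The worker-side XHR appends the delta to its own accumulated copy.
    runnable->Dispatch();
  }
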
We could also rewrite worker xhr on top of fetch().  It streams data using nsIInputStream.  I think the only thing it's missing is plumbing to cancel the nsIChannel if the xhr is aborted.
Progress events might require extra work as well.  Fetch does not have a progress mechanism yet.
The hard part is that we either need to make the state machine work on workers or we still need to ship data to the worker _and_ possibly have it on mainthread, depending on what the state machine needs.
Is this an xhr-specific state machine?  Xhr and fetch are fairly equivalent in impl now that a lot has been pushed into the necko layer.  The xhr spec is now written in terms of fetch as well.
The XHR state machine, in terms of updating its states, the order it fires its events (progress vs load), etc.
I'm going to unassign myself, because I think we understand what the problem is, we have to fix it, and I'm not sure I have all the context to fix it.  (I think I could do it, it'd just take me a while.)  ni? overholt for next steps.
Assignee: nfroyd → nobody
Flags: needinfo?(overholt)
(I'm figuring out a plan here)
I was thinking about this last night and it seems like another solution would be to run the XHR state machine on the worker thread with off-main-thread delivery of the network data.  That requires bug 1015466 + some careful auditing of the XHR code for threading issues...but that ought to Just Work, right?
Flags: needinfo?(khuey)
The problem is that some of the relevant notifications (e.g. progress) can only be delivered by necko on the main thread last I checked...
baku's going to investigate and figure out the best option here.
Assignee: nobody → amarchesini
Flags: needinfo?(overholt)
We can implement something similar to what we have for WebSocket:

. WebSocket is a non-thread-safe object, available on workers and the main thread,
. WebSocketImpl is a thread-safe object that speaks with necko and dispatches notifications to the WebSocket object and/or to content directly.
. Necko works fine on the worker thread (talking about nsIWebSocketChannel), but some operations (openAsync/close) must be executed on the main thread.  Delivering data, though, happens on the owning thread.
Flags: needinfo?(khuey)
Depends on: 1269162
(In reply to Ben Kelly [:bkelly] from comment #23)
> Can we just send the delta since the last runnable?

What about if we don't dispatch these runnables at all if we don't have event listeners for them?
Currently we dispatch runnables when abort, error, load, loadstart, progress, timeout, readystatechange, loadend events are received. I propose to dispatch runnables only when we have a listener on the worker side for them.

When AddEventListener or on<event> is called, we dispatch a sync runnable to inform the proxy about that.
Of course this can be racy, if the event listener is added after send(). But this is also racy on the main thread because the HTTP connection runs on a different thread, as far as I remember.
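
A minimal sketch of that proposal, with hypothetical names throughout (WorkerXHR, Proxy, mEnabledEvents, SyncEnableEventRunnable and their wiring are assumptions, not the existing code):

  // Worker thread: called from AddEventListener / the on<event> setters.
  void
  WorkerXHR::NoteListenerAdded(uint32_t aEventBit)
  {
    // Sync-dispatch so the flag is set on the main thread before we return
    // (modulo the inherent send()-before-listener race noted above).
    RefPtr<SyncEnableEventRunnable> runnable =
      new SyncEnableEventRunnable(mProxy, aEventBit);
    runnable->Dispatch();
  }

  // Main thread: called for abort/error/load/loadstart/progress/... events.
  void
  Proxy::MaybeForwardEvent(uint32_t aEventBit)
  {
    if (!(mEnabledEvents & aEventBit)) {
      return;  // no worker-side listener: don't snapshot responseText at all
    }
    ForwardEventToWorker(aEventBit);
  }
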
Some more information:

The XHR in the worker doesn't set any event listeners nor call on<event>. The open requests are all done synchronously but we still dispatch 8 runnables on average (4 for readystatechange, 1 loadstart, 1 progress, 1 load and 1 loadend). I want to submit a patch to improve this scenario.
Yeah, having a separate impl of XHR on workers has always been a nightmare, but we couldn't reuse the main thread state machine until bug 1015466 was done.

You could try just blocking the worker and going to the main thread for 'responseText' whenever it's accessed before the XHR completes. Hopefully that's a rare pattern? It would let you avoid copying partial results multiple times.
I assume this is the problem I've been having. On the following sheet, shortly after the yellow "Working..." message vanishes, memory use rapidly rises and falls for a while, until it eventually levels off at a reasonable value.

It seems to get much worse when e10s is enabled.

With the current nightly, without e10s the heap-unclassified spikes to 5G, while with e10s the content process heap-unclassified spikes to 22G.

https://docs.google.com/spreadsheets/d/1F2GPi4_bZqXCwLpU_B_PFiBc0G9p4KyuJzY6YhCxxGU/edit
Some info about this bug. Using DMD I see that, loading the URL from comment 39, we have:

Top 2 unreported entries:

Unreported {
  471 blocks in heap block record 1 of 6,409
  41,861,056 bytes (41,726,300 requested / 134,756 slop)
  Individual block sizes: 16,273,408 x 2; 2,822,144 x 2; 159,744 x 2; 49,152 x 56; 36,864 x 2; 24,576 x 2; 16,384 x 16; 12,288 x 6; 8,192 x 4; 4,096 x 6; 2,048 x 21; 1,024 x 7; 512; 496; 384 x 2; 240 x 38; 176; 160 x 7; 144 x 4; 112 x 4; 96 x 5; 64 x 280; 48; 16
  10.77% of the heap (10.77% cumulative)
  31.62% of unreported (31.62% cumulative) 
  Allocated at {
    #01: __driDriverGetExtensions_i915 (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so)                        
    #02: ??? (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so)
    #03: glAreTexturesResidentEXT (/usr/lib/x86_64-linux-gnu/mesa/libGL.so.1)
    #04: ??? (/usr/lib/x86_64-linux-gnu/mesa/libGL.so.1)
    #05: mozilla::gl::GLXLibrary::xCreateContextAttribs(_XDisplay*, __GLXFBConfigRec*, __GLXcontextRec*, int, int const*) (/home/baku/Sources/m/bugs2/src/gfx/gl/GLContextProviderGLX.cpp:769)


Unreported {
  250 blocks in heap block record 2 of 6,409
  20,933,552 bytes (20,865,442 requested / 68,110 slop)
  Individual block sizes: 16,273,408; 2,822,144; 159,744; 49,152 x 28; 36,864; 24,576; 16,384 x 8; 12,288 x 3; 8,192 x 2; 4,096 x 3; 2,048 x 11; 1,024 x 4; 512; 496; 416; 384; 240 x 16; 176; 160 x 3; 144; 112 x 6; 96 x 10; 64 x 144; 32
  5.38% of the heap (16.15% cumulative)
  15.81% of unreported (47.44% cumulative)
  Allocated at {
    #01: __driDriverGetExtensions_i915 (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so)
    #02: ??? (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so)
    #03: glAreTexturesResidentEXT (/usr/lib/x86_64-linux-gnu/mesa/libGL.so.1)
    #04: ??? (/usr/lib/x86_64-linux-gnu/mesa/libGL.so.1)
    #05: mozilla::gl::GLXLibrary::xCreateContextAttribs(_XDisplay*, __GLXFBConfigRec*, __GLXcontextRec*, int, int const*) (/home/baku/Sources/m/bugs2/src/gfx/gl/GLContextProviderGLX.cpp:769)


Top 2 once-reported entries:

Once-reported {
  1,067 blocks in heap block record 1 of 4,258
  54,295,040 bytes (54,295,040 requested / 0 slop)
  Individual block sizes: 131,072 x 147; 65,536 x 272; 32,768 x 482; 16,384 x 52; 8,192 x 35; 4,096 x 56; 2,048 x 17; 1,024 x 3; 512 x 3
  13.97% of the heap (13.97% cumulative)
  21.18% of once-reported (21.18% cumulative)
  Allocated at {
    #01: js::ShapeTable::init(js::ExclusiveContext*, js::Shape*) (/home/baku/Sources/m/bugs2/src/js/src/vm/Shape.cpp:51)
    #02: js::Shape::hashify(js::ExclusiveContext*, js::Shape*) (/home/baku/Sources/m/bugs2/src/js/src/vm/Shape.cpp:161)
    #03: js::Shape* js::Shape::search<(js::MaybeAdding)0>(js::ExclusiveContext*, js::Shape*, jsid, js::ShapeTable::Entry**) (/home/baku/Sources/m/bugs2/src/js/src/vm/Shape-inl.h:58)


Once-reported {
  15 blocks in heap block record 2 of 4,258
  15,190,016 bytes (15,157,824 requested / 32,192 slop)
  Individual block sizes: 4,616,192 x 2; 4,317,184; 565,248; 364,544; 225,280; 192,512; 184,320; 65,536; 12,288; 8,192 x 3; 4,096; 2,048
  3.91% of the heap (17.87% cumulative)
  5.92% of once-reported (27.10% cumulative)
  Allocated at {
    #01: mozilla::layers::MemoryTextureData::Create(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits>, mozilla::gfx::SurfaceFormat, mozilla::gfx::BackendType, mozilla::layers::TextureFlags, mozilla::layers::TextureAllocationFlags, mozilla::layers::ClientIPCAllocator*) (/home/baku/Sources/m/bugs2/src/gfx/layers/BufferTexture.cpp:472)
    #02: mozilla::layers::BufferTextureData::Create(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits>, mozilla::gfx::SurfaceFormat, mozilla::gfx::BackendType, mozilla::layers::TextureFlags, mozilla::layers::TextureAllocationFlags, mozilla::layers::ClientIPCAllocator*) (/home/baku/Sources/m/bugs2/src/gfx/layers/BufferTexture.cpp:116)
    #03: mozilla::layers::TextureClient::CreateForRawBufferAccess(mozilla::layers::ClientIPCAllocator*, mozilla::gfx::SurfaceFormat, mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits>, mozilla::gfx::BackendType, mozilla::layers::TextureFlags, mozilla::layers::TextureAllocationFlags) (/home/baku/Sources/m/bugs2/src/gfx/layers/client/TextureClient.cpp:1210)


The total is:
Summary {
  Total:           388,749,072 bytes (100.00%) in 863,119 blocks (100.00%)
  Unreported:      132,376,584 bytes ( 34.05%) in 310,104 blocks ( 35.93%)
  Once-reported:   256,372,488 bytes ( 65.95%) in 553,015 blocks ( 64.07%)
  Twice-reported:            0 bytes (  0.00%) in       0 blocks (  0.00%)
}


The first 'dom' entry in the DMD log is:

Once-reported {
  4 blocks in heap block record 12 of 4,258
  3,940,352 bytes (3,933,770 requested / 6,582 slop)
  Individual block sizes: 2,011,136; 1,900,544; 24,576; 4,096
  1.01% of the heap (37.01% cumulative)
  1.54% of once-reported (56.12% cumulative)
  Allocated at {
    #01: nsTextFragment::SetTo(char16_t const*, int, bool) (/home/baku/Sources/m/bugs2/src/dom/base/nsTextFragment.cpp:281)
    #02: nsGenericDOMDataNode::SetTextInternal(unsigned int, unsigned int, char16_t const*, unsigned int, bool, CharacterDataChangeInfo::Details*) (/home/baku/Sources/m/bugs2/src/dom/base/nsGenericDOMDataNode.cpp:333 (discriminator 6))
    #03: nsGenericDOMDataNode::SetText(char16_t const*, unsigned int, bool) (/home/baku/Sources/m/bugs2/src/dom/base/nsGenericDOMDataNode.cpp:1000)
    #04: nsGenericDOMDataNode::SetText(nsAString_internal const&, bool) (/home/baku/Sources/m/bugs2/src/dom/base/nsGenericDOMDataNode.h:136)
    #05: nsContentUtils::SetNodeTextContent(nsIContent*, nsAString_internal const&, bool) (/home/baku/Sources/m/bugs2/src/dom/base/nsContentUtils.cpp:4748)
    #06: mozilla::dom::HTMLScriptElement::SetText(nsAString_internal const&, mozilla::ErrorResult&) (/home/baku/Sources/m/bugs2/src/dom/html/HTMLScriptElement.cpp:136)


About XMLHttpRequest they appear here:

Unreported {
  1 block in heap block record 326 of 6,409
  12,288 bytes (8,464 requested / 3,824 slop)
  0.00% of the heap (32.66% cumulative)
  0.01% of unreported (95.90% cumulative)
  Allocated at {
    #01: NS_NewXMLContentSink(nsIXMLContentSink**, nsIDocument*, nsIURI*, nsISupports*, nsIChannel*) (/home/baku/Sources/m/bugs2/src/dom/xml/nsXMLContentSink.cpp:89)
    #02: mozilla::dom::XMLDocument::StartDocumentLoad(char const*, nsIChannel*, nsILoadGroup*, nsISupports*, nsIStreamListener**, bool, nsIContentSink*) (/home/baku/Sources/m/bugs2/src/dom/xml/XMLDocument.cpp:536)
    #03: mozilla::dom::XMLHttpRequestMainThread::OnStartRequest(nsIRequest*, nsISupports*) (/home/baku/Sources/m/bugs2/src/dom/xhr/XMLHttpRequestMainThread.cpp:1948)
Attached file DMD report (obsolete) —
It is a little hard to tell what is going on in that log, because the memory usage has not gotten so high yet (388MB). The report only has 34% heap-unclassified, and more than 60% of that is under mozilla::gl::GLContextGLX::CreateGLContext, which I assume is just some standard graphics stuff. Maybe letting the memory usage grow to 1GB before getting a report would make it more obvious what is happening? I'll try to look through this a bit more.
>     #01: js::ShapeTable::init(js::ExclusiveContext*, js::Shape*)
> (/home/baku/Sources/m/bugs2/src/js/src/vm/Shape.cpp:51)
>     #02: js::Shape::hashify(js::ExclusiveContext*, js::Shape*)
> (/home/baku/Sources/m/bugs2/src/js/src/vm/Shape.cpp:161)

Jan just optimized shape memory usage in bug 1295967, with more to come in bug 1299107. So this might improve soon.
Attachment #8787319 - Attachment is obsolete: true
Andrea got a new DMD log when the memory usage was much higher. Hopefully he'll upload it at some point. He said that it had something like 700MB of unreported memory from mozilla::dom::XMLHttpRequestMainThread::AppendToResponseText(). I'm not sure why that isn't being reported, as we do have a memory reporter for XHR, for that field.
Because strings with multiple refcounts don't get reported, iirc?
(In reply to Boris Zbarsky [:bz] from comment #45)
> Because strings with multiple refcounts don't get reported, iirc?

That's correct.
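
For context, a sketch of why shared buffers go unreported (the "...IfUnshared" helper is the real nsTSubstring API; the surrounding reporter wiring here is an assumption): the string memory reporters deliberately skip buffers whose refcount is greater than 1 to avoid double-counting, so once a JS external string or a worker-side copy also holds the buffer, DMD only sees those bytes as heap-unclassified.

  // Simplified sketch of what an XHR memory reporter does for mResponseText.
  size_t
  SizeOfResponseText(mozilla::MallocSizeOf aMallocSizeOf) const
  {
    // Returns 0 when the underlying nsStringBuffer is shared (refcount > 1),
    // e.g. with a JS external string, so nobody ends up reporting the bytes.
    return mResponseText.SizeOfExcludingThisIfUnshared(aMallocSizeOf);
  }
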
First of all, I want to say that with this bug I'm learning every single bit of our nsString implementation.
I'm still debugging this issue, but what I want to share is that:

1. there are many XHRs used to retrieve JS and JSON resources.
2. these XHRs are kept in memory, and so we don't release mResponseText.
3. For some of these XHRs we have big strings stored in mResponseText. More than 20 MB.
4. XHR objects, all together, allocate ~700 MB of data in strings.
5. networking can go up to 10 MB.
6. chrome has similar memory allocation and similar networking.
Do you know why those XHRs are being kept alive?
I mean, is it some JS keeping them alive, or does Necko for example keep XHRs alive?
Can any of these strings be converted to single-byte js strings?
(In reply to Andrea Marchesini [:baku] from comment #47)
> First of all, I want to say that with this bug I'm learning every single bit
> of our nsString implementation.

That's great!  We need more string peers. =D
This is probably worth a recap at this point. It sounds like there are a few issues right now:

#1 - JS code grabbing partial XHR results in a loop (comment 12). bz put up a patch but comment 15, comment 19 seem to indicate that didn't fix the (entire?) problem. It's not clear if we were going through the code path in comment 15 or not. In the long run doing something to work with this behavior would be a good idea (I've seen similar code when looking at perf problems in tweetdeck), maybe we should spin off a separate bug?

#2 - XHR + Workers == tons of memory. (comment 21 - comment 38). There were a few proposals on how to reduce the amount of string data transferred:
  - Just send deltas (comment 23, comment 24)
  - Build on top of fetch (comment 25 - comment 29)
  - Use off-main-thread delivery for network data (comment 32; I'm not sure if this was its own solution or in combination with another), but that is blocked by bug 1015466, which seems pretty dead right now.
  - Do what WebSockets does (comment 36)

#2.1 - Don't dispatch runnables to the worker if there aren't listeners for the events (comment 36). This seems like we could spin off a separate bug regardless of other fixes.

#3 - There are a lot of XHRs hanging around with a lot of data in mResponseText. Questions remain as to why that is.

#4 - Using strings for XHR data may not be the most efficient solution
  - Comment 50 suggests using something other than wide strings if possible (see also bug 1275280)
  - Comment 52 suggests improving our allocation logic for large strings. It seems like we could spin off a bug for this as well.

Hopefully that gets us up to date. Some feedback on whether we want to spin off bugs and if the ideas proposed are still valid would be helpful.
> #2 - XHR + Workers == tons of memory. (comment 21 - comment 38). There were
> a few proposals on how to reduce the amount of string data transferred:

This is not valid anymore. I suspect Google changed the code since comment 21.
Currently there are no XHRs in workers. We definitely have to improve XHR in workers, but not for this bug.

> #2.1 - Don't dispatch runnables from the worker if there aren't listeners
> for the events (comment 36). This seems like we could spin off a separate
> bug regardless of other fixes.

Same here.
I was wrong about the Worker and XHR. And I also know the reason why we have this issue. Here is what is happening:

There is a small number of XHRs in workers, but we dispatch a big number of runnables when loading data. Each one of these takes a copy of mResponseText in PreDispatch():

https://dxr.mozilla.org/mozilla-central/source/dom/xhr/XMLHttpRequestWorker.cpp#1132

Sometimes we end up duplicating mResponseText 300 times, waiting for these runnables to be dispatched. The reason we do this is that we want to dispatch exactly the right number of events in XHR in workers, and each one should have the correct XHR state (responseText, the URL and some other attributes).

I'm already working on a solution: mResponseText is a kind of string that can only grow during loading. I'm making a new object, thread-safe and refcounted, to keep a snapshot of mResponseText on the worker thread without duplicating data.

A patch will be ready today.
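
A very rough sketch of that shape (the real classes in the patches below are XMLHttpRequestString and XMLHttpRequestStringSnapshot; everything here, including the locking, is a simplified assumption): one shared, refcounted, append-only buffer, plus snapshots that only record a length into it, so each runnable carries a pointer and an integer instead of a multi-megabyte copy.

  class SharedGrowOnlyString final
  {
  public:
    NS_INLINE_DECL_THREADSAFE_REFCOUNTING(SharedGrowOnlyString)

    SharedGrowOnlyString() : mMutex("SharedGrowOnlyString") {}

    // The main thread only ever appends; existing characters never change,
    // so a (buffer, length) pair is a stable snapshot of past content.
    void Append(const nsAString& aChunk)
    {
      mozilla::MutexAutoLock lock(mMutex);
      mData.Append(aChunk);
    }

    uint32_t Length()
    {
      mozilla::MutexAutoLock lock(mMutex);
      return mData.Length();
    }

    // Called on the worker thread only when JS actually reads responseText.
    void GetAsString(uint32_t aSnapshotLength, nsAString& aOut)
    {
      mozilla::MutexAutoLock lock(mMutex);
      aOut.Assign(Substring(mData, 0, aSnapshotLength));
    }

  private:
    ~SharedGrowOnlyString() = default;

    mozilla::Mutex mMutex;
    nsString mData;
  };

  // What each event runnable would carry instead of a full copy of the text.
  struct ResponseTextSnapshot
  {
    RefPtr<SharedGrowOnlyString> mString;
    uint32_t mLength = 0;  // how much of mString existed when the event fired
  };
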
Attachment #8788940 - Flags: review?(bugs)
Attachment #8788941 - Flags: review?(bugs)
Attached patch part 3 - Workers part (obsolete) — Splinter Review
Attachment #8788942 - Flags: review?(bugs)
Comment on attachment 8788941 [details] [diff] [review]
part 2 - XMLHttpRequestString for main-thread

We do want to share data whenever possible, and nsString (by using StringBuffer) gives us that. So using a raw char16_t* mData doesn't feel right.

And the allocation model should be something like what is described in bug 1301025 (better to fix nsStrings).
Attachment #8788941 - Flags: review?(bugs) → review-
Comment on attachment 8788942 [details] [diff] [review]
part 3 - Workers part

So this will need some changes too
Attachment #8788942 - Flags: review?(bugs)
Attachment #8788940 - Flags: review?(bugs) → review+
Attachment #8788941 - Attachment is obsolete: true
Attachment #8788998 - Flags: review?(bugs)
Attached patch part 3 - Workers part (obsolete) — Splinter Review
Attachment #8788942 - Attachment is obsolete: true
Attachment #8788999 - Flags: review?(bugs)
Comment on attachment 8788998 [details] [diff] [review]
part 2 - XMLHttpRequestString for main-thread

>-
>-  // Why is this safe?  Because no-one else will report this string.  The
>-  // other possible sharers of this string are as follows.
>-  //
>-  // - The JS engine could hold copies if the JS code holds references, e.g.
>-  //   |var text = XHR.responseText|.  However, those references will be via JS
>-  //   external strings, for which the JS memory reporter does *not* report the
>-  //   chars.
>-  //
>-  // - Binary extensions, but they're *extremely* unlikely to do any memory
>-  //   reporting.
>-  //
Please don't lose this comment

>   nsresult rv = mDecoder->GetMaxLength(aSrcBuffer, aSrcBufferLen,
>                                        &destBufferLen);
>   NS_ENSURE_SUCCESS(rv, rv);
> 
>-  uint32_t size = mResponseText.Length() + destBufferLen;
>-  if (size < (uint32_t)destBufferLen) {
>+  nsAutoString buffer;
>+  if (!buffer.SetCapacity(destBufferLen, fallible)) {
>     return NS_ERROR_OUT_OF_MEMORY;
>   }
Why do we have this temporary buffer? This causes some extra alloc/free which we shouldn't need, and certainly don't want.
You should be able to increase the capacity of mResponseText, get a pointer to its data, and use it here.
Attachment #8788998 - Flags: review?(bugs) → review-
Comment on attachment 8788999 [details] [diff] [review]
part 3 - Workers part

> XMLHttpRequestMainThread::GetResponseText(nsAString& aResponseText,
>                                           ErrorResult& aRv)
> {
>-  aResponseText.Truncate();
>-
>+  XMLHttpRequestStringSnapshot snapshot;
>+  GetResponseText(snapshot, aRv);
>+  if (aRv.Failed()) {
>+    return;
>+  }
>+
>+  if (!snapshot.IsSet()) {
>+    aResponseText.SetIsVoid(true);
This looks wrong.

(new XMLHttpRequest()).responseText should be "", not null.


Please also check the other SetIsVoid usages.
Attachment #8788999 - Flags: review?(bugs) → review-
Attachment #8788998 - Attachment is obsolete: true
Attachment #8789346 - Flags: review?(bugs)
Attachment #8788999 - Attachment is obsolete: true
Attachment #8789347 - Flags: review?(bugs)
Comment on attachment 8789346 [details] [diff] [review]
part 2 - XMLHttpRequestString for main-thread

> XMLHttpRequestMainThread::CreateResponseParsedJSON(JSContext* aCx)
> {
>   if (!aCx) {
>     return NS_ERROR_FAILURE;
>   }
> 
>+  // This will make a copy of the string.
>+  nsAutoString string;
>+  mResponseText.GetAsString(string);
>+
The comment is wrong. Usually you end up sharing the internal buffer of the string.

>+XMLHttpRequestStringWriterHelper::Buffer()
I think this is the wrong name for the method, since it doesn't return the buffer but the end of the current string.
Perhaps EndOfExistingData()?
Attachment #8789346 - Flags: review?(bugs) → review+
Comment on attachment 8789347 [details] [diff] [review]
part 3 - Workers part

Feels complicated overall, but then those helper classes should make this all less error-prone.
Attachment #8789347 - Flags: review?(bugs) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0e3040aed6e9
Improve performance in XHR in workers - part 1 - XHR.responseText must be cached, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/554d3b443640
Improve performance in XHR in workers - part 2 - Implement XMLHttpRequestString, r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/2fc46e5f15dc
Improve performance in XHR in workers - part 3 - Implement XMLHttpRequestStringSnapshot, r=smaug
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d95f98dcd8a4
Improve performance in XHR in workers - part 4 - Correct cleaning cache for XHR.responseText, r=me
Depends on: 1336811
Component: DOM → DOM: Core & HTML