Closed Bug 1160811 Opened 5 years ago Closed 5 years ago

When profiling with memory: "InternalError: allocation size overflow" during getAllocations

Categories

(DevTools :: Performance Tools (Profiler/Timeline), defect)

37 Branch

Tracking

(firefox40 fixed, firefox41 fixed)

RESOLVED FIXED
Firefox 41

People

(Reporter: jsantell, Assigned: fitzgen)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

When recording large memory profiles (octane with memory, ouch), I've gotten these errors before, which look like we're hitting some limit.

console.error: 
  Error writing response to: getAllocations
console.error: 
  Message: InternalError: allocation size overflow
  Stack:
    Response<.write@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:681:23
actorProto/</handler/sendReturn@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1016:24
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:867:23
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:746:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:688:37
Promise*this.PromiseWalker.scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:688:5
this.PromiseWalker.schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:709:7
Promise.prototype.then@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:417:5
actorProto/</handler/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1036:18
Actor<._queueResponse@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:913:20
actorProto/</handler@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1035:9
DSC_onPacket@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/main.js:1456:15
ChildDebuggerTransport.prototype.receiveMessage@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/transport/transport.js:734:5

A promise chain failed to handle a rejection. Did you forget to '.catch', or did you forget to 'return'?
See https://developer.mozilla.org/Mozilla/JavaScript_code_modules/Promise.jsm/Promise




Full Message: Protocol error (unknownError): InternalError: allocation size overflow
Full Stack: JS frame :: resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js :: PendingErrors.register :: line 162
JS frame :: resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js :: this.PromiseWalker.completePromise :: line 675
JS frame :: resource://gre/modules/Task.jsm :: TaskImpl_handleException :: line 448
JS frame :: resource://gre/modules/Task.jsm :: TaskImpl_run :: line 326
JS frame :: resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js :: Handler.prototype.process :: line 870
JS frame :: resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js :: this.PromiseWalker.walkerLoop :: line 746
JS frame :: resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js :: this.PromiseWalker.scheduleWalkerLoop/< :: line 688
native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0
This is where (in JS) the internal error is being triggered:

  /**
   * Write a response for the given return value.
   *
   * @param val ret
   *    The return value.
   * @param object ctx
   *    The object writing the response.
   */
  write: function(ret, ctx) {
    return JSON.parse(JSON.stringify(this.template, function(key, value) { // <---- This line
      if (value instanceof RetVal) {
        return value.write(ret, ctx);
      }
      return value;
    }));
  },

Here is what I think is happening: when we stringify the packet to JSON to send it over the RDP, there are SO MANY allocations that the resulting JSON string is larger than JSString::MAX_LENGTH, fails the check in `JSString::validateLength`, and raises the error we see here.

I think the easiest thing to do here is just use a smaller `maxAllocationsLogLength` value.
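To make the back-of-the-envelope reasoning concrete, here is a rough sketch of why a smaller log length keeps the serialized packet under the engine's string limit. All names and numbers below are illustrative, not the actual patch; the MAX_LENGTH value is approximate for SpiderMonkey of this era.

```javascript
// Illustrative sketch, not the actual patch. SpiderMonkey's
// JSString::MAX_LENGTH was roughly (1 << 28) - 1 code units at the time;
// JSON.stringify raises "InternalError: allocation size overflow" once the
// output string would exceed it.
const MAX_JS_STRING_LENGTH = (1 << 28) - 1;

// Rough per-entry cost of one serialized allocation record, in characters.
// The real cost depends on stack depth and de-duping; 200 is a guess.
const APPROX_CHARS_PER_ALLOCATION = 200;

function fitsInOneJSONString(logLength) {
  return logLength * APPROX_CHARS_PER_ALLOCATION < MAX_JS_STRING_LENGTH;
}
```

Under these made-up numbers, a cap in the low hundreds of thousands of entries serializes comfortably, while tens of millions of entries would blow past the limit.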
Assignee: nobody → nfitzgerald
Status: NEW → ASSIGNED
Ugh.  The profiler would be a great match for bulk data I think, which would skip this whole issue.

But, no bulk data for e10s today.  In the future, I hope.
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #2)
> Ugh.  The profiler would be a great match for bulk data I think, which would
> skip this whole issue.
> 
> But, no bulk data for e10s today.  In the future, I hope.

Well, this also immediately JSON.parse's the result, so this is a poor person's object tree traversal/copy. The stringifying for the RDP only actually happens when we are remote debugging, or e10s is enabled.

So if we wanted to use bulk data, we would have to fix that poor person's object tree traversal/copy and use a stream JSON stringifier (not sure if we have one implemented and exposed to JS).
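For readers unfamiliar with the pattern, the "poor person's object tree traversal/copy" mentioned above is just a JSON round-trip. This is a generic sketch of the idiom, not the protocol.js code itself:

```javascript
// Generic sketch of the JSON round-trip deep copy that protocol.js's
// write() performs: stringify walks the whole object tree (running the
// replacer on every key, which is where RetVal templates get expanded),
// then parse rebuilds a fresh copy with no shared references.
function jsonDeepCopy(obj, replacer) {
  return JSON.parse(JSON.stringify(obj, replacer));
}

const packet = { from: "memory", allocations: [{ frame: 0 }, { frame: 1 }] };
const copy = jsonDeepCopy(packet);
// copy is structurally equal to packet but shares no object references
```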
FWIW, the entire profile (as in, data from nsIProfiler) is sent over in one chunk, and with the dedupe patch landed, this now is super fast. And amazing that we hit this limit with allocation!

Don't we fake remote debugging, even when in non-e10s?

Why can't we do bulk data in e10s?

Possibly similar or related: sending profiler data via bulk data is being investigated in bug 1010523. This could probably be fixed by batching up the allocation data in smaller chunks, but if there is that much data, maybe it makes sense to wait until recording is done, as sending a lot of data frequently could impact perf.
See Also: → 1010523
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #4)
> FWIW, the entire profile (as in, data from nsIProfiler) is sent over in one
> chunk, and with the dedupe patch landed, this now is super fast. And amazing
> that we hit this limit with allocation!

We already de-dupe the allocations data. We're hitting this *with* de-duping, which is pretty crazy.

> Don't we fake remote debugging, even when in non-e10s?

We always use the RDP, but only stringify when remote/e10s. When local, we just freeze the packet objects.
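A minimal sketch of the two paths described above; the function name and shape are hypothetical, not the real protocol.js code:

```javascript
// Hypothetical sketch of the two delivery paths: a serialized copy for
// remote/e10s transports, and a frozen in-place object for same-process
// debugging (no copy, but neither side can mutate the packet).
function preparePacket(packet, isRemote) {
  if (isRemote) {
    // This stringify is where a huge allocations log can overflow the
    // engine's maximum string length.
    return JSON.parse(JSON.stringify(packet));
  }
  return Object.freeze(packet);
}
```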
(In reply to Nick Fitzgerald [:fitzgen][:nf] from comment #3)
> (In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #2)
> > Ugh.  The profiler would be a great match for bulk data I think, which would
> > skip this whole issue.
> > 
> > But, no bulk data for e10s today.  In the future, I hope.
> 
> Well, this also immediately JSON.parse's the result, so this is a poor
> person's object tree traversal/copy. The stringifying for the RDP only
> actually happens when we are remote debugging, or e10s is enabled.
> 
> So if we wanted to use bulk data, we would have to fix that poor person's
> object tree traversal/copy and use a stream JSON stringifier (not sure if we
> have one implemented and exposed to JS).

Bulk data operates on the level of XPCOM streams, so we would likely not go through this JS deep copy path if it were used.

A first version using bulk data might have the C++ profiler code write its JSON blob to the XPCOM stream.  It does not actually need to be a truly "streaming" profiler API that sends small chunks as events happen to use bulk data, one giant blob is still fine.  It just gets written to an XPCOM stream instead of stringified and parsed numerous times.

We could get fancier after that and have profiler actually output data in truly streaming fashion, but that's not needed for bulk data to be used.
WRT profiler data, it renders to a string and we use an API that JSON-parses it. Then when it goes over the RDP, it gets stringified and parsed again. If we want, we can remove one parse/stringify cycle, but honestly it's pretty fast right now.
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #6)
> (In reply to Nick Fitzgerald [:fitzgen][:nf] from comment #3)
> > (In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #2)
> > > Ugh.  The profiler would be a great match for bulk data I think, which would
> > > skip this whole issue.
> > > 
> > > But, no bulk data for e10s today.  In the future, I hope.
> > 
> > Well, this also immediately JSON.parse's the result, so this is a poor
> > person's object tree traversal/copy. The stringifying for the RDP only
> > actually happens when we are remote debugging, or e10s is enabled.
> > 
> > So if we wanted to use bulk data, we would have to fix that poor person's
> > object tree traversal/copy and use a stream JSON stringifier (not sure if we
> > have one implemented and exposed to JS).
> 
> Bulk data operates on the level of XPCOM streams, so we would likely not go
> through this JS deep copy path if it were used.

I'm not familiar enough with protocol.js to have an informed opinion/intuition about this.

> A first version using bulk data might have the C++ profiler code write its
> JSON blob to the XPCOM stream.  It does not actually need to be a truly
> "streaming" profiler API that sends small chunks as events happen to use
> bulk data, one giant blob is still fine.  It just gets written to an XPCOM
> stream instead of stringified and parsed numerous times.

Note that this bug is about the allocations data, not the SPS profiler data. Also, note that we do the de-duping in JS which might change the architecture you describe a bit.
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #4)
> Why can't we do bulk data in e10s?

Bulk data relies on XPCOM streams.  For b2g, you're connecting to a remote device over a socket like USB or TCP, and these sockets have existing XPCOM stream APIs.

When going between processes for e10s, we don't have anything like this yet.  If bug 1093357 (cross process pipe) is implemented (possibly Q3 it says), then we'll have a way to offer bulk data here.
(In reply to Nick Fitzgerald [:fitzgen][:nf] from comment #8)
> > A first version using bulk data might have the C++ profiler code write its
> > JSON blob to the XPCOM stream.  It does not actually need to be a truly
> > "streaming" profiler API that sends small chunks as events happen to use
> > bulk data, one giant blob is still fine.  It just gets written to an XPCOM
> > stream instead of stringified and parsed numerous times.
> 
> Note that this bug is about the allocations data, not the SPS profiler data.
> Also, note that we do the de-duping in JS which might change the
> architecture you describe a bit.

Ah, sorry for the confusion.  The main benefits I would expect you'd get from bulk data:

* This particular p.js stringify / parse goes away
* Should see an overall transfer time improvement

It's hard to claim much more than that without trying it, though.  Like you say, there's still lots of JS de-duping operations performed prior to sending out the data.  I haven't tested such a case on bulk data before, so it's hard to be sure.

Anyway, the main blocker for bulk data is just the e10s support I describe in comment 9.  protocol.js would need to be updated to support bulk data as well, which would involve things like not doing the stringify / parse since separate code paths would be added.  I am happy to write the protocol.js support once it's possible to use in e10s, since it should then have much more general usability by our tools.
See Also: → 1166823
Attachment #8608226 - Flags: review?(jsantell)
Comment on attachment 8608226 [details] [diff] [review]
Lower the max allocations log length to avoid internal allocation overflow errors

Review of attachment 8608226 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good -- should this even be a pref? Is this something people would possibly want to lower? Should we have a constant in the tool that clamps any higher value back down to 125000?

browser_perf-options-allocations.js should be updated as well (it just tests setting the value, but the value it uses is 777777).
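The clamp suggested in this review could look something like this; the constant name, its placement, and the clamping function are hypothetical:

```javascript
// Hypothetical clamp for the pref: whatever value the user sets,
// cap it at a ceiling known to serialize safely over the RDP.
const MAX_SAFE_ALLOCATIONS_LOG_LENGTH = 125000;

function clampAllocationsLogLength(prefValue) {
  return Math.min(prefValue, MAX_SAFE_ALLOCATIONS_LOG_LENGTH);
}
```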
Attachment #8608226 - Flags: review?(jsantell) → review+
You might want to lower it on low memory devices, so I think it has value as a pref.
https://hg.mozilla.org/mozilla-central/rev/76ba2ef6bdc8
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 41
Flags: qe-verify-
Comment on attachment 8608226 [details] [diff] [review]
Lower the max allocations log length to avoid internal allocation overflow errors


Approval Request Comment
[Feature/regressing bug #]: 1167252, the new performance tool
[User impact if declined]: Won't ship the performance tool
[Describe test coverage new/current, TreeHerder]: There are try pushes in Bug 1167252 with all patches needing uplift
[Risks and why]: Requesting uplift for the accumulated changes in the performance tool since the 40 merge date, so these changes haven't had the full 6 weeks to bake.  Risks are generally contained within devtools, specifically within the performance panel.
[String/UUID change made/needed]: None
Attachment #8608226 - Flags: approval-mozilla-aurora?
Note: I had verbal confirmation for these uplifts from Sylvestre even before he flagged them as a+.  See https://bugzilla.mozilla.org/show_bug.cgi?id=1167252#c26
Comment on attachment 8608226 [details] [diff] [review]
Lower the max allocations log length to avoid internal allocation overflow errors

Change approved to skip one train as part of the spring campaign.
Attachment #8608226 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Product: Firefox → DevTools