Closed Bug 1160811 Opened 5 years ago Closed 5 years ago

When profiling with memory: "InternalError: allocation size overflow" during getAllocations

Categories

(DevTools :: Performance Tools (Profiler/Timeline), defect)

37 Branch

Tracking

(firefox40 fixed, firefox41 fixed)

RESOLVED FIXED
Firefox 41

People

(Reporter: jsantell, Assigned: fitzgen)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

When recording large memory profiles (octane with memory, ouch), I've gotten these errors before, which look like we're hitting some limit.

console.error: 
  Error writing response to: getAllocations
console.error: 
  Message: InternalError: allocation size overflow
  Stack:
    Response<.write@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:681:23
actorProto/</handler/sendReturn@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1016:24
Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:867:23
this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:746:7
this.PromiseWalker.scheduleWalkerLoop/<@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:688:37
Promise*this.PromiseWalker.scheduleWalkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:688:5
this.PromiseWalker.schedulePromise@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:709:7
Promise.prototype.then@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:417:5
actorProto/</handler/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1036:18
Actor<._queueResponse@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:913:20
actorProto/</handler@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1035:9
DSC_onPacket@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/main.js:1456:15
ChildDebuggerTransport.prototype.receiveMessage@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/transport/transport.js:734:5

A promise chain failed to handle a rejection. Did you forget to '.catch', or did you forget to 'return'?
See https://developer.mozilla.org/Mozilla/JavaScript_code_modules/Promise.jsm/Promise




Full Message: Protocol error (unknownError): InternalError: allocation size overflow
Full Stack: JS frame :: resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js :: PendingErrors.register :: line 162
JS frame :: resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js :: this.PromiseWalker.completePromise :: line 675
JS frame :: resource://gre/modules/Task.jsm :: TaskImpl_handleException :: line 448
JS frame :: resource://gre/modules/Task.jsm :: TaskImpl_run :: line 326
JS frame :: resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js :: Handler.prototype.process :: line 870
JS frame :: resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js :: this.PromiseWalker.walkerLoop :: line 746
JS frame :: resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js :: this.PromiseWalker.scheduleWalkerLoop/< :: line 688
native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0
This is where (in JS) the internal error is being triggered:

  /**
   * Write a response for the given return value.
   *
   * @param val ret
   *    The return value.
   * @param object ctx
   *    The object writing the response.
   */
  write: function(ret, ctx) {
    return JSON.parse(JSON.stringify(this.template, function(key, value) { // <---- This line
      if (value instanceof RetVal) {
        return value.write(ret, ctx);
      }
      return value;
    }));
  },

Here is what I think is happening: when we stringify the packet to JSON to send it over the RDP, there are SO MANY allocations that the resulting JSON string is larger than JSString::MAX_LENGTH, fails the check in `JSString::validateLength`, and raises the error we see here.

I think the easiest thing to do here is just use a smaller `maxAllocationsLogLength` value.
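To make the back-of-the-envelope reasoning concrete, here is a rough sketch of why a smaller log length keeps the serialized packet under the engine's string limit. All names and numbers below are illustrative, not the actual patch; the MAX_LENGTH value is approximate for SpiderMonkey of this era.

```javascript
// Illustrative sketch, not the actual patch. SpiderMonkey's
// JSString::MAX_LENGTH was roughly (1 << 28) - 1 code units at the time;
// JSON.stringify raises "InternalError: allocation size overflow" once the
// output string would exceed it.
const MAX_JS_STRING_LENGTH = (1 << 28) - 1;

// Rough per-entry cost of one serialized allocation record, in characters.
// The real cost depends on stack depth and de-duping; 200 is a guess.
const APPROX_CHARS_PER_ALLOCATION = 200;

function fitsInOneJSONString(logLength) {
  return logLength * APPROX_CHARS_PER_ALLOCATION < MAX_JS_STRING_LENGTH;
}
```

Under these made-up numbers, a cap in the low hundreds of thousands of entries serializes comfortably, while tens of millions of entries would blow past the limit.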
Assignee: nobody → nfitzgerald
Status: NEW → ASSIGNED
Ugh.  The profiler would be a great match for bulk data I think, which would skip this whole issue.

But, no bulk data for e10s today.  In the future, I hope.
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #2)
> Ugh.  The profiler would be a great match for bulk data I think, which would
> skip this whole issue.
> 
> But, no bulk data for e10s today.  In the future, I hope.

Well, this also immediately JSON.parse's the result, so this is a poor person's object tree traversal/copy. The stringifying for the RDP only actually happens when we are remote debugging, or e10s is enabled.

So if we wanted to use bulk data, we would have to fix that poor person's object tree traversal/copy and use a stream JSON stringifier (not sure if we have one implemented and exposed to JS).
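For readers unfamiliar with the pattern, the "poor person's object tree traversal/copy" mentioned above is just a JSON round-trip. This is a generic sketch of the idiom, not the protocol.js code itself:

```javascript
// Generic sketch of the JSON round-trip deep copy that protocol.js's
// write() performs: stringify walks the whole object tree (running the
// replacer on every key, which is where RetVal templates get expanded),
// then parse rebuilds a fresh copy with no shared references.
function jsonDeepCopy(obj, replacer) {
  return JSON.parse(JSON.stringify(obj, replacer));
}

const packet = { from: "memory", allocations: [{ frame: 0 }, { frame: 1 }] };
const copy = jsonDeepCopy(packet);
// copy is structurally equal to packet but shares no object references
```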
FWIW, the entire profile (as in, data from nsIProfiler) is sent over in one chunk, and with the dedupe patch landed, this now is super fast. And amazing that we hit this limit with allocation!

Don't we fake remote debugging, even when in non-e10s?

Why can't we do bulk data in e10s?

Possibly similar or related: sending profiler data via bulk data is being investigated in bug 1010523. This could probably be fixed by batching up the allocation data in smaller chunks, but if there is that much data, maybe it makes sense to wait until recording is done, as sending a lot of data frequently could impact perf.
See Also: → 1010523
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #4)
> FWIW, the entire profile (as in, data from nsIProfiler) is sent over in one
> chunk, and with the dedupe patch landed, this now is super fast. And amazing
> that we hit this limit with allocation!

We already de-dupe the allocations data. We're hitting this *with* de-duping, which is pretty crazy.

> Don't we fake remote debugging, even when in non-e10s?

We always use the RDP, but only stringify when remote/e10s. When local, we just freeze the packet objects.
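A minimal sketch of the two paths described above; the function name and shape are hypothetical, not the real protocol.js code:

```javascript
// Hypothetical sketch of the two delivery paths: a serialized copy for
// remote/e10s transports, and a frozen in-place object for same-process
// debugging (no copy, but neither side can mutate the packet).
function preparePacket(packet, isRemote) {
  if (isRemote) {
    // This stringify is where a huge allocations log can overflow the
    // engine's maximum string length.
    return JSON.parse(JSON.stringify(packet));
  }
  return Object.freeze(packet);
}
```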
(In reply to Nick Fitzgerald [:fitzgen][:nf] from comment #3)
> (In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #2)
> > Ugh.  The profiler would be a great match for bulk data I think, which would
> > skip this whole issue.
> > 
> > But, no bulk data for e10s today.  In the future, I hope.
> 
> Well, this also immediately JSON.parse's the result, so this is a poor
> person's object tree traversal/copy. The stringifying for the RDP only
> actually happens when we are remote debugging, or e10s is enabled.
> 
> So if we wanted to use bulk data, we would have to fix that poor person's
> object tree traversal/copy and use a stream JSON stringifier (not sure if we
> have one implemented and exposed to JS).

Bulk data operates on the level of XPCOM streams, so we would likely not go through this JS deep copy path if it were used.

A first version using bulk data might have the C++ profiler code write its JSON blob to the XPCOM stream.  It does not actually need to be a truly "streaming" profiler API that sends small chunks as events happen to use bulk data, one giant blob is still fine.  It just gets written to an XPCOM stream instead of stringified and parsed numerous times.

We could get fancier after that and have profiler actually output data in truly streaming fashion, but that's not needed for bulk data to be used.
WRT profiler data, it renders to a string and we use an API that JSON-parses it. Then when it goes over the RDP, it gets stringified and parsed again. If we want, we can remove one parse/stringify cycle, but honestly it's pretty fast right now.
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #6)
> (In reply to Nick Fitzgerald [:fitzgen][:nf] from comment #3)
> > (In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #2)
> > > Ugh.  The profiler would be a great match for bulk data I think, which would
> > > skip this whole issue.
> > > 
> > > But, no bulk data for e10s today.  In the future, I hope.
> > 
> > Well, this also immediately JSON.parse's the result, so this is a poor
> > person's object tree traversal/copy. The stringifying for the RDP only
> > actually happens when we are remote debugging, or e10s is enabled.
> > 
> > So if we wanted to use bulk data, we would have to fix that poor person's
> > object tree traversal/copy and use a stream JSON stringifier (not sure if we
> > have one implemented and exposed to JS).
> 
> Bulk data operates on the level of XPCOM streams, so we would likely not go
> through this JS deep copy path if it were used.

I'm not familiar enough with protocol.js to have an informed opinion/intuition about this.

> A first version using bulk data might have the C++ profiler code write its
> JSON blob to the XPCOM stream.  It does not actually need to be a truly
> "streaming" profiler API that sends small chunks as events happen to use
> bulk data, one giant blob is still fine.  It just gets written to an XPCOM
> stream instead of stringified and parsed numerous times.

Note that this bug is about the allocations data, not the SPS profiler data. Also, note that we do the de-duping in JS which might change the architecture you describe a bit.
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #4)
> Why can't we do bulk data in e10s?

Bulk data relies on XPCOM streams.  For b2g, you're connecting to a remote device over a socket like USB or TCP, and these sockets have existing XPCOM stream APIs.

When going between processes for e10s, we don't have anything like this yet.  If bug 1093357 (cross process pipe) is implemented (possibly Q3 it says), then we'll have a way to offer bulk data here.
(In reply to Nick Fitzgerald [:fitzgen][:nf] from comment #8)
> > A first version using bulk data might have the C++ profiler code write its
> > JSON blob to the XPCOM stream.  It does not actually need to be a truly
> > "streaming" profiler API that sends small chunks as events happen to use
> > bulk data, one giant blob is still fine.  It just gets written to an XPCOM
> > stream instead of stringified and parsed numerous times.
> 
> Note that this bug is about the allocations data, not the SPS profiler data.
> Also, note that we do the de-duping in JS which might change the
> architecture you describe a bit.

Ah, sorry for the confusion.  The main benefits I would expect you'd get from bulk data:

* This particular p.js stringify / parse goes away
* Should see an overall transfer time improvement

It's hard to claim much more than that without trying it, though.  Like you say, there's still lots of JS de-duping operations performed prior to sending out the data.  I haven't tested such a case on bulk data before, so it's hard to be sure.

Anyway, the main blocker for bulk data is just the e10s support I describe in comment 9.  protocol.js would need to be updated to support bulk data as well, which would involve things like not doing the stringify / parse since separate code paths would be added.  I am happy to write the protocol.js support once it's possible to use in e10s, since it should then have much more general usability by our tools.
See Also: → 1166823
Attachment #8608226 - Flags: review?(jsantell)
Comment on attachment 8608226 [details] [diff] [review]
Lower the max allocations log length to avoid internal allocation overflow errors

Review of attachment 8608226 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good -- should this even be a pref? Is this something people would possibly want to lower? Should we have a constant in the tool that clamps any higher value back down to 125000?

browser_perf-options-allocations.js should be updated as well (it just tests setting the value, but the value it uses is 777777).
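The clamp suggested in this review could look something like this; the constant name, its placement, and the clamping function are hypothetical:

```javascript
// Hypothetical clamp for the pref: whatever value the user sets,
// cap it at a ceiling known to serialize safely over the RDP.
const MAX_SAFE_ALLOCATIONS_LOG_LENGTH = 125000;

function clampAllocationsLogLength(prefValue) {
  return Math.min(prefValue, MAX_SAFE_ALLOCATIONS_LOG_LENGTH);
}
```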
Attachment #8608226 - Flags: review?(jsantell) → review+
You might want to lower it on low memory devices, so I think it has value as a pref.
https://hg.mozilla.org/mozilla-central/rev/76ba2ef6bdc8
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 41
Flags: qe-verify-
Comment on attachment 8608226 [details] [diff] [review]
Lower the max allocations log length to avoid internal allocation overflow errors


Approval Request Comment
[Feature/regressing bug #]: 1167252, the new performance tool
[User impact if declined]: Won't ship the performance tool
[Describe test coverage new/current, TreeHerder]: There are try pushes in Bug 1167252 with all patches needing uplift
[Risks and why]: Requesting uplift for the accumulated changes in the performance tool since the 40 merge date, so these changes haven't had the full 6 weeks to bake.  Risks are generally contained within devtools, specifically within the performance panel.
[String/UUID change made/needed]: None
Attachment #8608226 - Flags: approval-mozilla-aurora?
Note: I had verbal confirmation for these uplifts from Sylvestre even before he flagged them as a+.  See https://bugzilla.mozilla.org/show_bug.cgi?id=1167252#c26
Comment on attachment 8608226 [details] [diff] [review]
Lower the max allocations log length to avoid internal allocation overflow errors

Change approved to skip one train as part of the spring campaign.
Attachment #8608226 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Product: Firefox → DevTools