Closed Bug 1758643 Opened 3 years ago Closed 3 years ago

MOZ_RELEASE_ASSERT(aLength <= kMax) (string is too large) when capturing a profile

Categories

(Core :: Gecko Profiler, defect, P2)

Tracking

RESOLVED FIXED
104 Branch
Tracking Status
firefox104 --- fixed

People

(Reporter: florian, Assigned: mozbugz)

References

(Depends on 1 open bug)

Attachments

(4 files)

I crashed when trying to capture a long profile yesterday (https://crash-stats.mozilla.org/report/index/ea53d488-6637-4447-86e5-aa33e0220308) and today (https://crash-stats.mozilla.org/report/index/86471d1b-6c2d-415d-84c8-7d4df0220309).

Is it possible that something changed recently that makes us attempt to generate JSON profiles that are larger than 2GB? https://searchfox.org/mozilla-central/rev/131f3af9a49d2203adb7b7ef30dcc37c9f1aa10b/xpcom/string/nsTStringRepr.h#65-69,84,87
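For reference, here's a minimal stand-alone sketch of that cap; the real constant and assertion live in nsTStringRepr.h at the link above, and the exact value here is an approximation:

```cpp
#include <cassert>
#include <cstddef>

// Approximate stand-in for the check in nsTStringRepr.h: nsCString lengths
// are stored in 32 bits, and a release assertion rejects anything above
// roughly 2^31 bytes, which is what fires in the crash reports above.
constexpr size_t kMaxStringLength = (size_t(1) << 31) - 1;  // ~2GB, approximate

void CheckLength(size_t aLength) {
  // Stand-in for MOZ_RELEASE_ASSERT(aLength <= kMax, "string is too large").
  assert(aLength <= kMaxStringLength && "string is too large");
}
```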

I can't think of obvious recent changes that could explain this. We keep adding new markers and recording longer profiles; maybe it's getting out of control? 😛

I see that this happens when we have received all expected child profiles and we're stitching everything into one big buffer, which is then given for adoption to an nsCString, at which point the >2GB length is a problem.
Assuming we can't avoid using nsCString, nsProfiler::FinishGathering() could detect that the sum of all per-process profiles is too big, and we could drop one or more to fit into the 2GB limit.
(I don't think we could remove the oldest data from each profile, though that would be the ideal solution.)
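A minimal sketch of that guard, with hypothetical names (MaybeAddChildProfile and the constant are illustrative, not actual nsProfiler code):

```cpp
#include "nsString.h"

// Hypothetical guard in the spirit of the suggestion above: skip a child
// profile if appending it would push the combined profile past the cap.
constexpr size_t kMaxProfileLength = (size_t(1) << 31) - 1;  // illustrative

bool MaybeAddChildProfile(nsCString& aCombined, const nsACString& aChild) {
  if (size_t(aCombined.Length()) + aChild.Length() > kMaxProfileLength) {
    return false;  // drop this child profile, keep the rest
  }
  aCombined.Append(aChild);
  return true;
}
```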

Also, other things like bug 1756535 (combining unique strings & stacks from all threads) could help, by reducing the size of each per-process profile.

Locally, it could be useful to dump the giant string to disk before crashing (just after this line), to analyze the JSON profile and see if there are real issues.
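Something like this, hacked in locally, would do (the function name and output path are arbitrary):

```cpp
#include <cstdio>

// Local debugging hack, as suggested above: write the combined JSON to disk
// just before the length check, so the profile can be inspected even when
// the assertion would have fired.
void DumpProfileForDebugging(const char* aData, size_t aLength) {
  if (FILE* f = std::fopen("/tmp/huge-profile.json", "wb")) {
    std::fwrite(aData, 1, aLength, f);
    std::fclose(f);
  }
}
```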

Severity: -- → S4
Priority: -- → P3

Hey Gerald,
this seems to happen a lot more now that we've enabled IPC markers by default.
Should we therefore bump the severity and the priority?

> Assuming we can't avoid using nsCString

Why can't we use a structure that can hold more than 2GB?

This is trivial to reproduce when doing a media profile of more than a few seconds on a reasonably active Web Audio API program such as https://jackschaedler.github.io/karplus-stress-tester/ (click Start audio, then the button 600, then profile, wait 10s, capture). This crashes Firefox after a few seconds on capture. I've also seen 140GB (!) of memory in use in the child process when it doesn't crash.

This is likely caused by IPC markers being collected automatically these days; that's quite a steep increase in the number of markers collected.

(In reply to Julien Wajsberg [:julienw] from comment #2)

> > Assuming we can't avoid using nsCString
>
> Why can't we use a structure that can hold more than 2GB?

Although in that case we might also reach the WebChannel limit, as in https://github.com/firefox-devtools/profiler/issues/4021.

I believe we need to discuss another model to transfer the profile data to the webpage.

Thank you both for the extra info. This certainly needs to be investigated sooner.

Severity: S4 → S3
Priority: P3 → P2

I believe bug 1777890 (just landed) fixes the main cause of this issue: the buffer size-limiting algorithm had not been running at all since April, which would explain the big buffers in processes, and then the too-big strings when collecting profiles...

Please let me know if you still experience this issue or not.
(But I should work on it in any case.)

This will be used in the following patches to more easily reject the promise
with a specific code AND reset gathering, in one call.

Depends on D151902
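Roughly the shape of the change, sketched with stand-ins for the nsProfiler internals (the member names and promise type here approximate, rather than quote, the real code):

```cpp
#include "mozilla/MozPromise.h"
#include "nsString.h"
#include "nsTArray.h"

// Stand-in promise type; the real one lives in nsProfiler.
using GatheringPromise = mozilla::MozPromise<nsCString, nsresult, false>;

class GathererSketch {
 public:
  void ResetGathering(nsresult aRejectionResult) {
    // Reject any pending gathering promise with the given code...
    mPromiseHolder.RejectIfExists(aRejectionResult, __func__);
    // ...and clear the rest of the gathering state in the same call.
    mPendingProfiles.Clear();
  }

 private:
  mozilla::MozPromiseHolder<GatheringPromise> mPromiseHolder;
  nsTArray<nsCString> mPendingProfiles;
};
```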

Assignee: nobody → gsquelart
Status: NEW → ASSIGNED

The promise to be resolved may end up in JavaScript, so we check for the
maximum JS string length, which is below the nsCString max length; this way we
won't fail the too-big-string assertion there.

Depends on D152023
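For illustration, the kind of check involved; the 2^30 figure is my assumption about SpiderMonkey's limit (JSString::MAX_LENGTH in the real code), and the patch uses the actual constant:

```cpp
#include <cstddef>

// Illustrative only: SpiderMonkey caps JS string lengths well below the
// nsCString cap, so checking against the smaller JS limit first means we
// never reach the nsCString too-big-string assertion either.
constexpr size_t kAssumedMaxJSStringLength = size_t(1) << 30;  // assumption

bool FitsInJSString(size_t aProfileLength) {
  return aProfileLength <= kAssumedMaxJSStringLength;
}
```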

Instead of allocating a buffer with the profile and then copying it into an
nsCString (at which point there are two full copies in memory), we resize the
nsCString as needed and directly output the profile data into it.

Depends on D152024
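In other words, sketched with std::string standing in for nsCString (the real patch streams the JSON writer's chunks straight into the target string):

```cpp
#include <cstring>
#include <string>

// Sketch of the single-allocation approach: grow the destination string and
// write each chunk of profile JSON directly into it, instead of building a
// separate full-size buffer and then copying it into the string.
void AppendChunkInPlace(std::string& aOut, const char* aChunk, size_t aLen) {
  const size_t oldLen = aOut.size();
  aOut.resize(oldLen + aLen);                // grow in place, no temp buffer
  std::memcpy(&aOut[oldLen], aChunk, aLen);  // write directly into position
}
```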

Instead of throwing away the whole profile at the end if it's too big, we try
to keep as much data as possible, by only throwing away incoming child profiles
that would make the resulting combined full profile too big.

Depends on D152025

Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/207232098956
ResetGathering now take the nsresult to use in the promise rejection - r=florian
https://hg.mozilla.org/integration/autoland/rev/0fef9e23cd85
Check for maximum string length before outputting profile JSON - r=florian
https://hg.mozilla.org/integration/autoland/rev/643d16db6266
Avoid double allocation when outputting profile JSON - r=florian
https://hg.mozilla.org/integration/autoland/rev/2ae1019d8ff7
Preemptively discard child profiles that would add too much data - r=florian

I think I just got a profile that would previously have crashed: https://share.firefox.dev/3J90uyQ

I started the profiler with the 'power' preset and started watching a video. After about half an hour, I tried to capture the profile. It took a while, and when I finally got the profile, it was missing the content process that was playing the video, as well as the GPU process.

The profile gathering log contains 2 error messages:
Could not create shmem for profile from pid 28660 (535024313 B)
Could not create shmem for profile from pid 26856 (504297331 B)

Thanks. The system works! 😁

Wow, 500MB...
I guess possible next steps could be:

  • Reduce JSON output for the same data (like in bug 1774329, and more to come).
  • Split profiles into segments that fit in smaller shmems (see the sketch after this list).
  • Remove old data to shrink profiles. This would be difficult and slow!
  • Until then, lower the maximum buffer size?
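For the segmenting bullet, a minimal sketch of the idea (the segment size and types are illustrative, not the actual IPC transport):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustrative only: slice a large profile into fixed-size segments that
// would each fit in a smaller shmem.
std::vector<std::string> SplitIntoSegments(const std::string& aProfile,
                                           size_t aSegmentSize) {
  std::vector<std::string> segments;
  if (aSegmentSize == 0) {
    return segments;
  }
  for (size_t pos = 0; pos < aProfile.size(); pos += aSegmentSize) {
    segments.push_back(
        aProfile.substr(pos, std::min(aSegmentSize, aProfile.size() - pos)));
  }
  return segments;
}
```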