Bug 1458246 (Closed) · Opened 2 years ago · Closed last year

Content process crash when trying to collect profile

Categories

(Core :: Gecko Profiler, defect, P1)

Tracking

RESOLVED FIXED
mozilla62
Tracking Status
firefox62 --- fixed

People

(Reporter: bzbarsky, Assigned: canaltinova)

References

Details

Attachments

(1 file)

STEPS TO REPRODUCE:

1)  Start today's Mac nightly (official nightly, built from rev d2a4720d1c33).
2)  Install the Gecko Profiler add-on from https://perf-html.io/
3)  Set the profiler to sample every 0.2ms with a 630MB buffer.
4)  Start the profiler.
5)  Load https://1yzzl25mwj.codesandbox.io/
6)  Once the frame counter gets to 100/100 and shows per-frame times,
    hit Ctrl+Shift+2.

EXPECTED RESULTS: I get a profile of that page.

ACTUAL RESULTS: Page crashes.  I do not actually get a profile for the PID the page was loaded in.  An example crash is at https://crash-stats.mozilla.com/report/index/0323f22b-1a5f-48cd-b609-9be8b0180501 and as mstange noted on IRC the "MOZ_CRASH(IPC message size is too large)" bit is likely the cause...
Per IRC discussion, the 1e6 performance.mark() calls this testcase makes are probably relevant.
Priority: -- → P1
I got the same crash when trying to record a very large profile (setting the memory setting to a high value, the period to 0.1ms, and adding "Worker" to the list of threads).

It could be bug 1457301, though. That bug landed recently, so maybe you could try the STR from comment 0 again and report whether this still crashes. I won't promise the profile is captured correctly, though...
I will take a stab at this.
Assignee: nobody → canaltinova
Status: NEW → ASSIGNED
I can still reproduce the crash btw.
Yesterday, mstange suggested that I look at bug 1271102 for more info about how to pass large data. Its dependencies showed that using `Shmem` would fix the problem, so I changed the code to use Shmem instead of nsCString. But it appears that I'm hitting an allocation problem with Shmem as well. AllocShmem works fine and allocates properly if the data is not too big, and I can successfully see the profiler data in perf-html. But if the data is bigger than 256 MB, it fails to allocate the memory at this point [1]. It appears this check was changed by spohl.
spohl, hi! Do you know a way to work around this problem, or should we send the data as a stream instead of sending it in one huge chunk?


[1]: https://searchfox.org/mozilla-central/rev/5a744713370ec47969595e369fd5125f123e6d24/ipc/glue/SharedMemoryBasic_mach.mm#555
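The streaming alternative mentioned above could look roughly like the sketch below. This is a standalone illustration only: plain heap strings stand in for Shmem, and the 256 MB per-message limit is the hypothetical cap observed in this bug, not a documented IPC constant.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-message cap; the Mac Shmem allocation failures in
// this bug were observed above roughly 256 MB.
constexpr size_t kMaxChunk = 256 * 1024 * 1024;

// Split a large profile string into pieces no bigger than aMaxChunk,
// as one would when streaming it over IPC instead of sending a single
// huge allocation.
std::vector<std::string> SplitIntoChunks(const std::string& aProfile,
                                         size_t aMaxChunk = kMaxChunk) {
  std::vector<std::string> chunks;
  for (size_t offset = 0; offset < aProfile.size(); offset += aMaxChunk) {
    size_t len = std::min(aMaxChunk, aProfile.size() - offset);
    chunks.push_back(aProfile.substr(offset, len));
  }
  return chunks;
}
```

Under that hypothetical cap, a 630 MB profile would go out as three messages (256 + 256 + 118 MB) instead of one oversized one.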
Flags: needinfo?(spohl.mozilla.bugs)
(In reply to Nazım Can Altınova [:canaltinova] from comment #5)
> Yesterday, mstange suggested me to look at the bug 1271102 for more info
> about how to pass large data. Its dependencies showed that using `Shmem`
> will fix the problem. So I changed code to use Shmem instead of nsCString.
> But It appears, also with Shmem, I'm getting an allocation problem.
> AllocShmem method is working fine if the data is not too big and allocates
> properly. I can successfully see the profiler data in perf-html. But if the
> data is bigger than 256mb, then it fails to allocate the memory at this
> point [1]. It appears this check changed by spohl.
> spohl, hi! Do you know a way to workaround that problem Or should we send it
> as a stream instead of sending in one huge chunk?
> 
> 
> [1]: https://searchfox.org/mozilla-central/rev/5a744713370ec47969595e369fd5125f123e6d24/ipc/glue/SharedMemoryBasic_mach.mm#555

The allocation doesn't actually fail at line [1] that you mention; rather, the call to mach_make_memory_entry_64 ends up with a smaller memoryObjectSize than expected, and we now (as of bug 1362303) bail to avoid crashing. Maybe :jld has a suggestion for how to proceed here.
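The bail-out described here amounts to a size check after the kernel call: if the memory object the kernel hands back is smaller than the page-rounded request, refuse the allocation rather than crash on a short mapping later. Below is a portable paraphrase with a hypothetical RoundToPage helper and a fixed 4 KiB page size, not the actual SharedMemoryBasic_mach.mm code.

```cpp
#include <cstdint>

// Hypothetical fixed page size for illustration; real code would use
// the system's page size.
constexpr uint64_t kPageSize = 4096;

// Round a requested mapping size up to a whole number of pages, the
// way the kernel sizes memory entries.
uint64_t RoundToPage(uint64_t aSize) {
  return (aSize + kPageSize - 1) & ~(kPageSize - 1);
}

// Paraphrase of the guard added in bug 1362303: accept the allocation
// only if the returned memory object covers the full rounded request.
bool SizeCheckPasses(uint64_t aRequested, uint64_t aMemoryObjectSize) {
  return aMemoryObjectSize >= RoundToPage(aRequested);
}
```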
Flags: needinfo?(spohl.mozilla.bugs) → needinfo?(jld)
See bug 1465667 for details, but the short version is that there's a bug with how we handle shared memory on Mac that manifests only with sizes > 128 MiB, and I think there's a simple fix.
Flags: needinfo?(jld)
I tested this patch together with the patch from bug 1465667, and it seems to work fine with large data chunks. The perf-html.io side takes a long time to process all the data, but we don't get any content process crash in the meantime. This depends on bug 1465667, of course.
Comment on attachment 8983049 [details]
Bug 1458246 - Use Shmem to pass the profiler data to be able to send large profiler data

https://reviewboard.mozilla.org/r/248920/#review255086

Looks good! I found one copy that's easy to avoid. As a follow-up, there are two more copies we can get rid of: One is in ChunkedJSONWriteFunc::CopyData(), which allocates a UniquePtr<char[]> and copies the chunks into it, and one is in CollectProfileOrEmptyString, which copies the contents from the UniquePtr<char[]> into an nsCString. Ideally, we'd have a way to get the total length of the ChunkedJSONWriteFunc, allocate a shmem of that size, and then copy the chunks directly into the Shmem. But as I said, this can be done in a follow-up bug.
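The follow-up sketched here (get the total length up front, make one allocation, copy the chunks straight in) could look roughly like the following. ProfileChunk and CopyChunksOnce are stand-in names for illustration, not the real ChunkedJSONWriteFunc or Shmem API.

```cpp
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Stand-in for the chunks held by something like ChunkedJSONWriteFunc.
using ProfileChunk = std::string;

// Compute the total length once, allocate a single destination buffer
// (which would be the Shmem), and copy each chunk in directly, with no
// intermediate UniquePtr<char[]> or nsCString copies.
std::unique_ptr<char[]> CopyChunksOnce(const std::vector<ProfileChunk>& aChunks,
                                       size_t* aOutLen) {
  size_t total = 0;
  for (const auto& chunk : aChunks) {
    total += chunk.size();
  }
  auto buffer = std::make_unique<char[]>(total);
  size_t offset = 0;
  for (const auto& chunk : aChunks) {
    memcpy(buffer.get() + offset, chunk.data(), chunk.size());
    offset += chunk.size();
  }
  *aOutLen = total;
  return buffer;
}
```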

::: tools/profiler/gecko/ProfilerChild.cpp:92
(Diff revision 1)
>    }
>    return profileCString;
>  }
>  
> +Shmem
> +ProfilerChild::ConvertProfileStringToShmem(nsCString profileCString) {

Please make this a const reference, otherwise the string will be copied when this method is called. And please rename the argument to aProfileCString.
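The cost being flagged here (pass-by-value copies the whole string on every call) is easy to demonstrate with a copy-counting type. This is a generic C++17 illustration, not ProfilerChild code:

```cpp
#include <cstddef>
#include <string>
#include <utility>

// A string wrapper that counts how many times it is copied.
struct CountedString {
  static inline size_t sCopies = 0;
  std::string mValue;
  explicit CountedString(std::string aValue) : mValue(std::move(aValue)) {}
  CountedString(const CountedString& aOther) : mValue(aOther.mValue) {
    ++sCopies;
  }
};

// Pass-by-value: the caller's string is copied on every call.
size_t LengthByValue(CountedString aStr) { return aStr.mValue.size(); }

// Pass-by-const-reference: no copy is made.
size_t LengthByConstRef(const CountedString& aStr) {
  return aStr.mValue.size();
}
```

For a multi-hundred-megabyte profile string, that one avoided copy is significant.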

::: tools/profiler/gecko/ProfilerChild.cpp:100
(Diff revision 1)
> +  memcpy(shmem.get<char>(),
> +    profileCString.BeginReading(),
> +    profileCString.Length());

Can you use PodCopy instead of memcpy here?
Attachment #8983049 - Flags: review?(mstange) → review+
Thanks for the review, mstange! I will open a new bug for these copies.
Pushed by canaltinova@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/b24796357d77
Use Shmem to pass the profiler data to be able to send large profiler data r=mstange
https://hg.mozilla.org/mozilla-central/rev/b24796357d77
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla62