Closed Bug 1735397 Opened 4 years ago Closed 3 years ago

Profiling a single process barely works when there are many child processes

Categories

(Core :: Gecko Profiler, defect, P3)

defect

Tracking

()

RESOLVED FIXED
99 Branch
Tracking Status
firefox99 --- fixed

People

(Reporter: florian, Assigned: mozbugz)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fxp])

Attachments

(4 files)

The feature I added in bug 1730239 to profile all threads of a process from a single click in about:processes works well in quick tests on my local build, but I encounter issues when profiling on my normal Nightly build that I use for actual browsing.

  • Despite profiling for 5s, the profile I capture is very short: https://share.firefox.dev/2YBv9ln It seems I have only a single buffer chunk, and there's a "IPCIn — PProfiler::Msg_DestroyReleasedChunksAtOrBefore" marker at the very beginning of the available data.
  • It's currently difficult to see it, but with the patch from bug 1734899 applied, it becomes clear that the profiler starts in all processes, despite the thread filter mentionning only a single process (eg "pid:26528" in the previous profile). In about:processes, in the profiled process I see a SamplerThread taking between 2 and 20% of a core (depending on how many threads the process had), and in all other processes I see a SamplerThread taking about 0.4-0.8% of a core. What are these SamplerThread doing?

I suspect both issues are connected, as the pointless SamplerThreads likely request buffer chunks... and with Fission enabled, my Nightly had 67 content processes! Which is a lot when the default buffer size is 128MB for profiles captured from about:processes.

Bug 1673513 is also frequently in the way when profiling from about:processes, but it's at least possible to workaround it from about:config.

I haven't been able to reproduce this issue. (Or maybe I have? Once I had an error where the profile had zero threads in it!?)

Based on the Msg_DestroyReleasedChunksAtOrBefore, it could indeed be that other processes kept filling the buffer, and the previous chunks of your to-be-profiled process were all discarded.

Of course, ideally other processes shouldn't record anything, but I guess the profiler is still working everywhere, and even though it shouldn't sample anything, currently markers are still recorded everywhere! Hopefully bug 1735697 will help a lot with this issue -- there's a prototype in Try, if you'd like to have a go.

Apart from that, we could also optimize the Profiler in the "pid:12345" case, so that other child processes really don't run it at all, so they wouldn't even consume any of the shared buffer space, and wouldn't run anything related to the Profiler.
I would suggest we keep this bug open for that, to look at after bug 1735697 has landed.

Severity: -- → S3
Depends on: 1735697
Priority: -- → P3

(In reply to Gerald Squelart [:gerald] (he/him) from comment #1)

Once I had an error where the profile had zero threads in it!?

That's usually a symptom of bug 1673513. You can workaround it by increasing the value of devtools.performance.recording.child.timeout_s in about:config, or reducing the value of toolkit.aboutProcesses.profileDuration.

Assignee: nobody → gsquelart
Status: NEW → ASSIGNED

These will be used to check special thread filter rules around "pid:..." filters.

Depends on D135852

When trying to profile only one process, the profiler should not even start in those processes excluded by the filter, so that they don't use any resources (memory and CPU).
To do that, the filter is checked for "pid:" at the multiple locations it may appear:

  • In the parent process, when starting the overall profiler, don't send the "Start" IPC to excluded processes.
  • When a new process starts and initializes the profiler, don't start the profiler if the filter excludes this process.
  • When a new process then registers itself with the parent, don't (re)start the profiler in excluded processes.

Child processes that don't run the profiler may still be asked for a JSON profile at the end of the profiling session. This is handled by sending an empty string, so that the parent process will correctly remove them from the pending-profile list.

Depends on D135853

Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/19762488b965 Add mozilla/BaseAndGeckoProfilerDetail.h header for shared profiler-only things - r=florian https://hg.mozilla.org/integration/autoland/rev/d55f04c0a444 mozilla::profiler::detail::FilterHasPid and FiltersExcludePid - r=florian https://hg.mozilla.org/integration/autoland/rev/64546b56bac0 Don't start profiler in processes excluded by pid - r=florian https://hg.mozilla.org/integration/autoland/rev/724e44c19665 Add tests for pid-filtering - r=florian

Backed out 4 changesets (bug 1735397) for causing spidermonkey build bustages on check_vanilla_allocations.py.

Push with failures

Failure log

Backout link

[task 2022-02-08T05:22:39.079Z] TEST-PASS | check_js_msg_encoding.py | /builds/worker/checkouts/gecko/js/xpconnect/src/jsshell.msg | ok
[task 2022-02-08T05:22:39.092Z] make[2]: Leaving directory '/builds/worker/workspace/obj-spider/js/src'
[task 2022-02-08T05:22:39.096Z] make[2]: Entering directory '/builds/worker/workspace/obj-spider/js/src/build'
[task 2022-02-08T05:22:39.096Z] /builds/worker/workspace/obj-spider/_virtualenvs/build/bin/python /builds/worker/checkouts/gecko/config/check_vanilla_allocations.py libjs_static.a
[task 2022-02-08T05:22:51.980Z] TEST-UNEXPECTED-FAIL | check_vanilla_allocations.py | 'operator new(unsigned' present in Unified_cpp_mozglue_baseprofiler0.o
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: Source lines with allocation calls:
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: Accurate in unoptimized builds; util/Utility.cpp expected.
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new[](unsigned long) called at /builds/worker/checkouts/gecko/js/src/vm/ProfilingStack.cpp:43
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new(unsigned long) called at /builds/worker/workspace/obj-spider/dist/include/mozilla/UniquePtr.h:609
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new[](unsigned long) called at /builds/worker/workspace/obj-spider/dist/include/mozilla/UniquePtr.h:616
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new(unsigned long) called at /builds/worker/checkouts/gecko/intl/components/src/DateTimeFormat.cpp:280
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new[](unsigned long) called at /builds/worker/workspace/obj-spider/dist/include/mozilla/UniquePtr.h:616
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new(unsigned long) called at /builds/worker/checkouts/gecko/intl/components/src/IDNA.cpp:22
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new(unsigned long, std::nothrow_t const&) called at /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/stl_tempbuf.h:94
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new(unsigned long) called at /builds/worker/workspace/obj-spider/dist/include/mozilla/UniquePtr.h:609
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new(unsigned long) called at /builds/worker/checkouts/gecko/intl/components/src/PluralRules.cpp:58
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new[](unsigned long) called at /builds/worker/checkouts/gecko/js/src/util/Utility.cpp:162
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new(unsigned long) called at /builds/worker/checkouts/gecko/js/src/util/Utility.cpp:160
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: memalign called at /builds/worker/checkouts/gecko/js/src/util/Utility.cpp:162
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new(unsigned long) called at /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:111
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: memalign called at /builds/worker/checkouts/gecko/memory/build/malloc_decls.h:55
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: memalign called at /builds/worker/checkouts/gecko/memory/mozalloc/mozalloc.cpp:113
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new(unsigned long) called at /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:111
[task 2022-02-08T05:22:51.980Z] check_vanilla_allocations.py: operator new(unsigned long) called at /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:111
[task 2022-02-08T05:22:51.996Z] make[2]: *** [Makefile:21: check-vanilla-allocations] Error 1
[task 2022-02-08T05:22:51.996Z] make[2]: Leaving directory '/builds/worker/workspace/obj-spider/js/src/build'
[task 2022-02-08T05:22:51.996Z] make[1]: *** [/builds/worker/checkouts/gecko/config/recurse.mk:99: js/src/build/check] Error 2
[task 2022-02-08T05:22:51.996Z] make[1]: Leaving directory '/builds/worker/workspace/obj-spider'
[task 2022-02-08T05:22:51.996Z] make: *** [/builds/worker/checkouts/gecko/config/recurse.mk:34: check] Error 2
[task 2022-02-08T05:22:51.996Z] in directory /builds/worker/workspace/obj-spider, running ['setarch', 'x86_64', '-R', 'make', 'check-jit-test']
[task 2022-02-08T05:22:52.003Z] make -C js/src check-jit-test
[task 2022-02-08T05:22:52.008Z] make[1]: Entering directory '/builds/worker/workspace/obj-spider/js/src'
[task 2022-02-08T05:22:52.008Z] ASAN_SYMBOLIZER_PATH='/builds/worker/fetches/llvm-symbolizer/llvm-symbolizer' ../../dist/bin/run-mozilla.sh /builds/worker/workspace/obj-spider/_virtualenvs/build/bin/python -u /builds/worker/checkouts/gecko/js/src/jit-test/jit_test.py \
[task 2022-02-08T05:22:52.008Z]         --no-slow --no-progress --format=automation --jitflags=all \
[task 2022-02-08T05:22:52.008Z] 		 \
[task 2022-02-08T05:22:52.008Z] 		--jitflags=none \
[task 2022-02-08T05:22:52.008Z]         ../../dist/bin/js 
[task 2022-02-08T05:22:53.549Z] TEST-PASS | js/src/jit-test/tests/1659595.js | Success (code 0, args "") [0.1 s]
Flags: needinfo?(gsquelart)

Thank you.
I'm guessing it's catching the allocations from std::string in the 2nd patch. Investigating...

Flags: needinfo?(gsquelart)
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a82be94d0caa Add mozilla/BaseAndGeckoProfilerDetail.h header for shared profiler-only things - r=florian https://hg.mozilla.org/integration/autoland/rev/8452b8b38ecf mozilla::profiler::detail::FilterHasPid and FiltersExcludePid - r=florian https://hg.mozilla.org/integration/autoland/rev/e08aa9249eed Don't start profiler in processes excluded by pid - r=florian https://hg.mozilla.org/integration/autoland/rev/a9d9cffa89e6 Add tests for pid-filtering - r=florian
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: