Open Bug 1792530 Opened 2 years ago Updated 22 days ago

Long content process profile missing when capturing a profile soon after closing the tab

Categories

(Core :: Gecko Profiler, defect, P2)

People

(Reporter: florian, Unassigned)

I think I finally understood in which case this problem happens.

I'm running a test locally that displays a CSS animation and reduces the refresh rate every second, to see how this affects power use. If I run the test for 30s, I get a correct profile at the end. If I run the test for 120s, the profile at the end is missing the content process that was displaying the animation.

Example profile with missing content process: https://share.firefox.dev/3dK6pzc
Example profile of the same thing running only for 30s: https://share.firefox.dev/3E0RGdZ (there's an example.com content process).

The profile gathering log for the profile with the missing content process says:

0: Array(3) [ 123168.708042, "Generated parent process profile, size:", 53040518 ]
1: Array [ 123168.710208, "No exit profiles." ]
...
7: Array(3) [ 123168.723167, "Waiting for pending profile, pid:", 21993 ]
...
11: Array(4) [ 123170.100958, "Got rejection from pid, with reason:", 21993, 1 ]

It seems "reason: 1" means "ChannelClosed" (https://searchfox.org/mozilla-central/rev/929b2a7154e463674ebac497d4a89208eec1b8f7/ipc/glue/MessageChannel.h#88).

I think the problem is that when we start collecting the profile, the exit profile from the child process (for which the tab has been closed) has not been received yet but is already being generated, and by the time we send an IPC to get the profile, the IPC channel is already closed. If my guess is correct, then when we get the error with reason 1, it would be a good idea to check if an exit profile has been received for that process in the meantime.
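
A minimal sketch of the check suggested above, assuming a simplified model rather than the real profiler code. `RejectReason`, `ExitProfileStore` and `OnProfileRequestRejected` are hypothetical names invented for illustration; only the value 1 for "ChannelClosed" comes from the log. The check can only help if the exit profile has already arrived by the time the rejection is handled.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the IPC reject reasons. Only the mapping
// 1 == ChannelClosed is taken from the log above.
enum class RejectReason { Other = 0, ChannelClosed = 1 };

// Hypothetical store of exit profiles received so far, keyed by pid.
class ExitProfileStore {
 public:
  // Record an exit profile delivered by a process that is shutting down.
  void Received(int pid, std::string profile) {
    assert(pid > 0);
    mProfiles[pid] = std::move(profile);
  }

  // Remove and return the exit profile for `pid`, if one has arrived.
  std::optional<std::string> Take(int pid) {
    auto it = mProfiles.find(pid);
    if (it == mProfiles.end()) {
      return std::nullopt;
    }
    std::string profile = std::move(it->second);
    mProfiles.erase(it);
    return profile;
  }

 private:
  std::map<int, std::string> mProfiles;
};

// Called when the profile request sent to `pid` is rejected. For a closed
// channel, fall back to an exit profile that arrived in the meantime.
std::optional<std::string> OnProfileRequestRejected(ExitProfileStore& store,
                                                    int pid,
                                                    RejectReason reason) {
  if (reason != RejectReason::ChannelClosed) {
    return std::nullopt;
  }
  return store.Take(pid);
}
```

In this model, a rejection for pid 21993 with reason `ChannelClosed` returns the stored exit profile if `Received(21993, ...)` was called first, and nothing otherwise.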

(In reply to Florian Quèze [:florian] from comment #0)

Some printf debugging later...

> I think the problem is that when we start collecting the profile, the exit profile from the child process (for which the tab has been closed) has not been received yet but is already being generated, and by the time we send an IPC to get the profile, the IPC channel is already closed.

This guess was correct.

> If my guess is correct, then when we get the error with reason 1, it would be a good idea to check if an exit profile has been received for that process in the meantime.

This didn't work: profiler_received_exit_profile is called correctly, but only after we have already called nsProfiler::FinishGathering. To fix this, we would need a way to detect that we attempted to send an IPC to a content process that was already cleanly shutting down, and a way to wait for it to finish shutting down. I think this should be possible, as I assume the parent process initiated the shutdown and somehow waits for it to complete, but that's outside the profiler code.
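
To illustrate the ordering issue, here is a rough sketch of one possible approach, not the actual nsProfiler logic: when a request is rejected because the channel closed, the gathering session keeps waiting for that pid's exit profile instead of finishing immediately. `GatheringSession` and all of its methods are hypothetical names, and the sketch assumes a closed channel always means a clean shutdown that will eventually deliver an exit profile.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical model of one profile-gathering session.
class GatheringSession {
 public:
  // Register a child process whose profile has been requested.
  void WaitFor(int pid) {
    assert(!mFinished);
    mPending.insert(pid);
  }

  // The request to `pid` was rejected because its channel closed.
  // Rather than dropping the process, wait for its exit profile.
  void OnChannelClosed(int pid) {
    if (mPending.erase(pid) > 0) {
      mAwaitingExit.insert(pid);
    }
    MaybeFinish();
  }

  // A regular or exit profile arrived for `pid`.
  void OnProfile(int pid, std::string profile) {
    if (mFinished) {
      return;  // Too late: gathering has already completed.
    }
    const bool expected =
        mPending.erase(pid) > 0 || mAwaitingExit.erase(pid) > 0;
    if (!expected) {
      return;
    }
    mProfiles[pid] = std::move(profile);
    MaybeFinish();
  }

  bool Finished() const { return mFinished; }
  const std::map<int, std::string>& Profiles() const { return mProfiles; }

 private:
  // Finish only once no process is pending or shutting down.
  void MaybeFinish() {
    if (mPending.empty() && mAwaitingExit.empty()) {
      mFinished = true;
    }
  }

  std::set<int> mPending;       // Requests still in flight.
  std::set<int> mAwaitingExit;  // Channel closed, exit profile expected.
  std::map<int, std::string> mProfiles;
  bool mFinished = false;
};
```

A real implementation would presumably also need a timeout or a shutdown-complete signal, since a process whose channel closed for another reason might never deliver an exit profile.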

Severity: -- → S3
Priority: -- → P3