Open Bug 1782125 Opened 3 years ago Updated 1 year ago

Startup profiling deadlocks on Windows

Categories

(Core :: Gecko Profiler, defect, P2)

People

(Reporter: florian, Unassigned)

I tried to reproduce a test failure with MOZ_PROFILER_STARTUP set, to see what was happening. Out of 16 runs, 4 failed with "TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 370 seconds with no output"
https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=94084601bba55fb449e4136d649e3c48b384e251

These jobs have crash dumps, and stacks in their logs.

In three cases, the main thread is blocked on mozglue.dll!mozilla::WindowsDpiInitialization() at https://searchfox.org/mozilla-central/rev/1061fae5e225a99ef5e43dbdf560a91a0c0d00d1/mozglue/misc/WindowsDpiInitialization.cpp#35-37 while the base profiler is trying to sample.

In the fourth case, the main thread is blocked at https://searchfox.org/mozilla-central/rev/1061fae5e225a99ef5e43dbdf560a91a0c0d00d1/widget/windows/WindowsUIUtils.cpp#399-400 (this system function triggers loading a DLL) while the Gecko Profiler is trying to sample.

We have code at https://searchfox.org/mozilla-central/rev/1061fae5e225a99ef5e43dbdf560a91a0c0d00d1/mozglue/misc/StackWalk.cpp#300 trying to avoid these deadlocks. Maybe something has changed, or maybe there was a bug in it.
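For readers unfamiliar with this class of deadlock: the sampler suspends a thread and then walks its stack, and if the suspended thread holds a lock that the stack walker also needs (for example while a DLL is being loaded), neither side can make progress. One common way to avoid this is a suppression counter that a thread raises around such regions and that the sampler checks after suspending the thread. The sketch below illustrates that general idea only. It is not the code at the StackWalk.cpp link above, and every name in it (`gStackWalkSuppressions`, `AutoSuppressStackWalking`, `SampleSuspendedThread`) is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; this is a generic illustration, not the
// code in mozglue/misc/StackWalk.cpp.
namespace sketch {

// Number of scopes currently marked as unsafe for stack walking, for
// example because the thread may hold a lock the stack walker needs.
std::atomic<std::size_t> gStackWalkSuppressions{0};

// RAII guard: a thread creates one of these before entering a region that
// may take such a lock, and the counter drops again when the scope exits.
class AutoSuppressStackWalking {
 public:
  AutoSuppressStackWalking() { gStackWalkSuppressions.fetch_add(1); }
  ~AutoSuppressStackWalking() {
    std::size_t previous = gStackWalkSuppressions.fetch_sub(1);
    assert(previous > 0);  // Unbalanced decrement would be a bug.
    (void)previous;
  }
  AutoSuppressStackWalking(const AutoSuppressStackWalking&) = delete;
  AutoSuppressStackWalking& operator=(const AutoSuppressStackWalking&) = delete;
};

enum class SampleResult { Walked, Skipped };

// Meant to be called by the sampler *after* it has suspended the target
// thread. If any suppression is active, the suspended thread might hold a
// lock the walker needs, so the walk is skipped instead of risking a hang.
template <typename WalkFn>
SampleResult SampleSuspendedThread(WalkFn&& walk) {
  if (gStackWalkSuppressions.load() != 0) {
    return SampleResult::Skipped;
  }
  walk();
  return SampleResult::Walked;
}

}  // namespace sketch
```

In a scheme like this, the check has to happen after the target thread is suspended: a suspended thread can no longer enter a guarded region, so a counter value of zero stays valid for the duration of the walk. The cost is that samples taken while a guard is active carry no native stack.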

Severity: -- → S3
Priority: -- → P2

I also relatively frequently see Windows profiles in which one process is missing stack samples while the other processes have good stacks. Most recent example: https://share.firefox.dev/3bhteJu The parent process main thread has 4 samples with full stacks from the Gecko Profiler at the very beginning, and the rest of the profile has no native stack frames for that process.

See Also: → 1836225