mainthreadio markers missing for closing a file on Windows (NtClose)
Categories
(Core :: Gecko Profiler, enhancement, P2)
Tracking
()
People
(Reporter: florian, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [fxp])
Attachments
(2 files)
47 bytes,
text/x-phabricator-request
|
Details | Review | |
19.87 KB,
patch
|
Details | Diff | Splinter Review |
See this profile where most of the time is spent in NtClose, but there's no I/O marker for it: https://perfht.ml/2DSi7lv
We do have open/read/write/stat/fsync markers.
It looks like the IOInterposer code landed as part of bug 902587 doesn't interpose NtClose.
Comment 1•5 years ago
•
|
||
Long time ago, so hard for me to page in, but I see the unlanded patch 1 in bug 888534 adds these operations to IOInterposer. The specific hooks to PoisonIOInterposer would then need to be added via a subsequent patch.
If we do add it, keep in mind that NtClose is not just used for I/O, and in fact we won't want markers for most calls to that function.
Really we'd only want markers for NtClose that actually correspond to closing of file handles, so we should only log a marker when GetFileType() == FILE_TYPE_DISK. We probably should measure the perf impacts of such a check. Hopefully there aren't any, but should verify that.
Reporter | ||
Comment 2•5 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #1)
If we do add it, keep in mind that NtClose is not just used for I/O, and in fact we won't want markers for most calls to that function.
Really we'd only want markers for NtClose that actually correspond to closing of file handles
Is CloseHandle a better symbol to interpose?
In the profile from comment 0, the stack looks like:
NtClose (ntdll.dll)
CloseHandle (KERNELBASE.dll)
static <unnamed-tag> FileClose(struct PRFileDesc *) (nss3.dll)
<actual Firefox code> (xul.dll)
Comment 3•5 years ago
|
||
NtClose is the right one to use for this, as the other PoisonIOInterposer interceptions are at the native NT layer as well.
Reporter | ||
Comment 4•5 years ago
|
||
Reporter | ||
Comment 5•5 years ago
|
||
Here is a try run of the above patch with the main thread I/O startup test: https://treeherder.mozilla.org/#/jobs?repo=try&revision=84bfbc48d5ed482a273a0c9c1351551bf6978f88
On Windows 7 32bit it produces the expected output. On Windows 10 64bit, it's busted to the point that even the xpcshell fake http server fails to start.
I tried to debug this locally on my Windows 10 machine but didn't get anywhere. When using startup+shutdown profiling, I got profiles with correct 'close' FileIO markers. But when using startup profiling and capturing the profile while the browser was running, I never got a profile displayed, the Firefox Profiler UI was stuck on waiting to receive the profile from the Gecko profiler. This looks like there's a deadlock, but I couldn't figure out where.
Then I tried using a debug build, hoping it would give me some clue of what could be causing the deadlock... but I realized that just using startup profiling with the mainthreadio feature broke the browser, both with and without my patch. I filed bug 1543060.
Aaron, any idea of what I could have done wrong in this patch, or what could be broken elsewhere that my patch exposes?
Comment 6•5 years ago
|
||
Any time I see anything about deadlocks involving the profiler on 64-bit, it makes it sound to me like something is missing stackwalk suppressing.
Reporter | ||
Comment 7•5 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #6)
Any time I see anything about deadlocks involving the profiler on 64-bit, it makes it sound to me like something is missing stackwalk suppressing.
Seems like a good guess, thanks!
I tried adding "AutoSuppressStackWalking suppress;" in the functions that seem to play a role here (InterposedNtClose and locked_profiler_stream_json_for_this_process) but unfortunately that didn't fix the deadlock I'm encountering.
Comment 8•5 years ago
|
||
Unfortunately without stacks that's about the most that I can speculate.
Oops, I missed this bug, so I opened another one (now marked duplicate). Here's the info I wrote there, in case there's something useful:
Intercepting NtClose
is possible, and would be very useful, because it is a suspected to sometimes have a heavy cost: See "Closing File Handles on Windows" in this article about surprisingly-slow things.
But it presents some technical difficulties:
NtClose
may be used on any type of object, so we'd need to filter out non-file objects.- Also because
NtClose
may be called with any object, once in the interception routine, the interceptor could too easily be recursively called! In particular it makes stack-walking impossible (TBD), making the interception less useful.
I'm attaching an old WIP patch that may give some ideas.
(Adding mention of NtClose
in bug title.)
Updated•2 years ago
|
Updated•1 year ago
|
Description
•