Complete implementation of Base Profiler on Windows
Categories
(Core :: Gecko Profiler, task, P2)
Tracking
()
People
(Reporter: mozbugz, Assigned: toshi)
References
Details
Attachments
(1 file, 1 obsolete file)
Base Profiler cannot be enabled on package builds of Windows yet.
When present and enabled, but not necessarily started, It seems to somehow prevent (or delay?) stdout and stderr in child processes.
Then when Gecko Profiler starts with the main-thread io feature, it initializes PoisonIOInterposer, which blindly tries to register stdout&err, but because they're not available, it triggers an assert -- see details in bug 1556993.
Adding a simple check for errors in MozillaRegisterDebugFD
removes the assert and allows for full use of Base Profiler and Gecko Profiler together on Windows.
But I'm unsure that it's the right way to procede, it feels like wallpapering to prevent an assert, but doesn't do anything about the strange absence of stdout&err because of Base Profiler.
Bug 1556993 is focused on MozillaRegisterDebugFD
and friends, to determine if they should have this kind of error-checking anyway.
This bug here will focus on the cause of stdout&err disappearance. (And/or whatever else is preventing us from enabling Base Profiler on Windows.)
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 1•4 years ago
|
||
Bug 1559379 will fix TestBaseProfiler on Windows.
Reporter | ||
Comment 2•4 years ago
|
||
Bug 1556993 fixed the crash in PoisonIOInterposer (due to missing stdout&stderr
fd's), and bug 1559379 fixed the mozglue memory issue that the unit test
experienced.
So now Base Profiler can be enabled by default on Windows, still using
MOZ_BASE_PROFILER_...
env-vars for now (upcoming bug will merge these with
non-BASE env-vars soon).
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1ccaad832c1d Enable Base Profiler in Windows - r=florian
Comment 4•4 years ago
|
||
bugherder |
Comment 5•4 years ago
|
||
Backed out for causing bug 1531789
R/b on two bc chunks:
https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=windows%2C7%2Cshippable%2Copt%2Cmochitests%2Ctest-windows7-32-shippable%2Fopt-mochitest-browser-chrome-e10s-6%2Cm%28bc6%29&tochange=505173c26e1af4afcb7ee13a567a5ae4d8149dc7&fromchange=58fced48cd5dcf0f87c1d8345e011f6961af78cf
https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=windows%2C7%2Cdebug%2Cmochitests%2Ctest-windows7-32%2Fdebug-mochitest-browser-chrome-e10s-7%2Cm%28bc7%29&tochange=2799cfb271ca258e9087a16601dd16561bd4a3f3&fromchange=ff23369d071869072c56646034cd0411cb346a85
Failure log https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=255475447&repo=autoland&lineNumber=8630
Backout: https://hg.mozilla.org/integration/autoland/rev/07df7588275399c0008f811069f23f2c97c9b084
![]() |
||
Comment 6•4 years ago
|
||
This is either interfering with our detours setup or triggering a load of user32 too early.
https://searchfox.org/mozilla-central/source/mozglue/build/WindowsDllBlocklist.cpp#810
Reporter | ||
Comment 7•4 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #6)
This is either interfering with our detours setup or triggering a load of user32 too early.
https://searchfox.org/mozilla-central/source/mozglue/build/WindowsDllBlocklist.cpp#810
I tried the test locally, and it's failing even without my patch!
I see "DLL blocklist was unable to intercept AppInit DLLs." in the logs, which points to the user32 issue.
Can you help me with this, please? (Or know someone else who could?)
My totally-uneducated guess is that it could be due to the Windows-specific code run during BaseProfiler init:
https://searchfox.org/mozilla-central/rev/f372470e10c8cb0691681603a1d6324dee5b3b8a/mozglue/baseprofiler/core/platform-win32.cpp#326-331
(Or somewhere else in that file?)
If you want&can try on your side, you can enable BaseProfiler with the attached patch, or by just un-commenting this #define:
https://searchfox.org/mozilla-central/rev/e78f06cdc7e8a54adbb1422b5505162b5258d04d/mozglue/baseprofiler/public/BaseProfiler.h#33
![]() |
||
Comment 8•4 years ago
|
||
Hey Aaron, any thoughts here? I tried this in a local build but couldn't reproduce.
Comment 9•4 years ago
|
||
Gerald, I suggest that you start by running Firefox under a debugger and breaking when user32.dll
loads. Once you know the call stack that is triggering the load, we can work from there to figure out how to mitigate it.
I'm not sure about the Visual Studio Debugger, but the way to trigger this in WinDbg is to enter the command sxe ld:user32.dll
before proceeding with startup.
Reporter | ||
Comment 10•4 years ago
|
||
Sorry, I'm getting lost in the woods.
A local build just fails the browser_checkdllblockliststate.js test even without my BaseProfiler-enabling patch.
I don't know how to run the test on a packaged build, so I don't know if the problem is happening there; I tried to debug with WinDBG, but I didn't see an issue or I just don't understand what's happening.
I also tried with the one Nightly that included my patch: https://archive.mozilla.org/pub/firefox/nightly/2019/07/2019-07-09-15-37-42-mozilla-central/ , same non-results.
E.g., here's the stack on the first break after sxe ld:user32.dll
:
[0x0] ntdll!NtMapViewOfSection + 0x14
[0x1] ntdll!LdrpMinimalMapModule + 0x101
[0x2] ntdll!LdrpMapDllWithSectionHandle + 0x1b
[0x3] ntdll!LdrpLoadKnownDll + 0xe6
[0x4] ntdll!LdrpFindOrPrepareLoadingModule + 0xbd
[0x5] ntdll!LdrpLoadDllInternal + 0x11c
[0x6] ntdll!LdrpLoadDll + 0xa8
[0x7] ntdll!LdrLoadDll + 0xe4
[0x8] mozglue!DllBlocklist_Initialize + 0x10f5
[0x9] KERNELBASE!LoadLibraryExW + 0x161
[0xa] mozglue!DllBlocklist_Initialize + 0x244
[0xb] firefox!Ordinal0 + 0x1402
[0xc] firefox!Ordinal0 + 0x122d
[0xd] firefox!TargetNtUnmapViewOfSection + 0x9338
[0xe] KERNEL32!BaseThreadInitThunk + 0x14
[0xf] ntdll!RtlUserThreadStart + 0x21
which seems "normal"(?)
So I'd really need help from someone who could reproduce the Try failure with my patch, and then could debug the cause. I'd be willing to get training, but I think that would be a waste of resources compared to a Windows expert just doing their thing; I'm remote in Australia, can video-chat or even travel if that was necessary.
Aaron, could you please help, or recommend someone?
![]() |
||
Comment 11•4 years ago
|
||
(In reply to Gerald Squelart [:gerald] from comment #10)
Sorry, I'm getting lost in the woods.
A local build just fails the browser_checkdllblockliststate.js test even without my BaseProfiler-enabling patch.
I don't know how to run the test on a packaged build, so I don't know if the problem is happening there; I tried to debug with WinDBG, but I didn't see an issue or I just don't understand what's happening.I also tried with the one Nightly that included my patch: https://archive.mozilla.org/pub/firefox/nightly/2019/07/2019-07-09-15-37-42-mozilla-central/ , same non-results.
E.g., here's the stack on the first break aftersxe ld:user32.dll
:[0x0] ntdll!NtMapViewOfSection + 0x14
[0x1] ntdll!LdrpMinimalMapModule + 0x101
[0x2] ntdll!LdrpMapDllWithSectionHandle + 0x1b
[0x3] ntdll!LdrpLoadKnownDll + 0xe6
[0x4] ntdll!LdrpFindOrPrepareLoadingModule + 0xbd
[0x5] ntdll!LdrpLoadDllInternal + 0x11c
[0x6] ntdll!LdrpLoadDll + 0xa8
[0x7] ntdll!LdrLoadDll + 0xe4
[0x8] mozglue!DllBlocklist_Initialize + 0x10f5
[0x9] KERNELBASE!LoadLibraryExW + 0x161
[0xa] mozglue!DllBlocklist_Initialize + 0x244
[0xb] firefox!Ordinal0 + 0x1402
[0xc] firefox!Ordinal0 + 0x122d
[0xd] firefox!TargetNtUnmapViewOfSection + 0x9338
[0xe] KERNEL32!BaseThreadInitThunk + 0x14
[0xf] ntdll!RtlUserThreadStart + 0x21
which seems "normal"(?)So I'd really need help from someone who could reproduce the Try failure with my patch, and then could debug the cause. I'd be willing to get training, but I think that would be a waste of resources compared to a Windows expert just doing their thing; I'm remote in Australia, can video-chat or even travel if that was necessary.
Aaron, could you please help, or recommend someone?
Hey Gerald,
Aaron's pretty busy right now with other work so he really doesn't have the cycles for this. I will try to find some help from someone on the team (no ideas but will poke around).
![]() |
||
Comment 12•4 years ago
|
||
Hey Adam, maybe you can help Gerald find a solution here? Something weird with early dll loads after we introduced some profiling code.
Comment 13•4 years ago
|
||
There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:gerald, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 14•4 years ago
|
||
Sorry for letting this sit for so long, but I'm not going to be able to give it the time I need to get up to speed. Jim, could you redirect?
I'm also seeing browser_checkdllblockliststate.js fail in unmodified m-c (dba2c8019074) on a local debug build, but not on release. I wasn't able to work out why.
I was confused for a while trying to check this on a release build. I was hitting the assert when I changed it to MOZ_RELEASE_ASSERT and backed out bug 1556993, even with no profiler stuff enabled. I think that this is just because there is no console, unlike in debug. The docs say _get_osf_handle returns -2 "for example, in a Windows application without a console window" . Running with -console doesn't change that, I'm not sure why (maybe it doesn't get attached soon enough).
But I think this is another good argument for the fix in bug 1556993: when there's nothing on stdout/err that issue would routinely be hit, and I think that's the case on release, so it makes sense to handle the error.
![]() |
||
Comment 15•4 years ago
|
||
Hey Toshihito, when you have some free cycles could you take a look at this issue and see if you can figure out what's going on? We have a check for a dll blocklist related to user32 getting loaded too early that triggers with some of these profiler changes. The profiling team is in need of some Windows expertise here.
Assignee | ||
Comment 16•4 years ago
|
||
(In reply to Gerald Squelart [:gerald] from comment #10)
So I'd really need help from someone who could reproduce the Try failure with my patch, and then could debug the cause. I'd be willing to get training, but I think that would be a waste of resources compared to a Windows expert just doing their thing; I'm remote in Australia, can video-chat or even travel if that was necessary.
I submitted an empty Try with your patch on top of the latest codebase. Could you tell me which test I should select?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=99a832e16efd33ebdbbffe14ee695e4b684ee07e
I ran mach test browser_checkdllblockliststate
locally, but it just passed. Please let me know if I'm missing something.
0:05.63 TEST_START: toolkit/xre/test/browser_checkdllblockliststate.js
0:05.63 INFO Entering test bound test
0:05.96 PASS Windows dll blocklist status should be true, indicating it is running properly. A failure in this test is considered a release blocker. -
0:05.97 INFO Leaving test bound test
Assignee | ||
Comment 17•4 years ago
|
||
I found browser_checkdllblockliststate
in mochitest-browser-chrome. I hit the same failure, but interestingly this failure happens only on Win7. No failure from Win10. Upon quick debugging, user32.dll was loaded even before firefox!wmain
on Win7, while the exact same binary did not load user32.dll on Win10. I'll spend more time on this..
Assignee | ||
Comment 18•4 years ago
|
||
I think I got the root cause. When BaseProfiler is enabled on Windows, mozglue.dll static-links winmm.dll because SamplerThread
uses timeBeginPeriod
/timeEndPeriod
. On Win7, winmm.dll static-links user32.dll while win10's winmm.dll does not. This is why user32.dll is loaded at startup on Win7, causing the test failure.
We can add winmm.dll to the delayload list of mozglue, but it does not mitigate the issue when MOZ_BASE_PROFILER_STARTUP=1
is set because the base profiler is initialized and started earlier than the DllBlockList in such a case.
Comment 19•4 years ago
|
||
Now that we have the launcher process, the user32 check is really only relevant for the fallback case where the launcher process failed and we're starting the "legacy" blocklist.
We should also add a check to skip that check when sInitFlags & eDllBlocklistInitFlagWasBootstrapped
.
Comment 20•4 years ago
•
|
||
Honestly, maybe the most sensible thing to do is just to also skip that check when MOZ_BASE_PROFILER_STARTUP=1
anyway.
Now we know why user32
is being brought in, I think it's safe to say that startup profiling is outside the typical use cases for blocking dlls anyway.
That of course is assuming that we don't plan on setting this env var by default on Nightly or something. Gerald, is MOZ_BASE_PROFILER_STARTUP
going to continue to be set only by individual users, or are we planning on setting it as the default somewhere?
Reporter | ||
Comment 21•4 years ago
|
||
First, thank you Toshihito for looking into this. Sorry I didn't respond quickly enough to your comment 16, but it's great to see you've found the issue!
(In reply to Aaron Klotz [:aklotz] from comment #20)
Gerald, is
MOZ_BASE_PROFILER_STARTUP
going to continue to be set only by individual users, or are we planning on setting it as the default somewhere?
That's right, MOZ_BASE_PROFILER_STARTUP
should only be set by users (and only a small subset of profiler users, mostly Mozilla devs interested in start-up perf).
But like MOZ_PROFILER_STARTUP
(not "BASE"), the parent process will use it to start the profiler in new child processes, when the profiler has been enabled (through MOZ_{,BASE_}PROFILER_STARTUP
or through the add-on). But that's still due to user action.
(In reply to Toshihito Kikuchi [:toshi] from comment #18)
When BaseProfiler is enabled on Windows, mozglue.dll static-links winmm.dll because
SamplerThread
usestimeBeginPeriod
/timeEndPeriod
. On Win7, winmm.dll static-links user32.dll while win10's winmm.dll does not. This is why user32.dll is loaded at startup on Win7, causing the test failure.
I'm a little bit concerned that the profiler "moves" that load, so its real impact on our code won't be visible. But hopefully it's only a tiny influence compared to all the other things that happen at process start-up. And it's only on win7, so a more accurate profile can be done on win10 anyway.
So, going forward:
As per comment 19 and comment 20, we shouldn't do the check on win < 10, and when MOZ_BASE_PROFILER_STARTUP
is set, right?
I'll see if I can prepare a patch myself...
Thank you again everybody for helping me.
Comment 22•4 years ago
|
||
In summary, I think we need three things:
- Add
winmm.dll
tomozglue
's delayload list; - Skip the
user32
check whensInitFlags & eDllBlocklistInitFlagWasBootstrapped
; - Skip the
user32
check when win < 10 andMOZ_BASE_PROFILER_STARTUP
is set.
Assignee | ||
Comment 23•4 years ago
•
|
||
As a note, I submitted a Try with the change to add winmm.dll to the delayload list, and the test passed.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=76d75d8cf3c7eb738afb62ab6df053f639c5aaf1
(browser_checkdllblockliststate.js
was run as a part of bc4
in Windows 7 opt)
Assignee | ||
Comment 24•4 years ago
|
||
Let me prepare a patch tomorrow. The most part of a fix would be in the sandbox area :).
Assignee | ||
Comment 25•4 years ago
|
||
Bug 1556993 fixed the crash in PoisonIOInterposer (due to missing stdout&stderr
fd's), bug 1559379 fixed the mozglue memory issue that the unit test
experienced, and this patch also includes a fix for the test failure in
browser_checkdllblockliststate.js.
So now Base Profiler can be enabled by default on Windows, still using
MOZ_BASE_PROFILER_...
env-vars for now (upcoming bug will merge these with
non-BASE env-vars soon).
Comment 26•4 years ago
|
||
Pushed by aklotz@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/299dc828295a Enable Base Profiler in Windows. r=aklotz,gerald
Updated•4 years ago
|
Reporter | ||
Updated•4 years ago
|
Comment 27•4 years ago
|
||
bugherder |
Comment 28•4 years ago
|
||
Since the status are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.
![]() |
||
Updated•4 years ago
|
Description
•