Open Bug 1394289 Opened 7 years ago Updated 2 years ago

Analyze reference profiles for heavy users based on Telemetry data

Categories

(Toolkit :: Startup and Profile System, enhancement)

enhancement

Tracking

()

Tracking Status
firefox57 --- fix-optional

People

(Reporter: wiwang, Unassigned)

References

Details

(Keywords: perf)

User Story

Reference profile specification:
* 20 tabs open
* 110 bookmarks
* 34,000 history items
* addons to install: TBD
* browser homepage setting: "show your windows and tabs from last time" (Firefox) and "continue where you left off" (Chrome)
* bookmarks toolbar showing: YES
+++ This bug was initially created as a clone of Bug #1365296 +++ Bug 1365296 is about creating the Firefox profile of heavy users. Now we have a preliminary analysis of that profile; please refer to Bug 1365296 comment 30. Therefore I filed this bug to distinguish further analysis from the creation of the Firefox profile, and please feel free to leave your findings here. I leave the [qf:p1] untouched, and anyone could change it if needed.
Hi Florian, We now have a preliminary profiling for the heavy user, since you are very experienced in startup profiling, would it be possible for you to take a look at that profiling and share your findings here? Thanks a lot!
Flags: needinfo?(florian)
Whiteboard: [qf:p1] → [qf]
(In reply to Will Wang [:WillWang] from comment #1) > Hi Florian, > > We now have a preliminary profiling for the heavy user, since you are very > experienced in startup profiling, would it be possible for you to take a > look at that profiling and share your findings here? A lot of the startup time in this profile is devtools related, and more specifically related to the browser console. Is there a reason why you used the -jsconsole command line parameter? Then as you already noted in bug 1365296 comment 30, there's 366ms spent initializing the history sidebar. Do we have telemetry data showing that starting with the history sidebar is something 'heavy users' actually do? The rest of startup looks as usual.
Flags: needinfo?(florian)
Thank you, Florian! (In reply to Florian Quèze [:florian] [:flo] (away until August 28th) from comment #2) > A lot of the startup time in this profile is devtools related, and more > specifically related to the browser console. Is there a reason why you used > the -jsconsole command line parameter? Yes! I also noticed them; Usually, I opened the browser console to observe any possible message/warning/error, I think we could ignore them at this moment unless we plan to deal with the overhead introduced by devtools during startup. > Then as you already noted in bug 1365296 comment 30, there's 366ms spent > initializing the history sidebar. Do we have telemetry data showing that > starting with the history sidebar is something 'heavy users' actually do? From my experience, I _guess_ this is a rare case, so I specified 'my sidebar opened' in the comment; AFAIK, I don't think we have this kind of data now, we could get someone who is familiar with the telemetry analysis if this also belongs to our user story.
Ok, so I would suggest reprofiling without the console and without the sidebar.
Not sure there's anything actionable at this point --> calling this [qf:investigate]. Please nominate any helper bugs filed (for stuff discovered in the analysis) as [qf].
Whiteboard: [qf] → [qf:investigate]
(In reply to Florian Quèze [:florian] [:flo] (away until August 28th) from comment #4) > Ok, so I would suggest reprofiling without the console and without the > sidebar. Appreciate for your suggestion! Here is the new profile: http://bit.ly/2wjWeZO Startup command: MOZ_PROFILER_STARTUP_ENTRIES=20000000 MOZ_PROFILER_STARTUP=1 ./mach run -attach-console -n (without -jsconsole) The process of profiling is same as bug 1365296 comment 30, and without the sidebar opened: s1. Start Firefox with gecko profiler and wait the session restored. s2. Open a tab, go cnn.com, make some scrolling. s3. Open a tab, open the heavy-link.html, make some scrolling. From the profile, we can see the noises (HistorySidebarInit() and devtools(browser console) related function calls) are gone now.
(In reply to Will Wang [:WillWang] from comment #6) > Here is the new profile: > http://bit.ly/2wjWeZO I don't see anything in here that seems specific to having a 'heavy user' profile. I see several known issues (eg. bug 1348361, bug 1353013, bug 1356532, ...). I see almost no I/O cost in this profile, which makes me think this is a warm startup profile. It may be interesting to capture a cold startup profile (ie. do the same thing, but right after rebooting the computer). Having a 'heavy user profile' may have more impact there, as we'll pay I/O costs. Is this captured on the reference hardware, or on a machine with a fast SSD? (that would be another explanation for the very little I/O costs).
(In reply to Florian Quèze [:florian] [:flo] from comment #7) > I don't see anything in here that seems specific to having a 'heavy user' > profile. I see several known issues (eg. bug 1348361, bug 1353013, bug > 1356532, ...). Very glad to see that you are experienced in identifying many known issues, which is very invaluable for distinguishing the possible new issues from the known ones! > I see almost no I/O cost in this profile, which makes me think this is a > warm startup profile. It may be interesting to capture a cold startup > profile (ie. do the same thing, but right after rebooting the computer). > Having a 'heavy user profile' may have more impact there, as we'll pay I/O > costs. Nice! Here is the cold startup profile: http://bit.ly/2wkCvZY The startup time is apparently longer than the warm one; For example, we can see that loading DLLs takes more time. And, to exclude interference from other applications, I wait few minutes to let the computer be in a non-busy state after rebooting the computer. > Is this captured on the reference hardware, or on a machine with a fast SSD? > (that would be another explanation for the very little I/O costs). It's a 2013 Toshiba laptop R930 with hard disk(not SSD). For reference, building an m-c on this laptop takes 100~180 min. I was wondering if this new-captured profile cannot reveal useful issues, we might need to re-define our 'heavy user' story, or the generated Firefox profile, or the performance profiling environment/process.
(In reply to Will Wang [:WillWang] from comment #8) > > I see almost no I/O cost in this profile, which makes me think this is a > > warm startup profile. It may be interesting to capture a cold startup > > profile (ie. do the same thing, but right after rebooting the computer). > > Having a 'heavy user profile' may have more impact there, as we'll pay I/O > > costs. > Nice! Here is the cold startup profile: > http://bit.ly/2wkCvZY Unfortunately, this profile doesn't include startup on the main thread of the parent process, I think you either need to wait a little less before capturing the profile, or to use a larger buffer (larger MOZ_PROFILER_STARTUP_ENTRIES value).
(In reply to Florian Quèze [:florian] [:flo] from comment #9) > Unfortunately, this profile doesn't include startup on the main thread of > the parent process, I think you either need to wait a little less before > capturing the profile, or to use a larger buffer (larger > MOZ_PROFILER_STARTUP_ENTRIES value). Oops! Here we are: https://perfht.ml/2eo3vOs (cold startup, no sidebar, no -jsconsole) command: MOZ_PROFILER_STARTUP_ENTRIES=30000000 MOZ_PROFILER_STARTUP=1 ./mach run -attach-console -n I checked this profile and it contains functions from the very beginning, such as `XRE_mainStartup`, `NS_InitXPCOM2`, which were both truncated in previous profile. (BTW, do you have any trick which can determine a profile is recorded from the very beginning or not? see if some startup/init functions exist? thanks!)
Are you using a recent nightly? I see bug 1383088 in this profile, and that bug was closed as fixed 10 days ago. You also have 57ms of social api initialization; we completely removed this code 16 days ago in bug 1388902. NSS initialization (right after first paint) takes almost 1.5s here (mostly I/O), bug 1370516. I also see the usual bug 1363384 taking more than 620ms here. I don't see anything unusual.
(In reply to Will Wang [:WillWang] from comment #10) > (BTW, do you have any trick which can determine a profile is recorded from > the very beginning or not? see if some startup/init functions exist? thanks!) In your previous profile, it was immediately visible to me that the beginning was missing because the blue chart for the main thread started later than others. For a more general answer, to check that I have the beginning of startup, I would filter the stacks to "JS only" and verify that the first very visible piece of blue on the main thread chart is the add-on manager startup (this is happening early during startup, and takes long enough that I can't miss it).
(In reply to Florian Quèze [:florian] [:flo] from comment #11) > Are you using a recent nightly? I see bug 1383088 in this profile, and that > bug was closed as fixed 10 days ago. You also have 57ms of social api > initialization; we completely removed this code 16 days ago in bug 1388902. I used an early August m-c, so I think we can exclude these issues here based on your experience, thanks for your check! For anyone who might want to see the latest profile, here is the one which uses the m-c today[1]: https://perfht.ml/2iVVfu8 Same: cold startup, no sidebar, no -jsconsole, beginning of profile included, and updated m-c. > NSS initialization (right after first paint) takes almost 1.5s here (mostly > I/O), bug 1370516. I also see the usual bug 1363384 taking more than 620ms > here. > > I don't see anything unusual. Thanks! Maybe this means what I was wondering in comment 8: > I was wondering if this new-captured profile cannot reveal useful issues, we > might need to re-define our 'heavy user' story, or the generated Firefox > profile, or the performance profiling environment/process. If so, it might be good to discuss the user story and generated Firefox profile and involve someone if needed. [1] $ hg parent changeset: 378688:8e05298328da tag: tip fxtree: central parent: 378511:dbf9f7430406 parent: 378687:4f2d0ed90f58 user: Sebastian Hengst <archaeopteryx@coole-files.de> date: Sun Sep 03 23:57:05 2017 +0200 summary: merge mozilla-inbound to mozilla-central. r=merge a=merge
(In reply to Florian Quèze [:florian] [:flo] from comment #12) > In your previous profile, it was immediately visible to me that the > beginning was missing because the blue chart for the main thread started > later than others. For a more general answer, to check that I have the > beginning of startup, I would filter the stacks to "JS only" and verify that > the first very visible piece of blue on the main thread chart is the add-on > manager startup (this is happening early during startup, and takes long > enough that I can't miss it). I also noticed that missing part of blue chat once you posted comment 9, it's so obvious ...thanks for your tip! I found the `startup` in `AddonManager.jsm` could be one of the checkpoints I can use, Nice :)
(In reply to Will Wang [:WillWang] from comment #13) > For anyone who might want to see the latest profile, here is the one which > uses the m-c today[1]: > https://perfht.ml/2iVVfu8 > Same: cold startup, no sidebar, no -jsconsole, beginning of profile > included, and updated m-c. 1153ms spent early during startup loading advapi32.dll. I'm not sure we can avoid this, it may be early enough that we don't have time to preload it off main thread. MainProcessSingleton.js does main thread IO with a loadProcessScript for a file that somehow isn't cached, and this accesses a local file in this code path: http://searchfox.org/mozilla-central/rev/4d8e389498a08668cce9ebf6232cc96be178c3e4/dom/base/nsFrameMessageManager.cpp#1634 This is strange, could this be an unpacked build with omni.ja file? Other dll files that might benefit from preloading off main thread: - wpc.dll (parental control) - d2d1.dll (triggered by Telemetry startup code) - mmdevapi.dll audioses.dll (although I'm not sure mozilla::widget::AudioSession::Start really needs to happen before first paint), - explorerframe.dll (used by mozilla::widget::WinTaskbar::Initialize after first paint), - the nss libraries: softokn3.dll, nssdbm3.dll, freebl3.dll (but if we can initialize NSS as a whole off main thread, maybe preloading the files won't help anymore, see bug 1370516) - later from mozilla::FFVPXRuntimeLinker::Init we read on the main thread: mozavutil.dll and mozavcodec.dll - even later from UpdateUtils.jsm, we load kernel32 using ctypes. I wonder if this is expensive because kernel32 isn't loaded in memory when we are running on a 64bit Windows... but we are running code from other 32 bit libraries (eg. ole32.pdb)... so this may be a red herring. This code really looks like it should be async. And now I stumbled on "mozJSComponentLoader::LoadModule file:///C:/mozilla-source/mozilla-central/obj-i686-pc-mingw32/dist/bin/components/nsSearchService.js" which makes it obvious that this isn't a packaged build, and that we are going to do plenty of IO that normal builds wouldn't do. So I'm going to stop looking for main thread IO in this profile. Adam, do you see a reason why we do main thread IO in this profile relatively late in startup to load explorerframe.dll despite the ReadAheadDll(L"ExplorerFrame.dll") at http://searchfox.org/mozilla-central/rev/4d8e389498a08668cce9ebf6232cc96be178c3e4/toolkit/xre/nsAppRunner.cpp#3740 ? From the DLLs I listed above, what would make sense to add to the ReadAheadDlls_ThreadStart list?
Flags: needinfo?(agashlin)
(In reply to Florian Quèze [:florian] [:flo] from comment #15) > Adam, do you see a reason why we do main thread IO in this profile > relatively late in startup to load explorerframe.dll despite the > ReadAheadDll(L"ExplorerFrame.dll") at > http://searchfox.org/mozilla-central/rev/ > 4d8e389498a08668cce9ebf6232cc96be178c3e4/toolkit/xre/nsAppRunner.cpp#3740 ? That is odd, I can think of two things that might cause it to show up so strongly: 1) The system is so bogged down with I/O that we haven't been able to complete the readahead on explorerframe.dll (not impossible, it is around 4MB), 2) there are operations besides reading the bytes of explorerframe.dll taking place, possibly loading other libraries (under WindowsDllBlocklist::patched_LdrLoadDll explorerframe.dll I see _DllMainCRTStartup for DUI70.dll and DUser.dll). The DirectX libs like DUI70.dll and DUser.dll are actually loaded on a separate thread, I think, part of what we're seeing might be waiting for those to finish up. I saw something similar with dataexchange.dll causing loads of d3d11.dll and dxgi.dll on separate threads but decided at the time not to pursue readahead for them, assuming that Windows was already doing something clever by loading them off-thread. But if it is then waiting for the off-thread load to complete (because it needs to run their _DllMainCRTStartup, for instance), then it may make sense to do readahead on these as well. > From the DLLs I listed above, what would make sense to add to the > ReadAheadDlls_ThreadStart list? They all seem plausible, though I'm struggling to understand kernel32. All the time is spent in a NtWaitForSingleObject, the library loader must be blocked on another thread loading simultaneously? mozavutil.dll and mozavcodec.dll seem like they'd be the biggest win, we'd just use ReadAheadLib for the Mozilla DLLs. It might make sense to try DUI70.dll and DUser.dll, too, to see if that would help with this big cost in explorerframe.dll.
Flags: needinfo?(agashlin)
(In reply to Adam Gashlin [:agashlin] from comment #16) I filed bug 1397706 so that we can continue on this path without hijacking this bug.
(In reply to Adam Gashlin [:agashlin] from comment #16) > > From the DLLs I listed above, what would make sense to add to the > > ReadAheadDlls_ThreadStart list? > > They all seem plausible, though I'm struggling to understand kernel32. All > the time is spent in a NtWaitForSingleObject, the library loader must be > blocked on another thread loading simultaneously? Not exactly. RtlpAllocateListLookup() is a private (note the "p" after Rtl) Windows kernel API that allocates something, and is trying to grab some kind of lock, probably the allocator lock or some such. That doesn't have anything to do with loading the library itself, kernel32 is loaded as one of the first DLLs for any Win32 application at startup since it's extremely hard to do anything useful on Win32 and avoid linking to that library. :-) If you profiled this several times you'd probably see quite different results under this particular call site.
Keywords: perf
Remove qf:investigate since heavy user profiles is not a priority for performance program.
Whiteboard: [qf:investigate]
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.