Closed
Bug 1458375
Opened 7 years ago
Closed 6 years ago
Remove empty binding & investigate the performance gain caused by it
Categories
(Core :: XBL, task, P2)
Core
XBL
Tracking
()
RESOLVED
FIXED
mozilla62
Tracking | Status | |
---|---|---|
firefox62 | --- | fixed |
People
(Reporter: timdream, Assigned: mconley)
References
Details
Attachments
(9 files, 1 obsolete file)
59 bytes,
text/x-review-board-request
|
rwood
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
rwood
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
Felipe
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
rwood
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
rwood
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
kmag
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
timdream
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
snorp
:
review+
|
Details |
59 bytes,
text/x-review-board-request
|
kmag
:
review+
|
Details |
See bug 1456728. There is a cpstartup performance regression caused by the removal of the scrollbar binding. Adding an empty binding "fixes" it.
We will need to investigate why and remove it.
Updated•7 years ago
|
Blocks: war-on-xbl
Updated•7 years ago
|
Priority: -- → P2
Reporter | ||
Comment 1•7 years ago
|
||
Continue from my investigation from bug 1456728 comment 43:
So I've got to a point where on OSX debug locally, I can see the build gets slower if nsSyncLoadService::PushSyncStreamToListener() calls into nsParser::OnStopRequest(). However, when I copy my URI check code into the function and make OnStopRequest() no-op when it hits general.xml, the build continue to run fast. The two results are conflicting. I guess my methodology might be flawed.
What I am going to do next is to repeat the process in the loner machine, hopefully to avoid the noises on my machine locally.
Reporter | ||
Comment 2•7 years ago
|
||
Just to confirm what I see from Try:
The performance gain happened in nsSyncLoadService::PushSyncStreamToListener() but before XMLDocument::EndLoad()
Performance regressed by skipping nsSyncLoadService::PushSyncStreamToListener() in content process:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2e7bac585cfa&newProject=try&newRevision=bbe9bfd1dc21&framework=1
But not by an early return from XMLDocument::EndLoad()
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2e7bac585cfa&newProject=try&newRevision=303f5970e696&framework=1
Reporter | ||
Comment 3•7 years ago
|
||
I have now narrowed down the code that makes it fast to be inside nsParser::ResumeParse()
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2e7bac585cfa&newProject=try&newRevision=065b6219e7db&framework=1&filter=cpstartup
But not any calls into nsXMLDocumentSink from there:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2e7bac585cfa&newProject=try&newRevision=9b2c66578bed&framework=1&filter=cpstartup
Reporter | ||
Comment 4•7 years ago
|
||
Worth investigating: we load xul.css in content process in cpstartup test.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ed50dfc9fce1f397f703dd403a723e2429e35375
( Ctrl+F "nsSyncLoadService::PushSyncStreamToListener()" in this log https://taskcluster-artifacts.net/ADWlcrjWSbWG0m-EM1SzmQ/0/public/logs/live_backing.log )
Reporter | ||
Comment 5•7 years ago
|
||
Yeah, I don't think components.css belongs to content process either :'(
Reporter | ||
Comment 6•7 years ago
|
||
... and the nsExpatDriver::ParseBuffer() call is identified in nsExpatDriver::ConsumeToken()
https://searchfox.org/mozilla-central/rev/b28b94dc81d60c6d9164315adbd4a5073526d372/parser/htmlparser/nsExpatDriver.cpp#985
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2e7bac585cfa&newProject=try&newRevision=1ccef9e0f1fe&framework=1&filter=cpstartup
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2e7bac585cfa&newProject=try&newRevision=29c53f093a74&framework=1&filter=cpstartup
Comment 7•7 years ago
|
||
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #4)
> Worth investigating: we load xul.css in content process in cpstartup test.
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #5)
> Yeah, I don't think components.css belongs to content process either :'(
That must be due to the page loading over a chrome:// URI, or some other test-specific thing. The only place where these sheets get loaded are if LoadsFullXULStyleSheetUpFront is true: https://searchfox.org/mozilla-central/rev/b28b94dc81d60c6d9164315adbd4a5073526d372/layout/base/nsDocumentViewer.cpp#2372-2385.
Reporter | ||
Comment 8•7 years ago
|
||
(In reply to Brian Grinstead [:bgrins] from comment #7)
> That must be due to the page loading over a chrome:// URI, or some other
> test-specific thing. The only place where these sheets get loaded are if
> LoadsFullXULStyleSheetUpFront is true:
> https://searchfox.org/mozilla-central/rev/
> b28b94dc81d60c6d9164315adbd4a5073526d372/layout/base/nsDocumentViewer.
> cpp#2372-2385.
You are right. This eventually reaches nsIDocument::InternalAllowXULXBL() and checks the principal.
https://searchfox.org/mozilla-central/rev/b28b94dc81d60c6d9164315adbd4a5073526d372/dom/base/nsDocument.cpp#5023
I actually don't understand why target.html has to be loaded with system principal, but I haven't tried to remove this.
https://searchfox.org/mozilla-central/rev/b28b94dc81d60c6d9164315adbd4a5073526d372/testing/talos/talos/tests/cpstartup/bootstrap.js#95
Reporter | ||
Comment 9•7 years ago
|
||
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #6)
The fact that libexpat could cause any side effect is pretty unimaginable. The side effect might be caused by a state change eventually found. So this will still take time to investigate.
Comment 10•7 years ago
|
||
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #8)
> (In reply to Brian Grinstead [:bgrins] from comment #7)
> > That must be due to the page loading over a chrome:// URI, or some other
> > test-specific thing. The only place where these sheets get loaded are if
> > LoadsFullXULStyleSheetUpFront is true:
> > https://searchfox.org/mozilla-central/rev/
> > b28b94dc81d60c6d9164315adbd4a5073526d372/layout/base/nsDocumentViewer.
> > cpp#2372-2385.
>
> You are right. This eventually reaches nsIDocument::InternalAllowXULXBL()
> and checks the principal.
>
> https://searchfox.org/mozilla-central/rev/
> b28b94dc81d60c6d9164315adbd4a5073526d372/dom/base/nsDocument.cpp#5023
>
> I actually don't understand why target.html has to be loaded with system
> principal, but I haven't tried to remove this.
I think the 'fix' for this is to update the test to not use system principle for the content process document.
I don't know if there are any cases where we load chrome:// URIs in a content process for normal browsing (and if there are it certainly isn't the 'normal' entry point for starting a content process. So whatever side effects we are hitting here are probably affecting the test but not normal users. Mike, does that make sense? And if so, would it be be relatively easy to load the document over localhost instead of chrome://?
Flags: needinfo?(mconley)
Assignee | ||
Comment 11•7 years ago
|
||
(In reply to Brian Grinstead [:bgrins] from comment #10)
> I don't know if there are any cases where we load chrome:// URIs in a
> content process for normal browsing (and if there are it certainly isn't the
> 'normal' entry point for starting a content process. So whatever side
> effects we are hitting here are probably affecting the test but not normal
> users. Mike, does that make sense? And if so, would it be be relatively easy
> to load the document over localhost instead of chrome://?
Yeah, I agree that loading up a chrome:// URI in a new tab is not a very likely scenario. If that's having an impact on the measurement, then we should adjust the test to match the common scenario. There's no great reason to load a chrome:// URI except that (I think) it was just easier to write.
I think I can provide a patch to update the test (rwood should review it). Coming up.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 13•7 years ago
|
||
Okay, this patch moves the page load to http://. I suggest importing that, putting a patch on top of it to remove the empty binding, and seeing what Talos on Try thinks. If it's all green, rwood would be a good reviewer for the Talos bits.
Flags: needinfo?(mconley)
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Reporter | ||
Comment 16•7 years ago
|
||
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #9)
> (In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #6)
>
> The fact that libexpat could cause any side effect is pretty unimaginable.
... and I can confirm that calling into XML_Parse() in xmlparse.c triggers speed-up...
https://treeherder.mozilla.org/#/jobs?repo=try&author=timdream@gmail.com&fromchange=0f0889a9b70f331e6bbf044439677f358ec4d910&selectedJob=177544633
Reporter | ||
Comment 17•7 years ago
|
||
Here is a brief intro on what happened and what I've found: over at bug 1456728, I was flagged by talos for slowing down content process start-up by moving the <xul:scrolbar> mark-ups from the XBL binding to nsScrollbarFrame. Investigation shows that simply by adding an empty binding back to <xul:scrollbar>, the performance regression goes away. Given the merge schedule, I checked-in the empty binding to central, and continue the investigation over here.
My methodology of locating the cause involving identifying the code path that loads the empty binding, and insert early returns in various places. By comparing the talos results on the try pushes, I was able to narrow down the exact function calls that lead to the effect. Eventually, I reached comment 16, which skipping XML_Parse() in nsExpatDriver::ParseBuffer() re-creates the performance regression*.
This is very mindblowing because expat should be self-contained and perhaps stateless. We were hoping to find something that gets initialized earlier/later because of the binding and perhaps keeping the performance by moving it somewhere else. That *something* certainly shouldn't be attempt parse an XML file :-/.
:mconley had a brief discussion with me on this over IRC; he recommends that I use the Linux perf tool [1] to figure out that slows us down when we don't call into expat. It would also help us understand if the cause here is memory allocation-related.
[1] https://en.wikipedia.org/wiki/Perf_(Linux)
This is way out of my field of expertise so it would take some time to set it up. If you have free cycles to dive into this mystery, feel free to take over.
* I've ruled out nsXMLDocumentSink calls from nsParser::ResumeParse() (comment 3) but I didn't rule out calls by nsExpatDriver.
Assignee | ||
Comment 18•7 years ago
|
||
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #17)
>
> :mconley had a brief discussion with me on this over IRC; he recommends that
> I use the Linux perf tool [1] to figure out that slows us down when we don't
> call into expat. It would also help us understand if the cause here is
> memory allocation-related.
>
I can try to work on this tomorrow if you can forward me the details to sign into the Talos machine you've got loaned out.
Flags: needinfo?(timdream)
Assignee | ||
Comment 20•7 years ago
|
||
Thanks, received! ni?ing myself for analysis.
Flags: needinfo?(mconley)
Assignee | ||
Comment 21•7 years ago
|
||
So, just an update here - I'm able to reproduce the issue on the Talos loaner, but I can't get perf installed on this particular distro.
I think I'm going to fall back to Gecko Profiler profile analysis.
Assignee | ||
Comment 22•7 years ago
|
||
Got some profiles from the Talos loaner. Talos by default does 2 runs when profiling, so I've included both profiles for each state.
# Good (empty binding):
Scores: 1674.0, 1627.0
https://perfht.ml/2wHX1WD
https://perfht.ml/2Kp7BUl
# Bad (XML_Parse bypassed):
Scores: 1691.0, 1702.0
https://perfht.ml/2Kp9oZz
https://perfht.ml/2wNZcYW
Continuing analysis...
Assignee | ||
Comment 23•7 years ago
|
||
Alright, I'm about ready to give up on this one.
I was able to get perf running on the Talos loaner, and the only thing that jumped out at me is that in the "bad" case, for the content process, the CPU misses the L1 cache more. I did a few runs, and here's the difference:
Good: ~227 L1-dcache-loadmisses
Bad: ~251 L2-dcache-loadmisses
That's not a huge increase, but it's the only thing that stuck out to me.
Looking at the default events that perf records, here's what I got...
Good case:
72.46% Web Content [unknown] [.] 0x7fadfc0de1e0
1.83% Web Content [kernel.kallsyms] [k] page_fault
1.45% Web Content libc-2.15.so [.] __strlen_sse2_pminub
1.36% Web Content [kernel.kallsyms] [k] clear_page_c
1.20% Web Content libc-2.15.so [.] __strcmp_sse2
0.99% Web Content libxul.so [.] nsBufferedOutputStream::QueryInterface(nsID const&, void**)
0.95% Web Content [kernel.kallsyms] [k] error_entry
0.93% Web Content [kernel.kallsyms] [k] hpet_msi_next_event
0.93% Web Content perf-24669.map [.] 0x331c4938c2e0
0.91% Web Content [kernel.kallsyms] [k] native_write_msr_safe
0.85% Web Content libglib-2.0.so.0.3200.1 [.] slab_allocator_alloc_chunk
0.82% Web Content libc-2.15.so [.] __memcpy_ssse3_back
0.75% Web Content libc-2.15.so [.] __strchr_sse42
0.61% Web Content libpthread-2.15.so [.] pthread_mutex_lock
0.48% Web Content firefox [.] arena_t::Malloc(unsigned long, bool)
0.48% Web Content [kernel.kallsyms] [k] _cond_resched
0.47% Web Content [kernel.kallsyms] [k] __mem_cgroup_commit_charge
0.47% Web Content [kernel.kallsyms] [k] __ticket_spin_lock
0.47% Web Content [kernel.kallsyms] [k] radix_tree_lookup_element
0.47% Web Content [kernel.kallsyms] [k] pagevec_lru_move_fn
0.47% Web Content [kernel.kallsyms] [k] _copy_to_user
0.46% Web Content [kernel.kallsyms] [k] ____pagevec_lru_add
0.46% Web Content [kernel.kallsyms] [k] anon_vma_clone
0.46% Web Content [kernel.kallsyms] [k] get_page_from_freelist
0.45% Web Content [kernel.kallsyms] [k] sys_mmap
0.45% Web Content [kernel.kallsyms] [k] ____pagevec_lru_add_fn
0.44% Web Content [kernel.kallsyms] [k] __phys_addr
0.44% Web Content libX11.so.6.3.0 [.] 0x514e0
0.43% Web Content libxul.so [.] silk_encode_frame_FLP
0.43% Web Content [nvidia] [k] 0x521783
0.43% Web Content [kernel.kallsyms] [k] mem_cgroup_charge_common
0.43% Web Content libgobject-2.0.so.0.3200.1 [.] g_type_class_ref
0.41% Web Content libglib-2.0.so.0.3200.1 [.] g_slice_alloc
0.41% Web Content libglib-2.0.so.0.3200.1 [.] g_string_insert_len
0.37% Web Content firefox [.] free
0.37% Web Content firefox [.] Allocator<MozJemallocBase>::realloc(void*, unsigned long)
0.34% Web Content [kernel.kallsyms] [k] zap_pte_range
0.34% Web Content [kernel.kallsyms] [k] bit_spin_unlock
0.33% Web Content [kernel.kallsyms] [k] find_vma
0.31% Web Content [kernel.kallsyms] [k] free_pcppages_bulk
0.31% Web Content libgobject-2.0.so.0.3200.1 [.] g_type_check_instance_is_a
0.29% Web Content [kernel.kallsyms] [k] mem_cgroup_del_lru_list.part.36
0.28% Web Content libglib-2.0.so.0.3200.1 [.] g_datalist_get_flags
0.26% Web Content firefox [.] arena_salloc(void const*)
0.24% Web Content [kernel.kallsyms] [k] page_add_new_anon_rmap
0.23% Web Content libz.so.1.2.3.4 [.] inflate
0.23% Web Content libz.so.1.2.3.4 [.] 0xd846
0.23% Web Content [kernel.kallsyms] [k] restore_sigcontext
0.21% Web Content [kernel.kallsyms] [k] __wake_up_bit
0.21% Web Content [kernel.kallsyms] [k] skb_release_data
0.19% Web Content [kernel.kallsyms] [k] lookup_page_cgroup
0.19% Web Content [kernel.kallsyms] [k] sk_run_filter
Bad case:
74.93% Web Content [unknown] [.] 0x7f92f28160d6
1.57% Web Content [kernel.kallsyms] [k] __cpa_flush_all
1.26% Web Content perf-17400.map [.] 0x32d0738a2ba2
1.18% Web Content ld-2.15.so [.] do_lookup_x
1.09% Web Content [nvidia] [k] nv_flush_cache
0.91% Web Content libglib-2.0.so.0.3200.1 [.] g_slice_alloc
0.88% Web Content [kernel.kallsyms] [k] page_fault
0.84% Web Content libxul.so [.] nsBufferedOutputStream::QueryInterface(nsID const&, void**)
0.69% Web Content [kernel.kallsyms] [k] clear_page_c
0.64% Web Content libgobject-2.0.so.0.3200.1 [.] g_signal_newv
0.57% Web Content libglib-2.0.so.0.3200.1 [.] g_hash_table_lookup
0.51% Web Content libpthread-2.15.so [.] pthread_mutex_lock
0.47% Web Content libgobject-2.0.so.0.3200.1 [.] g_type_value_table_peek
0.45% Web Content [kernel.kallsyms] [k] prep_new_page
0.45% Web Content [kernel.kallsyms] [k] mem_cgroup_add_lru_list.part.37
0.44% Web Content [kernel.kallsyms] [k] _cond_resched
0.44% Web Content [kernel.kallsyms] [k] retint_swapgs
0.43% Web Content [kernel.kallsyms] [k] perf_pmu_rotate_start.isra.42
0.43% Web Content [kernel.kallsyms] [k] system_call
0.43% Web Content [kernel.kallsyms] [k] prio_tree_insert
0.43% Web Content [kernel.kallsyms] [k] path_init
0.42% Web Content [kernel.kallsyms] [k] syscall_trace_enter
0.41% Web Content [kernel.kallsyms] [k] radix_tree_lookup_element
0.41% Web Content [kernel.kallsyms] [k] ahci_port_intr
0.41% Web Content [kernel.kallsyms] [k] mem_cgroup_uncharge_start
0.40% Web Content [kernel.kallsyms] [k] __find_get_block_slow
0.40% Web Content libX11.so.6.3.0 [.] 0x4db56
0.39% Web Content libgdk-3.so.0.400.1 [.] 0x5b8bb
0.39% Web Content [kernel.kallsyms] [k] handle_mm_fault
0.39% Web Content libc-2.15.so [.] __GI_____strtod_l_internal
0.39% Web Content [nvidia] [k] 0xa34b
0.39% Web Content [kernel.kallsyms] [k] fd_install
0.39% Web Content [kernel.kallsyms] [k] get_page_from_freelist
0.38% Web Content ld-2.15.so [.] _dl_lookup_symbol_x
0.38% Web Content libglib-2.0.so.0.3200.1 [.] get_C_locale
0.37% Web Content [kernel.kallsyms] [k] do_wp_page
0.36% Web Content libglib-2.0.so.0.3200.1 [.] slab_allocator_alloc_chunk
0.36% Web Content [kernel.kallsyms] [k] find_get_page
0.35% Web Content libpthread-2.15.so [.] pthread_mutex_unlock
0.33% Web Content libz.so.1.2.3.4 [.] crc32
0.31% Web Content libglib-2.0.so.0.3200.1 [.] g_slice_free_chain_with_offset
0.30% Web Content [kernel.kallsyms] [k] copy_creds
0.27% Web Content libgtk-3.so.0.400.1 [.] 0x104a68
0.26% Web Content libgio-2.0.so.0.3200.1 [.] g_local_file_init
0.24% Web Content firefox [.] SizeClass::SizeClass(unsigned long)
0.24% Web Content ld-2.15.so [.] strcmp
0.21% Web Content libglib-2.0.so.0.3200.1 [.] g_array_append_vals
0.21% Web Content firefox [.] Allocator<MozJemallocBase>::free(void*)
0.20% Web Content libc-2.15.so [.] __strchr_sse42
0.19% Web Content libglib-2.0.so.0.3200.1 [.] g_str_hash
0.18% Web Content libpthread-2.15.so [.] pthread_getspecific
0.18% Web Content libc-2.15.so [.] __dcigettext
0.18% Web Content libgobject-2.0.so.0.3200.1 [.] g_type_interface_peek
0.18% Web Content libc-2.15.so [.] __strcmp_sse42
0.18% Web Content libgobject-2.0.so.0.3200.1 [.] param_spec_pool_hash
0.18% Web Content [kernel.kallsyms] [k] __do_softirq
0.13% Web Content [kernel.kallsyms] [k] find_vma
0.00% Web Content [kernel.kallsyms] [k] native_write_msr_safe
My conclusion here, based on this, as well as the Gecko Profiles that we've looked at, is that there's not one thing that we're doing wrong that seems to be hurting us here. Something at a lower level - maybe within some of the GTK libraries, or maybe within the kernel, or maybe down to the CPU caches - is burning us here, and causing us to be a little more expensive.
The other thing to note is that these Talos machines are running a super out-of-date version of Ubuntu (12.04 Precise LTS). That version of Ubuntu hasn't had official support from Canonical in about a year. We should probably upgrade the OS and remeasure here - especially because I haven't been able to reproduce this issue on modern versions of Ubuntu (even when I match CPU cores and RAM to the Talos machines).
I'm going to see if I can get some help getting the Talos loaner updated to a modern version of Ubuntu, and see if the regression is still reproducible.
Flags: needinfo?(mconley)
Reporter | ||
Comment 24•7 years ago
|
||
Thank you so much for looking into it! I am not sure a recent version of Ubuntu, but I was reproducing this bug locally on macOS 10.12. There was just too much notice locally on my machine to work on this bug reliably.
Perhaps I can try to produce two profiles on macOS like comment 22? Last time I did I couldn't get the symbols; I can try to figure out that to do there.
Assignee | ||
Comment 25•7 years ago
|
||
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #24)
> Thank you so much for looking into it! I am not sure a recent version of
> Ubuntu, but I was reproducing this bug locally on macOS 10.12.
I never considered trying to reproduce this on my local macOS machine. How embarrassing - if I can reproduce this with XCode Instruments connected, that might be super helpful in illuminating the issue. I'll try that today.
Assignee | ||
Comment 26•6 years ago
|
||
With XCode Instruments, I may have fallen upon a lead, but it's a pretty weird one.
Using the System Trace instrument, I was able to look at thread locking, context switching, etc. One of the things I noticed was in the "good" case, content processes tend to have js::HelperThread threads doing (what appears to be) JS parsing off of the main thread. I couldn't find these threads in the "bad" case (when I remove the empty binding).
I am able to control parallel parsing with the javascript.options.parallel_parsing pref. Locally, when I disabled parallel parsing, the difference between baseline and removing the empty binding went away (if anything, it appeared as if removing the empty binding was slightly better than baseline).
I've got some try pushes cooking to see if this is the case on Linux64 on our Talos hardware as well. If this truly is the root cause - that we're somehow not parsing our JS in parallel properly in the bad case - I guess the next question is, "why"?
Or this might be all a red herring. We'll see, I guess.
Assignee | ||
Comment 27•6 years ago
|
||
Disabling parallel JS in the binary (which I was tinkering with before I found the pref) gets rid of the difference between the baseline and removing the binding:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=7ea891e89077&newProject=try&newRevision=47d0489f190c&framework=1
Flipping the javascript.options.parallel_parsing pref to false is similar:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9407d4cefbf4&newProject=try&newRevision=d684823ce8e7&framework=1
The plot thickens. I'm now trying to figure out if and when we schedule parallel JS parsing in the content process normally, and if and how that's being affected somehow by XML parsing the empty binding.
Assignee | ||
Comment 28•6 years ago
|
||
Okay, miraculously, I think I'm 80% of the way towards understanding what's happening here.
Good news! I'm almost sure that we've caught a real performance regression here! \o/ Thanks, Talos!
We have this thing called the ScriptPreloader that notices scripts loaded in the parent and content processes before the browser-delayed-startup-finished observer notification for the first window fires. After that, the scripts it has seen are parsed off of the main thread, sorted by the order they were loaded in, and stashed in a binary format on the file system.
Then, on start-up, the cache is opened and made available for when both the parent and content processes want to load these scripts - we're able to skip the parsing step and start running the scripts more quickly.
The ScriptPreloader was added in bug 1359653. I'm pretty sure most people don't know it exists. It _does_ have a nice impact on our Talos numbers though (see bug 1359653 comment 93).
Anyhow, for some reason, when we don't have the XBL binding on the scrollbar, the scripts from the content process are only heard about _after_ the browser-delayed-startup-finished notification fires, so they miss the window of time to get into the cache.
The solution might end up being that browser-delayed-startup-finished is not the right point at which to "close the door" on scripts to put into the cache. We might want another notification to fire after all of the idle callback scripts fire in browser.js as well. We might also want to ensure that the content process has sent up its initial scripts too.
So, ultimately, the reason we're regressing cpstartup is because the scripts we're running aren't in the ScriptPreloader cache.
I'm still trying to figure out why, but I'm quite confident I'll have an answer shortly. Stay tuned.
Assignee | ||
Comment 29•6 years ago
|
||
So it seems that the reason we're not caching the scripts from the child when we don't parse the XML seems to be some kind of scheduling foul-up.
In both cases (with or without the empty XBL binding), the browser-delayed-startup-finished notification can fire before the list of scripts are received from the child to cache. At that point, the parent asks a different thread to "prepare" the cache that will be written to disk. Meanwhile, the first tab's content process is sending the list of scripts to cache to the parent, and that message is sitting on the messaging thread waiting for the main thread to service it.
In the "good" case, where we have an empty XBL binding, the thread to prepare the cache is serviced _after_ the IPC message from the child is serviced. This means that we see browser-delayed-startup-finished, queue up the preparation, get some more scripts from the content process, and then prepare and write the cache to disk off of the main thread.
In the "bad" case, with no empty XBL binding, the thread to prepare the cache is serviced _before_ the IPC message from the child is serviced. This means that we see browser-delayed-startup-finished, queue up the preparation, and then start that preparation before we start servicing that list of scripts from the content process. By the time we _do_ service that message, we've already written the cache.
Assignee | ||
Comment 30•6 years ago
|
||
Hey kmag,
Do you have any objections if I change ScriptPreloader to wait for something other than browser-delayed-startup-finished for "closing the caching window" during start-up? I'm thinking, specifically, of adding a new observer notification here:
https://searchfox.org/mozilla-central/rev/5a744713370ec47969595e369fd5125f123e6d24/browser/base/content/browser.js#1781
called browser-idle-startup-tasks-finished, which ScriptPreloader will wait for.
This will increase the size of the caching window significantly, and greatly increases the probability that the content process(es) will have sent up its list of scripts for caching.
Do you see any issues with that?
Flags: needinfo?(kmaglione+bmo)
Comment 31•6 years ago
|
||
Sounds fine to me. I've always suspected it might be a good idea to push the window back a bit further, but there was no other obvious place to put it.
Flags: needinfo?(kmaglione+bmo)
Assignee | ||
Comment 32•6 years ago
|
||
Talked to timdream - I'll take this bug.
Assignee: timdream → mconley
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•6 years ago
|
Attachment #8973786 -
Attachment is obsolete: true
Comment 40•6 years ago
|
||
mozreview-review |
Comment on attachment 8981991 [details]
Bug 1458375 - Make ScriptPreloader wait until browser-idle-startup-tasks-finished before writing cache.
https://reviewboard.mozilla.org/r/248036/#review254152
::: js/xpconnect/loader/ScriptPreloader.cpp:34
(Diff revision 1)
> #include "nsJSUtils.h"
> #include "nsProxyRelease.h"
> #include "nsThreadUtils.h"
> #include "nsXULAppAPI.h"
> #include "xpcpublic.h"
>
Please just rename this to something more generic that states its purpose. Like STARTUP_COMPLETE_TOPIC.
Also, please make sure this is dispatched in Fennec.
Attachment #8981991 -
Flags: review?(kmaglione+bmo) → review+
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 49•6 years ago
|
||
mozreview-review |
Comment on attachment 8981988 [details]
Bug 1458375 - Add a new browser-idle-startup-tasks-finished notification.
https://reviewboard.mozilla.org/r/248028/#review254220
Attachment #8981988 -
Flags: review?(felipc) → review+
Reporter | ||
Comment 50•6 years ago
|
||
mozreview-review |
Comment on attachment 8981992 [details]
Bug 1458375 - Remove empty binding attached to scrollbars.
https://reviewboard.mozilla.org/r/248040/#review254290
Attachment #8981992 -
Flags: review?(timdream) → review+
Comment 51•6 years ago
|
||
mozreview-review |
Comment on attachment 8981987 [details]
Bug 1458375 - Talos tests that use TalosPowersContent.goQuitApplication(); shouldn't close their windows.
https://reviewboard.mozilla.org/r/248026/#review254404
Yep LGTM and applied patch locally and ran tresize which uses goQuitApplication()
Attachment #8981987 -
Flags: review?(rwood) → review+
Assignee | ||
Comment 52•6 years ago
|
||
Just so nobody thinks I'm cooking the books, here's a comparison with and without the empty binding (both having the rest of the patches):
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=97d5b4811a11&newProject=try&newRevision=3e5e06d9fc97&framework=1&showOnlyImportant=1
Comment 53•6 years ago
|
||
(In reply to Robert Wood [:rwood] from comment #51)
> Comment on attachment 8981987 [details]
> Bug 1458375 - Talos tests that use TalosPowersContent.goQuitApplication();
> shouldn't close their windows.
>
> https://reviewboard.mozilla.org/r/248026/#review254404
>
> Yep LGTM and applied patch locally and ran tresize which uses
> goQuitApplication()
Whoops r+d the wrong patch
Comment 54•6 years ago
|
||
mozreview-review |
Comment on attachment 8981986 [details]
Bug 1458375 - Remove non-e10s support for TalosPowersContent goQuitApplication.
https://reviewboard.mozilla.org/r/248024/#review254412
LGTM, applied locally and ran a talos test that uses goQuitApplication (tresize).
Attachment #8981986 -
Flags: review?(rwood) → review+
Comment 55•6 years ago
|
||
(In reply to Robert Wood [:rwood] from comment #53)
> (In reply to Robert Wood [:rwood] from comment #51)
> > Comment on attachment 8981987 [details]
> > Bug 1458375 - Talos tests that use TalosPowersContent.goQuitApplication();
> > shouldn't close their windows.
> >
> > https://reviewboard.mozilla.org/r/248026/#review254404
> >
> > Yep LGTM and applied patch locally and ran tresize which uses
> > goQuitApplication()
>
> Whoops r+d the wrong patch
Ok now looked at this one, it's also fine. Ran talos tests that use goQuitApplication (tresize, sessionrestore) and no issues.
Comment 56•6 years ago
|
||
mozreview-review |
Comment on attachment 8981989 [details]
Bug 1458375 - Make TalosPowersContent.goQuitApplication wait for browser-idle-startup-tasks-finished to fire before quitting.
https://reviewboard.mozilla.org/r/248030/#review254406
LGTM! Also applied previous patches first as well as this one, and ran the talos tests that use goQuitApplication (tresize, sessionrestore). No issues.
Attachment #8981989 -
Flags: review?(rwood) → review+
Comment 57•6 years ago
|
||
mozreview-review |
Comment on attachment 8981990 [details]
Bug 1458375 - Make sessionrestore tests wait for the browser-idle-startup-tasks-finished event.
https://reviewboard.mozilla.org/r/248032/#review254442
LGTM, also applied locally with the others and talos sessionrestore runs fine.
Attachment #8981990 -
Flags: review?(rwood) → review+
Comment 58•6 years ago
|
||
mozreview-review |
Comment on attachment 8982039 [details]
Bug 1458375 - Add browser-idle-startup-tasks-finished notification to Fennec.
https://reviewboard.mozilla.org/r/248082/#review254458
Attachment #8982039 -
Flags: review?(snorp) → review+
Comment 59•6 years ago
|
||
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2968c0a21410
Remove non-e10s support for TalosPowersContent goQuitApplication. r=rwood
https://hg.mozilla.org/integration/autoland/rev/cfe8da250de7
Talos tests that use TalosPowersContent.goQuitApplication(); shouldn't close their windows. r=rwood
https://hg.mozilla.org/integration/autoland/rev/715850ae92a1
Add a new browser-idle-startup-tasks-finished notification. r=Felipe
https://hg.mozilla.org/integration/autoland/rev/bd2cba9dfb3d
Add browser-idle-startup-tasks-finished notification to Fennec. r=snorp
https://hg.mozilla.org/integration/autoland/rev/868da3a1b94a
Make TalosPowersContent.goQuitApplication wait for browser-idle-startup-tasks-finished to fire before quitting. r=rwood
https://hg.mozilla.org/integration/autoland/rev/1fb76b9b6f82
Make sessionrestore tests wait for the browser-idle-startup-tasks-finished event. r=rwood
https://hg.mozilla.org/integration/autoland/rev/ae969e1cebfb
Make ScriptPreloader stop recording after browser-idle-startup-tasks-finished. r=kmag
https://hg.mozilla.org/integration/autoland/rev/30a743401670
Remove empty binding attached to scrollbars. r=timdream
Comment 60•6 years ago
|
||
Backed out 8 changesets (bug 1458375) for Mochitest leaks on a CLOSED TREE
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=30a7434016700d5d11031eb329526932c8bcb93d&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-classifiedState=unclassified&selectedJob=181159673
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=181159673&repo=autoland&lineNumber=5344
Backout: https://hg.mozilla.org/integration/autoland/rev/da1661774f7f4546497f46e89aceba860693613f
Flags: needinfo?(mconley)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 70•6 years ago
|
||
mozreview-review |
Comment on attachment 8983571 [details]
Bug 1458375 - Make browser mochitests wait for browser-idle-startup-tasks-finished before running tests.
https://reviewboard.mozilla.org/r/249428/#review255612
Code analysis found 1 defect in this patch:
- 1 defect found by mozlint
You can run this analysis locally with:
- `./mach lint path/to/file` (JS/Python)
If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx
::: testing/mochitest/browser-test.js:518
(Diff revision 1)
>
> - if (this.tests.length)
> - this.waitForGraphicsTestWindowToBeGone(this.nextTest.bind(this));
> - else
> + if (this.tests.length) {
> + this.waitForWindowsReady().then(() => {
> + this.nextTest();
> + });
> + }
Error: Closing curly brace does not appear on the same line as the subsequent block. [eslint: brace-style]
Comment hidden (mozreview-request) |
Comment 72•6 years ago
|
||
mozreview-review |
Comment on attachment 8983571 [details]
Bug 1458375 - Make browser mochitests wait for browser-idle-startup-tasks-finished before running tests.
https://reviewboard.mozilla.org/r/249428/#review255986
::: testing/mochitest/browser-test.js:530
(Diff revision 2)
> + await new Promise(resolve => {
> + const IDLE_TOPIC = "browser-idle-startup-tasks-finished";
> + let observer = (subject, topic, data) => {
> + if (topic == IDLE_TOPIC && subject == window) {
> + Services.obs.removeObserver(observer, IDLE_TOPIC);
> + resolve();
> + }
> + };
> + Services.obs.addObserver(observer, IDLE_TOPIC);
> + });
Nit:
await TestUtils.topicObserved(IDLE_TOPIC,
subject => subject === window);
Attachment #8983571 -
Flags: review?(kmaglione+bmo) → review+
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 82•6 years ago
|
||
For the folks who are wondering why I haven't just gone ahead and landed this, it's because it caused a ts_paint regression on Linux.
It seems that there's some kind of relationship between the size of the ScriptPreload cache for the parent process, and ts_paint performance on Linux. My theory is that if the cache gets large enough, then we end up taxing background threads on decoding which draws CPU cycles from the main thread, pulling from first paint performance. I suspect this only because an issue because I greatly increased the number of scripts going into the cache in the parent process.
So I've modified my scheme somewhat; I'm going back to using browser-delayed-startup-finished to stop the recording for the ScriptPreloader in the parent process - however, I wait until browser-idle-startup-tasks-finished to prepare and write the caches to disk.
This seems to address all of the remaining high-confidence regressions for me.
Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c87fd0965dcc7528a29cfffbd4a8b516c220bf55
Comparison to a baseline: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=395a7f880d36&newProject=try&newRevision=c87fd0965dcc&framework=1&showOnlyImportant=1
I ended up modifying the ScriptPreloader changes in this way:
https://reviewboard.mozilla.org/r/248036/diff/3-4#index_header
Does that still look okay to you, kmag?
Flags: needinfo?(mconley) → needinfo?(kmaglione+bmo)
Comment 83•6 years ago
|
||
Yeah, that sounds reasonable enough. But I really think that ideally we should try to get scripts loaded by the idle tasks into the preloader cache. I'd expect it to noticeably improve startup responsiveness in a lot of cases.
Flags: needinfo?(kmaglione+bmo)
Comment 84•6 years ago
|
||
mozreview-review |
Comment on attachment 8981991 [details]
Bug 1458375 - Make ScriptPreloader wait until browser-idle-startup-tasks-finished before writing cache.
https://reviewboard.mozilla.org/r/248036/#review256998
::: js/xpconnect/loader/ScriptPreloader.cpp:771
(Diff revisions 3 - 4)
> NS_NAMED_LITERAL_CSTRING(mochikitPrefix, "chrome://mochikit/");
> if (StringHead(url, mochikitPrefix.Length()) == mochikitPrefix) {
> return;
> }
>
> + if (mStartupFinished) {
Not necessary. Active() already checks this.
Assignee | ||
Comment 85•6 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #83)
> Yeah, that sounds reasonable enough. But I really think that ideally we
> should try to get scripts loaded by the idle tasks into the preloader cache.
> I'd expect it to noticeably improve startup responsiveness in a lot of cases.
Agreed. I found the ts_paint regression quite surprising - I'll file a follow-up bug to revisit.
(In reply to Kris Maglione [:kmag] from comment #84)
> Not necessary. Active() already checks this.
Thanks!
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 95•6 years ago
|
||
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8b65605994f9
Remove non-e10s support for TalosPowersContent goQuitApplication. r=rwood
https://hg.mozilla.org/integration/autoland/rev/764acaea4d9e
Talos tests that use TalosPowersContent.goQuitApplication(); shouldn't close their windows. r=rwood
https://hg.mozilla.org/integration/autoland/rev/6c43f8368b07
Add a new browser-idle-startup-tasks-finished notification. r=Felipe
https://hg.mozilla.org/integration/autoland/rev/06ebd23a33bf
Add browser-idle-startup-tasks-finished notification to Fennec. r=snorp
https://hg.mozilla.org/integration/autoland/rev/7b152fdfbb4a
Make TalosPowersContent.goQuitApplication wait for browser-idle-startup-tasks-finished to fire before quitting. r=rwood
https://hg.mozilla.org/integration/autoland/rev/5a342be8a03e
Make sessionrestore tests wait for the browser-idle-startup-tasks-finished event. r=rwood
https://hg.mozilla.org/integration/autoland/rev/8603f0a7c7d1
Make ScriptPreloader wait until browser-idle-startup-tasks-finished before writing cache. r=kmag
https://hg.mozilla.org/integration/autoland/rev/616dd117184d
Remove empty binding attached to scrollbars. r=timdream
https://hg.mozilla.org/integration/autoland/rev/b01c4543a097
Make browser mochitests wait for browser-idle-startup-tasks-finished before running tests. r=kmag
Comment 96•6 years ago
|
||
Push from comment 95 brought at least a perf win:
== Change summary for alert #13775 (as of Tue, 12 Jun 2018 16:33:36 GMT) ==
Improvements:
6% sessionrestore_no_auto_restore windows10-64 pgo e10s stylo 279.08 -> 262.25
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=13775
Comment 97•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/8b65605994f9
https://hg.mozilla.org/mozilla-central/rev/764acaea4d9e
https://hg.mozilla.org/mozilla-central/rev/6c43f8368b07
https://hg.mozilla.org/mozilla-central/rev/06ebd23a33bf
https://hg.mozilla.org/mozilla-central/rev/7b152fdfbb4a
https://hg.mozilla.org/mozilla-central/rev/5a342be8a03e
https://hg.mozilla.org/mozilla-central/rev/8603f0a7c7d1
https://hg.mozilla.org/mozilla-central/rev/616dd117184d
https://hg.mozilla.org/mozilla-central/rev/b01c4543a097
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox62:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Comment 98•6 years ago
|
||
A more complete list of perf improvements:
== Change summary for alert #13786 (as of Tue, 12 Jun 2018 16:33:36 GMT) ==
Improvements:
7% sessionrestore_no_auto_restore windows10-64 opt e10s stylo 306.08 -> 283.75
7% sessionrestore_no_auto_restore linux64 pgo e10s stylo 297.50 -> 276.58
7% sessionrestore_no_auto_restore windows7-32 pgo e10s stylo 302.00 -> 281.17
6% sessionrestore_no_auto_restore linux64 opt e10s stylo 311.38 -> 291.33
6% sessionrestore_no_auto_restore windows7-32 opt e10s stylo 336.42 -> 315.25
6% sessionrestore_no_auto_restore windows10-64 pgo e10s stylo 279.08 -> 262.25
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=13786
Updated•5 years ago
|
Type: enhancement → task
You need to log in
before you can comment on or make changes to this bug.
Description
•