Remove empty binding & investigate the performance gain caused by it

RESOLVED FIXED in Firefox 62

Status

()

P2
normal
RESOLVED FIXED
11 months ago
a month ago

People

(Reporter: timdream, Assigned: mconley)

Tracking

(Blocks: 1 bug)

unspecified
mozilla62
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox62 fixed)

Details

Attachments

(9 attachments, 1 obsolete attachment)

59 bytes, text/x-review-board-request
rwood
: review+
Details
59 bytes, text/x-review-board-request
rwood
: review+
Details
59 bytes, text/x-review-board-request
Felipe
: review+
Details
59 bytes, text/x-review-board-request
rwood
: review+
Details
59 bytes, text/x-review-board-request
rwood
: review+
Details
59 bytes, text/x-review-board-request
kmag
: review+
Details
59 bytes, text/x-review-board-request
timdream
: review+
Details
59 bytes, text/x-review-board-request
snorp
: review+
Details
59 bytes, text/x-review-board-request
kmag
: review+
Details
See bug 1456728. There is a cpstartup performance regression caused by the removal of the scrollbar binding. Adding an empty binding "fixes" it.

We will need to investigate why and remove it.
Blocks: 1397874
Priority: -- → P2
Continue from my investigation from bug 1456728 comment 43:

So I've got to a point where on OSX debug locally, I can see the build gets slower if nsSyncLoadService::PushSyncStreamToListener() calls into nsParser::OnStopRequest(). However, when I copy my URI check code into the function and make OnStopRequest() no-op when it hits general.xml, the build continue to run fast. The two results are conflicting. I guess my methodology might be flawed.

What I am going to do next is to repeat the process in the loner machine, hopefully to avoid the noises on my machine locally.
Just to confirm what I see from Try:

The performance gain happened in nsSyncLoadService::PushSyncStreamToListener() but before XMLDocument::EndLoad()

Performance regressed by skipping nsSyncLoadService::PushSyncStreamToListener() in content process:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2e7bac585cfa&newProject=try&newRevision=bbe9bfd1dc21&framework=1

But not by an early return from XMLDocument::EndLoad()

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2e7bac585cfa&newProject=try&newRevision=303f5970e696&framework=1
Worth investigating: we load xul.css in content process in cpstartup test.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ed50dfc9fce1f397f703dd403a723e2429e35375

( Ctrl+F "nsSyncLoadService::PushSyncStreamToListener()" in this log https://taskcluster-artifacts.net/ADWlcrjWSbWG0m-EM1SzmQ/0/public/logs/live_backing.log )
Yeah, I don't think components.css belongs to content process either :'(
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #4)
> Worth investigating: we load xul.css in content process in cpstartup test.

(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #5)
> Yeah, I don't think components.css belongs to content process either :'(

That must be due to the page loading over a chrome:// URI, or some other test-specific thing. The only place where these sheets get loaded are if LoadsFullXULStyleSheetUpFront is true: https://searchfox.org/mozilla-central/rev/b28b94dc81d60c6d9164315adbd4a5073526d372/layout/base/nsDocumentViewer.cpp#2372-2385.
(In reply to Brian Grinstead [:bgrins] from comment #7)
> That must be due to the page loading over a chrome:// URI, or some other
> test-specific thing. The only place where these sheets get loaded are if
> LoadsFullXULStyleSheetUpFront is true:
> https://searchfox.org/mozilla-central/rev/
> b28b94dc81d60c6d9164315adbd4a5073526d372/layout/base/nsDocumentViewer.
> cpp#2372-2385.

You are right. This eventually reaches nsIDocument::InternalAllowXULXBL() and checks the principal.

https://searchfox.org/mozilla-central/rev/b28b94dc81d60c6d9164315adbd4a5073526d372/dom/base/nsDocument.cpp#5023

I actually don't understand why target.html has to be loaded with system principal, but I haven't tried to remove this.

https://searchfox.org/mozilla-central/rev/b28b94dc81d60c6d9164315adbd4a5073526d372/testing/talos/talos/tests/cpstartup/bootstrap.js#95
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #6)

The fact that libexpat could cause any side effect is pretty unimaginable. The side effect might be caused by a state change eventually found. So this will still take time to investigate.
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #8)
> (In reply to Brian Grinstead [:bgrins] from comment #7)
> > That must be due to the page loading over a chrome:// URI, or some other
> > test-specific thing. The only place where these sheets get loaded are if
> > LoadsFullXULStyleSheetUpFront is true:
> > https://searchfox.org/mozilla-central/rev/
> > b28b94dc81d60c6d9164315adbd4a5073526d372/layout/base/nsDocumentViewer.
> > cpp#2372-2385.
> 
> You are right. This eventually reaches nsIDocument::InternalAllowXULXBL()
> and checks the principal.
> 
> https://searchfox.org/mozilla-central/rev/
> b28b94dc81d60c6d9164315adbd4a5073526d372/dom/base/nsDocument.cpp#5023
> 
> I actually don't understand why target.html has to be loaded with system
> principal, but I haven't tried to remove this.

I think the 'fix' for this is to update the test to not use system principle for the content process document.

I don't know if there are any cases where we load chrome:// URIs in a content process for normal browsing (and if there are it certainly isn't the 'normal' entry point for starting a content process. So whatever side effects we are hitting here are probably affecting the test but not normal users. Mike, does that make sense? And if so, would it be be relatively easy to load the document over localhost instead of chrome://?
Flags: needinfo?(mconley)
(Assignee)

Comment 11

11 months ago
(In reply to Brian Grinstead [:bgrins] from comment #10)
> I don't know if there are any cases where we load chrome:// URIs in a
> content process for normal browsing (and if there are it certainly isn't the
> 'normal' entry point for starting a content process. So whatever side
> effects we are hitting here are probably affecting the test but not normal
> users. Mike, does that make sense? And if so, would it be be relatively easy
> to load the document over localhost instead of chrome://?

Yeah, I agree that loading up a chrome:// URI in a new tab is not a very likely scenario. If that's having an impact on the measurement, then we should adjust the test to match the common scenario. There's no great reason to load a chrome:// URI except that (I think) it was just easier to write.

I think I can provide a patch to update the test (rwood should review it). Coming up.
(Assignee)

Comment 13

11 months ago
Okay, this patch moves the page load to http://. I suggest importing that, putting a patch on top of it to remove the empty binding, and seeing what Talos on Try thinks. If it's all green, rwood would be a good reviewer for the Talos bits.
Flags: needinfo?(mconley)
Comment hidden (obsolete)
Comment hidden (obsolete)
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #9)
> (In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #6)
> 
> The fact that libexpat could cause any side effect is pretty unimaginable.

... and I can confirm that calling into XML_Parse() in xmlparse.c triggers speed-up...

https://treeherder.mozilla.org/#/jobs?repo=try&author=timdream@gmail.com&fromchange=0f0889a9b70f331e6bbf044439677f358ec4d910&selectedJob=177544633
Here is a brief intro on what happened and what I've found: over at bug 1456728, I was flagged by talos for slowing down content process start-up by moving the <xul:scrolbar> mark-ups from the XBL binding to nsScrollbarFrame. Investigation shows that simply by adding an empty binding back to <xul:scrollbar>, the performance regression goes away. Given the merge schedule, I checked-in the empty binding to central, and continue the investigation over here.

My methodology of locating the cause involving identifying the code path that loads the empty binding, and insert early returns in various places. By comparing the talos results on the try pushes, I was able to narrow down the exact function calls that lead to the effect. Eventually, I reached comment 16, which skipping XML_Parse() in nsExpatDriver::ParseBuffer() re-creates the performance regression*.

This is very mindblowing because expat should be self-contained and perhaps stateless. We were hoping to find something that gets initialized earlier/later because of the binding and perhaps keeping the performance by moving it somewhere else. That *something* certainly shouldn't be attempt parse an XML file :-/.

:mconley had a brief discussion with me on this over IRC; he recommends that I use the Linux perf tool [1] to figure out that slows us down when we don't call into expat. It would also help us understand if the cause here is memory allocation-related.

[1] https://en.wikipedia.org/wiki/Perf_(Linux)

This is way out of my field of expertise so it would take some time to set it up. If you have free cycles to dive into this mystery, feel free to take over.

* I've ruled out nsXMLDocumentSink calls from nsParser::ResumeParse() (comment 3) but I didn't rule out calls by nsExpatDriver.
(Assignee)

Comment 18

11 months ago
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #17)
> 
> :mconley had a brief discussion with me on this over IRC; he recommends that
> I use the Linux perf tool [1] to figure out that slows us down when we don't
> call into expat. It would also help us understand if the cause here is
> memory allocation-related.
> 

I can try to work on this tomorrow if you can forward me the details to sign into the Talos machine you've got loaned out.
Flags: needinfo?(timdream)
Forwarded.
Flags: needinfo?(timdream)
(Assignee)

Comment 20

11 months ago
Thanks, received! ni?ing myself for analysis.
Flags: needinfo?(mconley)
(Assignee)

Comment 21

10 months ago
So, just an update here - I'm able to reproduce the issue on the Talos loaner, but I can't get perf installed on this particular distro.

I think I'm going to fall back to Gecko Profiler profile analysis.
(Assignee)

Comment 22

10 months ago
Got some profiles from the Talos loaner. Talos by default does 2 runs when profiling, so I've included both profiles for each state.


# Good (empty binding):
Scores: 1674.0, 1627.0

https://perfht.ml/2wHX1WD
https://perfht.ml/2Kp7BUl


# Bad (XML_Parse bypassed):
Scores: 1691.0, 1702.0 

https://perfht.ml/2Kp9oZz
https://perfht.ml/2wNZcYW

Continuing analysis...
(Assignee)

Comment 23

10 months ago
Alright, I'm about ready to give up on this one.

I was able to get perf running on the Talos loaner, and the only thing that jumped out at me is that in the "bad" case, for the content process, the CPU misses the L1 cache more. I did a few runs, and here's the difference:

Good: ~227 L1-dcache-loadmisses
Bad:  ~251 L2-dcache-loadmisses

That's not a huge increase, but it's the only thing that stuck out to me.

Looking at the default events that perf records, here's what I got...

Good case:

 72.46%  Web Content  [unknown]                   [.] 0x7fadfc0de1e0
  1.83%  Web Content  [kernel.kallsyms]           [k] page_fault
  1.45%  Web Content  libc-2.15.so                [.] __strlen_sse2_pminub
  1.36%  Web Content  [kernel.kallsyms]           [k] clear_page_c
  1.20%  Web Content  libc-2.15.so                [.] __strcmp_sse2
  0.99%  Web Content  libxul.so                   [.] nsBufferedOutputStream::QueryInterface(nsID const&, void**)
  0.95%  Web Content  [kernel.kallsyms]           [k] error_entry
  0.93%  Web Content  [kernel.kallsyms]           [k] hpet_msi_next_event
  0.93%  Web Content  perf-24669.map              [.] 0x331c4938c2e0
  0.91%  Web Content  [kernel.kallsyms]           [k] native_write_msr_safe
  0.85%  Web Content  libglib-2.0.so.0.3200.1     [.] slab_allocator_alloc_chunk
  0.82%  Web Content  libc-2.15.so                [.] __memcpy_ssse3_back
  0.75%  Web Content  libc-2.15.so                [.] __strchr_sse42
  0.61%  Web Content  libpthread-2.15.so          [.] pthread_mutex_lock
  0.48%  Web Content  firefox                     [.] arena_t::Malloc(unsigned long, bool)
  0.48%  Web Content  [kernel.kallsyms]           [k] _cond_resched
  0.47%  Web Content  [kernel.kallsyms]           [k] __mem_cgroup_commit_charge
  0.47%  Web Content  [kernel.kallsyms]           [k] __ticket_spin_lock
  0.47%  Web Content  [kernel.kallsyms]           [k] radix_tree_lookup_element
  0.47%  Web Content  [kernel.kallsyms]           [k] pagevec_lru_move_fn
  0.47%  Web Content  [kernel.kallsyms]           [k] _copy_to_user
  0.46%  Web Content  [kernel.kallsyms]           [k] ____pagevec_lru_add
  0.46%  Web Content  [kernel.kallsyms]           [k] anon_vma_clone
  0.46%  Web Content  [kernel.kallsyms]           [k] get_page_from_freelist
  0.45%  Web Content  [kernel.kallsyms]           [k] sys_mmap
  0.45%  Web Content  [kernel.kallsyms]           [k] ____pagevec_lru_add_fn
  0.44%  Web Content  [kernel.kallsyms]           [k] __phys_addr
  0.44%  Web Content  libX11.so.6.3.0             [.] 0x514e0
  0.43%  Web Content  libxul.so                   [.] silk_encode_frame_FLP
  0.43%  Web Content  [nvidia]                    [k] 0x521783
  0.43%  Web Content  [kernel.kallsyms]           [k] mem_cgroup_charge_common
  0.43%  Web Content  libgobject-2.0.so.0.3200.1  [.] g_type_class_ref
  0.41%  Web Content  libglib-2.0.so.0.3200.1     [.] g_slice_alloc
  0.41%  Web Content  libglib-2.0.so.0.3200.1     [.] g_string_insert_len
  0.37%  Web Content  firefox                     [.] free
  0.37%  Web Content  firefox                     [.] Allocator<MozJemallocBase>::realloc(void*, unsigned long)
  0.34%  Web Content  [kernel.kallsyms]           [k] zap_pte_range
  0.34%  Web Content  [kernel.kallsyms]           [k] bit_spin_unlock
  0.33%  Web Content  [kernel.kallsyms]           [k] find_vma
  0.31%  Web Content  [kernel.kallsyms]           [k] free_pcppages_bulk
  0.31%  Web Content  libgobject-2.0.so.0.3200.1  [.] g_type_check_instance_is_a
  0.29%  Web Content  [kernel.kallsyms]           [k] mem_cgroup_del_lru_list.part.36
  0.28%  Web Content  libglib-2.0.so.0.3200.1     [.] g_datalist_get_flags
  0.26%  Web Content  firefox                     [.] arena_salloc(void const*)
  0.24%  Web Content  [kernel.kallsyms]           [k] page_add_new_anon_rmap
  0.23%  Web Content  libz.so.1.2.3.4             [.] inflate
  0.23%  Web Content  libz.so.1.2.3.4             [.] 0xd846
  0.23%  Web Content  [kernel.kallsyms]           [k] restore_sigcontext
  0.21%  Web Content  [kernel.kallsyms]           [k] __wake_up_bit
  0.21%  Web Content  [kernel.kallsyms]           [k] skb_release_data
  0.19%  Web Content  [kernel.kallsyms]           [k] lookup_page_cgroup
  0.19%  Web Content  [kernel.kallsyms]           [k] sk_run_filter


Bad case:

 74.93%  Web Content  [unknown]                   [.] 0x7f92f28160d6
  1.57%  Web Content  [kernel.kallsyms]           [k] __cpa_flush_all
  1.26%  Web Content  perf-17400.map              [.] 0x32d0738a2ba2
  1.18%  Web Content  ld-2.15.so                  [.] do_lookup_x
  1.09%  Web Content  [nvidia]                    [k] nv_flush_cache
  0.91%  Web Content  libglib-2.0.so.0.3200.1     [.] g_slice_alloc
  0.88%  Web Content  [kernel.kallsyms]           [k] page_fault
  0.84%  Web Content  libxul.so                   [.] nsBufferedOutputStream::QueryInterface(nsID const&, void**)
  0.69%  Web Content  [kernel.kallsyms]           [k] clear_page_c
  0.64%  Web Content  libgobject-2.0.so.0.3200.1  [.] g_signal_newv
  0.57%  Web Content  libglib-2.0.so.0.3200.1     [.] g_hash_table_lookup
  0.51%  Web Content  libpthread-2.15.so          [.] pthread_mutex_lock
  0.47%  Web Content  libgobject-2.0.so.0.3200.1  [.] g_type_value_table_peek
  0.45%  Web Content  [kernel.kallsyms]           [k] prep_new_page
  0.45%  Web Content  [kernel.kallsyms]           [k] mem_cgroup_add_lru_list.part.37
  0.44%  Web Content  [kernel.kallsyms]           [k] _cond_resched
  0.44%  Web Content  [kernel.kallsyms]           [k] retint_swapgs
  0.43%  Web Content  [kernel.kallsyms]           [k] perf_pmu_rotate_start.isra.42
  0.43%  Web Content  [kernel.kallsyms]           [k] system_call
  0.43%  Web Content  [kernel.kallsyms]           [k] prio_tree_insert
  0.43%  Web Content  [kernel.kallsyms]           [k] path_init
  0.42%  Web Content  [kernel.kallsyms]           [k] syscall_trace_enter
  0.41%  Web Content  [kernel.kallsyms]           [k] radix_tree_lookup_element
  0.41%  Web Content  [kernel.kallsyms]           [k] ahci_port_intr
  0.41%  Web Content  [kernel.kallsyms]           [k] mem_cgroup_uncharge_start
  0.40%  Web Content  [kernel.kallsyms]           [k] __find_get_block_slow
  0.40%  Web Content  libX11.so.6.3.0             [.] 0x4db56
  0.39%  Web Content  libgdk-3.so.0.400.1         [.] 0x5b8bb
  0.39%  Web Content  [kernel.kallsyms]           [k] handle_mm_fault
  0.39%  Web Content  libc-2.15.so                [.] __GI_____strtod_l_internal
  0.39%  Web Content  [nvidia]                    [k] 0xa34b
  0.39%  Web Content  [kernel.kallsyms]           [k] fd_install
  0.39%  Web Content  [kernel.kallsyms]           [k] get_page_from_freelist
  0.38%  Web Content  ld-2.15.so                  [.] _dl_lookup_symbol_x
  0.38%  Web Content  libglib-2.0.so.0.3200.1     [.] get_C_locale
  0.37%  Web Content  [kernel.kallsyms]           [k] do_wp_page
  0.36%  Web Content  libglib-2.0.so.0.3200.1     [.] slab_allocator_alloc_chunk
  0.36%  Web Content  [kernel.kallsyms]           [k] find_get_page
  0.35%  Web Content  libpthread-2.15.so          [.] pthread_mutex_unlock
  0.33%  Web Content  libz.so.1.2.3.4             [.] crc32
  0.31%  Web Content  libglib-2.0.so.0.3200.1     [.] g_slice_free_chain_with_offset
  0.30%  Web Content  [kernel.kallsyms]           [k] copy_creds
  0.27%  Web Content  libgtk-3.so.0.400.1         [.] 0x104a68
  0.26%  Web Content  libgio-2.0.so.0.3200.1      [.] g_local_file_init
  0.24%  Web Content  firefox                     [.] SizeClass::SizeClass(unsigned long)
  0.24%  Web Content  ld-2.15.so                  [.] strcmp
  0.21%  Web Content  libglib-2.0.so.0.3200.1     [.] g_array_append_vals
  0.21%  Web Content  firefox                     [.] Allocator<MozJemallocBase>::free(void*)
  0.20%  Web Content  libc-2.15.so                [.] __strchr_sse42
  0.19%  Web Content  libglib-2.0.so.0.3200.1     [.] g_str_hash
  0.18%  Web Content  libpthread-2.15.so          [.] pthread_getspecific
  0.18%  Web Content  libc-2.15.so                [.] __dcigettext
  0.18%  Web Content  libgobject-2.0.so.0.3200.1  [.] g_type_interface_peek
  0.18%  Web Content  libc-2.15.so                [.] __strcmp_sse42
  0.18%  Web Content  libgobject-2.0.so.0.3200.1  [.] param_spec_pool_hash
  0.18%  Web Content  [kernel.kallsyms]           [k] __do_softirq
  0.13%  Web Content  [kernel.kallsyms]           [k] find_vma
  0.00%  Web Content  [kernel.kallsyms]           [k] native_write_msr_safe



My conclusion here, based on this, as well as the Gecko Profiles that we've looked at, is that there's not one thing that we're doing wrong that seems to be hurting us here. Something at a lower level - maybe within some of the GTK libraries, or maybe within the kernel, or maybe down to the CPU caches - is burning us here, and causing us to be a little more expensive.

The other thing to note is that these Talos machines are running a super out-of-date version of Ubuntu (12.04 Precise LTS). That version of Ubuntu hasn't had official support from Canonical in about a year. We should probably upgrade the OS and remeasure here - especially because I haven't been able to reproduce this issue on modern versions of Ubuntu (even when I match CPU cores and RAM to the Talos machines).

I'm going to see if I can get some help getting the Talos loaner updated to a modern version of Ubuntu, and see if the regression is still reproducible.
Flags: needinfo?(mconley)
Thank you so much for looking into it! I am not sure a recent version of Ubuntu, but I was reproducing this bug locally on macOS 10.12. There was just too much notice locally on my machine to work on this bug reliably.

Perhaps I can try to produce two profiles on macOS like comment 22? Last time I did I couldn't get the symbols; I can try to figure out that to do there.
(Assignee)

Comment 25

10 months ago
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #24)
> Thank you so much for looking into it! I am not sure a recent version of
> Ubuntu, but I was reproducing this bug locally on macOS 10.12.

I never considered trying to reproduce this on my local macOS machine. How embarrassing - if I can reproduce this with XCode Instruments connected, that might be super helpful in illuminating the issue. I'll try that today.
(Assignee)

Comment 26

10 months ago
With XCode Instruments, I may have fallen upon a lead, but it's a pretty weird one.

Using the System Trace instrument, I was able to look at thread locking, context switching, etc. One of the things I noticed was in the "good" case, content processes tend to have js::HelperThread threads doing (what appears to be) JS parsing off of the main thread. I couldn't find these threads in the "bad" case (when I remove the empty binding).

I am able to control parallel parsing with the javascript.options.parallel_parsing pref. Locally, when I disabled parallel parsing, the difference between baseline and removing the empty binding went away (if anything, it appeared as if removing the empty binding was slightly better than baseline).

I've got some try pushes cooking to see if this is the case on Linux64 on our Talos hardware as well. If this truly is the root cause - that we're somehow not parsing our JS in parallel properly in the bad case - I guess the next question is, "why"?

Or this might be all a red herring. We'll see, I guess.
(Assignee)

Comment 27

10 months ago
Disabling parallel JS in the binary (which I was tinkering with before I found the pref) gets rid of the difference between the baseline and removing the binding:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=7ea891e89077&newProject=try&newRevision=47d0489f190c&framework=1

Flipping the javascript.options.parallel_parsing pref to false is similar:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=9407d4cefbf4&newProject=try&newRevision=d684823ce8e7&framework=1

The plot thickens. I'm now trying to figure out if and when we schedule parallel JS parsing in the content process normally, and if and how that's being affected somehow by XML parsing the empty binding.
(Assignee)

Comment 28

10 months ago
Okay, miraculously, I think I'm 80% of the way towards understanding what's happening here.

Good news! I'm almost sure that we've caught a real performance regression here! \o/ Thanks, Talos!

We have this thing called the ScriptPreloader that notices scripts loaded in the parent and content processes before the browser-delayed-startup-finished observer notification for the first window fires. After that, the scripts it has seen are parsed off of the main thread, sorted by the order they were loaded in, and stashed in a binary format on the file system.

Then, on start-up, the cache is opened and made available for when both the parent and content processes want to load these scripts - we're able to skip the parsing step and start running the scripts more quickly.

The ScriptPreloader was added in bug 1359653. I'm pretty sure most people don't know it exists. It _does_ have a nice impact on our Talos numbers though (see bug 1359653 comment 93).

Anyhow, for some reason, when we don't have the XBL binding on the scrollbar, the scripts from the content process are only heard about _after_ the browser-delayed-startup-finished notification fires, so they miss the window of time to get into the cache.

The solution might end up being that browser-delayed-startup-finished is not the right point at which to "close the door" on scripts to put into the cache. We might want another notification to fire after all of the idle callback scripts fire in browser.js as well. We might also want to ensure that the content process has sent up its initial scripts too.

So, ultimately, the reason we're regressing cpstartup is because the scripts we're running aren't in the ScriptPreloader cache.

I'm still trying to figure out why, but I'm quite confident I'll have an answer shortly. Stay tuned.
(Assignee)

Comment 29

10 months ago
So it seems that the reason we're not caching the scripts from the child when we don't parse the XML seems to be some kind of scheduling foul-up.

In both cases (with or without the empty XBL binding), the browser-delayed-startup-finished notification can fire before the list of scripts are received from the child to cache. At that point, the parent asks a different thread to "prepare" the cache that will be written to disk. Meanwhile, the first tab's content process is sending the list of scripts to cache to the parent, and that message is sitting on the messaging thread waiting for the main thread to service it.

In the "good" case, where we have an empty XBL binding, the thread to prepare the cache is serviced _after_ the IPC message from the child is serviced. This means that we see browser-delayed-startup-finished, queue up the preparation, get some more scripts from the content process, and then prepare and write the cache to disk off of the main thread.

In the "bad" case, with no empty XBL binding, the thread to prepare the cache is serviced _before_ the IPC message from the child is serviced. This means that we see browser-delayed-startup-finished, queue up the preparation, and then start that preparation before we start servicing that list of scripts from the content process. By the time we _do_ service that message, we've already written the cache.
(Assignee)

Comment 30

10 months ago
Hey kmag,

Do you have any objections if I change ScriptPreloader to wait for something other than browser-delayed-startup-finished for "closing the caching window" during start-up? I'm thinking, specifically, of adding a new observer notification here:

https://searchfox.org/mozilla-central/rev/5a744713370ec47969595e369fd5125f123e6d24/browser/base/content/browser.js#1781

called browser-idle-startup-tasks-finished, which ScriptPreloader will wait for.

This will increase the size of the caching window significantly, and greatly increases the probability that the content process(es) will have sent up its list of scripts for caching.

Do you see any issues with that?
Flags: needinfo?(kmaglione+bmo)
Sounds fine to me. I've always suspected it might be a good idea to push the window back a bit further, but there was no other obvious place to put it.
Flags: needinfo?(kmaglione+bmo)
(Assignee)

Comment 32

10 months ago
Talked to timdream - I'll take this bug.
Assignee: timdream → mconley
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Updated

10 months ago
Attachment #8973786 - Attachment is obsolete: true

Comment 40

10 months ago
mozreview-review
Comment on attachment 8981991 [details]
Bug 1458375 - Make ScriptPreloader wait until browser-idle-startup-tasks-finished before writing cache.

https://reviewboard.mozilla.org/r/248036/#review254152

::: js/xpconnect/loader/ScriptPreloader.cpp:34
(Diff revision 1)
>  #include "nsJSUtils.h"
>  #include "nsProxyRelease.h"
>  #include "nsThreadUtils.h"
>  #include "nsXULAppAPI.h"
>  #include "xpcpublic.h"
>  

Please just rename this to something more generic that states its purpose. Like STARTUP_COMPLETE_TOPIC.

Also, please make sure this is dispatched in Fennec.
Attachment #8981991 - Flags: review?(kmaglione+bmo) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 49

10 months ago
mozreview-review
Comment on attachment 8981988 [details]
Bug 1458375 - Add a new browser-idle-startup-tasks-finished notification.

https://reviewboard.mozilla.org/r/248028/#review254220
Attachment #8981988 - Flags: review?(felipc) → review+
Comment on attachment 8981992 [details]
Bug 1458375 - Remove empty binding attached to scrollbars.

https://reviewboard.mozilla.org/r/248040/#review254290
Attachment #8981992 - Flags: review?(timdream) → review+

Comment 51

10 months ago
mozreview-review
Comment on attachment 8981987 [details]
Bug 1458375 - Talos tests that use TalosPowersContent.goQuitApplication(); shouldn't close their windows.

https://reviewboard.mozilla.org/r/248026/#review254404

Yep LGTM and applied patch locally and ran tresize which uses goQuitApplication()
Attachment #8981987 - Flags: review?(rwood) → review+
(Assignee)

Comment 52

10 months ago
Just so nobody thinks I'm cooking the books, here's a comparison with and without the empty binding (both having the rest of the patches):

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=97d5b4811a11&newProject=try&newRevision=3e5e06d9fc97&framework=1&showOnlyImportant=1
(In reply to Robert Wood [:rwood] from comment #51)
> Comment on attachment 8981987 [details]
> Bug 1458375 - Talos tests that use TalosPowersContent.goQuitApplication();
> shouldn't close their windows.
> 
> https://reviewboard.mozilla.org/r/248026/#review254404
> 
> Yep LGTM and applied patch locally and ran tresize which uses
> goQuitApplication()

Whoops r+d the wrong patch

Comment 54

10 months ago
mozreview-review
Comment on attachment 8981986 [details]
Bug 1458375 - Remove non-e10s support for TalosPowersContent goQuitApplication.

https://reviewboard.mozilla.org/r/248024/#review254412

LGTM, applied locally and ran a talos test that uses goQuitApplication (tresize).
Attachment #8981986 - Flags: review?(rwood) → review+
(In reply to Robert Wood [:rwood] from comment #53)
> (In reply to Robert Wood [:rwood] from comment #51)
> > Comment on attachment 8981987 [details]
> > Bug 1458375 - Talos tests that use TalosPowersContent.goQuitApplication();
> > shouldn't close their windows.
> > 
> > https://reviewboard.mozilla.org/r/248026/#review254404
> > 
> > Yep LGTM and applied patch locally and ran tresize which uses
> > goQuitApplication()
> 
> Whoops r+d the wrong patch

Ok now looked at this one, it's also fine. Ran talos tests that use goQuitApplication (tresize, sessionrestore) and no issues.

Comment 56

10 months ago
mozreview-review
Comment on attachment 8981989 [details]
Bug 1458375 - Make TalosPowersContent.goQuitApplication wait for browser-idle-startup-tasks-finished to fire before quitting.

https://reviewboard.mozilla.org/r/248030/#review254406

LGTM! Also applied previous patches first as well as this one, and ran the talos tests that use goQuitApplication (tresize, sessionrestore). No issues.
Attachment #8981989 - Flags: review?(rwood) → review+

Comment 57

10 months ago
mozreview-review
Comment on attachment 8981990 [details]
Bug 1458375 - Make sessionrestore tests wait for the browser-idle-startup-tasks-finished event.

https://reviewboard.mozilla.org/r/248032/#review254442

LGTM, also applied locally with the others and talos sessionrestore runs fine.
Attachment #8981990 - Flags: review?(rwood) → review+
Comment on attachment 8982039 [details]
Bug 1458375 - Add browser-idle-startup-tasks-finished notification to Fennec.

https://reviewboard.mozilla.org/r/248082/#review254458
Attachment #8982039 - Flags: review?(snorp) → review+
(Assignee)

Updated

10 months ago
Blocks: 1465832

Comment 59

10 months ago
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2968c0a21410
Remove non-e10s support for TalosPowersContent goQuitApplication. r=rwood
https://hg.mozilla.org/integration/autoland/rev/cfe8da250de7
Talos tests that use TalosPowersContent.goQuitApplication(); shouldn't close their windows. r=rwood
https://hg.mozilla.org/integration/autoland/rev/715850ae92a1
Add a new browser-idle-startup-tasks-finished notification. r=Felipe
https://hg.mozilla.org/integration/autoland/rev/bd2cba9dfb3d
Add browser-idle-startup-tasks-finished notification to Fennec. r=snorp
https://hg.mozilla.org/integration/autoland/rev/868da3a1b94a
Make TalosPowersContent.goQuitApplication wait for browser-idle-startup-tasks-finished to fire before quitting. r=rwood
https://hg.mozilla.org/integration/autoland/rev/1fb76b9b6f82
Make sessionrestore tests wait for the browser-idle-startup-tasks-finished event. r=rwood
https://hg.mozilla.org/integration/autoland/rev/ae969e1cebfb
Make ScriptPreloader stop recording after browser-idle-startup-tasks-finished. r=kmag
https://hg.mozilla.org/integration/autoland/rev/30a743401670
Remove empty binding attached to scrollbars. r=timdream
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 70

10 months ago
mozreview-review
Comment on attachment 8983571 [details]
Bug 1458375 - Make browser mochitests wait for browser-idle-startup-tasks-finished before running tests.

https://reviewboard.mozilla.org/r/249428/#review255612


Code analysis found 1 defect in this patch:
 - 1 defect found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: testing/mochitest/browser-test.js:518
(Diff revision 1)
>  
> -    if (this.tests.length)
> -      this.waitForGraphicsTestWindowToBeGone(this.nextTest.bind(this));
> -    else
> +    if (this.tests.length) {
> +      this.waitForWindowsReady().then(() => {
> +        this.nextTest();
> +      });
> +    }

Error: Closing curly brace does not appear on the same line as the subsequent block. [eslint: brace-style]
Comment hidden (mozreview-request)

Comment 72

10 months ago
mozreview-review
Comment on attachment 8983571 [details]
Bug 1458375 - Make browser mochitests wait for browser-idle-startup-tasks-finished before running tests.

https://reviewboard.mozilla.org/r/249428/#review255986

::: testing/mochitest/browser-test.js:530
(Diff revision 2)
> +      await new Promise(resolve => {
> +        const IDLE_TOPIC = "browser-idle-startup-tasks-finished";
> +        let observer = (subject, topic, data) => {
> +          if (topic == IDLE_TOPIC && subject == window) {
> +            Services.obs.removeObserver(observer, IDLE_TOPIC);
> +            resolve();
> +          }
> +        };
> +        Services.obs.addObserver(observer, IDLE_TOPIC);
> +      });

Nit:

    await TestUtils.topicObserved(IDLE_TOPIC,
                                  subject => subject === window);
Attachment #8983571 - Flags: review?(kmaglione+bmo) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 82

10 months ago
For the folks who are wondering why I haven't just gone ahead and landed this, it's because it caused a ts_paint regression on Linux.

It seems that there's some kind of relationship between the size of the ScriptPreload cache for the parent process, and ts_paint performance on Linux. My theory is that if the cache gets large enough, then we end up taxing background threads on decoding which draws CPU cycles from the main thread, pulling from first paint performance. I suspect this only because an issue because I greatly increased the number of scripts going into the cache in the parent process.

So I've modified my scheme somewhat; I'm going back to using browser-delayed-startup-finished to stop the recording for the ScriptPreloader in the parent process - however, I wait until browser-idle-startup-tasks-finished to prepare and write the caches to disk.

This seems to address all of the remaining high-confidence regressions for me.

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c87fd0965dcc7528a29cfffbd4a8b516c220bf55

Comparison to a baseline: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=395a7f880d36&newProject=try&newRevision=c87fd0965dcc&framework=1&showOnlyImportant=1

I ended up modifying the ScriptPreloader changes in this way:

https://reviewboard.mozilla.org/r/248036/diff/3-4#index_header

Does that still look okay to you, kmag?
Flags: needinfo?(mconley) → needinfo?(kmaglione+bmo)
Yeah, that sounds reasonable enough. But I really think that ideally we should try to get scripts loaded by the idle tasks into the preloader cache. I'd expect it to noticeably improve startup responsiveness in a lot of cases.
Flags: needinfo?(kmaglione+bmo)

Comment 84

10 months ago
mozreview-review
Comment on attachment 8981991 [details]
Bug 1458375 - Make ScriptPreloader wait until browser-idle-startup-tasks-finished before writing cache.

https://reviewboard.mozilla.org/r/248036/#review256998

::: js/xpconnect/loader/ScriptPreloader.cpp:771
(Diff revisions 3 - 4)
>      NS_NAMED_LITERAL_CSTRING(mochikitPrefix, "chrome://mochikit/");
>      if (StringHead(url, mochikitPrefix.Length()) == mochikitPrefix) {
>          return;
>      }
>  
> +    if (mStartupFinished) {

Not necessary. Active() already checks this.
(Assignee)

Comment 85

9 months ago
(In reply to Kris Maglione [:kmag] from comment #83)
> Yeah, that sounds reasonable enough. But I really think that ideally we
> should try to get scripts loaded by the idle tasks into the preloader cache.
> I'd expect it to noticeably improve startup responsiveness in a lot of cases.

Agreed. I found the ts_paint regression quite surprising - I'll file a follow-up bug to revisit.

(In reply to Kris Maglione [:kmag] from comment #84)
> Not necessary. Active() already checks this.

Thanks!
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 95

9 months ago
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8b65605994f9
Remove non-e10s support for TalosPowersContent goQuitApplication. r=rwood
https://hg.mozilla.org/integration/autoland/rev/764acaea4d9e
Talos tests that use TalosPowersContent.goQuitApplication(); shouldn't close their windows. r=rwood
https://hg.mozilla.org/integration/autoland/rev/6c43f8368b07
Add a new browser-idle-startup-tasks-finished notification. r=Felipe
https://hg.mozilla.org/integration/autoland/rev/06ebd23a33bf
Add browser-idle-startup-tasks-finished notification to Fennec. r=snorp
https://hg.mozilla.org/integration/autoland/rev/7b152fdfbb4a
Make TalosPowersContent.goQuitApplication wait for browser-idle-startup-tasks-finished to fire before quitting. r=rwood
https://hg.mozilla.org/integration/autoland/rev/5a342be8a03e
Make sessionrestore tests wait for the browser-idle-startup-tasks-finished event. r=rwood
https://hg.mozilla.org/integration/autoland/rev/8603f0a7c7d1
Make ScriptPreloader wait until browser-idle-startup-tasks-finished before writing cache. r=kmag
https://hg.mozilla.org/integration/autoland/rev/616dd117184d
Remove empty binding attached to scrollbars. r=timdream
https://hg.mozilla.org/integration/autoland/rev/b01c4543a097
Make browser mochitests wait for browser-idle-startup-tasks-finished before running tests. r=kmag
Push from comment 95 brought at least a perf win:

== Change summary for alert #13775 (as of Tue, 12 Jun 2018 16:33:36 GMT) ==

Improvements:

  6%  sessionrestore_no_auto_restore windows10-64 pgo e10s stylo     279.08 -> 262.25

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=13775
(Assignee)

Updated

9 months ago
See Also: → bug 1468673
A more complete list of perf improvements:

== Change summary for alert #13786 (as of Tue, 12 Jun 2018 16:33:36 GMT) ==

Improvements:

  7%  sessionrestore_no_auto_restore windows10-64 opt e10s stylo     306.08 -> 283.75
  7%  sessionrestore_no_auto_restore linux64 pgo e10s stylo          297.50 -> 276.58
  7%  sessionrestore_no_auto_restore windows7-32 pgo e10s stylo      302.00 -> 281.17
  6%  sessionrestore_no_auto_restore linux64 opt e10s stylo          311.38 -> 291.33
  6%  sessionrestore_no_auto_restore windows7-32 opt e10s stylo      336.42 -> 315.25
  6%  sessionrestore_no_auto_restore windows10-64 pgo e10s stylo     279.08 -> 262.25

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=13786
You need to log in before you can comment on or make changes to this bug.