Closed Bug 1632682 Opened 5 years ago Closed 3 years ago

Scripts are run in content process before ScriptPreloader is initialized

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

RESOLVED FIXED
99 Branch
Tracking Status
firefox99 --- fixed

People

(Reporter: bdahl, Assigned: mcomella)

References

(Blocks 2 open bugs)

Details

Attachments

(1 file)

Starting up Fenix for the second time I see the following scripts loaded before ScriptCacheChild::Init is called

04-23 13:36:09.558 17415 17443 I GeckoZZZ: >>> ContentParent::InitInternal remoteType=web
04-23 13:36:09.777 17468 17487 I GeckoZZZ: >>> ObjectForLocation uri=resource://gre/modules/ContentProcessSingleton.jsm proccess=web readFromPreloaderCache=0 readFromStartupCache=0
04-23 13:36:09.780 17468 17487 I GeckoZZZ: >>> ObjectForLocation uri=resource://gre/modules/Services.jsm proccess=web readFromPreloaderCache=0 readFromStartupCache=0
04-23 13:36:09.783 17468 17487 I GeckoZZZ: >>> ObjectForLocation uri=resource://gre/modules/AppConstants.jsm proccess=web readFromPreloaderCache=0 readFromStartupCache=0
04-23 13:36:09.786 17468 17487 I GeckoZZZ: >>> ObjectForLocation uri=resource://gre/modules/XPCOMUtils.jsm proccess=web readFromPreloaderCache=0 readFromStartupCache=0
04-23 13:36:09.803 17468 17487 I GeckoZZZ: >>> ObjectForLocation uri=resource://gre/modules/GeckoViewTelemetryController.jsm proccess=web readFromPreloaderCache=0 readFromStartupCache=0
04-23 13:36:09.805 17468 17487 I GeckoZZZ: >>> ObjectForLocation uri=resource://gre/modules/GeckoViewUtils.jsm proccess=web readFromPreloaderCache=0 readFromStartupCache=0
04-23 13:36:09.810 17468 17487 I GeckoZZZ: >>> ObjectForLocation uri=resource://gre/modules/Log.jsm proccess=web readFromPreloaderCache=0 readFromStartupCache=0
04-23 13:36:09.820 17468 17487 I GeckoZZZ: >>> ObjectForLocation uri=resource://gre/modules/TelemetryUtils.jsm proccess=web readFromPreloaderCache=0 readFromStartupCache=0
04-23 13:36:09.837 17468 17487 I GeckoZZZ: >>> ObjectForLocation uri=jar:jar:file:///data/app/org.mozilla.fenix.performancetest-1/base.apk!/assets/omni.ja!/components/GeckoViewStartup.js proccess=web readFromPreloaderCache=0 readFromStartupCache=0
04-23 13:36:09.849 17468 17487 I GeckoZZZ: >>> ScriptCacheChild::Init processT=web

Relevant discussion with kmag on Matrix:

Blocks: 1623518

I probably won't working on this in the near future, but for future reference:

I put together a patch that delays scripts in content processes until a "content-process-ready-for-script" notification. The talos results show:

  • cpstartup content-process-startup opt e10s stylo - linux64-shippable - -2.61% (good)
  • twinopen ext+twinopen:twinopen.html opt e10s stylo - windows10-64-shippable 2.60% (bad)

For geckoview, I saw no difference in app-link to content paint performance.

Do you have before/after profiles? I still regularly see lots of script parsing in the content process during app link startup, which I believe should be eliminated if this bug (and bug 1623518) are fixed.

Flags: needinfo?(bdahl)

I finally got in to the office to restart my desktop, but unfortunately I can't find any of the profiles anymore.

Flags: needinfo?(bdahl)
Blocks: 1617300
Assignee: bdahl → michael.l.comella

Root cause: some scripts in the content process were loaded before the script
cache was available there so these scripts were never written to the cache and
thus were never loaded from cache on subsequent runs, affecting performance.
These scripts listened for the app-startup topic.

This patch introduces a new topic to indicate when the script cache is init so
scripts previously listening to app-startup can listen to the new topic instead.

I created a patch using the same approach as :bdahl. I confirmed that GeckoViewStartup.jsm delays the Navigation::Start event in the child process (via spinning while loop) so measured the patches as part of fenix in our COLD VIEW time to navigation start benchmark (25 replicates). I see:

  • Before the patch: 2057ms
  • After: 2004ms

That's a 53ms improvement. I also took profiles (clipped from process start to Navigation::Start; each the fastest of 5 profiles):

You can see a similar time improvement in the clipped range (517 -> 470ms = 47ms) and see that each script takes less time to import (though some still take surprisingly long, e.g. ExtensionProcessScript.jsm goes from 26.2ms to 25.3ms).

Great work!

(In reply to Michael Comella (:mcomella) [needinfo or I won't see it] from comment #5)

(though some still take surprisingly long, e.g. ExtensionProcessScript.jsm goes from 26.2ms to 25.3ms).

This script seems to be spending most of its time executing its init function, and that's not something this patch would affect. So I think it's all working as intended.

:kmag, I am wondering what the "correct" solution is to this bug. See comment 4 for the root cause. I see two options:

  1. (already implemented) Add a new category, "content-process-ready-for-script", that runs after ScriptPreloader init that scripts that want to run on start up in the content process can listen for
  2. In the content process, move "app-startup" to occur after ScriptPreloader init

I'm concerned with solution 1 that it's easy to introduce new regressions since each script needs to be manually annotated and the parent process and content process need to be treated differently (e.g. I didn't make any changes for desktop so it may not have benefited from my patch).

Which solution is preferred? Or is there another solution?

fwiw, this is my first Gecko patch so if 2 is challenging and 1 is acceptable to land, let me know that too.

Flags: needinfo?(kmaglione+bmo)

To follow-up on my performance results from comment 5, talos shows an improvement to desktop content process start up (~2.49-3.71%) and other tests don't show significant changes. Since I didn't test desktop, in theory this improvement can come from one of:

  • TelemetryControllerContent will now be cached if desktop also experiences this bug
  • I introduced a bug and TelemetryControllerContent is never loaded

From comment 0, it sounds like a good idea to not run scripts early, so this looks like a reasonable approach to me. I actually found a comment that sounds like some devtools code having to hack around their app-startup code running so early that things aren't set up yet.

In a followup, we could probably hack up the category manager to produce an error if you try to load a JS component too early. It sounds like that's the only issue you uncovered.

(In reply to Michael Comella (:mcomella) [needinfo or I won't see it] from comment #7)

:kmag, I am wondering what the "correct" solution is to this bug. See comment 4 for the root cause. I see two options:
...
2. In the content process, move "app-startup" to occur after ScriptPreloader init

There appear to be c++ components that execute on app-startup in the content process (example) so I don't think it makes sense to move app-startup forward given it's just the JS we want to delay.

After talking with :mccr8 and :nika (e.g. comment 9), it sounds like we should continue with my approach. I like this idea to produce an error if we load a JS component too early so I'll look into that.

Flags: needinfo?(kmaglione+bmo)
Attachment #9263406 - Attachment description: WIP: Bug 1632682 - init scripts after ScriptCacheChild::Init in content process. → Bug 1632682 - init scripts after ScriptCacheChild::Init in content process. r=mccr8

I pushed my changes to try for mochitest-browser-chrome, specifically to execute the performance/browser_startup* tests: https://treeherder.mozilla.org/jobs?repo=try&revision=949506f23b7234d892a66851cb744052c411b99c

In a followup, we could probably hack up the category manager to produce an error if you try to load a JS component too early. It sounds like that's the only issue you uncovered.

This is at least partially covered by the assertion (inside ScriptPreloader that it must be init before getting a script) that I borrowed from :bdahl's patch. However, it's possible we try to load scripts without the ScriptPreloader in some cases that will cause the assertion to be missed. I did a quick skim and this is the only place that concerned me.

I tested my changes on desktop locally by making it sure it can load pages and inspecting profiles:

Desktop appears to have the same pattern (TelemetryControllerContent.jsm loads before the ScriptLoader is init which presumably prevents it from being cached) which appears to be addressed by my change. Curiously, IPCOut — PScriptCache::Msg___delete__ is delayed after my patch, letting us cache more scripts, but maybe that was a fluke.

Pushed by mcomella@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2e10bbb56523 init scripts after ScriptCacheChild::Init in content process. r=geckoview-reviewers,agi,mccr8
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch
Regressions: 1757648
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: