Closed Bug 1436584 Opened 6 years ago Closed 3 years ago

[meta] Eliminate JS nsIFile access (sync IO), especially on startup

Categories

(Firefox :: General, defect, P3)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox60 --- affected

People

(Reporter: Gijs, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: meta, Whiteboard: [fxperf:meta])

Attachments

(4 files, 1 obsolete file)

STR:

1. instrument xpconnect for any access to the nsIFile-interface methods/properties from JS
2. run ./mach mochitest browser/base/content/test/urlbar/ --disable-e10s and process output
3. see attached file


The top 10 looks like this:

   1940 XPIInstall.jsm":768
   1940 XPIInstall.jsm":765
   1406 XPIInstall.jsm":769
    534 XPIInstall.jsm":772
    224 XPIProvider.jsm":961
    160 XPIProvider.jsm":889
    132 XPIProvider.jsm":360
    132 XPIProvider.jsm":359
    108 XPIInstall.jsm":112
    108 XPIInstall.jsm":111


Note especially how some of the identical (or almost-dientical) accesses are on subsequent line numbers. Some of the code at issue here basically takes a file object and then does a number of subsequent accesses. For example, it first checks if it's a symlink, then if it's a directory, then reads filesize. A lot of this could be simplified if we had a "stat" like API, or in some cases just outright removed (for example, it seems we collect add-on filesizes but never use it anywhere, as far as I can tell).

Other file accesses are just silly and in code that could be async - and sometimes already is, like this in TelemetryController.jsm:

    const scalarProbeFilename = "ScalarArtifactDefinitions.json";
    let scalarProbeFile = Services.dirsvc.get("GreBinD", Ci.nsIFile);
    scalarProbeFile.append(scalarProbeFilename);
    if (!scalarProbeFile.exists()) {
      this._log.trace("registerJsProbes - no scalar builtin JS probes");
      return;
    }

    // Load the file off the disk.
    let scalarJSProbes = {};
    try {
      let fileContent = await OS.File.read(scalarProbeFile.path, { encoding: "utf-8" });
      ...


Clearly, this could just read the file and catch the 'file not found' error instead, and use OS.File to produce the path of the file.
See Also: → 521264
A nitpick on the methodology: when you run mochitest from a local build, system add-ons don't get packed into xpis.  This is to make developers' lives easier, much like how all our jsms etc don't get packed into omni.ja.  But as a result, I think it means that this test doesn't really match what actually happens in the field in the cases we really care about.
I suspect that the addons manager will still be the biggest offender, but it will be a different set of accesses...
I'm displaying my ignorance here but is it possible to run mach package and then run mochitests against the result of that?  It doesn't look like mach package doesn't include all the resources needed to run tests by default...
We could also address this for just system add-ons by manually packing the contents of objdir/dist/Nightly.app/Contents/Resources/browser/features (that's the path on macos, it is of course different on other platforms)
(In reply to :Gijs from comment #0)
> The top 10 looks like this:
> 
>    1940 XPIInstall.jsm":768
>    1940 XPIInstall.jsm":765
>    1406 XPIInstall.jsm":769
>     534 XPIInstall.jsm":772
>     224 XPIProvider.jsm":961
>     160 XPIProvider.jsm":889
>     132 XPIProvider.jsm":360
>     132 XPIProvider.jsm":359
>     108 XPIInstall.jsm":112
>     108 XPIInstall.jsm":111

I doubt that these results are especially useful. We don't even load XPIInstall.jsm at startup in most sessions. There definitely is still some sync file access in AOM code during normal startups, but the settings we use for running mochitests are not the same as we use in production, and include things like extra synchronous directory scans. They don't reflect what happens for ordinary users.
(In reply to Andrew Swan [:aswan] from comment #2)
> is it possible to run mach package and
> then run mochitests against the result of that?

Yes, ./mach mochitest --appname dist [other params]

I'll do another run with that later today.

(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #3)
> I doubt that these results are especially useful. We don't even load
> XPIInstall.jsm at startup in most sessions.

How sure are you about this?
Flags: needinfo?(kmaglione+bmo)
(In reply to :Gijs from comment #1)
> Created attachment 8949227 [details]
> sorted-nsIFile-access.txt

The nsSearchService.js lines here are for .path and .leafName getters. I doubt these do main thread I/O, I would assume the cost there is only the cost of crossing xpconnect.
(In reply to :Gijs from comment #4)
> (In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're
> blocked) from comment #3)
> > I doubt that these results are especially useful. We don't even load
> > XPIInstall.jsm at startup in most sessions.
> 
> How sure are you about this?

Fairly sure. We created XPIInstall.jsm specifically so it wouldn't be loaded at startup, which gave us a significant talos win. So unless something changed, and we somehow started loading it again without taking a talos hit, it's still not loaded during normal startup.
Flags: needinfo?(kmaglione+bmo)
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #6)
> (In reply to :Gijs from comment #4)
> > (In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're
> > blocked) from comment #3)
> > > I doubt that these results are especially useful. We don't even load
> > > XPIInstall.jsm at startup in most sessions.
> > 
> > How sure are you about this?
> 
> Fairly sure. We created XPIInstall.jsm specifically so it wouldn't be loaded
> at startup, which gave us a significant talos win. So unless something
> changed, and we somehow started loading it again without taking a talos hit,
> it's still not loaded during normal startup.

The attached log is for a mochitest run with the packaged build, which does still show XPIInstall.jsm but it's much lower down and a lot fewer accesses - I assume these are the mochitest-related add-ons which won't be packaged.

I can confirm that a simple 'start browser, close browser' non-first-run run of the packaged build (so not with ./mach run or ./mach mochitest) on a clean profile doesn't seem to hit xpiinstall.jsm (or at least not in terms of using nsIFile).

There is still a bunch of logic in xpiprovider.jsm that looks like it could be improved (e.g. I see subsequent .exists() and .isDirectory accesses).

As noted in comment #0, I'm fairly sure the addon size information can just go away (ie I don't see it being used), which would eliminate that IO even for the cases when xpiinstall.jsm does get loaded (which I guess is only if add-ons have been modified, or something?).
Hrmpf, my grepping needed more revision on the non-mochitest runs (it was reporting 10th stack elements as well); this should be more accurate than the previous file.
Attachment #8949349 - Attachment is obsolete: true
It's not really about whether the add-ons are packaged. It's a matter of the values certain preferences have, and whether we're starting up with a new profile.

Mochitests always start with a fresh profile (as far as I recall, anyway), and we set preferences so that we scan for new/changed add-ons at startup, and install them without prompting. That's not how things work during normal sessions.

If you want a better test, I'd suggest talos ts_paint, which at least pre-creates a profile on the first run, and uses relatively normal settings. I don't remember whether we change the startup scan prefs, though...
Urgh. This has to go:

https://searchfox.org/mozilla-central/rev/a539f8f7fe288d0b4d8d6a32510be20f52bb7a0f/toolkit/mozapps/extensions/internal/XPIProvider.jsm#960-970

Most of the rest of the instances are accessing properties or methods that don't actually trigger IO. It would be nice to filter those out so I don't have to manually sort through all of them.

Also, the FileUtils.jsm ones aren't that helpful. It would be better to have the first stack entry that isn't in FileUtils.jsm, I think.
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #11)
> Urgh. This has to go:
> 
> https://searchfox.org/mozilla-central/rev/
> a539f8f7fe288d0b4d8d6a32510be20f52bb7a0f/toolkit/mozapps/extensions/internal/
> XPIProvider.jsm#960-970
> 
> Most of the rest of the instances are accessing properties or methods that
> don't actually trigger IO. It would be nice to filter those out so I don't
> have to manually sort through all of them.

Which ones? The earlier comment suggested append and leafName wouldn't require IO, but the code you highlighted doesn't seem to require IO either, so now I'm confused.

I've attached the raw log in case that's helpful...

> Also, the FileUtils.jsm ones aren't that helpful. It would be better to have
> the first stack entry that isn't in FileUtils.jsm, I think.

Some more grep results for this (not deduped):

//gre/modules/addons/XPIProvider.jsm":2023
//gre/modules/addons/XPIProvider.jsm":2023
//gre/modules/addons/XPIProvider.jsm":2023
//gre/modules/addons/XPIProvider.jsm":2023
//gre/modules/addons/XPIProvider.jsm":2023
//gre/modules/addons/XPIProvider.jsm":2023
//gre/modules/addons/XPIProvider.jsm":2044
//gre/modules/addons/XPIProvider.jsm":2056
//gre/modules/addons/XPIProvider.jsm":2056
//gre/modules/addons/XPIProvider.jsm":3023
//gre/modules/addons/XPIProvider.jsm":3023
//gre/modules/addons/XPIProvider.jsm":3243
//gre/modules/addons/XPIProviderUtils.js":271
/components/nsBlocklistService.js":680
/components/nsUpdateServiceStub.js":23
/components/nsUpdateServiceStub.js":23
/components/nsUpdateServiceStub.js":23
(In reply to :Gijs from comment #12)
> Which ones? The earlier comment suggested append and leafName wouldn't
> require IO, but the code you highlighted doesn't seem to require IO either,
> so now I'm confused.

Not related to the lines I highlighted. Most of the results from your last run that I looked at were the .path getter, which doesn't do any IO. It's a simple string copy.
Note that the timings in the raw log attempt to only measure xpconnect overhead (ie they give no indication of the actual cost of the method, just of the xpconnect machinery).

(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #10)
> If you want a better test, I'd suggest talos ts_paint, which at least
> pre-creates a profile on the first run, and uses relatively normal settings.
> I don't remember whether we change the startup scan prefs, though...

I tried to look at this but it seems impossible to pipe the gecko output to a file (at least when run with ./mach talos-test, on Windows), which makes it impossible to process the logs (which, as you can see, are rather large).


While I agree that actual file IO is probably the thing to focus on, I think for path manipulation we should encourage code to move to OS.Path instead, which should be much more lightweight than creating and then GC'ing lots of nsIFile instances (where the code you highlighted is just one somewhat extreme example). Perhaps then we can create a test that breaks if people add further nsIFile usage.
(In reply to :Gijs from comment #14)
> I tried to look at this but it seems impossible to pipe the gecko output to
> a file (at least when run with ./mach talos-test, on Windows), which makes
> it impossible to process the logs (which, as you can see, are rather large).

Oh, right... The talos harness is the worst. :/

> While I agree that actual file IO is probably the thing to focus on, I think
> for path manipulation we should encourage code to move to OS.Path instead,
> which should be much more lightweight than creating and then GC'ing lots of
> nsIFile instances (where the code you highlighted is just one somewhat
> extreme example). Perhaps then we can create a test that breaks if people
> add further nsIFile usage.

I'm not so sure about that. The OS.File path manipulation methods are fiendishly expensive. I spent way too much time fighting with them when I was trying to make the SDK loader code reasonably efficient. And there are a lot of places where we actually need nsIFile objects to pass to a native API, anyway, so dealing with them in the intermediate stages isn't a huge amount of extra overhead.

It still might be worth trying to get rid of nsIFile usage more broadly, I just don't see any reason to immediately jump to that conclusion.
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #10)
> If you want a better test, I'd suggest talos ts_paint, which at least
> pre-creates a profile on the first run, and uses relatively normal settings.
> I don't remember whether we change the startup scan prefs, though...

Kris alluded to this but to make it explicit: the addons manager keeps the information it needs for startup in a single file (addonStartup.json.lz4) so it should be able to start up in an existing profile without excessive I/O.  But the first run in a new profile requires that we gather all the information needed to build the addonStartup file and that path is not very well optimized.  I'm all for improving that path, but just want to be clear about the two paths and when they are each taken.

Then there's a separate issue about scanning for sideloaded addons.  Some time last year, Kris changed the regular browser so that those scans can be delayed until after startup.  But a bunch of things in automation rely on dropping test addons into a profile and then having them activated early in startup, so harnesses like mochitest flip prefs to get that behavior back.  As you have already discussed, each test harness does things a little differently, meh...
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #15)
> The OS.File path manipulation methods are fiendishly expensive.
> I spent way too much time fighting with them when I
> was trying to make the SDK loader code reasonably efficient.

This seems unfortunate for something that should, in essence, just be string manipulation. Are there bugs on file on this and/or do you recall specifics?
Flags: needinfo?(kmaglione+bmo)
(In reply to :Gijs from comment #17)
> (In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're
> blocked) from comment #15)
> > The OS.File path manipulation methods are fiendishly expensive.
> > I spent way too much time fighting with them when I
> > was trying to make the SDK loader code reasonably efficient.
> 
> This seems unfortunate for something that should, in essence, just be string
> manipulation. Are there bugs on file on this and/or do you recall specifics?

Sorry, I don't remember the specifics, just that this was a major performance suck in the SDK loader. I'm not sure if there are bugs filed.

I think a lot of it had to do with the path normalization code. The main problem with the Unix Path.join is that it uses rest args and for-of loops, which don't optimize well, especially when called cross-compartment. Windows Path.join has its own set of problems, since it does a bunch of fairly expensive absolute path checks and regexp-based normalizations at every step.

The nsIFile implementations don't have these problems.
Flags: needinfo?(kmaglione+bmo)
(In reply to :Gijs (out until 26th) from comment #17)
> (In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're
> blocked) from comment #15)
> > The OS.File path manipulation methods are fiendishly expensive.
> > I spent way too much time fighting with them when I
> > was trying to make the SDK loader code reasonably efficient.
> 
> This seems unfortunate for something that should, in essence, just be string
> manipulation. Are there bugs on file on this and/or do you recall specifics?

Bug 1388134 is an example of why attempting to use OS.File during early startup is a bad idea.

The meta keyword is there, the bug doesn't depend on other bugs and there is no activity for 12 months.
:Dolske, maybe it's time to close this bug?

Flags: needinfo?(dolske)
Flags: needinfo?(dolske)
Whiteboard: [fxperf] → [fxperf:meta]
Priority: -- → P3

The meta keyword is there, the bug doesn't depend on other bugs and there is no activity for 12 months.
:mossop, maybe it's time to close this bug?

Flags: needinfo?(dtownsend)

I imagine so but since Gijs filed it he can decide.

Flags: needinfo?(dtownsend) → needinfo?(gijskruitbosch+bugs)
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(gijskruitbosch+bugs)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: