startup cache is purged on second run of windows builds

RESOLVED FIXED in Firefox 66

Status

()

defect
RESOLVED FIXED
3 months ago
a month ago

People

(Reporter: mossop, Assigned: mossop)

Tracking

unspecified
mozilla66
Points:
---

Firefox Tracking Flags

(firefox66 fixed)

Details

Attachments

(1 attachment)

(Assignee)

Description

3 months ago

When Firefox runs we cache a number of things, JS, XUL, XBL, CSS etc. Whenever a new build of Firefox runs against a profile we clear the cache to rebuild with the new information.

On the first run of a profile where compatibility.ini doesn't exist (presumably to cover the case of upgrading from a very old versions of Firefox, but also affects new profiles) we attempt to clear the cache as well.

If any attempt to clear the cache fails we set a flag in compatibility.ini telling us to clear the cache on next startup.

Unfortunately nsIFile.remove returns different error codes on different platforms when the file in question already doesn't exist. So an attempt to remove a cache that does't exist is counted as a failure on Windows and so the cache will be cleared again on second run.

(Assignee)

Comment 1

3 months ago
When Firefox runs we cache a number of things, JS, XUL, XBL, CSS etc. Whenever
a new build of Firefox runs against a profile we clear the cache to rebuild with
the new information.

On the first run of a profile where compatibility.ini doesn't exist (presumably
to cover the case of upgrading from a very old versions of Firefox, but also
affects new profiles) we attempt to clear the cache as well.

If any attempt to clear the cache fails we set a flag in compatibility.ini
telling us to clear the cache on next startup.

Unfortunately nsIFile.remove returns different error codes on different
platforms when the file in question already doesn't exist. So an attempt to
remove a cache that does't exist is counted as a failure on Windows and so the
cache will be cleared again on second run.

This change counts the file not found return code as counting as a success when
clearing the cache.

Comment 2

3 months ago

Added this to the "this week I learned" section of our meeting notes, fwiw...

(Assignee)

Comment 3

3 months ago

Filed bug 1519200 as a follow-up.

Comment 4

3 months ago
Pushed by dtownsend@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f747aad08cbc
Fix cache purging on windows. r=Gijs

Comment 5

3 months ago
bugherder
Status: NEW → RESOLVED
Last Resolved: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66

Noticed some big tp5n improvements on Windows thanks to this! \0/

== Change summary for alert #18689 (as of Thu, 10 Jan 2019 18:41:11 GMT) ==

Improvements:

31% tp5n time_to_session_store_window_restored_ms windows7-32 pgo e10s stylo 1,043.21 -> 723.74
22% tp5n nonmain_startup_fileio windows7-32 opt e10s stylo 2,572,989.83 -> 2,007,252.17
21% tp5n time_to_session_store_window_restored_ms windows7-32 opt e10s stylo 1,018.64 -> 800.80
21% tp5n nonmain_startup_fileio windows7-32 pgo e10s stylo 2,525,237.75 -> 1,995,149.17
20% tp5n time_to_session_store_window_restored_ms windows10-64 pgo e10s stylo 821.90 -> 653.93

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18689

(Assignee)

Comment 7

3 months ago

Yeah, essentially those are just because it now uses the startup cache when doing measurements. Those improvements won't really affect users except on the second startup of a build.

Dave, I noticed a 27% tp5n main_normal_fileio regression.
Was this also expected?

Flags: needinfo?(dtownsend)
(Assignee)

Comment 9

2 months ago

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #8)

Dave, I noticed a 27% tp5n main_normal_fileio regression.
Was this also expected?

This change would certainly change the file i/o characteristics of talos tests that run in the second run of Firefox, making them much closer to what users see in general. I'm not sure what tp5n main_normal_fileio is though so I can't guess why it regressed. Is there any info on it?

Flags: needinfo?(dtownsend) → needinfo?(igoldan)

(In reply to Dave Townsend [:mossop] (he/him) from comment #9)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #8)

Dave, I noticed a 27% tp5n main_normal_fileio regression.
Was this also expected?

This change would certainly change the file i/o characteristics of talos tests that run in the second run of Firefox, making them much closer to what users see in general. I'm not sure what tp5n main_normal_fileio is though so I can't guess why it regressed. Is there any info on it?

This is a xperf subtest and you can read more about this suite from here. I think main_normal_fileio stands for the main thread's disk I/O usage after the browser started up.

Am I correct, Aaron?

Flags: needinfo?(igoldan) → needinfo?(aklotz)

Correct, where the delineation between "startup" and "normal" is the sessionstore-windows-restored event.

Flags: needinfo?(aklotz)
(Assignee)

Comment 12

2 months ago

This sounds odd then. Is there any way to see the files being accessed?

In the non-cached case wouldn't we just be loading files out of the jar, whereas in the cached case, we'd be loading jar file + the startup cache file? That could explain more i/o.

(Assignee)

Comment 14

2 months ago

(In reply to Brendan Dahl [:bdahl] from comment #13)

In the non-cached case wouldn't we just be loading files out of the jar, whereas in the cached case, we'd be loading jar file + the startup cache file? That could explain more i/o.

Sure, but I'd expect all the cached stuff to be loaded by the time sessionstore-windows-restored fired.

Comment 15

2 months ago

(In reply to Dave Townsend [:mossop] (he/him) from comment #12)

This sounds odd then. Is there any way to see the files being accessed?

Needinfo for this, so we can get to the bottom of this. Given some of the other "interesting" hangs/issues I've seen when I messed with startupcache (bug 1445739), I'd really like to understand this better. I suspect shutdown IO to the cache for things that didn't load on first startup but loaded on second startup, but lists of files etc. would really help.

Flags: needinfo?(igoldan)

Comment 16

2 months ago

(Also, it's Not Good that the regression is for mainthread IO.)

(In reply to Dave Townsend [:mossop] (he/him) from comment #14)

(In reply to Brendan Dahl [:bdahl] from comment #13)

In the non-cached case wouldn't we just be loading files out of the jar, whereas in the cached case, we'd be loading jar file + the startup cache file? That could explain more i/o.

Sure, but I'd expect all the cached stuff to be loaded by the time sessionstore-windows-restored fired.

Never mind, missed that this was after the session windows restored, was thinking it was startup.

(In reply to :Gijs (he/him) from comment #15)

(In reply to Dave Townsend [:mossop] (he/him) from comment #12)

This sounds odd then. Is there any way to see the files being accessed?

Needinfo for this, so we can get to the bottom of this.

Aaron, could you provide some assistance here? It would save us some time, as I'm not too familiar with xperf tests.

Flags: needinfo?(igoldan) → needinfo?(aklotz)

I seem to recall that at one point we uploaded the original etl file as an artifact, but I'm not seeing any here.

Joel, am I recalling that correctly? Or do we only do that if the xperf suite fails?

Flags: needinfo?(aklotz) → needinfo?(jmaher)

I believe it is only if xperf fails. You can always push to try and make it happen with some small edits. I have always wanted to make xperf easier to debug (i.e. retrigger and collect etl/metrics)

Flags: needinfo?(jmaher)
You need to log in before you can comment on or make changes to this bug.