Open Bug 1446012 Opened 6 years ago Updated 2 years ago

The first startup after an update does main thread I/O to update.status several times

Categories

(Toolkit :: Application Update, enhancement, P3)

enhancement

Tracking

()

Performance Impact medium

People

(Reporter: florian, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf:startup, Whiteboard: [fix suggestion in comment 8])

First there's a .exists() call at https://searchfox.org/mozilla-central/rev/8976abf9cab8eb4661665cc86bd355cd08238011/toolkit/mozapps/update/nsUpdateServiceStub.js#30

Then https://searchfox.org/mozilla-central/rev/8976abf9cab8eb4661665cc86bd355cd08238011/toolkit/mozapps/update/nsUpdateService.js#1610 reads the same file just for the sake of setting a pref.

And then, the same code falls through to https://searchfox.org/mozilla-central/rev/8976abf9cab8eb4661665cc86bd355cd08238011/toolkit/mozapps/update/nsUpdateService.js#1635 with a _postUpdateProcessing call, and that reads the file again at https://searchfox.org/mozilla-central/rev/8976abf9cab8eb4661665cc86bd355cd08238011/toolkit/mozapps/update/nsUpdateService.js#1700

There are several more calls to readStatusFile that would be nice to audit, I wouldn't be surprised if a significant portion of them could be converted to async I/O.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Florian Quèze [:florian] from comment #0)
> Then
> https://searchfox.org/mozilla-central/rev/
> 8976abf9cab8eb4661665cc86bd355cd08238011/toolkit/mozapps/update/
> nsUpdateService.js#1610 reads the same file just for the sake of setting a
> pref.

At least initially, it seems the problem is that quite a few things hang off of that pref. pdfjs initialization uses it, as well as our startup decision to load an update page. Especially that last thing happens very early on (in handling of default commandline arguments). It's not clear to me off-hand how we could fix that without forgoing the IO.

Of course, we could consolidate the IO into 1 call that reads the file, but I'm skeptical that would make much of a performance difference because of disk caching, and if we would need to refactor to eliminate the sync IO completely then that refactor would likely eliminate the use of the optimization.

Doug, certainly within the perf team I suspect you're the most familiar with our updater code... do you have any ideas for what scope we have to fix this by somehow communicating the update status and/or override page in a way that doesn't require early-startup sync IO?
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(dothayer)
Well, the file is always pretty small. It seems like we could pass it as an argument from the updater back to firefox.exe no[1]? Other than that - do we have an intuition for how expensive this is, given that the updater just wrote to the file? I would expect it to be not so bad since it should be in the system file cache?

[1]: https://searchfox.org/mozilla-central/source/toolkit/mozapps/update/updater/updater.cpp#2080
Flags: needinfo?(dothayer)
Here is another more recent startup profile where the updater code does main thread I/O: https://perfht.ml/2LS415v

The stack is:
FileUtils_closeSafeFileOutputStream
writeStringToFile
writeStatusFile
Downloader_downloadUpdate
AUS_downloadUpdate
AUS__postUpdateProcessing
AUS_observe
Blocks: 1543203

I'd like to work on this bug.

(In reply to Doug Thayer [:dthayer] from comment #2)

It seems like we could pass it as an
argument from the updater back to firefox.exe no[1]?

An environment variable might be even easier than a command line parameter.

Do you think this bug is ready for Abdelrahman to work on it?

Flags: needinfo?(dothayer)

Sure.

Along with the env variable the pref to indicate that an update was just successfully performed should be removed and the following code will need to check for the env variable exists and it will need to remove it when it does.
https://searchfox.org/mozilla-central/source/browser/components/BrowserContentHandler.jsm#140-179

Flags: needinfo?(dothayer)

Sorry, the pref check is actually here and the code in getPostUpdateOverridePage shouldn't need to be changed.
https://searchfox.org/mozilla-central/source/browser/components/BrowserContentHandler.jsm#564

Assignee: nobody → a.ahmed1026

I've been thinking about this a bit and I think the code can be changed as follows:

Replace the use of the app.update.postupdate pref in BrowserContentHandler.jsm with a check for if there is an activeUpdate for OVERRIDE_NEW_MSTONE and OVERRIDE_NEW_BUILD_ID. For the OVERRIDE_NEW_MSTONE case when there is an activeUpdate it can be passed to getPostUpdateOverridePage and when there isn't an activeUpdate it can just use the value of the startup.homepage_override_url pref.
https://searchfox.org/mozilla-central/source/browser/components/BrowserContentHandler.jsm#555-577

Remove the following from UpdateService.jsm
https://searchfox.org/mozilla-central/source/toolkit/mozapps/update/UpdateService.jsm#1889-1894

Replace the following so it just QI's the activeUpdate to nsIWritablePropertyBag.
https://searchfox.org/mozilla-central/source/browser/components/BrowserContentHandler.jsm#143-158

The above would of course load the updates.xml when the user has manually installed a different build ID / version. I think that is acceptable given the code simplification.

Florian, do you think the approach outlined in comment #8 and comment #9 would be acceptable?

Flags: needinfo?(florian)

I'm afraid I don't understand the suggestion in comment 8 (and sorry for the delay, I thought I had already commented here last week).

In case this helps, here is a profile of startup after an update, showing the main thread I/O we do: https://perfht.ml/2X4N4ve

Flags: needinfo?(florian)

This will make it so app update doesn't work with any files in post-update-processing which was only done so the pref would be set in time for the What New page code in BrowserContentHandler.jsm.
https://searchfox.org/mozilla-central/source/toolkit/mozapps/update/UpdateService.jsm#1885

It is a first step in lessening this without the need for the pref, env var, or command line arg. I may split that part of this out to a separate bug since there are several other improvements that can be made here.

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #12)

Thanks! Seems like a good first step indeed.

Bug 1555088 won't work because there is another consumer of the pref which was added in bug 1389443. So, either that needs to be changed and bug 1555088 can move forward or another method will need to be devised.

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #14)

Bug 1555088 won't work because there is another consumer of the pref which was added in bug 1389443. So, either that needs to be changed and bug 1555088 can move forward or another method will need to be devised.

I'm fairly sure that "we just updated" can be determined using different prefs, e.g. by setting some kind of flag in the code that reads and updates the buildid/version flags in https://searchfox.org/mozilla-central/source/browser/components/BrowserContentHandler.jsm#532-539 , and make that accessible using some kind of getter.

It would be good to have a central and easy to access way to check for "first startup on a fresh profile" and "first startup after an update" (and maybe even "first startup after a crash", but I don't have consummers in mind for that last one).

(In reply to :Gijs (he/him) from comment #15)

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #14)

Bug 1555088 won't work because there is another consumer of the pref which was added in bug 1389443. So, either that needs to be changed and bug 1555088 can move forward or another method will need to be devised.

I'm fairly sure that "we just updated" can be determined using different prefs, e.g. by setting some kind of flag in the code that reads and updates the buildid/version flags in https://searchfox.org/mozilla-central/source/browser/components/BrowserContentHandler.jsm#532-539 , and make that accessible using some kind of getter.

The PdfJs.jsm code is using that pref before that code runs. I'm not saying there aren't ways, just that the way I thought would work that I filed that bug for won't because of that.

The Update Agent project will make it so updates are applied without running Firefox afterwards so it won't be possible to use an env var or a command line argument.

It might be acceptable to have bug 1389443 just check if the mstone version has changed instead of relying on the post update pref and then go with what I previously proposed. I suspect that it probably should already be doing that since that code won't run when a client that has manually updated by installing a newer version. If the code shouldn't run on downgrades it can just check if the version is greater.

edit the code for bug 1389443 also doesn't run for profiles that didn't perform the update and I think it isn't doing what I think the author thinks it is.

I talked with dthayer and he agrees with the assessment in comment #18 so I filed bug 1559195 to remove this use of the app.update.postupdate pref.

I'll go ahead and reopen / implement bug 1555088 after bug 1559195 is fixed.

Depends on: 1559195

:rstrong, do you know what's left to do here? It seems the startup tests still show us accessing this file, but the bugs in comment #20 are all marked fixed.

Flags: needinfo?(robert.strong.bugs)

When there is an update in progress (should be rare on release) and after an update is applied (happens approximately 2 to 3 times on release depending on how many point releases there are) there is sync file IO for the update.status file during profile-after-change.

Flags: needinfo?(robert.strong.bugs)

(In reply to Robert Strong [:rstrong] (Robert they/them - use needinfo to contact me) from comment #22)

When there is an update in progress (should be rare on release) and after an update is applied (happens approximately 2 to 3 times on release depending on how many point releases there are) there is sync file IO for the update.status file during profile-after-change.

Does this also happen for clean profiles, then? Maybe they get treated as the second case here? Otherwise I don't see why this is being tripped in the test... ( https://searchfox.org/mozilla-central/rev/7ed8e2d3d1d7a1464ba42763a33fd2e60efcaedc/browser/base/content/test/performance/browser_startup_mainthreadio.js#348-352 )

Flags: needinfo?(robert.strong.bugs)

New profiles check if the old update directory needs to be migrated and creates the new update directory if it doesn't exist.

For all profiles, there is also a check if a file exists which informs the client whether there is an update in progress.

Flags: needinfo?(robert.strong.bugs)

Florian captured another profile today where the file IO from the UpdateService was blocking the main thread in the parent process for ~1s.

https://perfht.ml/2QHxy8c

(In reply to Mike Conley (:mconley) (:⚙️) (Wayyyy behind on needinfos) from comment #25)

Florian captured another profile today where the file IO from the UpdateService was blocking the main thread in the parent process for ~1s.

https://perfht.ml/2QHxy8c

The syscall that blocked the main thread there was an fsync that I don't think is necessary. And what's also interesting in this profile is that the main thread I/O from the updater blocks the first network request of a page load, ie. this main thread I/O actively prevents the user from browsing.

We'll get to this as soon as we can.

Priority: -- → P2
Whiteboard: [fxperf:p2] → [fxperf:p2][fix suggestion in comment 8]
Priority: P2 → P3

The bug assignee didn't login in Bugzilla in the last 7 months.
:Amir, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: a.ahmed1026 → nobody
Flags: needinfo?(ahabibi)

It seems pretty safe to say that no one is currently working on this. The priority appears to already reflect this.

Flags: needinfo?(ahabibi)
Performance Impact: --- → P2
Keywords: perf:startup
Whiteboard: [fxperf:p2][fix suggestion in comment 8] → [fix suggestion in comment 8]
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.