Open Bug 1541601 Opened 6 years ago Updated 8 months ago

Very very very carefully... remove channel-prefs.js and stop reading it from the main thread early on startup

Categories

(Toolkit :: Startup and Profile System, defect)

defect

Tracking

()

Performance Impact medium

People

(Reporter: Gijs, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: main-thread-io, perf, perf:startup, Whiteboard: [fxperfsize:L])

channel-prefs.js sets exactly 1 pref. I've seen bug 756325 where :rstrong suggested we should aim for removal.

I'd like us to consider doing that again, to avoid the relevant mainthread IO from reading a separate file on startup (just for 1 config option!).

I'm not entirely clear on why this didn't happen in 756325, or why we need a file that's separate from the user profile's prefs.js file to control the update channel. I assume that if the machine admin needs to lock the update channel somehow they could use enterprise policy to achieve this.

I understand we might need a migration step for users who have changed the file in the app dir. I can help with this if people feel that's a stumbling block.

Rob/Chris, can you help enlighten me as to why we need this file / what's hard about "just" moving the relevant pref to a compiled-in thing in the update service?

Flags: needinfo?(robert.strong.bugs)
Flags: needinfo?(cmanchester)

QA uses it when testing.

Flags: needinfo?(robert.strong.bugs)

Also, with it compiled or added to the omni.ja it wouldn't be possible to test updating on beta to rc builds without recompiling or changing what is in the omni.ja.

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #1)

QA uses it when testing.

We can update QA instructions. :-)

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #2)

Also, with it compiled or added to the omni.ja it wouldn't be possible to test updating on beta to rc builds without recompiling or changing what is in the omni.ja.

Why not? I assume just overriding this pref in prefs.js works, right? Change the same pref in prefs.js, then start Firefox?

Flags: needinfo?(robert.strong.bugs)

Having the file in omni.ja doesn't prevent users from adding the file where it used to be, and that would be picked up by Firefox, and override whatever's set in the omni.ja ; shouldn't it?

(In reply to Mike Hommey [:glandium] from comment #4)

Having the file in omni.ja doesn't prevent users from adding the file where it used to be, and that would be picked up by Firefox, and override whatever's set in the omni.ja ; shouldn't it?

So to be clear, I want to stop reading the file, because it adds unnecessary mainthread IO overhead.

To me, the simplest solution seems to be compiling the information that's currently preprocessed into this file into the binary, and continuing to allow the pref (if set, like any other pref) to override it.

If there is some reason that I don't yet see why that can't work, we could perhaps move the reading of this file off the main thread and off the startup path, and delay starting the update service until we have read the relevant value... but that seems like more work to me.

Summary: Very very very carefully... remove channel-prefs.js → Very very very carefully... remove channel-prefs.js and stop reading it from the main thread early on startup

If the file is in omni.ja, or the pref in greprefs.js, there will virtually be no IO overhead because the surrounding data in omni.ja would already have been read.

As far as I know, this pref isn't read as a normal user pref, is it? You can't set it in prefs.js.

(I've never tried user.js).

Note that we still need to continue to check for JS file in defaults/pref on startup for other reasons (Autoconfig).

Last I checked the omni.ja prefs override the ones in prefs.

We update beta users to RC and not just QA systems. If the channel were baked in some how beta users would end up on release when we test RC's on beta.

I've considered doing something along the lines of what we do with project branches on all channels.
nightly (nightly baked in)
nightly-oak (nightly baked in and "-oak" determined by an external file)

This way the base channel name couldn't be set to something that isn't supported and it would still allow QA to use test channels. This would require beta and aurora to be something along the lines of release-beta and release-aurora. This would also support beta users updating to RC's.

It is read from the default prefs which is why user prefs don't work to set this.

Flags: needinfo?(robert.strong.bugs)

My only knowledge of this file is that the updater treats it specially, which can cause problems when updating its contents (as seen in bug 1431342).

Flags: needinfo?(cmanchester)

(In reply to Chris Manchester (:chmanchester) from comment #9)

My only knowledge of this file is that the updater treats it specially, which can cause problems when updating its contents (as seen in bug 1431342).

That is by request of QA and RelEng. I've often wanted to remove this special handling but it would require more work especially by RelEng to support updating beta users to RC's.

(In reply to :Gijs (he/him) from comment #5)

If there is some reason that I don't yet see why that can't work, we could perhaps move the reading of this file off the main thread and off the startup path, and delay starting the update service until we have read the relevant value... but that seems like more work to me.

The update service already only loads a very small stub unless there is an update in progress in which case the channel is likely needed. Telemetry on the other hand add this to the environment.

I've also considered using a plain file to read this value since it doesn't have to be a pref file but I don't think that would help much here though it would remove it from the read prefs path.

(In reply to Mike Hommey [:glandium] from comment #6)

If the file is in omni.ja, or the pref in greprefs.js, there will virtually be no IO overhead because the surrounding data in omni.ja would already have been read.

Sure, but it's not in omni.ja today, and if we still need to pick it up when the user adds the file outside of omni.ja then we're still going to be doing the same bad IO.

(In reply to Mike Kaply [:mkaply] from comment #7)

Note that we still need to continue to check for JS file in defaults/pref on startup for other reasons (Autoconfig).

At what point are we aiming to remove autoconfig in favour of enterprise policies? If not for this (68) esr cycle, can we do it straight after that train has left?

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #8)

We update beta users to RC and not just QA systems. If the channel were baked in some how beta users would end up on release when we test RC's on beta.

Ah, hm. Yeah, that's unfortunate.

I've considered doing something along the lines of what we do with project branches on all channels.
nightly (nightly baked in)
nightly-oak (nightly baked in and "-oak" determined by an external file)

This way the base channel name couldn't be set to something that isn't supported and it would still allow QA to use test channels. This would require beta and aurora to be something along the lines of release-beta and release-aurora. This would also support beta users updating to RC's.

How would the beta->rc->beta case work in that case? I assume the rc would still 'just' have "release", not "release-beta" baked in... right? How would the update server know to serve beta updates to people on the rc build (having "come there" from beta)?

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #10)

(In reply to :Gijs (he/him) from comment #5)

If there is some reason that I don't yet see why that can't work, we could perhaps move the reading of this file off the main thread and off the startup path, and delay starting the update service until we have read the relevant value... but that seems like more work to me.

The update service already only loads a very small stub unless there is an update in progress in which case the channel is likely needed. Telemetry on the other hand add this to the environment.

Right. I think telemetry can wait, or use a compiled-in value until we update it with the data we read OMT from this other file.

I've also considered using a plain file to read this value since it doesn't have to be a pref file but I don't think that would help much here though it would remove it from the read prefs path.

Yeah, I don't think that'd help too much, though we might still want to do it as part of moving this out of the startup path. Still interested in the release-beta thing though, if that would let us get rid of this IO completely...

Flags: needinfo?(robert.strong.bugs)
Flags: needinfo?(mozilla)

At what point are we aiming to remove autoconfig in favour of enterprise policies? If not for this (68) esr cycle, can we do it straight after that train has left?

We have no plans to completely remove Autoconfig. There are still a ton of prefs that we don't want to add to policies that folks have requirements to set for their unique environments.

How big of a startup hit are we talking about here?

Flags: needinfo?(mozilla)

(In reply to Mike Kaply [:mkaply] from comment #12)

At what point are we aiming to remove autoconfig in favour of enterprise policies? If not for this (68) esr cycle, can we do it straight after that train has left?

We have no plans to completely remove Autoconfig. There are still a ton of prefs that we don't want to add to policies that folks have requirements to set for their unique environments.

Do you know if people abuse the existing channel-prefs.js to add their own stuff, or are they more likely to add separate files?

How big of a startup hit are we talking about here?

Like any IO, it's very hard to predict. On my super fast machine where this lives on an SSD, less than 1ms. On a slow reference hardware device (laptop with spinny disk), probably several ms, more if you're unlucky.

Do you know if people abuse the existing channel-prefs.js to add their own stuff, or are they more likely to add separate files?

Add separate files.

(In reply to :Gijs (he/him) from comment #11)

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #8)

We update beta users to RC and not just QA systems. If the channel were baked in some how beta users would end up on release when we test RC's on beta.

Ah, hm. Yeah, that's unfortunate.

I've considered doing something along the lines of what we do with project branches on all channels.
nightly (nightly baked in)
nightly-oak (nightly baked in and "-oak" determined by an external file)

This way the base channel name couldn't be set to something that isn't supported and it would still allow QA to use test channels. This would require beta and aurora to be something along the lines of release-beta and release-aurora. This would also support beta users updating to RC's.

How would the beta->rc->beta case work in that case? I assume the rc would still 'just' have "release", not "release-beta" baked in... right? How would the update server know to serve beta updates to people on the rc build (having "come there" from beta)?

Beta would have an external file that the contents of is appended to the base channel. So, an RC and a Beta build would both have a base channel name of "release" and Beta would have this external file that contains "beta" which would make the channel "release-beta". The external file typically wouldn't be updated (deleted or modified) similar to how the channel-pref.js file isn't updated.

This wouldn't get rid of the IO on Beta and Aurora of course. The reason I originally considered going down this path is that there are clients that have a modified channel name and this would allow us to continue to update them.

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #10)

(In reply to :Gijs (he/him) from comment #5)

If there is some reason that I don't yet see why that can't work, we could perhaps move the reading of this file off the main thread and off the startup path, and delay starting the update service until we have read the relevant value... but that seems like more work to me.

The update service already only loads a very small stub unless there is an update in progress in which case the channel is likely needed. Telemetry on the other hand add this to the environment.

Right. I think telemetry can wait, or use a compiled-in value until we update it with the data we read OMT from this other file.

I've also considered using a plain file to read this value since it doesn't have to be a pref file but I don't think that would help much here though it would remove it from the read prefs path.

Yeah, I don't think that'd help too much, though we might still want to do it as part of moving this out of the startup path. Still interested in the release-beta thing though, if that would let us get rid of this IO completely...

Since this is mainly about the startup path it might make sense to just store this information in a separate file and make telemetry read this later on after startup.

There is actually already a compiled in value as a fallback for when the channel-prefs.js file doesn't exist that I added way back when that has been changed from the original implementation to use AppConstants.MOZ_UPDATE_CHANNEL.
https://searchfox.org/mozilla-central/source/toolkit/modules/UpdateUtils.jsm#49

In an ideal world it would be possible to create a beta RC build and a release RC build. Only the value in AppConstants would need to be changed after all and it would remove the additional IO on beta and aurora as well.

(In reply to Mike Kaply [:mkaply] from comment #12)

At what point are we aiming to remove autoconfig in favour of enterprise policies? If not for this (68) esr cycle, can we do it straight after that train has left?

We have no plans to completely remove Autoconfig.

Is this a recent change of plan? The enterprise policies project has been presented to me as what would enable removal of autoconfig, which is a known source of search malware.

There are still a ton of prefs that we don't want to add to policies that folks have requirements to set for their unique environments.

I'm confused by this statement. Why do we not want to add them to policies if we want to let people set them?

If we can't remove autoconfig now, can we make it esr-only?

Flags: needinfo?(mozilla)

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #15)

How would the beta->rc->beta case work in that case? I assume the rc would still 'just' have "release", not "release-beta" baked in... right? How would the update server know to serve beta updates to people on the rc build (having "come there" from beta)?

Beta would have an external file that the contents of is appended to the base channel. So, an RC and a Beta build would both have a base channel name of "release" and Beta would have this external file that contains "beta" which would make the channel "release-beta". The external file typically wouldn't be updated (deleted or modified) similar to how the channel-pref.js file isn't updated.

This wouldn't get rid of the IO on Beta and Aurora of course. The reason I originally considered going down this path is that there are clients that have a modified channel name and this would allow us to continue to update them.

Would it be possible to have non-release builds (beta/aurora/nightly) save their update channel into a user pref (so saved in the user profile, rather than in the application folder), and to have all builds (including release, RC) read that pref. If the pref exists and its value is different from the current build's built-in channel, we would send <built-in channel>-<channel name from the pref> as the update channel.

Ie. RC builds would send "release-beta", beta builds would send "beta".

This doesn't require any extra file, so no additional I/O.

I think this would have been a problem if a user had the same profile for multiple builds, but now that we are switching to a profile-per channel model, it might be ok.

That would have the nice side effect of making it easy to switch channel, although I wonder how that would interact with the profile-per-channel model.

(In reply to Florian Quèze [:florian] from comment #17)

Ie. RC builds would send "release-beta", beta builds would send "beta".

This would mean that if you downloaded a beta build when beta was at the RC stage (or if you created a profile / used firefox reset while using beta-at-rc), you'd get 'release' in your profile, no? That seems unfortunate...

(In reply to Florian Quèze [:florian] from comment #16)

(In reply to Mike Kaply [:mkaply] from comment #12)

At what point are we aiming to remove autoconfig in favour of enterprise policies? If not for this (68) esr cycle, can we do it straight after that train has left?

We have no plans to completely remove Autoconfig.

Is this a recent change of plan? The enterprise policies project has been
presented to me as what would enable removal of autoconfig, which is a known
source of search malware.

No, that has never been the case. It was to sandbox Autoconfig to its original API which is setting and locking prefs. There was never a plan to completely remove it. It is currently sandboxed by default.

There are still a ton of prefs that we don't want to add to policies that folks have requirements to set for their unique environments.

I'm confused by this statement. Why do we not want to add them to policies
if we want to let people set them?

Because we would have to add thousands of policies for preferences (or add support for arbitrary preferences which I would rather not do). Every pref that we have someone wants to set and I'm not going to prevent them from doing so via the mechanism that has been around for 20+ years.

Feel feel to read the issues in https://github.com/mozilla/policy-templates/issues/ to get a sense of what people are changing. We are going to add a lot of prefs to policy, but we're not going to add all of them.

If we can't remove autoconfig now, can we make it esr-only?

No. We don't want to force folks to use the ESR. ESR is not an enterprise solution, it is a long term support branch.

Side note - the way to solve the prefs problem is to remove all the prefs we have if we never intended a user to change them. As long as we have a pref, someone is going to come up with a reason to change it.

Flags: needinfo?(mozilla)

(In reply to Mike Hommey [:glandium] from comment #18)

That would have the nice side effect of making it easy to switch channel, although I wonder how that would interact with the profile-per-channel model.

I implemented channel switching for Firefox 5 and there is more that would have to be done to support channel switching. There are also things that couldn't be done (changing the bundle name, install dir, shortcut names, dock names, etc.) which were originally deemed acceptable by upper management and then later decided by upper management as unacceptable.

Flags: needinfo?(robert.strong.bugs)

(In reply to Florian Quèze [:florian] from comment #17)

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #15)

How would the beta->rc->beta case work in that case? I assume the rc would still 'just' have "release", not "release-beta" baked in... right? How would the update server know to serve beta updates to people on the rc build (having "come there" from beta)?

Beta would have an external file that the contents of is appended to the base channel. So, an RC and a Beta build would both have a base channel name of "release" and Beta would have this external file that contains "beta" which would make the channel "release-beta". The external file typically wouldn't be updated (deleted or modified) similar to how the channel-pref.js file isn't updated.

This wouldn't get rid of the IO on Beta and Aurora of course. The reason I originally considered going down this path is that there are clients that have a modified channel name and this would allow us to continue to update them.

Would it be possible to have non-release builds (beta/aurora/nightly) save their update channel into a user pref (so saved in the user profile, rather than in the application folder), and to have all builds (including release, RC) read that pref. If the pref exists and its value is different from the current build's built-in channel, we would send <built-in channel>-<channel name from the pref> as the update channel.

Ie. RC builds would send "release-beta", beta builds would send "beta".

This doesn't require any extra file, so no additional I/O.

I think this would have been a problem if a user had the same profile for multiple builds, but now that we are switching to a profile-per channel model, it might be ok.

Possibly though it would break updates and thereby orphan the client for the aurora and nightly channels if the client changed the value since there is a security team channel requirement (there is an exception for beta and release). If this were the case we wouldn't actually need the base channel name either though it would likely be something 3rd parties that compile Firefox with update enabled would want.

Also, comment #19

To avoid this I'd much prefer just going with a separate file in the app dir that isn't read on startup except by app update when there is an update in progress and even that could likely be read at a later point.

What about having two omni.ja's for RC builds... one for beta and one for release? The two OS X bundles would have to be signed and MAR files for all platforms would have to be created for both the beta RC and release RC. This should be possible to automate without risking orphaning clients.

(In reply to Mike Kaply [:mkaply] from comment #20)

Because we would have to add thousands of policies for preferences (or add support for arbitrary preferences which I would rather not do).

Could we add a policy that enables autoconfig?

Could we add a policy that enables autoconfig?

I've thought about that, but I'm not sure we'd be able to do it early enough in startup. Plus it makes autoconfig require policy and that doesn't necessarily make sense. We don't really have a sense of how many people just use basic Autoconfig in the wild, so I'm not ready to make changes to break that yet. (telemetry doesn't help because these are enterprises).

The question is really for Mossop, though. If channel-prefs.js is gone, will we still be able to process other prefs files in the directory if they happen to be there.

(In reply to :Gijs (he/him) from comment #13)

How big of a startup hit are we talking about here?

Like any IO, it's very hard to predict. On my super fast machine where this lives on an SSD, less than 1ms. On a slow reference hardware device (laptop with spinny disk), probably several ms, more if you're unlucky.

I think we should come back to this before we go any further with big proposals for change.

(In reply to :Gijs (he/him) from comment #13)

Like any IO, it's very hard to predict. On my super fast machine where this lives on an SSD, less than 1ms. On a slow reference hardware device (laptop with spinny disk), probably several ms, more if you're unlucky.

I've not had any particularly slow startup yet, but here is a profile where it took 3ms, which is enough to have it appear in the sampling: https://perfht.ml/2UvR3Tr

There is also Bug 1278719 which proposes using Snap to distribute Firefox on Linux. The desire is to use our official builds AND to disable updates. This will likely either require using an external file or modifying the omni.ja as mentioned previously to disable app update for this distribution.

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #22)

What about having two omni.ja's for RC builds... one for beta and one for release? The two OS X bundles would have to be signed and MAR files for all platforms would have to be created for both the beta RC and release RC. This should be possible to automate without risking orphaning clients.

How much work is this from the releng side?

If this is relatively easy to do in today's world, I think we should consider moving to that model. If not, I think in the short term we could move the preprocessed file to not be a prefs.js file so it doesn't get read super early on startup as it does now, and keep everything else more or less the same.

(In reply to Mike Kaply [:mkaply] from comment #24)

If channel-prefs.js is gone, will we still be able to process other prefs files in the directory if they happen to be there.

As far as this bug goes, yes. I don't think that's an ideal situation, but that's a discussion for somewhere else ie not this bug.

Flags: needinfo?(nthomas)

(In reply to :Gijs (he/him) from comment #28)

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #22)

What about having two omni.ja's for RC builds... one for beta and one for release? ...
How much work is this from the releng side?

I'd estimate this is quite a large amount of work because it would modify files near the start of the chain of release tasks, and we have ~600 platform+locale combinations. We'd need an omni.ja modifier, and to duplicate a lot of the tasks downstream of compilation (OS X signing, partial+complete generation & signing), both development time and runtime for every RC build. omni.ja signing is probably coming too. RC builds are a special case for the release automation (ie not tested in the beta cycle) so adding more divergence would increase risk of bustage in early RC builds. Definitely not my first choice.

Stepping back, we could ask Release Management if publishing release candidates to the beta channel is still valuable. I've heard that beta and release populations are quite different, and it would be interesting to know how often we detect and fix issues in the week of bake-time on the beta channel. We might greatly simplify our discussion here if we can drop the rc-on-beta approach.

If this is relatively easy to do in today's world, I think we should consider moving to that model. If not, I think in the short term we could move the preprocessed file to not be a prefs.js file so it doesn't get read super early on startup as it does now, and keep everything else more or less the same.

A release-beta/release-aurora/etc channel change would affect the update server setup and I haven't scoped that out.

Reading the channel from a different file would be relatively straightforward to deal with, mainly adjusting the special handling of channel-prefs.js in the mar generation scripts and handling the transition. We'd probably need a watershed on beta to avoid <old-style beta> --> <new-style rc> updates which could move people onto the release channel. Other channels wouldn't need that.

Flags: needinfo?(nthomas)

(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #29)

OK, so sounds like that's not trivial. Focusing on other options...

Stepping back, we could ask Release Management if publishing release candidates to the beta channel is still valuable. I've heard that beta and release populations are quite different, and it would be interesting to know how often we detect and fix issues in the week of bake-time on the beta channel. We might greatly simplify our discussion here if we can drop the rc-on-beta approach.

Ritu, can you speak to this?

If this is relatively easy to do in today's world, I think we should consider moving to that model. If not, I think in the short term we could move the preprocessed file to not be a prefs.js file so it doesn't get read super early on startup as it does now, and keep everything else more or less the same.

A release-beta/release-aurora/etc channel change would affect the update server setup and I haven't scoped that out.

Reading the channel from a different file would be relatively straightforward to deal with, mainly adjusting the special handling of channel-prefs.js in the mar generation scripts and handling the transition. We'd probably need a watershed on beta to avoid <old-style beta> --> <new-style rc> updates which could move people onto the release channel. Other channels wouldn't need that.

I'm a little bit puzzled. What does 'watershed' mean in this context? I'm assuming the updater could rm the old channel-prefs.js file and create the new file with the "right" contents... is that a mistaken assumption?

That is, from the app's perspective, all that needs to happen is updating a bunch of references and creating a '.dat' or '.ini' or whatever file that only contains the channel, nothing else, and lives outside the defaults/pref/ directory. I assumed that the updater could take care of migrating the channel-prefs file to the new file, and ensuring the old file is removed and the new file has the "right" content. I assume it needs to be done by the updater because it will need elevated privileges on some systems. Is there something I'm missing in that respect?

Flags: needinfo?(rkothari)
Flags: needinfo?(nthomas)

nthomas, for my own understanding, the additional files would be the same as if there was an additional beta build though the only file that would need to be updated is in the omni.ja so there would be one for beta and one for release?

(In reply to :Gijs (he/him) from comment #30)

...
I'm a little bit puzzled. What does 'watershed' mean in this context? I'm assuming the updater could rm the old channel-prefs.js file and create the new file with the "right" contents... is that a mistaken assumption?

That is, from the app's perspective, all that needs to happen is updating a bunch of references and creating a '.dat' or '.ini' or whatever file that only contains the channel, nothing else, and lives outside the defaults/pref/ directory. I assumed that the updater could take care of migrating the channel-prefs file to the new file, and ensuring the old file is removed and the new file has the "right" content. I assume it needs to be done by the updater because it will need elevated privileges on some systems. Is there something I'm missing in that respect?

At a minimum to lessen update watersheds the following would need to be done:

  1. balrog would need to not update beta clients that haven't updated to a build with the change to an RC since this file is what prevents them from moving to release.
  2. the mar generation scripts would need to add the new file with the add-if-not instruction which adds the file if it doesn't exist. This is already done for the channel-prefs.js file and should be somewhat trivial.

Without the balrog changes for beta in "1." above there would have to be an update watershed on beta to update to the build previous to an RC.

Edited to add that if we go with this change that at some point we'll want balrog to advertise updates clients that have a suffix other than beta or aurora (we might not need to do this for aurora) to release with a complete update so those clients aren't orphaned.

Hi Gijs, not pushing RCs builds to Beta channel will dramatically increase the risk of unknown issues and potentially stalling the release shortly after launch on go-live day.

RC week is code freeze and the fixes taken during this week are generally ones that could lead to a dot release shortly after launch. We typically take ~10-15 fixes which are medium-high risk. I can add more data if needed. I can also provide data on regressions caught by pushing RC2/RC3 to Beta channel.

In fact, since last year, we also began pushing Fennec RC builds to Fennec release population using GPS. Unless we can mimick a similar behavior on balrog for Firefox on release channel, not pushing RC builds to Beta channel is a high-risk ridden option and not ideal.

Flags: needinfo?(rkothari)

(In reply to Ritu Kothari (:ritu) from comment #32)

...
In fact, since last year, we also began pushing Fennec RC builds to Fennec release population using GPS. Unless we can mimick a similar behavior on balrog for Firefox on release channel, not pushing RC builds to Beta channel is a high-risk ridden option and not ideal.

Hi Ritu, what would be required to "mimic" the behavior? If it were acceptable for the beta population to not receive the RC while either allowing QA to update to it to verify the ~10-15 fixes then that can be done. If more is needed would it be acceptable to also update a small population of release users to the RC?

Flags: needinfo?(rkothari)

(In reply to Florian Quèze [:florian] from comment #26)

(In reply to :Gijs (he/him) from comment #13)

Like any IO, it's very hard to predict. On my super fast machine where this lives on an SSD, less than 1ms. On a slow reference hardware device (laptop with spinny disk), probably several ms, more if you're unlucky.

I've not had any particularly slow startup yet, but here is a profile where it took 3ms, which is enough to have it appear in the sampling: https://perfht.ml/2UvR3Tr

How much is 3ms as a percentage of the overall startup?

(In reply to Chris AtLee [:catlee] from comment #34)

(In reply to Florian Quèze [:florian] from comment #26)

(In reply to :Gijs (he/him) from comment #13)

Like any IO, it's very hard to predict. On my super fast machine where this lives on an SSD, less than 1ms. On a slow reference hardware device (laptop with spinny disk), probably several ms, more if you're unlucky.

I've not had any particularly slow startup yet, but here is a profile where it took 3ms, which is enough to have it appear in the sampling: https://perfht.ml/2UvR3Tr

How much is 3ms as a percentage of the overall startup?

On slow machines, the combination of all our mainthread IO takes up a significant proportion of overall startup (in some cases, more than half). Startup is the frontend perf team's main focus for this quarter. Even if this particular file on its own seems insignificant, the combination of all these bits of IO is not sustainable and needs to go. I don't think the exact percentage/time is all that relevant, as it will vary significantly across cold/warm startup, whether there is IO contention on the machine, etc. etc. Main thread IO is bad, and we should avoid it if at all possible.

The main question is "how". We can switch to another small file that simply gets read later and off the main thread, which the update service code inside Firefox is OK with based on earlier comments. This is not crazy amounts of work and largely doesn't affect updates. The question is if there is a reasonable way we can avoid having a separate file at all, instead of pushing the IO around. It's fine if the answer to that question is "no", then we'll just move the file access alone, but it seems worth asking the bigger picture questions.

(In reply to :Gijs (he/him) from comment #30)

I'm a little bit puzzled. What does 'watershed' mean in this context? I'm assuming the updater could rm the old channel-prefs.js file and create the new file with the "right" contents... is that a mistaken assumption?

A watershed is update jargon for a version which users must update to before continuing, eg old -> intermediate -> latest. We've used these to change the compression algorithms and crypto used in the update file packaging, where the intermediate build supports the new functionality.

That is, from the app's perspective, all that needs to happen is updating a bunch of references and creating a '.dat' or '.ini' or whatever file that only contains the channel, nothing else, and lives outside the defaults/pref/ directory. I assumed that the updater could take care of migrating the channel-prefs file to the new file, and ensuring the old file is removed and the new file has the "right" content. I assume it needs to be done by the updater because it will need elevated privileges on some systems. Is there something I'm missing in that respect?

The updates currently include channel-prefs.js with an add-if-not instruction - the idea is to leave the file alone unless it has somehow disappeared. This prevents an RC update moving a beta user to release, which a simple add instruction would do by overwriting the file.

The updater doesn't support a move instruction (here's what it does support) so we have to create the new file. I think Rob and I are suggesting we just move the existing add-if-not handling to the new file. Doing that, the baked in channel will be wrong for an update to the RC build, so beta users would need a watershed where the intermediate is a genuine beta build.

Flags: needinfo?(nthomas)

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #31)

nthomas, for my own understanding, the additional files would be the same as if there was an additional beta build though the only file that would need to be updated is in the omni.ja so there would be one for beta and one for release?

Probably less wall clock time than a whole extra beta, and we would want to minimize the differences so as to provide meaningful testing of the RC (ie no re-compilation or anything). It's definitely a non-trivial amount of setup work to repack omni.ja while preserving all the special tricks we do with it, re-sign, create updates, plug them into our update verification, etc. RC1 is also a bad time to discover bustage in your automation from something that changed up to 12 weeks ago, so there'd be a time commitment to run those tests earlier in the cycle. This doesn't make much sense to me compared to reading the channel out of a new file off-main-thread.

At a minimum to lessen update watersheds the following would need to be done:

  1. balrog would need to not update beta clients that haven't updated to a build with the change to an RC since this file is what prevents them from moving to release.

Could you remind me of the data on the effect of watersheds ? Balrog's rule-based system makes it simple to implement a watershed as a new rule, and my first reaction is that it's not good to put that sort of logic inside the core instead.

Edited to add that if we go with this change that at some point we'll want balrog to advertise updates clients that have a suffix other than beta or aurora (we might not need to do this for aurora) to release with a complete update so those clients aren't orphaned.

What situations are you thinking of here ? We'd have to be careful with release-cck-<partner>, release-localtest, and so on.

(In reply to Robert Strong (Robert they/them) [:rstrong] (use needinfo to contact me) from comment #33)

(In reply to Ritu Kothari (:ritu) from comment #32)

...
In fact, since last year, we also began pushing Fennec RC builds to Fennec release population using GPS. Unless we can mimick a similar behavior on balrog for Firefox on release channel, not pushing RC builds to Beta channel is a high-risk ridden option and not ideal.

Hi Ritu, what would be required to "mimic" the behavior? If it were
acceptable for the beta population to not receive the RC while either
allowing QA to update to it to verify the ~10-15 fixes then that can be
done. If more is needed would it be acceptable to also update a small
population of release users to the RC?

Hey! QA blessing the ~10-15 fixes in RC week is insufficient mitigation of the risk of shipping a poor release. RC builds have to be pushed to a large population. Either, a) we keep it at status quo by pushing to all of Beta channel or b) supplement/replace that with a 1-5% rollout on release population with the hope of catching launch blockers during RC week as opposed to go-live week.

Flags: needinfo?(rkothari)

(In reply to Chris AtLee [:catlee] from comment #34)

(In reply to Florian Quèze [:florian] from comment #26)

(In reply to :Gijs (he/him) from comment #13)

Like any IO, it's very hard to predict. On my super fast machine where this lives on an SSD, less than 1ms. On a slow reference hardware device (laptop with spinny disk), probably several ms, more if you're unlucky.

I've not had any particularly slow startup yet, but here is a profile where it took 3ms, which is enough to have it appear in the sampling: https://perfht.ml/2UvR3Tr

How much is 3ms as a percentage of the overall startup?

Here is another profile where it took 172ms: https://perfht.ml/2KXssmS

Marking perf:p2 given the impact, and an L (month of work) amount of effort given the amount of updater/releng/builds interaction involved and making sure nothing breaks as part of the migration.

Whiteboard: [fxperf] → [fxperf:p2] [fxperfsize:L]
Component: General → Startup and Profile System
Product: Firefox Build System → Toolkit
Performance Impact: --- → P2
Keywords: perf:startup
Whiteboard: [fxperf:p2] [fxperfsize:L] → [fxperfsize:L]
Severity: normal → S3

Fyi, channel-prefs.js is being removed and replaced by a macOS Framework in bug 1799332 and will therefore address the Mac-side of this bug.

Depends on: 1799332
You need to log in before you can comment on or make changes to this bug.