Closed Bug 1749345 Opened 2 years ago Closed 2 months ago

White screen for up to 30 sec after updating Firefox using the full installer, due to FirstStartup component

Categories

(Toolkit :: General, defect, P1)

Firefox 96
defect

Tracking

()

RESOLVED FIXED
125 Branch
Performance Impact high
Tracking Status
firefox106 --- wontfix
firefox107 - wontfix
firefox108 --- wontfix
firefox109 --- wontfix
firefox110 --- wontfix
firefox111 --- wontfix
firefox112 --- wontfix
firefox125 --- fixed

People

(Reporter: rick3162, Assigned: mconley)

References

(Depends on 3 open bugs)

Details

(Keywords: perf:startup, reproducible, Whiteboard: [fidedi-ope])

Attachments

(8 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:96.0) Gecko/20100101 Firefox/96.0

Steps to reproduce:

Starting with Firefox 93, every time I updated Firefox via the Firefox Setup installer to the newer version (93 -> 94 -> 95 -> 95.0.1 -> 95.0.2-> 96).

I have Windows 11 x64 and a fast desktop system (CPU Intel i7 9700K, SSD m2, GPU MSI GTX 1660 Ti).

What I did just now was: I had Firefox 95.0.2 x64 with my everyday profile (which has 16 extensions enabled) on windows 11 .
I simply ran Firefox Setup 96.0 full installer [1]

[1] https://ftp.mozilla.org/pub/firefox/releases/96.0/win64/en-US/Firefox%20Setup%2096.0.exe

PS. I also reported the issue in http://forums.mozillazine.org/viewtopic.php?f=38&t=3083857

Actual results:

After update completed, when I first launched Firefox, there was a white screen for 30 sec and then eventually the Firefox UI was displayed.
Note that the white screen issue only occurred right after update, never on subsequent FF launches.

The issue occurs consistently on my everyday FF profile:
if I completely uninstall +reinstall 96.0
and remove the two directories:

  • c:\Users\user\AppData\Roaming\Mozilla
  • c:\Users\user\AppData\Local\Mozilla
    and then restore the former directory from my backup,
    then the 30 sec white screen occurs consistently on first FF launch.

The issue doesn't occur on a fresh FF profile.
If any dev would like to test, I could email my everyday FF profile to him.

Expected results:

The white screen should only appear momentarily, as in previous FF versions.

The Bugbug bot thinks this bug should belong to the 'Firefox::Installer' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Installer

To clarify the STR:

  • have Firefox 95.0.2 installed,
  • close Firefox, and backup your profile directory c:\Users\user\AppData\Roaming\Mozilla
  • delete the above directory, as well as c:\Users\user\AppData\Local\Mozilla
  • update to 96.0 using Firefox Setup 96.0.exe
  • launch Firefox: at initial launch I have consistently white screen for 33 seconds (I repeated it multiple times). There is no white screen in subsequent launches.

I also attach a screenshot of the white screen.

Attached image white screen.png

Some things that came up in the forum:

  1. I installed FF 96 over top of 95. I didn't use the internal updater.

  2. I also tried the following:
    First I disabled all (16) extensions. Then repeated STR:

    • in regular mode: --> white screen for 13 sec
    • in safe mode: -----> no white screen

    I kept all extensions disabled and also I disabled HWA. Then repeated STR:

    • in regular mode: --> white screen for 8 sec
    • in safe mode: -----> no white screen

We could really use a Gecko startup profile here. I don't see great instructions written down, but I think that setting MOZ_PROFILER_STARTUP=1 and setting MOZ_PROFILE_SHUTDOWN=/path/to/profile.json should be enough -- see https://searchfox.org/mozilla-central/source/intl/benchmarks/README.md.

Could you please guide me with this procedure? I am totally unfamiliar with this.

  • Firsty, where do I download mach perftest ? I can't find a link, and it's not contained in MozillaBuild.
  • And, then, how to use the two settings you gave?
    Should I save the sh fenched code in the link you gave as e.g. script.sh , and then run mach perftest script.sh ?
    I am asking because in here it has ./mach perftest perftest_script.js, i.e. .js, not .sh

(In reply to Kostas from comment #6)

Could you please guide me with this procedure? I am totally unfamiliar with this.

  • Firsty, where do I download mach perftest ? I can't find a link, and it's not contained in MozillaBuild.
  • And, then, how to use the two settings you gave?
    Should I save the sh fenched code in the link you gave as e.g. script.sh , and then run mach perftest script.sh ?
    I am asking because in here it has ./mach perftest perftest_script.js, i.e. .js, not .sh

Oh, sorry, I was not clear. The linked document just shows the environment variables to set; no need for anything with mach perftest. (In fact, no need for mach at all.)

So if you can run:

MOZ_PROFILER_STARTUP=1 MOZ_PROFILE_SHUTDOWN=/path/to/profile.json /path/to/firefox.exe

in a Windows shell, wait for startup, and then exit (without browsing, so no URLs load), that should be enough to capture a profile for us to inspect. I'm not 100% sure how to set those environment variables in a Windows shell, but maybe you know more than me here. Thanks!

Flags: needinfo?(rick3162)

Firstly, I realized that the issue ONLY occurs if I start Firefox from inside the installer, i.e. only if I keep the 'Launch Firefox now' checkbox ticked in the "Completing the Mozilla Firefox Setup Wizard" screen, to immediately launch FF (see attached screenshot).
In contrast, if I untick it and launch Firefox afterwards, either from the command prompt or from the desktop shortcut, then the issue doesn't occur (the white screen occurs only for 3 sec, instead of 33).

Updated STR:

  • close Firefox 95, and backup your profile directory c:\Users\user\AppData\Roaming\Mozilla (from FF 95)
  • delete the above directory, as well as c:\Users\user\AppData\Local\Mozilla
  • restore c:\Users\user\AppData\Roaming\Mozilla from your backup
  • uninstall Firefox, install FF 96 (no need to install 95 first), keep 'Launch Firefox now' checkbox' checkbox ticked and press "Finish" (see screenshot): I have white screen for 33 sec.

Therefore capturing a profile from windows shell probably won't help. Am I wrong? Is there a way to capture profile when launching FF immediately after setup?

 
In any case I tried both in windows console (cmd) :

set MOZ_PROFILER_STARTUP=1 
set MOZ_PROFILE_SHUTDOWN=/path/to/profile.json 
cd "C:\Program Files\Mozilla Firefox\firefox.exe"
firefox.exe

and inside Git Bash (MINGW64) these two lines:

cd ../../Program\ Files/Mozilla\ Firefox/
MOZ_PROFILER_STARTUP=1 MOZ_PROFILE_SHUTDOWN=/path/to/profile.json ./firefox.exe

And in both cases I waited for startup and then exit, and there was no output in console. Am I doing something wrong?

Flags: needinfo?(rick3162)
Severity: -- → S4

(In reply to Kostas from comment #8)

I realized that the issue ONLY occurs if I start Firefox from inside the installer

Are you frequently starting Firefox that way? Can I ask why?

Flags: needinfo?(rick3162)

I only start Firefox that way when there's a new version available. I keep track of the scheduled release dates, and when a version is available I immediately download it from https://ftp.mozilla.org/pub/firefox/releases/ (usually that is 1 day before the scheduled release date).
I never wait for the built-in updater, I want to update as soon as a new build gets released.

Flags: needinfo?(rick3162)

The same occurred today (white screen for 33 sec) after updating 96.0.3 to 97.0 via running the installer and launching FF from inside the installer.

If it helps, my everyday profile has (as mentioned in the OP) 16 extensions enabled, and 10165 bookmarks.

I don't think that we are going to be able to do anything about this without some diagnostic information. I believe that there is nothing preventing a startup profile from being collected when using the installer. Could you please set the environment variables MOZ_PROFILER_STARTUP=1 and MOZ_PROFILE_SHUTDOWN=<output_file_path> before running the installer? If you send us the output JSON file, we might be able to figure out why your startup is so slow.

Flags: needinfo?(rick3162)

If you send us the output JSON file, we might be able to figure out why your startup is so slow.

I ran in command prompt the following:

set MOZ_PROFILER_STARTUP=1 
set MOZ_PROFILE_SHUTDOWN=/path/to/profile.json 
Firefox Setup 97.0.exe

and chose to launch Firefox in the installer, waited for the 33 sec delay for FF to launch and then I closed it.
In the end, there was no JSON output in command prompt (I attach a screenshot). And no profile.json file was created anywhere in my ssd (I searched the entire disk).

So, what output JSON file are you referring to? Where is it created? Inside the profile directory? What's its filename?
As I also asked in comment #8, am I doing something wrong?

Flags: needinfo?(rick3162)
Attached image command prompt.jpg

I believe that whatever path you assign to MOZ_PROFILE_SHUTDOWN is where the output file will be saved. You probably don't want to use /path/to/profile.json. Instead you might want something like C:\Users\Kostas\profile.json.

Flags: needinfo?(rick3162)

Ok, I tried all the following but unfortunately no profile.json file gets created in any case:

set MOZ_PROFILE_SHUTDOWN=C:\Users\Kostas\profile.json
set MOZ_PROFILE_SHUTDOWN="C:\Users\Kostas\profile.json"
set MOZ_PROFILE_SHUTDOWN=profile.json
set MOZ_PROFILE_SHUTDOWN=/Users/Kostas/profile.json
Flags: needinfo?(rick3162)

Hmm, I tried this out, and I'm getting the same experience. At first I thought that MOZ_PROFILER_STARTUP and MOZ_PROFILE_SHUTDOWN were getting dropped out of the environment somewhere between the installer starting and Firefox starting. But I can actually see them in Firefox in about:support. So they are being set. But I'm seeing no profile.json being created.

These are the steps I tried:

  1. In my (MSYS) shell, I ran MOZ_PROFILER_STARTUP=1 MOZ_PROFILE_SHUTDOWN=/c/Users/bytesized/profile.json firefox-98.0a1.en-US.win64.installer.exe
  2. I ran through the installer, as usual. I made sure to check the "Launch Firefox" checkbox at the end of the installer.
  3. When Firefox launched, I navigated to about:support and verified that MOZ_PROFILER_STARTUP and MOZ_PROFILE_SHUTDOWN were set appropriately.
  4. Exited Firefox
  5. Checked for profile.json in the specified directory and found that it was not there.

@mossop I've never actually created a startup profile before. Am I doing it incorrectly?

Flags: needinfo?(dtownsend)

(In reply to Kirk Steuber (he/him) [:bytesized] from comment #18)

@mossop I've never actually created a startup profile before. Am I doing it incorrectly?

I've actually never done it either. Mike can you help or recommend someone who can?

Flags: needinfo?(dtownsend) → needinfo?(mconley)

Going to redirect to julianw who probably has a better handle on how this stuff currently works.

Flags: needinfo?(mconley) → needinfo?(felash)

I wonder if the sandbox gets in the way here.
Redirecting to Gerald who knows Windows and startup profiling way better than I do :-)

Flags: needinfo?(felash) → needinfo?(gsquelart)

TL;DR for Gerald: we want to run a startup profile for Firefox running directly from the installer.
But the first step could be to understand how to run a startup profile on Windows at all. Do you have tips and tricks?

(In reply to Kirk Steuber (he/him) [:bytesized] from comment #18)

  1. In my (MSYS) shell, I ran MOZ_PROFILER_STARTUP=1 MOZ_PROFILE_SHUTDOWN=/c/Users/bytesized/profile.json firefox-98.0a1.en-US.win64.installer.exe

If that's exactly what you wrote, the error is a missing 'R' in the 2nd env-var, it should be: MOZ_PROFILER_SHUTDOWN=....

If you still can't find the expected shutdown file, you could try without it since we're only interested in the startup:

  1. MOZ_PROFILER_STARTUP=1 installer.exe
  2. When Firefox has started, quickly navigate to profiler.firefox.com, and click "+ enable Firefox Profiler Menu Button".
  3. There should be a popup appearing near the top-right (otherwise click the profiler icon), click "Capture".
  4. After a few seconds transferring the data, and then getting symbols, you should end up on the profiler web page with your startup profile.
  5. click "Upload Local Profile", you can hide some private data if you wish, then click "Upload".
    This should give a URL starting with "share.firefox.dev", which can be shared here or privately.

Good luck! And thank you Kostas for trying to help.

Flags: needinfo?(gsquelart)

Gerald, thanks for the instructions. Following them I managed to make a profile successfully (Ι ticked all 5 checkboxes) .
I've just sent the profile link to Nick, Kirk, Julien and Gerald's emails as I'd prefer to be shared privately.

You're welcome! Always glad to be of help!

Normally I would ask debugging questions of other engineers publicly in this bug. But since the profile link was sent privately, that doesn't really work super well. So I'm going to internally ask some other engineers to help me look into this and I'll update this bug with our findings.

Sorry for the inconvenience, it's just that I hesitate to post the profile publicly. Thanks for all the help.

(In reply to Kostas from comment #27)

Sorry for the inconvenience, it's just that I hesitate to post the profile publicly. Thanks for all the help.

Not a problem :)


It looks like we may have tracked down the source of the problem. There is a component called FirstStartup that I wasn't previously aware of. It only runs when Firefox was launched from the installer and it explicitly blocks the first application window from loading until it completes, which sounds consistent with your experience. And it times out (gives up) after 30 seconds. It looks from the profile provided that your installation is indeed spending the full 30 seconds waiting for it. It's not immediately clear to me what the reason is for it is consistently timing out for you though.

It's also not entirely clear to me what the right fix is. This feature wasn't created by this team, so I'm going to try to get some people involved that can help figure out what we want to do about this.

@rhelmer - When we talked about this, it sounded like you thought that this code may not be necessary at all anymore. Could you redirect to someone that would know for sure?

Flags: needinfo?(rhelmer)

(In reply to Kirk Steuber (he/him) [:bytesized] from comment #28)

(In reply to Kostas from comment #27)

Sorry for the inconvenience, it's just that I hesitate to post the profile publicly. Thanks for all the help.

Not a problem :)


It looks like we may have tracked down the source of the problem. There is a component called FirstStartup that I wasn't previously aware of. It only runs when Firefox was launched from the installer and it explicitly blocks the first application window from loading until it completes, which sounds consistent with your experience. And it times out (gives up) after 30 seconds. It looks from the profile provided that your installation is indeed spending the full 30 seconds waiting for it. It's not immediately clear to me what the reason is for it is consistently timing out for you though.

It's also not entirely clear to me what the right fix is. This feature wasn't created by this team, so I'm going to try to get some people involved that can help figure out what we want to do about this.

@rhelmer - When we talked about this, it sounded like you thought that this code may not be necessary at all anymore. Could you redirect to someone that would know for sure?

Thanks! Kate, I see isFirstStartup being exposed by Nimbus, do you know if this feature is still in-use and required?

If so, we should look into why the UI is showing at all (there may have been a startup optimization in the meantime), and think about adjusting the time-out. We have telemetry on this that should help to see how many users are running into this. I think it would also be appropriate to re-consider if this is how we want to do first-startup experiments (an alternative would be modifying the installers to bundle the info they need vs. fetching on first-startup-after-install like this, we modify installers to add attribution codes for instance) since the current approach hasn't been re-visited in a several years, now.

Flags: needinfo?(rhelmer) → needinfo?(khudson)
Flags: needinfo?(khudson) → needinfo?(brennie)

I can reproduce this with release installers (albeit for ~15s instead of 30s, which is probably due to system factors), but I cannot seem to reproduce this with an installer produced by ./mach package with --enable-release. It also seems to require a profile that's been used somewhat. I've created a new profile and upgraded over it and it doesn't reproduce.

Some possible workarounds I've thought of include

  • checking ProfileAge.jsm and baililng out early if we are in an existing profile
  • having the installer detect we have an existing profile and not pass --first-startup
  • moving first startup into the installer to run while we install firefox

I'm not sure why we are painting a blank window as I was under the impression that we should only start painting once First Startup finished.

Flags: needinfo?(brennie)

It seems like this issue is being caused within Firefox rather than by the installer, so I'm going to try to find a better home for it where it is more likely to be fixed.

(In reply to Robert Helmer [:rhelmer] from comment #29)

I see isFirstStartup being exposed by Nimbus

Perhaps Nimbus is a better place for it?

Severity: S4 → --
Component: Installer → Nimbus Desktop Client

I don't know that this is actually Nimbus. While Nimbus does check isFirstStartup, the FirstStartup module itself is its own thing

Moving to Toolkit :: General because that was that was the component for bug 1576507, which introduced FirstStartup.

Component: Nimbus Desktop Client → General
Product: Firefox → Toolkit

The severity field is not set for this bug.
:mossop, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dtownsend)
Performance Impact: --- → ?

I've cleared this needinfo and punted to the perf team for Performance triage.

Flags: needinfo?(dtownsend)

Hey there, is there something else we need to do for this bug to be triaged by the perf team?

Flags: needinfo?(bas)

Jeff, does Graphics still do certain things on first startup for a new profile?

Flags: needinfo?(bas) → needinfo?(jmuizelaar)

Barrett, I can't seem to reproduce this locally, any chance (if you can still reproduce this) you might be able to grab a startup profile? (See: https://profiler.firefox.com/docs/#/./guide-startup-shutdown)

Flags: needinfo?(brennie)

The Performance Priority Calculator has determined this bug's performance priority to be P2. If you'd like to request re-triage, you can reset the Performance flag to "?" or needinfo the triage sheriff.

Platforms: Windows
Impact on browser UI: Causes noticeable startup delay
[x] Able to reproduce locally

Performance Impact: ? → medium

There's some quick graphics tests that can happen but if we can reproduce this we should just look at what the profile says is going on.

Flags: needinfo?(jmuizelaar)

I was able to reproduce. The white screen only lasted a few seconds, and the blank white paint was not captured in the profiler screenshots, but it happened.
profile

Flags: needinfo?(brennie)

(In reply to Robert Helmer [:rhelmer] from comment #29)

If so, we should look into why the UI is showing at all (there may have been a startup optimization in the meantime), and think about adjusting the time-out. We have telemetry on this that should help to see how many users are running into this.

mak and I found some two-year old telemetry data on this suggesting that 3% of first startup users might be timing out. That seems like a lot given the bad this failure seems from a user's perspective, on a first run? Setting severity to S2 for that. Do we have some more recent data on this? The original probes seem to have expired.

Severity: -- → S2
Flags: needinfo?(rhelmer)

(In reply to Bas Schouten (:bas.schouten) from comment #38)

The Performance Priority Calculator has determined this bug's performance priority to be P2. If you'd like to request re-triage, you can reset the Performance flag to "?" or needinfo the triage sheriff.

Platforms: Windows
Impact on browser UI: Causes noticeable startup delay
[x] Able to reproduce locally

I think it might be a good idea to reevaluate, see previous comments.

Performance Impact: medium → ?

(In reply to Dão Gottwald [::dao] from comment #41)

(In reply to Robert Helmer [:rhelmer] from comment #29)

If so, we should look into why the UI is showing at all (there may have been a startup optimization in the meantime), and think about adjusting the time-out. We have telemetry on this that should help to see how many users are running into this.

mak and I found some two-year old telemetry data on this suggesting that 3% of first startup users might be timing out. That seems like a lot given the bad this failure seems from a user's perspective, on a first run? Setting severity to S2 for that. Do we have some more recent data on this? The original probes seem to have expired.

Two things come to mind:

  1. that timeout is likely way too long, the payload here is tiny. I'd guess that this is more due to folks blocking connections or being offline.
  2. we should re-enable telemetry

That's all assuming this feature is still in use and useful for first-run experiments (where remote data is required before the first time Firefox launches). I haven't worked on this in a few years but happy to help get it fixed.

Flags: needinfo?(rhelmer)
Depends on: 1789850
Depends on: 1789851
Depends on: 1789852
Summary: White screen for up to 30 sec after updating Firefox using the full installer → White screen for up to 30 sec after updating Firefox using the full installer due to FirstStartup component
Summary: White screen for up to 30 sec after updating Firefox using the full installer due to FirstStartup component → White screen for up to 30 sec after updating Firefox using the full installer, due to FirstStartup component
Status: UNCONFIRMED → NEW
Ever confirmed: true

(In reply to Dão Gottwald [::dao] from comment #41)

mak and I found some two-year old telemetry data on this suggesting that 3% of first startup users might be timing out.

I'm going to count this as "multiple reporters". This ups the impact to "high".

The Performance Impact Calculator has determined this bug's performance impact to be high. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Platforms: Windows
Impact on browser UI: Causes noticeable startup delay
[x] Able to reproduce locally
[x] Multiple reporters

Performance Impact: ? → high

Any update on this? Were you able to generate some attention for this to get prioritized, molly? Do we even need that at this point or can Rob go ahead and own this?

Flags: needinfo?(rhelmer)
Flags: needinfo?(mhowell)

I did run this by product, and they told me they'd add this to their list to prioritize for 107; I think they've been too busy with MR stuff to really pay a lot of attention. I don't think we need anything else there if Rob is willing to take this.

Flags: needinfo?(mhowell)
Assignee: nobody → rhelmer
Flags: needinfo?(rhelmer)

[Tracking Requested - why for this release]: See comments 41 and below.

Has there been any progress on this?
Tracking it for 107 as requested in comment 47.
Comment 46 indicated it might be addressed for 107, but wondering if that is still expected.
107 is now in beta, uplifts to 107 beta would depend on the fix

Flags: needinfo?(rhelmer)

(In reply to Donal Meehan [:dmeehan] from comment #48)

Has there been any progress on this?
Tracking it for 107 as requested in comment 47.
Comment 46 indicated it might be addressed for 107, but wondering if that is still expected.
107 is now in beta, uplifts to 107 beta would depend on the fix

We should re-enable telemetry so we can measure the current situation (and be able to tell if a fix actually works), I'll do that today. Thanks for the heads-up sorry for the delay.

Flags: needinfo?(rhelmer)
Attachment #9299598 - Attachment description: WIP: Bug 1749345 - re-enable telemetry for FirstStartup service → Bug 1749345 - re-enable telemetry for FirstStartup service
Attachment #9299598 - Attachment description: Bug 1749345 - re-enable telemetry for FirstStartup service → Bug 1749345 - re-enable telemetry for FirstStartup service r?chutten

I've put up a patch to re-enable Telemetry for this feature, and remove the expiration (that latter part might require a data review). Telemetry should start being sent for:

  • statusCode: timed out, success etc.
  • elapsed: elapsed time in ms before success or timeout

What I'd like to do next is to run an experiment reducing the the first-startup.timeout pref (per the docs) and see how that changes the above values.

I suspect that the current timeout is way too high, and that making it much lower (a few seconds) will still enroll a sufficient number of users to run experiments, while not showing a blank white screen for an extended period of time. We can roll out this out as a pref change once we've determined what a better value is here, and make a change to the default (which can ride the trains but we don't need to block on it to make things better for users).

Longer-term I think it would be better to get this feature out of the startup path, and maybe out of Firefox entirely - it seems like fetching experiments and placing them somewhere Nimbus can access could be an action the installer does or maybe a separate process that the installer runs, or something along those lines.

I see that there's a WDBA-related cleanup task that happens in FirstStartup as well as of https://bugzilla.mozilla.org/show_bug.cgi?id=1705281 - so the impact of timeout means that this would not be finished as well.

I don't think this needs to change the pref experiment/measure/rollout approach outlined in comment 51, but the impact of a timeout is not just that we didn't download experiments, it could be that this cleanup task didn't finish. It's probably worth adding a little bit more to telemetry so we can tell the difference, I can add that as well.

(In reply to Robert Helmer [:rhelmer] from comment #51)

What I'd like to do next is to run an experiment reducing the the first-startup.timeout pref (per the docs) and see how that changes the above values.

I suspect that the current timeout is way too high, and that making it much lower (a few seconds) will still enroll a sufficient number of users to run experiments, while not showing a blank white screen for an extended period of time. We can roll out this out as a pref change once we've determined what a better value is here, and make a change to the default (which can ride the trains but we don't need to block on it to make things better for users).

Small correction here - it occurred to me that since this is too early in startup to be able to reliably pull this pref from the experimenter system, we'll have to let the pref change ride the trains to take effect. We should still re-enable telemetry as a first step, we can then watch the change as it rolls out across the channels.

(In reply to Kostas from comment #25)

Gerald, thanks for the instructions. Following them I managed to make a profile successfully (Ι ticked all 5 checkboxes) .
I've just sent the profile link to Nick, Kirk, Julien and Gerald's emails as I'd prefer to be shared privately.

You're welcome! Always glad to be of help!

I asked Julien to share this profile with me. Here is what I see in it:

  • The add-on manager starts first.
  • Some add-ons (including at least one ad blocker) get initialized asynchronously at this point.
  • We spin a "FirstStartup.jsm:init" nested event loop, that lasts 30.2s. This nested event loop starts before add-ons are done initializing.
  • During the nested event loop, there's a network request being made to https://normandy.cdn.mozilla.net/api/v1/ This is the first network request of the entire profile, and it remains in the "Waiting for socket thread" state (a slightly misleading name used in the profiler network markers to say "not really started yet") for 31.3s.

So my impression is that something (quite possibly the ad blocker) messes with Firefox's ability to do network requests during the "FirstStartup.jsm:init" nested event loop. For engineers trying to reproduce, I would suggest installing uBlock Origin in the profile used for testing.

(In reply to Barret Rennie [:barret] (they/them) from comment #30)

I'm not sure why we are painting a blank window as I was under the impression that we should only start painting once First Startup finished.

There's a 32.9s earlyBlankWindowVisible marker, which indicates the code at https://searchfox.org/mozilla-central/rev/12a18f7e112a4dcf88d8441d439b84144bfbe9a3/browser/components/BrowserGlue.jsm#1461 ran before "First Startup" (which I think is expected, as that code tries to be the first piece of JS code to run during early startup).

Hi, I'm the tech lead for the Nimbus project. Nimbus uses the isFirstStartup module to remotely fetch experiments on the first start of a new install specifically for experiments targeting the onboarding experience (about:welcome). For non first startup initializations, the fetch happens asynchronously (ie not blocking first paint) and applies experiment enrollment and configuration in the background. It seems to me in this particular case, ie using the installer to overwrite an existing installation with an existing profile, there is no need for that blocking fetch to occur since when the browser session is available, no onboarding experience will trigger since it's taking place in an existing profile. Then a possible fix to this is to not trigger that isFirstStartup blocking remote fetch at all for cases where the browser is starting with an existing profile. Whether that means not running the isFirstStartup module at all after the installer runs on an existing profile (which may have other impacts not related to Nimbus), or simply preventing the blocking Remote Settings fetch within the isFirstStartup module for cases where a profile exists, I'm not certain.

:rhelmer, we are near the end of the beta cycle for 107. The final 107 builds on 2022-11-03.
Given Comment 54 and Comment 57, does not look like we have a fix ready for this.
Added tracking to 108 for the next release as this is an S2.
Setting 107 to fix-optional, if later you have a fix then we could consider it as a ride-along for a 107 dot release.

Flags: needinfo?(rhelmer)

(In reply to Donal Meehan [:dmeehan] from comment #58)

:rhelmer, we are near the end of the beta cycle for 107. The final 107 builds on 2022-11-03.
Given Comment 54 and Comment 57, does not look like we have a fix ready for this.
Added tracking to 108 for the next release as this is an S2.
Setting 107 to fix-optional, if later you have a fix then we could consider it as a ride-along for a 107 dot release.

Thanks, I need to get a data review on the telemetry patch. Jared made a good observation in https://bugzilla.mozilla.org/show_bug.cgi?id=1749345#c57 that this is likely only happening when there's an existing profile, so I might be able to craft a real fix for this and not just work around it.

In any case I would like to get the telemetry landed first.

Flags: needinfo?(rhelmer)

did you plan on landing the telemetry in 108 or just let this ride the trains?

Flags: needinfo?(rhelmer)

(In reply to Dianna Smith [:diannaS] from comment #60)

did you plan on landing the telemetry in 108 or just let this ride the trains?

I have a hunch what's going on here and I want to instrument the telemetry a bit better, if it's possible to uplift to 108 that would be great.

Flags: needinfo?(rhelmer)

Hey rhelmer, what's the likelihood you'll have time to come back to this in the short term? If it's a low likelihood, should we unassign you and find someone to drive this over the line?

Flags: needinfo?(rhelmer)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #62)

Hey rhelmer, what's the likelihood you'll have time to come back to this in the short term? If it's a low likelihood, should we unassign you and find someone to drive this over the line?

If someone else is available that'd be great, happy to chat about it but the last few months did not pan out the way I expected. Thanks!

Assignee: rhelmer → nobody
Flags: needinfo?(rhelmer)
Assignee: nobody → mconley
Priority: -- → P1
Attachment #9299598 - Attachment description: Bug 1749345 - re-enable telemetry for FirstStartup service r?chutten → Bug 1749345 - re-enable telemetry for FirstStartup service r?chutten!
Attachment #9299615 - Attachment description: WIP: Bug 1749345 - correct documentation regarding default FirstStartup timeout → Bug 1749345 - correct documentation regarding default FirstStartup timeout. r?rhelmer!
Attachment #9320098 - Flags: data-review?(chutten)

Comment on attachment 9320098 [details]
Data collection request.md

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Mike Conley <mconley@mozilla.com> and Robert Helmer <rhelmer@mozilla.com> are responsible.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does the data collection use a third-party collection tool?

No.


Result: datareview+

Attachment #9320098 - Flags: data-review?(chutten) → data-review+
Attachment #9299598 - Attachment description: Bug 1749345 - re-enable telemetry for FirstStartup service r?chutten! → Bug 1749345 - re-enable telemetry for FirstStartup service. data-review=chutten, r?chutten!
Duplicate of this bug: 1789851
Attachment #9299615 - Attachment description: Bug 1749345 - correct documentation regarding default FirstStartup timeout. r?rhelmer! → Bug 1749345 - correct documentation regarding default FirstStartup timeout. r?trivial
Attachment #9299615 - Attachment description: Bug 1749345 - correct documentation regarding default FirstStartup timeout. r?trivial → Bug 1749345 - correct documentation regarding default FirstStartup timeout. r=trivial
Duplicate of this bug: 1789850
Keywords: leave-open
Attachment #9299615 - Attachment description: Bug 1749345 - correct documentation regarding default FirstStartup timeout. r=trivial → Bug 1749345 - correct documentation regarding default FirstStartup timeout. r?rhelmer!
Attachment #9299615 - Attachment description: Bug 1749345 - correct documentation regarding default FirstStartup timeout. r?rhelmer! → Bug 1749345 - correct documentation regarding default FirstStartup timeout. r=trivial
Attachment #9299615 - Attachment description: Bug 1749345 - correct documentation regarding default FirstStartup timeout. r=trivial → Bug 1749345 - correct documentation regarding default FirstStartup timeout.
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e492108d710d
re-enable telemetry for FirstStartup service. data-review=chutten, r=chutten
https://hg.mozilla.org/integration/autoland/rev/f4089978f004
correct documentation regarding default FirstStartup timeout. r=kcochrane
See Also: → 1819720

Comment on attachment 9299598 [details]
Bug 1749345 - re-enable telemetry for FirstStartup service. data-review=chutten, r?chutten!

Beta/Release Uplift Approval Request

  • User impact if declined: None.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This is a Telemetry uplift to give us greater visibility into how long the FirstStartup module is taking to do its thing after an install from the stub installer on Windows. The sooner we know how this is operating in the wild, the sooner we can tune the timeout from 30s to something more reasonable.
  • String changes made/needed: None.
  • Is Android affected?: No
Attachment #9299598 - Flags: approval-mozilla-beta?
Attachment #9299615 - Flags: approval-mozilla-beta?

Comment on attachment 9299598 [details]
Bug 1749345 - re-enable telemetry for FirstStartup service. data-review=chutten, r?chutten!

Approved for 111.0 RC1

Attachment #9299598 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9299615 [details]
Bug 1749345 - correct documentation regarding default FirstStartup timeout.

Approved for 111.0 RC1

Attachment #9299615 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Setting 111 to affected, bug has a leave-open keyword and the uplifts in Comment 73 are telemetry only

Regressions: 1820068

Comment on attachment 9299598 [details]
Bug 1749345 - re-enable telemetry for FirstStartup service. data-review=chutten, r?chutten!

Since the bug is in a leave-open state, removing uplift approved flag to remove this from missed uplift queries.

Attachment #9299598 - Flags: approval-mozilla-beta+
Attachment #9299615 - Flags: approval-mozilla-beta+

Enough time has passed that we should have a clear sense of how long this is taking in the wild given the probe we landed around comment 70, so needinfo'ing myself to look at the probe data and determine if we can shorten the timeout to something a little more reasonable than 30 seconds.

Flags: needinfo?(mconley)

We've got data visualized here: https://glam.telemetry.mozilla.org/fog/probe/first_startup_elapsed/explore?aggType=avg&app_id=release&currentPage=1&ping_type=*&ref=2023060821&visiblePercentiles=%5B99%2C95%2C75%2C50%2C25%2C5%5D

Hey rhelmer, what do you think of this? Do you think 10s would do? That'll get us past the 75th percentile... what threshold should we be looking for here?

Flags: needinfo?(mconley) → needinfo?(rhelmer)

(In reply to Mike Conley (:mconley) (:⚙️) from comment #77)

We've got data visualized here: https://glam.telemetry.mozilla.org/fog/probe/first_startup_elapsed/explore?aggType=avg&app_id=release&currentPage=1&ping_type=*&ref=2023060821&visiblePercentiles=%5B99%2C95%2C75%2C50%2C25%2C5%5D

Hey rhelmer, what do you think of this? Do you think 10s would do? That'll get us past the 75th percentile... what threshold should we be looking for here?

Thanks for re-enabling this (and upgrading to Glean!) So I think the trade-off is really that it will impact enrollment for studies that need to be run on first-run. I'd guess that the biggest impact is that it might take longer to enroll a sufficient sample of users, because some amount of them will time out. The other side of the trade-off is that we want the installation to be a good experience for users and not appear to hang.

Jared, do you have any data from Nimbus that might help here? Thanks!

Flags: needinfo?(rhelmer) → needinfo?(jkerim)

So I spoke to jlockhart about this over Zoom.

The problem with just landing a number here, like 15, or 10, as our new timeout, is that it potentially biases the population experiencing onboarding treatments. We (currently) have no idea how much bias it would introduce if any, but that's the question that needs to be answered before we can settle on a number here. jlockhart suggests I reach out to Daniel Berry from data science to see if we can come up with a sense of whether or not there's wiggle room to move this timeout without introducing population bias.

Flags: needinfo?(jkerim)
See Also: → 1841138
See Also: → 1843904
See Also: → 1868951

Mike, what is the result of the probe? This is a P1/S2 and it seems users are still hitting it. Thanks!

I'm looking at the Normandy init times here: https://glam.telemetry.mozilla.org/fog/probe/first_startup_normandy_init_time/explore?aggType=avg&app_id=release&currentPage=1&ping_type=*&timeHorizon=ALL

It looks like in some cases, Normandy init time is taking 30 seconds or more. The issue is that if we reduce the timeout length, we will skew the population of people enrolled in startup experiments, introducing bias.

What we desperately need here is an understanding of why Normandy init is so slow. Going through the comments here again, it looks like we've tried (and failed) to get performance profiles recorded which might help us get a sense of where the bottleneck is. Still, I'm convinced at this point that that's the right way to move this forward - get a performance profile from one of the affected users, identify and fix the bottleneck - as opposed to changing the timeout.

Ah, I realize now that there's an older profile in comment 40 (thanks for pointing that out, florian)

florian also directed me to comment 57, which I guess I'd missed. jlockhart's solution sounds good to me - if we can have the installer skip the --first-startup step in the presence of an existing user profile, we might be in business to improve this.

We should also be making sure that calls to --first-startup keep the stub installer window open and in the foreground, so that we include this initialization as part of the stub installer experience, rather than it being part of browser startup.

Hey bytesized, who's working on the installer these days? Who should we talk to about making such changes?

Flags: needinfo?(bytesized)

That would be my team, Desktop Integrations. But the installer has no knowledge of profiles, so I would guess that it won't be easy to make the change you are describing. It might be easier to have Firefox ignore the switch in the presence of an existing profile instead, since Firefox already knows how to check profile existence.

Flags: needinfo?(bytesized)

That sounds reasonable.

Hey rhelmer, do you have cycles for this (comment 84)?

Flags: needinfo?(rhelmer)

(In reply to Mike Conley (:mconley) (:⚙️) (PTO: Dec 19 - Jan 2 inc) from comment #85)

That sounds reasonable.

Hey rhelmer, do you have cycles for this (comment 84)?

The end of the year is looking pretty busy for me, I will probably have cycles in the next few weeks. If anyone else wants to look at it sooner i think comment 84 makes sense and I could help someone else get started too.

Flags: needinfo?(rhelmer)

I see that we can easily have the browser ignore first-startup around https://searchfox.org/mozilla-central/rev/3bd65516eb9b3a9568806d846ba8c81a9402a885/browser/components/BrowserContentHandler.sys.mjs#625, but there's an interaction between painting and the installer that I'm not 100% sure will work. In any case, I've added a Jira ticket that will get this into the Desktop Integrations triage.

Whiteboard: [fidedi-ope]

Just double-checking here - nalexander, if a user is installing Firefox, and they've got a pre-existing install AND a pre-existing user profile, is it still necessary to delete all of the TaskScheduler tasks here: https://searchfox.org/mozilla-central/rev/961556b584938a694d7df6e7bf25d70253c37060/toolkit/modules/FirstStartup.sys.mjs#62

or is that something we can also skip in the pre-existing profile case?

Flags: needinfo?(nalexander)

The thinking here being that if a previous profile exists, then the
--first-startup argument is probably be passed because the user is
reinstalling on a system that still has (or once had) the browser
already installed on it. In that case, we're going to use that
pre-existing profile, and we don't need to do the FirstStartup
things, since they're primarily for systems where a new profile
is being created after install.

(In reply to Mike Conley (:mconley) (:⚙️) from comment #88)

Just double-checking here - nalexander, if a user is installing Firefox, and they've got a pre-existing install AND a pre-existing user profile, is it still necessary to delete all of the TaskScheduler tasks here: https://searchfox.org/mozilla-central/rev/961556b584938a694d7df6e7bf25d70253c37060/toolkit/modules/FirstStartup.sys.mjs#62

or is that something we can also skip in the pre-existing profile case?

I think we can skip deleting them.

Flags: needinfo?(nalexander)
Keywords: leave-open
Blocks: 1877545
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f6de722199b9
Skip FirstStartup initting if a previous profile exists. r=nalexander,rhelmer

Hey tmaity, this seems like an important one for us to get right. Are there any cycles for QA to verify this before it goes out to broader release?

Flags: needinfo?(tmaity)
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 124 Branch

Hani is working to verify this issue.

Flags: needinfo?(tmaity) → needinfo?(hyacoub)
No longer duplicate of this bug: 1789850

I couldn't managed to reproduce the issue following the exact steps mentioned in comment 8 using the profiles that I already have on my end. I tried on the builds where this bug was initially discovered Firefox 95 and Firefox 96 and also on recent Firefox builds (Firefox 100 and 101). On both cases the white screen lasted for a couple of seconds.

Mike, are there any other steps to follow which could help us reproducing this bug?
Thanks.

Flags: needinfo?(hyacoub) → needinfo?(mconley)

Mike, are there any other steps to follow which could help us reproducing this bug?

It might be helpful to reproduce the "bad" case if you artificially slow your network down so that downloading the startup information for Normandy takes longer. Do you have tools for slowing down your network?

Flags: needinfo?(mconley) → needinfo?(hyacoub)

If you could recommend one that would be great.

Flags: needinfo?(hyacoub) → needinfo?(mconley)

Backed out changeset f6de722199b9 (bug 1749345) backout on mconley's request.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 124 Branch → ---

Hey Hani,

You could try Fiddler, which is a free-to-try Windows utility: https://www.telerik.com/fiddler/fiddler-classic

Simulating slow networks with Fiddler: https://blog.testproject.io/2021/09/27/simulating-slow-network-connection-for-testing/#Fiddler

Flags: needinfo?(mconley)
Attachment #9376712 - Attachment description: Bug 1749345 - Skip FirstStartup initting if a previous profile exists. r?nalexander!,rhelmer! → Bug 1749345 - Skip FirstStartup initting if a previous profile exists. r?nalexander!,rhelmer!,aminomancer!

Thanks Mike, I managed to reproduce the white screen following the steps from comment 8 using Fiddler to simulate a slow Internet connection.
Since this bug was reopened I will postpone the verification.

I'll go ahead and re-land the fix. The test can land separately.

Keywords: leave-open
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3908f0d77cd1
Skip FirstStartup initting if a previous profile exists. r=nalexander,rhelmer,aminomancer
Keywords: leave-open
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3812a275a1e1
Test that FirstStartup initting is only run for new profiles. r=rhelmer
Status: REOPENED → RESOLVED
Closed: 3 months ago2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 125 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: