Closed Bug 1732076 Opened 4 years ago Closed 3 years ago

Very slow startup of Firefox on Windows causes geckodriver to kill the process after the startup timeout (error: No connection could be made because the target machine actively refused it)

Categories

(Toolkit :: Startup and Profile System, defect)

defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: boni.garcia, Unassigned)

References

Details

Attachments

(9 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36

Steps to reproduce:

A very basic Selenium WebDriver test using Firefox and executed on Windows fails from time to time.

Environment:

OS: Windows Server 2019
Browser: Firefox
Browser version: 92.0
Browser Driver version: geckodriver 0.30.0
Selenium WebDiver: Java language bindings (selenium-java 4.0.0-rc-1)

Actual results:

I run this test in windows-latest (Windows Server 2019) on GitHub Actions. Randomly, it fails showing the following error traces (see the complete execution logs):

Sep 17, 2021 10:05:34 PM org.openqa.selenium.os.OsProcess destroy
INFO: Unable to drain process streams. Ignoring but the exception being swallowed follows.
org.apache.commons.exec.ExecuteException: The stop timeout of 2000 ms was exceeded (Exit value: -559038737)
	at org.apache.commons.exec.PumpStreamHandler.stopThread(PumpStreamHandler.java:295)
	at org.apache.commons.exec.PumpStreamHandler.stop(PumpStreamHandler.java:181)
	at org.openqa.selenium.os.OsProcess.destroy(OsProcess.java:135)
	at org.openqa.selenium.os.CommandLine.destroy(CommandLine.java:152)
	at org.openqa.selenium.remote.service.DriverService.stop(DriverService.java:278)
	at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:151)
	at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:547)
	at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:244)
	at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:166)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:205)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:193)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:171)
	at io.github.bonigarcia.webdriver.junit4.ch2.helloworld.HelloWorldFirefoxJUnit4Test.setup(HelloWorldFirefoxJUnit4Test.java:46)
	...

Sep 17, 2021 10:05:34 PM org.openqa.selenium.os.OsProcess destroy
SEVERE: Unable to kill process java.lang.ProcessImpl@48d7ad8b
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 141.686 sec <<< FAILURE!
test(io.github.bonigarcia.webdriver.junit4.ch2.helloworld.HelloWorldFirefoxJUnit4Test)  Time elapsed: 140.584 sec  <<< ERROR!
org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Response code 500. Message: No connection could be made because the target machine actively refused it. (os error 10061)
Build info: version: '4.0.0-rc-1', revision: 'bc5511cbda'
System info: host: 'fv-az159-878', ip: '10.1.0.165', os.name: 'Windows Server 2019', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_302'
Driver info: org.openqa.selenium.firefox.FirefoxDriver
Command: [null, newSession {capabilities=[Capabilities {acceptInsecureCerts: true, browserName: firefox, moz:debuggerAddress: true, moz:firefoxOptions: {}}], desiredCapabilities=Capabilities {acceptInsecureCerts: true, browserName: firefox, moz:debuggerAddress: true, moz:firefoxOptions: {}}}]
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:126)
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:84)
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:62)
	at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:156)
	at org.openqa.selenium.remote.service.DriverCommandExecutor.invokeExecute(DriverCommandExecutor.java:164)
	at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:139)
	at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:547)
	at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:244)
	at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:166)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:205)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:193)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:171)
	at io.github.bonigarcia.webdriver.junit4.ch2.helloworld.HelloWorldFirefoxJUnit4Test.setup(HelloWorldFirefoxJUnit4Test.java:46)
	...

I experience this problem in different repositories, for example:

https://github.com/bonigarcia/webdrivermanager/runs/3650063697?check_suite_focus=true
https://github.com/bonigarcia/webdrivermanager-basic/runs/3663919684?check_suite_focus=true
https://github.com/bonigarcia/selenium-webdriver-java/runs/3662711944?check_suite_focus=true

Expected results:

This test should pass. The same test is being executed also in ubuntu-latest (Ubuntu 20.04) and macos-latest (macOS 10.15) in GitHub Actions without any problem.

Thanks for the report! Would you mind running your tests with trace logs enabled, and trying to reproduce? That would help a lot to figure out which connection is actually affected.

Flags: needinfo?(boni.garcia)

I modified the test enabling trace log as requested. The test has failed again, and the geckodriver traces are the following:

1632396770056	geckodriver::marionette	DEBUG	Waiting 60s to connect to browser on 127.0.0.1:57714
1632396771058	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396772159	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396773260	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396774362	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396775464	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396776566	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396777667	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396778770	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396779871	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396780974	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396782075	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396783178	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396784281	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396785382	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396786484	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396787586	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396788689	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396789792	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396790892	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396791994	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396793096	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396794198	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396795300	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396796402	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396797505	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396798608	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396799711	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396800813	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396801916	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396803020	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396804121	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396805225	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396806328	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396807430	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396808533	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396809635	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396810737	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396811839	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396812941	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396814044	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396815147	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396816249	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396817351	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396818453	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396819554	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396820655	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396821757	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396822859	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396823962	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396825065	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396826167	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396827269	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396828372	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396829475	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632396837804	RemoteAgent	DEBUG	CDP enabled
1632396837804	Marionette	INFO	Marionette enabled
1632396839075	Marionette	TRACE	Received observer notification toplevel-window-ready
console.warn: SearchSettings: "get: No settings file exists, new profile?" (new NotFoundError("Could not open the file at C:\\Users\\runneradmin\\AppData\\Local\\Temp\\rust_mozprofileQJgwkl\\search.json.mozlz4", (void 0)))
1632396849892	RemoteAgent	TRACE	Received observer notification remote-startup-requested
1632396850057	RemoteAgent	DEBUG	Setting recommended pref apz.content_response_timeout to 60000
1632396850058	RemoteAgent	DEBUG	Setting recommended pref browser.contentblocking.introCount to 99
1632396850058	RemoteAgent	DEBUG	Setting recommended pref browser.download.panel.shown to true
1632396850058	RemoteAgent	DEBUG	Setting recommended pref browser.newtabpage.enabled to false
1632396850060	RemoteAgent	DEBUG	Setting recommended pref browser.pagethumbnails.capturing_disabled to true
1632396850060	RemoteAgent	DEBUG	Setting recommended pref browser.search.update to false
1632396850060	RemoteAgent	DEBUG	Setting recommended pref browser.tabs.disableBackgroundZombification to false
1632396850060	RemoteAgent	DEBUG	Setting recommended pref browser.tabs.unloadOnLowMemory to false
1632396850060	RemoteAgent	DEBUG	Setting recommended pref browser.tabs.warnOnCloseOtherTabs to false
1632396850061	RemoteAgent	DEBUG	Setting recommended pref browser.tabs.warnOnOpen to false
1632396850061	RemoteAgent	DEBUG	Setting recommended pref browser.toolbars.bookmarks.visibility to never
1632396850061	RemoteAgent	DEBUG	Setting recommended pref browser.usedOnWindows10.introURL to 
1632396850061	RemoteAgent	DEBUG	Setting recommended pref browser.urlbar.suggest.searches to false
1632396850062	RemoteAgent	DEBUG	Setting recommended pref datareporting.policy.dataSubmissionPolicyAccepted to false
1632396850062	RemoteAgent	DEBUG	Setting recommended pref dom.disable_open_during_load to false
1632396850062	RemoteAgent	DEBUG	Setting recommended pref dom.file.createInChild to true
1632396850063	RemoteAgent	DEBUG	Setting recommended pref dom.max_chrome_script_run_time to 0
16DevTools listening on ws://localhost:9222/devtools/browser/561e2ffe-114d-480c-80e6-f51135fca831
32396850063	RemoteAgent	DEBUG	Setting recommended pref dom.max_script_run_time to 0
1632396850064	RemoteAgent	DEBUG	Setting recommended pref dom.push.connection.enabled to false
1632396850064	RemoteAgent	DEBUG	Setting recommended pref extensions.getAddons.cache.enabled to false
1632396850064	RemoteAgent	DEBUG	Setting recommended pref extensions.getAddons.discovery.api_url to data:, 
1632396850064	RemoteAgent	DEBUG	Setting recommended pref network.http.prompt-temp-redirect to false
1632396850064	RemoteAgent	DEBUG	Setting recommended pref privacy.trackingprotection.enabled to false
1632396850064	RemoteAgent	DEBUG	Setting recommended pref security.fileuri.strict_origin_policy to false
1632396850065	RemoteAgent	DEBUG	Setting recommended pref security.notification_enable_delay to 0
1632396850065	RemoteAgent	DEBUG	Setting recommended pref signon.autofillForms to false
1632396850066	RemoteAgent	DEBUG	Setting recommended pref signon.rememberSignons to false
1632396850067	RemoteAgent	DEBUG	Setting recommended pref browser.topsites.contile.enabled to false
1632396850067	RemoteAgent	DEBUG	Setting recommended pref browser.contentblocking.features.standard to -tp,tpPrivate,cookieBehavior0,-cm,-fp
1632396850067	RemoteAgent	DEBUG	Setting recommended pref network.cookie.cookieBehavior to 0
1632396850100	Marionette	TRACE	Received observer notification marionette-startup-requested
1632396850100	Marionette	TRACE	Waiting until startup recorder finished recording startup scripts...
1632396850372	Marionette	TRACE	All scripts recorded.
1632396850372	RemoteAgent	DEBUG	Setting recommended pref dom.disable_beforeunload to true
1632396850389	Marionette	INFO	Listening on port 57714
1632396850390	Marionette	DEBUG	Marionette is listening
1632396900584	mozrunner::runner	DEBUG	Killing process 5376
1632396900587	geckodriver::browser	DEBUG	Browser process stopped: exit code: 1
1632396900597	webdriver::server	DEBUG	<- 500 Internal Server Error {"value":{"error":"timeout","message":"No connection could be made because the target machine actively refused it. (os error 10061)","stacktrace":""}}

See the complete execution here.

Flags: needinfo?(boni.garcia)

As I can see by the timestamp it takes around 80 seconds for Firefox to startup. Do you know why it is that slow? It's actually the reason for this failure. Still interesting that we cancel after 80s and not the 60s as we actually want to do.

Flags: needinfo?(boni.garcia)

No clue, sorry. I only know that this problem happens from time to time, not always.

Flags: needinfo?(boni.garcia)

Could you maybe set the Firefox preference toolkit.asyncshutdown.log to True and run some more tests until it fails? That log might give us an indication in which step during startup Firefox might take longer than usual.

Overall this seems more a Firefox or system integration issue. But lets see what's in the asked extended log.

Flags: needinfo?(boni.garcia)

I set that preference in my test. I don't know if it is somehow related, but now I am not able to reproduce the issue. The traces I get now are the following (but the test, in this case, is passing):

1632734254768	geckodriver	INFO	Listening on 127.0.0.1:60309
1632734255235	mozrunner::runner	INFO	Running command: "C:\\Program Files\\Mozilla Firefox\\firefox.exe" "--marionette" "--remote-debugging-port" "9222" "-no-remote" "-profile" "C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\rust_mozprofileScXFQN"
1632734255242	geckodriver::marionette	DEBUG	Waiting 60s to connect to browser on 127.0.0.1:53693
1632734256244	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734257347	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734258450	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734259553	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734260656	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734261772	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734262875	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734263979	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734265081	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734266183	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734267287	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734268389	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734269491	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734270595	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734271697	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734272799	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734273902	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
1632734275005	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
DEBUG: Adding blocker OS.File: flush pending requests, warn about unclosed files, shut down service. for phase xpcom-will-shutdown
DEBUG: Adding blocker OS.File: flush I/O queued before profileBeforeChange for phase profile-before-change
DEBUG: Adding blocker AddonManager: shutting down. for phase profile-before-change
DEBUG: Adding blocker IOUtils Blocker (profile-before-change) for phase profile-before-change
DEBUG: Adding blocker IOUtils Blocker (xpcom-will-shutdown) for phase xpcom-will-shutdown
DEBUG: Adding blocker Flush WebExtension StartupCache for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Adding blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Adding blocker JSON store: writing data for phase AddonManager: Waiting for providers to shut down.
DEBUG: Adding blocker Update add-on blocklist state into add-on DB for phase AddonManager: Waiting to start provider shutdown.
DEBUG: Completed blocker Update add-on blocklist state into add-on DB for phase AddonManager: Waiting to start provider shutdown.
DEBUG: Adding blocker XPIProvider shutdown for phase quit-application-granted
DEBUG: Adding blocker XPIProvider for phase AddonManager: Waiting for providers to shut down.
DEBUG: Adding blocker ServiceWorkerRegistrar: Flushing data for phase profile-before-change
DEBUG: Adding blocker CrashMonitor: Writing notifications to file after receiving profile-before-change for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Adding blocker TelemetryController: shutting down for phase profile-before-change-telemetry
1632734276107	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
DEBUG: Adding blocker EnvironmentAddonBuilder for phase AddonManager: Waiting to start provider shutdown.
DEBUG: Adding blocker RemoteSettingsClient - finish IDB access. for phase profile-before-change
1632734276214	RemoteAgent	DEBUG	CDP enabled
1632734276214	Marionette	INFO	Marionette enabled
1632734276583	Marionette	TRACE	Received observer notification toplevel-window-ready
DEBUG: Adding blocker ClientManagerService: start destroying IPC actors early for phase xpcom-will-shutdown
DEBUG: Adding blocker Flush AddonRepository for phase profile-before-change
1632734277210	geckodriver::marionette	TRACE	No connection could be made because the target machine actively refused it. (os error 10061). Retrying in 100ms
DEBUG: Adding blocker MediaShutdownManager: shutdown for phase profile-before-change
DEBUG: Adding blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Adding blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Adding blocker Remote Settings profile-before-change for phase profile-before-change
DEBUG: Adding blocker GMPProvider for phase AddonManager: Waiting for providers to shut down.
DEBUG: Spinning the event loop
DEBUG: Completed blocker sanitize.js: Sanitize on shutdown for phase Places Clients shutdown
DEBUG: Finished phase profile-change-teardown
DEBUG: Starting phase profile-before-change
DEBUG: Spinning the event loop
DEBUG: Completed blocker EnvironmentAddonBuilder for phase AddonManager: Waiting to start provider shutdown.
DEBUG: Completed blocker PageActions: purging unregistered actions from cache for phase profile-before-change
DEBUG: Completed blocker Places Expiration: shutdown for phase Places Connection shutdown
DEBUG: Completed blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Completed blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Completed blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Completed blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Completed blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Completed blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Completed blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Completed blocker RemoteSettingsClient - finish IDB access. for phase profile-before-change
DEBUG: Completed blocker Flush AddonRepository for phase profile-before-change
DEBUG: Completed blocker Remote Settings profile-before-change for phase profile-before-change
DEBUG: Completed blocker DownloadAutoSaveView: writing data for phase profile-before-change
DEBUG: Completed blocker Search service: shutting down for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Completed blocker places.sqlite#0: waiting for shutdown for phase Sqlite.jsm: wait until all connections are closed
DEBUG: Completed blocker OS.File: flush I/O queued before profileBeforeChange for phase profile-before-change
DEBUG: Completed blocker PlacesUtils wrapped connection closing as part of Places shutdown for phase Places Connection shutdown
DEBUG: Completed blocker PlacesUtils wrapped connection must be closed before Sqlite.jsm for phase Sqlite.jsm: wait until all clients have completed their task
DEBUG: Completed blocker JSON store: writing data for phase AddonManager: Waiting for providers to shut down.
DEBUG: Completed blocker GMPProvider for phase AddonManager: Waiting for providers to shut down.
DEBUG: Completed blocker XPIProvider for phase AddonManager: Waiting for providers to shut down.
DEBUG: Completed blocker AddonManager: shutting down. for phase profile-before-change
DEBUG: Completed blocker places.sqlite#1: waiting for shutdown for phase Sqlite.jsm: wait until all connections are closed
DEBUG: Completed blocker PlacesUtils read-only connection closing as part of Places shutdown for phase Places Connection shutdown
DEBUG: Completed blocker PlacesUtils read-only connection must be closed before Sqlite.jsm for phase Sqlite.jsm: wait until all clients have completed their task
DEBUG: Completed blocker content-prefs.sqlite#0: waiting for shutdown for phase Sqlite.jsm: wait until all connections are closed
DEBUG: Completed blocker Closing ContentPrefService2 connection. for phase Sqlite.jsm: wait until all clients have completed their task
DEBUG: Completed blocker Sqlite.jsm shutdown blocker for phase profile-before-change
DEBUG: Completed blocker CrashMonitor: Writing notifications to file after receiving profile-before-change for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Completed blocker Flush WebExtension StartupCache for phase IOUtils: waiting for profileBeforeChange IO to complete
DEBUG: Finished phase profile-before-change
DEBUG: Starting phase profile-before-change-telemetry
DEBUG: Spinning the event loop
JavaScript error: resource:///modules/Interactions.jsm, line 230: NS_DEBUG: Adding blocker Waiting for ping task for phase TelemetryController: Waiting for pending ping activity
DEBUG: Completed blocker Waiting for ping task for phase TelemetryController: Waiting for pending ping activity
ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIUserIdleService.removeIdleObserver]

###!!! [Child][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost

DEBUG: Adding blocker Waiting for ping task for phase TelemetryController: Waiting for pending ping activity
DEBUG: Adding blocker Waiting for ping task for phase TelemetryController: Waiting for pending ping activity
DEBUG: Adding blocker Waiting for ping task for phase TelemetryController: Waiting for pending ping activity
DEBUG: Completed blocker Waiting for ping task for phase TelemetryController: Waiting for pending ping activity
DEBUG: Completed blocker Waiting for ping task for phase TelemetryController: Waiting for pending ping activity
DEBUG: Completed blocker Waiting for ping task for phase TelemetryController: Waiting for pending ping activity
DEBUG: Completed blocker TelemetryController: shutting down for phase profile-before-change-telemetry
DEBUG: Finished phase profile-before-change-telemetry
DEBUG: Starting phase xpcom-will-shutdown
DEBUG: Spinning the event loop
DEBUG: Completed blocker OS.File: flush pending requests, warn about unclosed files, shut down service. for phase xpcom-will-shutdown
DEBUG: Finished phase xpcom-will-shutdown
DEBUG: Starting phase web-workers-shutdown
DEBUG: Spinning the event loop
DEBUG: Finished phase web-workers-shutdown
1632734283307	geckodriver::browser	DEBUG	Browser process stopped: exit code: 0
1632734283440	webdriver::server	DEBUG	<- 200 OK {"value":null}
Flags: needinfo?(boni.garcia)
Attached file error traces

Thanks. So here an excerpt from the log with the important steps:

1633601005223	geckodriver::marionette	DEBUG	Waiting 60s to connect to browser on 127.0.0.1:49986
[..]
1633601100298	RemoteAgent	DEBUG	CDP enabled
1633601100298	Marionette	INFO	Marionette enabled
1633601101605	Marionette	TRACE	Received observer notification toplevel-window-ready
[..]
1633601114949	Marionette	TRACE	Received observer notification marionette-startup-requested
1633601114949	Marionette	TRACE	Waiting until startup recorder finished recording startup scripts...
[..]
1633601115414	Marionette	INFO	Listening on port 49986
1633601115415	Marionette	DEBUG	Marionette is listening
[..]
1633601135781	mozrunner::runner	DEBUG	Killing process 828

As it can be seen by the timestamps from the Marionette log the startup of Firefox is again very slow. Alone from launching the binary up to the enabling of CDP and Marionette it took 95 seconds. Then it takes additional 15 seconds to put Marionette in listening mode. Overall these are 110 seconds which is way beyond what we usually allow (60 seconds). So seeing the browser process killed by geckodriver is expected here.

What we could do is to provide a custom build of geckodriver which extends the 60s to some longer timeout, so that we could at least try to get the Gecko profiler run. A generated profile can only be exported / saved to disk if the process is not force killed.

But to get to the underlying issue, most of the time seems to be spend in getting the Firefox binary to launch at all, means before the first log output gets actually written. There might be some other internal logs that could be used to get more details of what's going on during startup.

Dave or Florian, do you know what could be helpful here to get this very slow startup better investigated? Confirming the bug for now given the detailed logs from geckodriver and Marionette.

Status: UNCONFIRMED → NEW
Component: geckodriver → Startup and Profile System
Ever confirmed: true
Flags: needinfo?(florian)
Flags: needinfo?(dtownsend)
Product: Testing → Toolkit
Summary: geckodriver random error: No connection could be made because the target machine actively refused it. (os error 10061) → Very slow startup of Firefox on Windows causes geckodriver to kill the process after the startup timeout (error: No connection could be made because the target machine actively refused it)
Version: Default → unspecified

@Boni is that actually a new problem for you or did older Firefox releases behave the same?

Flags: needinfo?(boni.garcia)

I experienced this problem also in former versions of Firefox and geckodriver.

Flags: needinfo?(boni.garcia)

Ok, so it's at least not a recent regression. Does it actually happen on your local system or some server that runs the tests? If it's happening locally do you actually see the Firefox window to open very late? Then it should still be around for 15s, which should allow you to manually close Firefox. If that's the case we could try the Gecko profiler. Please let me know.

It happens in windows-latest (Windows Server 2019) on GitHub Actions, but I never managed to reproduce it in a local Windows.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #8)

Dave or Florian, do you know what could be helpful here to get this very slow startup better investigated? Confirming the bug for now given the detailed logs from geckodriver and Marionette.

I don't have a good sense of what's possible to run on GitHub Actions. Of course getting a startup profile from our profiler would be best, but as you said in previous comments, that won't work if the process is killed. xperf could be a good option on Windows, but I'm not really familiar with it. Figuring out if we are blocked on CPU or on I/O might be interesting, but I can't think of any easy way to do it.

Flags: needinfo?(florian)

I wonder if there are some MOZ_LOG modules available that might log during startup.

Probably your best bet is a custom build with timed log messages inserted at various places during startup. You could also try this in a windows docker container or azure hosted VM since that appears to be what github actions run under.

Flags: needinfo?(dtownsend)

We already run wdspec tests (via geckodriver) on Azure machines and never faced this slow startup yet. Maybe the reporter could use a debug build of Firefox which might already give some more information also taking into account that it is slower then opt builds. But if something takes long in opt, it would take even longer in debug builds and could cause a similar behavior.

Boni, the build you can download from our CI. After you have opened the link click the Artifacts tab in the lower pane, and find the target.zip file.

If that doesn't help we could try some custom logs.

Thanks in advance.

Flags: needinfo?(boni.garcia)

As I said previously, this error happens in GitHub Actions (Windows images). I think it is not possible to deploy a custom Firefox on this environment, at least I don't know how to do it.

Flags: needinfo?(boni.garcia)

Do you have a chance to at least run a custom geckodriver build?

Yes, I can change my test to use a custom geckodriver, no problem.

I modified geckodriver to wait 300s for the Marionette handshake now. The build can be downloaded via:
https://treeherder.mozilla.org/jobs?repo=try&revision=b6ea5011b708c1f4e6fc151f7168c38f00f334ee&selectedTaskRun=bXcpVbqhSp2WUmaSvGZ0Fg.0

Just get the geckodriver.zip from the Artifacts pane at the bottom of the page.

When you run this version of geckodriver it will output a lot of log entries of the kind **** Exceeded timeout; Waiting 4 times longer... **** which you want to find in your logs. Once we have it we should see how long it actually takes to start Firefox.

As follow-up we could let geckodriver start firefox with the profiler environment setup so that a JSON file can be saved, which you might be able to pull? We should limit the amount of interactions such a test does to its minimum to reduce the size of that file. I hope that this would be doable and still would reproduce the problem.

I modified my test to use that geckodriver:

https://github.com/bonigarcia/webdrivermanager-basic/blob/master/src/test/java/io/github/bonigarcia/wdm/test/FirefoxTest.java

I have executed the test in GitHub Actions but so far the error is not happening:

https://github.com/bonigarcia/webdrivermanager-basic/actions/runs/1340781257

Regarding the profiler environment, I don't know about that.

Well, as it looks like the startup is always slow and takes at minimum around 25s? At least this one indicates that:
https://github.com/bonigarcia/webdrivermanager-basic/runs/3891855316?check_suite_focus=true#step:5:1110

Regarding the profiler please set the following environment variables so that at least geckodriver picks them up and forwards them to Firefox. Setting globally would also work.

MOZ_PROFILER_STARTUP=1
MOZ_PROFILER_SHUTDOWN=%path_to_json_file%

This will enable the profiler for startup and causes a JSON file to be written during shutdown to the specified location. If you could create an artifact of that file so it would be possible to download we could then investigate the slow startup. Thanks!

I don't have access to the file system of GitHub Actions, so I think this configuration is not possible:

MOZ_PROFILER_SHUTDOWN=%path_to_json_file%

Well, geckodriver and Firefox have access. And that's all what's needed here. But not sure how paths have to be defined. As such please check the following docs in how to create artifacts with Github actions, and then how to download them:

https://docs.github.com/en/actions/advanced-guides/storing-workflow-data-as-artifacts
https://docs.github.com/en/actions/managing-workflow-runs/downloading-workflow-artifacts

The severity field is not set for this bug.
:mossop, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dtownsend)
Severity: -- → S4
Flags: needinfo?(dtownsend)

Boni, I would still appreciate if you could find the time to generate a profile based on my comment 24. Thanks!

Flags: needinfo?(boni.garcia)

Sorry, but this month I'm very busy and I cannot spend time on this. Maybe for month I might find the time to read the GH action doc.

Flags: needinfo?(boni.garcia)

No worries. Given the current information we have and what's missing to further investigate it, I'm going to close as incomplete for now.

We can clearly reopen the bug when it turns out to be a Firefox problem and not related to any broken configuration for Github actions. Please keep us posted whenever you have the time. Thanks!

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE
I'm still experiencing this issue. I recently configured the GitHub repo to get the requested JSON file for a failed test. I'm attaching this file here.

I'm still experiencing this issue. I recently configured the GitHub repo to get the requested JSON file for a failed test. I'm attaching this file here.

Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

(In reply to boni.garcia from comment #29)

I'm still experiencing this issue. I recently configured the GitHub repo to
get the requested JSON file for a failed test. I'm attaching this file here.

In this profile it takes around 10.5s until Marionette is enabled. As such the connection attempts from geckodriver should not timeout. Do you have a corresponding geckodriver log at hand? If it's not timing out the profile is more related to bug 1650025 and might need further investigation.

Flags: needinfo?(boni.garcia)
Yes, this is the corresponding geckodriver logs for the previous profile info:

Thanks a lot! This is actually very strange! geckodriver starts Firefox at 1639758322134, but then the first logging output from Firefox appears at 1639758381542. That is after 59,6s AFTER the attempt to start Firefox. It's not correlating to what the profile actually shows. Florian, do we maybe have issues before the profiler starts recording?

Also geckodriver finally kills the process and by that we usually do not generate a profile JSON file because it's only done for a regular shutdown.

As such could it be that the profile is from a former run and was maybe cached on the worker?

Flags: needinfo?(florian)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #35)

As such could it be that the profile is from a former run and was maybe cached on the worker?

That's a reasonable explanation. The shutdown in the profile is definitely a clean one.

Flags: needinfo?(florian)
Attached file geckodriver-logs-2
I managed to reproduced the error again. I'm attaching the new logs and profile data:
Attached file geckodriver-logs-2
I managed to reproduced the error again. I'm attaching the new logs and profile data:
Attached file geckodriver-logs-2
I managed to reproduced the error again. I'm attaching the new logs and profile data:
Attached file logs-2
I managed to reproduced the error again. I'm attaching the new logs and profile data: ````
Attached file logs-2
I managed to reproduced the error again. I'm attaching the new logs and profile data:

Sorry for repeating the same logs, but the app is giving me errors and it appears repeatedly. I have just realized I don't have the profile data for the previous logs. I will try to reproduce again and gather the log data.

Flags: needinfo?(boni.garcia)

I am not able to gather profile data when the test fails. Maybe this has sense, since the browser cannot be started, and the profile data is not generated.

The browser will be started but with quite some delay. I think that we should wait until my work on bug 1726465 actually lands. Once done Marionette's socket server will be available close to the application start and as such creating a new session will fail instead. Hereby we will safely shutdown the browser which would allow you to actually generate such a profile.

Depends on: 1726465

The issue still happens with geckodriver 0.31.0, recently released. This GitHub Actions execution shows it:

https://github.com/bonigarcia/webdrivermanager-basic/runs/6006319077?check_suite_focus=true

(In reply to boni.garcia from comment #45)

The issue still happens with geckodriver 0.31.0, recently released. This GitHub Actions execution shows it:

https://github.com/bonigarcia/webdrivermanager-basic/runs/6006319077?check_suite_focus=true

A geckodriver release wont change anything for the startup of Firefox. As mentioned in my last comment we have to wait for bug 1726465 being fixed. I hope it wont take that long anymore. Sorry for that annoyance.

Boni, can you please check again with the very recent Nightly build of Firefox? Does it work now? If yes please create a trace log and attach it to this bug. I'm interested in how long it actually takes until the first application window is opened on your machine. If that's very long we can finally try again to create a startup profile. Thanks!

Flags: needinfo?(boni.garcia)

Sorry, but I cannot do that. The problem is that I only managed to reproduce the issue when running my tests on GitHub Actions, and for the best of my knowledge, I cannot use Firefox Nightly there.

Flags: needinfo?(boni.garcia)

Ah I see. So we would have to wait until Firefox 102 has made it to at least beta. I'll set me a reminder to ask you again once the next merge has been done.

Boni, as of yesterday Firefox 102 is on the beta channel. I would appreciate your feedback given that it should be possible now to get it running in Github Actions. Thanks!

Flags: needinfo?(boni.garcia)

I don't know how to use Firefox beta on Windows in GitHub Actions. @Henrik: Do you know if this is possible? Do you know how?

Flags: needinfo?(boni.garcia)

It should be similar to what the Selenium folks recently did in their CI for the Java bindings:
https://github.com/SeleniumHQ/selenium/blob/trunk/.github/workflows/ci-java.yml#L153-L158

The underlying actions set used in Selenium to install different versions of Firefox is the following:

https://github.com/marketplace/actions/setup-firefox
https://github.com/browser-actions/setup-firefox

Reading the docs, it seems that it is mandatory to use Ubuntu, which makes sense for the easy installation of Firefox. I tried in any case in one of my projects, and indeed, this is not working in Windows, which is where the problem happens:

https://github.com/bonigarcia/webdrivermanager-basic/runs/6719651006?check_suite_focus=true

Run browser-actions/setup-firefox@latest
  with:
    firefox-version: latest
  env:
    DISPLAY: :99
    WDM_GITHUBTOKEN: ***
    MOZ_PROFILER_STARTUP: 1
    MOZ_PROFILER_SHUTDOWN: geckodriver-profile.json
    JAVA_HOME: C:\hostedtoolcache\windows\Java_Temurin-Hotspot_jdk\8.0.332-9\x64
Setup firefox latest (en-US)
Attempting to download firefox latest...
Error: Unsupported platform windows amd64 for version latest

Oh I see. Thanks for checking that.

So in this case we would indeed have to wait until Firefox 102 made it to release, which will happen on 2022-06-28.

Now Firefox 102 has been released. Would you mind to run the check? Thanks.

Flags: needinfo?(boni.garcia)

Firefox 102 is not available yet in GH Actions.

Flags: needinfo?(boni.garcia)

Boni, I assume that you can verify now. :)

Flags: needinfo?(boni.garcia)

Yes, now Firefox 102 is already in GH Actions, and I did not manage to reproduce the problem again. So yes, we can suppose it is fixed now. Thanks a lot.

Flags: needinfo?(boni.garcia)

Although I cannot reproduce the previous error (Response code 500. Message: No connection could be made because the target machine actively refused it), I have to say that it is still unreliable to drive Firefox in Windows with Selenium. Now I am getting this error in my GH Actions workflow:

Caused by: org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.
Build info: version: '4.4.0', revision: 'e5c75ed026a'
System info: host: 'fv-az449-42', ip: '10.1.0.64', os.name: 'Windows Server 2022', os.arch: 'amd64', os.version: '10.0', java.version: '11.0.16'
Driver info: org.openqa.selenium.firefox.FirefoxDriver
Command: [null, newSession {capabilities=[Capabilities {acceptInsecureCerts: true, browserName: firefox, moz:debuggerAddress: true, moz:firefoxOptions: {}}], desiredCapabilities=Capabilities {acceptInsecureCerts: true, browserName: firefox, moz:debuggerAddress: true, moz:firefoxOptions: {}}}]
	at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:565)
	at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:242)
	at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:157)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:110)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:98)
	at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:94)
	... 84 more
Caused by: java.io.UncheckedIOException: java.net.ConnectException: Connection refused: no further information: localhost/0:0:0:0:0:0:0:1:59542
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:73)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:49)
	at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
	at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
	at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:98)
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:120)
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:102)
	at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:67)
	at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:156)
	at org.openqa.selenium.remote.service.DriverCommandExecutor.invokeExecute(DriverCommandExecutor.java:167)
	at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:142)
	at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:547)
	... 89 more

Just in case, Firefox is supposed to be 103.0.1, and geckodriver is 0.31.0.

The Selenium logging output only is not helpful. Can you please provide a geckodriver trace log again? I'm happy to have a look at this. Thanks.

Flags: needinfo?(boni.garcia)

Sorry but I don't have the geckodriver traces for that failure. The problem is that these kinds of problems happen randomly in several of my projects related to Selenium and executed on GH Actions. The previous error corresponds to this execution:

https://github.com/bonigarcia/webdrivermanager/runs/7818790290?check_suite_focus=true

Flags: needinfo?(boni.garcia)

If there are no failures there isn't that much that I can do here. It's sad but I would suggest that we close this bug as WORKSFORME and you can file a new bug for this specific new lower intermittent issue once you have a better log. I had a look at the link and it's very hard to read given that logs for multiple drivers/browsers seem to get mixed.

Status: REOPENED → RESOLVED
Closed: 4 years ago3 years ago
Resolution: --- → WORKSFORME
Duplicate of this bug: 1834856
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: