Open Bug 1893921 Opened 22 days ago Updated 6 days ago

browsingContext.create is slow on macos arm64 runners on GitHub actions

Categories

(Remote Protocol :: WebDriver BiDi, defect, P2)

defect
Points:
3

Tracking

(Not tracked)

People

(Reporter: alexrudenko, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

(Whiteboard: [webdriver:m11])

See https://github.com/puppeteer/puppeteer/actions/runs/8829855155/job/24242835681?pr=12318 It is a test run done on new macos arm64 runners on GitHub actions (https://github.com/actions/runner-images/blob/macos-14-arm64/20240422.3/images/macos/macos-14-arm64-Readme.md#browsers)

Several tests time out and in general the test run is very slow, 21 minutes for one test shard (usually up to 5 min). When run with the trace option, the logs indicate that browsingContext.create always blocks for 5 seconds: https://gist.github.com/OrKoN/1bcde28b31e2dba4ba50909deecdd010

I've triggered a new job with debug enabled. Lets see if this gives some more details:

https://github.com/puppeteer/puppeteer/actions/runs/8829855155/job/24363405593?pr=12318

And here is already some output:

 1714376280846	RemoteAgent	TRACE	[12] ProgressListener Start: expectNavigation=false resolveWhenStarted=false unloadTimeout=5000 waitForExplicitStart=false
1714376280846	RemoteAgent	TRACE	[12] ProgressListener Setting unload timer (5000ms)
1714376285849	RemoteAgent	DEBUG	WebDriverBiDiConnection 7462f260-895e-4517-8a97-128c1d17c323 <- {"type":"event","method":"script.realmCreated","params":{"realm":"8f9e717e-c133-42cb-9051-61e1a4366967","origin":"null","context":"0c801b4e-d1d1-455f-b76b-495f31e4e308","type":"window"}}

It looks like the hang comes from waiting for about:blank to be loaded. Sadly I do not see any update of the web progress listener status and wonder if we somehow miss loading this page.

Alex, is it easy to run a job with the MOZ_LOG="BCWebProgress:5" environment variable set?

Flags: needinfo?(alexrudenko)

I believe the same log was already in the linked gist https://gist.github.com/OrKoN/1bcde28b31e2dba4ba50909deecdd010

To set extra env variables we can use the PR here https://github.com/puppeteer/puppeteer/pull/12328

Flags: needinfo?(alexrudenko)

Here an example:
https://github.com/puppeteer/puppeteer/actions/runs/8875780005/job/24365989726?pr=12328#step:9:350

1714380471274	RemoteAgent	DEBUG	WebDriverBiDiConnection adc3c975-c114-4598-bfaa-f6881514cfc9 -> {"id":8,"method":"browsingContext.create","params":{"type":"tab","userContext":"3da5f8e5-eb20-4d60-b6a2-3b6706ee4274"}}
1714380471274	RemoteAgent	TRACE	Module root/browsingContext.sys.mjs found for ROOT
1714380471274	RemoteAgent	TRACE	Received command browsingContext.create for destination ROOT
1714380471274	RemoteAgent	TRACE	Module root/browsingContext.sys.mjs found for ROOT
1714380471275	RemoteAgent	TRACE	[10] NavigationListener onStateChange, stateFlags: 196612, status: 0, isStart: false, isStop: false, isNetwork: false, isBindingAborted: false, targetURI: about:blank
1714380471275	RemoteAgent	TRACE	Received command browsingContext._awaitVisibilityState for destination WINDOW_GLOBAL
1714380471275	RemoteAgent	TRACE	Module windowglobal/browsingContext.sys.mjs found for WINDOW_GLOBAL
1714380471277	RemoteAgent	TRACE	[10] NavigationListener onStateChange, stateFlags: 131088, status: 0, isStart: false, isStop: true, isNetwork: false, isBindingAborted: false, targetURI: about:blank
1714380471314	RemoteAgent	DEBUG	WebDriverBiDiConnection adc3c975-c114-4598-bfaa-f6881514cfc9 <- {"type":"event","method":"browsingContext.contextCreated","params":{"children":null,"context":"4e53e36f-31c1-4b45-ba2f-f6dc0adf121f","url":"about:blank","userContext":"3da5f8e5-eb20-4d60-b6a2-3b6706ee4274","parent":null}}
2024-04-29T08:47:51.315Z puppeteer:webDriverBiDi:RECV ◀ [
  '{"type":"event","method":"browsingContext.contextCreated","params":{"children":null,"context":"4e53e36f-31c1-4b45-ba2f-f6dc0adf121f","url":"about:blank","userContext":"3da5f8e5-eb20-4d60-b6a2-3b6706ee4274","parent":null}}'
]
1714380471330	RemoteAgent	TRACE	Received DOM event TabSelect for [object XULElement]
[Parent 5860: Main Thread]: I/BCWebProgress OnLocationChange({isTopLevel:1, isLoadingDocument:1}, {URI:about:blank, originalURI:about:blank}, about:blank, 0) on {top:1, id:a, url:about:blank}
1714380471336	RemoteAgent	TRACE	Module windowglobal-in-root/realm-created.sys.mjs not found for WINDOW_GLOBAL
2024-04-29T08:47:51.337Z puppeteer:webDriverBiDi:RECV ◀ [
1714380471336	RemoteAgent	DEBUG	WebDriverBiDiConnection adc3c975-c114-4598-bfaa-f6881514cfc9 <- {"type":"event","method":"script.realmCreated","params":{"realm":"9279068c-35a3-483d-854f-cba368b3ed6a","origin":"null","context":"306dcc2d-3624-41f7-b432-d87b10d858bd","type":"window"}}
  '{"type":"event","method":"script.realmCreated","params":{"realm":"9279068c-35a3-483d-854f-cba368b3ed6a","origin":"null","context":"306dcc2d-3624-41f7-b432-d87b10d858bd","type":"window"}}'
1714380471336	RemoteAgent	TRACE	Module windowglobal-in-root/window-global-handler-created.sys.mjs not found for WINDOW_GLOBAL
]
1714380471337	RemoteAgent	DEBUG	WebDriverBiDiConnection adc3c975-c114-4598-bfaa-f6881514cfc9 <- {"type":"event","method":"browsingContext.domContentLoaded","params":{"context":"306dcc2d-3624-41f7-b432-d87b10d858bd","timestamp":1714380471277,"url":"about:blank","navigation":null}}
[Parent 5860: Main Thread]: I/BCWebProgress OnSecurityChange(<null>, <null>, 4) on {top:1, id:a, url:about:blank}
1714380471337	RemoteAgent	TRACE	[306dcc2d-3624-41f7-b432-d87b10d858bd] No navigation found to stop for url: about:blank
[Parent 5860: Main Thread]: I/BCWebProgress OnStateChange({isTopLevel:1, isLoadingDocument:1}, {URI:about:blank, originalURI:about:blank}, STATE_STOP|STATE_IS_NETWORK|STATE_IS_WINDOW, NS_OK) on {top:1, id:a, url:about:blank}
1714380471337	RemoteAgent	DEBUG	WebDriverBiDiConnection adc3c975-c114-4598-bfaa-f6881514cfc9 <- {"type":"event","method":"browsingContext.load","params":{"context":"306dcc2d-3624-41f7-b432-d87b10d858bd","timestamp":1714380471277,"url":"about:blank","navigation":null}}
2024-04-29T08:47:51.338Z puppeteer:webDriverBiDi:RECV ◀ [
  '{"type":"event","method":"browsingContext.domContentLoaded","params":{"context":"306dcc2d-3624-41f7-b432-d87b10d858bd","timestamp":1714380471277,"url":"about:blank","navigation":null}}'
]
2024-04-29T08:47:51.338Z puppeteer:webDriverBiDi:RECV ◀ [
  '{"type":"event","method":"browsingContext.load","params":{"context":"306dcc2d-3624-41f7-b432-d87b10d858bd","timestamp":1714380471277,"url":"about:blank","navigation":null}}'
]
1714380471407	RemoteAgent	TRACE	[12] ProgressListener Start: expectNavigation=false resolveWhenStarted=false unloadTimeout=5000 waitForExplicitStart=false
1714380471407	RemoteAgent	TRACE	[12] ProgressListener Setting unload timer (5000ms)
2024-04-29T08:47:56.506Z puppeteer:webDriverBiDi:RECV ◀ [
1714380476506	RemoteAgent	TRACE	[12] ProgressListener No navigation detected: about:blank
  '{"type":"success","id":8,"result":{"context":"4e53e36f-31c1-4b45-ba2f-f6dc0adf121f"}}'
1714380476506	RemoteAgent	TRACE	[12] ProgressListener Stop: has error=false
]
1714380476506	RemoteAgent	DEBUG	WebDriverBiDiConnection adc3c975-c114-4598-bfaa-f6881514cfc9 <- {"type":"success","id":8,"result":{"context":"4e53e36f-31c1-4b45-ba2f-f6dc0adf121f"}}

We seem to start the progress listener after the page about:blank has already been loaded? Also I see the browsingContext.load event emitted twice.

Blocks: 1750547
Severity: -- → S3
Points: --- → 3
Priority: -- → P2
Whiteboard: [webdriver:m11]

We discussed this in triage yesterday.

First observation is that browsingContext.create waits for the visibilityState of the previous tab to change (here) before we start waiting for the navigation (here).

It would probably be safer to start waiting for the initial navigation right away. However our waitForInitialNavigationCompleted helper is supposed to detect situations where the initial page has already been loaded and resolve early (here).

To verify that, locally I artificially delayed the call to waitForInitialNavigationCompleted by 500ms, 1000ms, 2000ms, and in all cases the helper resumed early, without waiting for the 5s timeout of the ProgressListener. Based on this test, I think that calling the helper "late" works fine.

Therefore I tried to delay the call to waitForInitialNavigationCompleted on Puppeteer as well, by setting browser.tabs.remote.unloadDelayMs to a few hundred ms and pushing to github CI. This should force waitForInitialNavigationCompleted to start late as well, and give enough time to the initial navigation to complete, but it still seems to fail with macos-latest.

Depends on: 1894305

(In reply to Julian Descottes [:jdescottes] from comment #7)

Therefore I tried to delay the call to waitForInitialNavigationCompleted on Puppeteer as well, by setting browser.tabs.remote.unloadDelayMs to a few hundred ms and pushing to github CI. This should force waitForInitialNavigationCompleted to start late as well, and give enough time to the initial navigation to complete, but it still seems to fail with macos-latest.

Note that changing this preference does delay the visibilitychange event even further, so I wouldn't expect that anything changes for Puppeteer in GitHub CI. Based on my comment did you try it out locally by setting browser.tabs.remote.unloadDelayMs to some higher values? Were you able to reproduce the timeout in waiting for the initial navigation?

Flags: needinfo?(jdescottes)

(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #8)

(In reply to Julian Descottes [:jdescottes] from comment #7)

Therefore I tried to delay the call to waitForInitialNavigationCompleted on Puppeteer as well, by setting browser.tabs.remote.unloadDelayMs to a few hundred ms and pushing to github CI. This should force waitForInitialNavigationCompleted to start late as well, and give enough time to the initial navigation to complete, but it still seems to fail with macos-latest.

Note that changing this preference does delay the visibilitychange event even further, so I wouldn't expect that anything changes for Puppeteer in GitHub CI. Based on my comment did you try it out locally by setting browser.tabs.remote.unloadDelayMs to some higher values? Were you able to reproduce the timeout in waiting for the initial navigation?

Yes, that's what I meant with the previous paragraph: I tried locally with 500ms, 1000ms, 2000ms and it didn't reproduce the issue. The helper successfully detects that the navigation is already done and resolves immediately.

Note that changing this preference does delay the visibilitychange event even further, so I wouldn't expect that anything changes for Puppeteer in GitHub CI.

I think it should have an impact, and it's worrying that it doesn't.
Basically by increasing this value, we delay the "gap" between the call to addTab and the call to waitForInitialNavigationCompleted. For instance if we set the value to 1000 ms, it should give enough time to the new tab to do its initial navigation and waitForInitialNavigationCompleted should then detect that the initial navigation is over and resolve. As I said, I verified locally that the helper was correctly returning early in those cases, so I don't really understand why it doesn't happen on Github CI.

Flags: needinfo?(jdescottes)

I assume that with bug 1894305 now fixed, we should trigger some new debug jobs in GitHub CI for Puppeteer (with the Firefox cache removed first)?

I retriggered a run at https://github.com/puppeteer/puppeteer/actions/runs/9059188428/job/24886294787?pr=12336
My previous push had a typo in MOZ_LOG so it didn't include the Process logs.

Thanks! So it seems it takes a long time to create a new process when we try to create new tabs:

2024-05-13T08:38:30.1111650Z [Parent 5255: Main Thread]: D/Process Starting process allocate on idle
2024-05-13T08:38:30.1211520Z [Parent 5255: Main Thread]: D/Process Trying to start process now
2024-05-13T08:38:30.1313910Z [Parent 5255: Main Thread]: D/Process Starting delayed process start, delay=1000
2024-05-13T08:38:30.2590990Z [Parent 5255: Main Thread]: D/Process Starting process allocate on idle
2024-05-13T08:38:30.2595100Z [Parent 5255: Main Thread]: D/Process Trying to start process now
2024-05-13T08:38:30.2596060Z [Parent 5255: Main Thread]: D/Process Starting delayed process start, delay=1000
2024-05-13T08:38:31.1948660Z [Parent 5255: Main Thread]: D/Process Starting process allocate on idle
2024-05-13T08:38:31.1949320Z [Parent 5255: Main Thread]: D/Process Trying to start process now
2024-05-13T08:38:31.1950120Z [Parent 5255: Main Thread]: D/Process Starting delayed process start, delay=1000
2024-05-13T08:38:31.3429020Z [Parent 5255: Main Thread]: D/Process Starting process allocate on idle
2024-05-13T08:38:31.3457780Z [Parent 5255: Main Thread]: D/Process Trying to start process now
2024-05-13T08:38:31.3458420Z [Parent 5255: Main Thread]: D/Process Starting delayed process start, delay=1000
2024-05-13T08:38:32.2271950Z [Parent 5255: Main Thread]: D/Process Starting process allocate on idle
2024-05-13T08:38:32.2373640Z [Parent 5255: Main Thread]: D/Process Trying to start process now
2024-05-13T08:38:32.2475650Z [Parent 5255: Main Thread]: D/Process Starting delayed process start, delay=1000
2024-05-13T08:38:32.4686340Z [Parent 5255: Main Thread]: D/Process Starting process allocate on idle
2024-05-13T08:38:32.4725860Z [Parent 5255: Main Thread]: D/Process Trying to start process now
2024-05-13T08:38:32.4768310Z [Parent 5255: Main Thread]: D/Process Starting delayed process start, delay=1000
2024-05-13T08:38:33.2278670Z [Parent 5255: Main Thread]: D/Process Starting process allocate on idle
2024-05-13T08:38:33.2283020Z [Parent 5255: Main Thread]: D/Process Trying to start process now
2024-05-13T08:38:33.2287190Z [Parent 5255: Main Thread]: D/Process Starting delayed process start, delay=1000
2024-05-13T08:38:33.4980610Z [Parent 5255: Main Thread]: D/Process Starting process allocate on idle
2024-05-13T08:38:33.4981200Z [Parent 5255: Main Thread]: D/Process Trying to start process now
2024-05-13T08:38:33.4981830Z [Parent 5255: Main Thread]: D/Process Starting delayed process start, delay=1000
2024-05-13T08:38:34.3286360Z [Parent 5255: Main Thread]: D/Process Starting process allocate on idle
2024-05-13T08:38:34.3287130Z 1715589514319	RemoteAgent	TRACE	[12] ProgressListener No navigation detected: about:blank
2024-05-13T08:38:34.3307020Z 2024-05-13T08:38:34.320Z puppeteer:webDriverBiDi:RECV ◀ [
2024-05-13T08:38:34.3307680Z 1715589514319	RemoteAgent	TRACE	[12] ProgressListener Stop: has error=false
2024-05-13T08:38:34.3369870Z   '{"type":"success","id":8,"result":{"context":"5e1f91fa-3853-493c-9146-eb0e5c25d226"}}'
2024-05-13T08:38:34.3371320Z 1715589514320	RemoteAgent	DEBUG	WebDriverBiDiConnection 33fdce95-6f81-4977-8679-0cec0e867265 <- {"type":"success","id":8,"result":{"context":"5e1f91fa-3853-493c-9146-eb0e5c25d226"}}
2024-05-13T08:38:34.3475890Z ]
2024-05-13T08:38:34.3478510Z 1715589514322	RemoteAgent	DEBUG	WebDriverBiDiConnection 33fdce95-6f81-4977-8679-0cec0e867265 -> {"id":9,"method":"browsingContext.setViewport","params":{"context":"5e1f91fa-3853-493c-9146-eb0e5c25d226","viewport":{"width":800,"height":600},"devicePixelRatio":null}}
2024-05-13T08:38:34.3570390Z 2024-05-13T08:38:34.321Z puppeteer:webDriverBiDi:SEND ► [
2024-05-13T08:38:34.3571080Z 1715589514322	RemoteAgent	TRACE	Module root/browsingContext.sys.mjs found for ROOT
2024-05-13T08:38:34.3616340Z   '{"id":9,"method":"browsingContext.setViewport","params":{"context":"5e1f91fa-3853-493c-9146-eb0e5c25d226","viewport":{"width":800,"height":600},"devicePixelRatio":null}}'
2024-05-13T08:38:34.3620350Z 1715589514322	RemoteAgent	TRACE	Received command browsingContext.setViewport for destination ROOT
2024-05-13T08:38:34.3653510Z ]
2024-05-13T08:38:34.3654270Z 1715589514322	RemoteAgent	TRACE	Module root/browsingContext.sys.mjs found for ROOT
2024-05-13T08:38:34.3655360Z [Parent 5255: Main Thread]: D/Process Trying to start process now
2024-05-13T08:38:34.3656560Z [Parent 5255: Main Thread]: D/Process Starting delayed process start, delay=1000

We try to start a process, but we get Process Starting delayed process start, delay=1000 for 5 seconds.

Olli, do you know what could delay the creation of new processes? This is happening on macos arm workers on Github

Flags: needinfo?(smaug)

If I don't remember wrong Randell worked on some process related changes in the past. So maybe he is a better person to ask. Randell, if that is not the case please let us know, and maybe you know a better person to ask for. Thanks!

Flags: needinfo?(smaug) → needinfo?(rjesup)

(I don't know what "macos arm workers on Github" means).

I assume this is on opt/shippable builds?
Do we have performance profiles for this?

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #15)

(I don't know what "macos arm workers on Github" means).

The workers provided by Github to run Github actions, see https://github.com/actions/runner-images/blob/macos-14-arm64/20240422.3/images/macos/macos-14-arm64-Readme.md#browsers

I assume this is on opt/shippable builds?

Yes.

Do we have performance profiles for this?

Not yet I'll try to get one.

I just saw that there are as well MacOS 13 ARM images:
https://github.com/actions/runner-images/blob/main/images/macos/macos-13-arm64-Readme.md

I wonder if those might show the same problem or not. The current puppeteer checks all run for MacOS 13. So maybe it might be a MacOS 14 issue.

You need to log in before you can comment on or make changes to this bug.