Retrieving geolocation coordinates for WebDriver tests in CI takes more than 500ms on ASAN builds
Categories
(Core :: DOM: Geolocation, defect)
Tracking
()
People
(Reporter: Sasha, Unassigned, NeedInfo)
References
(Blocks 2 open bugs)
Details
Attachments
(1 file)
After bug 1960651 was fixed, the retrieving geolocation coordinates for WebDriver tests in CI became fast, but we started to get failures on ASAN builds because it would still take more than 500 ms to get them (this is a timeout for this API call which we have), see more details in bug 1964346. We started to use a timeout multiplier to increase the timeout for slow builds. But maybe something can be improved here.
Comment 1•1 year ago
|
||
Here an excerpt from a log of an ASAN jobs:
https://treeherder.mozilla.org/logviewer?job_id=506831595&repo=mozilla-central&lineNumber=45515-45524
[task 2025-05-06T15:22:48.215Z] 15:22:48 INFO - PID 17377 | 1746544968212 RemoteAgent DEBUG WebDriverBiDiConnection 4a593ff8-e804-46b4-8547-62ae645355b7 -> {"id":10,"method":"script.callFunction","params":{"functionDeclaration":"(multiplier) =>\n new Promise(\n resolve => window.navigator.geolocation.getCurrentPosition(\n ... de: error.code, message: error.message}),\n {timeout: 500 * multiplier}\n ))\n ","target":{"context":"970a14c5-5737-406b-9e38-3c33c311ade1"},"awaitPromise":true,"arguments":[{"type":"number","value":4}]}}
[task 2025-05-06T15:22:48.222Z] 15:22:48 INFO - PID 17377 | 1746544968214 RemoteAgent TRACE Received command script.callFunction for destination ROOT
[task 2025-05-06T15:22:48.269Z] 15:22:48 INFO - PID 17377 | 1746544968258 RemoteAgent TRACE Created MessageHandler WINDOW_GLOBAL for session 86a902c1-061c-4eac-8b27-7ff9aa4492f1
[task 2025-05-06T15:22:48.271Z] 15:22:48 INFO - PID 17377 | 1746544968258 RemoteAgent TRACE Received command script.callFunctionDeclaration for destination WINDOW_GLOBAL
[task 2025-05-06T15:22:48.273Z] 15:22:48 INFO - PID 17377 | 1746544968272 RemoteAgent TRACE WebDriverProcessData actor created for PID 17634
[task 2025-05-06T15:22:48.317Z] 15:22:48 INFO - PID 17377 | *** WIFI GEO: startup called.
[task 2025-05-06T15:22:48.324Z] 15:22:48 INFO - PID 17377 | *** WIFI GEO: watch called
[task 2025-05-06T15:22:48.325Z] 15:22:48 INFO - PID 17377 | *** WIFI GEO: onStatus called.wifi-timeout
[task 2025-05-06T15:22:48.325Z] 15:22:48 INFO - PID 17377 | *** WIFI GEO: Use request cache:true reason:New req. is GeoIP.
[task 2025-05-06T15:22:48.328Z] 15:22:48 INFO - PID 17377 | 1746544968325 RemoteAgent DEBUG WebDriverBiDiConnection 4a593ff8-e804-46b4-8547-62ae645355b7 <- {"type":"success","id":10,"result":{"realm":"8352c3fd-d3c5-4c95-a13f-3c3ddb49967b","type":"success","result":{"type":"object","value":[["latitude",{"type":"number","value":37.41857}],["longitude",{"type":"number","value":-122.08769}],["altitude",{"type":"null"}],["accuracy",{"type":"number","value":42}],["altitudeAccuracy",{"type":"null"}],["heading",{"type":"null"}],["speed",{"type":"null"}]]}}}
The delay of about 400ms happens before the WIFI GEO: startup called log output.
Kagami, maybe you have an idea what this could be? If not would a log for Geolocation:5 help?
Comment 2•1 year ago
|
||
The delay of about 400ms happens before the
WIFI GEO: startup calledlog output.
No idea yet, but that's 40ms, not 400ms.
Comment 3•1 year ago
•
|
||
I agree that there may still be room for enhancement, but why can we make it so strict in WPT? I don't think such strict timeout should apply to all implementations. It'll also fail for any local run if one depends on a remote server and internet connection is slow.
Comment 4•1 year ago
|
||
Sorry I had a wrong reference. Here is a failing test for ASAN:
https://treeherder.mozilla.org/logviewer?job_id=506785799&repo=autoland&lineNumber=45541-45551
[task 2025-05-06T09:02:29.772Z] 09:02:29 INFO - PID 17426 | 1746522149766 RemoteAgent TRACE Received command script.callFunctionDeclaration for destination WINDOW_GLOBAL
[task 2025-05-06T09:02:30.436Z] 09:02:30 INFO - PID 17426 | *** WIFI GEO: startup called.
[task 2025-05-06T09:02:30.439Z] 09:02:30 INFO - PID 17426 | *** WIFI GEO: watch called
[task 2025-05-06T09:02:30.439Z] 09:02:30 INFO - PID 17426 | *** WIFI GEO: onStatus called.wifi-timeout
[task 2025-05-06T09:02:30.441Z] 09:02:30 INFO - PID 17426 | *** WIFI GEO: Use request cache:false reason:No cached data
[task 2025-05-06T09:02:30.444Z] 09:02:30 INFO - PID 17426 | *** WIFI GEO: Sending request
[task 2025-05-06T09:02:30.444Z] 09:02:30 INFO - PID 17426 | *** WIFI GEO: onStatus called.xhr-start
ModemManager[17802]: <info> ModemManager (version 1.10.0) starting in system bus...
[task 2025-05-06T09:02:30.461Z] 09:02:30 INFO - PID 17426 | 1746522150460 RemoteAgent DEBUG WebDriverBiDiConnection 8ee18d15-89f8-4e8c-b973-a28ee622c7d9 <- {"type":"success","id":8,"result":{"realm":"fd89d63b-dc8f-42dd-868e-f3fef11441cf","type":"success","result":{"type":"object","value":[["code",{"type":"number","value":3}],["message",{"type":"string","value":"Position acquisition timed out"}]]}}}
The delay - in the above case around 650ms - is happening before startup is called, so it cannot be related to network latency, right?
Comment 5•1 year ago
|
||
Network latency can't be a problem for our CI run, as it should always use the mock server. 🤔🤔
Comment 6•1 year ago
|
||
I pushed a try build with the Geolocation:5 logs enabled. Lets see if that might give an indication.
Comment 7•1 year ago
|
||
It doesn't look like it's helpful that much:
https://treeherder.mozilla.org/logviewer?job_id=506971728&repo=try&lineNumber=45438-45464
In some cases it works fine but in others (the 2nd case for the above link) it's taking longer. I wonder if this might be related to switching between tabs and activation?
Comment 8•1 year ago
|
||
Those lines also happened within 10ms, so not a good reference...
https://treeherder.mozilla.org/logviewer?job_id=506971728&repo=try&lineNumber=45438-45443
Here I see the fallback from geoclue took 140ms, we could force it to try network provider and skip geoclue by geo.provider.use_geoclue=false and geo.provider.use_mls=true.
Comment 9•1 year ago
|
||
Sorry, looks like I pasted again something invalid. Not sure why the clipboard didn't had the updated URL. Here is the one that I wanted to refer:
https://treeherder.mozilla.org/logviewer?job_id=506971728&repo=try&lineNumber=46993-47000
There are 320ms and maybe this is related to called.wifi-timeout? But there are other cases when this comes in pretty quickly. So not sure if that is related.
(In reply to Kagami Rosylight [:saschanaz] (they/them) from comment #8)
Here I see the fallback from geoclue took 140ms, we could force it to try network provider and skip geoclue by
geo.provider.use_geoclue=falseandgeo.provider.use_mls=true.
We don't disable these in automation yet. Which side-effect would it have? I assume we should not disable for WebDriver users?
Comment 10•1 year ago
|
||
Those lines have no geoclue logs so seems not related... And there's no other log during the delay 🤔🤔🤔
The side effect would be that the geolocation API literally won't use geoclue. But that would deviate from the usual behavior, yes.
Comment 11•1 year ago
|
||
I assume that Geolocation::GetCurrentPosition is the entry point? Maybe we could add a log entry there to see when exactly the geolocation provider is called? Maybe there is some kind of delay before? Could it maybe related to permission checks etc?
Updated•1 year ago
|
Updated•1 year ago
|
Comment 12•1 year ago
|
||
I finally got around and pushed a try build with the extra logging enabled and the multiplier removed so it should be easier to see failing tests. I already tried without removing the multiplier but didn't notice any longer delays anymore. Lets see if that helps now.
https://treeherder.mozilla.org/jobs?repo=try&revision=d8c7b4b3e8c6726c7b178508742759e31b2bc4c6
Comment 13•1 year ago
|
||
I'm no longer able to see this delay. I assume that the move to the Linux docker worker made this more stable. As such I would suggest that we are removing the multiplier when retrieving the geolocation in wdspec tests, so that we have 500ms for any build, which should be totally enough.
Comment 14•1 year ago
|
||
Updated•1 year ago
|
Updated•1 year ago
|
Comment 15•1 year ago
|
||
This is actually still a problem as the retriggered jobs from the try build show. Here an excerpt and maybe it helps to identify where we are hanging:
https://treeherder.mozilla.org/logviewer?job_id=512776610&repo=try&lineNumber=44799-44803
[task 2025-06-12T11:42:34.619+00:00] 11:42:34 INFO - PID 17464 | 1749728554614 RemoteAgent TRACE Received command script.callFunctionDeclaration for destination WINDOW_GLOBAL
[task 2025-06-12T11:42:34.622+00:00] 11:42:34 INFO - PID 17464 | [Child 17620: Main Thread]: I/Geolocation Geolocation::GetCurrentPosition
[task 2025-06-12T11:42:34.729+00:00] 11:42:34 INFO - PID 17464 | [Parent 17481: Main Thread]: D/Geolocation Checking GeoclueLocationProvider
[task 2025-06-12T11:42:35.182+00:00] 11:42:35 INFO - PID 17464 | [Child 17620: Main Thread]: D/Geolocation nsGeolocationRequest::Notify
[task 2025-06-12T11:42:35.184+00:00] 11:42:35 INFO - PID 17464 | [Child 17620: Main Thread]: D/Geolocation nsGeolocationRequest::NotifyErrorAndShutdown with error code: 3
[task 2025-06-12T11:42:35.186+00:00] 11:42:35 INFO - PID 17464 | [Child 17620: Main Thread]: D/Geolocation nsGeolocationRequest::NotifyError with error code: 3
[task 2025-06-12T11:42:35.187+00:00] 11:42:35 INFO - PID 17464 | [Child 17620: Main Thread]: D/Geolocation GeolocationPositionError::NotifyCallback
[task 2025-06-12T11:42:35.320+00:00] 11:42:35 INFO - PID 17464 | [Parent 17481: Main Thread]: D/Geolocation Selected GeoclueLocationProvider
ModemManager[17841]: <info> ModemManager (version 1.10.0) starting in system bus...
[task 2025-06-12T11:42:35.323+00:00] 11:42:35 INFO - PID 17464 | *** WIFI GEO: startup called.
This can be seen as well for opt builds which fail with code: 3 (bug 1966071).
Comment 16•1 year ago
|
||
I triggered pernosco, let's see what it finds out.
Comment 17•11 months ago
|
||
Hi Kagami, I wanted to check back if you were able to figure out something from the Pernosco recording.
Comment 18•11 months ago
|
||
I did not get any response from Pernosco. Pinged Pernosco channel for support.
Comment 19•11 months ago
|
||
I wanted to check back if you got some support. If not shall we push a new try build?
Comment 20•11 months ago
|
||
No response. Perhaps you can try triggering pernosco and see if it responds to you, I feel like it just hates me 🤔
Comment 21•11 months ago
|
||
I never got Pernosco working for me due to authentication issues that I was not able to resolve. :(
Updated•11 months ago
|
Comment 23•11 months ago
•
|
||
I'm told that the pernosco issue is because of https://github.com/mozilla/geckodriver/issues/2082, which maybe you can take a look... (I see you in the thread 😄)
Comment 24•11 months ago
|
||
We don't do anything special when starting Firefox. Lets continue the discussion on Matrix to figure out a bit more. I actually wonder if other test suites experience a similar issue or not.
Comment 25•10 months ago
|
||
I've asked on the pernosco channel but it makes it hard to track. Lets see if Kyle can help us further. TIA
Comment 26•10 months ago
|
||
There is an r+ patch which didn't land and no activity in this bug for 2 weeks.
:whimboo, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.
Comment 27•10 months ago
|
||
I think this is still pending investigations.
Comment 28•10 months ago
|
||
I was on vacation earlier this month and this is low priority because of how niche it is but I expect to investigate this this weekend.
Comment 29•10 months ago
|
||
Kyle, in case you need newer builds I'm happy to push a new try build. Just let me know. Thanks.
Comment 30•10 months ago
|
||
Yeah, that would be great, thanks.
Comment 31•9 months ago
|
||
Here a new try job with a failing wdspec test:
This are the related lines in the log where the delay can be seen:
https://treeherder.mozilla.org/logviewer?job_id=522035289&repo=try&lineNumber=46630-46643
[task 2025-08-12T09:03:47.105+00:00] 09:03:47 INFO - PID 18994 | 1754989427102 RemoteAgent TRACE Received command script.callFunctionDeclaration for destination WINDOW_GLOBAL
[task 2025-08-12T09:03:47.186+00:00] 09:03:47 INFO - PID 18994 | [Parent 19024: Main Thread]: D/Geolocation Checking GeoclueLocationProvider
[task 2025-08-12T09:03:47.654+00:00] 09:03:47 INFO - PID 18994 | [Child 19220: Main Thread]: D/Geolocation nsGeolocationRequest::Notify
[task 2025-08-12T09:03:47.656+00:00] 09:03:47 INFO - PID 18994 | [Child 19220: Main Thread]: D/Geolocation nsGeolocationRequest::NotifyErrorAndShutdown with error code: 3
[task 2025-08-12T09:03:47.657+00:00] 09:03:47 INFO - PID 18994 | [Child 19220: Main Thread]: D/Geolocation nsGeolocationRequest::NotifyError with error code: 3
[task 2025-08-12T09:03:47.658+00:00] 09:03:47 INFO - PID 18994 | [Child 19220: Main Thread]: D/Geolocation GeolocationPositionError::NotifyCallback
[task 2025-08-12T09:03:47.737+00:00] 09:03:47 INFO - PID 18994 | [Parent 19024: Main Thread]: D/Geolocation Selected GeoclueLocationProvider
[task 2025-08-12T09:03:47.742+00:00] 09:03:47 INFO - PID 18994 | *** WIFI GEO: startup called.
[task 2025-08-12T09:03:47.744+00:00] 09:03:47 INFO - PID 18994 | *** WIFI GEO: watch called
[task 2025-08-12T09:03:47.746+00:00] 09:03:47 INFO - PID 18994 | *** WIFI GEO: onStatus called.wifi-timeout
[task 2025-08-12T09:03:47.750+00:00] 09:03:47 INFO - PID 18994 | *** WIFI GEO: Use request cache:false reason:No cached data
[task 2025-08-12T09:03:47.753+00:00] 09:03:47 INFO - PID 18994 | *** WIFI GEO: Sending request
[task 2025-08-12T09:03:47.753+00:00] 09:03:47 INFO - PID 18994 | *** WIFI GEO: onStatus called.xhr-start
[task 2025-08-12T09:03:47.754+00:00] 09:03:47 INFO - PID 18994 | 1754989427748 RemoteAgent DEBUG WebDriverBiDiConnection 3d8b3857-3114-4950-8202-d934ff5edf4c <- {"type":"success","id":8,"result":{"realm":"f2aa0f81-81b0-4e12-8c91-ff37942e406b","type":"success","result":{"type":"object","value":[["code",{"type":"number","value":3}]]}}}
Interesting is that we do not have such delays for the Mochitests which test the Geolocation feature.
Comment 32•9 months ago
|
||
Hi Kyle, did the above help or won't Pernosco work with webdriver tests? I'm not sure. Thanks.
Comment 33•9 months ago
|
||
I've taken a look at recent jobs and as it looks like a longer delay seem to only happen for the very first time the Geolocation API is used after starting Firefox. For following calls the timing looks fine.
The failures that we get classified once in a while for bug 1966071 are all for esr140. So maybe something was fixed post the branching?
Kagami, are you aware of some landing
Comment 34•9 months ago
•
|
||
No, I think it's still the geoclue responding late, given the 500ms delay happens synchronously: https://searchfox.org/firefox-main/rev/856a307913c2b73765b4e88d32cf15ed05549cae/dom/geolocation/Geolocation.cpp#746-758
We are calling dbus synchronously which is not great: https://searchfox.org/firefox-main/rev/856a307913c2b73765b4e88d32cf15ed05549cae/dom/system/linux/GeoclueLocationProvider.cpp#1004-1006
But other than that I think my opinion in comment #3 still holds: this is WPT not mochitest, we can't apply such strict limitation to all implementations.
Comment 35•9 months ago
|
||
:whimboo this is still on my todo list I just keep getting sidetracked by other things :/
Description
•