Closed Bug 1810421 Opened 2 years ago Closed 2 years ago

Lag spikes every 5-6 seconds when Windows Location Services are disabled

Categories

(Core :: Widget: Win32, defect, P2)

Firefox 108
defect

Tracking


RESOLVED FIXED
115 Branch
Performance Impact high
Tracking Status
relnote-firefox --- 115+
firefox115 --- fixed

People

(Reporter: gtjacobson+bugzilla, Assigned: handyman)

References

Details

(Keywords: perf:resource-use)

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0

Steps to reproduce:

This happens after I have been using Firefox for a while, anywhere from minutes to hours. I have not been able to find a specific cause. I have done a troubleshooting refresh multiple times, and disabled all of my addons, my theme, and tab syncing.

Once the lag spikes start, they do not stop until Firefox is closed.

This seems to have affected other people, see https://www.reddit.com/r/firefox/comments/y4qbfi/really_big_lag_spikes_when_using_firefox/ and https://www.reddit.com/r/firefox/comments/1082ia2/running_firefox_make_other_video_call_software/

Actual results:

Massive lag spikes every 5-6 seconds. This affects every other program on my PC (Windows 10). If I want to do a video/audio call in another app, I have to close Firefox first, or the call quality will be terrible.

I run command prompt with the following command: ping 192.168.0.1 -t

When the lag spikes start, the ping results look like this:

Reply from 192.168.0.1: bytes=32 time=6ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1184ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1688ms TTL=64
Reply from 192.168.0.1: bytes=32 time=3ms TTL=64
Reply from 192.168.0.1: bytes=32 time=6ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1309ms TTL=64
Reply from 192.168.0.1: bytes=32 time=6ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=7ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1496ms TTL=64
Reply from 192.168.0.1: bytes=32 time=3ms TTL=64
Reply from 192.168.0.1: bytes=32 time=6ms TTL=64
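The ping test above is easy to check mechanically. A minimal Python sketch, assuming the English-language Windows `ping -t` reply format shown here and an arbitrary 500 ms spike threshold (both assumptions, not part of the original report), that pulls out the slow replies and the number of replies between them:

```python
import re
from typing import List, Tuple

# Matches the round-trip time in a Windows ping reply, e.g. "time=1184ms" or "time<1ms".
RTT_PATTERN = re.compile(r"time[=<](\d+)ms")


def find_spikes(lines: List[str], threshold_ms: int = 500) -> List[Tuple[int, int]]:
    """Return (reply_index, rtt_ms) for every reply slower than threshold_ms.

    reply_index counts only lines that contain an RTT, so with `ping -t`
    (roughly one request per second) it loosely approximates elapsed seconds.
    """
    spikes = []
    index = 0
    for line in lines:
        match = RTT_PATTERN.search(line)
        if match is None:
            continue  # skip headers, timeouts and blank lines
        rtt = int(match.group(1))
        if rtt > threshold_ms:
            spikes.append((index, rtt))
        index += 1
    return spikes


def gaps_between(spikes: List[Tuple[int, int]]) -> List[int]:
    """Number of replies between consecutive spikes."""
    return [b[0] - a[0] for a, b in zip(spikes, spikes[1:])]
```

Fed the 23 replies above, `find_spikes` flags the replies at positions 2, 8, 14 and 20, and `gaps_between` returns `[6, 6, 6]`, consistent with the reported 5-6 second rhythm.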

Expected results:

There is no reason for the lag spikes to occur.

Component: Untriaged → Performance
Product: Firefox → Core

Hi Gary, can you please capture a performance profile by following the instructions at https://profiler.firefox.com/. Then upload the profile and insert the link here? Just make sure the profile captures the lag period.

Thanks!

Flags: needinfo?(gtjacobson+bugzilla)

Hi Sean, hope this suffices:

https://share.firefox.dev/3jIzJZW

Flags: needinfo?(gtjacobson+bugzilla)

Hi Gary,

Looks like Firefox was basically idle during that period.

Just to double check, the machine was lagging while the profiler was running? And for this particular lag, other than slow pings, did you experience anything else, such as slowness in a different browser?

Flags: needinfo?(gtjacobson+bugzilla)

Hi Sean, I verified that there were slow pings during this period. I had closed all tabs except for profiler.firefox.com.

I didn't specifically check anything else during this period, but it's been consistent over the last few months: when there are slow pings, I'm unable to make video or audio calls either in Firefox itself, in Slack (desktop app) or in Google Chrome. Audio cuts out every time there is a slow ping, and video just doesn't work. Closing Firefox immediately solves the issue.

I haven't noticed any kind of slowness or lagging apart from that. Streaming audio/video works fine but presumably that's due to a buffer smoothing out the lag spikes. I don't do multiplayer gaming or anything else that might be impacted.

If you have any further suggestions for debugging I'd be happy to try anything.

Flags: needinfo?(gtjacobson+bugzilla)

This sounds suspiciously like bug 1806942. Given that there is a responsive reporter here and in that other bug, as well as Reddit posts, I'm going to bypass the triage calculator and mark this as high priority, since this appears to have a system-wide impact and, in my opinion, needs urgent further investigation.

ni? Andrew Creskey since he has the most Necko expertise on the performance team and may be able to talk to Gary about how we can collect more information here.

A profile of slow pings -inside- Firefox may also be helpful here; that way we may be able to see what part of the connection process is being impacted. ni? reporter for that.

My working theory here is that Firefox is holding some kind of system resource in this case that is limiting something along the lines of the available sockets here. But this is all very vague and hand-wavy and well outside my area of expertise. Process Explorer might be able to provide more hints in terms of open system resources/NT kernel handles/etc.

Performance Impact: --- → high
Component: Performance → Networking
Flags: needinfo?(gtjacobson+bugzilla)
Flags: needinfo?(acreskey)

Hi Bas

Not sure what you mean by "A profile of slow pings -inside- firefox"? How should I capture this?

Flags: needinfo?(gtjacobson+bugzilla) → needinfo?(bas)

(In reply to Gary from comment #6)

> Hi Bas
>
> Not sure what you mean by "A profile of slow pings -inside- firefox"? How should I capture this?

Hi Gary, here are steps to capturing a profile:
https://profiler.firefox.com/docs/#/./guide-getting-started

For this bug, if you could change the profiler settings to "Networking" (when you open the pop-up), that would be best.

Flags: needinfo?(bas)

Here you are: https://share.firefox.dev/3SL07j9

I closed all Firefox tabs before capturing, and confirmed that the lag spikes were occurring during this period by using ping.

For comparison, I closed and reopened Firefox and captured a profile without lag spikes: https://share.firefox.dev/3kEiKsc

(In reply to Gary from comment #8)

> Here you are: https://share.firefox.dev/3SL07j9
>
> I closed all Firefox tabs before capturing, and confirmed that the lag spikes were occurring during this period by using ping.

Thanks for providing the profile and helping us with this issue, Gary.

I don't see any smoking gun in that profile.
Locally, on my 2017 Asus laptop (Core i3), I have not yet been able to reproduce the lag spikes (using your ping test).

I did notice that you have a significant number of extensions installed.
Do you still see the lag spike behaviour in a new profile with no extensions?

Flags: needinfo?(acreskey) → needinfo?(gtjacobson+bugzilla)

Hi Andrew

I started up a new profile with no extensions and it was fine for a couple of days. Then I installed just two of my favourite extensions, which also went fine for a couple of days. Then I installed all of my remaining extensions. After about a day I noticed the lag spikes had started again. I then removed all extensions from the new profile and restarted Firefox, but the lag spikes returned after a while.

I don't know if the extensions had anything to do with it or if it was just coincidental timing.

I had a similar experience the first time I tried to fix the lag spikes by doing a Firefox refresh - the lag spikes went away for about a week, but then returned.

Flags: needinfo?(gtjacobson+bugzilla)
Flags: needinfo?(acreskey)

Thanks for trying that test, Gary.
It seems like the spike in pings still occurs without extensions.

I have a few ideas for next steps.

1. You could try disabling Telemetry to see if that remedies the problem.
(There was a telemetry ping in the profile you shared.)
The instructions are here.
I would turn off "Allow Firefox to send technical and interaction data to Mozilla" and "Allow Firefox to install and run studies".

2. You could capture a more detailed performance profile when the problem is occurring.
From the "Record a performance profile" pop-up on the toolbar, select "Edit Settings".
From here, please select "Bypass selections above and record all registered threads",
and also select the "IPC Messages" checkbox.

If we don't find the source with these steps we can consider capturing network logs, but I think we should try 1. and 2. before that.

Flags: needinfo?(acreskey)
Flags: needinfo?(gtjacobson+bugzilla)
  1. I disabled telemetry but it didn't help.
  2. See https://share.firefox.dev/42kINWx
Flags: needinfo?(gtjacobson+bugzilla)

Thank you, Gary.

That profile looks very very quiet.
I don't even see any network requests.

But, as Bas hypothesized, Firefox may be holding onto too many system resources.

I'm trying again on my reference laptop to reproduce this - if you have any additional tips, please let me know :)

Flags: needinfo?(acreskey)

Gary, we have some ideas to test.
Let's start with this one:

This may be the same issue as Bug 1784402.
Can you set the preference media.cubeb.sandbox to false via about:config, restart Firefox, and see if the issue persists?

Flags: needinfo?(acreskey) → needinfo?(gtjacobson+bugzilla)

OK trying it now... Note that it sometimes takes days for the issue to reappear, so I'll let you know if it reappears or if a number of days go by.

Flags: needinfo?(gtjacobson+bugzilla)

Hi Andrew, setting media.cubeb.sandbox to false didn't work.

Hi Gary,

Could you try to get the http log with the steps below?

  1. Start Firefox.
  2. Go to about:logging and set New log modules: to timestamp,sync,nsHttp:5,nsSocketTransport:5,nsHostResolver:5,nsIOService:5
  3. Choose Logging to a file and then click Set Log File
  4. Put Firefox into offline mode by clicking File -> Work Offline.
  5. Wait 10s (I assume 10s should be enough) and see if this issue happens.
  6. Stop logging and upload the log file
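For reference, Firefox also reads the same module string from the `MOZ_LOG` and `MOZ_LOG_FILE` environment variables, which turns logging on from process start-up rather than from the moment about:logging is used. A minimal Python sketch of launching Firefox that way; the executable and log paths are placeholders you would need to adjust, and all Firefox windows should be closed first, since a second firefox.exe normally hands off to the already running instance (whose environment lacks these variables).

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# The same module string entered in about:logging in step 2.
LOG_MODULES = "timestamp,sync,nsHttp:5,nsSocketTransport:5,nsHostResolver:5,nsIOService:5"


def logging_env(log_file: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with MOZ_LOG and MOZ_LOG_FILE set."""
    env = dict(os.environ if base is None else base)
    env["MOZ_LOG"] = LOG_MODULES
    env["MOZ_LOG_FILE"] = log_file
    return env


def launch_with_logging(firefox_exe: str, log_file: str) -> subprocess.Popen:
    """Start Firefox with logging active from process start-up."""
    return subprocess.Popen([firefox_exe], env=logging_env(log_file))
```

For example, `launch_with_logging(r"C:\Program Files\Mozilla Firefox\firefox.exe", r"C:\temp\firefox-http.log")`, then continue with steps 4 to 6. Firefox may write one file per process, with a suffix added to the given name.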

If this issue still happens when Firefox is in offline mode, at least we know this is not caused by socket IO.

Thanks.

Flags: needinfo?(gtjacobson+bugzilla)

Thanks for trying media.cubeb.sandbox, Gary.

In addition to the steps from Comment 18, can you also try disabling the preference geo.enabled? (see bug 1516103)

Hi Kershaw

I wasn't sure when to start logging. I started logging before going into offline mode, waited a bit, exited offline mode, then stopped logging. It appears that nothing was written to the log file during the period that I was offline.

I confirmed that the lag issue was occurring before, during, and after offline mode.

Flags: needinfo?(gtjacobson+bugzilla)

Thanks for the log.
At least it shows that Firefox is in offline mode and there is no socket I/O at all during that time, so we can rule out the possibility that this is caused by networking.

Maybe this is caused by periodically calling some kind of system API, but I am out of ideas for now.

Andrew, I think you found it!!

I've never been able to figure out what triggered the lag spikes before, but going to the link mentioned in the other issue - https://html5demos.com/geo - and allowing location immediately triggered it.

I'm trying out geo.enabled = false now, will see how it goes.

I can now tell you exactly what is causing this. I had location disabled in Windows 10 settings. I ran a few experiments with turning location on and off, and the lag spikes occur 100% of the time when I allow location in Firefox but have location disabled in Windows.

Thanks. I'd like to change the component to DOM:Geolocation for further investigation.

Component: Networking → DOM: Geolocation

Thank you for putting in the hard work to pinpoint the issue, Gary.

Severity: -- → S3

The severity field for this bug is set to S3. However, the Performance Impact field flags this bug as having a high impact on the performance.
:hsivonen, could you consider increasing the severity of this performance-impacting bug? Alternatively, if you think the performance impact is lower than previously assessed, could you request a re-triage from the performance team by setting the Performance Impact flag to ??

For more information, please visit auto_nag documentation.

Flags: needinfo?(hsivonen)

I'm not sure how we want to collapse these, but from user tests it looks like these are all duplicates of Bug 1658449.

In addition to this bug (Bug 1810421), that includes:
Bug 1516103
Bug 1806942
Bug 1502532

Given the number of bugs associated with this issue and the high performance impact, can the severity be reviewed? Our suggestion would be an S2 along with heightened attention on the issue.

I'm a little on the fence about what the component should be here. But I'm moving this to Widget:Win32 as it seems this is a platform integration issue.

Component: DOM: Geolocation → Widget: Win32

Comment 24 hints at an uncommon Windows configuration so I would say that's a reason to not mark this S2.

Does bug 1704500 change the behavior here in any way?

After reading bug 1658449, are y'all on Wifi or on cabled internet?

> But I'm moving this to Widget:Win32 as it seems this is a platform integration issue.

FWIW given that this happens when disabling the system location provider - which causes us to fall back to the Mozilla service, it's somewhat likely it's the opposite. Anyway, someone will have to root cause it :P

Duplicate of this bug: 1806942

Gian-Carlo, I'm on wifi: desktop PC with a USB dongle (TP-Link TL-WN725N 150Mbps Wireless N Nano USB Adapter).

(In reply to Gian-Carlo Pascutto [:gcp] from comment #34)

> > But I'm moving this to Widget:Win32 as it seems this is a platform integration issue.
>
> FWIW given that this happens when disabling the system location provider - which causes us to fall back to the Mozilla service, it's somewhat likely it's the opposite. Anyway, someone will have to root cause it :P

Ah! Sorry, this is a misunderstanding on my part of the working of the code.

(In reply to Gian-Carlo Pascutto [:gcp] from comment #32)

> Comment 24 hints at an uncommon Windows configuration so I would say that's a reason to not mark this S2.
>
> Does bug 1704500 change the behavior here in any way?

I wouldn't say it's that uncommon: Windows asks you during install whether to disable it or not, and it'd be interesting to have some statistics on this. But I want to emphasize that this lags the -entire system-, not just Firefox; other software is affected as well.

Bug 1658449 sounds very likely here. I can confirm that if you disable the system provider, we'll fall back to MLS, which will scan nearby Wifi and cell (?) access points. Support was added 14 years ago in bug 479898.

I'm going to guess some Wifi adapters/drivers don't deal with this very well. The code has some hints like: https://searchfox.org/mozilla-central/source/netwerk/wifi/win_wifiScanner.cpp#121

In any case, if the lag spikes recur every few seconds, that seems to imply this process isn't properly shut down, or is running way too often. If the hardware is limited, the lag spike is unavoidable. One possible heuristic would be to not rescan if we're still connected to the same Wifi AP ourselves - we're less likely to have moved around if that's true.

Alternatively, if the user force-disables the Windows provider, should we even try to scan at all? Having a fallback is nice when the Windows one doesn't work, but it may be the opposite of what the user wants if they forcibly disabled it.
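The two mitigations proposed above can be expressed as a small decision function. This is a hypothetical Python sketch only: `ScanContext`, its fields and `should_rescan` are illustrative names, not Firefox code, and how Firefox would actually learn whether the user turned off Windows location, or which access point it is associated with, is left open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanContext:
    # Hypothetical inputs; Firefox's real data structures differ.
    os_location_disabled_by_user: bool  # user switched off Windows location services
    connected_ap: Optional[str]         # e.g. BSSID of the AP we are associated with now
    ap_at_last_scan: Optional[str]      # AP we were associated with at the last scan
    has_previous_scan: bool


def should_rescan(ctx: ScanContext) -> bool:
    """Return True if a new wifi AP scan is warranted under the two proposals."""
    # Proposal: honour the user's decision to disable OS location; never scan.
    if ctx.os_location_disabled_by_user:
        return False
    # Nothing cached yet, so one scan is needed.
    if not ctx.has_previous_scan:
        return True
    # Proposal: still on the same AP, so we probably haven't moved; reuse the last result.
    if ctx.connected_ap is not None and ctx.connected_ap == ctx.ap_at_last_scan:
        return False
    return True
```

Either check could be adopted on its own; placing the user-choice check first means no scan happens at all while the OS setting is off.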

I think we're reaching consensus that there are likely 2 bugs here:

a) The wifi AP scan seems to be triggering way too often.
b) If the user has disabled Windows Location services...which Windows now offers by default on install (thanks Bas)...we probably should not be "helpful" and fall back automatically to our own!

Agree to make this S2 because of the surfacing of that option during install (and overlap with what a typical Firefox user cares about!), confirming due to the number of dupes.

Severity: S3 → S2
Status: UNCONFIRMED → NEW
Ever confirmed: true

This changed the scan interval from 60s to 5s:
https://hg.mozilla.org/mozilla-central/rev/ba5f01898068#l8.12

May have made sense for B2G where we knew what the Wifi drivers are, but for desktop? :-/

If you're suffering from this bug, I think the cleanest workaround for now would be to set geo.provider.network.scan to false. You may need to create that preference in about:config.
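Besides about:config, a preference can also be pinned in a `user.js` file in the profile directory, which Firefox applies at every start-up. A minimal Python sketch that appends the suggested workaround there; the profile path is an assumption you must supply yourself (about:support shows it under "Profile Folder"), and Firefox should be closed while the file is edited.

```python
import json
from pathlib import Path


def user_pref_line(name: str, value) -> str:
    """Format one user.js entry, e.g. user_pref("geo.provider.network.scan", false);"""
    # json.dumps quotes strings and renders booleans as true/false,
    # matching the literal syntax user.js expects.
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


def add_pref(profile_dir: Path, name: str, value) -> None:
    """Append a pref to <profile_dir>/user.js unless the same line is already there."""
    user_js = profile_dir / "user.js"
    line = user_pref_line(name, value)
    existing = user_js.read_text(encoding="utf-8") if user_js.exists() else ""
    if line in existing.splitlines():
        return  # already set; avoid duplicate entries
    with user_js.open("a", encoding="utf-8") as f:
        if existing and not existing.endswith("\n"):
            f.write("\n")  # keep each entry on its own line
        f.write(line + "\n")
```

Usage would be `add_pref(Path(r"<your profile folder>"), "geo.provider.network.scan", False)`. Note that deleting the line from `user.js` later may not fully undo it, since the value can persist in the profile's saved prefs; reset it in about:config as well.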

Flags: needinfo?(hsivonen)
Assignee: nobody → davidp99
Priority: -- → P2
See Also: → 1711854

I would like to provide a quick update. (Please see my last update at https://bugzilla.mozilla.org/show_bug.cgi?id=1711854#c12; in short, I fixed this problem just by changing my Wifi card.)

As of today (Firefox v111.0.1 64-bit, win11), the situation on my machine is like this.

With Intel AX210 card, no problems whatsoever.

With Realtek 8822BE, the problem is still there. However, the situation is better than before: the ping spike only happens when I try to use the geolocation service. For example, it only happens when

For me, this is a major improvement compared to last month, because at that time, once geolocation was used, the ping spike happened at a constant interval until Firefox was closed. Closing tabs that used geolocation did not help.

But now, the ping spike only happens once, when the geolocation is actually performed. Let me elaborate on this point. Let's say I call navigator.geolocation.getCurrentPosition((position)=>{console.log(position)}) 10 times in a row. At the time of calling, nothing happens: no ping spike, and no result appears in the console log. Several seconds later, there is a single ping spike and all 10 outputs appear in the console at the same time. After that, no more ping spikes.

Setting geo.enabled to false solves the problem, but then the location service is obviously not available.

Hope this helps. Right now I am happy with the Intel AX210.

Summary: Lag spikes every 5-6 seconds → Lag spikes every 5-6 seconds when Windows Location Services are disabled

I got a TX20U wireless adapter from TP-Link that seems to have a Realtek rtl8852au chipset (USB ID: 2357:013f), and since the day I bought it I have encountered these random ping spikes starting from apparently nowhere, whereas when using a cable I didn't.

I initially thought it was a driver issue (which could still be the case) or a device issue so I jumped into a linux install and installed the drivers from https://github.com/lwfinger/rtl8852au.

Everything worked correctly for the whole day I used that installation. I reinstalled Windows and the TP-Link drivers, and everything seemed OK, but as said before, the problem started out of nowhere. I got accustomed to keeping a CMD window permanently pinging Google servers; when the problem started I tried different things: reinstalling drivers, disabling power saving mode on USB devices, and so on.

Suddenly, one time before restarting, I closed Firefox and noticed the spikes stopped almost immediately according to the ping results. After 2 hours looking into this I found this thread. I've seen other threads, and the only similarity I found was that everyone seemed to use wireless adapters from Realtek, but since this didn't happen in my Linux installation, the Windows location services thing definitely has to be the culprit.

I turned off location in Firefox and it has been 2 whole days without encountering the issue.

Since this doesn't happen with other browsers, I would guess that Firefox is also a culprit, but the fact that other people seem to only have Realtek devices also shifts some of the blame to Realtek. I would like to help with any logs you need to pinpoint exactly where the problem is located.

We have reproduced this and have a fix that ties wifi-scanner polling for location to mobile networks only. On e.g. wifi, scanning for location would pretty much happen once, not every 5 seconds. On mobile networks, we'll try dropping the polling rate to every 60s (down from 5s).

Reorganizes the wifi-scanning code and makes more of it platform-generic
to ease the transition from polling the wifi to usually scanning only on
network changes. This is essentially just moving files/code around and
promoting nsWifiMonitor::DoScan to be platform-independent.

Simplifies the concurrent operations of wifi scanning and reduces the
frequency in common cases. Wifi scanning when on mobile is reduced from
every 5 seconds to every minute. Wifi scans will also happen whenever
a new listener is registered.

Depends on D176199
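The actual patches are C++ changes to Firefox's wifi monitor (the commit message mentions `nsWifiMonitor::DoScan`). As a reading aid only, here is a hypothetical Python model of the scan policy the two commit messages describe: scan when the network changes or a new listener registers, and otherwise poll only on mobile networks, every 60 seconds instead of every 5. None of these names are Firefox's, and how "on mobile" is detected is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

OLD_POLL_INTERVAL_S = 5.0      # previous polling interval, per the comments above
MOBILE_POLL_INTERVAL_S = 60.0  # new interval, used only on mobile networks


@dataclass
class ScannerState:
    last_scan_time: Optional[float] = None  # seconds; None means "never scanned"


def should_scan(state: ScannerState, now: float, *, network_changed: bool,
                new_listener: bool, on_mobile: bool) -> bool:
    """Decide whether to run a wifi scan at time `now` (toy model, not Firefox code)."""
    # Event-driven scans: a network change or a newly registered listener.
    if network_changed or new_listener:
        return True
    # Otherwise only mobile connections keep polling, and at a slower rate.
    if on_mobile:
        if state.last_scan_time is None:
            return True
        return now - state.last_scan_time >= MOBILE_POLL_INTERVAL_S
    # On a stable non-mobile connection (e.g. wifi): no periodic rescans.
    return False


def polls_per_hour(interval_s: float) -> int:
    """Upper bound on periodic scans in one hour at a fixed polling interval."""
    return int(3600 // interval_s)
```

For scale, `polls_per_hour(OLD_POLL_INTERVAL_S)` is 720 and `polls_per_hour(MOBILE_POLL_INTERVAL_S)` is 60; in this model a page using geolocation on a stable wifi connection triggers a single scan when its listener registers.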

Attachment #9329770 - Attachment description: Bug 1810421: Refactor wifi scanning to consolidate generic code r=cmartin! → Bug 1810421: Refactor wifi scanning to consolidate generic code r=#necko-reviewers!
Attachment #9329771 - Attachment description: Bug 1810421: Only scan wifi when the network changes or when on mobile r=cmartin!,#necko-reviewers! → Bug 1810421: Only scan wifi when the network changes or when on mobile r=#necko-reviewers!
Duplicate of this bug: 1711854
Pushed by daparks@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/afd62e82fe83 Refactor wifi scanning to consolidate generic code r=necko-reviewers,kershaw,valentin https://hg.mozilla.org/integration/autoland/rev/1cf92ef74fb4 Only scan wifi when the network changes or when on mobile r=necko-reviewers,kershaw

Backed out for causing build bustages in TestWifiMonitor.cpp

Flags: needinfo?(davidp99)
Pushed by daparks@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/35c408b9531d Refactor wifi scanning to consolidate generic code r=necko-reviewers,kershaw,valentin https://hg.mozilla.org/integration/autoland/rev/028af3d71e19 Only scan wifi when the network changes or when on mobile r=necko-reviewers,kershaw
Pushed by ncsoregi@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/af2b512cb5ac Refactor wifi scanning to consolidate generic code r=necko-reviewers,kershaw,valentin a=reland CLOSED TREE https://hg.mozilla.org/integration/autoland/rev/cefcb5f4f4e0 Only scan wifi when the network changes or when on mobile r=necko-reviewers,kershaw a=reland CLOSED TREE

Relanded, as further investigation uncovered the actual regressor.

We apologise if this has caused any inconvenience.

Flags: needinfo?(davidp99)
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 115 Branch

Thank you for fixing this!

Duplicate of this bug: 1516103
Duplicate of this bug: 1835511
Duplicate of this bug: 1835512
Duplicate of this bug: 1658449

Would you consider nominating this bug fix for Fx 115 release notes? We're addressing 7 bugs (I believe) and significantly enhancing the experience for the users who have been affected.

Flags: needinfo?(davidp99)
QA Whiteboard: [qa-115b-p2]

Sounds like a good idea.

Release Note Request (optional, but appreciated)

[Why is this notable]:
Many users have been affected by this bug for years. It is common among low-end/USB wifi drivers. The symptoms were severe.

[Suggested wording]:
Windows users with affected wifi drivers and OS geolocation disabled can now approve geolocation on a case by case basis without causing system-wide network instability.

[Links (documentation, blog post, etc)]:
There is discussion of a blog post but it will likely not appear in time. I will reach out if that changes.

relnote-firefox: --- → ?
Flags: needinfo?(davidp99)

Added a slightly reworded release note to the Fx115 release notes.

Duplicate of this bug: 1800454