Open Bug 1810421 Opened 2 months ago Updated 18 hours ago

Lag spikes every 5-6 seconds

Categories

(Core :: Networking, defect)

Firefox 108

Tracking

()

UNCONFIRMED
Performance Impact high

People

(Reporter: gtjacobson+bugzilla, Unassigned)

Details

(Keywords: perf:resource-use)

Attachments

(1 file)

22.11 KB, application/x-zip-compressed

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0

Steps to reproduce:

This happens after I have been using Firefox for a while, anywhere from minutes to hours. I have not been able to find a specific cause. I have done a troubleshooting refresh multiple times, and disabled all of my addons, my theme, and tab syncing.

Once the lag spikes start, they do not stop until Firefox is closed.

This seems to have affected other people; see https://www.reddit.com/r/firefox/comments/y4qbfi/really_big_lag_spikes_when_using_firefox/ and https://www.reddit.com/r/firefox/comments/1082ia2/running_firefox_make_other_video_call_software/

Actual results:

Massive lag spikes every 5-6 seconds. This affects every other program on my PC (Windows 10). If I want to do a video/audio call in another app, I have to close Firefox first, or the call quality will be terrible.

I run command prompt with the following command: ping 192.168.0.1 -t

When the lag spikes start, the ping results look like this:

Reply from 192.168.0.1: bytes=32 time=6ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1184ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1688ms TTL=64
Reply from 192.168.0.1: bytes=32 time=3ms TTL=64
Reply from 192.168.0.1: bytes=32 time=6ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1309ms TTL=64
Reply from 192.168.0.1: bytes=32 time=6ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=7ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1496ms TTL=64
Reply from 192.168.0.1: bytes=32 time=3ms TTL=64
Reply from 192.168.0.1: bytes=32 time=6ms TTL=64
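
For anyone trying to reproduce this, the spacing of the spikes in a ping log like the one above can be checked with a short script. This is only a minimal Python sketch: the 500 ms threshold and the function names are assumptions chosen for illustration, and the sample is the first 15 replies from the output above.

```python
import re

# Matches the round-trip time in Windows ping output,
# e.g. "time=1184ms" or "time<1ms".
RTT_PATTERN = re.compile(r"time[=<](\d+)ms")


def parse_rtts(ping_output):
    """Return the round-trip times (in ms) found in ping output, in order."""
    rtts = []
    for line in ping_output.splitlines():
        match = RTT_PATTERN.search(line)
        if match:
            rtts.append(int(match.group(1)))
    return rtts


def find_spikes(rtts, threshold_ms=500):
    """Return the indices of replies at or above threshold_ms (assumed cutoff)."""
    return [i for i, rtt in enumerate(rtts) if rtt >= threshold_ms]


def spike_gaps(spike_indices):
    """Return the number of replies between consecutive spikes."""
    return [b - a for a, b in zip(spike_indices, spike_indices[1:])]


# First 15 replies copied from the report above.
SAMPLE = """\
Reply from 192.168.0.1: bytes=32 time=6ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1184ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=5ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1688ms TTL=64
Reply from 192.168.0.1: bytes=32 time=3ms TTL=64
Reply from 192.168.0.1: bytes=32 time=6ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=4ms TTL=64
Reply from 192.168.0.1: bytes=32 time=1309ms TTL=64
"""

rtts = parse_rtts(SAMPLE)
spikes = find_spikes(rtts)
print(spikes, spike_gaps(spikes))  # prints: [2, 8, 14] [6, 6]
```

In this sample every sixth reply is slow, which is roughly consistent with the spikes recurring every 5-6 seconds.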

Expected results:

No lag spikes: while Firefox is running, ping times should stay in the single-digit millisecond range seen between the spikes above.

Component: Untriaged → Performance
Product: Firefox → Core

Hi Gary, can you please capture a performance profile by following the instructions at https://profiler.firefox.com/, then upload the profile and post the link here? Just make sure the profile captures the lag period.

Thanks!

Flags: needinfo?(gtjacobson+bugzilla)

Hi Sean, hope this suffices:

https://share.firefox.dev/3jIzJZW

Flags: needinfo?(gtjacobson+bugzilla)

Hi Gary,

Looks like Firefox was basically idle during that period.

Just to double check, the machine was lagging while the profiler was running? And for this particular lag, other than slow pings, did you experience anything else, such as slowness in a different browser?

Flags: needinfo?(gtjacobson+bugzilla)

Hi Sean, I verified that there were slow pings during this period. I had closed all tabs except for profiler.firefox.com.

I didn't specifically check anything else during this period, but it's been consistent over the last few months: when there are slow pings, I'm unable to make video or audio calls either in Firefox itself, in Slack (desktop app) or in Google Chrome. Audio cuts out every time there is a slow ping, and video just doesn't work. Closing Firefox immediately solves the issue.

I haven't noticed any kind of slowness or lagging apart from that. Streaming audio/video works fine but presumably that's due to a buffer smoothing out the lag spikes. I don't do multiplayer gaming or anything else that might be impacted.

If you have any further suggestions for debugging I'd be happy to try anything.

Flags: needinfo?(gtjacobson+bugzilla)

This sounds suspiciously like bug 1806942. Given that there is a responsive reporter here and in that other bug, as well as Reddit posts, I'm going to bypass the triage calculator and mark this as high priority: it appears to have a system-wide impact and, in my opinion, needs urgent further investigation.

ni? Andrew Creskey since he has the most Necko expertise on the performance team and may be able to talk to Greg about how we can collect more information here.

A profile of slow pings -inside- firefox may also be helpful here; that way we may be able to see which part of the connection process is being impacted. ni? reporter for that.

My working theory is that Firefox is holding some kind of system resource that limits something like the number of available sockets. But this is all very vague and hand-wavy and well outside my area of expertise. Process Explorer might be able to provide more hints in terms of open system resources/NT kernel handles/etc.

Performance Impact: --- → high
Component: Performance → Networking
Flags: needinfo?(gtjacobson+bugzilla)
Flags: needinfo?(acreskey)

Hi Bas

Not sure what you mean by "A profile of slow pings -inside- firefox"? How should I capture this?

Flags: needinfo?(gtjacobson+bugzilla) → needinfo?(bas)

(In reply to Gary from comment #6)
> Hi Bas
>
> Not sure what you mean by "A profile of slow pings -inside- firefox"? How should I capture this?

Hi Gary, here are the steps for capturing a profile:
https://profiler.firefox.com/docs/#/./guide-getting-started

For this bug, if you could change the profiler settings to "Networking" (when you open the pop-up), that would be best.

Flags: needinfo?(bas)

Here you are: https://share.firefox.dev/3SL07j9

I closed all Firefox tabs before capturing, and confirmed that the lag spikes were occurring during this period by using ping.

For comparison, I closed and reopened Firefox and captured a profile without lag spikes: https://share.firefox.dev/3kEiKsc

(In reply to Gary from comment #8)
> Here you are: https://share.firefox.dev/3SL07j9
>
> I closed all Firefox tabs before capturing, and confirmed that the lag spikes were occurring during this period by using ping.

Thanks for providing the profile and helping us with this issue, Gary.

I don't see any smoking gun in that profile.
Locally, on my 2017 Asus laptop (Core i3), I have not yet been able to reproduce the lag spikes (using your ping test).

I did notice that you have a significant number of extensions installed.
Do you still see the lag spike behaviour in a new profile with no extensions?

Flags: needinfo?(acreskey) → needinfo?(gtjacobson+bugzilla)

Hi Andrew

I started up a new profile with no extensions and it was fine for a couple of days. Then I installed just two of my favourite extensions, which also went fine for a couple of days. Then I installed all of my remaining extensions. After about a day I noticed the lag spikes had started again. I then removed all extensions from the new profile and restarted Firefox, but the lag spikes returned after a while.

I don't know if the extensions had anything to do with it or if it was just coincidental timing.

I had a similar experience the first time I tried to fix the lag spikes by doing a Firefox refresh - the lag spikes went away for about a week, but then returned.

Flags: needinfo?(gtjacobson+bugzilla)
Flags: needinfo?(acreskey)

Thanks for trying that test, Gary.
It seems like the spike in pings still occurs without extensions.

I have a few ideas for next steps.

  1. You could try disabling Telemetry to see if that remedies the problem.
     (There was a telemetry ping in the profile you shared.)
     The instructions are here.
     I would turn off "Allow Firefox to send technical and interaction data to Mozilla" and "Allow Firefox to install and run studies".

  2. You could capture a more detailed performance profile when the problem is occurring.
     From the "Record a performance profile" pop-up on the toolbar, select "Edit Settings".
     From here, please select "Bypass selections above and record all registered threads".
     Also select the "IPC Messages" checkbox.

If we don't find the source with these steps we can consider capturing network logs, but I think we should try 1. and 2. before that.

Flags: needinfo?(acreskey)
Flags: needinfo?(gtjacobson+bugzilla)
  1. I disabled telemetry but it didn't help.
  2. See https://share.firefox.dev/42kINWx
Flags: needinfo?(gtjacobson+bugzilla)

Thank you, Gary.

That profile looks very very quiet.
I don't even see any network requests.

But, as Bas hypothesized, Firefox may be holding onto too many system resources.

I'm trying again on my reference laptop to reproduce this - if you have any additional tips, please let me know :)

Flags: needinfo?(acreskey)

Gary, we have some ideas to test.
Let's start with this one:

This may be the same issue as Bug 1784402.
Can you set the preference media.cubeb.sandbox to false via about:config, restart Firefox, and see if the issue persists?

Flags: needinfo?(acreskey) → needinfo?(gtjacobson+bugzilla)

OK trying it now... Note that it sometimes takes days for the issue to reappear, so I'll let you know if it reappears or if a number of days go by.

Flags: needinfo?(gtjacobson+bugzilla)

Hi Andrew, setting media.cubeb.sandbox to false didn't work.

Hi Gary,

Could you try to get the http log with the steps below?

  1. Start Firefox.
  2. Go to about:logging and set New log modules: to timestamp,sync,nsHttp:5,nsSocketTransport:5,nsHostResolver:5,nsIOService:5
  3. Choose Logging to a file and then click Set Log File
  4. Put Firefox into offline mode by clicking File -> Work Offline.
  5. Wait 10s (I assume 10s should be enough) and see if this issue happens.
  6. Stop logging and upload the log file

If this issue still happens when Firefox is in offline mode, at least we know this is not caused by socket IO.
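
As an alternative to about:logging, the same log modules can be set with the MOZ_LOG and MOZ_LOG_FILE environment variables before Firefox starts. Below is a minimal Python sketch of that approach. The helper names are made up for illustration, and the Firefox executable path and log file path are placeholders you would need to supply for your own machine.

```python
import os
import subprocess

# The module list from step 2 above, unchanged.
LOG_MODULES = (
    "timestamp,sync,nsHttp:5,nsSocketTransport:5,"
    "nsHostResolver:5,nsIOService:5"
)


def build_logging_env(log_file, base_env=None):
    """Return a copy of the environment with Firefox logging variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_LOG"] = LOG_MODULES
    env["MOZ_LOG_FILE"] = log_file
    return env


def launch_firefox_with_logging(firefox_path, log_file):
    """Start Firefox with logging enabled from launch.

    firefox_path and log_file are supplied by the caller; nothing here
    guesses where Firefox is installed.
    """
    return subprocess.Popen([firefox_path], env=build_logging_env(log_file))
```

Calling launch_firefox_with_logging with your Firefox path and a writable log path starts logging from the moment Firefox launches, so there is no question of when to begin recording. Steps 4 to 6 then apply as written.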

Thanks.

Flags: needinfo?(gtjacobson+bugzilla)

Thanks for trying media.cubeb.sandbox, Gary.

In addition to the steps from Comment 18, can you also try disabling the preference geo.enabled? (see bug 1516103)

Hi Kershaw

I wasn't sure when to start logging. I started logging before going into offline mode, waited a bit, exited offline mode, then stopped logging. It appears that nothing was written to the log file during the period that I was offline.

I confirmed that the lag issue was occurring before, during, and after offline mode.

Flags: needinfo?(gtjacobson+bugzilla)

Thanks for the log.
At least it shows that Firefox was in offline mode and there was no socket I/O at all during that time, so we can rule out the possibility that this is caused by networking.

Maybe this is caused by Firefox periodically calling some kind of system API, but I am out of ideas for now.

Andrew, I think you found it!!

I've never been able to figure out what triggered the lag spikes before, but going to the link mentioned in the other issue - https://html5demos.com/geo - and allowing location immediately triggered it.

I'm trying out geo.enabled = false now, will see how it goes.

I can now tell you exactly what is causing this. I had location disabled in Windows 10 settings. I ran a few experiments with turning location on and off, and the lag spikes occur 100% of the time when I allow location in Firefox but have location disabled in Windows.
