Open Bug 1930110 Opened 19 days ago Updated 2 days ago

Decreased performance for Firefox 132 inside of Docker

Categories

(Core :: Performance, defect)

Firefox 132
defect


People

(Reporter: peter, Unassigned, NeedInfo)

References

Details

Attachments

(5 files, 2 obsolete files)

Attached image firefox-increase.jpg

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.1 Safari/605.1.15

Steps to reproduce:

Hi! We run performance tests for Wikipedia using Browsertime/sitespeed.io, and when our tests updated to Firefox 132 all our metrics increased. Metrics like First Visual Change and Largest Contentful Paint increased by 1 second. I can also see that TTFB increased, so it looks like everything became slower.

I've verified that this also happens for other web sites and on other servers (it's not a server issue). I also verified that the only thing that changed between the two different runs is the Firefox version.

However, I also ran the test directly on a bare-metal server without running Firefox inside Docker, and there I cannot see any change. It looks like this only happens for Firefox 132 inside Docker.

You can see the change for example in: https://grafana.wikimedia.org/d/IvAfnmLMk/synthetic-testing-page-drilldown?orgId=1&var-base=sitespeed_io&var-path=desktop&var-testtype=firstView&var-group=en_wikipedia_org&var-page=_wiki_Barack_Obama&var-browser=firefox&var-connectivity=4g&var-function=median&var-s3path=https:%2F%2Fwikimedia.sitespeed.io&from=now-7d&to=now&viewPanel=302

If you click "Show each test" and hover on the green vertical lines you can see browser versions (see my screenshot browser-version, I'll attach that soon).

I will roll back to Firefox 131.

Please let me know what I can do to help!

Attached image browser-version.jpg

How to check browser version.

The Bugbug bot thinks this bug should belong to the 'Core::Performance' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Performance
Product: Firefox → Core

Offhand, it's difficult to tell what the reason could be. The best bet would be to do the following while running Firefox 132 in the environment where it is slow:

  1. Go to "about:support" in Firefox 132 and copy-paste its contents here
  2. Capture a performance profile in Firefox 132 using the profiler (https://profiler.firefox.com/). As a first step try the "graphics" preset and then the "networking" preset. Please share the profiles here.
  3. If you have a testcase, please attach it to the bug.
  4. Perform a bisection using https://mozilla.github.io/mozregression/ . This tool will automatically download a series of builds on which you can then test. Please report back with its final log.

Thanks!

Flags: needinfo?(peter)

Hi, I'll try to produce performance profiles this weekend. Running mozregression will be hard, I think, since it needs to be dockerized to reproduce, but I can try that as a last resort.

Hmm, I can manually log in to the server, run the same test and get the same result (132 is much slower). However, if I turn on the Gecko Profiler, there is no difference in the metrics.

If I increase the wait time before I start the test (meaning the browser starts, waits an extra X seconds, and then the test runs), the metrics come back to normal. Has there been any change in 132 where Firefox phones home (makes requests to Mozilla or some other service) to get some information, which disturbs the metrics if you start the test earlier? And if I wait some time, that has already happened?

Flags: needinfo?(peter)

That sounds like a network thing... The attached image had "4g" in it. Does that mean that the Docker container is simulating a mobile 4G network environment?
If so, then try creating a networking log.
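If the log files keep coming out empty in the container, HTTP logging can also be enabled through prefs instead of environment variables. A minimal sketch of a user.js fragment, assuming the standard logging.* pref names from the Gecko logging docs and a /tmp path that is writable inside the container:

  // Sketch only: verbose HTTP logging via prefs (pref names per the Gecko logging docs).
  // The log path is an assumption; point it somewhere writable in your image.
  user_pref("logging.config.LOG_FILE", "/tmp/firefox-net.log");
  user_pref("logging.config.add_timestamp", true); // prefix each entry with a timestamp
  user_pref("logging.nsHttp", 5);                  // verbose HTTP module logging
  user_pref("logging.nsSocketTransport", 5);       // socket-level logging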

This bug was moved into the Performance component.

:peter, could you make sure the following information is on this bug?

  • For slowness or high CPU usage, capture a profile with http://profiler.firefox.com/, upload it and share the link here.
  • For memory usage issues, capture a memory dump from about:memory and attach it to this bug.
  • Troubleshooting information: Go to about:support, click "Copy raw data to clipboard", paste it into a file, save it, and attach the file here.

If the requested information is already in the bug, please confirm it is recent.

Thank you.

Flags: needinfo?(peter)

So I got stuck on this yesterday because I couldn't generate a network log in Docker (all the files are zero bytes). Generating those works when running Firefox on Mac, so I will test whether I can get the logs by running Firefox on Ubuntu instead, just to see if it's a Firefox/Linux thing or something strange going on in the container.

When I had a look at the network log from Mac on 132, there were so many requests going back to Mozilla that it was hard to see/understand whether there were any new ones.

I will have a look later this week to see what's going on with the network log.

There's a new security warning that appears in 132: https://github.com/sitespeedio/browsertime/issues/2207 - it also affects screenshots and videos, so I think a first step could be to remove it. It pushes down content on the screen. Do you know which flag/configuration will do that? Attaching a screenshot here as well.

Flags: needinfo?(mayankleoboy1)

I don't know what causes this or how to fix it.

Flags: needinfo?(mayankleoboy1)

(In reply to Peter Hedenskog from comment #10)

> There's a new security warning that appears in 132: https://github.com/sitespeedio/browsertime/issues/2207 - it also affects screenshots and videos, so I think a first step could be to remove it. It pushes down content on the screen. Do you know which flag/configuration will do that? Attaching a screenshot here as well.

Please follow the link and read the doc on SUMO: https://support.mozilla.org/en-US/kb/install-firefox-linux#w_security-features-warning

The tl;dr is that you don't have access to user namespaces, and this makes our sandbox less effective; it may also badly break other things. There's a suggested fix for the AppArmor case starting with Ubuntu 24.04, but in other cases you will have to work out how to grant that access.

Flipping the security.sandbox.warn_unprivileged_namespaces pref will hide the notification, assuming you cannot fix the configuration (and it does not break too many things).
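In case it helps, a minimal user.js sketch of that (the pref name comes from the line above; applying it through Browsertime's profile setup is an assumption about your harness):

  // Sketch only: hide the unprivileged-user-namespaces warning bar.
  // Prefer fixing the container configuration itself when possible.
  user_pref("security.sandbox.warn_unprivileged_namespaces", false);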

Ok thanks let me try that pref first.

The pref removed the notification, but the metrics are still high. I tried to roll forward to the latest 133 beta and the issue is still there. I changed how Firefox is installed in the container to use apt-get; the metrics are still high. I'll try again on the servers that have the problem and see if I can better understand what's going on.

Attached file geckoProfile-ff-132.json (obsolete) —

Profile from 132

Attached file geckoProfile-ff-131.json (obsolete) —

Profile for 131

Ok, I have two profiles now; can you please check whether you can see what's going on? One is from 131 and one is from 132. Running 132 adds at least 300 ms to all metrics (including TTFB).

When I first tried profiling, I could not see a difference. However, running the profiling with Browsertime adds a 3-second wait time for the Gecko Profiler, and with that 3-second wait there's no difference between 131 and 132. When it waits just 100 ms, the difference is there and I could get a profile.

I also tried turning on the net log, but that seems broken on Linux/Ubuntu in the container; it generated empty files. On Mac it works.

Flags: needinfo?(peter)

Looking at the release notes, it looks like https://support.mozilla.org/en-US/kb/third-party-cookies-firefox-tracking-protection could correlate (or do you internally have another changelog that makes it easier to see what's changed?).

Attached file 132.json

New trace with HTTP requests

Attachment #9438677 - Attachment is obsolete: true
Attached file 131.json

New trace with HTTP requests

Attachment #9438678 - Attachment is obsolete: true

Ok, with help from Greg I could get the HTTP log in the profile.json.

In 131 there are requests that go to quicksuggest; here's an example:
https://firefox-settings-attachments.cdn.mozilla.net/main-workspace/quicksuggest/b853057c-be5f-4e79-af55-dc310e29e144

In 132 the quicksuggest requests are there, plus new requests that go to search-categorization. Here is one example URL:
https://firefox-settings-attachments.cdn.mozilla.net/main-workspace/search-categorization/9bcce3e1-df09-4b94-9153-3bcb0200c121.json

So my guess is that the new search-categorization requests are causing the problem. What is the correct way to turn them off? Also, is there a way to turn off quicksuggest?

Hi :standard8, I see that you're the owner of the search code; would you know how we could turn those quicksuggest and search-categorization requests off (see comment #23)?

Flags: needinfo?(standard8)

I suspect this may be improved once bug 1907327 lands/is released.

I think you can turn search-categorization off; for that matter, you could possibly stop remote settings as well, but I'm wondering if the real problem is in the harness.

I believe for the Firefox performance tests, we start up Firefox with a new profile, let it settle, and then restart and run the actual tests. We also have a bunch of preferences that we turn off for all tests and for perf tests.

Are the Wikipedia tests following that process as well?

Flags: needinfo?(standard8)
See Also: → 1907327

I think the configuration is the same as how you run the tests internally, except that we don't wait for that settle time. If I added that for Firefox in Browsertime, users that run tests locally using Chrome would just get the result, but when they switch to running performance tests with Firefox they would wait 30 seconds before the measurement starts. That seems like a bad user experience, so I would really want to avoid it.

The regression was introduced in 132; if there's a switch to turn off search-categorization, I could verify that it's the root cause. Right now I think it is, but I will only know for sure if I turn it off and the regression goes away.

(In reply to Peter Hedenskog from comment #26)

> I think the configuration is the same as how you run the tests internally, except that we don't wait for that settle time. If I added that for Firefox in Browsertime, users that run tests locally using Chrome would just get the result, but when they switch to running performance tests with Firefox they would wait 30 seconds before the measurement starts. That seems like a bad user experience, so I would really want to avoid it.

Greg, am I right in thinking there is some sort of settle time that we have, or do we just go straight into the tests?

All I'm really thinking here is that you may get inconsistent results over time, and more issues such as this may pop up.

> The regression was introduced in 132; if there's a switch to turn off search-categorization, I could verify that it's the root cause. Right now I think it is, but I will only know for sure if I turn it off and the regression goes away.

The preference is browser.search.serpEventTelemetryCategorization.enabled. In theory, that would only affect the network on the first startup of a fresh profile. Once the data has downloaded, it would then be fine.
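For anyone wanting to try it, a minimal user.js sketch with just that pref (nothing beyond the pref named above is implied):

  // Sketch only: disable SERP categorization, which triggers the
  // search-categorization remote-settings download seen in the 132 profile.
  user_pref("browser.search.serpEventTelemetryCategorization.enabled", false);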

Flags: needinfo?(gmierz2)

Thanks :standard8! Yes, you're correct, we have a 30 s wait before starting our performance tests by default. It's either done at browser startup during the tests, or we build a conditioned profile where the 30 s wait was already done; then, in our tests, we skip the 30 s wait while using that prebuilt profile.

Peter, I was looking at the prefs that we're setting, and there may be some new ones that you don't have in Browsertime at the moment. Here are a few that I spotted: https://searchfox.org/mozilla-central/source/testing/profiles/common/user.js#59,88-90
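For Browsertime users following along, here is a hedged consolidation of the prefs discussed in this bug; the authoritative list should be copied from the linked user.js, and the quicksuggest pref name below is an assumption rather than something confirmed above:

  // Illustrative only: prefs mentioned in this bug that cut first-startup network traffic.
  user_pref("browser.search.serpEventTelemetryCategorization.enabled", false); // search-categorization download
  user_pref("browser.urlbar.quicksuggest.enabled", false);                     // quicksuggest (pref name assumed)
  user_pref("security.sandbox.warn_unprivileged_namespaces", false);           // hide the sandbox warning bar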

Flags: needinfo?(gmierz2)

Thank you Greg, I have added those configurations now.

The configuration helped, the regression is gone, thank you :standard8!

We can close this issue now.

Greg, is there anything else to follow up on here, or can we close this as works for me? I think bug 1907327 is perhaps the only follow-up, but that's already filed.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(gmierz2)