Decreased performance for Firefox 132 inside of Docker
Categories
(Core :: Performance, defect)
People
(Reporter: peter, Unassigned, NeedInfo)
Attachments
(5 files, 2 obsolete files)
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.1 Safari/605.1.15
Steps to reproduce:
Hi! We run performance tests for Wikipedia using Browsertime/sitespeed.io, and when our tests updated to Firefox 132 all our metrics increased. Metrics like First Visual Change and Largest Contentful Paint increased by 1 second. I can also see that TTFB increased, so it looks like everything became slower.
I've verified that this also happens for other web sites and on other servers (it's not a server issue). I also verified that the only thing changed between the two different runs is the Firefox version.
However, I also ran the test directly on a bare metal server, with Firefox not inside Docker, and there I cannot see any change. It looks like this only happens for Firefox 132 inside of Docker.
You can see the change for example in: https://grafana.wikimedia.org/d/IvAfnmLMk/synthetic-testing-page-drilldown?orgId=1&var-base=sitespeed_io&var-path=desktop&var-testtype=firstView&var-group=en_wikipedia_org&var-page=_wiki_Barack_Obama&var-browser=firefox&var-connectivity=4g&var-function=median&var-s3path=https:%2F%2Fwikimedia.sitespeed.io&from=now-7d&to=now&viewPanel=302
If you click "Show each test" and hover over the green vertical lines you can see the browser versions (see my screenshot browser-version, I'll attach it soon).
I will roll back to Firefox 131.
Please let me know what I can do to help!
Reporter
Comment 1•19 days ago
How to check browser version.
Comment 2•19 days ago
The Bugbug bot thinks this bug should belong to the 'Core::Performance' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 3•19 days ago
Offhand, it's difficult to tell what could be the reason. The best bet would be to do the following while running Firefox 132 in the environment where it is slow:
- Go to "about:support" in Firefox 132 and copy-paste its contents here
- Capture a performance profile in Firefox 132 using the profiler (https://profiler.firefox.com/). As a first step try the "graphics" preset and then the "networking" preset. Please share the profiles here.
- If you have a testcase, please attach it to the bug.
- Perform a bisection using https://mozilla.github.io/mozregression/. This tool will automatically download a series of builds on which you can then test. Please report back with its final log.
Thanks!
Reporter
Comment 4•19 days ago
Hi, I'll try to produce performance profiles this weekend. Running mozregression will be hard, I think, since it needs to be dockerized to reproduce, but I can try that as a last resort.
Reporter
Comment 5•18 days ago
Hmm, I can manually log in to the server, run the same test, and get the same result (132 much slower). However, if I turn on the Gecko Profiler, there is no difference in the metrics.
If I increase the wait time before I start the test, meaning the browser starts, waits an extra X seconds, and then the test runs, the metrics come back to normal. Has there been any change in 132 where Firefox phones home (makes requests to Mozilla or some other service) to get some information, and that disturbs the metrics if you start the test earlier? And if I wait some time, that has already happened?
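For reference, the extra-wait experiment described above can be written as a browsertime script along these lines. This is only a sketch: the command names come from browsertime's scripting API as I understand it, and the 30000 ms value is arbitrary.

module.exports = async function (context, commands) {
  // Let the freshly started browser settle before the measured navigation.
  await commands.wait.byTime(30000);
  // Then run the measured page load.
  return commands.measure.start('https://en.wikipedia.org/wiki/Barack_Obama');
};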
Comment 6•18 days ago
That sounds like a network thing... The attached image had "4g" in it. Does that mean that the Docker container is simulating a mobile 4G network environment?
If so, then try creating a networking log.
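One way to get such a log without changing the harness is Firefox's MOZ_LOG/MOZ_LOG_FILE environment variables. A minimal Node sketch, assuming the browsertime CLI is on the PATH; the URL and options are placeholders rather than the exact setup used in this thread.

const { spawn } = require('child_process');

// Standard Firefox HTTP logging modules; Firefox appends the child pid to the log file name.
const env = {
  ...process.env,
  MOZ_LOG: 'timestamp,sync,nsHttp:5,nsSocketTransport:5,nsHostResolver:5',
  MOZ_LOG_FILE: '/tmp/firefox-net.log',
};

const child = spawn(
  'browsertime',
  ['-b', 'firefox', '-n', '1', 'https://en.wikipedia.org/wiki/Barack_Obama'],
  { env, stdio: 'inherit' }
);

child.on('exit', (code) => console.log('browsertime exited with code', code));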
Comment 7•16 days ago
This bug was moved into the Performance component.
:peter, could you make sure the following information is on this bug?
- For slowness or high CPU usage, capture a profile with http://profiler.firefox.com/, upload it and share the link here.
- For memory usage issues, capture a memory dump from about:memory and attach it to this bug.
- Troubleshooting information: Go to about:support, click "Copy raw data to clipboard", paste it into a file, save it, and attach the file here.
If the requested information is already in the bug, please confirm it is recent.
Thank you.
Reporter
Comment 8•16 days ago
So I got stuck on this yesterday because I couldn't generate a network log in Docker (all the files are zero bytes). Generating the logs works when running Firefox on Mac, so I will test running Firefox on Ubuntu instead to see if I can get the logs there, just to see if it's a Firefox/Linux thing or something strange going on in the container.
When I had a look at the network log from Mac on 132, there were so many requests going back to Mozilla that it was hard to see/understand whether there were any new ones.
I will have a look later this week to see what's going on with the network log.
Comment 9•14 days ago
Comment 10•14 days ago
There's a new security warning that appears in 132: https://github.com/sitespeedio/browsertime/issues/2207 - it also affects screenshots and videos, so I think a first step could be to remove it. It pushes down the content on the screen. Do you know which flag/configuration will do that? Attaching a screenshot here as well.
Comment 11•13 days ago
I don't know what causes this or how to fix it.
Comment 12•13 days ago
(In reply to Peter Hedenskog from comment #10)
> There's a new security warning that appears in 132: https://github.com/sitespeedio/browsertime/issues/2207 - it also affects screenshots and videos, so I think a first step could be to remove it. It pushes down the content on the screen. Do you know which flag/configuration will do that? Attaching a screenshot here as well.
Please follow the link and read the doc on SUMO: https://support.mozilla.org/en-US/kb/install-firefox-linux#w_security-features-warning
The tl;dr is that you don't have access to user namespaces, which makes our sandbox less effective, and it may also badly break other things. There's a suggested fix for the case of AppArmor usage starting with Ubuntu 24.04, but in other cases you will have to work out how to grant that access.
Comment 13•13 days ago
Comment 14•13 days ago
Flipping the security.sandbox.warn_unprivileged_namespaces pref will hide the notification, assuming you cannot fix the configuration (and it does not break too many things).
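For reference, a minimal user.js sketch of that pref; false is assumed to be the value that hides the notification, and it only hides the warning, it does not restore the sandbox. Browsertime can also pass prefs on the command line (--firefox.preference key:value) if that option is available in your version.

// Sketch: hide the unprivileged-namespaces warning bar.
// Assumes false is the "do not warn" value.
user_pref("security.sandbox.warn_unprivileged_namespaces", false);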
Reporter
Comment 15•13 days ago
Ok, thanks, let me try that pref first.
Reporter
Comment 16•8 days ago
The pref removed the notification, but the metrics are still high. I tried to roll forward to the latest 133 beta and the issue is still there. I changed how Firefox is installed in the container to use apt-get; the metrics are still high. I'll try again on the servers that have the problem and see if I can understand better what's going on.
Reporter
Comment 17•8 days ago
Profile from 132
Reporter
Comment 18•8 days ago
Profile for 131
Reporter
Comment 19•8 days ago
Ok, I have two profiles now; can you please check if you can see what's going on there? One is using 131 and one is 132. Running 132 adds at least 300 ms to all metrics (including TTFB).
When I first tried profiling, I could not see a difference. However, running the profiling with Browsertime adds a 3 second wait time for the Gecko Profiler, and with that 3 second wait there's no difference between 131 and 132. When Browsertime only waits 100 ms, the difference is there and I could get a profile.
I also tried turning on the net log, but that seems broken on Linux/Ubuntu in the container; it generates empty files. On Mac it works.
Reporter
Comment 20•6 days ago
Looking at the release notes, it looks like https://support.mozilla.org/en-US/kb/third-party-cookies-firefox-tracking-protection could correlate (or do you have another internal changelog that makes it easier to see what changed)?
Reporter
Comment 21•6 days ago
New trace with HTTP request
Reporter
Comment 22•6 days ago
new trace with HTTP request
Reporter
Comment 23•6 days ago
Ok, with help from Greg I could get the HTTP log in the profile.json.
In 131 there are requests that go to quicksuggest; here's an example:
https://firefox-settings-attachments.cdn.mozilla.net/main-workspace/quicksuggest/b853057c-be5f-4e79-af55-dc310e29e144
In 132 the quicksuggest requests are there, plus new requests that go to search-categorization. Here is one example URL:
https://firefox-settings-attachments.cdn.mozilla.net/main-workspace/search-categorization/9bcce3e1-df09-4b94-9153-3bcb0200c121.json
So my guess is that the new search-categorization requests are causing the problem. What is the correct way to turn that off? Also, is there a way to turn off quicksuggest?
Comment 24•6 days ago
Hi :standard8, I see that you're the owner of the search code. Would you know how we could turn those quicksuggest and search-categorization requests off (see comment #23)?
Comment 25•5 days ago
I suspect this may be improved once bug 1907327 lands/is released.
I think you can turn search-categorization off, and for that matter you could possibly stop remote settings as well, but I'm wondering if the real problem is in the harness?
I believe that for Firefox performance tests we start up Firefox with a new profile, let it settle, and then restart and run the actual tests. We also have a bunch of preferences that we turn off for all tests and perf tests.
Are the Wikipedia tests following that process as well?
Reporter
Comment 26•5 days ago
I think the configuration is the same as how you run the tests internally, except that we don't add that settle time. If I added it for Firefox in Browsertime, users who run tests locally using Chrome would just get the result, but when they switch to running performance tests with Firefox they would wait 30 seconds before the measurement starts. That seems like a bad user experience, so I would really like to avoid it.
The regression was introduced in 132; if there's a switch to turn off search-categorization, I could verify that it's the root cause. Right now I think it is, but I will only know once I turn it off and the regression goes away.
Comment 27•5 days ago
(In reply to Peter Hedenskog from comment #26)
> I think the configuration is the same as how you run the tests internally, except that we don't add that settle time. If I added it for Firefox in Browsertime, users who run tests locally using Chrome would just get the result, but when they switch to running performance tests with Firefox they would wait 30 seconds before the measurement starts. That seems like a bad user experience, so I would really like to avoid it.
Greg, am I right in thinking there is some sort of settle time that we have, or do we just go straight into the tests?
All I'm really thinking here is that you may get inconsistent results over time, and more issues such as this popping up.
> The regression was introduced in 132; if there's a switch to turn off search-categorization, I could verify that it's the root cause. Right now I think it is, but I will only know once I turn it off and the regression goes away.
The preference is browser.search.serpEventTelemetryCategorization.enabled. In theory, that would only affect the network on the first startup of a fresh profile. Once it has downloaded, it would then be fine.
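In user.js form that would look roughly like this; a sketch only, with false assumed to be the disabled state.

// Sketch: disable SERP categorization so the search-categorization
// attachment from comment 23 is not downloaded at startup.
user_pref("browser.search.serpEventTelemetryCategorization.enabled", false);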
Comment 28•5 days ago
Thanks :standard8! Yes, you're correct, we have a 30s wait before starting our performance tests by default. It's either done at browser startup during the tests, or we build a conditioned profile where the 30s wait was already done; then, in our tests, we skip the 30s wait while using that prebuilt profile.
Peter, I was looking at the prefs that we're setting, and there may be some new ones that you don't have in Browsertime at the moment. Here are a few that I spotted: https://searchfox.org/mozilla-central/source/testing/profiles/common/user.js#59,88-90
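A rough sketch of the kind of prefs the linked testing profile sets; the searchfox link above is the authoritative list, and the names and values below are from memory, so check them against it.

// Point remote settings at a dummy server so nothing is fetched during tests
// (value recalled from testing/profiles/common/user.js, may differ).
user_pref("services.settings.server", "data:,#remote-settings-dummy/v1");
// Disable quick suggest so the quicksuggest attachment from comment 23 is not fetched either.
user_pref("browser.urlbar.quicksuggest.enabled", false);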
Reporter
Comment 29•5 days ago
Thank you Greg, I added those configurations now.
The configuration helped, the regression is gone, thank you :standard8!
We can close this issue now.
Comment 30•2 days ago
Greg, is there anything else to follow up on here, or can we close this as works-for-me? I think bug 1907327 is perhaps the only follow-up, but that's already filed.