Open Bug 1626604 Opened 4 years ago Updated 1 year ago

[meta] Hardening Raptor jobs with conditioned profile usage

Categories

(Testing :: Raptor, task, P2)

Version 3
task

Tracking

(Not tracked)

People

(Reporter: whimboo, Unassigned)

References

(Depends on 4 open bugs, Blocks 5 open bugs)

Details

(Keywords: meta)

Yesterday I noticed that when I run a Raptor test like raptor-tp6m-1 on my Android Moto G5 device, the downloaded conditioned profile is NOT copied over. Instead, only the cert db files as listed here are copied to the device at /sdcard/raptor/profile.

That means that we do not actually have a conditioned profile when running the Raptor test. In such a case the post startup delay is also only 1s instead of 30s, so we basically start the page load tests before the browser has settled.

I pushed two try builds to check various bits:

  1. I added the --no-conditioned-profile option to page load tests 1 to 9 to see what happens if we use the 30s post startup delay again. Nearly all the tests are green, and I cannot see a single page load timeout failure:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=065e2014b787450d7b4d8b5784d3c82bab384f88&selectedJob=295707852

  2. This try build uses the conditioned profiles but increases the post startup delay to 30s, which is the same as for a non-conditioned profile job. Opt builds are still failing with an application timeout failure.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b75c9652512ca1a25aa29bb45f31218ea7afcf2b

All that means that something is wrong with conditioned profiles.

We should do the following:

  1. Definitely push the conditioned profile to the device
  2. Establish a mechanism to determine if the conditioned profile is present
  3. Investigate why there are application timeout failures when the conditioned profile is used
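A check for item 2 could be as simple as comparing the top-level entries of /sdcard/raptor/profile against the files a bare Raptor profile contains. The following is a minimal Python sketch under that assumption: the base file list is the one reported further down in this bug, `is_conditioned` and `check_profile` are hypothetical helpers (not existing Raptor code), and the listing would come from something like `adb shell ls /sdcard/raptor/profile`.

```python
# Entries of a bare Raptor profile, taken from the list of pushed files in
# this bug. Assumption: anything beyond these indicates a conditioned profile.
BASE_PROFILE_ENTRIES = {
    "cert9.db",
    "extensions",
    "key4.db",
    "pkcs11.txt",
    "prefs.js",
    "user.js",
}


def is_conditioned(listing):
    """Return True if the profile listing has entries beyond the base set.

    `listing` is an iterable of top-level names, e.g. the lines printed by
    `adb shell ls /sdcard/raptor/profile`.
    """
    # Normalize surrounding whitespace and trailing slashes some `ls` variants add.
    names = {line.strip().rstrip("/") for line in listing if line.strip()}
    return bool(names - BASE_PROFILE_ENTRIES)


def check_profile(listing):
    """Hypothetical guard: fail early instead of running with a bare profile."""
    if not is_conditioned(listing):
        raise RuntimeError(
            "conditioned profile missing: only base profile files found on device"
        )


# Usage with the listing reported in this bug (no conditioned profile data):
bare = ["cert9.db", "extensions", "key4.db", "pkcs11.txt", "prefs.js", "user.js"]
print(is_conditioned(bare))  # False
```

A heuristic like this only detects the "nothing but base files" case; a stricter check could compare against the full file list of the downloaded tarball.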

Also, we might want to temporarily disable conditioned profiles until we get the underlying problems fixed.

Tarek, what do you think? Also would you be ok with temporarily disabling the conditioned profiles for Android?

Flags: needinfo?(tarek)

Since it does not sound like it's going to take days or weeks to fix the issue, I think it would be wiser to fix the problem instead of deactivating the condprof and then reactivating it.

Flags: needinfo?(tarek)

Looking at a recent run here

https://firefoxci.taskcluster-artifacts.net/HWJJeymYSPaccKdBdaaARA/0/public/logs/live_backing.log

it correctly pulls https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.latest.firefox.condprof-p2_aarch64-geckoview_example/artifacts/public/condprof/profile-p2_aarch64-geckoview_example-settled-default.tgz

and then correctly pushes it to the sdcard

[task 2020-04-01T16:52:06.611Z] 16:52:02     INFO -  adb command_output: adb -s FA84B1A00039 wait-for-device push /tmp/tmpGnF3ql/profile /sdcard/raptor/profile, timeout: None, timedout: None, exitcode: 0, output: /tmp/tmpGnF3ql/profile/: 11 files pushed, 0 skipped. 9.9 MB/s (572319 bytes in 0.055s)

11 files were pushed there, so in order to fix the bug, could you show me a try run where it's actually doing what you have described?
From there I can try to investigate.

Flags: needinfo?(hskupin)

Actually, looking at that tarball closely, there are 34 top-level files/dirs and 117 files, so if the adb push step reports 11, there's definitely something wrong along the way. I will add some logs at every step and do a try run to verify each step (untarring, adb push, resulting dir).
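One way to verify the push step is to compare the number of regular files in the downloaded tarball with the count adb reports after the push. A minimal Python sketch, assuming the adb summary keeps the `N files pushed, M skipped` wording seen in the log above (the exact wording may differ between adb versions); all function names are hypothetical:

```python
import re
import tarfile

# Matches the summary adb prints after a push, e.g.
# "/tmp/tmpGnF3ql/profile/: 11 files pushed, 0 skipped. 9.9 MB/s (...)"
PUSH_RE = re.compile(r"(\d+) files? pushed, (\d+) skipped")


def pushed_file_count(adb_output):
    """Return the number of files adb reports as pushed, or None if not found."""
    match = PUSH_RE.search(adb_output)
    return int(match.group(1)) if match else None


def tarball_file_count(path):
    """Count the regular files (not directories) inside the profile tarball."""
    with tarfile.open(path, "r:*") as tar:
        return sum(1 for member in tar.getmembers() if member.isfile())


def verify_push(tarball_path, adb_output):
    """Raise if fewer files were pushed than the tarball contains."""
    expected = tarball_file_count(tarball_path)
    pushed = pushed_file_count(adb_output)
    if pushed is None:
        raise RuntimeError("could not find the adb push summary in the output")
    # The pushed directory may also contain Raptor's own files (e.g. the web
    # extension), so only a shortfall is treated as an error.
    if pushed < expected:
        raise RuntimeError(
            f"only {pushed} of {expected} profile files were pushed to the device"
        )
    return pushed
```

With the numbers from this bug, a tarball of 117 files against an adb summary of 11 files pushed would raise.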

Flags: needinfo?(hskupin)

Yes, the 11 files are the following plus the ones for the Raptor web extension:

cert9.db
extensions
key4.db
pkcs11.txt
prefs.js
user.js

Given that this is a meta bug, I'm going to file a new bug to get this fixed.

Depends on: 1626726
Depends on: 1626729

(In reply to Tarek Ziadé (:tarek) from comment #3)

Since it does not sound like it's going to take days or weeks to fix the issue, I think it would be wiser to fix the problem instead of deactivating the condprof and then reactivating it

Note that I won't have the time to work on those fixes. I was happily doing the investigation, and upcoming verification work.

No longer depends on: 1626729

I was happily doing the investigation, and upcoming verification work.

Henrik, thanks for the investigation. We'll add more checks, but let's push the fix.
And no, we are not disabling the conditioned profile; we are fixing the bug I found.

Depends on: 1626729

That is fine. As I also noticed earlier today, my fixes for bug 1625892 also help a lot with intermittents. So far I haven't seen any page timeouts, so it seems that the settle time might not matter that much for page loads in general. But maybe its effect would only show up in Perfherder.

Blocks: 1620828
Depends on: 1630009
Blocks: 1536090

We are seeing performance regressions with conditioned profiles, as filed in bug 1631717. We might take those as a new baseline.

But besides all that, I wonder how several services in Firefox, like updating the safe browsing files, checking for updates, or updating the blocklist files, come into play here. There are several preferences that can control them, and they might not have been set for the conditioned profile yet.

Here are some examples:

"app.update.disabledForTesting": True,
"browser.search.update": False,
"extensions.update.enabled": False,
"browser.safebrowsing.blockedURIs.enabled": False,
"browser.safebrowsing.downloads.enabled": False,
"browser.safebrowsing.passwords.enabled": False,
"browser.safebrowsing.malware.enabled": False,
"browser.safebrowsing.phishing.enabled": False,
"services.settings.server": "data:,",

Especially for safe browsing, a decent amount of data will be downloaded, which has a huge impact on the file size of the conditioned profile. In case we don't need/want safe browsing, we should disable it completely with the prefs above. Otherwise the update timers would need to be adjusted so that no additional checks are performed within the day the conditioned profile is in use.
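If some of these prefs were set for the conditioned profile, they would end up in the profile's user.js as `user_pref("name", value);` lines. A minimal Python sketch of that rendering, using the prefs listed above as input; `to_user_js` is a hypothetical helper and not how condprof applies prefs today:

```python
import json

# The prefs listed above. Whether any of them should be set for the
# conditioned profile is the open question in this bug.
PREFS = {
    "app.update.disabledForTesting": True,
    "browser.search.update": False,
    "extensions.update.enabled": False,
    "browser.safebrowsing.blockedURIs.enabled": False,
    "browser.safebrowsing.downloads.enabled": False,
    "browser.safebrowsing.passwords.enabled": False,
    "browser.safebrowsing.malware.enabled": False,
    "browser.safebrowsing.phishing.enabled": False,
    "services.settings.server": "data:,",
}


def to_user_js(prefs):
    """Render a dict of prefs as user.js lines: user_pref("name", value);"""
    lines = []
    for name, value in sorted(prefs.items()):
        # json.dumps yields true/false for bools, bare numbers for ints and
        # double-quoted, escaped strings, which matches the user.js syntax.
        lines.append(f"user_pref({json.dumps(name)}, {json.dumps(value)});")
    return "\n".join(lines) + "\n"
```

Sorting by name keeps the generated file stable between profile builds, which makes diffs between conditioned profiles easier to read.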

Tarek, which prefs from the above do we actually use?

Depends on: 1631717
Flags: needinfo?(tarek)

In case we don't need/want safe browsing we should disable it completely with the prefs above.
Tarek, which prefs from the above do we actually use?

None. Part of the reason we have the condprofile is to try to start Firefox with a realistic environment, check for updates, etc.
So I would not be in favor of deactivating some things that are happening when we build/update the profile.

Now, another thing I am wondering is: do we use all of those options during page load tests?

Flags: needinfo?(tarek)

(In reply to Tarek Ziadé (:tarek) from comment #11)

Now, another thing I am wondering is: do we use all of those options during page load tests?

At least safebrowsing is disabled for perf tests:
https://searchfox.org/mozilla-central/rev/3446310d6cc5c85cde16a82eccf560e9b71a3d44/testing/profiles/perf/user.js#20-21,23-25

So it doesn't make sense to include all those downloaded files (can be around 40MB) in the conditioned profile.
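If the safe browsing data is not needed, one option would be to leave it out when the conditioned profile tarball is created. A minimal Python sketch, assuming (hypothetically) that the downloaded lists live in a top-level `safebrowsing` directory of the profile; the actual location should be verified before relying on this, and `archive_profile` is a hypothetical helper:

```python
import os
import tarfile

# Assumption (to be verified): the downloaded safe browsing lists live in a
# top-level "safebrowsing" directory of the profile being archived.
EXCLUDED_ENTRIES = frozenset({"safebrowsing"})


def archive_profile(profile_dir, tarball_path, excluded=EXCLUDED_ENTRIES):
    """Write profile_dir into a .tgz, skipping excluded top-level entries.

    Returns the list of top-level entries that were archived.
    """
    archived = []
    with tarfile.open(tarball_path, "w:gz") as tar:
        for entry in sorted(os.listdir(profile_dir)):
            if entry in excluded:
                continue  # e.g. skip the downloaded safe browsing data
            # tar.add() recurses into directories by default.
            tar.add(os.path.join(profile_dir, entry), arcname=entry)
            archived.append(entry)
    return archived
```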

It might also be worth checking the other prefs in the various profiles used by Raptor.
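Checking the other prefs could start with a simple parser that extracts `user_pref(...)` lines from each profile's user.js and reports which of the prefs of interest are set. A minimal Python sketch; it is deliberately simplified (it skips commented-out lines but does not handle every syntax a prefs file allows), and both function names are hypothetical:

```python
import json
import re

# Matches simple `user_pref("name", value);` lines. Lines starting with `//`
# do not match, so commented-out prefs are ignored.
PREF_RE = re.compile(r'^\s*user_pref\(\s*"([^"]+)"\s*,\s*(.+?)\s*\)\s*;')


def parse_user_js(text):
    """Return a dict of pref name -> value for the user_pref lines in text."""
    prefs = {}
    for line in text.splitlines():
        match = PREF_RE.match(line)
        if match:
            name, raw = match.groups()
            try:
                # true/false, numbers and double-quoted strings are valid JSON.
                prefs[name] = json.loads(raw)
            except ValueError:
                prefs[name] = raw  # keep anything else as raw text
    return prefs


def report(text, wanted):
    """Map each wanted pref to its value in text, or None if it is not set."""
    found = parse_user_js(text)
    return {name: found.get(name) for name in wanted}
```

Running `report` over testing/profiles/perf/user.js with the pref names listed earlier would show which of them the perf profile already sets.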

Blocks: 1607511

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #12)

At least safebrowsing is disabled for perf tests:
https://searchfox.org/mozilla-central/rev/3446310d6cc5c85cde16a82eccf560e9b71a3d44/testing/profiles/perf/user.js#20-21,23-25

So it doesn't make sense to include all those downloaded files (can be around 40MB) in the conditioned profile.

It might also be worth checking the other prefs in the various profiles used by Raptor.

Hm, as Andrew mentioned on a different bug, we should actually keep safe browsing / tracking protection enabled, because blocked content can lead to very different metrics. Maybe he can give more details himself, because I forgot which bug it was.

Maybe we should simply try to stop running updates for safe browsing while the performance tests are running. Also why does the perftest profile disable various safe browsing features?

Flags: needinfo?(acreskey)
Depends on: 1636886

It's a tricky issue -- SafeBrowsing and Tracking Protection can both affect performance.
Not only are resources blocked with Strict Tracking Protection, but even without it there are numerous network prioritizations that these features affect.
See https://wiki.mozilla.org/Security/Tracking_protection

However the Safebrowsing download can introduce noise if the profile is not sufficiently conditioned.

In Bug 1636461 I proposed no longer disabling these features, at least for Fenix where they have been shown to significantly affect the results.

Another idea was to add new tests that enable (or disable) these features. Perhaps run at a lower frequency.

Flags: needinfo?(acreskey)
Depends on: 1636956

Ok, I moved this specific discussion over to bug 1636956. Thanks Andrew!

Depends on: 1637724
No longer depends on: 1637724
No longer depends on: 1636956
Depends on: 1647349
Depends on: 1665153
Depends on: 1665155
Severity: normal → S3
Whiteboard: [perftest:triage]