Open Bug 1438385 Opened 6 years ago Updated 2 years ago

Profile startup with most used antiviruses installed

Categories

(Toolkit :: Startup and Profile System, enhancement, P3)

enhancement

Tracking

()

Tracking Status
firefox60 --- affected

People

(Reporter: marco, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: inj+)

A large number of users have antiviruses installed. They often slow down I/O.
It might be worth profiling our startup (and maybe other key interactions) on a machine with an antivirus installed.

We have stats on the most used antiviruses, so we could profile Firefox with those. E.g. Avast, IIRC the most commonly used one, is installed for ~14% of users.
I believe Florian is already doing it?
Flags: needinfo?(florian)
(In reply to Panos Astithas [:past] (please ni?) from comment #1)
> I believe Florian is already doing it?

I think this bug was filed as a result of me pointing out that Kaspersky was dramatically slowing down content process startup on a machine where I profiled. If I understand correctly, the idea of this bug is to profile with all the common antiviruses that our users have; I haven't done this. Sounds like a good idea but I'm not sure who should be doing it. Maybe QA?
Flags: needinfo?(florian)
Is there a list of OS/config and AntiVirus software we should test with on a first pass?

I imagine we would test startup and pageload tests from talos, run manually probably on a windows 10 reference laptop or some similar spec mass production hardware.

Once we have a clear list of what we want to do, we can figure out who can do it and start getting results. Ideally we can find a solution for doing this in automation. AV software makes that difficult, but possibly there are good solutions that work well in automation.
Component: General → Startup and Profile System
Product: Firefox → Toolkit
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #3)
> Is there a list of OS/config and AntiVirus software we should test with on a
> first pass?

Marco, could you provide the stats you hinted at in comment 0?
Flags: needinfo?(mcastelluccio)
I have some out of date stats, I could regenerate them. But maybe Romain already has something up to date.
Flags: needinfo?(mcastelluccio) → needinfo?(rtestard)
I looked at the 64 bit DLLs for Windows 10/7/8.1 that were not from Microsoft, did not have hardware dependencies, and impacted over 2% of the Windows 10/7/8.1 64 bit user base. This was mostly to identify which third-party software should be manually tested. The DLL identification process was fairly manual, so it is prone to error.
The list can be accessed here (this data is 4 months old): https://docs.google.com/document/d/1qYkWOOO2zGfq2RlnnDqi7cyUx3pq0fM7kCz7uN2lRqM/edit
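
The selection criteria above could be expressed roughly as the filter below. This is only a sketch under stated assumptions: the `DllStat` record, its field names, the exact vendor string and the sample rows are all hypothetical and are not taken from the linked document.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DllStat:
    """Hypothetical record for one injected DLL (field names are assumptions)."""
    name: str
    vendor: str
    user_share: float        # fraction of 64 bit Windows users loading the DLL
    hardware_dependent: bool


def candidates_for_manual_testing(stats: Iterable[DllStat],
                                  min_share: float = 0.02) -> List[DllStat]:
    """Keep non-Microsoft, hardware-independent DLLs seen by more than min_share of users."""
    kept = [
        s for s in stats
        if s.vendor != "Microsoft"          # assumed vendor label
        and not s.hardware_dependent
        and s.user_share > min_share
    ]
    # Most widespread DLLs first, so they are tested first.
    return sorted(kept, key=lambda s: s.user_share, reverse=True)


# Placeholder rows for illustration only; these are not real measurements.
sample = [
    DllStat("example_av.dll", "Example AV Vendor", 0.10, False),
    DllStat("example_gpu.dll", "Example GPU Vendor", 0.30, True),
    DllStat("example_os.dll", "Microsoft", 0.90, False),
    DllStat("example_rare.dll", "Example Tool Vendor", 0.01, False),
    DllStat("example_ime.dll", "Example IME Vendor", 0.03, False),
]
shortlist = [s.name for s in candidates_for_manual_testing(sample)]
```

Because the DLL-to-vendor mapping was done by hand, any such filter is only as reliable as the vendor labels it is given.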
Flags: needinfo?(rtestard)
it looks like:
* Avast Anti Virus: https://www.avast.com/en-us/free-antivirus-download
* Kaspersky Anti-Virus: https://usa.kaspersky.com/downloads/thank-you/antivirus-free-trial
* ESET Anti virus: https://www.eset.com/us/home/free-trial/?CMP=knc-Google-G_S-US-BR-C-Antivirus_B&gkw=eset%20antivirus%20trial&gcr=181305559630&gcp744383276&gag=39591760756&gpl=&gclid=EAIaIQobChMIxu20i__w2QIVmoqzCh2rLwp4EAAYASAAEgLIuPD_BwE&gclsrc=aw.ds
* Symantec CMC Firewall: https://us.norton.com/downloads

these entries appear to be apps from other companies that are not anti-virus/malware
Bonjour
Sogou.com Inc
Java(TM) Platform SE


I assume we have versions of this internally we can download that are fully licensed (or does a trial work out ok)?  do we have recommendations for settings on these?  default settings maybe?  

:rt, can you help confirm my understanding of this and help direct what we should be testing?
Flags: needinfo?(rtestard)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #7)
> it looks like:
> * Avast Anti Virus: https://www.avast.com/en-us/free-antivirus-download
> * Kaspersky Anti-Virus:
> https://usa.kaspersky.com/downloads/thank-you/antivirus-free-trial
> * ESET Anti virus:
> https://www.eset.com/us/home/free-trial/?CMP=knc-Google-G_S-US-BR-C-
> Antivirus_B&gkw=eset%20antivirus%20trial&gcr=181305559630&gcp744383276&gag=39
> 591760756&gpl=&gclid=EAIaIQobChMIxu20i__w2QIVmoqzCh2rLwp4EAAYASAAEgLIuPD_BwE&
> gclsrc=aw.ds
> * Symantec CMC Firewall: https://us.norton.com/downloads
> 
> these entries appear to be apps from other companies that are not
> anti-virus/malware
> Bonjour
> Sogou.com Inc
> Java(TM) Platform SE
> 
That's right although it may still impact startup time.
> 
> I assume we have versions of this internally we can download that are fully
> licensed (or does a trial work out ok)?  do we have recommendations for
> settings on these?  default settings maybe?  
I contacted Avast in the past and they advised using the free version since the vast majority of Avast usage is on their free version.
I'd assume the same for Kaspersky, ESET and Symantec although perhaps we could test that the free versions indeed load the reported DLLs?

For info, Su is helping me get a report of how injected DLLs correlate to retention. I hope this can help prioritize software testing by looking at how frequently each DLL occurs and how that correlates to retention.
Flags: needinfo?(rtestard)
Bonjour is apple itunes

:igoldan, next week can you start running ts_paint locally with:
* baseline (x5)
* for each of the above programs, install, run ts_paint x5, get data saved off, uninstall
Flags: needinfo?(igoldan)
Whiteboard: [PI:March]
Assignee: nobody → igoldan
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #9)
> Bonjour is apple itunes
> 
> :igoldan, next week can you start running ts_paint locally with:
> * baseline (x5)
> * for each of the above programs, install, run ts_paint x5, get data saved
> off, uninstall

Yes, will do this after I get my hands on a laptop from the workplace.
I understand I'll test against Avast, Kaspersky, ESET and Symantec.

For the other apps you mentioned, are these the right places from which I can download them?
Bonjour: https://support.apple.com/kb/DL999?locale=en_US&viewlocale=en_US
Java(TM) Platform SE: download.oracle.com/otn-pub/java/jdk/9.0.4+11/c2514751926b4512b076cc82f959763f/jdk-9.0.4_windows-x64_bin.exe

I don't know where I can download the last one, Sogou.com Inc. Please share the appropriate link for this app.
Flags: needinfo?(igoldan)
Sogou.com seems to be a Chinese language search engine- I spent 15 minutes trying to figure out what would be installed and I couldn't find anything obvious to install.  If others have ideas here, speak up!

Regarding Java, I think there needs to be an app that uses the jdk for it to be running in the background.

Thanks for picking this up Ionut!
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #11)
> Sogou.com seems to be a Chinese language search engine- I spent 15 minutes
> trying to figure out what would be installed and I couldn't find anything
> obvious to install.  If others have ideas here, speak up!

https://en.wikipedia.org/wiki/Sogou_Pinyin
I will be using Windows 10 x64. Does this suffice?
Also, if you have some specific hardware preferences regarding the laptop, please tell.
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #13)
> I will be using Windows 10 x64. Does this suffice?
> Also, if you have some specific hardware preferences regarding the laptop,
> please tell.

This shows the user distribution per OS, per OS arch: https://sql.telemetry.mozilla.org/queries/49731/source#133820
77% of users are covered by the top 3: 64 bit Win10, 64 bit Win7 and 32 bit Win7

The list I provided only looked at 64 bit DLLs so I'd suggest focussing on 64 bit Win10 and 64 bit Win7, since the share of Win7 is still fairly high.

Regarding Sogou, 2 DLLs from this publisher came up in the top DLLs: PicFace64.dll and SOGOUPY.IME.
Similarly to Comment 12, all descriptions I found on these seem to point to Sogou Pinyin input method (https://pinyin.sogou.com/mac/).
I tried to install Sogou using:
http://cdn2.ime.sogou.com/836c2ad141067487bff5c61e22978726/5ab2bbec/dl/index/1520568503/sogou_pinyin_89c.exe

unfortunately my network provider blocks access to that file- I believe that install will get us the Pinyin IM.
Would it be OK if I use a Firefox artifact for doing my tests? That is: pick a recent mozilla-central changeset, then build Firefox in artifact mode.
Flags: needinfo?(rtestard)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #16)
> Would it be OK if I use a Firefox artifact for doing my tests? That is: pick
> a recent mozilla-central changeset, then build Firefox in artifact mode.

What's the benefit of using an artifact build compared to an official nightly build? I would expect this startup profiling to happen on a machine that looks more like a user machine than like a developer machine, so I wouldn't expect for example visual studio.
(In reply to Florian Quèze [:florian] from comment #17)
> (In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment
> #16)
> > Would it be OK if I use a Firefox artifact for doing my tests? That is: pick
> > a recent mozilla-central changeset, then build Firefox in artifact mode.
> 
> What's the benefit of using an artifact build compared to an official
> nightly build? I would expect this startup profiling to happen on a machine
> that looks more like a user machine than like a developer machine, so I
> wouldn't expect for example visual studio.

I will configure the tests to use the nightly build.
But I cannot uninstall Visual Studio, as it is an important dependency for our Windows tooling. Without it, I won't be able to run my perf tests.
Regarding testing alongside Java: I'm downloading the JRE and keeping Virtualbox running, to respect :jmaher's indication from comment 11.
If you have other opinions, please share them here.
For installing Sogou Pinyin, would this link do: http://qpdownload.com/sogou-pinyin/ ?
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #18)

> But I cannot uninstall Visual Studio, as it is an important dependency for
> our Windows tooling. Without it, I won't be able to run my perf tests.

Which tooling and perf tests are you considering? It's possible we have understood this bug differently. I was expecting a machine in a 'user' configuration, and Firefox started with the MOZ_PROFILER_STARTUP environment variable, and then the startup profile captured and uploaded. Preferably with the machine rebooted each time, to capture cold startup.
I think we had a lack of communication in the bug.  In comment 3 I had mentioned running ts_paint from talos with different anti-virus programs installed.

:florian, could you outline what is really desired here?  I think there might be a lot of things that both Ionut and myself are not familiar with in your request, so the more detailed the steps the easier it is to do it right.
Flags: needinfo?(florian)
(In reply to Florian Quèze [:florian] from comment #21)
> (In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment
> #18)
> 
> > But I cannot uninstall Visual Studio, as it is an important dependency for
> > our Windows tooling. Without it, I won't be able to run my perf tests.
> 
> Which tooling and perf tests are you considering? It's possible we have
> understood this bug differently. I was expecting a machine in a 'user'
> configuration, and Firefox started with the MOZ_PROFILER_STARTUP environment
> variable, and then the startup profile captured and uploaded. Preferably
> with the machine rebooted each time, to capture cold startup.

I'm using the steps listed by :jmaher in comment 9. Those steps expect me to prepare a local development environment, as shown here [1]. I'm then running the tests from inside the start-shell.bat terminal emulator. That's how Talos is able to run the ts_paint test and gather the timestamps.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Windows_Prerequisites
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #22)
> I think we had a lack of communication in the bug.  In comment 3 I had
> mentioned running ts_paint from talos with different anti-virus programs
> installed.

Indeed, when you said "test startup and pageload tests from talos, run manually" in comment 3 I wrongly assumed you meant "do the same thing as what talos does in these tests but by hand", so I didn't notice we had something different in mind.

What I was expecting is something roughly like:

Initial setup:
- have a clean Windows 10 machine.
- Install on it the latest Firefox nightly.
- Install the gecko profiler add-on.
- Create a .bat script on the desktop that can easily start Nightly with the environment variables to profile startup (that's MOZ_PROFILER_STARTUP=1 and MOZ_PROFILER_STARTUP_ENTRIES=<large value>).

Preparation for a run:
- reboot the computer,
- wait for everything to settle (eg. using the task manager, wait for the disk and CPU usage to be low). This may take several seconds to several hours depending on what happens. I once saw Windows Update and Windows Defender doing a background scan of my reference hardware's disk for almost a whole day, making any performance testing impossible on that machine for the whole day.

Actually running it:
- click the .bat script to open Nightly,
- once it's done starting, press ctrl+shift+2 to open the profiler UI.
- Wait for symbolication to complete, and then click "Share". Save the link.
If we want to have numbers in addition to the profile, open about:telemetry, click "Simple Measurements" and copy relevant values from there. Relevant values would include at least firstPaint, delayedStartupStarted, delayedStartupFinished, sessionRestored.

I would expect this data to be noisy, so I would probably take a couple runs. Maybe 5. It's important to reboot the computer each time if we want to measure cold startup. If you want to measure warm startup (and be closer to the numbers we have on Talos), run this a couple times in a row without rebooting.
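
As a rough illustration of the launch step, the sketch below sets the two environment variables named above and starts Nightly. It is written in Python rather than as a .bat file, and the install path and the default `entries` value are assumptions (the value only needs to be "large"); adjust both for the machine under test.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# Hypothetical install location; change it to match the test machine.
NIGHTLY_PATH = r"C:\Program Files\Firefox Nightly\firefox.exe"


def build_startup_env(entries: int,
                      base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with startup profiling switched on."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_PROFILER_STARTUP"] = "1"
    env["MOZ_PROFILER_STARTUP_ENTRIES"] = str(entries)
    return env


def launch_nightly(path: str = NIGHTLY_PATH,
                   entries: int = 10_000_000) -> subprocess.Popen:
    """Start Nightly so that the profiler records from startup.

    The default entries value is an arbitrary large number chosen for
    illustration, not a recommended setting.
    """
    return subprocess.Popen([path], env=build_startup_env(entries))
```

Calling `launch_nightly()` after each reboot (once disk and CPU usage have settled) gives one cold-startup run; calling it several times in a row without rebooting gives warm-startup runs.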


This is to give a baseline for the machine we are going to use. Now to test with third party software, I would install it, reboot the computer, and run the same tests.

I have very little trust for the 'uninstall' feature of antivirus software. So if we are testing using physical hardware, I would ensure we have a system restore point right before installing the third party software, and after we are done testing, I would use system restore to go back to the clean state. If we are using virtual machines instead of physical hardware, I would just snapshot the VM before installing the third party software.

If we are using virtual machines and have enough disk space on the host hardware, it would be nice to keep a snapshot or a copy of the VM with each of the third party software installed. I would assume these things are tedious/slow to install, and if we want to repeat the tests in the future, we would like to avoid spending the same time again doing the setup.


So, this is what I had in mind... and I now imagine you were thinking of something very different. I think it all comes down to which question we are trying to answer. As an engineer I would like to know what's slow in what our users experience, so numbers alone aren't giving enough information. I also doubt our current talos tests match what our users experience. The most recent example is ts_paint, which got a 50+% win on Linux but nothing on Windows with bug 1447719 comment 9, while telemetry says that Windows users experienced the win too: https://mzl.la/2urbSDA. I'm also more interested in cold startup than warm startup, because while warm startup is less noisy, it's also much more artificial.

If the question you are trying to answer is which antivirus is affecting us the most right now, and be able to repeat this test in the future to catch regressions in future versions of antiviruses, you want something more automated and easy to repeat to produce numbers, so running Talos makes more sense than what I'm suggesting.

I can't say for sure what Marco had in mind when filing the bug, so you probably want to ask him too. I had just assumed that "profile startup" meant "use the gecko profiler", because that's what I've been doing for months ;-).
Flags: needinfo?(florian)
> I can't say for sure what Marco had in mind when filing the bug, so you probably want to ask him too. I had just assumed that "profile startup" meant "use the gecko profiler", because that's what I've been doing for months ;-).

Marco, can you be explicit about the question we're trying to answer here?
Flags: needinfo?(mcastelluccio)
:florian Thank you for providing the description of the test setup. I talked with Florin Mezei and concluded this sounds more like a ProductIntegrity (PI) Request, which can be handled by dedicated QA. So, if we are taking your approach and avoid Talos, that's the right thing to do.

From the perf side, we can only provide you with the results that Talos returns.
What I meant by "profile startup" was to actually use the Gecko profiler to analyze the startup performance, like Florian described.
Flags: needinfo?(mcastelluccio)
Tom- this is a request for manual QA, not running automated tests in a custom environment like I originally thought, can you pick this up?
Assignee: igoldan → nobody
Flags: needinfo?(tgrabowski)
Whiteboard: [PI:March]
Clearing NI since I believe Florian addressed it.
Flags: needinfo?(rtestard)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #28)
> Tom- this is a request for manual QA, not running automated tests in a
> custom environment like I originally thought, can you pick this up?

Andrei Vaida's team is working on it. It will take some time to set up the testing environment. Given the last-minute nature of this request and that we are currently working on many other high-priority tasks, we currently estimate it will take one to two weeks to finish testing.
Flags: needinfo?(tgrabowski) → needinfo?(andrei.vaida)
I have tested this as instructed. The final results can be found in the following document:
https://docs.google.com/document/d/1ILBhSWrW7upTfkeUma7FlugrLiet1HSZVm25hMghrX0

A spreadsheet containing all the averages can be viewed here:
https://docs.google.com/spreadsheets/d/1zIectmhfUrAxkYjs5G6D0QgeYO3NhQ6j4olW-xnVYU4/
Flags: needinfo?(andrei.vaida)
(In reply to Cristian from comment #31)

> A spreadsheet containing all the averages can be viewed here:
> https://docs.google.com/spreadsheets/d/1zIectmhfUrAxkYjs5G6D0QgeYO3NhQ6j4olW-
> xnVYU4/

The charts in this spreadsheet are a little surprising because they show Norton faster than baseline on Windows 7 and Avast faster than baseline on Windows 10.

For Norton on Win7, it's easy to explain: there's a warm startup profile in the list of cold startups, which skews the results.
For Avast on Win10, I have no obvious explanation, so I'll just say that I/O timing is random, and that we would need many more samples to have numbers we can reliably compare.
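
To illustrate how a single warm-startup run mixed into a set of cold startups can pull an average down, here is a small sketch comparing the mean with the median. The numbers are made up for illustration and are not taken from the spreadsheet.

```python
from statistics import mean, median
from typing import Dict, Sequence


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Return the sample count, mean and median of startup timings in milliseconds."""
    return {
        "n": len(samples_ms),
        "mean": mean(samples_ms),
        "median": median(samples_ms),
    }


# Made-up firstPaint timings (ms) for four cold startups.
cold_runs = [9000, 9500, 10000, 10500]

# The same runs with one much faster warm startup mixed in by mistake.
with_warm_run = cold_runs + [2000]

clean = summarize(cold_runs)
skewed = summarize(with_warm_run)
```

In this made-up case the single warm run lowers the mean from 9750 ms to 8200 ms, while the median only moves from 9750 ms to 9500 ms. With only about five runs per configuration, neither statistic says much about differences that are small compared with the run-to-run noise.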
(In reply to Cristian from comment #31)
> I have tested this as instructed. The final results can be found in the
> following document:
> https://docs.google.com/document/d/
> 1ILBhSWrW7upTfkeUma7FlugrLiet1HSZVm25hMghrX0

Thanks! There are lots of interesting things we can observe from this data :-).

- some of these startups are pathologically slow - the worst one taking 46s to complete. The difference between baseline and with antivirus installed is the most notable on Windows 7 cold startups, and not really notable on Windows 10.

- on Win7, creating the font list is super expensive (about 5s for cold startup on the baseline), and up to 3 times more expensive (~15s) with some antiviruses. This seems to confirm that AVs make slow I/O even slower... and that we should find a way to avoid creating that list of fonts to display the early blank window (I saw profiles where this delays the blank window by 15s).

- I saw 2 profiles where there are third party libraries (likely from the AV) doing slow stuff very early during our startup profiles, when we set our crash reporter hooks.
https://perfht.ml/2F9bPfP (an ESET cold startup win7 profile), where we spend 11s in eOppMonitor.dll
https://perfht.ml/2JbQQLS (an Avast cold startup win7 profile), where we spend 1s in snxhk64.dll
This is probably not directly actionable, but it may be interesting to add a telemetry probe there to get more data about how long it takes us to register our crash reporting hook and correlate to injected third party libraries.

- content process creation doesn't seem unusually slow. So maybe the extreme slowness I saw there on my mother's laptop (bug 1348361 comment 52) was due to an *old* version of Kaspersky rather than the current one.

- none of the profiles include the GPU process creation. Most likely explanation is that we disable graphic acceleration when we are running in a VM.

- I _think_ (not fully sure) the Kaspersky profiles have jank near the end of startup due to some add-on. See for example https://perfht.ml/2Fq0sk0 from the first Windows 10 cold startup profile with Kaspersky.

- not really a surprise, but we miss a big part of early startup before the profiler starts. Would be nice to re-run similar tests if/once we fix the startup profiling to start earlier.

- I think these profiles are worth looking at individually, especially the cold startup ones, but that's A LOT of profiles (100). Maybe they would be good material to run a session in SF to train people to look at profiles. Giving each participant a new profile that hasn't been studied yet and asking them to identify issues.
Priority: -- → P3
Keywords: perf
Whiteboard: inj+
See Also: → 1599494
See Also: → 1595709
See Also: → 1619319
Severity: normal → S3