[meta] Creating the XPCOM system info service is super expensive
Categories
(Core :: XPCOM, enhancement)
Tracking
()
Performance Impact | medium |
People
(Reporter: ehsan.akhgari, Unassigned)
References
(Depends on 3 open bugs, Blocks 1 open bug)
Details
(4 keywords, Whiteboard: [fxperfsize:S])
Attachments
(1 obsolete file)
Comment 1•8 years ago
|
||
Comment 2•8 years ago
|
||
Comment 3•8 years ago
|
||
Comment 4•8 years ago
|
||
Comment 5•8 years ago
|
||
Comment 6•8 years ago
|
||
Comment 7•7 years ago
|
||
Updated•7 years ago
|
Comment 8•7 years ago
|
||
Updated•7 years ago
|
Updated•7 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Comment 10•6 years ago
|
||
Updated•6 years ago
|
I guess this should be reprofiled on Android.
Comment 12•6 years ago
|
||
This has gotten worse on Windows, even on my fast SSD machine we now spend several ms inside GetWindowsSecurityCenterInfo which does a bunch of disk IO, presumably trying to work out what AV solution the machine uses. Punting this back into fxperf triage.
Comment 13•6 years ago
|
||
(In reply to :Gijs (he/him) from comment #12)
This has gotten worse on Windows, even on my fast SSD machine we now spend several ms inside GetWindowsSecurityCenterInfo which does a bunch of disk IO, presumably trying to work out what AV solution the machine uses. Punting this back into fxperf triage.
GetWindowsSecurityCenterInfo usage was added in bug 1418131.
Comment 14•6 years ago
|
||
Dragana, are you still working on this, or would it be OK if I took it?
Updated•6 years ago
|
Comment 15•6 years ago
|
||
(In reply to :Gijs (he/him) from comment #14)
Dragana, are you still working on this, or would it be OK if I took it?
no. you can take it.
Updated•6 years ago
|
Comment 16•5 years ago
|
||
Some initial considerations just to have them written down:
- some things are only needed by telemetry, and might be better off living in telemetry code instead of here.
- on the other hand, we've had cases in the past where info is only available off telemetry that other code then ends up wanting to use, and that then gets messy (cf. bug 1498213)
- some things are completely unused and can just be removed
- some things are only used from other C++ code which could just call the relevant NSPR code instead
- while initializing-on-first-fetch is relatively easy to do (by hooking some 'lazy' init code into all the get* calls), it would keep some of the related IO on the main thread which seems undesirable
- we fetch environment bits (like firewall/antivirus/antispyware etc.) for telemetry on every startup, but this is comparatively expensive, and rarely changes so could probably be cached in some way and updated lazily, OMT.
Raw list of everything I could find inside system info service itself (note that it's a writable property bag so other things can put items into the system info service without the implementation living there, but that seems less interesting for what we're doing here), and where it's used, grouped by usage:
"hasWindowsTouchInterface" #unused
"pagesize" #unused (except tests)
"pageshift" #unused (except tests)
"memmapalign" #unused (except tests)
"hasSeccompBPF" about:support only
"hasSeccompTSync" about:support only
"hasUserNamespaces" about:support only
"hasPrivilegedUserNamespaces" about:support only
"canSandboxContent" about:support only
"canSandboxMedia" about:support only
"registeredAntiSpyware" Telemetry and about:support
"registeredAntiVirus" Telemetry and about:support
"registeredFirewall" Telemetry and about:support
"isWowARM64" Telemetry only
"isWow64" Telemetry only
"installYear" Telemetry only
"binHDDModel" Telemetry only
"binHDDRevision" Telemetry only
"binHDDType" Telemetry only
"winHDDModel" Telemetry only
"winHDDRevision" Telemetry only
"winHDDType" Telemetry only
"appleModelId" Telemetry only
"hasAVX" Telemetry only
"hasAVX2" Telemetry only
"hasAES" Telemetry only
"hasEDSP" Telemetry only
"virtualmemsize" Telemetry only
"cpuvendor" Telemetry only
"cpufamily" Telemetry only
"cpustepping" Telemetry only
"cpucachel2" Telemetry only
"cpucachel3" Telemetry only
"hasMMX" Telemetry and update utils
"hasSSE" Telemetry and update utils
"hasSSE2" Telemetry and update utils
"hasSSE3" Telemetry and update utils
"hasSSSE3" Telemetry and update utils
"hasSSE4A" Telemetry and update utils
"hasSSE4_1" Telemetry and update utils
"hasSSE4_2" Telemetry and update utils
"hasARMv6" Telemetry and update utils
"hasARMv7" Telemetry and update utils
"hasNEON" Telemetry and update utils
"cpucount" Telemetry and profiler metadata, IPC prioritization on Linux + Android (where it's just PR_GetNumberOfProcessors() anyway...)
"cpucores" Telemetry and profiler metadata
"countryCode" search service which uses it for telemetry (but, afaict, nothing else). Note ICU also has code for this...
"memsize" Used from Places (bookmarks/history/etc.) code
"name" Used in lots of places, also available as Services.appinfo.OS
"arch" Used in some places, similar info available on Services.appinfo.XPCOMABI
"version" Used lots
"umask" Used by OS.File/OS.Constants and maybe other stuff
"secondaryLibrary" Blocklist/AMO/Updates
Android only (not looked into these, by and large):
"device"
"manufacturer"
"release_version"
"tablet"
"kernel_version"
"hardware"
Comment 17•5 years ago
|
||
The main performance problem at the moment is that the service causes mainthread IO and is initialized on the profile-do-change notification early in startup.
The bits that cause main thread IO relate to the antivirus/firewall/antispyware data gathering and the data gathering about the disk type for the user profile. Both of those are used in the telemetry environment object which is used to annotate crashreports.
In order to fix the performance issue, this data gathering needs to happen async / off-main-thread, and later in startup. We could either leave the data blank before then or use a cache stored e.g. in prefs or XULStore or similar.
The SSD/HDD data seems like it got added in bug 1533861 which doesn't look like crash data would care about it. Ditto for the antivirus/antispyware stuff from bug 1418131.
Some of the other data that causes IO in some circumstances (e.g. CPU identification that touches the registry) seems like it'd be fine to cache.
Either way, however, that would mean that early startup crashes would not have this data. I don't believe that would be critical, but it's not my call - I filed bug 1552858 to get some feedback from data science about this.
Comment 18•5 years ago
•
|
||
(In reply to :Gijs (he/him) from comment #17)
The SSD/HDD data seems like it got added in bug 1533861 which doesn't look like crash data would care about it. Ditto for the antivirus/antispyware stuff from bug 1418131.
I object to the notion that AV information is irrelevant to crash report data.
EDIT: As a compromise, I propose that we file a bug to move collection of AV information for the purposes of crash reports into minidump-analyzer
. That way we can move the remaining use case of collecting AV info for telemetry off the main thread.
Comment 19•5 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #18)
(In reply to :Gijs (he/him) from comment #17)
The SSD/HDD data seems like it got added in bug 1533861 which doesn't look like crash data would care about it. Ditto for the antivirus/antispyware stuff from bug 1418131.
I object to the notion that AV information is irrelevant to crash report data.
Well, my understanding was that, where crashes are reported, we'd get this information through the 'loaded DLLs' info anyway, rather than through the telemetry environment json dump that's part of the crash report. Is that not the case?
The other thing I noticed was that the information has no versioning, and so presumably we could cache the data across restarts and update it (lazily, OMT) once the UI is up and running, potentially with a fallback that'd cause us to force getting the data earlier if the previous startup crashed. Would that not work?
EDIT: As a compromise, I propose that we file a bug to move collection of AV information for the purposes of crash reports into
minidump-analyzer
. That way we can move the remaining use case of collecting AV info for telemetry off the main thread.
That's a good idea - though I can't see off-hand how this data fits into what the minidump-analyzer is doing, probably because I know very little about our crashreporting infrastructure. Is it "just" a question of adding extra fields to the JSON data as written in UpdateExtraDataFile
? And, would this be preferable over what I just suggested?
Comment 20•5 years ago
•
|
||
(In reply to :Gijs (he/him) from comment #19)
Well, my understanding was that, where crashes are reported, we'd get this information through the 'loaded DLLs' info anyway, rather than through the telemetry environment json dump that's part of the crash report. Is that not the case?
Injected DLLs are not the only vector from which AVs can cause problems. They may be causing issues even though they don't appear in "loaded DLLs." The only way to correlate that would be via the AV info we're collecting.
The other thing I noticed was that the information has no versioning, and so presumably we could cache the data across restarts and update it (lazily, OMT) once the UI is up and running, potentially with a fallback that'd cause us to force getting the data earlier if the previous startup crashed. Would that not work?
My concern would be if the AV prevents the browser from ever starting correctly in the first place, thus preventing construction of the cache and/or initial info. It happens, and I have no intention of opening up new blind spots in our AV data collection.
EDIT: As a compromise, I propose that we file a bug to move collection of AV information for the purposes of crash reports into
minidump-analyzer
. That way we can move the remaining use case of collecting AV info for telemetry off the main thread.That's a good idea - though I can't see off-hand how this data fits into what the minidump-analyzer is doing, probably because I know very little about our crashreporting infrastructure.
I'll give you an example of where we're already doing this: It is really expensive for us to pull cert information out of every DLL loaded into the process when generating the crash dump. I moved that code to minidump-analyzer
so that Firefox can get itself back up and running as quickly as possible, leaving the analyzer to perform those annotations in the background.
Is it "just" a question of adding extra fields to the JSON data as written in
UpdateExtraDataFile
?
It should be, though to prevent needing to change anything on the Soccorro side, we should ensure that we're using the same format and location in the JSON file as the existing AV annotations that we are already using on release.
And, would this be preferable over what I just suggested?
I think that, based on my previous answers, the answer is yes, minidump-analyzer
is the preferred route to go here.
Comment 21•5 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #20)
It should be, though to prevent needing to change anything on the Soccorro side, we should ensure that we're using the same format and location in the JSON file as the existing AV annotations that we are already using on release.
Which are what? You parse it out of the JSON blob that gets sent for TelemetryEnvironment? Because sticking with that would be tricky - we'd need to (from the minidump analyzer) read the current telemetry environment out of the annotations, then append the antivirus/spyware/firewall info, then write it out again... Is that really the only/best option? Wouldn't it be nicer to have it in separate fields anyway?
Comment 22•5 years ago
|
||
(To be clear, from a casual look at the current minidump analyzer code, it looked like all the things it adds get appended to the extant file without reading its contents, but perhaps I'm misreading the code, as I'm not familiar with it.)
Comment 23•5 years ago
|
||
Just a note, I believe almost nobody has ever used this information yet for analyzing crashes. Indeed, I think the Telemetry JSON information is almost never used as it's not easily searchable on Socorro.
I see the AV info being useful though, and moving it to be a crash annotation is going to make it actually useful for people analyzing crash data.
Comment 24•5 years ago
|
||
(In reply to :Gijs (he/him) from comment #22)
(To be clear, from a casual look at the current minidump analyzer code, it looked like all the things it adds get appended to the extant file without reading its contents, but perhaps I'm misreading the code, as I'm not familiar with it.)
Yes, that's what's happening right now. However I'm in the process of switching the .extra file format to JSON instead of the current INI-like format (bug 1420363). This will happen in the coming weeks and at that point the minidump-analyzer will not blindly append its annotation any more.
Comment 25•5 years ago
|
||
(In reply to :Gijs (he/him) from comment #21)
Which are what? You parse it out of the JSON blob that gets sent for TelemetryEnvironment? Because sticking with that would be tricky - we'd need to (from the minidump analyzer) read the current telemetry environment out of the annotations, then append the antivirus/spyware/firewall info, then write it out again... Is that really the only/best option? Wouldn't it be nicer to have it in separate fields anyway?
(In reply to Marco Castelluccio [:marco] from comment #23)
I see the AV info being useful though, and moving it to be a crash annotation is going to make it actually useful for people analyzing crash data.
Yes, the more that I page back in about the current state of things, the more I agree that it makes sense to move the AV stuff over to regular crash annotations. We don't need any changes on the backend to do that.
(In reply to Gabriele Svelto [:gsvelto] from comment #24)
(In reply to :Gijs (he/him) from comment #22)
(To be clear, from a casual look at the current minidump analyzer code, it looked like all the things it adds get appended to the extant file without reading its contents, but perhaps I'm misreading the code, as I'm not familiar with it.)
Yes, that's what's happening right now. However I'm in the process of switching the .extra file format to JSON instead of the current INI-like format (bug 1420363). This will happen in the coming weeks and at that point the minidump-analyzer will not blindly append its annotation any more.
This is also good news, since the AV API actually returns arrays of registered products, so if we were to wait for bug 1420363, recording the AV info as JSON-style annotations would be the best way forward.
Comment 26•5 years ago
|
||
OK, I'm going to split things out.
Updated•3 years ago
|
Updated•3 years ago
|
Updated•2 years ago
|
Description
•