mozilla::safebrowsing::LookupCacheV4::VerifyChecksum consumes >100ms of CPU during startup




2 years ago
5 months ago


(Reporter: gps, Unassigned)



Firefox Tracking Flags

(Not tracked)




2 years ago
When profiling Firefox startup, mozilla::safebrowsing::LookupCacheV4::VerifyChecksum was in the stack a lot - over 100ms on an i7-6700K. Although it wasn't in the main thread, it seems to be using critical CPU very early in app startup.

Furthermore, it appears it is using SHA-256 to perform verification. I question whether we need to use a cryptographic hash for file integrity verification [during startup]. I understand needing to verify the integrity of the safe browsing database when it is downloaded. But for a simple integrity check when loading a file, I would think a non-cryptographic hash like xxHash (which is better than CRC32) would be sufficient. I think xxHash is over a magnitude faster than SHA-256. So using a cryptographic hash here might just be flat out wasting CPU cycles [that could be used by other Firefox threads during startup].
Assignee: nobody → dlee


2 years ago
Assignee: dlee → nobody
Assignee: nobody → dlee
The reason safebrowsing uses SHA-256 is because google uses it[1], we will receive the SHA-256 checksum from google while doing an update. We also use the checksum while saving update data to file so we don't need to calculate checksum again.

We can use faster hash algorithm when store update data to file just like what we have already done in v2[2]. The downside of this approach is we need to calculate checksum again while saving data.

HI francois,
do you have any suggestion?

Flags: needinfo?(francois)
I don't have enough information to be confident about the priority of this optimization. Is 100ms on a background thread a big deal?

We could keep both a hash and a CRC-like thing around, but a DXR search for xxhash came out empty. It's probably overkill to add a new hash algorithm to mozilla-central just for this optimization when we already have in-tree CRC32 implementations.

Alternatively, do we have a way to mark this background thread as low priority so that it doesn't steal CPU cycles from a more important one?
Flags: needinfo?(francois) → needinfo?(gps)
Whiteboard: #sbv4-m8
Priority: -- → P3

Comment 3

2 years ago
I don't know enough about Firefox internals to answer questions in the needinfo. My gut instinct is that this CPU so early during startup is wasteful.

xxHash is a relatively new hashing function. It is very small and very fast (much faster than CRC32) and it has sufficient "randomness" for content verification. It is used by lz4 and zstd, for example. But if we don't want to bloat scope to include it in Gecko, CRC32 should be acceptable (it is certainly faster than SHA-256). It is relatively slow compared to xxhash. But perfect is the enemy of good.
Flags: needinfo?(gps)
not actively work on this.
Assignee: dlee → nobody

Comment 5

2 years ago
Feels like this is something we should track as part of photon-perf or quantum flow. Florian, can you help get the right eyes on this?
Flags: needinfo?(florian)
Not really photon-perf, sounds more like Quantum Flow, although I'm not sure how much quantum flow cares about things that are slow but not on the main thread, ehsan?
QA Whiteboard: [qf]
Flags: needinfo?(florian) → needinfo?(ehsan)

Comment 7

2 years ago
Running non-low-priority CPU intensive work during startup directly competes with CPU intensive work that we need to do (mostly for hot startup and later stages of cold startups).  Ideally we should defer this work if possible.  Please note that we do spawn a fair number of threads pretty early on so it's not just the main thread that this competes with.

This is a hard sell for [qf:p1] but we should *really* try to fix it if possible...
QA Whiteboard: [qf] → [qf:p3]
Flags: needinfo?(ehsan)
Since this is related to Quantum Flow, I'll work on it.
Assignee: nobody → dlee
Whiteboard: #sbv4-m8 → #sbv4-m9
I'll take PTO during M9, remove myself from assignee.
Assignee: dlee → nobody
Whiteboard: #sbv4-m9
I have an imported perf profile that (currently) shows this validation takes 400+ms on a fast dual-xeon desktop (compiled with --enable-release, local inbound pull build).  This (especially as it blocks initial page load, and chews a core when we're trying to initialize) would seem to me be a qf:p1 now at least).  Note also that 17% or so is spent in GetSmallestPrefix(), and 3% in GetPrefixes()

I'll do some more profiles to verify.  CRC32 should be at least 50% better, perhaps more.
QA Whiteboard: [qf:p3] → [qf]
You need to log in before you can comment on or make changes to this bug.