mozilla::safebrowsing::LookupCacheV4::VerifyChecksum consumes >100ms of CPU during startup

NEW
Unassigned

Status

()

Toolkit
Safe Browsing
P3
normal
7 months ago
3 months ago

People

(Reporter: gps, Unassigned)

Tracking

({perf})

Firefox Tracking Flags

(Not tracked)

Details

When profiling Firefox startup, mozilla::safebrowsing::LookupCacheV4::VerifyChecksum was in the stack a lot - over 100ms on an i7-6700K. Although it wasn't in the main thread, it seems to be using critical CPU very early in app startup.

Furthermore, it appears it is using SHA-256 to perform verification. I question whether we need to use a cryptographic hash for file integrity verification [during startup]. I understand needing to verify the integrity of the safe browsing database when it is downloaded. But for a simple integrity check when loading a file, I would think a non-cryptographic hash like xxHash (which is better than CRC32) would be sufficient. I think xxHash is over a magnitude faster than SHA-256. So using a cryptographic hash here might just be flat out wasting CPU cycles [that could be used by other Firefox threads during startup].

Updated

7 months ago
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Assignee: dlee → nobody
Status: ASSIGNED → NEW

Updated

7 months ago
Assignee: nobody → dlee
Status: NEW → ASSIGNED
The reason safebrowsing uses SHA-256 is because google uses it[1], we will receive the SHA-256 checksum from google while doing an update. We also use the checksum while saving update data to file so we don't need to calculate checksum again.

We can use faster hash algorithm when store update data to file just like what we have already done in v2[2]. The downside of this approach is we need to calculate checksum again while saving data.

HI francois,
do you have any suggestion?

[1] https://developers.google.com/safe-browsing/v4/reference/rest/v4/threatListUpdates/fetch#checksum
[2] https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/HashStore.cpp#380 
    https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/HashStore.cpp#963
Flags: needinfo?(francois)
I don't have enough information to be confident about the priority of this optimization. Is 100ms on a background thread a big deal?

We could keep both a hash and a CRC-like thing around, but a DXR search for xxhash came out empty. It's probably overkill to add a new hash algorithm to mozilla-central just for this optimization when we already have in-tree CRC32 implementations.

Alternatively, do we have a way to mark this background thread as low priority so that it doesn't steal CPU cycles from a more important one?
Flags: needinfo?(francois) → needinfo?(gps)
Whiteboard: #sbv4-m8
Priority: -- → P3
I don't know enough about Firefox internals to answer questions in the needinfo. My gut instinct is that this CPU so early during startup is wasteful.

xxHash is a relatively new hashing function. It is very small and very fast (much faster than CRC32) and it has sufficient "randomness" for content verification. It is used by lz4 and zstd, for example. But if we don't want to bloat scope to include it in Gecko, CRC32 should be acceptable (it is certainly faster than SHA-256). It is relatively slow compared to xxhash. But perfect is the enemy of good.
Flags: needinfo?(gps)
not actively work on this.
Assignee: dlee → nobody
Status: ASSIGNED → NEW
Feels like this is something we should track as part of photon-perf or quantum flow. Florian, can you help get the right eyes on this?
Flags: needinfo?(florian)
Not really photon-perf, sounds more like Quantum Flow, although I'm not sure how much quantum flow cares about things that are slow but not on the main thread, ehsan?
QA Whiteboard: [qf]
Flags: needinfo?(florian) → needinfo?(ehsan)

Comment 7

6 months ago
Running non-low-priority CPU intensive work during startup directly competes with CPU intensive work that we need to do (mostly for hot startup and later stages of cold startups).  Ideally we should defer this work if possible.  Please note that we do spawn a fair number of threads pretty early on so it's not just the main thread that this competes with.

This is a hard sell for [qf:p1] but we should *really* try to fix it if possible...
QA Whiteboard: [qf] → [qf:p3]
Flags: needinfo?(ehsan)
Since this is related to Quantum Flow, I'll work on it.
Assignee: nobody → dlee
Status: NEW → ASSIGNED
Whiteboard: #sbv4-m8 → #sbv4-m9
I'll take PTO during M9, remove myself from assignee.
Assignee: dlee → nobody
Status: ASSIGNED → NEW
Whiteboard: #sbv4-m9
You need to log in before you can comment on or make changes to this bug.