Open Bug 1478148 Opened 7 years ago Updated 2 years ago

maybe softoken can stop accessing cert9.db-journal and cert9.db-wal all the time for read-only operations (affects reading Thunderbird mail)

Categories

(NSS :: Libraries, enhancement, P2)

enhancement

Tracking

(Not tracked)

People

(Reporter: froydnj, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, perf:responsiveness, Whiteboard: [nss-triage])

I have this windows event trace taken during normal browsing usage. One of the things that stands out to me is repeated accesses to $PROFILE/cert{8,9}.db, even though these files don't actually exist. While I'm sure that these accesses are actually buried somewhere within NSS, is it possible to optimize things so that we don't repeatedly hit the disk to check for these files? I see that I do have these files on my Linux profile, so I'm not completely sure what the difference is wrt my Windows profile.
You don't have a cert9.db in your profile directory? Does importing a CA certificate work? In any case, are you able to get stack traces out of those event traces? That might help track down what's going on.
Flags: needinfo?(nfroyd)
OK, so apparently I was misremembering things instead of filing with the traces in front of me. The trace I'm looking at is touching $PROFILE/cert9.db-journal and $PROFILE/cert9.db-wal, and cert9.db *does* exist in the directory. So I guess that's more-or-less expected behavior, then? The stack traces are: |- xul.dll!mozilla::psm::`anonymous namespace'::SSLServerCertVerificationJob::Run | xul.dll!mozilla::psm::`anonymous namespace'::AuthCertificate | xul.dll!mozilla::psm::CertVerifier::VerifySSLServerCert | xul.dll!mozilla::psm::CertVerifier::VerifyCert | xul.dll!mozilla::psm::BuildCertChainForOneKeyUsage | xul.dll!mozilla::pkix::BuildCertChain | xul.dll!mozilla::pkix::BuildForward | xul.dll!mozilla::psm::NSSCertDBTrustDomain::FindIssuer | |- nss3.dll!CERT_CreateSubjectCertList and then down into bits of NSS. Similarly: |- xul.dll!mozilla::net::nsSocketTransportService::DoPollIteration | xul.dll!mozilla::net::nsSocketTransport::OnSocketReady | xul.dll!mozilla::net::nsSocketOutputStream::OnSocketReady | xul.dll!mozilla::net::nsHttpConnection::OnOutputStreamReady | xul.dll!mozilla::net::nsHttpConnection::OnSocketWritable | xul.dll!mozilla::net::nsHttpConnection::EnsureNPNComplete | xul.dll!nsNSSSocketInfo::DriveHandshake | nss3.dll!SSL_ForceHandshake | nss3.dll!ssl3_GatherCompleteHandshake | nss3.dll!ssl3_HandleRecord | nss3.dll!ssl3_HandleNonApplicationData | nss3.dll!ssl3_HandleHandshake | nss3.dll!ssl3_HandleHandshakeMessage | nss3.dll!ssl3_HandlePostHelloHandshakeMessage | nss3.dll!ssl3_HandleFinished | nss3.dll!ssl3_FinishHandshake | nss3.dll!ssl_FinishHandshake | xul.dll!HandshakeCallback | xul.dll!RebuildVerifiedCertificateInformation | xul.dll!mozilla::psm::CertVerifier::VerifySSLServerCert | xul.dll!mozilla::psm::CertVerifier::VerifyCert | xul.dll!mozilla::psm::BuildCertChainForOneKeyUsage | xul.dll!mozilla::pkix::BuildCertChain | xul.dll!mozilla::pkix::BuildForward | xul.dll!mozilla::psm::NSSCertDBTrustDomain::FindIssuer | |- nss3.dll!CERT_CreateSubjectCertList There are others (e.g. coming through nsHttpConnectionMgr), but I think those are the two hottest paths in terms of file accesses. They seem to come in groups of four over ~1ms, FWIW. Maybe NSS could be made more intelligent in its SQLite usage, or the higher level layers could cache things? I'm thowing out ignorant ideas here.
Flags: needinfo?(nfroyd)
Ah, that makes a lot more sense. The good news is these should be off the main thread, so in theory this isn't causing ui jank. That said, yeah, it doesn't seem like it would be necessary to repeatedly access these files for this operation (we're basically looking for all certificates that have a particular subject i.e. it's a read-only operation). It may be the case that NSS is using sqlite incorrectly. It may unfortunately instead be the case that this is just how sqlite works (as in, maybe the check is "what if another process opened this database simultaneously but then journaled something and died?")
Summary: constant accesses to non-existent $PROFILE/cert{8,9}.db during normal browsing → maybe softoken can stop accessing cert9.db-journal and cert9.db-wal all the time for read-only operations
After a bit of looking, I think this is because softoken doesn't open the database in exclusive locking mode, which means it has to check for a hot journal that may have been left by a simultaneous access. I think it would be possible to change the implementation to use exclusive locking, but it would be a fair bit of work (in part because currently softoken appears to re-open the database for every operation that could change data, which also seems unnecessary, but again would require a fair bit of work to change). Like I said, though, as long as this is off the main thread, it's probably not too much of an issue and fixing it or working around it probably wouldn't noticeably improve performance.
Priority: -- → P3
Whiteboard: [psm-backlog]
While it's true that this work is being done off the main thread, it's still a non-trivial amount of work (disk accesses). Avoiding that work would be beneficial in just doing less stuff and saving battery life, etc.
I don't think we can solve this at the PSM layer without just not using NSS' certdb for storing/retrieving certificates, so I'm moving this to NSS.
Assignee: nobody → nobody
Component: Security: PSM → Libraries
Priority: P3 → --
Product: Core → NSS
Whiteboard: [psm-backlog]
Version: unspecified → other
I'm investigating hangs of Thunderbird (currently 60.3.3, Windows) on my machine. When sending a single mail I see more than 200k successive accesses of the files cert9.db, cert9.db-journal, and cert9.db-wal, taking 29s total. Nothing else happens during this time and it's always the same 8 calls in a loop. Besides ceremony (LockFile, etc.) the only "real" call is ReadFile with offset 24, length 16, over and over again.
Severity: normal → major
Keywords: perf
QA Contact: jjones
Summary: maybe softoken can stop accessing cert9.db-journal and cert9.db-wal all the time for read-only operations → maybe softoken can stop accessing cert9.db-journal and cert9.db-wal all the time for read-only operations (affects reading Thunderbird mail)
See Also: → 1435141
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p2:responsiveness]

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #4)

After a bit of looking, I think this is because softoken doesn't open the
database in exclusive locking mode, which means it has to check for a hot
journal that may have been left by a simultaneous access. I think it would
be possible to change the implementation to use exclusive locking, but it
would be a fair bit of work (in part because currently softoken appears to
re-open the database for every operation that could change data, which also
seems unnecessary, but again would require a fair bit of work to change).

FYI, it was probably Bob's decision to always check the disk.

Today it's possible to share the NSS database files between multiple processes. For example, you could have just a single location on disk containing your cert9.db and key4.db files, and inside your Firefox and Thunderbird profile directories, you could have symbolic links to the NSS files.

If you decided to use exclusive locking, what would be the consequences for sharing these files between multiple applications?

If you decided to use exclusive locking, what would be the consequences for sharing these files between multiple applications?

This would be incompatible with Chrome's multi-process architecture, which as of recently began experimenting with using NSS across multiple processes (trust store interactions happen in one process, client cert interactions in another). This turns out to have a number of sharp edge cases, but changing NSS to switch to exclusive locking would fundamentally break that use case.

No, we can't change it to exclusive. A number of NSS applications have been sharing databases even before sql. In the case of the old database it could result in damaged databases.

Blocks: 1537784

I see the same problem in my Thunderbird with procmon under Windows 10 1903 in Thunderbird 60.8.0.

08:07:41,5088366 thunderbird.exe 2640 CreateFile C:\Users\<Username>\AppData\Roaming\Thunderbird\Profiles\<hash>.default\cert9.db-journal NAME NOT FOUND Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a 0.0001039
08:07:41,5105433 thunderbird.exe 2640 CreateFile C:\Users\<Username>\AppData\Roaming\Thunderbird\Profiles\<hash>.default\cert9.db-wal NAME NOT FOUND Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a 0.0001284

It is repeated for 8, 16, 24 or 32 times.

Performance Impact: --- → P2
Whiteboard: [qf:p2:responsiveness]

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: major → --
Performance Impact: medium → ---
Severity: -- → S3
Priority: -- → P2
Whiteboard: [nss-triage]
You need to log in before you can comment on or make changes to this bug.