1054373 - Crash in PK11_DoesMechanism due to race condition

I was trying to limit access to the bug because it was pointed out to me that there was some Red Hat customer server information that I forgot to redact when I manually cloned this bug. This is not what I was trying to accomplish. This is not a security-sensitive bug.

Flags: needinfo?(emaldona)

Kai Engert [:KaiE:]

Comment 5

11 years ago

Marking comment 0 private because it contains private information. I suggest marking this bug as invalid and filing another, cleaner one.

Kai Engert [:KaiE:]

Comment 6

11 years ago

marking invalid. please file a new bug.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → INVALID

Andrew McCreight [:mccr8]

Updated

11 years ago

Keywords: sec-other

Wan-Teh Chang

Comment 7

11 years ago

Comment on attachment 8476042 Fixes race conditions - V2

Review of attachment 8476042:
-----------------------------------------------------------------

r=wtc. Please review my suggested changes carefully and consult Bob if necessary, because I didn't check the code carefully. I pointed out a couple of changes that I don't understand, and I ask you (or Bob) to double-check that the changes are correct.

::: lib/pk11wrap/pk11util.c
@@ +1262,5 @@
> * helper function to actually create and destroy user defined slots
> */
> static SECStatus
> secmod_UserDBOp(PK11SlotInfo *slot, CK_OBJECT_CLASS objClass,
> + const char *sendSpec, PRBool needlock)

DESIGN: in general a boolean parameter makes the code less readable. Since there are only two callers of this function, it may be better to just let the caller be responsible for acquiring the necessary lock.

@@ +1285,5 @@
> if (crv != CKR_OK) {
> PORT_SetError(PK11_MapError(crv));
> return SECFailure;
> }
> + return SECSuccess;

IMPORTANT: please confirm that you intend to remove the SECMOD_UpdateSlotList call.

@@ +1308,2 @@
> PK11_FreeSlot(slot);
> + if ((crv == CKR_OK) &&

Nit: delete the four spaces at the end of lines inside this function. (You can see them in the code review tool.)

@@ +1363,5 @@
> char *escSpec;
> char *sendSpec;
> SECStatus rv;
>
> + PZ_Lock(mod->refLock); /* don't reuse a slot on the fly */

I don't understand this comment. At the least, it should be moved to the previous line. Its current end-of-line location implies it describes the PZ_Lock(mod->refLock) call, which is what I find confusing.

@@ +1399,5 @@
> PK11_FreeSlot(slot);
> PORT_SetError(SEC_ERROR_NO_MEMORY);
> return NULL;
> }
> + rv = secmod_UserDBOp(slot, CKO_NETSCAPE_NEWSLOT, sendSpec,

Nit: delete the space at the end of the line.

@@ +1410,5 @@
> PK11_FreeSlot(slot);
> if (rv != SECSuccess) {
> return NULL;
> }
> + rv = SECMOD_UpdateSlotList(mod); /* don't call holding the mod->reflock */

This comment should be clearer, because it could be misinterpreted to mean that updating the slot list of 'mod' does not require holding mod->refLock. We can just say that SECMOD_UpdateSlotList will acquire mod->refLock internally.

@@ +1516,5 @@
> /* PR_smprintf does not set no memory error */
> PORT_SetError(SEC_ERROR_NO_MEMORY);
> return SECFailure;
> }
> + rv = secmod_UserDBOp(slot, CKO_NETSCAPE_DELSLOT, sendSpec, PR_TRUE);

Please confirm we don't need to call SECMOD_UpdateSlotList(mod) here.
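Wan-Teh's DESIGN note (drop the `PRBool needlock` flag and let the two callers take the lock) and his note that SECMOD_UpdateSlotList acquires mod->refLock internally describe a common locking contract. The following is a minimal, self-contained illustration in plain C with pthreads, not NSS code: `struct module`, `user_db_op_locked`, `update_slot_list`, `open_user_db` and `close_user_db` are hypothetical stand-ins for the module, secmod_UserDBOp, SECMOD_UpdateSlotList and the two callers, and a pthread mutex stands in for PZ_Lock on mod->refLock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a PKCS#11 module: a lock plus a slot count. */
struct module {
    pthread_mutex_t ref_lock; /* plays the role of mod->refLock */
    int slot_count;
};

/* Hypothetical stand-in for secmod_UserDBOp without a "needlock" flag.
 * Contract: the caller must already hold mod->ref_lock. */
static int user_db_op_locked(struct module *mod, int delta)
{
    mod->slot_count += delta; /* placeholder for the real token operation */
    return 0;
}

/* Hypothetical stand-in for SECMOD_UpdateSlotList: it takes ref_lock
 * itself, so callers must NOT hold it. Relocking a default pthread mutex
 * from the same thread is undefined and typically deadlocks. */
static int update_slot_list(struct module *mod)
{
    pthread_mutex_lock(&mod->ref_lock);
    int n = mod->slot_count;
    pthread_mutex_unlock(&mod->ref_lock);
    return n;
}

/* Caller 1: create a slot under the lock, drop the lock, then refresh. */
static int open_user_db(struct module *mod)
{
    assert(mod != NULL);
    pthread_mutex_lock(&mod->ref_lock);
    int rv = user_db_op_locked(mod, +1);
    pthread_mutex_unlock(&mod->ref_lock);
    if (rv != 0)
        return -1;
    return update_slot_list(mod); /* called only after unlocking */
}

/* Caller 2: delete a slot under the same lock. */
static int close_user_db(struct module *mod)
{
    assert(mod != NULL);
    pthread_mutex_lock(&mod->ref_lock);
    int rv = user_db_op_locked(mod, -1);
    pthread_mutex_unlock(&mod->ref_lock);
    return rv;
}
```

Keeping the lock in the caller makes every call site show which lock is held, which is the readability point of the review. Whether the real callers can be restructured this way is for the patch authors to confirm.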

Attachment #8476042 - Flags: review?(wtc) → review+

Wan-Teh Chang

Comment 8

11 years ago

Elio: I didn't realize this bug is already closed. If the patch I just reviewed is abandoned, please ignore my review comments.

Daniel Veditz [:dveditz]

Updated

11 years ago

Group: core-security

Elio Maldonado

Reporter

Comment 9

9 years ago

(In reply to Wan-Teh Chang from comment #8)
> Elio: I didn't realize this bug is already closed. If the patch I
> just reviewed is abandoned, please ignore my review comments.

Wan-Teh, it turns out that I can't ignore your review: we do need this patch or a variant of it. This bug was originally reported as a Red Hat Directory Server (RHDS) 9.1 crash on RHEL 6.5 in a heavily used environment, but now the problem has resurfaced, this time in a different manner on RHEL 7 after we rebased to NSS 3.21, and it wasn't until we included this patch plus another related one that the build succeeded. That's the reason for reopening this bug. I'll submit an updated version of the patch for review once we have completed some additional testing.

Status: RESOLVED → REOPENED

Resolution: INVALID → ---

Elio Maldonado

Reporter

Comment 10

9 years ago

Attached patch Fixes race conditions - V3 (obsolete)

Updated for the current sources and expanded due to further testing done downstream at Red Hat. This is Bob's patch, so I defer to him to answer any questions.

Attachment #8739523 - Flags: review?(wtc)

Elio Maldonado

Reporter

Comment 11

9 years ago

Comment on attachment 8476042 Fixes race conditions - V2

Obsoleted by V3.

Attachment #8476042 - Attachment is obsolete: true

Elio Maldonado

Reporter

Comment 12

9 years ago

Attached patch Fixes race conditions - V4 (obsolete)

Attachment #8739523 - Attachment is obsolete: true

Attachment #8739523 - Flags: review?(wtc)

Attachment #8739528 - Flags: review?(wtc)

Elio Maldonado

Reporter

Updated

9 years ago

Assignee: emaldona → rrelyea

Elio Maldonado

Reporter

Updated

9 years ago

Keywords: sec-other

Elio Maldonado

Reporter

Comment 13

9 years ago

Comment on attachment 8739528 Fixes race conditions - V4

This is on Bob's behalf, as he's the author of this patch. Wan-Teh's time for NSS reviews is extremely limited, so I'm switching the review to Eric.

Attachment #8739528 - Flags: review?(wtc) → review?(ekr)

Tim Taubert [:ttaubert] (inactive)

Updated

9 years ago

OS: Linux → All

Hardware: x86 → All

Version: 3.16.1 → trunk

Tim Taubert [:ttaubert] (inactive)

Comment 14

9 years ago

Stealing, we should fix this.

Assignee: rrelyea → ttaubert

Tim Taubert [:ttaubert] (inactive)

Comment 15

9 years ago

Attached patch 0001-Bug-1054373-Fix-race-between-PK11_DoesMechanism-and-.patch

This is similar to what Elio did, but a lot smaller. I suspect that his patch contained other things besides just fixing the race condition. Let's concentrate on that and do any potential cleanup in another bug.

Attachment #8739528 - Attachment is obsolete: true

Attachment #8739528 - Flags: review?(ekr)

Tim Taubert [:ttaubert] (inactive)

Comment 16

9 years ago

Here's a try run: https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=55678307f46a9a84138fed46c9d4d4998705cd63

Tim Taubert [:ttaubert] (inactive)

Comment 17

9 years ago

https://nss-review.dev.mozaws.net/D199

Tim Taubert [:ttaubert] (inactive)

Comment 18

9 years ago

https://hg.mozilla.org/projects/nss/rev/92e0af39805c Landed. If we want to take the cleanup suggested by Elio in his patch, let's open a follow-up. Kai, do you have any way to verify that the problem is fixed for you too?

Status: REOPENED → RESOLVED

Closed: 11 years ago → 9 years ago

Flags: needinfo?(kaie)

Resolution: --- → FIXED

Target Milestone: --- → 3.30

Franziskus Kiefer [:franziskus]

Updated

9 years ago

Blocks: 1334127

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Tim Taubert [:ttaubert] (inactive)

Comment 19

9 years ago

Backed out to unblock bug 1334127. https://hg.mozilla.org/projects/nss/rev/01d6c0dff06f

No longer blocks: 1334127

Flags: needinfo?(kaie)

Tim Taubert [:ttaubert] (inactive)

Updated

9 years ago

Target Milestone: 3.30 → ---

Kai Engert [:KaiE:]

Comment 20

8 years ago

RHEL packages still carry one of the older, obsolete patches, but it now runs into a deadlock with the recent PK11_ResetToken fix, so we need to get this fixed and should follow up on this work. Bob, we have some trouble understanding how the locks must be set. Could you please give us a general description of how the locking is supposed to work, so that we can try to work on a better patch?

Flags: needinfo?(rrelyea)

Ryan Sleevi

Comment 21

8 years ago

Bob: This happens to be our number 1 NSS crash for ChromeOS. Because of Chrome's multi-threaded certificate verification, we regularly have two threads verifying certificates, and due to slot/token events, they find themselves clobbering each other. An example stack trace looks like this:

Thread 0:
C_GetMechanismList
PK11_ReadMechanismList
PK11_InitToken
nssSlot_Refresh
nssSlot_IsTokenPresent
nssSlot_GetToken
nssTrustDomain_FindCertificatesBySubject
CERT_CreateSubjectCertList
pkix_pl_Pk11CertStore_GetCert
pkix_BuildForwardDepthFirstSearch
pkix_Build_InitiateBuildChain
PKIX_BuildChain
CERT_PKIXVerifyCert

Thread 1:
PK11_ReadMechanismList
PK11_InitToken
nssSlot_Refresh
nssSlot_IsTokenPresent
nssSlot_GetToken
nssTrustDomain_FindCertificatesBySubject
CERT_CreateSubjectCertList
pkix_pl_Pk11CertStore_GetCert
pkix_BuildForwardDepthFirstSearch
pkix_Build_InitiateBuildChain
PKIX_BuildChain
CERT_PKIXVerifyCert

In the case of the crash, Thread 1 has finished calling C_GetMechanismList and updated slot->mechanismList, and is in the process of iterating it to set slot->mechanismBits. Thread 0, however, is busy reallocating slot->mechanismList as part of the C_GetMechanismList call.

The change in Comment #19 would have addressed this by ensuring that PK11_ReadMechanismList calls were serialized. However, the fact that two threads are in the process of calling PK11_InitToken as part of nssSlot_Refresh makes me think that we actually need locking further up, since PK11_InitToken modifies other attributes of the slot. Does that sound unreasonable? My thinking is that the locking may need to be as high as the nssTrustDomain level around the active slots, or involve more aggressive use of nssSlot_EnterMonitor and nssSlot_ExitMonitor.
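The crash described above is an unsynchronized read/modify race: one thread reallocates `slot->mechanismList` while another iterates it to fill `slot->mechanismBits`. A minimal sketch of serializing the mechanism-list refresh follows, in plain C with pthreads. Everything here is hypothetical: `struct slot`, `read_mechanism_list` and `does_mechanism` only mimic slot->mechanismList, PK11_ReadMechanismList and PK11_DoesMechanism, the `reported` array stands in for what C_GetMechanismList returns, and the 256-entry bit table is an arbitrary simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MECH_BITS 256 /* arbitrary table size for this sketch */

/* Hypothetical, heavily simplified slot; not NSS's PK11SlotInfo. */
struct slot {
    pthread_mutex_t lock;               /* guards every field below */
    unsigned long *mech_list;           /* like slot->mechanismList */
    size_t mech_count;
    unsigned char mech_bits[MECH_BITS]; /* like slot->mechanismBits */
};

/* Replace the list and rebuild the bit table in one critical section, so no
 * other thread can iterate a list while it is being reallocated. */
static int read_mechanism_list(struct slot *s, const unsigned long *reported,
                               size_t n)
{
    assert(s != NULL && (n == 0 || reported != NULL));
    pthread_mutex_lock(&s->lock);
    unsigned long *list = realloc(s->mech_list, (n ? n : 1) * sizeof *list);
    if (list == NULL) {
        pthread_mutex_unlock(&s->lock); /* old list is still valid */
        return -1;
    }
    if (n > 0)
        memcpy(list, reported, n * sizeof *list);
    s->mech_list = list;
    s->mech_count = n;
    memset(s->mech_bits, 0, sizeof s->mech_bits);
    for (size_t i = 0; i < n; i++)
        if (list[i] < MECH_BITS)
            s->mech_bits[list[i]] = 1;
    pthread_mutex_unlock(&s->lock);
    return 0;
}

/* Readers take the same lock: small values use the bit table, larger ones
 * scan the list. */
static int does_mechanism(struct slot *s, unsigned long mech)
{
    int found = 0;
    pthread_mutex_lock(&s->lock);
    if (mech < MECH_BITS) {
        found = s->mech_bits[mech];
    } else {
        for (size_t i = 0; i < s->mech_count; i++) {
            if (s->mech_list[i] == mech) {
                found = 1;
                break;
            }
        }
    }
    pthread_mutex_unlock(&s->lock);
    return found;
}
```

This protects only the mechanism list and bit table. As the comment above points out, PK11_InitToken modifies other slot state too, so a lock this narrow may not be sufficient on its own.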

| Patch | Added | Author | Size | Review |
|---|---|---|---|---|
| Fixes race conditions | 11 years ago | Elio Maldonado | 3.23 KB | |
| Fixes race conditions - V2 | 11 years ago | Elio Maldonado | 5.08 KB | wtc: review+ |
| Fixes race conditions - V3 | 9 years ago | Elio Maldonado | 21.69 KB | |
| Fixes race conditions - V4 | 9 years ago | Elio Maldonado | 21.67 KB | |
| 0001-Bug-1054373-Fix-race-between-PK11_DoesMechanism-and-.patch | 9 years ago | Tim Taubert [:ttaubert] (inactive) | 3.88 KB | |
| token-init-cvar.patch | 8 years ago | Daiki Ueno [:ueno] | 5.85 KB | |
| isPresent_condition.patch | 8 years ago | Robert Relyea | 8.50 KB | ryan.sleevi: review- |
| isPresentCondition.patch | 8 years ago | Robert Relyea | 8.42 KB | ryan.sleevi: review- |
| isPresentCondition.patch v3 | 8 years ago | Robert Relyea | 8.46 KB | ryan.sleevi: review+ |
| 1054373-v4.patch | 8 years ago | Kai Engert [:KaiE:] | 8.12 KB | |
| isPresent.patch | 8 years ago | Robert Relyea | 8.74 KB | |
| isPresentv5.patch | 8 years ago | Robert Relyea | 8.75 KB | |
| isPresentAsCheckedIn.patch | 8 years ago | Robert Relyea | 8.73 KB | |
| race.update | 8 years ago | Robert Relyea | 3.64 KB | |
| Race Update with Martin's comments incorporated | 8 years ago | Robert Relyea | 5.24 KB | mt: review+ |
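Ryan's suggestion that locking may be needed higher up, around nssSlot_Refresh and PK11_InitToken, and the later attachment names (token-init-cvar.patch, isPresentCondition.patch) point toward letting one thread refresh a token while concurrent callers wait on a condition variable. The following is only a generic sketch of that pattern, not a reconstruction of any attached patch: `struct token_state`, `token_is_present` and `probe_token` are hypothetical, and a pthread mutex and condition variable stand in for whatever lock and condition-variable primitives NSS would actually use.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical token state; not NSS's NSSSlot or NSSToken. */
struct token_state {
    pthread_mutex_t lock;
    pthread_cond_t done;  /* broadcast when a refresh completes */
    bool refreshing;      /* true while one thread re-initializes */
    unsigned generation;  /* incremented after every refresh */
    bool present;         /* result of the most recent refresh */
};

/* Hypothetical placeholder for the expensive, non-reentrant work
 * (PK11_InitToken-like). Here it simply reports the token as present. */
static bool probe_token(void)
{
    return true;
}

static bool token_is_present(struct token_state *t)
{
    assert(t != NULL);
    pthread_mutex_lock(&t->lock);
    if (t->refreshing) {
        /* Another thread is refreshing: wait for it and reuse its result. */
        unsigned seen = t->generation;
        while (t->generation == seen)
            pthread_cond_wait(&t->done, &t->lock);
        bool result = t->present;
        pthread_mutex_unlock(&t->lock);
        return result;
    }
    t->refreshing = true;
    pthread_mutex_unlock(&t->lock);

    /* Only this thread runs the refresh, and it runs without the lock held,
     * so waiters are not blocked on the mutex during slow token I/O. */
    bool present = probe_token();

    pthread_mutex_lock(&t->lock);
    t->present = present;
    t->refreshing = false;
    t->generation++;
    pthread_cond_broadcast(&t->done);
    pthread_mutex_unlock(&t->lock);
    return present;
}
```

The design choice is that a second thread arriving mid-refresh never re-enters the non-reentrant initialization, which is the overlap visible in the two stack traces above. A real fix would also have to fit NSS's existing lock ordering, which is where the older RHEL patch ran into a deadlock.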