Closed Bug 503418 Opened 15 years ago Closed 15 years ago

Crash in [@ find_objects_by_template ] when enabling FIPS mode

Categories

(NSS :: Libraries, defect, P2)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 524167

People

(Reporter: marcia, Assigned: rrelyea)

References

Details

(Keywords: crash, regression, relnote, Whiteboard: [ss:b2])

Crash Data

Attachments

(1 file)

Seen while running Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2a1pre) Gecko/20090709 Minefield/3.6a1pre GTB5
STR:
1. Set a Master Password and Enable FIPS mode
2. Enter Private Browsing and then exit Private Browsing
3. Crash - breakpad http://crash-stats.mozilla.com/report/index/01455e3d-05ae-4ac2-aa0b-5708a2090709
Not sure if I filed this bug in the right component.
I forgot to add that I get this error in the console after enabling FIPS: Error: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIPKCS11ModuleDB.toggleFIPSMode]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://pippki/content/device_manager.js :: toggleFIPS :: line 545" data: no] To trigger the crash again, I had to go back and click the "Enable FIPS mode" button, and it crashed again.
Severity: normal → major
Keywords: crash
Adding Ehsan to this bug since exiting Private Browsing may be involved. I was also able to reproduce the bug on my 10.5 machine following the same STR.
Signature find_objects_by_template
UUID 01455e3d-05ae-4ac2-aa0b-5708a2090709
Build ID 20090709035810
Branch 1.9.2
OS Mac OS X
OS Version 10.6.0 10A394
CPU x86
Crash Reason EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash Address 0x11f2c58
Frame Module Signature Source
0 libnss3.dylib find_objects_by_template
1 libnss3.dylib nssToken_FindCertificateByIssuerAndSerialNumber
2 libnss3.dylib nssToken_ImportCertificate
3 libnss3.dylib PK11_ImportCert
4 XUL AuthCertificateCallback security/manager/ssl/src/nsNSSCallbacks.cpp:1020
5 libssl3.dylib ssl3_HandleHandshakeMessage
6 libssl3.dylib ssl3_HandleRecord
7 libssl3.dylib ssl3_GatherCompleteHandshake
8 libssl3.dylib ssl_GatherRecord1stHandshake
9 libssl3.dylib ssl_Do1stHandshake
10 libssl3.dylib ssl_SecureSend
11 libssl3.dylib ssl_SecureWrite
12 libssl3.dylib ssl_Write
13 XUL nsSSLThread::Run security/manager/ssl/src/nsSSLThread.cpp:1043
14 libnspr4.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c:228
15 libSystem.B.dylib libSystem.B.dylib@0x17dd2
16 libSystem.B.dylib libSystem.B.dylib@0x17b35
Nothing on the stack seems too related to private browsing... Marcia, can you please test these builds and see if they crash similarly? <https://build.mozilla.org/tryserver-builds/ehsan.akhgari@gmail.com-try-91b59c319667/>
I tried running the build and could not reproduce the issue following the STR in Comment 0, but two things happened: 1. I still get the error in the Error Console 2. I crashed but got no breakpad when I tried to type something in the addons search after enabling FIPS. Will attach the Apple report next. (In reply to comment #4) > Nothing on the stack seems too related to private browsing... > > Marcia, can you please test these builds and see if they crash similarly? > <https://build.mozilla.org/tryserver-builds/ehsan.akhgari@gmail.com-try-91b59c319667/>
Attached file Apple crash report
I got this crash after following these STR: 1. Run Ehsan's tryserver build 2. Set a Master Password and Enable FIPS 3. Go to Tools -> Addons and type something in the search field. Breakpad does not come up. I was able to reproduce this issue on my 10.5 machine as well, and Breakpad did not come up there either.
Can this be reproduced on Windows? Might get better stack traces there.
I was able to crash on Windows Vista. Breakpad report is here: http://crash-stats.mozilla.com/report/index/5bcec2b8-1251-4b06-b144-d06ed2090713 Unfortunately, this machine used to have WinDbg, but it got blown away when the machine was reformatted. I can try to set it up again if that would be useful.
OS: Mac OS X → All
Hardware: x86 → All
Marcia, I have reproduced this, with FF 3.5 and my own NSS bits. Trying to figure out the cause now.
What I am seeing may not be exactly the same crash, though the same steps reproduce the crash. Unfortunately, I get a corrupt stack that ends in a PORT_Free_Util in my copy of nssutil3.dll, with no callers on the stack. I am using a custom-made version of NSS where my libs are all built with the debug heap, but I left NSPR as the stock bits from FF 3.5 that use mozcrt19.dll for their heap. If I had to guess, I think there is a mismatched malloc/free here. But I can't pinpoint it. If I use all the stock bits from FF3.5 (everything built with mozcrt19), then I don't see the crash.
Questions for the Private Browsing people: Does switching into and out of "private browsing" mode shut down and restart NSS? Is it possible that some thread(s) don't know that NSS is being shut down and restarted, and they continue to use stale pointers into old NSS structures that were allocated before NSS was shut down and restarted, even after the restart has been done? In the stack from comment 8:
0 nss3.dll find_objects_by_template security/nss/lib/dev/devtoken.c:448
1 nss3.dll nssToken_FindCertificateByIssuerAndSerialNumber devtoken.c:866
2 nss3.dll nssToken_ImportCertificate security/nss/lib/dev/devtoken.c:525
3 nss3.dll PK11_ImportCert security/nss/lib/pk11wrap/pk11cert.c:920
4 xul.dll AuthCertificateCallback manager/ssl/src/nsNSSCallbacks.cpp:1020
PSM's AuthCertificateCallback calls PK11_GetInternalKeySlot() and passes that slot pointer down to PK11_ImportCert(), which calls PK11Slot_GetNSSToken(slot) to get slot->token. It then passes this token value down the stack, which is untouched until it gets to find_objects_by_template. There, the token pointer is dereferenced, and is found to be invalid (causes a crash). At the moment, I can't think of any way that PK11_GetInternalKeySlot() could return a pointer to a slot with an invalid token pointer. But if that slot value was stale, then ...
Marcia: the patch inside the try server build I posted is now on trunk, so you should get the same behavior on trunk as well. The patch disables toggling the offline setting when the private browsing mode gets changed.
(In reply to comment #11)
> Questions for the Private Browsing people:
>
> Does switching into and out of "private browsing" mode shut down and restart
> NSS?
No. The only suspicious things I guess are tearing down the secure decoder ring, and deleting the HTTP auth sessions. See <http://mxr.mozilla.org/mozilla-central/source/browser/components/privatebrowsing/src/nsPrivateBrowsingService.js?mark=291-294,296-299#290>.
> Is it possible that some thread(s) don't know that NSS is being shut down and
> restarted, and they continue to use stale pointers into old NSS structures
> that were allocated before NSS was shutdown and restarted, even after the
> restart has been done?
Like I said, NSS is not being shut down, but there have been bugs in this regard. For example, in bug 463256, it was discovered that resetting the decoder ring causes some SSL sockets not to be torn down, which may cause problems after the private browsing mode has changed.
> In the stack from comment 8:
>
> 0 nss3.dll find_objects_by_template security/nss/lib/dev/devtoken.c:448
> 1 nss3.dll nssToken_FindCertificateByIssuerAndSerialNumber devtoken.c:866
> 2 nss3.dll nssToken_ImportCertificate security/nss/lib/dev/devtoken.c:525
> 3 nss3.dll PK11_ImportCert security/nss/lib/pk11wrap/pk11cert.c:920
> 4 xul.dll AuthCertificateCallback manager/ssl/src/nsNSSCallbacks.cpp:1020
>
> PSM's AuthCertificateCallback calls PK11_GetInternalKeySlot() and passes
> that slot pointer down to PK11_ImportCert(), which calls
> PK11Slot_GetNSSToken(slot) to get slot->token. It then passes this token
> value down the stack, which is untouched until it gets to
> find_objects_by_template. There, the token pointer is dereferenced, and
> is found to be invalid (causes a crash).
>
> At the moment, I can't think of any way that PK11_GetInternalKeySlot() could
> return a pointer to a slot with an invalid token pointer. But if that slot
> value was stale, then ...
Could nsISecretDecoderRing.logoutAndTearDown be responsible here?
The essence of this bug is that the "token" pointer in the "slot" structure whose address was returned by PK11_GetInternalKeySlot is invalid. If this had happened just after switching INTO FIPS mode, I would suspect that PK11_GetInternalKeySlot returned the address of the second slot (which ceases to exist in FIPS mode). But since it happened just after switching OUT of FIPS mode, both slots should be valid. So, I conclude that either a) the slot pointer is stale, being a pointer to a slot structure that existed before the change to/from FIPS mode, that has subsequently been freed, or b) one of the slot structures is in an inconsistent and invalid state, having an invalid token pointer. The fact that the process has gone from non-FIPS to FIPS mode and then back to non-FIPS mode is almost certainly the reason that the slot structures changed. In FIPS mode, softoken has only one slot, and PK11_GetInternalKeySlot returns the address of that slot, the first and only one in the slot table. In non-FIPS mode, softoken has two slots and PK11_GetInternalKeySlot returns the address of the second one. I suspect that the switch from non-FIPS -> FIPS mode freed/destroyed the second slot's token structure, and the subsequent switch from FIPS -> non-FIPS failed to fully reinitialize the second slot and token. I think that NSS's suite of test programs lacks any that test operation after repeated switches to/from FIPS mode. AFAIK, the ONLY test program we have that does that switching is modutil, and it exits almost immediately after switching, IINM, so if the switch left things in a bad state, our tests might not detect it.
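Hypothesis (a) in the comment above can be illustrated with a toy model. This is only a sketch in Python, not NSS code: the classes `Token`, `Slot`, and `SoftokenModel`, and the function `import_cert`, are hypothetical stand-ins, and the real NSS structures and lifetimes are more involved. The only behavior taken from the comment is that FIPS mode has one slot, non-FIPS mode has two, and the internal key slot is the first or the second one respectively.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    name: str


@dataclass
class Slot:
    name: str
    token: Optional[Token]  # None models a freed/invalid token pointer


class SoftokenModel:
    """Toy stand-in for softoken's slot table; not the real NSS layout."""

    def __init__(self) -> None:
        self.fips = False
        self.slots: List[Slot] = self._build_slots()

    def _build_slots(self) -> List[Slot]:
        if self.fips:
            # FIPS mode: a single slot.
            return [Slot("fips", Token("fips-token"))]
        # Non-FIPS mode: two slots; the second is the internal key slot.
        return [Slot("crypto", Token("crypto-token")),
                Slot("key", Token("key-token"))]

    def toggle_fips(self) -> None:
        # Invalidate the tokens of the old slots, as a free() would in C.
        for slot in self.slots:
            slot.token = None
        self.fips = not self.fips
        self.slots = self._build_slots()

    def get_internal_key_slot(self) -> Slot:
        return self.slots[0] if self.fips else self.slots[1]


def import_cert(slot: Slot) -> str:
    # Models the point where the crash was seen: the token is dereferenced.
    if slot.token is None:
        raise RuntimeError("stale slot: token is no longer valid")
    return f"imported into {slot.token.name}"
```

In this model, a caller that keeps a slot obtained before `toggle_fips()` and later passes it to `import_cert` hits the error, while a caller that fetches the slot again after the toggle does not.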
Nelson, Re: comment 13, Actually, pk11mode tests switching in/out of FIPS mode. But it only tests softoken. It does not test or use any other NSS components, such as libssl. The problem I saw on my machine is crashing right after getting into FIPS mode - so it is a different issue than Marcia's. Sorry for the incorrect report. Marcia's STR in comment 0 say that her problem occurs after she has switched INTO FIPS mode, and then entered/exited private browsing. I didn't see any mention of switching OUT of FIPS mode. Does private browsing affect FIPS mode?
(In reply to comment #14) > Marcia's STR in comment 0 say that her problem occurs after she has switched > INTO FIPS mode, and then entered/exited private browsing. I didn't see any > mention of switching OUT of FIPS mode. Thanks, Julien. You're right. I misread/misremembered it. > Does private browsing affect FIPS mode ? I don't know. It apparently causes the browser to access an https URL. I'd guess that it is checking for browser updates, which IIRC uses https. This hypothesis is testable. Try firing up FF 3.5, switching into FIPS mode, and then visiting an https URL previously unvisited in that process lifetime. If that also crashes, then we know that "private browsing" was not really a factor, and that merely calling PK11_ImportCert after going into FIPS mode will reproduce the crash. If true, this also gives us a very good idea where the problem is. We can also easily write/enhance a test program to do those steps (init, go to FIPS mode, call PK11_ImportCert).
I forgot to mention yesterday that another way I was able to crash in this stack was to follow the same STR as in Comment 0, except that I went to the Addons manager and typed something in the search bar. If I remember correctly I did not even enter PB mode when that crash happened. But the first manifestations of the crash definitely involved going in and out of PB mode.
Nelson, Re: comment 15, I can't test your hypothesis because my FF 3.5 + my debug heap NSS libraries crash immediately when turning on FIPS mode, with a corrupt stack. There is no chance for me to go into private browsing, or test any other HTTPS URL. If I use the stock libs that come with FF 3.5 then all is fine. I think the crash I am seeing is most likely a separate bug due to mismatched heaps when enabling FIPS. Unfortunately I still can't trace it due to the corrupt stack.
julien, you know you can use https://developer.mozilla.org/en/How_to_get_a_stacktrace_with_WinDbg to get .pdb files for our builds, right?
timeless, Thanks for that link. I didn't know. But I have MS VS 2008, no need for Windbg. However, since the stack is corrupt (missing caller frames - not addresses), having the missing PDBs wouldn't help.
You can also get the PDBs with MSVS. That's what I do.
One thing which might be relevant here is that until recently, every time the status of private browsing mode was changed on trunk, the browser would be put into offline mode and then back into online mode. This no longer happens. The build in comment 4 was the first to stop toggling the offline mode, and based on comment 5, it seems like toggling the offline mode has been a determining factor here, so it may be worth looking into.
(In reply to comment #14) > Marcia's STR in comment 0 say that her problem occurs after she has switched > INTO FIPS mode, and then entered/exited private browsing. I didn't see any > mention of switching OUT of FIPS mode. Does private browsing affect FIPS mode ? The only other FIPS related PB bug that we have on file is bug 489880. Is that relevant?
(In reply to comment #22) > The only other FIPS related PB bug that we have on file is bug 489880. Is that > relevant? They're probably related. They may have the same underlying root cause.
Does this bug occur a) only on the 191 branch and NOT on the trunk? b) only on the trunk and NOT on the 191 branch? c) on both?
Definitely occurs on trunk as that is where I first saw the bug. Was not able to reproduce using FF 3.5 on Vista. Will try on the Firefox 1.9.1 latest branch nightly after this meeting. (In reply to comment #24) > Does this bug occur > a) only on the 191 branch and NOT on the trunk? > b) only on the trunk and NOT on the 191 branch? > c) on both?
If this can be reproduced on trunk but not branch, then I think the recently committed change for bug 496335 is a suspect.
I cannot reproduce this on the branch using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.1pre) Gecko/20090714 Shiretoko/3.5.1pre, so I guess this is trunk only.
So it would be nice to get a regression range. Is anyone interested in nailing it down?
Severity: major → critical
Flags: blocking1.9.2?
Whiteboard: [ss:b2]
Blocks: 496335
(In reply to comment #25) > Definitely occurs on trunk as that is where I first saw the bug. Have you tried to reproduce this on the latest trunk nightly? The trunk code at the time you first filed this is different from the current code. (In reply to comment #26) > If this can be reproduced on trunk but not branch, then I think the recently > committed change for bug 496335 is a suspect. Please note that what this bug did was to _remove_ the toggling of the offline mode in private browsing transitions, and this bug (and also bug 489880) were filed before bug 496335 was fixed.
I was just able to reproduce this using the latest trunk nightly, Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090731 Minefield/3.6a1pre by following the STR in Comment 0.
I tried to reproduce this on a new profile, but after setting the master password to a string password and trying to enable FIPS, I get the below error on the error console: Error: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIPKCS11ModuleDB.toggleFIPSMode]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://pippki/content/device_manager.js :: toggleFIPS :: line 545" data: no] Marcia, could you please create a Windows/Linux profile and enable FIPS in it, and send it to me so that I can use it to test this bug?
This can't be a result of bug 496335, because it was reported on 07/09, but that bug was landed on trunk on 07/11. Marcia, could you please use the nightlies before 07/11 so that we can be 100% sure if it's related to bug 496335 or not? A regression range is much needed here.
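A regression range like the one requested above is usually found by bisecting nightly builds by date. The following is a minimal Python sketch of that search, assuming one nightly per day and a known-good and known-bad build to start from. The `crashes` callback is hypothetical: in practice it would mean a person installing that day's nightly and running the STR from comment 0.

```python
from datetime import date, timedelta
from typing import Callable, Tuple


def find_regression_range(good: date, bad: date,
                          crashes: Callable[[date], bool]) -> Tuple[date, date]:
    """Bisect nightly build dates.

    `good` must be a date whose nightly does not crash and `bad` a later
    date whose nightly does. Returns (last good date, first bad date).
    """
    if good >= bad:
        raise ValueError("good must be earlier than bad")
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if crashes(mid):
            bad = mid   # regression landed on or before mid
        else:
            good = mid  # regression landed after mid
    return good, bad
```

For a window of N days this needs about log2(N) manual test runs, so a five-week window takes roughly five or six builds to narrow down to a single day.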
Bob, I think you're the best candidate to diagnose this bug.
Assignee: nobody → rrelyea
Using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090710 Minefield/3.6a1pre, I still crash following the same steps, but the stack trace is a bit different: http://crash-stats.mozilla.com/report/index/dc4755eb-6c14-4a15-9f44-8f2172090820 The one thing I did differently was that I came out of PB mode and loaded paypal.com. The crash happened immediately after that. Regarding Comment 31, I get the same error that Ehsan does and have been testing this on a new profile even with that error.
Priority: -- → P1
Based on comment 32 and 34, I'm removing the dependency on bug 496335.
No longer blocks: 496335
Flags: blocking1.9.2? → blocking1.9.2+
Comment 27 indicates that this is trunk-only; marking 1.9.2-unaffected; please let me know if that's wrong.
(In reply to comment #36) > Comment 27 indicates that this is trunk-only; marking 1.9.2-unaffected; please > let me know if that's wrong. I don't think that comment is right. Comment 27 says it's not on 1.9.1 (still true?), and says trunk only. But Comment 3 has this stack on 1.9.2, right?
What's the story here?
This definitely affects 1.9.2 - I just confirmed it with the latest nightly. I get the same error in the console as well as the crash if I follow the STR in Comment 0. Basically you cannot toggle FIPS mode in the latest nightly. Regarding this bug not being on 1.9.1 - I just saw Bug 521878 filed on the 1.9.1 branch - it seems to have the same stack trace as well as the same error in the console. The STR are a bit different, but it looks similar to this bug. (In reply to comment #37) > (In reply to comment #36) > > Comment 27 indicates that this is trunk-only; marking 1.9.2-unaffected; please > > let me know if that's wrong. > > I don't think that comment is right. Comment 27 says it's not on 1.9.1 (still > true?), and says trunk only. But Comment 3 has this stack on 1.9.2, right?
Do we think this blocks the beta, or can we fix it for RC? Not sure what we feel the invasive-ness of the problem/fix is.
Priority: P1 → P2
I think it does - you cannot get into FIPS mode correctly (the button does not show as enabled in the UI as well), and because of that you may crash intermittently as you are browsing. (In reply to comment #40) > Do we think this blocks the beta, or can we fix it for RC? Not sure what we > feel the invasive-ness of the problem/fix is.
Correction - I don't think it necessarily blocks the beta, but definitely final.
See also bug 522041 which implies bug 516396 as having caused a regression, and has the exact same STR (but a different stack trace)
Marcia, do you still see the crash with a latest Minefield/Namoroka build and a fresh profile? I did a couple of tests regarding your comments on STR but wasn't able to get Firefox to crash.
This bug could be marked as a duplicate of bug 509319, as the issue is fixed in the latest Firefox 3.6b1 candidate. I was able to reproduce this bug by installing Namoroka/3.6b1pre http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2009-10-12-05-mozilla-1.9.2/ but then confirmed that the bug is fixed using Firefox 3.6b1-candidates http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/3.6b1-candidates/build1/ It should be noted that in Step 1 of the STR (1. Set a Master Password and Enable FIPS mode), Namoroka 3.6b1pre did not actually go into FIPS mode, because the .chk files were never installed. Nothing visually happens when one clicks "Enable FIPS", so it would be understandable for a user to assume that the browser was in FIPS mode. The Error Console does display the uncaught exception that Marcia reported in Comment #1: Error: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIPKCS11ModuleDB.toggleFIPSMode]" I then did the following steps: 2. Enter Private Browsing and then exit Private Browsing 3. Go to an https site (crash occurs). Using the 3.6b1 candidate, which has all 3 required .chk files installed, this bug is not reproducible unless I manually remove one or all of the .chk files from Firefox.app/Contents/MacOS; then I can reproduce the error. To ensure this bug does not occur again, PSM should catch and handle the exception if NSS is unable to go into FIPS mode due to missing .chk files.
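The suggestion that the caller should catch a failed FIPS toggle and report it, rather than let the UI assume success, can be sketched as below. This is an illustration in Python, not PSM code (the real caller is JavaScript in device_manager.js). The function name `try_toggle_fips` and both callbacks are hypothetical; the raised exception merely stands in for the NS_ERROR_FAILURE seen in the Error Console.

```python
from typing import Callable, Optional, Tuple


def try_toggle_fips(toggle: Callable[[], None],
                    is_fips_enabled: Callable[[], bool]
                    ) -> Tuple[bool, Optional[str]]:
    """Attempt the toggle; return (actual FIPS state, error message or None).

    The state is always re-read after the attempt, so the caller never has
    to guess whether the mode really changed.
    """
    try:
        toggle()
    except Exception as exc:  # hypothetical stand-in for NS_ERROR_FAILURE
        return is_fips_enabled(), f"Could not change FIPS mode: {exc}"
    return is_fips_enabled(), None
```

A caller would show the message to the user when it is not `None` and update the button state from the returned boolean, instead of leaving an uncaught exception in the console.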
Glen, so recent nightly builds work too for you? I just wonder because of bug 522041.
(In reply to comment #46) > Glen, so recent nightly builds work too for you? I just wonder because of bug > 522041. I tested with Firefox 3.6beta1, which installs the .chk files correctly. Then I tested with the more recent nightly builds:
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2b2pre) Gecko/20091020 Namoroka/3.6b2pre
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5pre) Gecko/20091020 Shiretoko/3.5.5pre
Both of those builds are missing the .chk files: libfreebl3.chk, libnssdbm3.chk, libsoftokn3.chk. There are two issues:
1) Nightly builds require .chk files. I believe bug 522220 should address this issue.
2) PSM should not crash if it is unable to put NSS in FIPS mode. I plan to address this issue in bug 511320.
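Whether a given build is missing any of the three .chk files named above can be checked with a few lines of script. This is a sketch in Python; the helper name `missing_chk_files` is made up for illustration, and the directory to check (for example Firefox.app/Contents/MacOS on Mac, as mentioned in comment 45) is supplied by the caller.

```python
from pathlib import Path
from typing import List

# File names as listed in the comment above.
REQUIRED_CHK_FILES = ("libfreebl3.chk", "libnssdbm3.chk", "libsoftokn3.chk")


def missing_chk_files(lib_dir: str) -> List[str]:
    """Return the required .chk files that are absent from lib_dir."""
    base = Path(lib_dir)
    return [name for name in REQUIRED_CHK_FILES
            if not (base / name).is_file()]
```

An empty result means all three files are present; it does not verify that their contents are valid signatures.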
Depends on: 522220, 511320
What's the ETA/progress of this bug? It's a blocker for Firefox 3.6; happy to get you more support here, need to know what's going on.
Mike Beltzner wrote: > What the ETA/progress of this bug? This bug has the very same stack as bug 524167, for which there is a patch. However, bug 524167 is a NULL ptr dereference, which the patch detects. This crash is NOT obviously a NULL ptr dereference, so it's not a sure bet that the patch for bug 524167 will also fix this crash. It might or might not. It would be nice if there was some way for Marcia to test that patch without committing and releasing it.
I guess we could try creating a tryserver build with the patch in the other bug and I could test it.
But in talking with Henrik it sounds as if it is not that easy - would need a patched version of nss and then I guess we could create an HG patch. I thought I could just upload the patch to tryserver and create a testable build. (In reply to comment #50) > I guess we could try creating a tryserver build with the patch in the other bug > and I could test it.
Nelson or anyone else from the NSS team, can you please tell us if it is enough to patch the single file referenced in the patch or do we have to fetch the current version from cvs to create a patch we can use to upload to the tryserver? I believe other patches have been already checked into NSS since the last revision so it will be a bit harder (as Marcia said).
The patch attached to bug 524167 patches two files in the NSS sources. It's a relatively small patch. You're going to be applying it to code in Mozilla central, right? If it applies cleanly, you should be OK.
Marcia, a tryserver build is available here: https://build.mozilla.org/tryserver-builds/hskupin@mozilla.com-bug503418-FIPS-crash/ Can you please try if the patch on bug 524167 fixes your crash? Btw. I believe we should dupe this bug against bug 524167 because it's the identical crash.
No crash using the tryserver build in Comment 54. However, I do get a strange error that I took a screenshot of when I had an SSL site loaded and then switched into PB mode ("The Operation failed because the PKCS #11 token is not logged in"). I can file a separate bug on that if it is reproducible.
Removing blocking; it's become apparent that a dependency for this (bug 524167) won't land until we move to a later version of NSS, and I don't think that I can bring myself to hold the release for the ability to toggle FIPS mode. I'd rather relnote this and fix it in a security/stability release. People know how to renominate if they think I'm approaching this wrong. Marcia: can you help me write a relnote?
blocking2.0: --- → ?
Flags: wanted1.9.2+
Flags: blocking1.9.2-
Flags: blocking1.9.2+
Keywords: relnote
I'm going to mark this as a duplicate of bug 524167. If this bug persists after 3.12.6 is taken into Firefox, please reopen.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
blocking2.0: ? → ---
Crash Signature: [@ find_objects_by_template ]