Open Bug 936407 Opened 12 years ago Updated 3 years ago

Firefox freezes when removing Gemalto smartcard

Categories

(NSS :: Libraries, defect, P3)

x86
Windows 7

Tracking

(Not tracked)

People

(Reporter: roberto.viola, Unassigned, NeedInfo)

References

Details

(Keywords: regression)

Attachments

(5 files)

User Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36

Steps to reproduce:
1. Insert a Gemalto smartcard
2. Remove the smartcard

Note: this bug is present in versions 10 through 25. Versions 3 through 9 are not affected. I would like to try nightly builds between version 9 and version 10 to identify which build introduced this bug, but I can't find a nightly build repository for these versions.

Actual results: Firefox freezes until I reinsert the smartcard. After that it works flawlessly.

Expected results: Firefox shouldn't freeze at all.
OS: Windows 8 → Windows 7
Hardware: x86_64 → x86
As nobody has a Gemalto smartcard to test, you need to install the tool mozregression (see http://harthur.github.com/mozregression/ for details). You can use it with a testing profile (read the FAQ). Nightly builds are available on the FTP: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/ (by year since 2004, use the repository mozilla-central) FF10 nightlies started in Sept. 2011 (mozregression --good=2011-09-01).
Flags: needinfo?(roberto.viola)
Thank you Loic for your quick answer. I've tried it and this is the result:
2011-10-30 is the last good nightly
2011-10-31 is the first bad nightly
If you need more information, I'm here :) Have a good weekend.
Flags: needinfo?(roberto.viola)
Could you provide the changelog output, please.
I'm AFK. You could run mozregression --good=2011-10-30 --bad=2011-10-31 to see the changelog URL.
I don't know this Gemalto smartcard, but I guess you need to load a Gemalto encryption module in Firefox, as described here: http://docs.epoline.org/doc/epoline/firefox/GemSAFE_Firefox_EN.pdf If that's the case, the suspected bug could be: Brian Smith — Bug 675702 - Remove XPCOM Proxies: Do PSM client certificate processing on the main thread, r=kaie What do you think? Can you explain how this product works with Firefox (encryption, settings, etc.)?
Flags: needinfo?(roberto.viola)
Yes, I load the Gemalto library as described in the document (versions < 2011-10-30 work with that library). The Gemalto smartcard contains some certs that are sent to a web server in order to authorize the user with a unique ID. I think it could be related to your link. How could I run the current version (25) without that patch?
Flags: needinfo?(roberto.viola)
CC'ing bsmith who is aware of this kind of stuff.
Blocks: 675702
Status: UNCONFIRMED → NEW
Component: Untriaged → Security: PSM
Ever confirmed: true
Flags: needinfo?(brian)
Keywords: regression
Product: Firefox → Core
Summary: Freeze removing smartcard → Firefox freezes when removing Gemalto smartcard
I won't be able to work on this for quite a while.
Flags: needinfo?(brian)
Brian, would it be useful if I recompiled version 10 and/or version 25 without your patch, to see whether Gemalto cards work fine without it?
Flags: needinfo?(brian)
With Firefox 25, please follow the instructions at https://developer.mozilla.org/en-US/docs/How_to_Report_a_Hung_Firefox to crash Firefox while it is hung and paste the crash report ID in this bug. Recompiling old versions is probably not useful at this point.
Flags: needinfo?(brian) → needinfo?(roberto.viola)
Flags: needinfo?(roberto.viola) → needinfo?(brian)
Socket transport thread:
  nsSocketOutputStream::Write(char const *,unsigned int,unsigned int *)
  nsSSLIOLayerWrite
  ...
  nsNSS_SSLGetClientAuthData(void *,PRFileDesc *,CERTDistNamesStr *,CERTCertificateStr * *,SECKEYPrivateKeyStr * *)
  mozilla::psm::SyncRunnableBase::DispatchToMainThreadAndWait()

Main thread:
  ClientAuthDataRunnable::RunOnTargetThread()
  nsNSSCertificateDB::FindCertByDBKey(char const *,nsISupports *,nsIX509Cert * *)
  CERT_FindCertByIssuerAndSN
  PK11_FindCertByIssuerAndSN
  STAN_GetCERTCertificateOrRelease
  stan_GetCERTCertificate
  nssPKIObject_Unlock
  PR_ExitMonitor <-- this is hanging?

Roberto, when Firefox hangs like this, is it using CPU or not?
Flags: needinfo?(brian) → needinfo?(roberto.viola)
Yes, it takes a whole core (25% cpu on a quadcore).
Flags: needinfo?(roberto.viola)
Benjamin, do you think that the stack trace is related to Bug 675702?
Flags: needinfo?(brian)
It's in similar code, so it could be. Given that this is spinning and not hanging, we'll probably need a profile in order to get more data.
Profile? So you need a smartcard to try it? If you want, I could give you access to my system via TeamViewer.
Benjamin, I've tried removing patch 79658 from rev 158125. It compiles fine, but the error still remains.
Update: I'm trying to solve it myself. I'm debugging Firefox with VS2012. When it freezes, it is stuck in this loop in pk11cert.c:

    do {
        /* free the old cert on retry. Associated slot was not present */
        if (rvCert) {
            CERT_DestroyCertificate(rvCert);
            rvCert = NULL;
        }

        cert = NSSTrustDomain_FindCertificateByIssuerAndSerialNumber(
            STAN_GetDefaultTrustDomain(), &issuer, &serial);
        if (!cert) {
            break;
        }

        rvCert = STAN_GetCERTCertificateOrRelease(cert);
        if (rvCert == NULL) {
            break;
        }

        /* Check to see if the cert's token is still there */
    } while (!PK11_IsPresent(rvCert->slot));

rvCert is valid and PK11_IsPresent always returns false, so it never leaves this loop! In your opinion, what should the behavior be when the smart key is unplugged?
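As a rough illustration of why the loop above can spin, here is a minimal toy model in C. It does not use NSS: `ToySlot`, `ToyCert`, `ToyCache`, `toy_lookup`, and `toy_find_passes` are hypothetical names made up for this sketch, and the `max_passes` cap exists only so that the toy terminates. It assumes the lookup keeps handing back the same cert on the removed slot, which is what the debugger appears to show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NSS slot and cert types. */
typedef struct { bool present; } ToySlot;
typedef struct { ToySlot *slot; } ToyCert;

/* Toy cert cache. stale_lookups is the number of lookups that still
 * return the cert on the removed slot; a negative value means the
 * cache is never cleaned up. */
typedef struct {
    ToyCert *cert;
    int stale_lookups;
} ToyCache;

/* Returns the cached cert, or NULL once the cache has been cleaned. */
ToyCert *toy_lookup(ToyCache *cache)
{
    if (cache->stale_lookups == 0) {
        return NULL;
    }
    if (cache->stale_lookups > 0) {
        cache->stale_lookups--;
    }
    return cache->cert;
}

/* Same shape as the do/while in the comment above, with a pass counter
 * and a cap added. Returns how many passes the loop made. */
int toy_find_passes(ToyCache *cache, int max_passes)
{
    int passes = 0;
    ToyCert *cert;

    do {
        cert = toy_lookup(cache);
        passes++;
        if (cert == NULL) {
            break; /* cert no longer in the cache: normal exit */
        }
        /* retry while the cert's slot is reported as not present */
    } while (!cert->slot->present && passes < max_passes);

    return passes;
}
```

With `stale_lookups = 1` the toy leaves the loop on its second pass, because the second lookup no longer finds the cert. With `stale_lookups = -1` it only stops at the cap, which corresponds to the freeze seen here.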
Flags: needinfo?(brian) → needinfo?(benjamin)
That is a NSS question which I can't answer.
Flags: needinfo?(benjamin) → needinfo?(brian)
Assignee: nobody → nobody
Component: Security: PSM → Libraries
Flags: needinfo?(brian)
Product: Core → NSS
Version: 10 Branch → trunk
Bob, see comment 19. Based on that comment, this seems to be an NSS bug. However, I know very little about how the smart card insertion/removal detection is supposed to work in NSS. Can you help verify whether the code mentioned in comment 19 is correct and/or give us some pointers for things we should be looking for? Thanks.
Flags: needinfo?(rrelyea)
No, the loop is correct. If PK11_IsPresent returns false for the returned slot, then the cert should be updated so that either we no longer find the cert in NSSTrustDomain_FindCertificateByIssuerAndSerialNumber, or the slot returned for that certificate is a new slot. This loop should only ever execute twice (well, unless you have multiple tokens with the same cert that all get removed at one time):

PK11_IsPresent -> pk11_IsPresentCertLoad -> nssToken_IsPresent -> nssSlot_IsTokenPresent -> nssToken_NotifyCertsNotVisible -> nssTrustDomain_RemoveTokenCertsFromCache -> either:
  nssTrustDomain_RemoveCertFromCacheLOCKED() (which should cause NSSTrustDomain_FindCertificateByIssuerAndSerialNumber to fail to find the cert), or:
  STAN_ForceCERTCertificateUpdate -> stan_GetCERTCertificate -> fill_CERTCertificateFields (which should fill in a different slot than the

Places where this chain can go wrong:
1) The token field in the PK11SlotInfo structure (the slot passed to PK11_IsPresent) has been corrupted (read: set to NULL) before the certs have been removed.
2) The token structure (the one pointed to by the PK11SlotInfo) had its name field corrupted before the certs have been removed.
3) The certs weren't properly registered in the cache.

If you are looping multiple times, something has definitely messed with the NSS internal state.

bob
Flags: needinfo?(rrelyea)
Hmmm, I'm running ESR 17 with NSS 3.15.1 with no problems (insertion/removal seems to work just fine). What version of NSS does 'version 10' mean? Is it Firefox 10, or some build number on the current trunk? bob
Thank you Robert. I will run more tests next week (I'm out of the office right now). I will let you know if I find something useful.
Hi Robert, I'm finally in front of the problem. Let me try to explain the behaviour:
- The loop truly never ends (it does not run only twice).
- When I detach the key, SmartCardMonitoringThread::Execute runs PK11_IsPresent, which correctly calls nssTrustDomain_RemoveTokenCertsFromCache.
- After that, we enter PK11_FindCertByIssuerAndSN, where rvCert is always valid and PK11_IsPresent always returns false without calling nssTrustDomain_RemoveTokenCertsFromCache, because the name buffer was blanked by the SmartCardMonitoringThread.
If you want, we could use TeamViewer (and maybe Skype) to debug the issue together.
Flags: needinfo?(rrelyea)
Which path does nssTrustDomain_RemoveTokenCertsFromCache take? Does it call nssTrustDomain_RemoveCertFromCacheLOCKED() or does it call STAN_ForceCERTCertificateUpdate()? Also, do you have a copy of the cert in your NSS database (possibly without the key)?
Flags: needinfo?(rrelyea)
Attached file cert1.crt
Attached file cert3.crt
nssTrustDomain_RemoveTokenCertsFromCache calls STAN_ForceCERTCertificateUpdate. I attached to this bug the 2 certs available on the Gemalto.
Flags: needinfo?(rrelyea)
Hi Robert, I think you misunderstood my second question. In addition to the cert being stored on your Gemalto card, is the cert also stored in the NSS database (can you find the certificate under either 'My certs' or 'Peer certs' in the certificate manager even if your Gemalto card has been removed)? The call to STAN_ForceCERTCertificateUpdate should have changed the slot value from the gem slot to some other slot. This happens in fill_CERTCertificateFields. Hmm, one thing that looks like it could happen is that if instance == NULL, we may miss updating the cert. I wonder if the cert has somehow reverted to the temp cache because someone has it open. If that's the case, I think we can fix the issue in fill_CERTCertificateFields. Can you check to see if either context != NULL or instance is equal to NULL? bob
Flags: needinfo?(rrelyea)
I guess you're talking about certutil? If so, I can't run it from my Mozilla build directory. This is the output:

C:\mozilla-source\mozilla-central\obj-i686-pc-mingw32\dist\bin>certutil.exe -U
certutil.exe: function failed: SEC_ERROR_LEGACY_DATABASE: The certificate/key database is in an old, unsupported format.

I checked inside fill_CERTCertificateFields, which is called when I remove the key. This is the stack trace:

> nss3.dll!fill_CERTCertificateFields(NSSCertificateStr * c=0x09c2cf50, CERTCertificateStr * cc=0x19218768, int forced=1) Line 843 C
  nss3.dll!stan_GetCERTCertificate(NSSCertificateStr * c=0x09c2cf50, int forceUpdate=1) Line 890 C
  nss3.dll!STAN_ForceCERTCertificateUpdate(NSSCertificateStr * c=0x09c2cf50) Line 914 C
  nss3.dll!nssTrustDomain_RemoveTokenCertsFromCache(NSSTrustDomainStr * td=0x192185e8, NSSTokenStr * token=0x1922b360) Line 448 C
  nss3.dll!nssToken_NotifyCertsNotVisible(NSSTokenStr * tok=0x1922b360) Line 303 C
  nss3.dll!nssSlot_IsTokenPresent(NSSSlotStr * slot=0x1922bde8) Line 172 C
  nss3.dll!nssToken_IsPresent(NSSTokenStr * token=0x1922b360) Line 1441 C
  nss3.dll!pk11_IsPresentCertLoad(PK11SlotInfoStr * slot=0x19227358, int loadCerts=1) Line 1435 C
  nss3.dll!PK11_IsPresent(PK11SlotInfoStr * slot=0x19227358) Line 1483 C
  xul.dll!SmartCardMonitoringThread::Execute() Line 284 C++
  nss3.dll!_PR_NativeRunThread(void * arg=0x19231d30) Line 419 C
  nss3.dll!pr_root(void * arg=0x19231bd0) Line 90 C
  msvcr110.dll!_callthreadstartex() Line 354 C
  msvcr110.dll!_threadstartex(void * ptd=0x1923ac88) Line 332 C

I checked the instance and context variables:
- context, which takes its value from c->object.cryptoContext in fill_CERTCertificateFields, is null;
- instance, which takes its value from get_cert_instance, has the right value.

Roberto
Flags: needinfo?(rrelyea)
Roberto, by 'the right value' do you mean softoken, or the Gemalto token? We just removed the Gemalto token (or should have just removed it), so the Gemalto token should not be the current instance.
Flags: needinfo?(rrelyea)
Hmm, I wonder if we are incorrectly getting two instances of the Gemalto token added to the certificate. Our instance is being removed in 'remove_token_certs', which is a callback function from nssHash_Iterate() iterating over our cert cache and called from nssTrustDomain_RemoveTokenCertsFromCache. We know that we are finding this cert in 'remove_token_certs', that we are removing at least one instance in the list, and that there is more than one object instance.
Oh, I wonder... do you have two copies of the exact same cert on your token? That could cause us to have more than one instance for the same token, because there would be the same cert with multiple PKCS #11 object IDs.
Also, an even more pathological case would be 2 different certs with the same issuer and serial number (which would be interpreted by NSS as a single cert).
I've got some problems debugging some symbols. For example, when I'm in fill_CERTCertificateFields I can't view the value of instance after the line "instance = get_cert_instance(c);": Visual Studio 2012 says "identifier not recognized". Do you know why?

Anyway, I followed the code inside get_cert_instance -> nssCryptokiObject_Clone and checked rvObject: the "label" field of this variable refers to the Gemalto token that has just been removed! Is this the problem? If so, what should I look at to investigate further?

I moved on: with your help I put a breakpoint in remove_token_certs. If you look at the screenshot, you will see that we have only 1 instance, but the tokens are different! So remove_token_certs doesn't remove anything! Have we found the problem?

> do you have two copies of the exact same cert on your token? That could cause us to have more than one instance for the same token because there would be the same cert with multiple PKCS #11 object id's.

I don't understand this question. I have copied the certs attached to this bug onto the Gemalto token. Is this what you want to know?
Flags: needinfo?(rrelyea)
Attached image Immagine3.png
Oops, I made a mistake: I had stopped on the first invocation of remove_token_certs. But PL_HashTableEnumerateEntries calls remove_token_certs many times. In fact, after some calls, remove_token_certs finds the correct token for my Gemalto (look at the new screenshot). So never mind, it's not related to remove_token_certs, but the issue with the instance still remains, doesn't it?
I put in a very dirty workaround for our issue. I noticed that when remove_token_certs decrements object->numInstances, object->numInstances stays at 1 (it was 2). So I replaced the line object->numInstances--; with the line object->numInstances=0; and Firefox doesn't hang anymore when I unplug the Gemalto key. I know it's very dirty, but I guess it could help you understand the issue. I hope we're near the solution.
So, given what we are seeing, this is expected. The question is why do you have 2 object instances (are both instances for the removed gem token?). My current theory is your card has two copies of the same cert under 2 pkcs #11 id's. This creates 2 object instances with the same token. If that's the case, what we need to do in remove_token_certs is to continue to loop through the objects and remove all the objects for a given token (rather than break). What we should do is print out the two objects and see if they point to the same token, but different objectID's. If that's the case, then I think we found our issue. bob
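A minimal sketch of the idea in the comment above, assuming the instances attached to a cert can be modelled as a flat array. This is not the NSS code: `ToyInstance`, `toy_remove_first`, and `toy_remove_all` are hypothetical names, and a real fix would also have to deal with NSS locking and reference counting, which the sketch ignores. It only contrasts stopping at the first match with continuing through every instance for the removed token.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one PKCS #11 object instance of a cert. */
typedef struct {
    int token_id;   /* which token the instance lives on */
    int object_id;  /* PKCS #11 object ID on that token */
} ToyInstance;

/* Removes only the first instance on token_id (stops at the first
 * match, like a loop that breaks). Returns the new instance count. */
size_t toy_remove_first(ToyInstance *inst, size_t count, int token_id)
{
    for (size_t i = 0; i < count; i++) {
        if (inst[i].token_id == token_id) {
            inst[i] = inst[count - 1]; /* fill the hole with the last entry */
            return count - 1;
        }
    }
    return count;
}

/* Removes every instance on token_id, compacting the array in place.
 * Returns the new instance count. */
size_t toy_remove_all(ToyInstance *inst, size_t count, int token_id)
{
    size_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        if (inst[i].token_id != token_id) {
            inst[kept++] = inst[i];
        }
    }
    return kept;
}
```

For a cert with two instances on the same removed token, `toy_remove_first` leaves one instance behind that still points at that token, while `toy_remove_all` leaves none. That matches the numInstances going from 2 to 1 reported earlier in this bug.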
Flags: needinfo?(rrelyea)
You were definitely right. I'm posting a screenshot of the 2 instances. What's next? Should I create a patch myself, or do you want to do it? I have another issue, probably unrelated: in this scenario I get a tremendous memory leak (100 MB in an hour with a local page kept open). What do you suggest to track down the problem?
Flags: needinfo?(rrelyea)
Attached image Immagine5.png
2 instances
Priority: -- → P3
See Also: → 1612360
Severity: normal → S3