Firefox freezes since 2018-01-23 Nightly, after loading PKCS#11 module

RESOLVED FIXED in 3.36.1

Status

defect
RESOLVED FIXED
Last year
Last year

People

(Reporter: jjelen, Assigned: ueno)

Tracking

({regression})

other
3.36.1
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
Build ID: 20180317143930

Steps to reproduce:

The reproducer:
 * Remove smart card from card reader
 * Start a Firefox Nightly
 * Go to Preferences -> Privacy & Security -> Security Devices
 * Click Load, add a OpenSC PKCS#11 module (/usr/lib64/pkcs11/opensc-pkcs11.so)
 * Click OK, close the "Security Devices" dialog
 * Insert a card in reader, wait few seconds
 * Click "View Certificates"
 * Wait for PIN prompt



Actual results:

Firefox after merging NSS 3.36 [1] freezes.

https://hg.mozilla.org/integration/mozilla-inbound/rev/348b36156a84

I got to this single commit using mozregression, but I am not able to bisect further (it is a single commit with single build).


Expected results:

The PIN prompt should appear, Firefox should not freeze.
Feel free to reassing to NSS. I filled a Firefox bug, since this is where I have a reproducer. I was not able to isolate this issue to NSS itself yet.
If it's related to pkcs#11, let's start by assuming the issue is with NSS.

Reproducing this bug apparently depends on owning specific smartcard hardware, so we need to find someone with access to such hard, who is able to debug where it is freezing.

Jakub, have you tried to debug NSS?

Could you install debuginfo, and attach using gdb to the running firefox process after it's stuck, and create a full stack track for all threads? 
-> thread apply all bt full

Could you save the resulting output to a text file, and attach it to this bug?
Assignee: nobody → nobody
Status: UNCONFIRMED → NEW
Component: Untriaged → Libraries
Ever confirmed: true
Flags: needinfo?(jjelen)
Product: Firefox → NSS
Version: 60 Branch → other
On Jan 23, Firefox was upgraded to this new NSS snapshot:
  https://hg.mozilla.org/integration/mozilla-inbound/rev/348b36156a84

The bug must have been caused by the changes in that commit.
This is the NSS changeset that Firefox was upgraded to:
  https://hg.mozilla.org/projects/nss/graph/10f7e0179b18

The page shows the changes that had been made after the branching point to the NSS 3.35 release branch.
Bug 1054373 appears to be the most likely candidate for having caused this new bug.
  https://hg.mozilla.org/projects/nss/rev/6242acf9c02f
Flags: needinfo?(rrelyea)
Kai, thank you for the prompt reply and fast analysis.
I was using standard PIV Test cards in OMNIKEY AG CardMan 3121, but it should be reproducible with any other hardware supported by OpenSC or with other PKCS#11 module. We reproduced it also with coolkey, but I was not successful with soft token (it probably needs the dynamic insertion event).

Here is a backtrace from the following processes in the first bad build:

$ mozregression --launch 1c8bbd57b772157a6119f7773cc4ec8f4b7c1539

We have three processes.

The first one with PID 9367 waits here:

#0  0x00007f3f85a17be6 in pthread_cond_timedwait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x00007f3f844eb550 in pt_TimedWait () at /tmp/tmp7Mx3wM/firefox/libnspr4.so
#2  0x00007f3f844eba47 in PR_WaitCondVar () at /tmp/tmp7Mx3wM/firefox/libnspr4.so
#3  0x00007f3f83778ece in nssSlot_IsTokenPresent () at /tmp/tmp7Mx3wM/firefox/libnss3.so
#4  0x00007f3f8377b5e7 in token_is_present.isra () at /tmp/tmp7Mx3wM/firefox/libnss3.so
#5  0x00007f3f8377bbee in nssTokenObjectCache_FindObjectsByTemplate () at /tmp/tmp7Mx3wM/firefox/libnss3.so
#6  0x00007f3f837799f1 in nssToken_FindObjectsByTemplate () at /tmp/tmp7Mx3wM/firefox/libnss3.so
#7  0x00007f3f83779f9d in nssToken_FindCertificateByIssuerAndSerialNumber () at /tmp/tmp7Mx3wM/firefox/libnss3.so
#8  0x00007f3f83782efe in nssTrustDomain_UpdateCachedTokenCerts () at /tmp/tmp7Mx3wM/firefox/libnss3.so
#9  0x00007f3f8378412a in nssSlot_Refresh () at /tmp/tmp7Mx3wM/firefox/libnss3.so
#10 0x00007f3f8377905d in nssSlot_IsTokenPresent () at /tmp/tmp7Mx3wM/firefox/libnss3.so
#11 0x00007f3f837ae597 in pk11_IsPresentCertLoad () at /tmp/tmp7Mx3wM/firefox/libnss3.so
#12 0x00007f3f7763ddac in nsNSSComponent::CheckForSmartCardChanges() () at /tmp/tmp7Mx3wM/firefox/libxul.so
#13 0x00007f3f776176d6 in nsNSSCertificateDB::GetCerts(nsIX509CertList**) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#14 0x00007f3f74584a36 in NS_InvokeByIndex () at /tmp/tmp7Mx3wM/firefox/libxul.so
#15 0x00007f3f74df2b30 in XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#16 0x00007f3f74dfb1aa in XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#17 0x00007f3f778c4512 in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#18 0x00007f3f778bf94f in Interpret(JSContext*, js::RunState&) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#19 0x00007f3f778c4163 in js::RunScript(JSContext*, js::RunState&) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#20 0x00007f3f778c47ab in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#21 0x00007f3f778c54ab in js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#22 0x00007f3f77ba41b5 in JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#23 0x00007f3f75ac05f8 in mozilla::dom::EventHandlerNonNull::Call(JSContext*, JS::Handle<JS::Value>, mozilla::dom::Event&, JS::MutableHandle<JS::Value>, mozilla::ErrorResult&) ()
    at /tmp/tmp7Mx3wM/firefox/libxul.so
#24 0x00007f3f75dab3b7 in mozilla::JSEventHandler::HandleEvent(nsIDOMEvent*) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#25 0x00007f3f75dadbab in mozilla::EventListenerManager::HandleEventSubType(mozilla::EventListenerManager::Listener*, nsIDOMEvent*, mozilla::dom::EventTarget*) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#26 0x00007f3f75db9445 in mozilla::EventListenerManager::HandleEventInternal(nsPresContext*, mozilla::WidgetEvent*, nsIDOMEvent**, mozilla::dom::EventTarget*, nsEventStatus*) ()
    at /tmp/tmp7Mx3wM/firefox/libxul.so
#27 0x00007f3f75dba13b in mozilla::EventTargetChainItem::HandleEventTargetChain(nsTArray<mozilla::EventTargetChainItem>&, mozilla::EventChainPostVisitor&, mozilla::EventDispatchingCallback*, mozilla::ELMCreationDetector&) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#28 0x00007f3f75dbaf23 in mozilla::EventDispatcher::Dispatch(nsISupports*, nsPresContext*, mozilla::WidgetEvent*, nsIDOMEvent*, nsEventStatus*, mozilla::EventDispatchingCallback*, nsTArray<mozilla::dom::EventTarget*>*) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#29 0x00007f3f7682edf2 in nsDocumentViewer::LoadComplete(nsresult) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#30 0x00007f3f774e02f9 in nsDocShell::EndPageLoad(nsIWebProgress*, nsIChannel*, nsresult) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#31 0x00007f3f774e18e9 in nsDocShell::OnStateChange(nsIWebProgress*, nsIRequest*, unsigned int, nsresult) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#32 0x00007f3f74f4ba92 in nsDocLoader::DoFireOnStateChange(nsIWebProgress*, nsIRequest*, int&, nsresult) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#33 0x00007f3f74f4bbd2 in nsDocLoader::doStopDocumentLoad(nsIRequest*, nsresult) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#34 0x00007f3f74f4be36 in nsDocLoader::DocLoaderIsEmpty(bool) [clone .part.75] () at /tmp/tmp7Mx3wM/firefox/libxul.so
#35 0x00007f3f74f4ca96 in nsDocLoader::OnStopRequest(nsIRequest*, nsISupports*, nsresult) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#36 0x00007f3f745f76f4 in mozilla::net::nsLoadGroup::RemoveRequest(nsIRequest*, nsISupports*, nsresult) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#37 0x00007f3f752fa1f7 in imgRequestProxy::RemoveFromLoadGroup() [clone .part.329] () at /tmp/tmp7Mx3wM/firefox/libxul.so
#38 0x00007f3f752fa95b in imgRequestProxy::OnLoadComplete(bool) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#39 0x00007f3f752fe562 in void mozilla::image::SyncNotifyInternal<mozilla::image::ObserverTable const*>(mozilla::image::ObserverTable const* const&, bool, unsigned int, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#40 0x00007f3f752fe894 in mozilla::image::ProgressTracker::SyncNotifyProgress(unsigned int, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#41 0x00007f3f7530a920 in mozilla::image::RasterImage::NotifyProgress(unsigned int, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::Maybe<unsigned int> const&, mozilla::image::DecoderFlags, mozilla::image::SurfaceFlags) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#42 0x00007f3f7530ac37 in mozilla::image::RasterImage::NotifyForLoadEvent(unsigned int) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#43 0x00007f3f753111c6 in mozilla::image::RasterImage::NotifyDecodeComplete(mozilla::image::DecoderFinalStatus const&, mozilla::image::ImageMetadata const&, mozilla::image::DecoderTelemetry const&, unsigned int, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::Maybe<unsigned int> const&, mozilla::image::DecoderFlags, mozilla::image::SurfaceFlags) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#44 0x00007f3f752d940e in mozilla::detail::RunnableFunction<mozilla::image::IDecodingTask::NotifyDecodeComplete(mozilla::NotNull<mozilla::image::RasterImage*>, mozilla::NotNull<mozilla::image::Decoder*>)::{lambda()#1}>::Run() () at /tmp/tmp7Mx3wM/firefox/libxul.so
#45 0x00007f3f7457dd06 in nsThread::ProcessNextEvent(bool, bool*) [clone .part.242] () at /tmp/tmp7Mx3wM/firefox/libxul.so
#46 0x00007f3f7457f4eb in NS_ProcessNextEvent(nsIThread*, bool) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#47 0x00007f3f749a25b2 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#48 0x00007f3f749788e2 in MessageLoop::Run() () at /tmp/tmp7Mx3wM/firefox/libxul.so
#49 0x00007f3f76575729 in nsBaseAppShell::Run() () at /tmp/tmp7Mx3wM/firefox/libxul.so
#50 0x00007f3f77733726 in nsAppStartup::Run() () at /tmp/tmp7Mx3wM/firefox/libxul.so
#51 0x00007f3f777e12ec in XREMain::XRE_mainRun() () at /tmp/tmp7Mx3wM/firefox/libxul.so
#52 0x00007f3f777e1fbb in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#53 0x00007f3f777e23e4 in XRE_main(int, char**, mozilla::BootstrapConfig const&) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#54 0x00000000004066f9 in do_main(int, char**, char**) ()
#55 0x0000000000405f0b in main ()

The others (9422, 9477) wait in firefox code:

#0  0x00007f77379a2c6b in poll () at /lib64/libc.so.6
#1  0x00007f772971941f in PollWrapper(_GPollFD*, unsigned int, int) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#2  0x00007f7732a07e99 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#3  0x00007f7732a07fac in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#4  0x00007f772971946f in nsAppShell::ProcessNextNativeEvent(bool) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#5  0x00007f77296e316c in nsBaseAppShell::DoProcessNextNativeEvent(bool) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#6  0x00007f77296e3320 in nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#7  0x00007f77276e6b31 in nsThread::ProcessNextEvent(bool, bool*) [clone .part.242] () at /tmp/tmp7Mx3wM/firefox/libxul.so
#8  0x00007f77276e84eb in NS_ProcessNextEvent(nsIThread*, bool) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#9  0x00007f7727b0b691 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#10 0x00007f7727ae18e2 in MessageLoop::Run() () at /tmp/tmp7Mx3wM/firefox/libxul.so
#11 0x00007f77296de729 in nsBaseAppShell::Run() () at /tmp/tmp7Mx3wM/firefox/libxul.so
#12 0x00007f772a94baec in XRE_RunAppShell() () at /tmp/tmp7Mx3wM/firefox/libxul.so
#13 0x00007f7727ae18e2 in MessageLoop::Run() () at /tmp/tmp7Mx3wM/firefox/libxul.so
#14 0x00007f772a94be20 in XRE_InitChildProcess(int, char**, XREChildData const*) () at /tmp/tmp7Mx3wM/firefox/libxul.so
#15 0x0000000000406407 in content_process_main(mozilla::Bootstrap*, int, char**) [clone .part.2] ()
#16 0x0000000000405f70 in main ()
> #2  0x00007f3f844eba47 in PR_WaitCondVar () at /tmp/tmp7Mx3wM/firefox/libnspr4.so
> #3  0x00007f3f83778ece in nssSlot_IsTokenPresent () at /tmp/tmp7Mx3wM/firefox/libnss3.so

Here it's waiting on the condition slot->inIsPresent == PR_TRUE:
https://hg.mozilla.org/projects/nss/file/tip/lib/dev/devslot.c#l149
which is set to PR_TRUE a few lines below (in the same function):
https://hg.mozilla.org/projects/nss/file/tip/lib/dev/devslot.c#l162

> #4  0x00007f3f8377b5e7 in token_is_present.isra () at /tmp/tmp7Mx3wM/firefox/libnss3.so
> #5  0x00007f3f8377bbee in nssTokenObjectCache_FindObjectsByTemplate () at /tmp/tmp7Mx3wM/firefox/libnss3.so
> #6  0x00007f3f837799f1 in nssToken_FindObjectsByTemplate () at /tmp/tmp7Mx3wM/firefox/libnss3.so
> #7  0x00007f3f83779f9d in nssToken_FindCertificateByIssuerAndSerialNumber () at /tmp/tmp7Mx3wM/firefox/libnss3.so
> #8  0x00007f3f83782efe in nssTrustDomain_UpdateCachedTokenCerts () at /tmp/tmp7Mx3wM/firefox/libnss3.so
> #9  0x00007f3f8378412a in nssSlot_Refresh () at /tmp/tmp7Mx3wM/firefox/libnss3.so
> #10 0x00007f3f8377905d in nssSlot_IsTokenPresent () at /tmp/tmp7Mx3wM/firefox/libnss3.so

Here, nssSlot_Refresh() called from nssSlot_IsTokenPresent():
https://hg.mozilla.org/projects/nss/file/tip/lib/dev/devslot.c#l237
ends up with recursively calling nssSlot_IsTokenPresent().

So I suspect slot->isInPresent should be cleared before calling nssSlot_Refresh()?
Posted patch devslot-reinsert.patch (obsolete) — Splinter Review
I'm attaching a patch that could bypass the slot->inIsPresent check when nssSlot_IsTokenPresent() is called from itself.  It's a bit tricky as it reuses the logic avoiding repeated calls to nssSlot_IsTokenPresent().

Without the patch, I could reproduce the problem with:
nss-3.36.0-1.0.fc27.x86_64
firefox-59.0.1-1.fc27.x86_64
while it no longer occurs if I apply the patch.
Attachment #8961418 - Flags: review?(rrelyea)
Below is a review from Bob (as he is having trouble with bugzilla login now):

Here's my review: r-

There are two things wrong with this patch: 1) you need to get a lock
before you check and modify the state variables, and 2) even with the
lock, you have a race with reset, which if you lose, you will randomly
find yourself back in this lock state.

I thought about moving the Refresh out of the block, but most of the
stuff happening in Refresh are the things we are trying to avoid
multiple threads from doing at the same time anyway, so your basic
idea is the right one (detect this case and bypass isPresent). I
suggest renaming the isInPresent to tidInIsPresent* and setting that
value to the current thread id. At the top of is present add a ||
clause to the 'in delay' if statement which activates is you are in
the same thread.

* you may pick your own better name. The variable holds the thread id
of the process that is in the body of isPresent. zero means no process
is in the body. (Need to verify that zero is an not a valid thread id
in NSPR).
(In reply to Daiki Ueno [:ueno] from comment #9)

> There are two things wrong with this patch: 1) you need to get a lock
> before you check and modify the state variables, and 2) even with the
> lock, you have a race with reset, which if you lose, you will randomly
> find yourself back in this lock state.

I thought that locking was unnecessary there because it is assured that only one thread can reach the nssSlot_Refresh() call, as the surrounding code is protected by a conditional variable:

    PZ_Lock(slot->isPresentLock);
    while (slot->inIsPresent) {
        PR_WaitCondVar(slot->isPresentCondition, 0);
    }
    PZ_Unlock(slot->isPresentLock);
    ...
    nssrv = nssSlot_Refresh(slot);
    ...
    PZ_Lock(slot->isPresentLock);
    PR_NotifyAllCondVar(slot->isPresentCondition);
    PZ_Unlock(slot->isPresentLock);

However, nevertheless, if nssSlot_Refresh() takes time (i.e., because of slow smart card access), within_token_delay_period() at the beginning could return false and there would be a race in that case.

> I thought about moving the Refresh out of the block, but most of the
> stuff happening in Refresh are the things we are trying to avoid
> multiple threads from doing at the same time anyway, so your basic
> idea is the right one (detect this case and bypass isPresent). I
> suggest renaming the isInPresent to tidInIsPresent* and setting that
> value to the current thread id. At the top of is present add a ||
> clause to the 'in delay' if statement which activates is you are in
> the same thread.

Indeed, that sounds like a clearer approach.  I am attaching a patch along these lines, with a few modifications:
- I used PRThread* instead of a thread ID, because PR_GetThreadID() is a private API hidden in prpriv.h
- I merged the delay check and the "running in the same thread" check into a single function (token_status_checked)
Attachment #8961418 - Attachment is obsolete: true
Attachment #8961418 - Flags: review?(rrelyea)
Attachment #8961713 - Flags: review?(rrelyea)
Comment on attachment 8961713 [details] [diff] [review]
devslot-reinsert.patch

Review of attachment 8961713 [details] [diff] [review]:
-----------------------------------------------------------------

r+ better than what I had in mind.

::: lib/dev/devslot.c
@@ +136,4 @@
>  
>      /* avoid repeated calls to check token status within set interval */
>      PZ_Lock(slot->isPresentLock);
> +    if (token_status_checked(slot)) {

OK I was thinking just adding the test on this if, but your solution is fine.

@@ +156,5 @@
>          PR_WaitCondVar(slot->isPresentCondition, 0);
>      }
>      /* if we were one of multiple threads here, the first thread will have
>       * given us the answer, no need to make more queries of the token. */
> +    if (token_status_checked(slot)) {

In this case the test for slot->isPresentThread == PR_GetCurrentThread() is fine because it is known to always fail ( if you are waiting on the isPresentCondition you better not be the thread in isPresent.
Attachment #8961713 - Flags: review?(rrelyea) → review+
Blocks: 1448404
Flags: needinfo?(jjelen)
Daiki, thank you for creating the fix for this bug!
Is this ready for checkin?
Assignee: nobody → dueno
Flags: needinfo?(rrelyea) → needinfo?(dueno)
I tried nss-try build with the patch, and got a leak report from address sanitizer:
https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=d9baf7f3ee48141680f9de9b49161205d9c0b866
though it's unlikely to be related to this change.  Let me try it locally before checkin.
Flags: needinfo?(dueno)
Landed as: https://hg.mozilla.org/projects/nss/rev/6ae3ab8a1e7b
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → 3.37
Uplifted to the 3.36 branch for 3.36.1
https://hg.mozilla.org/projects/nss/rev/fa159fd618ea
Target Milestone: 3.37 → 3.36.1
Duplicate of this bug: 1450962
Duplicate of this bug: 1466766
You need to log in before you can comment on or make changes to this bug.