Closed Bug 936808 Opened 6 years ago Closed 6 years ago

Intermittent test_sanitySimpletest.html | application terminated with exit code 11 | application crashed [@ PK11_ImportCert]

Categories

(Core :: Security: PSM, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla29
Tracking Status
firefox26 --- wontfix
firefox27 + fixed
firefox28 + fixed
firefox29 + fixed
firefox-esr17 --- wontfix
firefox-esr24 27+ fixed
b2g18 --- fixed
b2g-v1.1hd --- fixed
b2g-v1.2 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: RyanVM, Assigned: cviecco)

References

Details

(Keywords: crash, intermittent-failure, sec-high, Whiteboard: [fix is speculative; may not actually be fixed][adv-main27+][adv-esr24.3+][qa-])

Crash Data

Attachments

(1 file)

I'm afraid this might be a regression from yesterday's NSS update. Kai, can you please take a look?

https://tbpl.mozilla.org/php/getParsedLog.php?id=30361293&tree=Mozilla-Aurora

Ubuntu VM 12.04 mozilla-aurora debug test mochitest-1 on 2013-11-08 19:01:10 PST for push 51f5ae29c5de
slave: tst-linux32-ec2-101

19:05:31     INFO -  161 INFO TEST-START | /tests/Harness_sanity/test_sanitySimpletest.html
19:05:32     INFO -  (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
19:05:32     INFO -  (pkix_CacheCert_Add: PKIX_PL_HashTable_Add for Certs skipped: entry existed
19:05:32     INFO -  ++DOMWINDOW == 38 (0xac91dd0) [serial = 45] [outer = 0xaddc9e0]
19:05:32     INFO -  162 INFO TEST-INFO | /tests/Harness_sanity/test_sanitySimpletest.html | Profile::SimpleTestLoadTime: 32
19:05:32     INFO -  163 INFO TEST-INFO | /tests/Harness_sanity/test_sanitySimpletest.html | Logging some info
19:05:32     INFO -  164 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | test ok - This should be true
19:05:32     INFO -  165 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | is() test failed
19:05:32     INFO -  166 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | isnot() test failed
19:05:32     INFO -  167 INFO TEST-KNOWN-FAIL | /tests/Harness_sanity/test_sanitySimpletest.html | test todo - todo() test should not pass
19:05:32     INFO -  168 INFO TEST-KNOWN-FAIL | /tests/Harness_sanity/test_sanitySimpletest.html | test todo_is - got false, expected true
19:05:32     INFO -  169 INFO TEST-KNOWN-FAIL | /tests/Harness_sanity/test_sanitySimpletest.html | test todo_isnot - didn't expect true, but got it
19:05:32     INFO -  170 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | testPluginIsOOP should be bool
19:05:32     INFO -  171 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | Clipboard has the correct value
19:05:32     INFO -  172 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | waitForClipboard should work
19:05:32     INFO -  173 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | createEl did not create element as expected
19:05:32     INFO -  174 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | $ helper did not get element as expected
19:05:32     INFO -  175 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | computedStyle did not get right display value
19:05:32     INFO -  176 INFO TEST-INFO | /tests/Harness_sanity/test_sanitySimpletest.html | Profile::SimpleTestRunTime: 117
19:05:32     INFO -  177 INFO TEST-PASS | /tests/Harness_sanity/test_sanitySimpletest.html | expected uncaught exception - uncaught exception: i am an uncaught exception at :0
19:05:32     INFO -  JavaScript error: , line 0: uncaught exception: i am an uncaught exception
19:05:33  WARNING -  TEST-UNEXPECTED-FAIL | /tests/Harness_sanity/test_sanitySimpletest.html | application terminated with exit code 11
19:05:33     INFO -  INFO | runtests.py | Application ran for: 0:00:26.610075
19:05:33     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpHM6CODpidlog
19:05:47  WARNING -  PROCESS-CRASH | /tests/Harness_sanity/test_sanitySimpletest.html | application crashed [@ PK11_ImportCert]
19:05:47     INFO -  Crash dump filename: /tmp/tmpsH6iya/minidumps/4c711747-7f3f-8543-3baf75be-34474794.dmp
19:05:47     INFO -  Operating system: Linux
19:05:47     INFO -                    0.0.0 Linux 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686
19:05:47     INFO -  CPU: x86
19:05:47     INFO -       GenuineIntel family 6 model 26 stepping 5
19:05:47     INFO -       1 CPU
19:05:47     INFO -  Crash reason:  SIGSEGV
19:05:47     INFO -  Crash address: 0x14
19:05:47     INFO -  Thread 31 (crashed)
19:05:47     INFO -   0  libnss3.so!PK11_ImportCert [pk11cert.c:51f5ae29c5de : 972 + 0x3]
19:05:47     INFO -      eip = 0xb72d7a38   esp = 0xa10fd000   ebp = 0xa10fd098   ebx = 0xb74086a4
19:05:47     INFO -      esi = 0xa39d0b18   edi = 0xa39d0b20   eax = 0x00000000   ecx = 0xb1400020
19:05:47     INFO -      edx = 0x00000000   efl = 0x00210206
19:05:47     INFO -      Found by: given as instruction pointer in context
19:05:47     INFO -   1  libxul.so!mozilla::psm::::AuthCertificate [SSLServerCertVerification.cpp:51f5ae29c5de : 960 + 0x13]
19:05:47     INFO -      eip = 0xb489a8cb   esp = 0xa10fd0a0   ebp = 0xa10fd118   ebx = 0xb6be96f4
19:05:47     INFO -      esi = 0xa39d6a08   edi = 0x00000000
19:05:47     INFO -      Found by: call frame info
19:05:47     INFO -   2  libxul.so!mozilla::psm::::SSLServerCertVerificationJob::Run [SSLServerCertVerification.cpp:51f5ae29c5de : 1058 + 0x18]
19:05:47     INFO -      eip = 0xb489b6ee   esp = 0xa10fd120   ebp = 0xa10fd198   ebx = 0xb6be96f4
19:05:47     INFO -      esi = 0xb1402308   edi = 0xa10fd158
19:05:47     INFO -      Found by: call frame info
19:05:47     INFO -   3  libxul.so!nsThreadPool::Run() [nsThreadPool.cpp:51f5ae29c5de : 204 + 0x8]
19:05:47     INFO -      eip = 0xb4f3525b   esp = 0xa10fd1a0   ebp = 0xa10fd1f8   ebx = 0xb6be96f4
19:05:47     INFO -      esi = 0xb1402308   edi = 0xa10fd1d8
19:05:47     INFO -      Found by: call frame info
19:05:47     INFO -   4  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:51f5ae29c5de : 622 + 0x13]
19:05:47     INFO -      eip = 0xb4f3260e   esp = 0xa10fd200   ebp = 0xa10fd258   ebx = 0xb6be96f4
19:05:47     INFO -      esi = 0xb145ffe8   edi = 0xa10fd228
19:05:47     INFO -      Found by: call frame info
19:05:47     INFO -   5  libxul.so!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:51f5ae29c5de : 251 + 0xf]
19:05:47     INFO -      eip = 0xb4eec1d6   esp = 0xa10fd260   ebp = 0xa10fd298   ebx = 0xb6be96f4
19:05:47     INFO -      esi = 0xb145ffe8   edi = 0xa10fd2c0
19:05:47     INFO -      Found by: call frame info
19:05:47     INFO -   6  libxul.so!nsThread::ThreadFunc(void*) [nsThread.cpp:51f5ae29c5de : 250 + 0x9]
19:05:47     INFO -      eip = 0xb4f33141   esp = 0xa10fd2a0   ebp = 0xa10fd2e8   ebx = 0xb6be96f4
19:05:47     INFO -      esi = 0xb145ffe8   edi = 0xa10fd2c0
19:05:47     INFO -      Found by: call frame info
Thanks for the report. Looks like a memory corruption issue. :(

The code that triggered the crash is here:

PK11_ImportCert(...)
{
    if (c->object.cryptoContext) {
	/* Delete the temp instance */
	NSSCryptoContext *cc = c->object.cryptoContext;
->	nssCertificateStore_Lock(cc->certStore, &lockTrace);

I carefully looked at the changeset, but I cannot see anything that would influence that code.

The memory allocation handling changes are adding protection against huge allocation attempts, but doesn't haven an influence for standard allocs.

We crash while handling a new SSL/TLS connection, after we have passed server certificate authentication, attempting to cache an intermediate CA certificate.


If I understand correctly, this didn't occur while landing the NSS code. It happened when building a later change, and so far it has only happened once.

Since this same code is also being used on mozilla-central, I'd expect it to trigger such failures on m-c, too. But it hasn't happened yet, right?

I'm hoping for an unrelated one-time failure.
I'll trigger a few additional builds.
Potentially a use-after-free or similar very bad badness -> security sensitive for now.
Group: crypto-core-security
None of the additional 10+ builds triggered this failure.
Depends on: 937721
https://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&reason_type=contains&date=2013-11-13&range_value=28&range_unit=days&hang_type=any&process_type=any&signature=PK11_ImportCert

Earliest report is against Firefox 6.0.

Ideally, we'd find the bug and fix it properly.

Short of that, if this is a race in PK11_ImportCert, we may be able to work around this by holding a mutex that serializes these calls to PK11_ImportCert. Or, more drastically, we could try reducing the thread pool size of SSLServerCertVerification to 1 (eliminating parallel cert verification), to see if that helps. Neither of those is a good long-term solution.

Another possibility is to stop saving intermediate certificates we encounter into the NSS certificate database. That's bug 733232.
Severity: normal → critical
Crash Signature: [@ PK11_ImportCert]
Keywords: sec-critical
OS: Linux → All
Hardware: x86 → All
Is there somebody who can look into this?
Whiteboard: [maybe a one-off?]
This is only a workaround by serializing the pk11 calls done by authcertificate. There are other pk11 calls in psm. Do you want those serialized as well?
Attachment #8343406 - Flags: feedback?(brian)
Comment on attachment 8343406 [details] [diff] [review]
mutex-pk11-calls-inpsm

Review of attachment 8343406 [details] [diff] [review]:
-----------------------------------------------------------------

::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ +986,5 @@
>  
>          // We have found a signer cert that we want to remember.
>          char* nickname = nsNSSCertificate::defaultServerNickname(node->cert);
>          if (nickname && *nickname) {
> +          MutexAutoLock telemetryMutex(*gSSLVerificationPK11Mutex); 

Please add a comment that we suspect there may be a race condition of some kind in PK11_ImportCert and this mutex is an attempt to wallpaper over it.

Also, trailing whitespace.
Attachment #8343406 - Flags: feedback?(brian) → review+
Assignee: nobody → cviecco
This got landed on inbound.
https://hg.mozilla.org/integration/mozilla-inbound/rev/0e4260c7fa4d

Shouldn't it have gotten security approval first?
Comment on attachment 8343406 [details] [diff] [review]
mutex-pk11-calls-inpsm

sec-approval+ (retroactive). Until we know a more reliable way to cause the race sec-high is a better rating. Still potentially exploitable, but not easily weaponized.
Attachment #8343406 - Flags: sec-approval+
Comment on attachment 8343406 [details] [diff] [review]
mutex-pk11-calls-inpsm

[Triage Comment]
Safe fix, a=dveditz for aurora
Attachment #8343406 - Flags: approval-mozilla-aurora+
Comment on attachment 8343406 [details] [diff] [review]
mutex-pk11-calls-inpsm

[Triage Comment]
missed the merge, so a=dveditz for beta (27) also
Attachment #8343406 - Flags: approval-mozilla-beta+
https://hg.mozilla.org/mozilla-central/rev/0e4260c7fa4d
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
We are guessing that this patch may help. AFAICT, we've never reproduced the bug so we don't know for sure.
Whiteboard: [fix is speculative; may not actually be fixed]
Group: crypto-core-security → core-security
Whiteboard: [fix is speculative; may not actually be fixed] → [fix is speculative; may not actually be fixed][adv-main27+][adv-esr24.3+]
Whiteboard: [fix is speculative; may not actually be fixed][adv-main27+][adv-esr24.3+] → [fix is speculative; may not actually be fixed][adv-main27+][adv-esr24.3+][qa-]
Group: core-security
You need to log in before you can comment on or make changes to this bug.