Last Comment Bug 464223 - Certutil didn't accept certificate request to sign.
: Certutil didn't accept certificate request to sign.
Status: RESOLVED FIXED
:
Product: NSS
Classification: Components
Component: Tools (show other bugs)
: trunk
: All All
: P1 normal (vote)
: 3.12.3
Assigned To: Slavomir Katuscak
:
Mentors:
: 403485 464689 465733 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-11-11 03:37 PST by Slavomir Katuscak
Modified: 2009-03-31 17:12 PDT (History)
0 users
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Test data. (18.36 KB, application/octet-stream)
2008-11-11 03:42 PST, Slavomir Katuscak
no flags Details
Another test data. (91.14 KB, application/octet-stream)
2008-11-11 11:30 PST, Slavomir Katuscak
no flags Details
Another test data. (15.04 KB, application/octet-stream)
2008-11-11 11:39 PST, Slavomir Katuscak
no flags Details
Patch to use /bin/bash in memleak.sh. (662 bytes, patch)
2008-11-14 10:06 PST, Slavomir Katuscak
nelson: review+
Details | Diff | Review
use stream redirect instead of piping (1.95 KB, patch)
2008-11-14 15:58 PST, Alexei Volkov
no flags Details | Diff | Review
fix for object leak tinderbox (4.04 KB, patch)
2008-11-19 15:33 PST, Alexei Volkov
nelson: review+
Details | Diff | Review
Workaround. (12.97 KB, patch)
2008-12-01 14:16 PST, Slavomir Katuscak
nelson: review+
alvolkov.bgs: review+
Details | Diff | Review
Workaround - effective part. (checked in) (1.99 KB, patch)
2008-12-02 14:22 PST, Slavomir Katuscak
nelson: review+
Details | Diff | Review
Patch with rest of changes. (checked in) (11.84 KB, patch)
2008-12-02 16:13 PST, Slavomir Katuscak
nelson: review+
Details | Diff | Review

Description Slavomir Katuscak 2008-11-11 03:37:25 PST
Bug detected on Tinderbox (only once - http://tinderbox.mozilla.org/showlog.cgi?log=NSS/1226387476.1226394564.5371.gz&fulltext=1)

Certutil didn't accept certificate request to sign and returned error messages 'bad certificate request' and 'improperly formatted DER-encoded message.'.

I tried to reproduce manually and I didn't have problem to sign the same request.

Log:
---
chains.sh: Creating EE certifiate request EE1Req.der
certutil -s "CN=EE1 EE, O=EE1, C=US" -R  -d EE1DB -f EE1DB/dbpasswd -z /export/tinderlight/data/aquash_32_DBG/mozilla/tests_results/security/aquash.1/upgradedb/tests_noise.11469 -o EE1Req.der


Generating key.  This may take a few moments...

chains.sh: #2848: BridgeWithHalfAIA: Creating EE certifiate request EE1Req.der  - PASSED
chains.sh: Creating certficate EE1CA1.der signed by CA1
certutil -C -c CA1 -v 60 -d CA1DB -i EE1Req.der -o EE1CA1.der -f CA1DB/dbpasswd -m 1111002349  
certutil: bad certificate request
: security library: improperly formatted DER-encoded message.
certutil: unable to create cert (security library: improperly formatted DER-encoded message.)
chains.sh: #2849: BridgeWithHalfAIA: Creating certficate EE1CA1.der signed by CA1  - FAILED
---
Comment 1 Slavomir Katuscak 2008-11-11 03:42:48 PST
Created attachment 347514 [details]
Test data.

Data used for testing (cert DBs, requests).

There was SQLite DB format used (NSS_DEFAULT_DB_TYPE=sql), occured on Solaris9/X86/DBG/32 bit build, but most probably this is not platform dependent problem.
Comment 2 Slavomir Katuscak 2008-11-11 11:30:17 PST
Created attachment 347587 [details]
Another test data.

Reproduced:
Solaris9/Sparc/DBG/32, PKIX build (NSS_ENABLE_PKIX_VERIFY="1")

http://tinderbox.mozilla.org/showlog.cgi?log=NSS/1226414151.1226426254.4572.gz&fulltext=1

chains.sh: Creating DB CA42DB
certutil -N -d CA42DB -f CA42DB/dbpasswd
chains.sh: #870: AnyPolicyWithLevel: Creating DB CA42DB  - PASSED
chains.sh: Creating Intermediate certifiate request CA42Req.der
certutil -s "CN=CA42 Intermediate, O=CA42, C=US" -R -2 -d CA42DB -f CA42DB/dbpasswd -z /export/tinderlight/data/ciotat_32_DBG/mozilla/tests_results/security/ciotat.1/pkix/tests_noise.25090 -o CA42Req.der


Generating key.  This may take a few moments...

chains.sh: #871: AnyPolicyWithLevel: Creating Intermediate certifiate request CA42Req.der  - PASSED
chains.sh: Creating certficate CA42CA1.der signed by CA1
certutil -C -c CA1 -v 60 -d CA1DB -i CA42Req.der -o CA42CA1.der -f CA1DB/dbpasswd -m 1111090860   --extCP
Is this a CA certificate [y/N]?
Enter the path length constraint, enter to skip [<0 for unlimited path]: > certutil: bad certificate request
: security library: improperly formatted DER-encoded message.
certutil: unable to create cert (security library: improperly formatted DER-encoded message.)
Is this a critical extension [y/N]?
chains.sh: #872: AnyPolicyWithLevel: Creating certficate CA42CA1.der signed by CA1  - FAILED

In log you can see also, that with CA1 were already successfully signed other certificates.
Comment 3 Slavomir Katuscak 2008-11-11 11:39:42 PST
Created attachment 347588 [details]
Another test data.

Reproduced:
Solaris9/Sparc/DBG/64

http://tinderbox.mozilla.org/showlog.cgi?log=NSS/1226403954.1226413654.5887.gz&fulltext=1

chains.sh: Creating DB BridgeDB
certutil -N -d BridgeDB -f BridgeDB/dbpasswd
chains.sh: #1043: BridgeWithAIA: Creating DB BridgeDB  - PASSED
chains.sh: Creating Bridge certifiate request BridgeReq.der
certutil -s "CN=Bridge Bridge, O=Bridge, C=US" -R -2 -d BridgeDB -f BridgeDB/dbpasswd -z /export/tinderlight/data/ciotat_64_DBG/mozilla/tests_results/security/ciotat.1/tests_noise.22426 -o BridgeReq.der


Generating key.  This may take a few moments...

chains.sh: #1044: BridgeWithAIA: Creating Bridge certifiate request BridgeReq.der  - PASSED
chains.sh: Creating certficate BridgeArmy.der signed by Army
certutil -C -c Army -v 60 -d ArmyDB -i BridgeReq.der -o BridgeArmy.der -f ArmyDB/dbpasswd -m 1111054924 -7 Bridge@Army 
certutil: bad certificate request
: security library: improperly formatted DER-encoded message.
certutil: unable to create cert (security library: improperly formatted DER-encoded message.)
chains.sh: #1045: BridgeWithAIA: Creating certficate BridgeArmy.der signed by Army  - FAILED
chains.sh: Importing certificate BridgeArmy.der to BridgeDB database
certutil -A -n Bridge -t u,u,u -d BridgeDB -f BridgeDB/dbpasswd -i BridgeArmy.der
certutil: could not obtain certificate from file: Unrecognized Object Identifier.
Is this a CA certificate [y/N]?
Enter the path length constraint, enter to skip [<0 for unlimited path]: > Is this a critical extension [y/N]?
chains.sh: #1046: BridgeWithAIA: Importing certificate BridgeArmy.der to BridgeDB database  - FAILED
chains.sh: Creating certficate BridgeNavy.der signed by Navy
certutil -C -c Navy -v 60 -d NavyDB -i BridgeReq.der -o BridgeNavy.der -f NavyDB/dbpasswd -m 1111054925 -7 Bridge@Navy 
chains.sh: #1047: BridgeWithAIA: Creating certficate BridgeNavy.der signed by Navy  - PASSED
chains.sh: Importing certificate BridgeNavy.der to BridgeDB database
certutil -A -n Bridge -t u,u,u -d BridgeDB -f BridgeDB/dbpasswd -i BridgeNavy.der
chains.sh: #1048: BridgeWithAIA: Importing certificate BridgeNavy.der to BridgeDB database  - PASSED
chains.sh: Generating PKCS7 package from BridgeDB database
cmsutil -O -r "Bridge@Army,Bridge@Navy" -d BridgeDB > Bridge.p7
cmsutil: cannot find certificate for "Bridge@Army": security library: bad database.
cmsutil: problem with certs-only: security library: bad database.
chains.sh: #1049: BridgeWithAIA: Generating PKCS7 package from BridgeDB database  - FAILED

In log you can see also, that the same cert was successfully signed by Nave CA, but failed for Army CA.
Comment 4 Slavomir Katuscak 2008-11-14 05:04:32 PST
Occured today on aquash SunOS/i386 32bit DBG on 2008/11/14 01:07:55 (Tinderbox):

First failure:
---
chains.sh: Creating certficate CA3CA2.der signed by CA2
certutil -C -c CA2 -v 60 -d CA2DB -i CA3Req.der -o CA3CA2.der -f CA2DB/dbpasswd -m 1114022396   --extCP
certutil: bad certificate request
: security library: improperly formatted DER-encoded message.
certutil: unable to create cert (security library: improperly formatted DER-encoded message.)
chains.sh: #2768: Mapping2: Creating certficate CA3CA2.der signed by CA2  - FAILED
--- 

OK, cert request didn't passed.

Second failure:
---
chains.sh: Importing certificate CA3CA2.der to CA3DB database
certutil -A -n CA3 -t u,u,u -d CA3DB -f CA3DB/dbpasswd -i CA3CA2.der
Is this a CA certificate [y/N]?
Enter the path length constraint, enter to skip [<0 for unlimited path]: > Is this a critical extension [y/N]?
certutil: could not obtain certificate from file: Unrecognized Object Identifier.
chains.sh: #2769: Mapping2: Importing certificate CA3CA2.der to CA3DB database  - FAILED
---

Unrecognized Object Identifier. Nelson asked there yesterday about this bug - now we know why - cert just doesn't exists (request not accepted, cert not signed). Error message is wrong, it should be something like cert not found.

Third failure:
---
chains.sh: Creating certficate UserCA3.der signed by CA3
certutil -C -c CA3 -v 60 -d CA3DB -i UserReq.der -o UserCA3.der -f CA3DB/dbpasswd -m 1114022397   --extCP
certutil: could not find certificate named "CA3": error 0
certutil: unable to create cert (./all.sh: line 84: 25100 Done                    echo "${DATA}"
     25101 Segmentation Fault      (core dumped) | ${BINDIR}/certutil -C -c ${ISSUER} -v 60 -d ${ISSUER_DB} -i ${REQ} -o ${CERT} -f ${ISSUER_DB}/dbpasswd -m ${CERT_SN} ${EMAIL_OPT} ${OPTIONS}
chains.sh: #2772: Mapping2: Creating certficate UserCA3.der signed by CA3  - Core file is detected - FAILED
---

Signing cert not found... but... segmantation fault. 

Attached core to DBX:
t@null (l@1) program terminated by signal SEGV (no mapping at the fault address)
0xd0e3f704: strlen+0x000c:      movl     (%eax),%edx
Current function is CreateCert
 1634                  SECU_Strerror(perr));
(dbx) where                                                                  
  [1] strlen(0x0), at 0xd0e3f704 
  [2] _doprnt(0x80759e1, 0x8046990, 0xd0ed287c), at 0xd0e7850e 
  [3] _fprintf(0xd0ed287c, 0x80759c4, 0x8046faf, 0x0), at 0xd0e7af52 
=>[4] CreateCert(handle = 0x80cd108, slot = 0x80c8890, issuerNickName = 0x8046fbe "CA3", inFile = 0x8094830, outFile = 0x8094870, selfsignprivkey = 0x8046c90, pwarg = 0x8046ca4, hashAlgTag = SEC_OID_UNKNOWN, serialNumber = 1114022397U, warpmonths = 0, validityMonths = 60, emailAddrs = (nil), dnsNames = (nil), ascii = 0, selfsign = 0, extnList = 0x8092b84), line 1634 in "certutil.c"
  [5] certutil_main(argc = 17, argv = 0x8046d6c, initialize = 1), line 2773 in "certutil.c"
  [6] main(argc = 17, argv = 0x8046d6c), line 2934 in "certutil.c"

Next failures:
Some more failures with 'could not obtain certificate from file: Unrecognized Object Identifier.' or just 'cert file UserCA3.der was empty.' - caused by non-existing certificates.
Comment 5 Slavomir Katuscak 2008-11-14 06:40:32 PST
Directly after previous Tinderbox run (comment 4) there was strange failure in next run:

---
chains.sh: Creating certficate CA2Bridge.der signed by Bridge
certutil -C -c Bridge -v 60 -d BridgeDB -i CA2Req.der -o CA2Bridge.der -f BridgeDB/dbpasswd -m 1114035029   --extCP
chains.sh: #2284: BridgeWithPolicyExtensionAndMapping: Creating certficate CA2Bridge.der signed by Bridge  - PASSED
chains.sh: Importing certificate CA2Bridge.der to CA2DB database
certutil -A -n CA2 -t u,u,u -d CA2DB -f CA2DB/dbpasswd -i CA2Bridge.der
Enter a CertPolicy Object Identifier (dotted decimal format)
or "any" for AnyPolicy: > Choose the type of qualifier for policy: OID.1.0
	1 - CPS Pointer qualifier
	2 - User notice qualifier
	Any other number to finish
		Choice:  > Enter CPS pointer URI:  > Enter another policy qualifier [y/N]
Enter another PolicyInformation field [y/N]?
Enter a CertPolicy Object Identifier (dotted decimal format)
or "any" for AnyPolicy: > Choose the type of qualifier for policy: OID.2.0
	1 - CPS Pointer qualifier
	2 - User notice qualifier
	Any other number to finish
		Choice:  > Enter CPS pointer URI:  > certutil: could not obtain certificate from file: Unrecognized Object Identifier.
Enter another policy qualifier [y/N]
Enter another PolicyInformation field [y/N]?
Is this a critical extension [y/N]?
chains.sh: #2285: BridgeWithPolicyExtensionAndMapping: Importing certificate CA2Bridge.der to CA2DB database  - FAILED
---

chains.sh code to run this part is:
---
TESTNAME="Creating certficate ${CERT} signed by ${ISSUER}"
echo "${SCRIPTNAME}: ${TESTNAME}"
echo "certutil -C -c ${ISSUER} -v 60 -d ${ISSUER_DB} -i ${REQ} -o ${CERT} -f ${ISSUER_DB}/dbpasswd -m ${CERT_SN} ${EMAIL_OPT} ${OPTIONS}"
echo "${DATA}" | ${BINDIR}/certutil -C -c ${ISSUER} -v 60 -d ${ISSUER_DB} -i ${REQ} -o ${CERT} -f ${ISSUER_DB}/dbpasswd -m ${CERT_SN} ${EMAIL_OPT} ${OPTIONS}
html_msg $? 0 "${SCENARIO}${TESTNAME}"

TESTNAME="Importing certificate ${CERT} to ${ENTITY_DB} database"
echo "${SCRIPTNAME}: ${TESTNAME}"
echo "certutil -A -n ${ENTITY} -t u,u,u -d ${ENTITY_DB} -f ${ENTITY_DB}/dbpasswd -i ${CERT}"
${BINDIR}/certutil -A -n ${ENTITY} -t u,u,u -d ${ENTITY_DB} -f ${ENTITY_DB}/dbpasswd -i ${CERT} 
html_msg $? 0 "${SCENARIO}${TESTNAME}"
---

There are 2 steps:
1. Creates certificate (signs certificate request) - this command asks for some input on command line - this input is in ${DATA} variable, end is passed there by echo command.
2. Import cert to DB - there is no input expected.

From log above it looks that first command passed (we already have return value).. then message that second comamnd started (importing).. and then output from first command. That's strange, because it's serial operation, no background processes and first command should be already finished before second started. 

For me it looks like certutil first exit with some return value and then does it's work (situation on first certutil run). If this is the case of invalid cert request, it looks like they are not yet finished and in next step there is try to sign them. 

So the question is - is there any fork in certutil ? Is it possible to already have it's exit value and still running some code in the background ?

I see we use echo command to pass input to certutil when expected. I expect that 'echo ${INPUT_DATA} | certutil ...' should return exit value of certutil after it's finished, but maybe there is something wrong in this operation.
Comment 6 Slavomir Katuscak 2008-11-14 10:06:28 PST
Created attachment 348210 [details] [diff] [review]
Patch to use /bin/bash in memleak.sh.

It's possible that /bin/sh in memleak.sh is the cause of failures. 
At least it doesn't behave correctly for fail|true test:

/bin/sh:
> false | true
> echo $?
255

/bin/bash:
$ false | true
$ echo $?
0

That can also explain why we see failures only on memleak tinderboxes.
Comment 7 Nelson Bolyard (seldom reads bugmail) 2008-11-14 10:37:06 PST
Comment on attachment 348210 [details] [diff] [review]
Patch to use /bin/bash in memleak.sh.

r=nelson
Comment 8 Alexei Volkov 2008-11-14 15:58:59 PST
Created attachment 348273 [details] [diff] [review]
use stream redirect instead of piping
Comment 9 Nelson Bolyard (seldom reads bugmail) 2008-11-14 18:13:50 PST
Comment on attachment 348273 [details] [diff] [review]
use stream redirect instead of piping

Question:

Instead of:

>+    echo "${EXT_DATA}" > ${DATA_FILE}
>+    ${BIG_COMMAND} < ${DATA_FILE}
>+    rm -rf ${DATA_FILE}

Does 

+     ${BIG_COMMAND} << EOF
+${EXT_DATA}
+EOF

work just as well?  If so, I'd prefer it.  
(It seems to work just as well in my tests)
Comment 10 Alexei Volkov 2008-11-19 15:33:20 PST
Created attachment 349061 [details] [diff] [review]
fix for object leak tinderbox

1. certvfypkix.c: Remove printing of "Memory leak test: Loop %d" into the log file. Will fix the size of the log file of the object leak tinderbox.

2. chains.sh: Do not use file for temporary process input data.
Comment 11 Nelson Bolyard (seldom reads bugmail) 2008-11-20 00:16:48 PST
Comment on attachment 349061 [details] [diff] [review]
fix for object leak tinderbox

r=nelson
Comment 12 Slavomir Katuscak 2008-11-20 07:48:37 PST
Problem with printing of "Memory leak test: Loop %d" seems to be completely different bug, I suggest to report it separately. Because of this bug I disabled DBG build on communist Tinderbox, because logs has there more than 100MBs. Let me know when it's fixed.

Main problem from description of this bug seems to be fixed by updating S9 with latest recommended patch cluster (haven't seen mentioned issues any more). 

Is there any more action needed there or can we just close this bug ?
Comment 13 Slavomir Katuscak 2008-11-21 04:05:35 PST
From ciotat SunOS/sparc 64bit DBG on 2008/11/20 20:03:20 (tinderbox)

chains.sh: Creating Root CA Root4
certutil -s "CN=Root4 ROOT CA, O=Root4, C=US" -S -n Root4 -t CTu,CTu,CTu -v 600 -x -d Root4DB -1 -2 -5 -f Root4DB/dbpasswd -z /export/tinderlight/data/ciotat_64_DBG/mozilla/tests_results/security/ciotat.1/sharedb/tests_noise.20479 -m 1120212831
chains.sh: #637: MegaBridge_3_2: Creating Root CA Root4  - PASSED
chains.sh: Exporting Root CA Root4.der
certutil -L -d Root4DB -r -n Root4 -o Root4.der


Generating key.  This may take a few moments...

certutil: Could not find: Root4
: Unrecognized Object Identifier.
chains.sh: #638: MegaBridge_3_2: Exporting Root CA Root4.der  - FAILED

Manually verified, cert DB is OK, cert exist there and is exportable. So another synchronization problem, most probably first certutil command to create root cert was not finished at the time when second is trying to export it.
Comment 14 Slavomir Katuscak 2008-11-21 06:08:29 PST
From aquash SunOS/i386 32bit OPT on 2008/11/20 10:31:00 (tinderbox)

chains.sh: Creating EE certifiate request EE1Req.der
certutil -s "CN=EE1 EE, O=EE1, C=US" -R  -d EE1DB -f EE1DB/dbpasswd -z /export/tinderlight/data/aquash_32_OPT/mozilla/tests_results/security/aquash.1/upgradedb/tests_noise.1153 -o EE1Req.der
chains.sh: #2822: BridgeWithAIA: Creating EE certifiate request EE1Req.der  - PASSED


Generating key.  This may take a few moments...

chains.sh: Creating certficate EE1CA1.der signed by CA1
certutil -C -c CA1 -v 60 -d CA1DB -i EE1Req.der -o EE1CA1.der -f CA1DB/dbpasswd -m 1120114130  
certutil: bad certificate request
: security library: improperly formatted DER-encoded message.
certutil: unable to create cert (security library: improperly formatted DER-encoded message.)
chains.sh: #2823: BridgeWithAIA: Creating certficate EE1CA1.der signed by CA1  - FAILED
Comment 15 Slavomir Katuscak 2008-11-21 08:08:38 PST
From aquash SunOS/i386 32bit DBG on 2008/11/21 03:15:29 (tinderbox)

chains.sh: Verifying certificate(s)  UserCA3.der with flags -d AllDB   -o OID.1.0  -t CA1
vfychain -d AllDB -pp -vv   -o OID.1.0  UserCA3.der  -t CA1
chains.sh: -------- Running /export/tinderlight/data/aquash_32_DBG/mozilla/dist/SunOS5.9_i86pc_DBG.OBJ/bin/vfychain under DBX:
/usr/dist/pkgs/sunstudio_i386,v11.0/SUNWspro/prod/bin/dbx /export/tinderlight/data/aquash_32_DBG/mozilla/dist/SunOS5.9_i86pc_DBG.OBJ/bin/vfychain
chains.sh: -------- DBX commands:
dbxenv follow_fork_mode parent
dbxenv rtc_mel_at_exit verbose
dbxenv rtc_biu_at_exit verbose
check -memuse -match 16 -frames 16
run -d AllDB -pp -vv -o OID.1.0 UserCA3.der -t CA1

Returned value is 1, expected result is pass
chains.sh: #2781: Mapping2: Verifying certificate(s)  UserCA3.der with flags -d AllDB   -o OID.1.0  -t CA1 - FAILED
chains.sh: Verifying certificate(s)  UserCA3.der with flags -d AllDB   -o OID.1.1  -t CA1
vfychain -d AllDB -pp -vv   -o OID.1.1  UserCA3.der  -t CA1
chains.sh: -------- Running /export/tinderlight/data/aquash_32_DBG/mozilla/dist/SunOS5.9_i86pc_DBG.OBJ/bin/vfychain under DBX:
/usr/dist/pkgs/sunstudio_i386,v11.0/SUNWspro/prod/bin/dbx /export/tinderlight/data/aquash_32_DBG/mozilla/dist/SunOS5.9_i86pc_DBG.OBJ/bin/vfychain
chains.sh: -------- DBX commands:
dbxenv follow_fork_mode parent
dbxenv rtc_mel_at_exit verbose
dbxenv rtc_biu_at_exit verbose
check -memuse -match 16 -frames 16
run -d AllDB -pp -vv -o OID.1.1 UserCA3.der -t CA1

Chain is bad, -8172 = Peer's certificate issuer has been marked as not trusted by the user.
PROBLEM WITH THE CERT CHAIN:
CERT 5. Root [Certificate Authority]:
  ERROR -8172: Peer's certificate issuer has been marked as not trusted by the user.
    CN=Root ROOT CA,O=Root,C=US
execution completed, exit code is 1
execution completed, exit code is 0
Returned value is 0, expected result is fail
chains.sh: #2782: Mapping2: Verifying certificate(s)  UserCA3.der with flags -d AllDB   -o OID.1.1  -t CA1 - FAILED

There are 2 failures and from this log more things are clear to me.

Part of memleak.sh code:
---
echo "${DBX_CMD}" | ${DBX} ${COMMAND} 2> ${TMP_DBXERR} | grep -v Reading > ${TMP_DBX}
cat ${TMP_DBX} 1>&2
cat ${TMP_DBXERR}

grep "exit code is" ${TMP_DBX}
grep "exit code is 0" ${TMP_DBX} > /dev/null
return $?
---

Message 'execution completed, exit code is 1' is from first test, but was found in second. In first test there was completely missing message about exit code, in second there are both exit 1 (from second test) and exit 0 (from first test), so return states are detected wrong even if vfychain tests behaved correctly. There must be wrong caching/buffering in shell and when output on first line is forwarded to ${TMP_DBX} it's not written immediatelly. When first grep is called in second command, it detects both exit states, in opposite order as they should be written, even if there should be only exit status from second test and output from first test should be rewritten.

This also explains problem from bug 464689 - in time when command 'cat ${TMP_DBX} 1>&2' should set content of tmp file to error log, this file is still empty, so that's the reason why there was no dbx output from some tests.
Comment 16 Slavomir Katuscak 2008-11-21 08:10:03 PST
*** Bug 464689 has been marked as a duplicate of this bug. ***
Comment 17 Slavomir Katuscak 2008-11-21 08:15:57 PST
Is there any simple way to make shell/system write file changes immediately or should we add some sync command after every write ? (Looks like we need to flush files manually after every cert request generation, cert sign, cert import, tmp file write, ..)
Comment 18 Slavomir Katuscak 2008-12-01 14:16:07 PST
Created attachment 350831 [details] [diff] [review]
Workaround.

This workaround contains more things, but most important is that dbx and client/server part runs in a subshell - after this change environment changes from those parts are isolated and doesn't affect other tests.

I also rewrote some parts without functionality impact - the way how input is passed to certutil (now using external file, variable CU_DATA), NSS_DISABLE_ARENA_FREE_LIST is now set before every memleak test run and unset just after this run is finished.

This patch is already running on Tinderbox machines which were failing before (currently used patch is a bit different, contains some more debug messages in chains.sh, but we don't need them now).
Comment 19 Slavomir Katuscak 2008-12-01 14:38:52 PST
*** Bug 465733 has been marked as a duplicate of this bug. ***
Comment 20 Nelson Bolyard (seldom reads bugmail) 2008-12-01 15:50:43 PST
Comment on attachment 350831 [details] [diff] [review]
Workaround.

This patch makes many changes.  We suspect that one of them actually solves a problem, but we know not which one.  
I would prefer a patch that contains just the minimal change needed to fix the problem, but in the interest of expedience, I'm willing to r+ this patch.
I'd like Alexei to also review it.
Comment 21 Alexei Volkov 2008-12-01 15:55:09 PST
Comment on attachment 350831 [details] [diff] [review]
Workaround.

Slavo, did we not use NSS_DISABLE_ARENA_FREE_LIST variable before?
Also, there are bunch of changes in chains.sh. We utilize almost all of them in our other test suites, so it is granted, that they work. I'm ok with putting them in temporally for debugging, but hope you will remove them ones our tinderboxes are stable.
Comment 22 Slavomir Katuscak 2008-12-02 14:20:29 PST
(In reply to comment #21)
> (From update of attachment 350831 [details] [diff] [review])
> Slavo, did we not use NSS_DISABLE_ARENA_FREE_LIST variable before?

Yes, we used it also before, but it was defined at the init part of memleak.sh script.

> Also, there are bunch of changes in chains.sh. We utilize almost all of them in
> our other test suites, so it is granted, that they work. I'm ok with putting
> them in temporally for debugging, but hope you will remove them ones our
> tinderboxes are stable.

Those changes makes code more clear, I would rather keep it there.
Comment 23 Slavomir Katuscak 2008-12-02 14:22:15 PST
Created attachment 351059 [details] [diff] [review]
Workaround - effective part. (checked in)

Patch containing subset of previous patch - only effective part that fixes the problem. There would be one more patch with the rest of cleanup work (that doesn't affect functionality).
Comment 24 Nelson Bolyard (seldom reads bugmail) 2008-12-02 15:32:55 PST
Comment on attachment 351059 [details] [diff] [review]
Workaround - effective part. (checked in)

r=nelson.  Go for checkin.
Comment 25 Slavomir Katuscak 2008-12-02 15:46:41 PST
Checking in memleak.sh;
/cvsroot/mozilla/security/nss/tests/memleak/memleak.sh,v  <--  memleak.sh
new revision: 1.35; previous revision: 1.34
done
Comment 26 Slavomir Katuscak 2008-12-02 16:13:30 PST
Created attachment 351081 [details] [diff] [review]
Patch with rest of changes. (checked in)

Patch containing the rest of changes mentioned in comment 18.
Comment 27 Nelson Bolyard (seldom reads bugmail) 2008-12-02 17:24:02 PST
Comment on attachment 351081 [details] [diff] [review]
Patch with rest of changes. (checked in)

Please wait until TInderbox goes green again before committing this. 
Be sure the checkin comment explains that this is not a bug fix, but just "cleanup"
Comment 28 Slavomir Katuscak 2008-12-04 17:47:59 PST
Checking in chains/chains.sh;
/cvsroot/mozilla/security/nss/tests/chains/chains.sh,v  <--  chains.sh
new revision: 1.10; previous revision: 1.9
done
Checking in memleak/memleak.sh;
/cvsroot/mozilla/security/nss/tests/memleak/memleak.sh,v  <--  memleak.sh
new revision: 1.36; previous revision: 1.35
done
Comment 29 Nelson Bolyard (seldom reads bugmail) 2009-02-09 12:26:51 PST
Slavo, is this bug fully fixed now?  
If so, please reassign it to yourself and then mark it resolved/fixed.
If not, please explain what remains to be fixed.
Comment 30 Slavomir Katuscak 2009-02-10 01:53:41 PST
Nelson,

This bug is fixed, but the reason is still unknown. There are 3 possible reasons - OS problem, shell problem and DBX problem. 2 processes call in serial, first seems to be finished (couldn't see it in process list) but written file is not yet synced - this looks like OS bug. But using subshell helps - this looks like shell bug (maybe exiting subshell enforce sync of files). If DBX uses some operations on kernel level, then it can be DBX bug - this happens only when using DBX.

We know how to prevent it - using subshell for memory leak checking would work also for future tests. I'm not sure if it makes a sense report this problem for Solaris/bash/dbx, there are too many factors that can affect it that they would immediatelly close it.

Now I'm going to reassign this bug to me and close it. Let me know if there are any further steps we should do there.
Comment 31 Slavomir Katuscak 2009-03-12 08:03:37 PDT
*** Bug 403485 has been marked as a duplicate of this bug. ***
Comment 32 Nelson Bolyard (seldom reads bugmail) 2009-03-12 12:45:38 PDT
> *** Bug 403485 has been marked as a duplicate of this bug. ***

Slavo, A strsclnt hang is a duplicate of a certutil signature failure?
Comment 33 Slavomir Katuscak 2009-03-13 03:17:00 PDT
Yes, it was. In fact it was not directly strsclnt problem. On Solaris machines testing memory leaks were both strsclnt and selfserv hanging relatively often, and we didn't know the reason (this time I created tinderclean script to regularly kill those hanged processes). After changes in memleak.sh script to run DBX in subshell when we had problems with PKIX tests (bug 403485), this problem doesn't appear any more, so I came to conclusion that it was most probably caused by the same Solaris/Bash/DBX problem we had this time.

Note You need to log in before you can comment on or make changes to this bug.