Closed Bug 231164 Opened 21 years ago Closed 18 years ago

NSS QA logfile should log all failures

Categories

(NSS :: Test, defect, P1)

x86
Windows 2000
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME
3.11.3

People

(Reporter: bishakhabanerjee, Assigned: slavomir.katuscak+mozilla)

Details

Attachments

(3 files)

Currently, when failures arise due to network problems, and tests fail, the
output log contains no information that the tests failed.

For instance, this morning, tests failed on Windows because of what appears to
be network problems. Console messages said data could not be written to some
network drives. However, the output log files had no mention of the test
failures. The results.html file did show the test failures. Example results and
the output log file are attached below.

Would directing all output to stderr be better, or would that make the output
log file too difficult to read?
However, the cert.log which is also visible from the Tinderbox page (if you go
down from the QA Link to the miller.1 results directory) does log this line:

ERROR: Creating CA Cert DB failed 255

It does not state why, and I am not sure all diagnostic information can be put
in that file (especially if the failures occur due to network issues). I have no
problem modifying the scripts to contain as much info as possible.
Attached file cert.log
I should clarify, the output log does not contain any info on why the Cert tests
failed, or that they failed at all. It does say that subsequent tests like
ssl.sh failed because cert.sh failed.
We again have SDR tests failing on the nightly QA, but Tinderboxes report all
green. Going to look into this now.

Meanwhile, posting from an email Nelson wrote, more observations related to this
bug:

Bishakha and I looked at the result and log files this afternoon, and came
to these conclusions:
 
1. The output log file contains inadequate information for diagnosis.
 
I recommended that:
 
a) every command executed should be echoed into the log file, so that
we can tell which command failed.
 
b) the stderr output from every command should go into the log file, so
that we can determine what error caused the failure, something we cannot
do based on the program's return code alone.
 
c) all output should go to one log file, output.log.  There should not be
separate log files for different portions of the test (e.g. no cert.log),
because only output.log gets copied to the servers.
 
2. The errors are in different places from run to run. Tests that fail
will typically succeed in the next run, and some tests that previously
passed will fail.  One would expect that a failure due to a logic error
in NSS libraries would cause repeatable deterministic results.
 
Bishakha suspects network failures are responsible.
 
3. Presently, the QA tests do nearly everything over the network.
The scripts are fetched over the net, the executables and shared libs
are fetched over the net, the test input files are fetched over the net,
and the output files are written over the net.  This is slow, and ends
up testing the net as much as or more than the code under test.
 
I recommend that:
 
a) all the relevant scripts, executables, shared libraries, and input
test files be fetched over the net and stored locally before the test
is begun. 
b) the test runs locally, and writes its output locally.
c) after the tests finish, the output files are copied back to the remote
file server. 
 
I further recommend that we resume checkins on the trunk until such time
as the above recommendations are implemented.  
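
The three steps recommended above (fetch first, run locally, copy back at the end) could look roughly like the sketch below. It is only an illustration: REMOTE_DIR, LOCAL_DIR and run_tests.sh are hypothetical names, not taken from the NSS QA scripts, and temporary directories stand in for the network share so the sketch can run on its own.

```shell
#!/bin/sh
# Hypothetical sketch: REMOTE_DIR stands in for the network share and
# run_tests.sh for the real test driver. Neither name is from the NSS scripts.
REMOTE_DIR=$(mktemp -d)
LOCAL_DIR=$(mktemp -d)

# Build a stand-in "remote" tree so the sketch is self-contained.
mkdir -p "$REMOTE_DIR/tests" "$REMOTE_DIR/results"
cat > "$REMOTE_DIR/tests/run_tests.sh" <<'EOF'
#!/bin/sh
echo "all tests passed"
EOF

# a) Fetch scripts, binaries and input files once, before the tests start.
cp -R "$REMOTE_DIR/tests" "$LOCAL_DIR/tests" || exit 1

# b) Run locally and write the output locally.
mkdir -p "$LOCAL_DIR/results"
sh "$LOCAL_DIR/tests/run_tests.sh" > "$LOCAL_DIR/results/output.log" 2>&1
status=$?

# c) Copy the results back only after the run has finished.
cp -R "$LOCAL_DIR/results/." "$REMOTE_DIR/results/" \
    || echo "ERROR: copying results back to $REMOTE_DIR failed"
```

With this layout a network outage can only affect steps a) and c), so a failure during step b) points at the code under test rather than at the network.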
I seem to recollect from the QA scripts that *Tinderbox* builds and QA was being
done locally, only the results were being written out over the network, but I'll
verify, and implement Nelson's suggestions. I concur that we can eliminate some
of these problems by doing so.

About the separate logs, I see this comment in common/init.sh file (by Sonia?):
    # a new log file, short - fast to search, mostly for tools to
    # see if their portion of the cert has succeeded, also for me -
    CERT_LOG_FILE=${HOSTDIR}/cert.log      #the output.log is so crowded...

Besides a separate cert.log, I also notice a separate dbtest.log.
Is it the consensus of everyone that I coalesce all these separate logs into one
big log? 
I didn't realize that cert.log exists.  So you can resolve
this bug INVALID or WORKSFORME. You don't need to coalesce
the separate logs.

If you want to implement Nelson's suggestions, it's better
to open new bugs.  In general each bug report should track
a single issue and that issue should not change during the
lifetime of the bug report.
I have no problem with creating other log files for searching for errors,
as long as their results are:
a) complete, and 
b) merged back into output.log when they're done.  
The nightly QA script (that generates the email) only greps output.log 
for errors.  That's just one reason why ALL the errors need to be in that log.

Wan-Teh, this bug says that output.log does not report ALL errors.
That is true.  The existence of other log files does not invalidate it.
BTW, if you look at my recent changes to pkits.sh, you will notice that 
that script sends the output of various commands to a separate log file,
which it greps for errors, and then it merges the content of that log
file back into the main log file so that all the output appears in one
log file. That's the right way to do it, IMO.
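
A minimal sketch of that pattern (separate log, grep it, merge it back) is below. It is not copied from pkits.sh: the variable names, the sample output lines and the "ERROR" pattern are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch; file names and the "ERROR" pattern are assumptions,
# not taken from pkits.sh.
OUTPUT_LOG=$(mktemp)    # stands in for output.log
PART_LOG=$(mktemp)      # stands in for a short per-test log

# Run one portion of the tests with all of its output (stdout and stderr)
# going to the separate log. The echo lines stand in for real commands.
{
    echo "step 1 passed"
    echo "ERROR: step 2 failed 255"
} > "$PART_LOG" 2>&1

# Search the short log for errors, which is faster than searching output.log.
errors=$(grep -c "ERROR" "$PART_LOG")

# Merge the separate log back so that output.log contains everything.
cat "$PART_LOG" >> "$OUTPUT_LOG"
echo "portion finished with $errors error line(s)" >> "$OUTPUT_LOG"
```

Because the short log is appended to output.log, a script that only greps output.log still sees every error.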
Assignee: bishakhabanerjee → jason.m.reid
QA Contact: bishakhabanerjee → jason.m.reid
Assignee: jason.m.reid → nobody
QA Contact: jason.m.reid → test
Assignee: nobody → Sandeep.Konchady
Priority: -- → P1
Assignee: Sandeep.Konchady → slavomir.katuscak
Target Milestone: --- → 3.11.3
Currently there are 2 more log files:
- cert.log - data that goes to cert.log also goes to output.log
- dbtest.log - not merged into output.log, but the test results go to html, so we can see failures there. dbtest.sh can easily be modified to log into output.log, but there is probably some reason why it logs to a separate logfile. I see that the log contains "ERROR" patterns, but they are OK, because they come from tests which should fail (for example opening a non-existing directory). I don't think this is the main reason; is there something more?

It would also be good to log every executed command. I can go over all the testing scripts and add logging to the commands which are not logged now.
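
One possible shape for such command logging is a small wrapper function, sketched below. The names run_logged and LOGFILE are hypothetical and do not exist in the current test scripts; the wrapper records the command line, captures stdout and stderr, and notes a non-zero exit code.

```shell
#!/bin/sh
# Hypothetical helper: run_logged and LOGFILE are illustrative names,
# not part of the NSS test scripts.
LOGFILE=${LOGFILE:-$(mktemp)}

run_logged() {
    # Echo the command line into the log so a failure can be traced to it.
    echo "RUN: $*" >> "$LOGFILE"
    # Send both stdout and stderr of the command to the same log file.
    "$@" >> "$LOGFILE" 2>&1
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "ERROR: '$*' failed with exit code $rc" >> "$LOGFILE"
    fi
    return "$rc"
}

# Example: a command that fails. The command line, its stderr message and
# the exit code all end up in the log.
run_logged ls /nonexistent-directory-for-demo \
    || echo "command failed; see $LOGFILE"
```

Existing calls in the scripts would then be prefixed with run_logged instead of being rewritten one by one.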

Is there anything more that needs to be done on this bug now?
I don't think anything else needs to be done. We don't know what tinderbox script Bishakha was running that might have caused it to go green. Closing as WORKSFORME.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → WORKSFORME