Closed Bug 384639 Opened 17 years ago Closed 17 years ago

Leak check call stacks must show symbolic function names on all platforms

Categories

(NSS :: Test, defect)

3.11.7
Sun
Solaris
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
3.11.8

People

(Reporter: nelson, Assigned: slavomir.katuscak+mozilla)

References

()

Details

Attachments

(4 files, 2 obsolete files)

The results from recent leak tests performed on various platforms show the 
call stacks of memory allocations that have been leaked.  See, for example,
https://bugzilla.mozilla.org/show_bug.cgi?id=308275#c3

The various debugger and leak checker programs on Windows see the names of 
all the functions in nssckbi in a call stack, but on Solaris, DBX does not 
display the names of the functions in nssckbi.so.  That is the problem.

There are (at least) two possible explanations for this:

1) The program under test is actually using a different copy of nssckbi.so
than the one we intend for it to be using.  Perhaps it is using a copy from
a system library (which is not a debug build, and has no symbols) instead of 
the desired copy, which a debug build, in the same directory as the other
debug shared libraries being tested.  That would probably be an error in 
the test scripts, perhaps setting up the LD_LIBRARY_PATH with the directory
names listed in the wrong order.  

2) The copy of nssckbi.so in the directory of shared libraries that we are
testing was built without symbols.  That would probably be a build error,
where the makefiles build nssckbi for debug without symbols on Solaris, 
and possibly on other platforms also.

The first task for this bug is to find out exactly the full path name of 
the exact copy of nssckbi.so that was loaded into the test program.
There is a way to get the system to tell you that information, but I do 
not recall the steps at this moment.  

If you find that the copy being loaded is NOT the copy from the directory
of shared libraries being tested, then this bug remains a "test" bug, 
and the script should be fixed.

If you find that the copy being loaded IS the copy from the directory of 
shared libraries being tested, then you need to find out if the copy in 
that directory has come from a debug build of NSS or from an optimized build.
If it has come from an optimized build, then the script that copied it from
the wrong build needs to be fixed.

If you find that it is a copy from the debug build, then the next step is to
determine if the Debug build of nssckbi has the symbols in it, or not.  If 
it does not, then the problem is a build problem.  The "component" attribute
of this bug should be changed from "test" to "build", and the Makefiles must
be fixed to build nssckbi.so with symbols.
Bob relyea wrote:

> RE: Missing symbols...
> 
> I now know why the symbols are missing. I saw the same thing in the shared db
> code. What is happening is the code is properly closing the shared library,
> so by the time the leak program goes and displays the results the shared
> library is gone. This tells us we are definately shutting down the PKCS #11
> module correctly.
> 
> For diagnostic purposes we could get the symbols back by commenting out the 
> PR_UnloadLibrary call in pk11wrap. (it will also add some leaks related to
> the shared library, But it will give the symbols of where these leaks are
> comming from).

So, this is NOT a build problem.  
To 1):

I use absolute path when loading library file, so this is not a problem.

To 2):

How can I check if symbols are there or not ? I tried to use strings command and function names are there, is that enough or should I do something more ?

To Bob's suggestion:

I'm going to try to comment PR_UnloadLibrary and check if symbolic names will be there.



So I commented out all PR_UnloadLibrary calls in NSS code and ran memleak tests. Symbolic names are there, so this is it. I expected new leaks as Bob mentioned, but there are only known leaks, nothing new.

Now we have 2 possible solutions:

1. Modify code to skip library unloading when some environment variable is set and have this variable set when building our testing builds.
2. Modify code to skip library unloading when some environment variable is set and if new leak is found, rebuild code with this variable set and run tests again to get full stacks with symbols.

Let me know which solution do you prefer, or if you have some better idea.
Attached patch Patch v1. (obsolete) — Splinter Review
I implemented solution which uses runtime environment variable NSS_DISABLE_UNLOAD. If this variable is set, libraries are not unloaded. I modified memleak.sh to set this variable for DBG tests and unset for OPT tests (as decided on staff meeting last week), so everything is automated.

Tested on aquash Tinderbox.
Attachment #271707 - Flags: review?(nelson)
Comment on attachment 271707 [details] [diff] [review]
Patch v1.

r-
3 comments:

1) The unload calls that are modified in this patch seem to fall into two
categories:
a) unloading after the library has been used (that is, after library 
functions have been called), and 
b) unloading after the library has been loaded but before it has been
used (e.g. because some required symbol is missing from it).

IMO, we only want to suppress unloads for the first of those two cases,
that is, after the library has actually been used, and not in the second
case, where the library has not been called, and so cannot have leaked
any memory.  

So, for example, some of the patched files have two functions, a load
function and an unload function.  IMO, we only want to suppress the 
unload in the unload function (which presumably follows the library 
actually being used), and not suppress it in the load function.

2) I think we want this code that suppressed unloading to be ifdef DEBUG.
That is, I think we only want it in DEBUG builds and not in optimized
builds.  If you disagree, please tell me why.  I'm willing to be persuaded.

3) Given that this seem to be unnecessary on Windows, I think the code to 
suppress unloading should be #if !defined(WIN32)
maybe #if defined(DEBUG) && !defined(WIN32)
Likewise for any other platform on which it is unnecessary.

A request:
I think you made this patch with the command cvs diff -u9 <file_name> ...
Please make the next patch, and future patches, with cvs diff -pu9 
That is, add -p to the cvs diff command.  Thanks.
Attachment #271707 - Flags: review?(nelson) → review-
1) Agree.
2) I don't know those details of NSS code, so I didn't know about DEBUG define. Can you confirm me that I understand this correctly - DEBUG is defined in C code always when environment variable BUILD_OPT is not defined, right ?
3) I don't think this is important. We could add there exceptions also for AIX, HPUX and other systems, or just check if we are on Solaris or Linux (only 2 supported platforms now). But is this really needed ? I would rather let this OS independent.
In answer to question 2 in comment 6, Yes, DEBUG is defined in NSS builds
that do not use BUILD_OPT.

I DO think item 3 is important, but I don't feel so stronly about it to
say that you MUST do it.  Let's ask Julien for another opinion.
Nelson,

Regarding item 3, I think it is better to keep the code simple and portable. It's possible that the particular memleak check tools we are using right now (Purify?) play some tricks to force the shared lib to stay loaded, because normally the shared libraries do get unloaded from memory even on that platform. I don't see any downside to keeping it that way. The only exception would be in the pk11mode.c program which has a native operation mode for Windows and requires #ifdef WIN32.
Attached patch Patch v2. (obsolete) — Splinter Review
Implemented 1. and 2. from comment #5.
Attachment #271707 - Attachment is obsolete: true
Attachment #272774 - Flags: review?(nelson)
Comment on attachment 272774 [details] [diff] [review]
Patch v2.

This patch would make it so that debug builds NEVER unload their libraries!
That's certainly unacceptable.
Attachment #272774 - Flags: review?(nelson) → review-
Slavo, I think perhaps you misunderstood point 2 in comment 5.
I did not suggest that we stop using an environment variable to control it.
I suggested that we only check for that environment variable, and then only
supress unloading if it is set, on debug builds only.  
Attached patch Patch v3.Splinter Review
Thanks for clarification. I prepared patch which checks for both debug build and environment variable. But what I don't understand now is why is not enough to check only for environment variable (which is set only for debug build) ?
Attachment #272774 - Attachment is obsolete: true
Attachment #273406 - Flags: review?(nelson)
Comment on attachment 273406 [details] [diff] [review]
Patch v3.

r=nelson.

In answer to your question, Slavo,
Failing to unload a dynamically loaded library
is an error, a coding error.  We don't ever want
production code used by customers to make that error,
not even if some environment variable is set.
That's why we want this code ifdef DEBUG.
Attachment #273406 - Flags: review?(nelson) → review+
Trunk:

Checking in cmd/pk11mode/pk11mode.c;
/cvsroot/mozilla/security/nss/cmd/pk11mode/pk11mode.c,v  <--  pk11mode.c
new revision: 1.15; previous revision: 1.14
done
Checking in cmd/pk11util/pk11util.c;
/cvsroot/mozilla/security/nss/cmd/pk11util/pk11util.c,v  <--  pk11util.c
new revision: 1.9; previous revision: 1.8
done
Checking in lib/dev/devmod.c;
/cvsroot/mozilla/security/nss/lib/dev/devmod.c,v  <--  devmod.c
new revision: 1.8; previous revision: 1.7
done
Checking in lib/freebl/loader.c;
/cvsroot/mozilla/security/nss/lib/freebl/loader.c,v  <--  loader.c
new revision: 1.34; previous revision: 1.33
done
Checking in lib/pk11wrap/pk11load.c;
/cvsroot/mozilla/security/nss/lib/pk11wrap/pk11load.c,v  <--  pk11load.c
new revision: 1.22; previous revision: 1.21
done
Checking in lib/softoken/lgglue.c;
/cvsroot/mozilla/security/nss/lib/softoken/lgglue.c,v  <--  lgglue.c
new revision: 1.7; previous revision: 1.6
done
Checking in lib/softoken/legacydb/lginit.c;
/cvsroot/mozilla/security/nss/lib/softoken/legacydb/lginit.c,v  <--  lginit.c
new revision: 1.9; previous revision: 1.8
done
Checking in tests/memleak/memleak.sh;
/cvsroot/mozilla/security/nss/tests/memleak/memleak.sh,v  <--  memleak.sh
new revision: 1.6; previous revision: 1.5
done
Attached patch Patch v4.Splinter Review
Previous patch has a problem to build on Windows. This patch fixes the problem.

Checking in freebl/loader.c;
/cvsroot/mozilla/security/nss/lib/freebl/loader.c,v  <--  loader.c
new revision: 1.35; previous revision: 1.34
done
Checking in pk11wrap/pk11load.c;
/cvsroot/mozilla/security/nss/lib/pk11wrap/pk11load.c,v  <--  pk11load.c
new revision: 1.23; previous revision: 1.22
done
The same as patch v3 + patch v4 (small fix) but for branch. It's already working on trunk.
Attachment #273760 - Flags: review?(julien.pierre.boogz)
Comment on attachment 273760 [details] [diff] [review]
Patch for branch v1.

This is fine. But the branch needs a 2nd reviewer.
Attachment #273760 - Flags: superreview?(nelson)
Attachment #273760 - Flags: review?(julien.pierre.boogz)
Attachment #273760 - Flags: review+
Comment on attachment 273760 [details] [diff] [review]
Patch for branch v1.

r=nelson
Attachment #273760 - Flags: superreview?(nelson) → review+
Branch:

Checking in cmd/pk11mode/pk11mode.c;
/cvsroot/mozilla/security/nss/cmd/pk11mode/pk11mode.c,v  <--  pk11mode.c
new revision: 1.1.2.13; previous revision: 1.1.2.12
done
Checking in cmd/pk11util/pk11util.c;
/cvsroot/mozilla/security/nss/cmd/pk11util/pk11util.c,v  <--  pk11util.c
new revision: 1.8.2.1; previous revision: 1.8
done
Checking in lib/dev/devmod.c;
/cvsroot/mozilla/security/nss/lib/dev/devmod.c,v  <--  devmod.c
new revision: 1.7.28.1; previous revision: 1.7
done
Checking in lib/freebl/loader.c;
/cvsroot/mozilla/security/nss/lib/freebl/loader.c,v  <--  loader.c
new revision: 1.26.2.5; previous revision: 1.26.2.4
done
Checking in lib/pk11wrap/pk11load.c;
/cvsroot/mozilla/security/nss/lib/pk11wrap/pk11load.c,v  <--  pk11load.c
new revision: 1.17.2.1; previous revision: 1.17
done
Checking in tests/memleak/memleak.sh;
/cvsroot/mozilla/security/nss/tests/memleak/memleak.sh,v  <--  memleak.sh
new revision: 1.1.2.5; previous revision: 1.1.2.4
done
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Reopening this bug on Nelson's request from bug 308275. 
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Patch v5.Splinter Review
This patch adds option to disable library unloading also in optimized build. Nelson, in comment #13 you mentioned that this can be an error, so check if you really want this change.
Attachment #284148 - Flags: review?(nelson)
Comment on attachment 284148 [details] [diff] [review]
Patch v5.

r=nelson
Thanks, Slavo.
Attachment #284148 - Flags: review?(nelson) → review+
Trunk:

Checking in cmd/pk11mode/pk11mode.c;
/cvsroot/mozilla/security/nss/cmd/pk11mode/pk11mode.c,v  <--  pk11mode.c
new revision: 1.16; previous revision: 1.15
done
Checking in cmd/pk11util/pk11util.c;
/cvsroot/mozilla/security/nss/cmd/pk11util/pk11util.c,v  <--  pk11util.c
new revision: 1.10; previous revision: 1.9
done
Checking in lib/dev/devmod.c;
/cvsroot/mozilla/security/nss/lib/dev/devmod.c,v  <--  devmod.c
new revision: 1.9; previous revision: 1.8
done
Checking in lib/freebl/loader.c;
/cvsroot/mozilla/security/nss/lib/freebl/loader.c,v  <--  loader.c
new revision: 1.36; previous revision: 1.35
done
Checking in lib/pk11wrap/pk11load.c;
/cvsroot/mozilla/security/nss/lib/pk11wrap/pk11load.c,v  <--  pk11load.c
new revision: 1.24; previous revision: 1.23
done
Checking in lib/softoken/lgglue.c;
/cvsroot/mozilla/security/nss/lib/softoken/lgglue.c,v  <--  lgglue.c
new revision: 1.9; previous revision: 1.8
done
Checking in lib/softoken/legacydb/lginit.c;
/cvsroot/mozilla/security/nss/lib/softoken/legacydb/lginit.c,v  <--  lginit.c
new revision: 1.11; previous revision: 1.10
done
Checking in tests/memleak/memleak.sh;
/cvsroot/mozilla/security/nss/tests/memleak/memleak.sh,v  <--  memleak.sh
new revision: 1.15; previous revision: 1.14
done
Nelson, do we need this change also in branch, or trunk is enough ?
This change is also needed on branch.  
Attachment #284148 - Flags: review?(julien.pierre.boogz)
Attachment #284148 - Flags: review?(julien.pierre.boogz) → review+
Branch:

Checking in cmd/pk11mode/pk11mode.c;
/cvsroot/mozilla/security/nss/cmd/pk11mode/pk11mode.c,v  <--  pk11mode.c
new revision: 1.1.2.14; previous revision: 1.1.2.13
done
Checking in cmd/pk11util/pk11util.c;
/cvsroot/mozilla/security/nss/cmd/pk11util/pk11util.c,v  <--  pk11util.c
new revision: 1.8.2.2; previous revision: 1.8.2.1
done
Checking in lib/dev/devmod.c;
/cvsroot/mozilla/security/nss/lib/dev/devmod.c,v  <--  devmod.c
new revision: 1.7.28.2; previous revision: 1.7.28.1
done
Checking in lib/freebl/loader.c;
/cvsroot/mozilla/security/nss/lib/freebl/loader.c,v  <--  loader.c
new revision: 1.26.2.6; previous revision: 1.26.2.5
done
Checking in lib/pk11wrap/pk11load.c;
/cvsroot/mozilla/security/nss/lib/pk11wrap/pk11load.c,v  <--  pk11load.c
new revision: 1.17.2.2; previous revision: 1.17.2.1
done
Checking in tests/memleak/memleak.sh;
/cvsroot/mozilla/security/nss/tests/memleak/memleak.sh,v  <--  memleak.sh
new revision: 1.1.2.8; previous revision: 1.1.2.7
done
Status: REOPENED → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: