Closed Bug 420933 Opened 16 years ago Closed 16 years ago

Static analysis for setting or not-setting outparams correctly in XPCOM methods

Categories

(Developer Infrastructure :: Source Code Analysis, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: benjamin, Assigned: dmandelin)

References

Details

Attachments

(2 files, 1 obsolete file)

This is an analysis of correct XPCOM method behavior.

* For any XPCOM method
** If a method returns a failure nsresult, the method must not set any outparams or inoutparams.
** If a method returns a success nsresult, the method must set all outparams (it may leave inoutparams untouched)

Definitions:
* an XPCOM method is any method returning an nsresult
* an outparam is annotated by XPIDL
* an inoutparam is annotated by XPIDL
* for methods that are not declared in IDL, make the following assumptions:
** if the last parameter is nsIFoo**, char**, PRUnichar**, or primitive*, it is an outparam
** if the last parameter is non-const nsAString or nsACString, it is an outparam

Possible enhancements to the spec based on implementation difficulties:

* If the analysis doesn't have enough information (i.e. passes an outparam pointer to a non-XPCOM method), issue a warning
* Reduce the "unknown territory" surface using additional annotations:
** Some existing methods may follow these rules but not return nsresult (e.g. return PRBool to indicate success/failure)... allow some way to annotate such methods with PR_TRUE == sucess or PR_TRUE == failure
Assignee: nobody → dmandelin
I think lots of existing methods do set outparams to null in the very beginning so that they don't have to remember to do it in all of their early returns for failures.

I'm not saying I like that or anything, just that I've seen it frequently.
Maybe we could special-case that for a warning instead of an error, but I'd like to get rid of it altogether before we start messing with exceptions too much.
As I've been working on the prototype, it occurs to me there is a practical problem with integrating this with the build. Namely, the analysis needs information from XPIDL, but all it would get in Treehydra is the gcc internals view of the single file being compiled. 

For the prototype, I have a big file with all the relevant XPIDL data that I read in before analyzing any methods. This would be painful in a Treehydra/build integration solution because (a) Treehydra doesn't know to learn external files (but that can be fixed), and more important, (b) I'm not sure how the file would get built, and (c) reading in a file with data on 8000 methods as part of each compile could get slow.

Thoughts?
I'm not sure I follow. I think we should do the analysis one function at a time, and not require any cross-function analysis:

* You can assume that any "XPCOM function" you call obeys the rules.
* Use annotations like __attribute__((user("NS_outparam"))) and __attribute__((user("NS_inoutparam"))), generated by XPIDL, to annotate methods.
I believe you're rewriting history, or is this more of a mozilla 2 thing?. iirc, the old rules for xpcom said that on failure the parameters may be garbage.
(In reply to comment #5)
>I believe you're rewriting history, or is this more of a mozilla 2 thing?.
>iirc, the old rules for xpcom said that on failure the parameters may be garbage.
iirc, garbage is allowed for numeric types; pointers may be left or set to zero on failure; I don't know what the rules are for the string classes though.
Well, let's do the analysis and see what doesn't follow the restrictive rules. We certainly want to make setting the outparam on success a requirement.
What about in|out parameters?

I don't entirely understand your end goal here, so this may be a non issue. XPConnect can stop in the middle of writing out parameters and return an error. It does go back through and cleans the converted ones up and assigns null. So that I think breaks the rule you're talking about. This occurs in the nsXPCWrappedJSClass::CallMethod method http://lxr.mozilla.org/seamonkey/source/js/src/xpconnect/src/xpcwrappedjsclass.cpp#1514. And this isn't all that unusual since these values are coming from JavaScript and there's no type enforcement.
OK, I have a prototype that works pretty well now using the GCC->JSON_CFG->Python architecture. Key question is, what do you want me to do with it first? I could run it on a Mozilla checkout of your choice and post the results, I could start figuring out how to port it to run on Treehydra, etc. Let me know. All the code is in my personal hg repo, by the way.

The most common violation does seem to be writing null to a pointer argument when the return value is a failure code. Also, there seem to be a fair number of violations relating to OOM handling.

There are a couple of places where it produces what you might consider false positives. In particular:

(a) Sometimes a method has multiple outparams and it delegates to a second method. If the second method is not an XPCOM method, then by the rules in the initial description, only the last param can be considered an outparam. Then the analysis reports that the outparams other than the last aren't written on success code returns.

E.g., see /home/dmandelin/mozilla/content/svg/content/src/nsSVGFilters.cpp:1553:61

This case is pretty common -- maybe 10-20% of the errors. This could be fixed by modifying the rules for determining outparams of non-XPCOM methods. Let me know if I should.

(b) Some methods check if an outparam is a null pointer, and if it doesn't, they don't write to it. For example, GetContainerEnumerator at /home/dmandelin/mozilla/chrome/src/nsChromeRegistry.cpp:1444:40. The analysis then finds a path where the value isn't set on a success case and reports an error. Some cases, like the example I gave here, really look like they're supposed to work this way.

These are less common, maybe 5% or so. I would guess that most can be fixed by tracking conditionals that check null pointers, and not requiring a write through a null pointer.

So, please also let me know what you'd like to see for these two cases.
> with it first? I could run it on a Mozilla checkout of your choice and post the
> results, I could start figuring out how to port it to run on Treehydra, etc.

If you could post the results from running it on mozilla-central, that would be great as a datapoint. Let's discuss at the moz2 meeting tomorrow how we want to proceed with treehydra.

> The most common violation does seem to be writing null to a pointer argument
> when the return value is a failure code. Also, there seem to be a fair number
> of violations relating to OOM handling.

for the moment, let's make "assigning null to an outparam" a warning instead of an error

> (a) Sometimes a method has multiple outparams and it delegates to a second
> method. If the second method is not an XPCOM method, then by the rules in the
> initial description, only the last param can be considered an outparam. Then
> the analysis reports that the outparams other than the last aren't written on
> success code returns.

Yeah, I'm wary of assuming that the other arguments are outparams. It seems better to find these cases and manually annotate them with NS_OUTPARAM... I'm happy to do that, especially if you can issue a common warning message for this case.

> (b) Some methods check if an outparam is a null pointer, and if it doesn't,
> they don't write to it. For example, GetContainerEnumerator at
> /home/dmandelin/mozilla/chrome/src/nsChromeRegistry.cpp:1444:40. The analysis
> then finds a path where the value isn't set on a success case and reports an
> error. Some cases, like the example I gave here, really look like they're
> supposed to work this way.
> 
> These are less common, maybe 5% or so. I would guess that most can be fixed by
> tracking conditionals that check null pointers, and not requiring a write
> through a null pointer.

Yeah, I hadn't considered this case at all. Consider that added to the spec!

This is awesome, by the way!
Here's the results on mozilla-central.

I put in a special warning to indicate methods that should maybe have arguments other than the last annotated as NS_OUTPARAM. There are 378 unique such warnings, although I think there are still fewer methods involved than that. To find these methods:

  grep NS_OUTPARAM moz-central-outparams.txt | sort | uniq

Also, I forgot to mention earlier that it's a path-sensitive analysis, and it has to bail out on a small fraction of methods (132 methods total) because they have too many paths (try "grep bailing" to find). It still finds errors in some of these methods before bailing. Depending on how important it is, I have some ideas for making the checker more robust on these methods.
Depends on: 422555
David, if you can post your patch which adds NS_OUTPARAM (and NS_INOUTPARAM) macros, I can start posting patches to manually annotate some of the confusing methods.
(In reply to comment #0)
> ** if the last parameter is nsIFoo**, char**, PRUnichar**, or primitive*, it is
> an outparam

What about char* or PRUnichar* ? I think those should be special-cased as inparams; I also wonder how often these would be triggered.

> ** Some existing methods may follow these rules but not return nsresult (e.g.
> return PRBool to indicate success/failure)... allow some way to annotate such
> methods with PR_TRUE == sucess or PR_TRUE == failure

nsNNTPProtocol currently (although I am working to fix this, see bug 400331) uses PRInt32 and other mailnews code may use that as well. However, I think that this should rarely come up if it all, since the protocol machine does not use outparams in the vast majority of affected methods.
OK, the JS version of the checker is about done. But I have some questions on the spec, testing, and packaging for bsmedberg before I can prepare a patch for review. I guess I'll just list the spec questions for now or this comment will get really long, and ask the others after.

S1. I tweaked the rules for guessing outparams by type based on my observations of the code. Please review my revised rule, which is that all these, and only these, are guessed to be outparams:

  X *, where X is any of:
    nsIFoo *
    void *
    integer type
    float type
    enum type
    union type

  S &, where S is any of 
    nsString, nsAString, nsACString, nsACString_internal, nsCString,
    nsCString_external, nsCSubstring, nsAString_internal, nsAString_external

S2. Unchecked returns in subcalls

Example code:

  nsresult Bar(NS_OUTPARAM *a);

  nsresult Foo(NS_OUTPARAM *a) {
    Bar(a);
    return SomeOtherCall();
  }

Per the original spec, there are 2 errors here: on success return, 'a' may not have been written (if Bar failed), and on failure return, 'a' may have been written (if Bar succeeded). 

But we might instead take this code pattern to indicate that Bar cannot fail, in which case there's only one error: on failure return, 'a' may have been written. 

What do you really want to happen here?

S3. Error message format

An error message that simply names the offending function, outparam, and property seems too hard to understand. Currently, I give the location of the statement that caused the return value to be success or failure and the location of the write to the outparam. Exactly like this:

Error: outparam written on failure: a
    Return at: test/e6.cc:13:10
    Write at: test/e6.cc:12:32

Is this enough information, or still too cryptic/hard to trace back through the code?

S4. Error message output

I just print the error message, which means it goes to stdout. xhydra has JS function for reporting errors/warnings as if they are any other GCC compiler error, but for this analysis it prints a bogus source location. Options here include:

  - leave it the way it is
  - as is, but print to stderr (requires patching xhydra print function)
  - make into GCC error

>   X *, where X is any of:

>     integer type

Does "integer type" exclude char*? char* should probably be an inparam, and char** is an outparam.

enum and union probably don't matter since xpidl won't emit them, but I suppose we can treat them as outparams.

>   S &, where S is any of 
>     nsString, nsAString, nsACString, nsACString_internal, nsCString,
>     nsCString_external, nsCSubstring, nsAString_internal, nsAString_external

how about "where S is or derives from
  nsAString
  nsACString
  nsAString_internal
  nsACString_internal"

> S2. Unchecked returns in subcalls

>   nsresult Bar(NS_OUTPARAM *a);
> 
>   nsresult Foo(NS_OUTPARAM *a) {
>     Bar(a);
>     return SomeOtherCall();
>   }
> 
> Per the original spec, there are 2 errors here: on success return, 'a' may not
> have been written (if Bar failed), and on failure return, 'a' may have been
> written (if Bar succeeded). 
> 
> But we might instead take this code pattern to indicate that Bar cannot fail,
> in which case there's only one error: on failure return, 'a' may have been
> written. 
> 
> What do you really want to happen here?

I think that if Bar really cannot fail, it should return void, not nsresult. So I think we should issue both warnings.

> S3. Error message format

> Error: outparam written on failure: a
>     Return at: test/e6.cc:13:10
>     Write at: test/e6.cc:12:32

This sounds like enough for me... after it lands I'm sure people will have suggestions for making it better.

> S4. Error message output

Definitely want the GCC reporting mechanism. I think for now we should use warning() not error(). This means that modules with -Werror will fail, while other modules will simply print the warning and continue.

I know that GCC warnings print bogus source locations, in both dehydra and treehydra... in most of my code I keep track of and print my own location information in addition.
Also PRUnichar* is an inparam and PRUnichar** is an outparam.
OK, I've made the changes to bring up to the refined spec (and fixed some regressions and newly exposed bugs). Next questions, on packaging:

P1. In order to run this, I've been replacing NS_SUCCEEDED and NS_FAILED with inline functions. This way, each usage is a single GIMPLE function call instruction, so it's easy for my analysis to handle. Using the macros, I get about 5 GIMPLE instructions, and they're weird, and I believe, platform dependent. (On my test platform, the test for a negative number gets compiled into and & with 0x80000000.)

Ideally, we would just replace the macros with inline functions in Mozilla 2. inline is available in C++ and C9X. Can we do this?

If the answer is no, then can we at least have inlines when compiled with -DDEHYDRA_GCC?

P2. What directory should the files live in? static-analysis.js is just in xpcom/, but this analysis is 10 .js files. Should they just be dumped in the same place, or should we create a new dir for them.

P3. Tests. I have 25 small tests. Each test is a C++ file where the analysis should either print no errors, or one or more outparam error messages. Can I use a Python script to drive the tests, or do I need to put together some Makefile hackery?
> P1. In order to run this, I've been replacing NS_SUCCEEDED and NS_FAILED with
> inline functions. This way, each usage is a single GIMPLE function call
[snip]
> Ideally, we would just replace the macros with inline functions in Mozilla 2.
> inline is available in C++ and C9X. Can we do this?

The only question is whether this affects performance. So can you try submitting this patch to the tryserver, which now has fancy-dandy performance testing?

> If the answer is no, then can we at least have inlines when compiled with
> -DDEHYDRA_GCC?

Yes, absolutely (although the define is NS_STATIC_CHECKING).

> P2. What directory should the files live in? static-analysis.js is just in
> xpcom/, but this analysis is 10 .js files. Should they just be dumped in the
> same place, or should we create a new dir for them.

new dir, make it xpcom/analysis, and move static-checking.js into that directory as well

> P3. Tests. I have 25 small tests. Each test is a C++ file where the analysis
> should either print no errors, or one or more outparam error messages. Can I
> use a Python script to drive the tests, or do I need to put together some
> Makefile hackery?

The makefile hackery already exists, see http://mxr.mozilla.org/mozilla-central/source/xpcom/tests/static-checker/
I just realized there is another problem. The makefile has a way to enable Dehydra checking with static-analysis.js, which adds flags to OS_CXXFLAGS. In order to run the outparams, you need to run Treehydra, which can't be done in the same compile, at least so far. But there a couple of options for that: (1) use the plugin multiplexer, when that becomes available, or (2) run Treehydra which is currently a superset of Dehydra (but may not be, when the multiplexer comes out). 

There are many options on how to do this, and thinking about them too hard makes my head hurt, but how about this:

- User can configure with either gcc_dehydra.so or gcc_treehydra.so. as the plugin. Treehydra-based analyses run only if treehydra was chosen.

- static-checking.js is modified so that iff running under Treehydra, it pulls in outparams.js and runs that analysis as well.

- Someday (but not now), checks currently in static-checking.js will move out to their own file, and static-checking.js will just dispatch to those files based on loaded plugin and user options (through strings passed to xhydra from the command line).

This strikes me as overly complex, but it's less complicated than the other stuff I thought of. Comments?
we should run treehydra as a superset of dehydra until the mythical multiplexing comes to pass. Which is what I'm doing already for the finalizer analysis in XPCOMGC...
Depends on: 428068
Depends on: 429770
Depends on: 425458
(In reply to comment #15)
> Definitely want the GCC reporting mechanism. I think for now we should use
> warning() not error(). This means that modules with -Werror will fail, while
> other modules will simply print the warning and continue.

Is this really what we want? How do you make a full build go through then? Right now it stops on nsCookieService.cpp, even with 'make -k'.
(In reply to comment #18)
> > P3. Tests. I have 25 small tests. Each test is a C++ file where the analysis
> > should either print no errors, or one or more outparam error messages. Can I
> > use a Python script to drive the tests, or do I need to put together some
> > Makefile hackery?
> 
> The makefile hackery already exists, see
> http://mxr.mozilla.org/mozilla-central/source/xpcom/tests/static-checker/
 
I extended the makefile to run my tests, but there is still a problem: on every test case, the compiler exits with an error because some dependency file is not found. Is this a bug in the makefile, or is there something else I have to do before 'make check'? Here's the command actually being run (for an existing test case) and the output:

[dmandelin@dm-oink01 static-checker]$ /home/dmandelin/gcc-dehydra/installed/bin/g++ -o /dev/null -S -I../../../dist/include/system_wrappers -include /home/dmandelin/sources/mozilla-xpidl/config/gcc_hidden.h -DOSTYPE=\"Linux2.6.18-53.1.14\" -DOSARCH=Linux  -I/home/dmandelin/sources/mozilla-xpidl/xpcom/tests/static-checker -I.  -I../../../dist/include   -I../../../dist/include -I../../../dist/include/nspr     -I../../../dist/sdk/include    -fPIC   -fno-rtti -fno-exceptions -Wall -Wconversion -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-long-long -pedantic -fno-strict-aliasing -fshort-wchar -pthread -pipe  -DDEBUG -D_DEBUG -DDEBUG_dmandelin -DTRACING -g -fno-inline -fplugin=/home/dmandelin/gcc-dehydra/dehydra-gcc/gcc_treehydra.so -fplugin-arg=/home/dmandelin/sources/mozilla-xpidl/xpcom/analysis/static-checking.js   -DMOZILLA_CLIENT -include ../../../mozilla-config.h -Wp,-MD,.deps/TestStackTemplate.pp /home/dmandelin/sources/mozilla-xpidl/xpcom/tests/static-checker/TestStackTemplate.cpp


/home/dmandelin/sources/mozilla-xpidl/xpcom/tests/static-checker/TestStackTemplate.cpp:14: error: TestStackTemplate.cpp:11:11: constructed object of type 'A<int>' not on the stack.
/home/dmandelin/sources/mozilla-xpidl/xpcom/tests/static-checker/TestStackTemplate.cpp:14: fatal error: opening dependency file .deps/TestStackTemplate.pp: No such file or directory
The error is caused because nobody created the .deps directory. I'm not sure why we're bothering to collect that deps information, but we should probably just force the .deps directory to exist... did you do a full build before testing (e.g. did you end up doing "make libs" in that directory?)
As for the -Werror... yes, that's what we really want. For initial testing, you can override the normal warnings-as-errors setting using "make WARNINGS_AS_ERRORS="
Depends on: 430592
(In reply to comment #23)
> The error is caused because nobody created the .deps directory. I'm not sure
> why we're bothering to collect that deps information, but we should probably
> just force the .deps directory to exist... did you do a full build before
> testing (e.g. did you end up doing "make libs" in that directory?)

OK, this is almost ready to go, except for this issue with the .deps. I'm not sure exactly what the requirements are supposed to be for running these tests, but just now I completed a full build (with static checking enabled and WARNINGS_AS_ERRORS=). I looked in xpcom/tests/static/checker and saw a bunch of .errlog files. They all have this problem, including the current trunk tests (e.g., TestStack.errlog). 

Suggestions? Is there some magical line to be added to the Makefile that creates the deps files?
Attached patch Preview patch (obsolete) — Splinter Review
Here's a preview of the patch. It's not *quite* ready for review, because of the deps issue in the unit tests, but otherwise I think it's what I want to commit.

I did a full browser build with this patch and static checking enabled. It goes through all the way with 'make WARNINGS_AS_ERRORS='. It prints out a bunch of static checking results. The code passes 25 small test cases and does the right thing for a spot check of results on Mozilla code.

Note that you will see a number of 'false positives' because of this pattern (and perhaps others like it):

  nsresult Foo(int **aInt NS_OUTPARAM) {
    // Uh, oh -- doFoo isn't XPCOM, so we guess its outparams per the spec,
    // which can only guess the past param to be an outparam, so it looks
    // like aInt is never written to on success.
    return doFoo(aInt, mOption);
  }

The false positive can be eliminated by manually annotating the declaration of doFoo.

I went ahead and replaced NS_FAILED and NS_SUCCEEDED with inline functions in C++ only. (I actually got a build to go through having them be inline functions in C as well, but I'm doubtful about that. Apparently C99 has inline, but I don't think Mozilla is building with C99 enabled, so it probably used some GCC extension. And reading about inline functions in C [1] has made me think they are sketchy and hard to understand wrt the linker.) 

I ran the inline part of the patch through tryserver, and it builds on those 3 platforms with AFAICT no perf regression. I'm really not sure what's the correct procedure for determining something doesn't have perf regressions. It's made harder by the fact that the numbers are just different on try-pubuntu vs. mini-ubuntu1(-4). (try is faster.) Here's what I observed:

- The numbers for my patch are about the same as the numbers for other tests being run on the try server (leaving out the spikes).

- I ran an empty patch (just adding a newline to a file) as a baseline, and the inline patch is about the same. E.g., the Tp numbers:

           baseline         inline run 1            inline run 1
Linux       484.66             486.12                   479.62
Mac         286.60               --                     286.31
Win         444.93             444.33                   444.72

(I don't know why results didn't come up for Mac inline run 1. It built OK. There is a big red box from some other build at about the time I would have expected to see my results.)

Historically, it looks like Linux has a standard deviation of about 3 (sample variance 10.0) on these results on mini-ubuntu. My 3 results have a sample variance of 11.63 so they look reasonably consistent.
Comment on attachment 317654 [details] [diff] [review]
Preview patch

>\ No newline at end of file

Saw a couple of these: please fix before checkin

>diff -r 06a3d0cd6aca xpcom/base/nsError.h

For the moment, please make this conditional on NS_STATIC_CHECKING:

#if defined(NS_STATIC_CHECKING) && defined(__cplusplus)
inline functions
#else
macros
#endif

>diff -r 06a3d0cd6aca xpcom/tests/static-checker/Makefile.in

>+OUTPARAMS_FAILURE_TESTCASES = \
>+  e1.cpp e2.cpp e3.cpp e4.cpp e5.cpp e6.cpp e7.cpp e8.cpp e9.cpp e10.cpp \
>+  e11.cpp \
>+  $(NULL)

Please follow existing style and have one file per line, indented with two spaces.

> # We want to compile each file and invert the result to ensure that
> # compilation failed.
> check:: $(STATIC_FAILURE_TESTCASES:.cpp=.s-fail)
>+
>+check_outparams: $(OUTPARAMS_FAILURE_TESTCASES:.cpp=.s-fail) $(OUTPARAMS_PASS_TESTCASES:.cpp=.s-pass)

Why do you have a separate `check_outparams` target? I think we should run these tests as part of the standard target unless there's a good reason not to.

I'm not going to attempt to review the actual analysis scripts ;-)
(In reply to comment #27)
> >+check_outparams: $(OUTPARAMS_FAILURE_TESTCASES:.cpp=.s-fail) $(OUTPARAMS_PASS_TESTCASES:.cpp=.s-pass)
> 
> Why do you have a separate `check_outparams` target? I think we should run
> these tests as part of the standard target unless there's a good reason not to.

I separated it because those tests will not work if the user isn't running with Treehydra, which I assumed was kind of a special plugin that not everyone has. But I'll go ahead and add my tests to the 'check' target unless you say not to. And I'll fix the other stuff. 
Depends on: 430863
Attached patch PatchSplinter Review
Here it is. The big patch. It's about half the size because the reusable code has been checked in to Treehydra.
Attachment #317654 - Attachment is obsolete: true
Attachment #318656 - Flags: review?
Attachment #318656 - Flags: review? → review?(benjamin)
Comment on attachment 318656 [details] [diff] [review]
Patch

>diff -r 06a3d0cd6aca xpcom/analysis/static-checking.js

>+/* -*- Mode: Java; c-basic-offset: 2; indent-tabs-mode: nil -*- */

Because I've become addicted to JS2-mode, could you please remove this modeline? ;-)

r=me
a=me for pushing this to mozilla-central in a closed tree
Attachment #318656 - Flags: review?(benjamin) → review+
Pushed with modeline cut. :-)
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Depends on: 431857
Depends on: 431677, 431832
Depends on: 431965
Would be reasonable to extend the spec, such that outparams are never leaked(ie the pointer is not saved anywhere on the heap)?

This would be helpful for other analyses where we need to ensure that variable values don't change outside of a function during execution.
Depends on: 433607
Depends on: 433939
(In reply to comment #17)
> In order to run this, I've been replacing NS_SUCCEEDED and NS_FAILED with
> inline functions. This way, each usage is a single GIMPLE function call
> instruction, so it's easy for my analysis to handle. Using the macros, I get
> about 5 GIMPLE instructions, and they're weird, and I believe, platform
> dependent.
It's probably the NS_(UN)LIKELY that's confusing it.

> the test for a negative number
nsresult is unsigned ;-)
Product: Core → Firefox Build System
Product: Firefox Build System → Developer Infrastructure
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: