Closed Bug 1019838 Opened 10 years ago Closed 10 years ago

Various leaks in dosprintf (in jsprf.cpp and nsTextFormatter.cpp)

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla35

People

(Reporter: mccr8, Assigned: gokcehankara, Mentored)

References

(Blocks 1 open bug)

Details

(Keywords: coverity, memory-leak, Whiteboard: [MemShrink:P3] )

Attachments

(2 files, 4 obsolete files)

Coverity reports a number of leaks in this function.  Nine places that return early fail to free |nas|.  This happens at line 660, 674, 822, 837, 880, 887, 925, 944, and 947.  Some kind of RAII class would help here.
Whiteboard: [MemShrink]
It looks like there's another copy of this function (?!?) in xpcom/glue/nsTextFormatter.cpp which has the same problem, so it might as well get fixed at the same time.
Summary: Various leaks in dosprintf (in jsprf.cpp) → Various leaks in dosprintf (in jsprf.cpp and nsTextFormatter.cpp)
Whiteboard: [MemShrink] → [MemShrink] [mentor=mccr8]
Is a successful outcome here fix-and-refactor, or just fix?
I think just fix.  I'm not sure how you can really make the same code be callable in the JS engine and the browser.
Whiteboard: [MemShrink] [mentor=mccr8] → [MemShrink:P3] [mentor=mccr8]
Mentor: continuation
Whiteboard: [MemShrink:P3] [mentor=mccr8] → [MemShrink:P3]
I would like to give it a try if it's ok. First time here so any help is appreciated.

I'm attaching a patch simply adding a RAII class as Andrew suggested. It compiled fine on my machine and `make check` did not fail any test so hopefully I haven't broken anything serious yet. How should I actually test whether the leak is fixed or not?
You are on the right track here, but I have a few comments.

First, the constructor should just take a single argument of type NumArgState*.  This way, all of the complicated argument things that are needed to construct the NumArgState don't have to be built into the RAII class itself.  This will also allow it to be used in different situations.

Second, the destruction for |nas| in your RAII class is not correct.  If you notice, when it does properly free |nas| it does this:
        if (nas != nasArray)
            js_free(nas);
The code is doing this odd thing where sometimes it allocates |nas| using malloc, and sometimes it uses this |nasArray| which is allocated on the stack.  Only in the former case can we call |free| on it, so you'll have to modify the destructor to correctly handle this.

A third comment I have is that once you have made it more generic as I describe in the first comment, you could also use this RAII class in BuildArgArray itself, and remove some of the cleanup code that is currently there.  That can be done in a separate patch, or in a separate bug (possibly to be done by somebody else) if you want to keep this simple.
I don't even think you need a new RAII class for this.  Just use

  UniquePtr<NumArgState, JS::FreePolicy> nas;
  if (...stuff that causes it to be created...)
    nas.reset(BuildArgArray(fmt, ap, rv, nasArray));

  nasArray = ... ? nas.get() : ...;

and adjust as necessary any code that requires direct pointer access.  UniquePtr is in #include "mozilla/UniquePtr.h" which you can find in mfbt/UniquePtr.h, with copious comments on how to use it.
Looks like `JS::FreePolicy` is defined as such:

    struct FreePolicy
    {
        void operator()(void* ptr) {
            js_free(ptr);
        }
    };

Therefore it has the same problem mentioned in Andrew's second comment. I can define a custom free policy struct but then I also need to somehow save `nasArray` for comparison while freeing memory. By the way same thing is also required in our previous RAII class so currently I defined it as follows:

    struct NumArgStateHandler
    {
        NumArgState *nas;
        NumArgState *nasArray;
    
        NumArgStateHandler(NumArgState *nas, NumArgState *nasArray)
        {
            this->nas = nas;
            this->nasArray = nasArray;
        }
    
        ~NumArgStateHandler()
        {
            if (nas && nas != nasArray)
                js_free(nas);
        }
    };

Is there a way to somehow get rid of `nasArray` in the class?

Andrew, I don't mind to keep working on this bug. I can now remove manual memory frees in both `dosprintf` and `BuildArgArray` functions and send an updated patch. Feel free to tell me anything required regarding this bug.
(In reply to Andrew McCreight [:mccr8] from comment #5)
> The code is doing this odd thing where sometimes it allocates |nas| using
> malloc, and sometimes it uses this |nasArray| which is allocated on the
> stack.  Only in the former case can we call |free| on it, so you'll have to
> modify the destructor to correctly handle this.

Can we just use a Vector and be done with it? Vector also has inline capacity and frees its external memory when it goes out of scope.
> Is there a way to somehow get rid of `nasArray` in the class?

No, that looks like the right thing to me.

> I can now remove manual memory frees in both `dosprintf` and `BuildArgArray` functions and send an updated patch.

Sounds great, thanks!

> Can we just use a Vector and be done with it? Vector also has inline capacity and frees its external memory when it goes out of scope.

Only one of the copies of this code is in JS, so it would be nice to fix the possible leaks before making it too JS-y.  That sounds like a good followup bug, though.
(In reply to gokcehankara from comment #7)
> Looks like `JS::FreePolicy` is defined as such:
> 
>     struct FreePolicy
>     {
>         void operator()(void* ptr) {
>             js_free(ptr);
>         }
>     };
> 
> Therefore it has the same problem mentioned in Andrew's second comment.

I don't think so.  The point would be, you'd have UniquePtr solely to own a newly-constructed NumArgState -- if one was constructed.  If none was constructed, the UniquePtr would contain nullptr and wouldn't control anything.  But aside from owning the NumArgState if created, you'd never actually use it -- you'd use a raw pointer, either copied out of the UniquePtr using get() or using the address of the local.
(In reply to Andrew McCreight [:mccr8] from comment #9)
> > Can we just use a Vector and be done with it? Vector also has inline capacity and frees its external memory when it goes out of scope.
> 
> Only one of the copies of this code is in JS, so it would be nice to fix the
> possible leaks before making it too JS-y.  That sounds like a good followup
> bug, though.

Vector is no longer JS-y, see mfbt/Vector.h
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #10)
> I don't think so.  The point would be, you'd have UniquePtr solely to own a
> newly-constructed NumArgState -- if one was constructed.  If none was
> constructed, the UniquePtr would contain nullptr and wouldn't control
> anything.

Would it contain nullptr or nasArray?

> But aside from owning the NumArgState if created, you'd never
> actually use it -- you'd use a raw pointer, either copied out of the
> UniquePtr using get() or using the address of the local.

I'm not sure I'm following, can you maybe send a patch?
(In reply to Jan de Mooij [:jandem] from comment #11)
> Vector is no longer JS-y, see mfbt/Vector.h

What exactly is 'JS-y', where can I read about it?
(In reply to gokcehankara from comment #13)
> What exactly is 'JS-y', where can I read about it?

Sorry, I should have been more explicit there.  I meant that the Vector class used to be something that was really only used by the JS engine (code in js/src), but Jan pointed out that it is now in MFBT, which is part of our standard set of data structures, so Vector can be used outside of the JS engine.
Attached patch UniquePtr-based patch (obsolete) — Splinter Review
(In reply to gokcehankara from comment #12)
> I'm not sure I'm following, can you maybe send a patch?

Oh, now I see what you mean.  I hadn't realized BuildArgArray wasn't consistently returning a new array.

You do have an option for storing the on-stack array pointer for deletion.  See the second template argument to UniquePtr, which can be used to specify how deletion should work.  You could have a deleter class that stored the on-stack pointer, then in its operator()(NumArgState*) freed only if the two pointers were unequal.  The patch here does this -- passes jsapi-tests, passed the first hundred or so jstests, might or might not be good to go.

That said, when I look closer at what this is doing, Vector<NumArgState, NAS_DEFAULT_NUM> seems probably a better idea.  That, or nuking all this code from orbit, but we probably can't quite do so easily.
I started playing with the Vector. Attached is a patch using vectors in `js/src/jsprf.cpp` and `xpcom/glue/nsTextFormatter.cpp`. I also added manual free statements to early returns in `nsprpub/pr/src/io/prprf.c` since we can't use vectors here. Is there an effort to migrate C files to C++?

Also by the way formatting for `nsprpub/pr/src/io/prprf.c` seems inconsistent having mixed tabs and spaces, trail spaces and missing vim modeline. Shall I do something about it?

I'm currently working in a seperate build in `js/src/build-debug` as described in SpiderMonkey documentation. I use `make check` to run the tests but I'm not sure if it runs all available tests. Any advices?
Comment on attachment 8473635 [details] [diff] [review]
1019838.patch - Use Vector for memory management in dosprintf

Review of attachment 8473635 [details] [diff] [review]:
-----------------------------------------------------------------

Because these files are so similar, you are probably better off working on getting one version reviewed (say, the JS one), so you don't have to redo the same work over and over again.  Then you can get the other one reviewed, I guess by the same person.  This case is a little weird because there are three very different areas of the code that use almost identical copies of something.  Generally we try to avoid that.

In terms of testing, it looks like xpcom/tests/TestTextFormatter.cpp does test this, though I'm not sure offhand how to run that.  Maybe there is some kind of ./mach cppunit test that will run it.

::: xpcom/glue/nsTextFormatter.cpp
@@ +549,5 @@
>  **	fmp = "%4$i, %2$d, %3s, %1d";
>  ** the number must start from 1, and no gap among them
>  */
>  
> +static Vector<NumArgState, NAS_DEFAULT_NUM, SystemAllocPolicy>

You probably want to make a typedef for this, called something like NumArgStateVector, to make it a little easier to see.

@@ +555,5 @@
>  {
>    int number = 0, cn = 0, i;
>    const char16_t* p;
>    char16_t  c;
> +  Vector<NumArgState, NAS_DEFAULT_NUM, SystemAllocPolicy> nas;

Declaring Vector as a local variable here means that it will be destroyed when you return from the method.  You probably want to do something like pass in a reference to the vector to this method, and declare it in anything that calls BuildArgArray.

@@ +604,3 @@
>    }
>  
> +  nas.growByUninitialized(number);

The more usual way we do this in Gecko code would be to check the return value of |growByUninitialized|, rather than |.empty()|.
(In reply to Andrew McCreight [:mccr8] from comment #17)
> Because these files are so similar, you are probably better off working on
> getting one version reviewed (say, the JS one), so you don't have to redo
> the same work over and over again.  Then you can get the other one reviewed,
> I guess by the same person.  This case is a little weird because there are
> three very different areas of the code that use almost identical copies of
> something.  Generally we try to avoid that.

I have removed the two other files from the patch for now.

> Declaring Vector as a local variable here means that it will be destroyed
> when you return from the method.  You probably want to do something like
> pass in a reference to the vector to this method, and declare it in anything
> that calls BuildArgArray.

I thought move assignment operator should keep the copy before it was destructed. Not sure how it would behave with inline allocation though. I now pass it in a parameter.
> I thought move assignment operator should keep the copy before it was destructed. Not sure how it would behave with inline allocation though. I now pass it in a parameter.
Oh, maybe that's the case.  Most of our classes are not fancy like that, but I guess Vector probably is. :) I'm only sort of familiar with move assignment stuff.  Waldo might know.
Andrew, shall we merge this patch so that I can actually complete my first bug before someone nuke all this code from the orbit? :)

(In reply to Andrew McCreight [:mccr8] from comment #19)
> Oh, maybe that's the case.  Most of our classes are not fancy like that, but
> I guess Vector probably is. :) I'm only sort of familiar with move
> assignment stuff.  Waldo might know.

I'm not quite comfortable with move semantics either and passing as argument seems like a decent way to do it, so we might as well better be safe than sorry.

Do you think we should fix the functions in the other two files or leave it as a followup bug since they are in different components?
(In reply to gokcehankara from comment #20)
> Andrew, shall we merge this patch so that I can actually complete my first
> bug before someone nuke all this code from the orbit? :)

Sure, which patch are you thinking of?

> I'm not quite comfortable with move semantics either and passing as argument
> seems like a decent way to do it, so we might as well better be safe than
> sorry.
That sounds reasonable.

> Do you think we should fix the functions in the other two files or leave it
> as a followup bug since they are in different components?

It is probably easier to just deal with it in multiple bugs.
Assignee: nobody → gokcehankara
I was thinking of the latest one I attached:

1019838.patch - Use Vector for memory management in dosprintf - review 1 (7.20 KB, patch)
Attachment #8470525 - Attachment is obsolete: true
Attachment #8473126 - Attachment is obsolete: true
Attachment #8473635 - Attachment is obsolete: true
Comment on attachment 8473988 [details] [diff] [review]
1019838.patch - Use Vector for memory management in dosprintf - review 1

Sounds good.  I'll flag Waldo for review because he is a JS peer and has looked at this code a little bit.
Attachment #8473988 - Flags: review?(jwalden+bmo)
Comment on attachment 8473988 [details] [diff] [review]
1019838.patch - Use Vector for memory management in dosprintf - review 1

Review of attachment 8473988 [details] [diff] [review]:
-----------------------------------------------------------------

Good patch, just needs a bit more simplification along one axis.

::: js/src/jsprf.cpp
@@ +59,5 @@
>  };
>  
>  const size_t NAS_DEFAULT_NUM = 20;  // default number of NumberedArgumentState array
>  
> +typedef Vector<NumArgState, NAS_DEFAULT_NUM, SystemAllocPolicy> NumArgStateVector;

Vector is sufficiently well-recognized, I think, that we should get rid of NAS_DEFAULT_NUM and just use a literal 20 here.  The extra variable -- consulted nowhere else -- just makes for an extra layer of abstraction to work through for readers.

@@ +379,5 @@
>   *      fmp = "%4$i, %2$d, %3s, %1d";
>   * the number must start from 1, and no gap among them
>   */
> +static void
> +BuildArgArray(const char *fmt, va_list ap, int *rv, NumArgStateVector& nas)

void is an appropriate return type for functions that can't have errors.  But this function can, and it indicates this through |*rv|.  So what we want -- given that |*rv| is only tested for |*rv < 0| -- is for this function to return true/false for success/failure.
Attachment #8473988 - Flags: review?(jwalden+bmo) → feedback+
Re moving the vector versus otherwise, moving would probably make things safe.  But.  In the ordinary case where relatively few arguments are passed in, the vector's contents are going to be stored on the stack.  Moving from the local elsewhere would have to laboriously copy all of them -- except in the rare case of lots of arguments.  So having the caller pass a reference to the vector to fill up seems sensible.

Re "nuke from orbit", trust me, there's no chance of this code disappearing in the near future.  :-)  I'd expect it to happen no earlier than six months from now, and that's an awfully optimistic lower bound -- more likely this'll be around a year, probably even multiple years.  The costs of the current implementation just don't measure up enough to prioritize it any more strongly than that, at least right now.  And there are a few users that have to be rewritten (I think) before then, anyway.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #25)
> Re moving the vector versus otherwise, moving would probably make things
> safe.  But.  In the ordinary case where relatively few arguments are passed
> in, the vector's contents are going to be stored on the stack.  Moving from
> the local elsewhere would have to laboriously copy all of them -- except in
> the rare case of lots of arguments.  So having the caller pass a reference
> to the vector to fill up seems sensible.

Makes sense.

> Re "nuke from orbit", trust me, there's no chance of this code disappearing
> in the near future.  :-)  I'd expect it to happen no earlier than six months
> from now, and that's an awfully optimistic lower bound -- more likely
> this'll be around a year, probably even multiple years.  The costs of the
> current implementation just don't measure up enough to prioritize it any
> more strongly than that, at least right now.  And there are a few users that
> have to be rewritten (I think) before then, anyway.

I'm both happy and sad to hear that :)

> void is an appropriate return type for functions that can't have errors.  But this function can, and
> it indicates this through |*rv|.  So what we want -- given that |*rv| is only tested for |*rv < 0| --
> is for this function to return true/false for success/failure.

I have now changed `BuildArgArray` to return a `bool` but it looks like `rv` is also used in `dosprintf` and the rest. Shall I change all other `int` returning functions (e.g. dosprintf, GrowStuff, LimitStuff, cvt_l etc.) to return a `bool` or just tie `BuildArgArray` to `dosprintf` with an `if` statement?
I'm attaching a patch while I can still remember for the former case I mentioned, that is only changing the signature of `BuildArgArray` and tying it to the rest in `dosprintf` using `rv = BuildArgArray(fmt, ap, nas) ? 0 : -1;`. Let me know if you like to change the signature of other functions as well.
Attachment #8473988 - Attachment is obsolete: true
Attachment #8481862 - Flags: review?(jwalden+bmo)
(In reply to gokcehankara from comment #26)
> I have now changed `BuildArgArray` to return a `bool` but it looks like `rv`
> is also used in `dosprintf` and the rest. Shall I change all other `int`
> returning functions (e.g. dosprintf, GrowStuff, LimitStuff, cvt_l etc.) to
> return a `bool` or just tie `BuildArgArray` to `dosprintf` with an `if`
> statement?

This would be ideal -- if it's easy, and if all code in all the relevant methods is truly treating the return value as either >= 0, or < 0, with no differentiation of less-than-zero values.  (I only looked as far as this single method, here.)  Boolean return values clearly present less complexity than int-returning methods do, and we should use the former instead of the latter whenever possible.

But no reason to hold up this bug for that work, for sure.
Comment on attachment 8481862 [details] [diff] [review]
1019838.patch - Use Vector for memory management in dosprintf - review 2

Review of attachment 8481862 [details] [diff] [review]:
-----------------------------------------------------------------

I'll make these few fixes locally and push this when I have a few things to push (likely later today).  Thanks!  And sorry for the delay on the review, too many things on my plate right now.  :-(

::: js/src/jsprf.cpp
@@ +404,1 @@
>                      }

SpiderMonkey style is not to brace single-line ifs, so just

if (i > 0)
    return false;

here, and same everywhere else.

@@ +624,1 @@
>      if (rv < 0) {

While this function may (at the moment, still) want to return |rv|, no need for us to assign/test it here.

NumArgStateVector nas;
if (!BuildArgArray(fmt, ap, nas)) {
    // the fmt contains error Numbered Argument format, jliu@netscape.com
    JS_ASSERT(0);
    return -1;
}

Subsequent code that sets/cares about |rv| can do it on its own.

@@ +657,5 @@
>                  i = (i * 10) + (c - '0');
>                  c = *fmt++;
>              }
>  
> +            if (nas[i-1].type == TYPE_UNKNOWN)

Minor pre-existing nitpick, but we'd put spaces around |-| here.
Attachment #8481862 - Flags: review?(jwalden+bmo) → review+
Attached patch Final-ish patchSplinter Review
Actually, one last question -- how should we identify you when we land the patch?  (As included in |hg qrefresh -u| information, and as displayed by |hg outgoing| and in revision metadata.)  Typically it's something like "First Last <email@address>", but going by this bug I only have the latter part of that.

Also, I don't mind it (because I set the r= just before landing), but typically when you post a patch you should include a summary in the patch, and the patch should include the identifying information requested above.  Adding this (modified appropriately) to your .hgrc file (typically in ~/.hgrc on non-Windows systems) should set the username so you don't have to do it manually in the future:

[ui]
username = First Last <email@address>
Flags: needinfo?(gokcehankara)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #29)
> I'll make these few fixes locally and push this when I have a few things to
> push (likely later today).  Thanks!  And sorry for the delay on the review,
> too many things on my plate right now.  :-(

Sounds alright, thanks..

> Actually, one last question -- how should we identify you when we land
> the patch?  (As included in |hg qrefresh -u| information, and as displayed
> by |hg outgoing| and in revision metadata.)  Typically it's something like
> "First Last <email@address>", but going by this bug I only have the latter part of that.

> Also, I don't mind it (because I set the r= just before landing), but typically
> when you post a patch you should include a summary in the patch, and the patch
> should include the identifying information requested above.  Adding this
> (modified appropriately) to your .hgrc file (typically in ~/.hgrc on
> non-Windows systems) should set the username so you don't have to do it manually in the future:

"Gokcehan Kara <gokcehankara@gmail.com>" is fine. Looks like the appropriate `[ui]` tag was already in my `~/.hgrc` but I wasn't giving the necessary flags to `hg refresh`. I will keep that in mind for next time.
Flags: needinfo?(gokcehankara)
https://hg.mozilla.org/integration/mozilla-inbound/rev/94955a351cc7

Ah!  I also have this added to my ~/.hgrc so I don't have to think about adding a flag, ever:

[defaults]
qnew = -U

Thanks for the patch, and for being patient through the discussion of how to fix this!
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #32)
> Thanks for the patch, and for being patient through the discussion of how to
> fix this!

It was my pleasure. Thank you Jeff and Andrew and also others for all the feedback.
https://hg.mozilla.org/mozilla-central/rev/94955a351cc7
https://hg.mozilla.org/mozilla-central/rev/bb9d094bfd93
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Depends on: 1070689
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: