Last Comment Bug 696242 - Pass arrays and lengths separately less often, and pass just the array more often, inferring the array length using C++ template magic
: Pass arrays and lengths separately less often, and pass just the array more o...
Status: RESOLVED FIXED
[mentor=jwalden/Waldo] [lang=c++]
:
Product: Core
Classification: Components
Component: General (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla13
Assigned To: Jacob Holzinger
:
:
Mentors:
Depends on: 455839 arraylength 707498 734384
Blocks:
  Show dependency treegraph
 
Reported: 2011-10-20 15:47 PDT by Jeff Walden [:Waldo] (remove +bmo to email)
Modified: 2012-03-09 07:28 PST (History)
14 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Templatized many of the areas pointed out by Jeff Walden (73.85 KB, patch)
2012-03-05 22:52 PST, Jacob Holzinger
no flags Details | Diff | Splinter Review
Updated patch file without commented code deleted as requested. (64.41 KB, patch)
2012-03-06 00:30 PST, Jacob Holzinger
no flags Details | Diff | Splinter Review
Changes made on mozilla-central code (22.18 KB, patch)
2012-03-06 19:03 PST, Jacob Holzinger
no flags Details | Diff | Splinter Review
Removed comments and added extern to NS_RegisterStaticAtoms (22.04 KB, patch)
2012-03-07 19:46 PST, Jacob Holzinger
jwalden+bmo: review+
Details | Diff | Splinter Review

Description Jeff Walden [:Waldo] (remove +bmo to email) 2011-10-20 15:47:27 PDT
Many methods have signatures like this:

T doSomething(..., U* arr, size_t arrLength, ...);

where |arrLength| is always, or quite often, the length of the constant-length array which was passed as |arr|.  These methods would be improved if they used C++ template magic to infer the length of |arr| from the type of |arr|.  One example where we already do this is nsString::EqualsLiteral:

http://hg.mozilla.org/mozilla-central/annotate/6cd262091470/xpcom/string/public/nsTSubstring.h#l282

We could do this a lot more places.  It'd be safer, and it'd spread the idiom more so that new code would be more likely to use it.

I'm willing to help anyone interested in doing the work for this.  If anyone interested has questions, feel free to ask them, ideally in the bug for the benefit of other contributors.

Additionally, I'll have some blog posts talking about this task, and what spurred it, quite shortly.  I'll mention the links here when I get the posts done.
Comment 1 Jeff Walden [:Waldo] (remove +bmo to email) 2011-10-20 16:08:42 PDT
Blog posts:

https://whereswalden.com/2011/10/20/computing-the-length-or-end-of-a-compile-time-constant-length-array-jsns_array_endlength-are-out-mozillaarraylengthend-are-in/

This post introduces mozilla::ArrayLength and mozilla::ArrayEnd.  It also talks a little bit about their limitations, gotchas, and some workarounds for those.

https://whereswalden.com/2011/10/20/implementing-mozillaarraylength-and-mozillaarrayend-and-some-followup-work/

This post explains how both methods were implemented.  It also discusses the thesis of this bug, linking to a dozen-odd places where it appears to me that we could improve APIs in the way suggested by this bug.
Comment 2 Zack Weinberg (:zwol) 2011-10-20 21:51:59 PDT
Do you mean this to be a tracking bug?
Comment 3 Jeff Walden [:Waldo] (remove +bmo to email) 2011-10-20 22:12:58 PDT
I'm not too concerned about doing it that way, or doing many patches seriatim in a single bug.  There's no area-specific technical complexity motivating separate bugs, but if it offends someone's tidiness that way, separate bugs are fine.  I try not to get too concerned about process in situations like this.  :-)
Comment 4 Zack Weinberg (:zwol) 2011-10-21 09:35:16 PDT
Well, since the CSS scanner thing already has a patch in a bug, I'm'a go ahead and add it as a dependency, then.
Comment 5 David Bradley 2011-10-22 11:02:19 PDT
I would think this array passing pattern is usually found in C++ classes implementing XPCOM interfaces. And so not so easily changed to just passing an array.
Comment 6 Zack Weinberg (:zwol) 2011-10-22 11:29:36 PDT
If xpidl can't handle it, it should be changed.
Comment 7 David Bradley 2011-10-22 12:40:13 PDT
You certainly can change xpidl, but that's a non-trival impact to plugin authors, as well as XPConnect. And adds yet another case where XPCOM can't be implemented via C style functions. You'd need to do something like what was done with the string classes and passing them via XPCOM.

And this is not an argument against doing it, just pointing out some of the heavy lifting that may need to be done.
Comment 8 Jeff Walden [:Waldo] (remove +bmo to email) 2011-10-22 16:56:49 PDT
I'm not interested in the larger problem of XPCOM, at least not immediately.  (And something like this may or may not be the right thing to do for such instances anyway.)  I'm only interested in the existing pure-C++ uses that could be doing something smarter.  There's no heavy lifting needed to do that.
Comment 9 David Bradley 2011-10-23 07:07:05 PDT
Right, but my initial point was that I would expect that to be an infrequent pattern outside of XPCOM classes. My travels through the codebase are dated and not extensive so my expectations may be incorrect.
Comment 10 Jeff Walden [:Waldo] (remove +bmo to email) 2011-10-23 12:29:44 PDT
Per this:

https://whereswalden.com/2011/10/20/implementing-mozillaarraylength-and-mozillaarrayend-and-some-followup-work/

There are around a dozen instances that could probably be fixed without much trouble.  A couple of those are fairly widely used, to the point where you could eliminate 20% or so of all ArrayLength calls if you fixed them.  That seems moderately "frequent" to me, given we're talking about probably 70+ instances.
Comment 11 Jacob Holzinger 2012-01-12 20:36:49 PST
I would like to take this bug, my classmate and I would like to use it for a required project in our Software Engineering class.
Comment 12 Josh Matthews [:jdm] 2012-01-13 09:43:36 PST
Jacob,
Great! Do you have any questions about what is required after reading comments 0, 1, and 10?
Comment 13 Josh Matthews [:jdm] 2012-02-06 12:40:42 PST
Jacob, we've had some interest from other parties about working on this bug. Is this still something you plan to work on?
Comment 14 Jacob Holzinger 2012-02-06 13:10:01 PST
Hi Josh,
Sorry I haven't been able to get back to you, my partner and I are still very interested in working on this bug, we've just been fairly busy with school and other things.

I do believe, however, that we will not be able to work on all the areas that Mr. Walden points out in his initial posts. I will sit down with my partner and discuss exactly what areas we will be working on and let you know asap so that others can work on this as well!

Thanks and sorry for any inconvenience,
Jake
Comment 15 Jeff Walden [:Waldo] (remove +bmo to email) 2012-02-06 13:13:24 PST
Work on any of the individual areas is certainly appreciated even if not all areas get addressed.  Incremental improvements are still improvements.  :-)
Comment 16 Jacob Holzinger 2012-02-23 16:08:18 PST
Okay, I've finally managed some time to sit down and get to work on this bug. I have gone through and read all the information supplied and have started to get into the coding and get my hands dirty, but I'm left with a few questions. This is the first large project that I've worked on and I'm fairly unfamiliar with the code base so I'm a little unsure what exactly needs to be done here.

1) The functions referenced need to be "templatized" in order to obtain the array length at compile time, I assume that the original functions should stay intact in order to remain compatible with existing code, is this correct? Or am I supposed to be modifying the particular areas that call these functions as well?

2) If the former is true should I be modifying the existing function to call the new "templatized" function instead?

3) If the latter is true how would I go about locating the areas that call these functions so I can modify it to call the new "templatized" function?
Comment 17 Jeff Walden [:Waldo] (remove +bmo to email) 2012-02-24 15:49:49 PST
The original *implementation* should remain intact.  But wherever possible (when all callers have a fixed-size array), the original function should no longer be callable.  We don't want people to be able to make a mistake and pass the wrong value.  (This is incompatible, of course; we don't preserve internal compatibility.  Probably we couldn't make the changes suggested here to a public API, but I think every API that we'd want to change this way is internal.)

For example, suppose we had a class like this, and every caller is acting upon a statically-sized array:

  class C
  {
    public:
      void frob(int* arr, size_t length);
  };

  void C::frob(int* arr, size_t length)
  {
    for (size_t i = 0; i < length; i++)
      use(arr[i]);
  }

  int arr[] = { 17, 42, 3 };
  c->frob(arr, ArrayLength(arr));

We want callers to not have to pass an explicit length -- but moreover, we don't want callers to even have the *option* of passing an explicit length, if possible.  If they can pass a length, they might get it wrong.  It'd be better to prevent the explicitly-passed-length version entirely.  In this case you'd make the explicit-length version private, and you'd have a public version that doesn't take a length which calls the private version.

  class C
  {
      void frob(int* arr, size_t length);

    public:
      template<size_t N> void frob(int (&arr)[N])
      {
        return frob(arr, N);
      }
  };

  void C::frob(int* arr, size_t length)
  {
    for (size_t i = 0; i < length; i++)
      use(arr[i]);
  }

  int arr[] = { 17, 42, 3 };
  c->frob(arr, ArrayLength(arr)); // ERROR: private!
  c->frob(arr);                   // WORKS, public

So, yes -- you'd be modifying the particular areas that call these functions as well.  One way to find them would be to simply make the change, then see what breaks.  A better way would be to just do a textual search for all uses:

http://mxr.mozilla.org/mozilla-central/search?string=NS_RegisterStaticAtoms

We have some up-and-coming tools which would do even better than this, letting you distinguish cases where there are multiple implementations of a function, and you only want one.  But they're not ready for prime time yet, so MXR is probably your best bet.  You could also do the textual searches locally, of course, with grep/ack and similar.  But MXR's easiest if you're not familiar with those tools already.

Does that help?  You can also ask questions on IRC if faster turnaround time would be helpful -- the #introduction channel on irc.mozilla.org is a good place to ask questions if you have them.  You can find me as "Waldo" there a fair bit of the time -- during the day on weekdays usually, sometimes also on weekends, depending.
Comment 18 Jacob Holzinger 2012-03-05 22:52:51 PST
Created attachment 603168 [details] [diff] [review]
Templatized many of the areas pointed out by Jeff Walden

This is my first attempt at submitting this patch, the attached OUT_FILE is an hg diff of my file directory (mozilla-release codebase).

Areas modified:
    nsContentUtils::FormatLocalizedString,
    nsCSSScanner::ReportUnexpectedParams,
    nsGenericElement::FindAttributeDependence,
    nsStaticAtom::RegisterStaticAtoms
Comment 19 Josh Matthews [:jdm] 2012-03-05 23:21:15 PST
Woo! That patch is pleasing to the eye. You'll need to remove all of the commented out lines before it gets checked in, though, so you should probably go ahead and do that while waiting for Waldo's review.
Comment 20 Josh Matthews [:jdm] 2012-03-05 23:22:12 PST
Comment on attachment 603168 [details] [diff] [review]
Templatized many of the areas pointed out by Jeff Walden

For future reference, review+ is what the reviewer sets the flag to. When asking for review or feedback, you want the ? value along with the reviewer's email (or bugzilla handle (ie. :Waldo)).
Comment 21 Jacob Holzinger 2012-03-05 23:42:28 PST
Ahhhh I see, thanks for your help! Sorry for the mix-up.
Comment 22 Jacob Holzinger 2012-03-06 00:30:07 PST
Created attachment 603185 [details] [diff] [review]
Updated patch file without commented code deleted as requested.
Comment 23 :Ms2ger (⌚ UTC+1/+2) 2012-03-06 00:35:45 PST
Jacob, could you please base your patch on mozilla-central? It contains a lot of changes that already happened in bug 707498.

Also, for the nsContentUtils functions, the NS_ prefix isn't necessary.
Comment 24 Jacob Holzinger 2012-03-06 00:43:32 PST
Yeah I can definitely do that, is there any documentation on when the NS_ prefix is used? I'm not very familiar with all of the naming conventions.
Comment 25 :Ms2ger (⌚ UTC+1/+2) 2012-03-06 00:46:31 PST
"inconsistently" :) NS_Foo is mostly for functions that aren't in a class, I'd say.
Comment 26 Jacob Holzinger 2012-03-06 19:03:16 PST
Created attachment 603576 [details] [diff] [review]
Changes made on mozilla-central code

Apparently nsGenericElement::FindAttributeDependence was already taken care of in the mozilla-central tree, but the rest of the changes remain.
Comment 27 Jeff Walden [:Waldo] (remove +bmo to email) 2012-03-07 11:59:47 PST
Comment on attachment 603576 [details] [diff] [review]
Changes made on mozilla-central code

Review of attachment 603576 [details] [diff] [review]:
-----------------------------------------------------------------

Good stuff!  Comments follow.

::: content/base/public/nsContentUtils.h
@@ +846,5 @@
>                                          nsXPIDLString& aResult);
> +  
> +  /**
> +   * Template trick to infer array length of a constant length array.
> +   */ 

I wouldn't add this comment.  First, it's something that, the more we use this trick (and this bug of course contemplates using it more), the less a comment is needed.  Second, even assuming someone might need the comment, hg annotate/blame, and similar mechanisms, should point out what's happening.  Third, while the C++ syntax is not the most readable thing in the world, I think given how it's written the reader should be able to make a reasonable guess what's happening.  Fourth, this is such a small part of the code -- not actual algorithms of interest -- that even if someone remained confused, it doesn't seem to be such a great harm.  Do feel free to say something if you think otherwise, tho -- I'm open to considering arguments to the contrary.  :-)

@@ +851,5 @@
> +public:
> +  template<typename T, PRUint32 N>
> +  static nsresult FormatLocalizedString(PropertiesFile aFile,
> +                                        const char* aKey,
> +                                        const T *(&aParams)[N],

Why |const T*| rather than |const PRUnichar*|?  I'm only looking at changes to this file directly right now, but it looks like this method is only ever used with that, so that seems better than extra type-genericity for no gain.  Thus the method would look like this:

  template<PRUint32 N>
  static nsresult FormatLocalizedString(PropertiesFile aFile,
                                        const char* aKey,
                                        const PRUnichar *(&aParams)[N],
                                        nsXPIDLString& aResult)
  {
    return FormatLocalizedString(aFile, aKey, aParams, N, aResult);
  }

::: layout/style/nsCSSScanner.h
@@ +165,5 @@
>    void ReportUnexpectedParams(const char* aMessage,
>                                const PRUnichar **aParams,
>                                PRUint32 aParamsLength);
> +
> +  // Template trick to infer array length of a compile time array.

Again, I'd remove this comment.

@@ +169,5 @@
> +  // Template trick to infer array length of a compile time array.
> +public:
> +  template<typename T, PRUint32 N>                           
> +  void ReportUnexpectedParams(const char* aMessage,
> +                              const T *(&aParams)[N])

This should be able to use PRUnichar directly, rather than T, right?  Just like in nsContentUtils.

::: xpcom/ds/nsStaticAtom.h
@@ +82,5 @@
>  
> +// Template trick to infer array length of a compile time array.
> +template<PRUint32 N>
> +nsresult
> +NS_RegisterStaticAtoms(const nsStaticAtom (&nsAtoms)[N])

Was NS_RegisterStaticAtoms only ever passed an array of fixed size?  If so, we should do what we can to prevent people from using the other overload, that takes a pointer and a length.  Putting this inside this inline and removing the declaration above should do the trick, I think:

extern nsresult
NS_RegisterStaticAtoms(const nsStaticAtom*, PRUint32 aAtomCount);
Comment 28 Jacob Holzinger 2012-03-07 14:06:18 PST
I agree this implementation is pretty straight forward, I was just trying to remain consistent with the commenting on the rest of the code, it can definitely be removed if you'd like. 

As for using T instead of PRUnichar, I initially tried to use PRUnichar but the compiler kept throwing errors at me about not being able to deduce the type, I'm not sure why this was happening but I can try it again when I get home.

On the last part, Yes, I believe only fixed size arrays were passed to NS_RegisterStaticAtoms. So you want me to remove the template function and add the extern modifier to the original definition?? I'm having trouble understanding how this will work.


Finally I have a quick question, what makes 'templatizing' all these functions necessary? couldn't the same thing be achieved with something like this?

void ReportUnexpectedParams(const char* aMessage,
                            const PRUnichar **aParams,
                            PRUint32 aParamsLength);

void ReportUnexpectedParams(const char* aMessage,
                            const PRUnichar **aParams)
{
  return ReportUnexpectedParams(aMessage, aParams, ArrayLength(aParams));
}

I'm just curious really, Thanks.
Comment 29 Jacob Holzinger 2012-03-07 19:46:32 PST
Created attachment 603965 [details] [diff] [review]
Removed comments and added extern to NS_RegisterStaticAtoms
Comment 30 Jeff Walden [:Waldo] (remove +bmo to email) 2012-03-08 16:35:02 PST
(In reply to Jacob Holzinger from comment #28)
> What makes 'templatizing' all these functions necessary? couldn't the same thing
> be achieved with something like this?
> 
> void ReportUnexpectedParams(const char* aMessage,
>                             const PRUnichar **aParams,
>                             PRUint32 aParamsLength);
> 
> void ReportUnexpectedParams(const char* aMessage,
>                             const PRUnichar **aParams)
> {
>   return ReportUnexpectedParams(aMessage, aParams, ArrayLength(aParams));
> }

ArrayLength is a function which takes a reference to an array of specified length.  const PRUnichar** isn't an array of specified length -- it's just a pointer to a pointer to a const PRUnichar.  (Or not even that, if the pointer is NULL, say.)  You can't retrieve the length of an array passed like that -- if indeed it is even an array -- in the C++ type system.  Making the type of the argument a (never-null) reference to an array of specified length is the only way to get the trick to work.  That clear things up?
Comment 31 Jeff Walden [:Waldo] (remove +bmo to email) 2012-03-08 17:18:19 PST
Comment on attachment 603965 [details] [diff] [review]
Removed comments and added extern to NS_RegisterStaticAtoms

Review of attachment 603965 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.  I've made the requested changes locally, and I'm doing a build now to be sure things work.  Once that's done I'll push this to inbound.

::: content/base/public/nsContentUtils.h
@@ +848,5 @@
> +public:
> +  template<PRUint32 N>
> +  static nsresult FormatLocalizedString(PropertiesFile aFile,
> +                                        const char* aKey,
> +                                        const PRUnichar *(&aParams)[N],

It looks like consistency in style would say this should be |const PRUnichar* (&aParams)[N],|.

::: layout/style/nsCSSScanner.h
@@ +168,5 @@
> +
> +public:
> +  template<PRUint32 N>                           
> +  void ReportUnexpectedParams(const char* aMessage,
> +                              const PRUnichar *(&aParams)[N])

const PRUnichar* (&aParams)[N]

::: xpcom/ds/nsStaticAtom.h
@@ +84,5 @@
> +nsresult
> +NS_RegisterStaticAtoms(const nsStaticAtom (&nsAtoms)[N])
> +{
> +    return NS_RegisterStaticAtoms(nsAtoms, N);
> +}

So, what I meant is something like this:

template<PRUint32 N>
nsresult
NS_RegisterStaticAtoms(const nsStaticAtom (&nsAtoms)[N])
{
    extern nsresult RegisterStaticAtoms(const nsStaticAtom*, PRUint32 aAtomCount);
    return NS_RegisterStaticAtoms(nsAtoms, N);
}

And then have |nsresult RegisterStaticAtoms(...)| wherever the current NS_RegisterStaticAtoms is defined.
Comment 32 Jeff Walden [:Waldo] (remove +bmo to email) 2012-03-08 21:03:39 PST
http://hg.mozilla.org/integration/mozilla-inbound/rev/bd1bb076db6e

Are there any other places that can be changed, after this patch and the ones in dependencies?  I don't have too much time to check the list of places I wrote up in comment 1 to see which do and don't apply any more.
Comment 33 Marco Bonardo [::mak] 2012-03-09 05:34:28 PST
https://hg.mozilla.org/mozilla-central/rev/bd1bb076db6e

Note You need to log in before you can comment on or make changes to this bug.