Closed Bug 226439 Opened 21 years ago Closed 20 years ago

Reduce footprint of nsA(C?)String::Equals() applied to ASCII strings

Categories

(Core :: XPCOM, defect, P1)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: roc, Assigned: roc)

References

Details

(Keywords: memory-footprint)

Attachments

(8 files, 8 obsolete files)

8.13 KB, patch
roc
: review+
dbaron
: superreview+
Details | Diff | Splinter Review
2.26 KB, patch
dbaron
: review+
dbaron
: superreview+
Details | Diff | Splinter Review
8.00 KB, patch
dbaron
: review+
dbaron
: superreview+
Details | Diff | Splinter Review
84.95 KB, patch
dbaron
: review+
dbaron
: superreview+
Details | Diff | Splinter Review
390.02 KB, patch
darin.moz
: review-
Details | Diff | Splinter Review
23.02 KB, patch
darin.moz
: review+
darin.moz
: superreview+
Details | Diff | Splinter Review
173.56 KB, application/x-tar
darin.moz
: review+
darin.moz
: superreview+
Details
1.23 KB, patch
darin.moz
: review+
darin.moz
: superreview+
Details | Diff | Splinter Review
Recently Jonas converted some uses of EqualsIgnoreCase(const char*) to
Equals(NS_LITERAL_STRING(...)), and it added quite a bit of footprint especially
on Linux.

I've implemented new Equals(const char*) methods that do not create a temporary
nsDependentString nor a temporary nsStandardStringComparator. They should get
back most of the footprint bloat --- and then some, if we convert prior
Equals(NS_LITERAL_STRING(...)) methods to use these. Runtime performance may be
a little slower in cases where the C++ optimizer does a really good job on the
existing outer inline Length() check and the strings are mostly not equal. I
doubt we'll notice.
I wrote a script which basically does
s/\bEquals\(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)\)/Equals\($2\)/ over the entire
tree. The resulting build seems to work fine :-). I can't measure the
performance impact because it was a debug build, though. I won't attach the
resulting patch since it's absolutely gigantic :-).
Comment on attachment 136051 [details] [diff] [review]
as described

Something to try in 1.7a, but I may as well get it in the queue so I don't
forget about it :-).
Attachment #136051 - Flags: superreview?(dbaron)
Attachment #136051 - Flags: review?(scc)
Keywords: footprint
Comment on attachment 136051 [details] [diff] [review]
as described

You need a new patch for this against the current string code.

I'm not crazy about the method on nsAString -- I think it should probably be
named something like EqualsASCII, since assertions about non-ASCII characters
wouldn't be hit by most developers even if the string were a UTF-8 string
coming from user input.
Attachment #136051 - Flags: superreview?(dbaron)
Attachment #136051 - Flags: superreview-
Attachment #136051 - Flags: review?(scc)
so, we currently have:

  PRBool nsACString::Equals(const char *) const;

and,

  PRBool nsAString::Equals(const char *) const;

sounds a lot like:

  PRBool nsString::EqualsWithConversion(const char *, ... ) const;
Yeah, except that the performance can be good. I'll whip up a new patch with
EqualsASCII on both.
EqualsWithConversion does not inflate the argument before comparing.  It uses
nsStringObsolete.cpp's Compare2to1 method, which is pretty efficient.  It sounds
to me like the two methods should share the same implementation.  I don't mind
if you do away with some of the nsStringObsolete code ;-)
Attached patch woohoo, check this out (obsolete) — Splinter Review
Okay, it's late and maybe I got carried away
Attachment #136051 - Attachment is obsolete: true
Attached patch revised (obsolete) — Splinter Review
it was definitely late, because that patch was off by one.
The name should be changed to EqualsLiteral.

darin: oh. well, the old string guide was incorrect then.

I think the amount of code that could be reused by nsStringObselete is fairly
trivial...
oooh... i like using templates to get the string literal's length.  i wonder if
we are going to run into portability problems with that :-/
Comment on attachment 145257 [details] [diff] [review]
revised

>Index: xpcom/string/public/nsCharTraits.h

>+    compareASCII( const char_type* s1, const char* s2, size_t n )
...
>+            NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character");

the same assertion does not exist for the |char_type = char| case.
isn't that inconsistent?


>Index: xpcom/string/public/nsTAString.h

>+        /**
>+         * An efficient comparison with ASCII that can be used even for
>+         * wide strings.
>+         */
>+      NS_COM PRBool EqualsASCII( const char* data, int len ) const;

s/int/size_type/


>+      template<int N>
>+      inline PRBool EqualsASCII(const char (&str)[N]) const {
>+        return EqualsASCII(str, N-1);
>+      }

nit: use SCC-style indentation for consistency:

  template<int N>
  inline PRBool EqualsASCII( const char (&str)[N] ) const
    {
      return EqualsASCII(str, N-1);
    }


>Index: xpcom/string/public/nsTSubstring.h

>+      NS_COM PRBool EqualsASCII( const char* data, int len ) const;
>+      template<int N>
>+      inline PRBool EqualsASCII(const char (&str)[N]) const {
>+        return EqualsASCII(str, N-1);
>+      }

same nits apply here.


r=me w/ those changes.
Attachment #145257 - Flags: review-
Attached patch revised (obsolete) — Splinter Review
revised as per darin's comments.

We ought to prepare the fallback code. What's the cleanest way of selecting
between the template code and some strlen code per-platform?
Attachment #145257 - Attachment is obsolete: true
Attached patch revised^2Splinter Review
This adds an easy way to turn off the use of templates to match literals. I've
built with the templates disabled and that works.
Attachment #145447 - Attachment is obsolete: true
Comment on attachment 145975 [details] [diff] [review]
revised^2

Carrying forward darin's r+. David, if you're OK with this code then I think we
can go ahead and check it in on an otherwise quiet evening and see what
happens...
Attachment #145975 - Flags: superreview?(dbaron)
Attachment #145975 - Flags: review+
Comment on attachment 145975 [details] [diff] [review]
revised^2

The NSCAP_ prefix has always been used only for nsCOMPtr.  Perhaps just use
NS_?

#ifdef-ing includes is a recipe for bustage.  However, nsCharTraits.h already
includes <string.h>, so I think you should just remove those #ifdef-ed includes
completely.

with those changes, sr=dbaron
Attachment #145975 - Flags: superreview?(dbaron) → superreview+
Sure. I'll do those things and check in sometime soon when the tree is quiet and
I have time to babysit. Thanks!
checked in. here we go
Comment on attachment 145975 [details] [diff] [review]
revised^2

>+#ifdef NSCAP_DISABLE_LITERAL_TEMPLATE
>+      inline PRBool EqualsLiteral( const char* str ) const
>+        {
>+          return EqualsASCII(str, strlen(str));
>+        }
>+#else
>+      template<int N>
>+      inline PRBool EqualsLiteral( const char (&str)[N] ) const
>+        {
>+          return EqualsASCII(str, N-1);
>+        }
>+#endif
Do you suppose this template would work on MSVC?
template<class S>
inline PRBool EqualsLiteral( S &str ) const
  {
    return EqualsASCII(str, sizeof str - 1);
  }
I don't know if that will work, but I'd like someone with a VC++ compiler to
tell me.

I'd also like to know whether VC++ 7 can handle the current template.

In the meantime I'm going to convert over a file or two so we can see what the
code size impact is.
Here are a couple of call sites converted to EqualsLiteral. The nsPresShell
patch uses nsAString::EqualsLiteral and the nsObjectFrame patch uses
nsACString::EqualsLiteral. I'll check these two parts in separately so we can
see the effect of each.
Attachment #146955 - Flags: superreview?(dbaron)
Attachment #146955 - Flags: review?(dbaron)
> I'd also like to know whether VC++ 7 can handle the current template.

VC++ 7.1 (as found in Visual Studio .Net 2003 Enterprise) seems to be able to
compile the template. I backed out your latest patch to fix windows tinderbox
bustage. xpcom.dll just built without errors. Dunno about VC++ 7.0 though.
Alright, I'll make the test check _MSC_VER < 1310.
If I were to try this  ...

    char chars[8] = "foo";
    nsCString buf(chars);
    printf("%s\n", buf.EqualsLiteral(chars) ? "true" : "false");

... it would compile without errors. And I would see false on a platform which
is using the template and true on one which not. No?

Documenting how not to use this method might prevent those sorts of abuses. 
Good point, but I think this is not very likely. Anyway I will add some
documentation explaining that you really need to pass a literal string, not a
reference to a variable or any other token.
Comment on attachment 146955 [details] [diff] [review]
experiments for code size

beware that code size experiments with small samples are sometimes misleading
(inlined functions output separately once they're used, perhaps, or things like
that)
Attachment #146955 - Flags: superreview?(dbaron)
Attachment #146955 - Flags: superreview+
Attachment #146955 - Flags: review?(dbaron)
Attachment #146955 - Flags: review+
OK. Stuff checked in. I guess the next step will be to use a script to whack all
of layout and see what effect that has on code size. I'll apply the following
two regular expressions:

s/\bEquals\(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)\)/EqualsLiteral\($2\)/
s/\b== \(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)/.EqualsLiteral\($2\)/

There are also instances of '!= NS_LITERAL_STRING' but I'll have to fix those up
by hand.

How much review do we want for these changes?
Results from the Firefox tinderboxes:

Linux redwood: Zdiff:-208 (+0/-208)
MacOSX imola: Zdiff:+16 (+156/-140)
WINNT5 beast: Zdiff:+74 (+304/-230)

beast is using the strlen version. Too bad we don't have a VC++7.1 tinderbox.
Seamonkey tinderboxes:

Linux brad: Zdiff:-336 (+2/-338)
Linux luna: Zdiff:-224 (+0/-224)
That's a whole lot for just 3 string compares.

Maybe we should experiment with making the non-template version non-inline.
This can be used by the non-templated EqualsLiteral, and also by client code
that has a char* or char[] that may not be a literal.

It's even plausible that it would be a win to do away with the template
altogether and just use this. We can test that later...
Comment on attachment 147309 [details] [diff] [review]
provide a fast compare-with-null-terminated-ASCII

Note that I've tested this by converting layout (patch forthcoming) to
EqualsLiteral and then building with the template disabled.
Attachment #147309 - Flags: superreview?(dbaron)
Attachment #147309 - Flags: review?(dbaron)
This used the regexps that I noted plus by-hand conversion of the != operators.
Plus, uses of Equals(NS_LITERAL_(C?)STRING(...)) where the string was not a
literal were converted to EqualsASCII.
Comment on attachment 147310 [details] [diff] [review]
layout conversion

Up to you how much you want to review vs rubberstamp :-)
Attachment #147310 - Flags: superreview?(dbaron)
Attachment #147310 - Flags: review?(dbaron)
Attachment #147309 - Flags: superreview?(dbaron)
Attachment #147309 - Flags: superreview+
Attachment #147309 - Flags: review?(dbaron)
Attachment #147309 - Flags: review+
Attachment #147310 - Flags: superreview?(dbaron)
Attachment #147310 - Flags: superreview+
Attachment #147310 - Flags: review?(dbaron)
Attachment #147310 - Flags: review+
The rough numbers (aaronl checked in a small patch at the same time):

Firefox Linux (redwood): Zdiff:-14126 (+151/-14277)
Firefox MacOSX (imola): Zdiff:-12288 (+720/-13008)
Firefox WinNT5 (beast): Zdiff:-19180 (+1895/-21075)

Seamonkey Linux (brad): Zdiff:-13670 (+1310/-14980)
Seamonkey Linux (luna): Zdiff:-15874 (+194/-16068)

It also appears to have reduced Tp.

Definitely worth converting the rest of the codebase! We should probably do an
AppendASCII/AppendLiteral as well.
More things that should be done after this:

provide AppendASCII/AppendLiteral
provide EqualsASCIICaseInsensitive/EqualsLiteralCaseInsensitive
darin, dbaron, does one of you want to rubber-stamp this?
those tag changes in NSS don't need to be there.
How did you create the patch?  I'd rather review an automated process than
review the patch.
It was mostly done using these commands:

find . -name '*.cpp' | xargs perl -pi -e
's/\bEquals\(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)\)/EqualsLiteral\($2\)/g'
find . -name '*.h' | xargs perl -pi -e
's/\bEquals\(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)\)/EqualsLiteral\($2\)/g'
find . -name '*.cpp' | xargs perl -pi -e 's/ ==
NS_LITERAL_(C?)STRING\((\"[^"]*\"\)/.EqualsLiteral\($2\)/g'
find . -name '*.h' | xargs perl -pi -e 's/ ==
NS_LITERAL_(C?)STRING\((\"[^"]*\"\)/.EqualsLiteral\($2\)/g'
find . -name '*.cpp' | xargs perl -pi -e 's/\b(\w+) !=
NS_LITERAL_(C?)STRING\((\"[^"]*\")\)/!$1.EqualsLiteral\($3\)/g'
find . -name '*.h' | xargs perl -pi -e 's/\b(\w+) !=
NS_LITERAL_(C?)STRING\((\"[^"]*\")\)/!$1.EqualsLiteral\($3\)/g'
find . -name '*.h' | xargs perl -pi -e
's/\bEquals\(NS_LITERAL_(C?)STRING\(([^\)]+)\)\)/EqualsASCII\($2\)/g'
find . -name '*.cpp' | xargs perl -pi -e
's/\bEquals\(NS_LITERAL_(C?)STRING\(([^\)]+)\)\)/EqualsASCII\($2\)/g'

I also hand-edited a few files:
htmlparser/src/nsParser.cpp
parser/htmlparser/src/nsParser.cpp (not sure which is the real one, I edited
them both)
xpfe/components/winhooks/nsWindowsHooksUtil.cpp
-      contentType.Equals(NS_LITERAL_CSTRING(UNKNOWN_CONTENT_TYPE)) ||
+      contentType.EqualsASCII(UNKNOWN_CONTENT_TYPE) ||

you could use EqualsLiteral here.  UNKNOWN_CONTENT_TYPE is just a macro.


in fact, i think there are other cases like this in the patch.  i would
recommend searching the patch for cases where EqualsASCII is used, and fixup as
appropriate.
I would rather use EqualsASCII for the places where it is not clear looking just
at the line that it's a literal. Otherwise we create potential for confusion.
The gain from using EqualsLiteral over EqualsASCII is probably quite small, anyway.
(In reply to comment #44)
> I would rather use EqualsASCII for the places where it is not clear looking just
> at the line that it's a literal. Otherwise we create potential for confusion.
> The gain from using EqualsLiteral over EqualsASCII is probably quite small,
anyway.

why is that more confusing than the code it replaces?

  NS_LITERAL_CSTRING(FOO) -> EqualsLiteral(FOO)
good point. Let's reduce the confusion :-)
Honestly, we could convert those to EqualsLiteral, and I don't really mind if
someone does it, but it's just extra work for dubious performance gain and I'm
not really motivated to do it.
The EqualsASCIIIgnoreCase stuff is hard because xpcom can't depend on intl, so
we can't have functions in xpcom that convert Unicode to lower case. Any
suggestions?
I believe that in any case when we want to be doing EqualsASCIIIgnoreCase we
want to be doing locale-insensitive conversion.  So no intl involved...
So is it true that the only Unicode character that locale-independent lower-case
maps to 'e' is 'E'?
I think it's more like "if we're doing a EqualsASCIIIgnoreCase and we have any
non-ASCII characters, we're not equal".  And then it's a matter of doing a
case-insensitive comparison between two characters that are both in the ascii
range... (still have to do locale-independent because of things like the 'I' and
'i' business in Turkish).
Comment on attachment 147482 [details] [diff] [review]
Assign, Append and EqualsIgnoreCase variants

Seems to work after I converted layout to use these functions.
Attachment #147482 - Flags: superreview?(darin)
Attachment #147482 - Flags: review?(darin)
Comment on attachment 147411 [details] [diff] [review]
drop tag changes in NSS

Darin, you said you'd review or rubber-stamp this?
Attachment #147411 - Flags: superreview?(darin)
Attachment #147411 - Flags: review?(darin)
Comment on attachment 147482 [details] [diff] [review]
Assign, Append and EqualsIgnoreCase variants

>+    static
>+    int
>+    compareASCIIIgnoreCase( const char_type* s1, const char* s2, size_t n )
>+      {
>+        for ( ; n--; ++s1, ++s2 )
>+          {
>+            NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character");
>+            if ( !eq_int_type(to_int_type(ASCIIToLower(*s1)),
>+                              to_int_type(tolower(*s2))) )
>+              return to_int_type(ASCIIToLower(*s1))
>+                - to_int_type(tolower(*s2));
>+          }
>+
>+        return 0;
>+      }
Alternative loop body suggestion:
  if (*s1 & ~0x7F)
    return 1;
  int_type c1 = tolower(*s1);
  int_type c2 = tolower(*s2);
  if (!eq_int_type(c1, c2))
    return c1 - c2;
If you don't like the return 1; then you have the choice of
  return *s1 - *s2;
or
  int_type c2 = tolower(*s2);
  if (*s1 & ~0x7F)
    return *s1 - c2;
  int_type c1 = tolower(*s1);
etc.
Those are fine. It should all come to the same thing really.
You can't use tolower() for ascii comparisons, last I checked -- it's
locale-aware.  See bug 159328 for the havoc that can ensue.
Cripes.

Alright, I'll do it by hand inline in ASCIIToLower.
(In reply to comment #58)
>Alright, I'll do it by hand inline in ASCIIToLower.
Does that include inlining ASCIIToLower into compareASCIIIgnoreCase?
nsCRT::toLower does what you want.
> nsCRT::toLower does what you want.

nsCRT::ToLower isn't inline, unfortunately.

> Does that include inlining ASCIIToLower into compareASCIIIgnoreCase?

That should happen, since ASCIIToLower is an inline method of nsCharTraits.
(In reply to comment #61)
>>Does that include inlining ASCIIToLower into compareASCIIIgnoreCase?
>That should happen, since ASCIIToLower is an inline method of nsCharTraits.
Yes, sorry for overlooking that.
Sorry for the huge delay roc... I do plan to spend some time reviewing these
patches early next week.
Attached patch updated (obsolete) — Splinter Review
Fixes use of tolower as discussed here.

Also, I changed the IgnoreCase methods to assume that the literal/ASCII string
is already lower-case.
Attachment #147482 - Attachment is obsolete: true
(In reply to comment #64)
>Also, I changed the IgnoreCase methods to assume that the literal/ASCII string
>is already lower-case.
That reminds me of an ignore case comparison routine I had in which I knew that
I would never be comparing symbols so I just used (ch | 0x20) to force the
character to lower case.
Note that there are currently callers of EqualsIgnoreCase that does pass strings
with uppercase characters. So I would recommend either a more descriptive name,
or at least some assertions to make sure that that isn't done for these new methods.
Depends on: 242382
Such assertions are in the patch.

When the time comes for mass conversion, it's simple enough to write a regular
expression that only converts for literals that have no upper-case characters.
Comment on attachment 147411 [details] [diff] [review]
drop tag changes in NSS

>Index: editor/composer/src/nsComposerCommands.cpp

>+  PRBool    doingAll = (allStr.EqualsLiteral("all"));

kill excess parantheses.


>Index: editor/libeditor/html/nsHTMLAbsPosition.cpp

>+  PRBool isPositioned = (positionStr.EqualsLiteral("absolute"));

ditto.


>Index: extensions/cookie/nsPermissionManager.cpp
...
>+      if (lineArray[0]->EqualsASCII(kMatchTypeHost) &&

you can use EqualsLiteral here.


>Index: htmlparser/src/CNavDTD.cpp

>-    if(PR_TRUE==aParserContext.mMimeType.Equals(NS_LITERAL_CSTRING(kHTMLTextContentType))) {
>+    if(PR_TRUE==aParserContext.mMimeType.EqualsASCII(kHTMLTextContentType)) {

I think it would be good to use EqualsLiteral for these.  Again, I don't
understand why you are not mapping Equals(NS_LITERAL_...) to EqualsLiteral.
Clearly if the argument was a literal before it will still be a literal
now, so there is no point mapping to EqualsASCII.  Why not use EqualsLiteral
whenever you can?
>>Index: htmlparser/src/CNavDTD.cpp

I just removed the code in question, so it's not something to worry about that
callsite anymore.  ;)
> Again, I don't understand why you are not mapping Equals(NS_LITERAL_...) to
> EqualsLiteral.

OK OK I give. I'll convert all ...(NS_LITERAL(...)) patterns.
> OK OK I give. I'll convert all ...(NS_LITERAL(...)) patterns.

r+sr=darin with that change.  sorry for not saying that earlier :-(
Comment on attachment 147411 [details] [diff] [review]
drop tag changes in NSS

marking r- to reflect the fact that roc said he would be landing a different
patch.	again, r+sr=me for the version that does a straight conversion from
NS_LITERAL to EqualsLiteral.
Attachment #147411 - Flags: superreview?(darin)
Attachment #147411 - Flags: review?(darin)
Attachment #147411 - Flags: review-
Comment on attachment 147482 [details] [diff] [review]
Assign, Append and EqualsIgnoreCase variants

>Index: xpcom/string/public/nsCharTraits.h

>+    compareASCIIIgnoreCase( const char_type* s1, const char* s2, size_t n )
>+      {
>+        for ( ; n--; ++s1, ++s2 )
>+          {
>+            NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character");
>+            if ( !eq_int_type(to_int_type(ASCIIToLower(*s1)),
>+                              to_int_type(tolower(*s2))) )
>+              return to_int_type(ASCIIToLower(*s1))
>+                - to_int_type(tolower(*s2));
>+          }

in the not-equals case you pay the cost of calling tolower twice. 
wouldn't it be better to compute once and store in the results in a local
variable?

    for ( ; n--; ++s1, ++s2 )
      {
	NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character");
	int i1 = to_int_type(ASCIIToLower(*s1));
	int i2 = to_int_type(tolower(*s2));
	if ( !eq_int_type(i1, i2) )
	  return i1 - i2;
      }

it also looks a lot cleaner IMO ;-)


>+    compareASCIINullTerminatedIgnoreCase( const char_type* s1, size_t n, const char* s2 )
...
>+            if ( !eq_int_type(to_int_type(ASCIIToLower(*s1)),
>+                              to_int_type(tolower(*s2))) )
>+              return to_int_type(ASCIIToLower(*s1))
>+                - to_int_type(tolower(*s2));

same idea applies here too.


>+    compareASCIIIgnoreCase( const char_type* s1, const char* s2, size_t n )
...
>+            if ( tolower(*s1) != tolower(*s2) )
>+              return to_int_type(tolower(*s1)) - to_int_type(tolower(*s2));

same thing here.


>+    compareASCIINullTerminatedIgnoreCase( const char_type* s1, size_t n, const char* s2 )
...
>+            if ( tolower(*s1) != tolower(*s2) )
>+              return to_int_type(tolower(*s1)) - to_int_type(tolower(*s2));

and here.


>Index: xpcom/string/public/nsTSubstring.h

>+      NS_COM void AssignASCII( const char* data, size_type length );
>+      NS_COM void AssignASCII( const char* data );

it seems like these can be simple inline wrappers around the normal
Assign methods in the case where char_type is char.  that has the
advantage of avoiding extra DSO entry points.  no need to be redundant,
right?

same thing applies to the Append variants as well as the methods
declared on nsTAString.

we could also add ReplaceLiteral and InsertLiteral, but maybe those
are less common cases?


>Index: xpcom/string/src/nsTSubstring.cpp


> void
>+nsTSubstring_CharT::AssignASCII( const char* data, size_type length )
>+  {
>+    // A Unicode string can't depend on an ASCII string buffer,
>+    // so this dependence check only applies to CStrings.
>+#ifdef CharT_is_char
>+    if (IsDependentOn(data, data + length))
>+      {
>+        // take advantage of sharing here...
>+        Assign(string_type(data, length));

... and in this codepath you don't even call copyASCII -- and
that's fine -- so, no reason to both implementing AssignASCII
as anything more than an inline wrapper around the normal 
Assign method when CharT_is_char is defined.


overall, this patch looks good.  thanks for the fine work roc!	sorry,
it took me forever to review this :-(


r+sr=darin (with the changes i suggested)
Attachment #147482 - Flags: superreview?(darin)
Attachment #147482 - Flags: superreview+
Attachment #147482 - Flags: review?(darin)
Attachment #147482 - Flags: review+
I'll do all you suggest except:

> it seems like these can be simple inline wrappers around the normal
> Assign methods in the case where char_type is char.  that has the
> advantage of avoiding extra DSO entry points.  no need to be redundant,
> right?

I want to check (in debug builds) that that parameter string really is ASCII,
for consistency with the AString case.

> we could also add ReplaceLiteral and InsertLiteral, but maybe those
> are less common cases?

I think so. That can be future work if merited.

> ... and in this codepath you don't even call copyASCII -- and that's fine

When the parameter is a literal, the IsDependentOn test will fail and we will
always take the copyASCII path. I would like to retain the ASCII path here. I
think the space/time impact of the extra methods will be small. I'm going to go
out on a limb and check it in as is.

If you violently disagree, let me know and I'll fix it up afterward.
Oh wait. I think you reviewed an earlier version of the patch. Attachment 147694 [details] [diff]
has slightly different code, more efficient by assuming the literal is already
lowercase. I'll attach an update with your comments (mostly) taken into account.
Attached patch udpated^2Splinter Review
Attachment #147964 - Attachment is obsolete: true
Comment on attachment 149104 [details] [diff] [review]
udpated^2

one more time :-)
Attachment #149104 - Flags: superreview?(darin)
Attachment #149104 - Flags: review?(darin)
I checked in the conversion of Seamonkey to EqualsLiteral. Here are the final
commands that I used, in case someone wants to convert Firefox code or some
other code:

find . | egrep '\.(cpp|h)$' | xargs perl -pi -e
's/\bEquals\(NS_LITERAL_(C?)STRING\(([^)]*)\)\)/EqualsLiteral\($2\)/g'
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/ ==
NS_LITERAL_(C?)STRING\(([^)]*)\)/.EqualsLiteral\($2\)/g'
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\b(\w+) !=
NS_LITERAL_(C?)STRING\(([^)]*)\)/!$1.EqualsLiteral\($3\)/g'
Numbers this time:

Firefox WINNT 5.0 beast Clbr: Zdiff:-15268
Firefox MacOSX Darwin 6.8 imola Dep: Zdiff:-20656 (+19200/-39856) mZdiff:-16608
(+7440/-24048)
Firefox Linux redwood Dep: Zdiff:-30957 (+181/-31138) mZdiff:-23824 (+71/-23895)

Seamonkey Linux brad Clbr: Zdiff:-39175 (+645/-39820) mZdiff:-25456 (+224/-25680)
Seamonkey Linux luna Dep: Zdiff:-40583 (+384/-40967) mZdiff:-25992 (+89/-26081)
Comment on attachment 149104 [details] [diff] [review]
udpated^2

> compareLCASCIIIgnoreCase

i think this function deserves a comment explaining what it does.

you have a comment that mentions EqualsLiteralIgnoreCase, but the code names
the function EqualsLCLiteralIgnoreCase. also, i have to say that i find these
names very cumbersome.	is there no better name that we can come up with?

why don't we just state that all string literals passed to the ignore-case
versions need to be lower case?  then the function names wouldn't need the LC
acronym.

EqualsLiteralIC might be better than EqualsLCLiteralIgnoreCase IMO.
> why don't we just state that all string literals passed to the ignore-case
> versions need to be lower case?

That sounds OK for the Literal versions but for the generic ASCII versions it
seems a bit less suitable. Maybe we just have to deal.

How about just dropping the LC and having EqualsASCIIIgnoreCase,
EqualsLiteralIgnoreCase? EqualsLiteralIC sounds fine but EqualsASCIIIC doesn't...
I'm fine with EqualsLiteralIgnoreCase and EqualsASCIIIgnoreCase.
want me to make a new patch?
Comment on attachment 149104 [details] [diff] [review]
udpated^2

no need for a new patch.  just drop the LC everywhere, and document the cases
where the input string must be lowercase.

r+sr=darin with that.
Attachment #149104 - Flags: superreview?(darin)
Attachment #149104 - Flags: superreview+
Attachment #149104 - Flags: review?(darin)
Attachment #149104 - Flags: review+
Hmm, that's bad. Thanks for reminding me.

How about LowerCaseEqualsASCII, LowerCaseEqualsLiteral?

foo->LowerCaseEqualsLiteral("womble")
foo->LowerCaseEqualsASCII(str)
> Hmm, that's bad. Thanks for reminding me.

yeah, i guess so... though we do have an assertion ;-)


> foo->LowerCaseEqualsLiteral("womble")
> foo->LowerCaseEqualsASCII(str)

does that imply:

  foo->LowerCaseEqualsLiteralIgnoreCase("womble");
  foo->LowerCaseEqualsASCIIIgnoreCase(str);

my fingers hurt :-/
> does that imply:
>
>  foo->LowerCaseEqualsLiteralIgnoreCase("womble");
>  foo->LowerCaseEqualsASCIIIgnoreCase(str);

Nope, LowerCaseEqualsLiteral and LowerCaseEqualsASCII are the full names

In case I wasn't the only one who didn't get it:

  "LowerCase of |this| equals some literal"

I think I like these function names a lot.
OK, last step: convert codebase to use these functions.

Here's what I did:

# Update obvious calls to Append and Assign
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e
's/\bAppend\(NS_LITERAL_(C?)STRING\(([^)]*)\)\)/AppendLiteral\($2\)/g'
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e
's/\bAssign\(NS_LITERAL_(C?)STRING\(([^)]*)\)\)/AssignLiteral\($2\)/g'

# Change Append/AssignWithConversion of a literal to Append/AssignLiteral
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e
's/\bAppendWithConversion\(("[^"]*")\)/AppendLiteral\($1\)/g'
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e
's/\bAssignWithConversion\(("[^"]*")\)/AssignLiteral\($1\)/g'

# Update simple uses of operator+= and operator= to Append/AssignLiteral
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/ \+=
NS_LITERAL_(C?)STRING\(([^)]*)\);/.AppendLiteral\($2\);/g'
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/ =
NS_LITERAL_(C?)STRING\(([^)]*)\);/.AssignLiteral\($2\);/g'

# Convert EqualsIgnoreCase of literals to LowerCaseEqualsLiteral
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e
's/\bEqualsIgnoreCase\(("[^"]*")\)/"LowerCaseEqualsLiteral\(".lc($1)."\)"/eg'

# Convert Equals(NS_LITERAL..., nsCaseInsensitiveStringComparator()) to
LowerCaseEqualsLiteral
find . | egrep '\.(cpp|h)$' | xargs perl -0 -pi -e
's/\bEquals\(NS_LITERAL_(C?)STRING\(("[^"]*")\),[
\n\t]*nsCaseInsensitiveStringComparator\(\)\)/"LowerCaseEqualsLiteral\(".lc($2)."\)"/egs'

# Fix up <type> <identifier>.AssignLiteral
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/String
(\w+)\.AssignLiteral/String $1; $1\.AssignLiteral/g'

# Manually restore calls to nsIFile::Append in nsJVMConfigManagerUnix.cpp and
# nsLayoutStyleSheetCache.cpp, CBrowserShell.cpp, nsEmbedGlobalHistory.cpp,
# mozPersonalDictionary.cpp, mozMySpell.cpp, myspAffixMgr.cpp, pyloader.cpp,
# testXalan.cpp, nsBayesianFilter.cpp, nsBookmarksService.cpp, winEmbed.cpp

I'll attach the resulting patch.
Attached file tada!
One thing to consider is whether LowerCaseEqualsLiteral makes the code less
readable by forcing the RHS to be lower-cased.

This patch by no means covers all the cases that can take advantage of the new
functions. It's the easily-automated low hanging fruit.

Note that this file is gzipped because the uncompressed patch is too big for
Bugzilla's little mind.
Comment on attachment 150174 [details]
tada!

here you go. by my count, 2252 call sites were changed. You may wish to just
review the script :-)
Attachment #150174 - Flags: superreview?(darin)
Attachment #150174 - Flags: review?(darin)
Ignore the Makefile change in that patch. I won't check it in.
Hum. nsCaseInsensitiveStringComparator is Unicode aware so it's not generally
correct to replace it with LowerCaseEqualsASCII. I will take out those changes.
Another option would be to make LowerCaseEqualsASCII on wide strings be Unicode
aware. It doesn't look that hard. What is the rationale for having the case
converter be an XPCOM service?
Hmm. We only care about Unicode characters that can lowercase-map to ASCII.
Looking at the Unicode data files to see what Unicode characters that means, I find:

-- I-with-dot maps to ASCII i followed by a combining dot; the combining dot is
not ASCII so it can't match.
-- Similarly for Lithuanian accented I and J
-- In Turkic and Azeri locales, I-with-dot maps to plain ASCII i, but would we
really want this comparison to be locale dependent? Argh. I don't think we do it
locale ind
-- 0x212A, KELVIN SIGN, maps to ASCII k

I18N is fun!

Looking at the current case converter (and assuming that there's not some magic
alternative service that might be selected at run time; if so it doesn't appear
to be in our tree), we map I-with-dot to plain ASCII i and KELVIN SIGN to ASCII
k, and that's it. So it would be trivial to have LowerCaseEquals code preserve
this behaviour (in fact the change will go in
nsCharTraits<PR_Unichar>::ASCIIToLower). I'll do that, and leave the
mass-API-update patch as is.
(In reply to comment #97)
> Hmm. We only care about Unicode characters that can lowercase-map to ASCII.

wouldn't it be easier to map the ASCII string to unicode, in the lower- and
uppercase variant? for ASCII, it seems that it's trivial to find the uppercase
equivalent in basically all cases...

(unless you want i to uppercase to I-with-dot in the turkish locale, which I
doubt you do)
(In reply to comment #98)
> wouldn't it be easier to map the ASCII string to unicode, in the lower- and
> uppercase variant? for ASCII, it seems that it's trivial to find the uppercase
> equivalent in basically all cases...

I don't know what you mean.

The plan in my previous comment is simple and will preserve current behaviour.
please make sure the i18n people are in on this. They might have plans to add
support for more characters.
From my reading of the Unicode tables, there really is no other significant case
unless we apply specific rules for Lithuanian locales, but CCing smontagu anyway
just in case :-)
As described. This should guarantee that LowerCaseEqualsLiteral(...) (applied
to a lowercase literal) behaves identically to Equals(...,
nsCaseInsensitiveStringComparator()).
Attachment #150243 - Flags: superreview?(darin)
Attachment #150243 - Flags: review?(darin)
I don't think that these changes are valid. The old way was correct and is what
would be expected from a function called 'ASCIIToLower'. Since you are
explicitly only lowering ascii characters, and the literals are only supposed to
be ASCII characters, the comparison should fail when matching anything other
than ascii characters. 
Maybe we should change the name of that function from ASCIIToLower to something
else. But the comment indicates what it's supposed to do and it does do what
it's supposed to do.
Are you sure it's wise to add i18n code to xpcom?  What if we learn that our
uconv code needs to be changed... will the folks making those changes know to
update xpcom?  Why not implement the nsSubstring::LowerCaseEqualsLiteral symbol
outside of xpcom?  Why not put it inside unicharutils along side
nsCaseInsensitiveStringComparator?  It seems reasonable to me to force consumers
to link to unicharutils to get this functionality.
It's only about six lines of code, based directly on the Unicode spec, unlikely
to ever need changing. I don't think it's worth the pain of moving those lines
out into unicharutils, nor the complexity of having nsSubstring split across two
DLLs.
(In reply to comment #99)
> I don't know what you mean.

I meant: Instead of taking the unicode character and converting it to lowercase,
you could take the ASCII character, and convert it to unicode (which is as
simple as a cast), if that doesn't match uppercase the ascii character and
convert it to unicode and compare. that way, you need not worry about unicode
lowercasing rules, I think.
biesi's suggestion, clarified on IRC, amounts to exactly what the currently
checked in code does.
> It's only about six lines of code, based directly on the Unicode spec, unlikely
> to ever need changing.

the unicode spec isn't static but changes from version to version with new
characters added.
Re: comment 105

If this is to be included in xpcom, then comments should be added to both this
location and the appropriate uconv location so that updates in one place will
find the other.
(In reply to comment #109)
> > It's only about six lines of code, based directly on the Unicode spec,
> > unlikely to ever need changing.
> 
> the unicode spec isn't static but changes from version to version with new
> characters added.

The chances that the Unicode people add some hitherto-obscure characters which
get downcase-mapped to ASCII are extremely low, especially considering there are
only two such characters today. The chances that accidentally failing to handle
 case-insensitive comparision of such characters would be noticeable, let alone
regarded as a serious issue, are even more negligible. In any case I can add
comments to the unicharutils code pointing to the nsCharTraits code. Can we just
do the simple thing to make all the Equals(...,
nsCaseInsensitiveStringComparator()) calls turn into nice fast code, and close
this bug out?
Comments are sufficient as far as I'm concerned.
Comment on attachment 150243 [details] [diff] [review]
fix nsCharTraits<PR_Unichar> for full I18N correctness

r+sr=darin

Please add comments to this code regarding the Unicode to ASCII conversion. 
Also, there's no reason to put an "else" after "return 'k';"

I don't need to see another patch for comments, but please don't hesitate to be
verbose in the comments here and in intl/ :-)
Attachment #150243 - Flags: superreview?(darin)
Attachment #150243 - Flags: superreview+
Attachment #150243 - Flags: review?(darin)
Attachment #150243 - Flags: review+
Comment on attachment 150174 [details]
tada!

well done!

r+sr=darin
Attachment #150174 - Flags: superreview?(darin)
Attachment #150174 - Flags: superreview+
Attachment #150174 - Flags: review?(darin)
Attachment #150174 - Flags: review+
all checked in. Not much Tp impact, still waiting for Tinderbox size numbers.
Thanks all!
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
http://www.mozilla.org/projects/xpcom/string-guide.html
will need to be updated for these changes.
Reduced codesize by about 70K on brad.
Yes, I should update the string guide, but I'm not sure how.
I've got an alternative to the template based solution for EqualsLiteral and
family. The current code breaks VC++ 2003 anyway. The existing solution is a bit
edgy template wise. The solution I propose is a bit more main stream, I believe
and will allow this optimization to be handled on more compilers. I'm not
touching the NS_DISABLE_LITERAL_TEMPLATE macro at this point, but I suspect it
can be less restrictive if not removed all together. People with the various
platforms need to test this out. Patch coming up.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Alternative template solution (obsolete) — Splinter Review
Oh, and swalker reports that VC++8 won't compile this either.
If you do

const char* foo = "blah";
bork.EqualsLiteral(foo)

does it compile?
Oh, so were you using this template mechanism to weed out passing string 
litterals versus by char pointers

I think I found what was tripping up the VC++ 2003. It was the use of the 
terinary operator in the nsIWin32LocalImpl.cpp. So this didn't cause the 
problem at all.

Putting this back to FIXED, I'll provide a patch in another bug for the 
nsIWin32LocalImp
Status: REOPENED → RESOLVED
Closed: 20 years ago20 years ago
Resolution: --- → FIXED
Attachment #162659 - Attachment is obsolete: true
Component: String → XPCOM
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: