Closed Bug 74786 Opened 24 years ago Closed 22 years ago

String cleanup

Categories

(Core :: XSLT, defect, P5)

defect

Tracking

()

VERIFIED FIXED

People

(Reporter: axel, Assigned: peterv)

References

Details

Attachments

(1 file, 11 obsolete files)

19.41 KB, patch
axel
: review+
Details | Diff | Splinter Review
After the recent mozilla string landing, it's time to take care about the
transformiix strings once again.
Good reasons:
code size, remove unused code
performance, less stack frames (, and closer signature to nsString?)

I plan to do this in two steps:
propose which methods go, and if new should come
cut down TxStrings.* and sanitize MozillaString.cpp

First issue I'd like to raise is the String::charAt() function. This might
end up in a expensive method in the current stringclasses, so I propose going
for iterators instead.
Those would give simple wrappers, and I like the idea of iterators in the lexer
anyway.
See http://lxr.mozilla.org/seamonkey/source/string/public/nsStringIterator.h 
for something we might want to have.
Do we want writing iterators? or is .replace() good for now?
This is called in CharacterData and StringFunctionCall.cpp, the looks like
it's loving writing iterators, the CharacterData::replaceData replaces only
strings, not single chars, so I would say keep that replace function.

I would like to get rid of all but one lastIndexOf, probably this (as the
replace function) for the standalone only.

input? more folks to CC?

Axel
Priority: -- → P5
moving this bug to be about the numerous patches to move from String to
nsA(Flat)String.
First, get the cast operators to return nsAFlatString, so that we can move some
interfaces to nsAFlatString instead of nsAString, cause their iterators are much
cheaper.
Status: NEW → ASSIGNED
Depends on: 157142
Summary: TxString class needs cleanup → String cleanup
Mike, could you test attachement 108749 on OS/2 or AIX?
It changes casts, and the IBM compilers are the most picky ones in that respect.
thanx. It just needs a rebuild in extensions/transformiix
Attachment #108749 - Flags: superreview+
Attachment #108749 - Flags: review?(bugmail)
Comment on attachment 108749 [details] [diff] [review]
change cast operator to nsAFlatString

r=sicking
Attachment #108749 - Flags: review?(bugmail) → review+
Nope.

First error:

Nope. First error:

D:/builds/current/mozilla/extensions/transformiix/source/base/Double.cpp(177:36)
 : error EDC3226: Object of type "NS_LossyConvertUCS2toASCII" cannot be construc
ted from "const String" expression.
Micheal, could you try this one, too?
It just adds the cast operators to nsAFlatString and leaves the ones to
nsAString
intact. Not sure if this will make your compiler think it has ambiguous casts.

I tested it on windows.
Attachment #108749 - Attachment is obsolete: true
Yep that compiles no problem
Comment on attachment 108767 [details] [diff] [review]
add nsAFlatString casts instead of changing the nsAString ones

carrying r=Jonas, sr=peterv from attachement 108749
Attachment #108767 - Flags: superreview+
Attachment #108767 - Flags: review+
Comment on attachment 108767 [details] [diff] [review]
add nsAFlatString casts instead of changing the nsAString ones

suckage, gcc 2.96 can't cope with this.
backed out of the tree, we have to go different paths :-(
Attachment #108767 - Attachment is obsolete: true
Taking this while I hunt String.
Assignee: axel → peterv
Status: ASSIGNED → NEW
Attached patch Rename API (part 1) (obsolete) — Splinter Review
This renames most of the API to be in sync with Mozilla strings. This was done
with regexps and s///, plus some manual tweaks to get it to compile. This is
just part 1.
Comment on attachment 109777 [details] [diff] [review]
Rename API (part 1)

I got r=sicking, rs=jst on this.
Attachment #109777 - Flags: superreview+
Attachment #109777 - Flags: review+
Status: NEW → ASSIGNED
I killed most of String, but I'd like to do some DOM cleanup first. Attaching
two patches, one for stuff part of the mozilla build for which I'll get r=/sr=,
the other for stuff part of the standalone build for which I'll only get r=. 
Attachment #110861 - Flags: review?(bugmail)
Comment on attachment 110861 [details] [diff] [review]
DOM API cleanup (Stuff part of the default build)

Couldn't you remove the ProcessingInstruction class we well?

r=sicking anyhow
Attachment #110861 - Flags: review?(bugmail) → review+
Comment on attachment 110862 [details] [diff] [review]
DOM API cleanup (Stuff for standalone)

stuff looks good, and compiles. it even runs ;-). I still like to see a change
in how attribute values are done, but that's not this patch.
r=axel@pike.org
Attachment #110862 - Flags: review+
Attachment #110861 - Flags: superreview?(jst)
Comment on attachment 110861 [details] [diff] [review]
DOM API cleanup (Stuff part of the default build)

sr=jst
Attachment #110861 - Flags: superreview?(jst) → superreview+
Attachment #109777 - Attachment is obsolete: true
Attachment #110861 - Attachment is obsolete: true
Attachment #110862 - Attachment is obsolete: true
Attached patch Remove String from module (obsolete) — Splinter Review
Three patches coming up. I'm really sorry for the size. The first is the patch
for module, will need r/sr, the second will be for standalone, will need r and
the third is just the files that are being removed.
I'd appreciate any help I can get to get this into 1.3b.
Attached patch Remove String from standalone (obsolete) — Splinter Review
Comment on attachment 111171 [details] [diff] [review]
Files removed as part of String removal

r=me

Hey, i'm fast :-)
Attachment #111171 - Flags: review+
General comments:

There are a lot of arguments that are nsAFlatString& where I probably would have
just used nsAString&s. Some of these force you to use PromiseFlatString. However
my string-fu isn't strong enough to say which is better.

In a lot of places where you use FindChar we should use iterators, however i'm
more then fine with leaving that to a later patch.

XMLUtils::getLocalPart and XMLUtils::getPrefix could combined into a single
function that produces two atoms. Again, that's for later.

There's a few calls to aPs->receiveError that are much longer then 80 columns,
you might wanna add some newlines.


in source/base/txURIUtils.h:
-    static void getDocumentURI(const String& href, String& docUri);
+    static void getDocumentURI(const nsAString& href, nsAString& docUri);

You don't implement this function.

-    static void getFragmentIdentifier(const String& href, String& frag);
I'm not so sure that we wanna kill getDocumentURI and getFragmentIdentifier. I
suspect that in the compiled-stylesheets world we will have more then a single
point where we get documents (i.e. no ProcessorState::retrieveDocument).

Though at that point i guess we could just write the logic needed instead of
using the functions, or we could reimplement the functions then. Your call.

-void XMLUtils::getLocalPart(const String& src, String& dest)
+const nsAString& XMLUtils::getLocalPart(const nsAString& src)

This looks dangerous, won't the |return Substring(...| cause an
nsDependantString to be create, and then you return a reference to that, and
then it goes out of scope? Shouldn't you return an nsDependantString _object_
instead?


in source/xml/XMLUtils.h
-    txAtom* mLocalName;
+    nsCOMPtr<nsIAtom> mLocalName;

did you make sure that we don't do
|myExpandedName.mLocalName = foo; NS_IF_ADDREF(foo);| anywhere? Or
|myExpandedName.mLocalName = TX_GET_ATOM(foo);|

+    /**
+     * Returns true if the given node's value has only whitespace characters
+     */
+    static PRBool isWhitespace(Node* aNode);

please say something about that it is the DOM-nodevalue and not the
XPath-nodevalue that is being used. I.e. that for elements we don't check the
nodevalue of the descendant-textnodes.


in source/xml/dom/mozImpl/MozillaNode.cpp:
+nsresult Node::getBaseURI(nsAString& aURI)

clear the string if the QI fails

in source/xpath/ExprLexer.cpp:
-        if (pattern.subString(start,iter,subStr).Equals(AND))
+        if (TX_StringEqualsAtom(Substring(pattern, start, iter),
+                                txXPathAtoms::_and))
can you remove the |subStr| variable compleatly?

-      addToken(new Token(pattern.subString(start,iter,subStr),defType));
+      addToken(new Token(Substring(pattern, start, iter),defType));
whitespace

-      addToken(new Token(pattern.subString(start,iter,subStr),Token::NUMBER));
+      addToken(new Token(Substring(pattern, start, iter),Token::NUMBER));
whitespace


in source/xpath/ExprResult.h:
+    nsString& getStringRef()
+    {
+        return value;
+    }
I would have just made |value| public (and renamed it mValue)

in source/xslt/XSLTProcessor.cpp:
+    txHTMLOutput::AddRefTables();
What's this? It's not implemented anywhere.

+    if (!stylesheetStack.EnumerateBackwards(FindUri, NS_CONST_CAST(nsAString*,
&aHref))) {

eek, i don't like const-cast. However i can't really see any better way :(

-        aPs->getEnteredStylesheets()->pop();
+        stylesheetStack.RemoveStringAt(stylesheetStack.Count());

shouldn't that be .Count()-1 ?  (same thing at two places)


+                if (attValue.Equals(NS_LITERAL_STRING("html"))) {

You could use TX_StringEqualsAtom, but i'm not sure which is best. Which raises
the question of weather all TX_StringEqualsAtom should be replaced with
.Equals(NS_LITERAL_STRING);


in source/xslt/txPatternParser.cpp:
-                String& name = aLexer.nextToken()->value;
-                txAtom* nameAtom = TX_GET_ATOM(name);
+                nsCOMPtr<nsIAtom> nameAtom =
TX_GET_ATOM(aLexer.nextToken()->value);

This will leak, use do_GetAtom

nevermind the txHTMLOutput::AddRefTables comment, i forgot there was another
patch :(
I would prefer txHTMLOutput::AddRefTables/txHTMLOutput::ReleaseTables as
txHTMLOutput::init txHTMLOutput::shutdown since that is how we do other
functions like that. ::init (or ::AddRefTables if you wanna keep it as is)
should be able to return an error in case of OOM.

+            if (target.Equals(NS_LITERAL_STRING("xml-stylesheet")) ||
+                target.Equals(NS_LITERAL_STRING("xml:stylesheet"))) {
eek! do we support xml:stylesheet on standalone!?!  mind removing that while
you're there?


+  while (start != end) {
+    SKIP_WHITESPACE(start, end);
+    iter = start;
+    SKIP_ATTR_NAME(iter, end);
+
+    // Remember the attr name.
+    const nsAString & attrName = Substring(start, iter);
This doesn't look safe, when will the returned substring go out of scope?
|nsDependentSubstring attrName = ...| should be safer.

with that, r=sicking
(comment 24 is for the module-patch, comment 25 is for the standalone patch)
bah, i ment comment 23 is for... aahh whatever, you know what i mean
general C++, assigning temps to const refs is fine. At least I recall reading 
that.

Standalone comments:

parseCommandline doesn't need to start at 0

main:
make that sourceOption instead of option, like styleOption

+        // XXX TODO: Handle multiple stylesheets
do we want that this way?
One could argue to have multiple inputs, like three inputs, three outputs
and one stylesheet to feed three sources thru one stylesheet and three
stylesheets to feed them thru each corresponding one.
Or should we say "feed many sources thru one processor" and init that
with as many stylesheets as given?

-nsresult txExpandedName::init(const String& aQName,
+nsresult txExpandedName::init(const nsAFlatString& aQName,

nsAString, as you don't need flatness here.

txStandaloneXSLTProcessor::parseStylesheetPI should take a
nsAFlatString, and use const_char_iterators.

standalone says that the functions in XMLUtils should stay separate, and
we need the string returning localname.

General perf notes, so that I don't forget them.

TX_GET_ATOM_STRING needs a separate patch, we're doing copies that ain't needed.
There are places with nsString() where we could have either other
constructors (and not a default argument)(txAttribute is one candiate)
and some which create txCaseInsensitiveStringComparator often, that
could be a member.
A few loops in the standalone output handlers need attention if they
should get a PromiseFlatString and char_iterators.
-double Double::toDouble(const String& aSrc)
+double Double::toDouble(const nsAFlatString& aSrc)

this needs just nsAString, as we're using a chunked sink

Index: source/base/primitives.h
+class nsAFlatString;
not needed, see above

txURIUtils.cpp

+    while (iter > 0) {
+        if (href[--iter] == HREF_PATH_SEP) {
+            dest.Append(Substring(href, 0, iter));
+            break;
         }
     }
you should do the 
const_char_iterator temp; 
href.BeginReading(temp);
at the start and then use temp[--iter].
BeginReading is costly.

I'm fine with the nsAFlatStrings here.

XMLUtils

{txExpandedName::init(const nsAFlatString& aQName,
needs only nsAString, AFAICT.
make the isValidQName call a promise, until we fix that.
(as CharAt is still too expensive on nsASingleFragmentString,
due to BeginReading.)}
this can be done when we do those right. we should use
copy_string and just iterate over the string once.

There is a pattern to do
+            nsAutoString err(tok->value);
             err.Append(NS_LITERAL_STRING(" not implemented."));
             return new StringExpr(err);
which should read
return new StringExpr(tok->value + NS_LITERAL_STRING(" not implemented."));

saves us a copy.

StringResult.cpp

 StringResult::StringResult(const nsAString& str) {
     //-- copy str
-    this->value.Append(str);
+    this->value = str;
 } //-- StringResult

make that a 
 StringResult::StringResult(const nsAString& str) : value(str)
{
}
?

nsXPathResult.h
-        String* mStringValue;
+        nsString* mStringValue;
nsAString* ? then we can choose a different string impl later.

txNameTest::toString

PRUnichar* prefix;
mPrefix->GetUnicode(&prefix);
aDest.Append(nsDependentString(prefix));

txXSLTProcessor.cpp:
+    if (!stylesheetStack.EnumerateBackwards(FindUri, NS_CONST_CAST(nsAString*,
&aHref))) {
should probably use &txXSLTProcessor::FindUri, looking at the latest 
bustage of BSD and OS/2

txPatternParser::createPattern
+                nsCOMPtr<nsIAtom> nameAtom =
TX_GET_ATOM(aLexer.nextToken()->value);
tststs, do_GetAtom.
void TX_ToLowerCase(nsAString& aString);
 instead of
void TX_ToLowerCase(const nsAString& aString);
is required to compile standalone.
Parser needs
        Node* node = ps->document->createTextNode(Substring((const PRUnichar*)s,
(const PRUnichar*)s + len));
instead of
        Node* node = ps->document->createTextNode(nsDependentString((const
PRUnichar*)s, len));

test.xml is broken at not(position()=3)
the |Substring|s in ExprLexer are all wrong. Substring takes string, start, length
or start and end iterator. Yeehaw.
I tried atom leak stats, and test.xml is now leaking COLOR and xsl, no idea why.
Other tests seem to be fine.
xalan on standalone:
regressions: 15
fixes: 5
(4 of the regressions are string functions, I still compare to nonXPCOM 
standalone)

> Which raises the question of weather all TX_StringEqualsAtom should be
> replaced with .Equals(NS_LITERAL_STRING);
dare. NS_LITERAL_STRING is expensive, at least on those platforms where 
wstrings suck.

> There are a lot of arguments that are nsAFlatString& where I probably would have
> just used nsAString&s. Some of these force you to use PromiseFlatString. However
> my string-fu isn't strong enough to say which is better.

I found one instance where nsAFlatString was wrong (Double::toDouble).
The others are ok. txExpandedName shouldn't take nsAFlatString, but fixing
that to use copy_string is for later.

> In a lot of places where you use FindChar we should use iterators, however i'm
> more then fine with leaving that to a later patch.
not. nsAString::FindChar is as good as it's gonna get.

> XMLUtils::getLocalPart and XMLUtils::getPrefix could combined into a single
> function that produces two atoms. Again, that's for later.
another function is fine, but leave it alone right now. 
with these changes, I run xalan without (new) regressions on solaris
standalone.
It doesn't cover all comments on the patches.
Beware of Double.cpp, that patch applied funny to the trunk. (which holds a 
string sink landing to Double::toDouble.)
regressions: 4 (string functions with empty second argument, known)
fixes: 5
axes95 now succeeds
axes99 now succeeds
namedtemplate12 now succeeds
string130 now succeeds
output105 now succeeds

Note, I did not run error tests, as always. Jonas comments suggest that we 
should. (I can imagine the Count()-1, for example)
>in source/xml/XMLUtils.h
>-    txAtom* mLocalName;
>+    nsCOMPtr<nsIAtom> mLocalName;
>
>did you make sure that we don't do
>|myExpandedName.mLocalName = foo; NS_IF_ADDREF(foo);| anywhere? Or
>|myExpandedName.mLocalName = TX_GET_ATOM(foo);|

I think so, yes.

> parseCommandline doesn't need to start at 0

I don't understand what you mean.

> main:
> make that sourceOption instead of option, like styleOption

But I use it for output and source.

> +        // XXX TODO: Handle multiple stylesheets
> do we want that this way?

Not clear yet. I would like the options of:

one source, many stylesheets
many sources, one stylesheet
many sources, many stylesheets (two options here actually)

>-nsresult txExpandedName::init(const String& aQName,
>+nsresult txExpandedName::init(const nsAFlatString& aQName,
>
>nsAString, as you don't need flatness here.

I do since XMLUtils::isValidQName takes nsAFlatString.

>txURIUtils.cpp
>
>+    while (iter > 0) {
>+        if (href[--iter] == HREF_PATH_SEP) {
>+            dest.Append(Substring(href, 0, iter));
>+            break;
>         }
>     }
>you should do the 
>const_char_iterator temp; 
>href.BeginReading(temp);
>at the start and then use temp[--iter].
>BeginReading is costly.

Like this? I don't really see how it is better.

    nsAFlatString::const_char_iterator temp;
    href.BeginReading(temp);
    PRUint32 iter = href.Length();
    while (iter > 0) {
        if (temp[--iter] == HREF_PATH_SEP) {
            dest.Append(Substring(href, 0, iter));
            break;
        }
    }

>txXSLTProcessor.cpp:
>+    if (!stylesheetStack.EnumerateBackwards(FindUri, NS_CONST_CAST(nsAString*,
>&aHref))) {
>should probably use &txXSLTProcessor::FindUri, looking at the latest 
>bustage of BSD and OS/2

Why? Which bustage?
Attached patch Remove String from module (obsolete) — Splinter Review
Attachment #111169 - Attachment is obsolete: true
Attached patch Remove String from standalone (obsolete) — Splinter Review
Attachment #111170 - Attachment is obsolete: true
Attachment #111346 - Attachment is obsolete: true
Attachment #111516 - Flags: review?(bugmail)
Attachment #111517 - Flags: review?(axel)
+    while (*theAtts) {
+        newElement->setAttribute(nsDependentString((const PRUnichar*)*theAtts++),
+                                 nsDependentString((const PRUnichar*)*theAtts++));
     }
needs to create the dependent strings before hand. At least win32 is creating
them in the wrong order, so the increments and the strings get messed up.
My tree has
    while (*theAtts) {
        const nsAString& depname =
            nsDependentString((const PRUnichar*)*theAtts++);
        const nsAString& depvalue =
            nsDependentString((const PRUnichar*)*theAtts++);
        newElement->setAttribute(depname, depvalue);
    }

XSLTProcessor and ProcessorState have each two NS_LITERAL_STRING("\'"), 
guess you meant NS_LITERAL_STRING("'")

with those changes, standalone compiles and runs on win32. The changes between
the patches look fine otherwise.
Ran buster testsuite on standalone and module.
Module: 0 regressions/3 fixes
Standalone: 0 regressions/1 fix

(The standalone numbers don't look too good but that's unrelated to this patch:
successes 819/failures 687)
Comment on attachment 111517 [details] [diff] [review]
Remove String from standalone

r=axel@pike.org with the changes I had above.
Attachment #111517 - Flags: review?(axel) → review+
Comment on attachment 111516 [details] [diff] [review]
Remove String from module

change "XPath" to "DOM" in the comment for isWhitespace.

r=sicking
Attachment #111516 - Flags: review?(bugmail) → review+
Attachment #111171 - Flags: superreview?(jst)
Attachment #111516 - Flags: superreview?(jst)
Comment on attachment 111171 [details] [diff] [review]
Files removed as part of String removal

sr=jst
Attachment #111171 - Flags: superreview?(jst) → superreview+
Comment on attachment 111516 [details] [diff] [review]
Remove String from module

sr=jst
Attachment #111516 - Flags: superreview?(jst) → superreview+
*** Bug 74985 has been marked as a duplicate of this bug. ***
Attached patch Better tokenizerSplinter Review
Non-copying tokenizer. Not sure if we want to keep this all in a header or move
the implementation to a .cpp.
Attachment #111171 - Attachment is obsolete: true
Attachment #111516 - Attachment is obsolete: true
Attachment #111517 - Attachment is obsolete: true
Attachment #111998 - Flags: review?(axel)
Comment on attachment 111998 [details] [diff] [review]
Better tokenizer

r=axel@pike.org, sweet foo
Attachment #111998 - Flags: review?(axel) → review+
Attachment #111998 - Flags: superreview?(jst)
Comment on attachment 111998 [details] [diff] [review]
Better tokenizer

sr=jst
Attachment #111998 - Flags: superreview?(jst) → superreview+
Closing this one. Further improvements should be separate bugs.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
mass verifying
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: