FindCharInSet 10x faster




17 years ago
17 years ago


(Reporter: bratell, Assigned: bratell)



Dependency tree / graph

Firefox Tracking Flags

(Not tracked)



(3 attachments)



17 years ago
Well, there are improvements to make among the obsolete string classes. I know 
they are supposed to be replaced, but since I thought of an algorithm for 
FindCharInSet that seems to work well in the parser, I wanted to try it for 
ns[C]String. Before my patch Quantify reported 10569ms spent in FindCharInSet, 
afterwards 866ms (12.2 times faster).

I hope you like the patch coming up. It cuts ~5% of the tokenizer time for the 
table stress case.

Comment 1

17 years ago
Created attachment 43912 [details] [diff] [review]
Patch to FindCharInSet

Comment 2

17 years ago
If one wants to improve the patch even further, one could make the caller supply 
the filter, but even though that was simple with an help class in the tokenizer, 
it probably complicates the API too much for the string library. 

Now 35% of the remaining time is spent calculating the filter. 

scc, what do you think?
Keywords: perf, review

Comment 3

17 years ago
Created attachment 43981 [details] [diff] [review]
New patch that removes unused functions

Comment 4

17 years ago
I noticed that Mozilla didn't use most of these functions at all so I took the 
liberty of removing them. What is the meaning of searching for a set of Unicode 
characters in a C string anyway?

So left is ns[C]String::[R]FindCharInSet(char *) and 
nsString::[R]FindCharInSet(PRUnichar *)

Gone is ns[C]String::[R]FindCharInSet(nsStr&), 
nsCString::[R]FindCharInSet(PRUnichar *) and nsStr::[R]FindCharInSet(...)

The functions that are left are very similar so maybe could they be implemented 
someway smarter. I don't want to introduce function calls though.

So, scc, can you review?

Comment 5

17 years ago
One more patch that doesn't hang mail. I had misunderstood the meaning of the 
offset parameter when searching backwards.
Assignee: scc → bratell


17 years ago
Target Milestone: --- → mozilla0.9.4

Comment 6

17 years ago
Created attachment 44295 [details] [diff] [review]
With working RFindCharInSet


17 years ago
OS: Windows 2000 → All
Hardware: PC → All

Comment 7

17 years ago
scc, could you comment on this. Is it worthy of a review?

Comment 8

17 years ago
No review yet so I have to move it to the next milestone.
Target Milestone: mozilla0.9.4 → mozilla0.9.5

Comment 9

17 years ago
I want this in, but have no time to push scc or jag or bryner or someone else
for reviews so I will clear the milestone. 
Target Milestone: mozilla0.9.5 → ---

Comment 10

17 years ago
jag, is this worth pursuing? I don't have the mastar plan for the Mozilla
strings so I don't know if this will fit. It's a smart and fast fix even though
I don't know how rotten it has become.

Comment 11

17 years ago
Daniel, I think it'll still be worth it, but cc'ing alecf who's been making some
changes in this code too.

Comment 12

17 years ago
this is cool, though I've whacked the hell out of this code in my tree.

Basically, I've change around some routines and broken the toplevel nsStr
routines into

as a part of bug 114450

Would you mind waiting until I land the patch there, and then reapply your
changes? I haven't bothered to see which ns*String:: routines are unused in the
tree, so after I land we can probably remove the same functions.


17 years ago
Depends on: 114450

Comment 13

17 years ago
ok, that bug has been fixed.. feel free to make a 2nd attempt, though you'll
notice that we now have 4 implementation of each *FindCharInSet

The good news is that 
1) I'm probably getting rid of some of them (FindCharInSet2in1 is probably going
away, for example)
2) case sensitivity is going away from all of the ones that involve PRUnichar,
such as FindCharInSet2in1, FindCharInSet1in2, etc.. unless the particular code
involves changing the case of the ascii string/character.

so you may want to stay tuned to bug 107575 before you make another attempt.


17 years ago
Depends on: 107575

Comment 14

17 years ago
Now that bug 107575 is fixed I want to make another go on this. One problem
though, the nsStr::(R)FindCharInSet routines have a aIgnoreCase parameter that I
want to remove. It isn't used so it should not be a problem. Can I do that? jag?

Comment 15

17 years ago
if its really not used, please do..again, nsStrPrivate is private, so you can go
tweaking that all you want, without breaking binary compatibility.
(besides, nobody ever said nsString was frozen, so we can continue to change
that API, though people hate it if it affects too many files :))

Comment 16

17 years ago
Go ahead an remove that parameter.

Comment 17

17 years ago
There's a patch for this attached in bug 75081.

Comment 18

17 years ago
should we just mark this one a dupe of the other, since the other covers this
bug, and more?

Comment 19

17 years ago
I was waiting for the reviewer's objects but I can always reopen this if I would
like to split it again. :-)

*** This bug has been marked as a duplicate of 75081 ***
Last Resolved: 17 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.