Last Comment Bug 672641 - Implement CharTokenizer which takes arbitrary chars as delimiters
: Implement CharTokenizer which takes arbitrary chars as delimiters
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: DOM (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla9
Assigned To: Shawn Gong
:
Mentors:
Depends on:
Blocks: 668680
  Show dependency treegraph
 
Reported: 2011-07-19 14:07 PDT by Shawn Gong
Modified: 2012-08-21 12:32 PDT (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Referenced from nsWhitespaceTokenizer.h (4.90 KB, patch)
2011-07-19 14:10 PDT, Shawn Gong
no flags Details | Diff | Splinter Review
Changed to mozilla:CharTokenizer (4.93 KB, patch)
2011-07-19 15:05 PDT, Shawn Gong
jonas: review-
Details | Diff | Splinter Review
Corrected the problems (4.46 KB, patch)
2011-07-19 18:21 PDT, Shawn Gong
jonas: review+
Details | Diff | Splinter Review
Fixes according to Ms2ger's comments (4.41 KB, patch)
2011-07-25 14:35 PDT, Shawn Gong
hippopotamus.gong: review+
Details | Diff | Splinter Review

Description Shawn Gong 2011-07-19 14:07:49 PDT
Currently we only have nsCharSeperatedTokenizer and nsWhitespaceTokenizer, which have their specific use cases. 

The nsCharTokenizer can take any char as a delimiter and tokenize the string based on that.
Comment 1 Shawn Gong 2011-07-19 14:10:38 PDT
Created attachment 546903 [details] [diff] [review]
Referenced from nsWhitespaceTokenizer.h

Tested using the MozURL and tests passed.
Comment 2 :Ms2ger (⌚ UTC+1/+2) 2011-07-19 14:45:46 PDT
Could you make that mozilla::{,C}CharTokenizer? (I guess a template wouldn't work here, because of the fake templating we do for our strings.)
Comment 3 Shawn Gong 2011-07-19 15:05:45 PDT
Created attachment 546921 [details] [diff] [review]
Changed to mozilla:CharTokenizer
Comment 4 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-07-19 15:43:44 PDT
Comment on attachment 546921 [details] [diff] [review]
Changed to mozilla:CharTokenizer

Review of attachment 546921 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/ds/CharTokenizer.h
@@ +45,5 @@
> +
> +class CharTokenizer
> +{
> +public:
> +    CharTokenizer(const nsSubstring& aSource, PRUnichar delimiter)

Make the delimiter a template argument rather than a constructor argument.

@@ +52,5 @@
> +      aSource.BeginReading(mIter);
> +      aSource.EndReading(mEnd);
> +
> +      while (mIter != mEnd && isDelimiter(*mIter)) {
> +        ++mIter;

This is not what you want. For a string like "&foo&bar" it'll just create two tokens, "foo" and "bar", whereas you want "", "foo" and "bar".

@@ +75,5 @@
> +        ++mIter;
> +      }
> +      nsSubstring::const_char_iterator end = mIter;
> +      while (mIter != mEnd && isDelimiter(*mIter)) {
> +        ++mIter;

Again, this is not what you want. For a string like "foo&&bar" it'll create the two tokens "foo" and "bar", whereas you want "foo", "" and "bar".
Comment 5 Shawn Gong 2011-07-19 18:21:32 PDT
Created attachment 546960 [details] [diff] [review]
Corrected the problems
Comment 6 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-07-20 08:49:16 PDT
Comment on attachment 546960 [details] [diff] [review]
Corrected the problems

Review of attachment 546960 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good
Comment 7 Mounir Lamouri (:mounir) 2011-07-20 11:34:24 PDT
Can we have some tests?
Comment 8 Shawn Gong 2011-07-20 11:48:05 PDT
(In reply to comment #7)
> Can we have some tests?

Definitely. I will put tests soon.
Comment 9 Shawn Gong 2011-07-20 17:15:47 PDT
(In reply to comment #7)
> Can we have some tests?

Hey Mounir, it turned out that creating new C++ unit tests with nsString is troublesome, so I will test the tokenizer's results in JavaScript in the related bug 668680.
Comment 10 Mounir Lamouri (:mounir) 2011-07-20 23:07:43 PDT
Why is it troublesome? I mean, we should probably open a bug if we can't test that kind of stuff: we should be able to!
Comment 11 :Ms2ger (⌚ UTC+1/+2) 2011-07-23 06:49:35 PDT
Comment on attachment 546960 [details] [diff] [review]
Corrected the problems

>--- /dev/null
>+++ b/xpcom/ds/CharTokenizer.h
>+#ifndef __CharTokenizer_h
>+#define __CharTokenizer_h

Please make that mozilla_CharTokenizer_h
>+#endif /* __CharTokenizer_h */
>+
>+

And get rid of these empty lines.

>diff --git a/xpcom/ds/Makefile.in b/xpcom/ds/Makefile.in
>--- a/xpcom/ds/Makefile.in
>+++ b/xpcom/ds/Makefile.in
>@@ -108,16 +108,17 @@ EXPORTS		= \
>+		CharTokenizer.h \

Please add it to EXPORTS_mozilla instead
Comment 12 Shawn Gong 2011-07-25 14:35:01 PDT
Created attachment 548290 [details] [diff] [review]
Fixes according to Ms2ger's comments
Comment 13 Shawn Gong 2011-07-25 14:36:12 PDT
(In reply to comment #10)
> Why is it troublesome? I mean, we should probably open a bug if we can't
> test that kind of stuff: we should be able to!

Will look into it later
Comment 14 Shawn Gong 2011-09-08 15:16:23 PDT
Comment on attachment 548290 [details] [diff] [review]
Fixes according to Ms2ger's comments

Review+ by sicking
Comment 15 Jonas Sicking (:sicking) No longer reading bugmail consistently 2011-09-08 18:17:58 PDT
Checked in to inbound:

http://hg.mozilla.org/integration/mozilla-inbound/rev/edb32fa29bca

Yay!
Comment 17 Daniel Holbert [:dholbert] 2012-08-21 12:23:30 PDT
(In reply to Shawn Gong from comment #0)
> Currently we only have nsCharSeperatedTokenizer and nsWhitespaceTokenizer,
> which have their specific use cases. 
> 
> The nsCharTokenizer can take any char as a delimiter and tokenize the string
> based on that.

That sounds exactly like what nsCharSeperatedTokenizer does, aside from that nsCharSeperatedTokenizer does whitespace-skipping (which is configurable, since you can pass in a custom "is whitespace" method, and you could pass in a method that unconditionally returns false to prevent whitespace-skipping).

Aside from that, are there any other things that distinguish nsCharTokenizer from nsCharSeperatedTokenizer?
Comment 18 Daniel Holbert [:dholbert] 2012-08-21 12:32:07 PDT
Actually, it looks like CharTokenizer isn't used at all, aside from a chunk of recently-added code in SVGFragmentIdentifier.cpp  (as well as in the awaiting-review patch in bug 668680).

I suspect both of those could easily use a variant of nsCharSeperatedTokenizer, and this class CharTokenizer class could go away... I'm just worried that having two nearly-identically-named classes with different impls for the same use-case will cause confusion & increased surface for bugs to crop up. Happy to file a new bug on that; I'm just commenting here to check if I'm misunderstanding the use-cases / purpose of CharTokenizer, before doing that.

Note You need to log in before you can comment on or make changes to this bug.