Implement CharTokenizer which takes arbitrary chars as delimiters

RESOLVED FIXED in mozilla9

Status

()

Core
DOM
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: Shawn Gong, Assigned: Shawn Gong)

Tracking

Trunk
mozilla9
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 3 obsolete attachments)

(Assignee)

Description

6 years ago
Currently we only have nsCharSeperatedTokenizer and nsWhitespaceTokenizer, which have their specific use cases. 

The nsCharTokenizer can take any char as a delimiter and tokenize the string based on that.
(Assignee)

Comment 1

6 years ago
Created attachment 546903 [details] [diff] [review]
Referenced from nsWhitespaceTokenizer.h

Tested using the MozURL and tests passed.
Attachment #546903 - Flags: review?(jonas)
Could you make that mozilla::{,C}CharTokenizer? (I guess a template wouldn't work here, because of the fake templating we do for our strings.)
(Assignee)

Comment 3

6 years ago
Created attachment 546921 [details] [diff] [review]
Changed to mozilla:CharTokenizer
Attachment #546903 - Attachment is obsolete: true
Attachment #546921 - Flags: review?(jonas)
Attachment #546903 - Flags: review?(jonas)
(Assignee)

Updated

6 years ago
Summary: nsCharTokenizer which takes arbitrary chars as delimiters → Implement CharTokenizer which takes arbitrary chars as delimiters
Comment on attachment 546921 [details] [diff] [review]
Changed to mozilla:CharTokenizer

Review of attachment 546921 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/ds/CharTokenizer.h
@@ +45,5 @@
> +
> +class CharTokenizer
> +{
> +public:
> +    CharTokenizer(const nsSubstring& aSource, PRUnichar delimiter)

Make the delimiter a template argument rather than a constructor argument.

@@ +52,5 @@
> +      aSource.BeginReading(mIter);
> +      aSource.EndReading(mEnd);
> +
> +      while (mIter != mEnd && isDelimiter(*mIter)) {
> +        ++mIter;

This is not what you want. For a string like "&foo&bar" it'll just create two tokens, "foo" and "bar", whereas you want "", "foo" and "bar".

@@ +75,5 @@
> +        ++mIter;
> +      }
> +      nsSubstring::const_char_iterator end = mIter;
> +      while (mIter != mEnd && isDelimiter(*mIter)) {
> +        ++mIter;

Again, this is not what you want. For a string like "foo&&bar" it'll create the two tokens "foo" and "bar", whereas you want "foo", "" and "bar".
Attachment #546921 - Flags: review?(jonas) → review-
(Assignee)

Comment 5

6 years ago
Created attachment 546960 [details] [diff] [review]
Corrected the problems
Attachment #546921 - Attachment is obsolete: true
Attachment #546960 - Flags: review?(jonas)
Comment on attachment 546960 [details] [diff] [review]
Corrected the problems

Review of attachment 546960 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good
Attachment #546960 - Flags: review?(jonas) → review+
Can we have some tests?
(Assignee)

Comment 8

6 years ago
(In reply to comment #7)
> Can we have some tests?

Definitely. I will put tests soon.
(Assignee)

Comment 9

6 years ago
(In reply to comment #7)
> Can we have some tests?

Hey Mounir, it turned out that creating new C++ unit tests with nsString is troublesome, so I will test the tokenizer's results in JavaScript in the related bug 668680.
Why is it troublesome? I mean, we should probably open a bug if we can't test that kind of stuff: we should be able to!
Comment on attachment 546960 [details] [diff] [review]
Corrected the problems

>--- /dev/null
>+++ b/xpcom/ds/CharTokenizer.h
>+#ifndef __CharTokenizer_h
>+#define __CharTokenizer_h

Please make that mozilla_CharTokenizer_h
>+#endif /* __CharTokenizer_h */
>+
>+

And get rid of these empty lines.

>diff --git a/xpcom/ds/Makefile.in b/xpcom/ds/Makefile.in
>--- a/xpcom/ds/Makefile.in
>+++ b/xpcom/ds/Makefile.in
>@@ -108,16 +108,17 @@ EXPORTS		= \
>+		CharTokenizer.h \

Please add it to EXPORTS_mozilla instead
(Assignee)

Comment 12

6 years ago
Created attachment 548290 [details] [diff] [review]
Fixes according to Ms2ger's comments
Attachment #546960 - Attachment is obsolete: true
(Assignee)

Comment 13

6 years ago
(In reply to comment #10)
> Why is it troublesome? I mean, we should probably open a bug if we can't
> test that kind of stuff: we should be able to!

Will look into it later

Updated

6 years ago
Assignee: nobody → shawn
Blocks: 668680
OS: Mac OS X → All
Hardware: x86 → All
Version: unspecified → Trunk
(Assignee)

Comment 14

6 years ago
Comment on attachment 548290 [details] [diff] [review]
Fixes according to Ms2ger's comments

Review+ by sicking
Attachment #548290 - Flags: review+
Checked in to inbound:

http://hg.mozilla.org/integration/mozilla-inbound/rev/edb32fa29bca

Yay!
http://hg.mozilla.org/mozilla-central/rev/edb32fa29bca
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla9
(In reply to Shawn Gong from comment #0)
> Currently we only have nsCharSeperatedTokenizer and nsWhitespaceTokenizer,
> which have their specific use cases. 
> 
> The nsCharTokenizer can take any char as a delimiter and tokenize the string
> based on that.

That sounds exactly like what nsCharSeperatedTokenizer does, aside from that nsCharSeperatedTokenizer does whitespace-skipping (which is configurable, since you can pass in a custom "is whitespace" method, and you could pass in a method that unconditionally returns false to prevent whitespace-skipping).

Aside from that, are there any other things that distinguish nsCharTokenizer from nsCharSeperatedTokenizer?
Actually, it looks like CharTokenizer isn't used at all, aside from a chunk of recently-added code in SVGFragmentIdentifier.cpp  (as well as in the awaiting-review patch in bug 668680).

I suspect both of those could easily use a variant of nsCharSeperatedTokenizer, and this class CharTokenizer class could go away... I'm just worried that having two nearly-identically-named classes with different impls for the same use-case will cause confusion & increased surface for bugs to crop up. Happy to file a new bug on that; I'm just commenting here to check if I'm misunderstanding the use-cases / purpose of CharTokenizer, before doing that.
You need to log in before you can comment on or make changes to this bug.