Closed Bug 672641 Opened 14 years ago Closed 14 years ago

Implement CharTokenizer which takes arbitrary chars as delimiters

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla9

People

(Reporter: hippopotamus.gong, Assigned: hippopotamus.gong)

References

Details

Attachments

(1 file, 3 obsolete files)

Currently we only have nsCharSeperatedTokenizer and nsWhitespaceTokenizer, which have their specific use cases. The nsCharTokenizer can take any char as a delimiter and tokenize the string based on that.
Tested using the MozURL and tests passed.
Attachment #546903 - Flags: review?(jonas)
Could you make that mozilla::{,C}CharTokenizer? (I guess a template wouldn't work here, because of the fake templating we do for our strings.)
Attached patch Changed to mozilla:CharTokenizer (obsolete) — Splinter Review
Attachment #546903 - Attachment is obsolete: true
Attachment #546921 - Flags: review?(jonas)
Attachment #546903 - Flags: review?(jonas)
Summary: nsCharTokenizer which takes arbitrary chars as delimiters → Implement CharTokenizer which takes arbitrary chars as delimiters
Comment on attachment 546921 [details] [diff] [review] Changed to mozilla:CharTokenizer Review of attachment 546921 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/ds/CharTokenizer.h @@ +45,5 @@ > + > +class CharTokenizer > +{ > +public: > + CharTokenizer(const nsSubstring& aSource, PRUnichar delimiter) Make the delimiter a template argument rather than a constructor argument. @@ +52,5 @@ > + aSource.BeginReading(mIter); > + aSource.EndReading(mEnd); > + > + while (mIter != mEnd && isDelimiter(*mIter)) { > + ++mIter; This is not what you want. For a string like "&foo&bar" it'll just create two tokens, "foo" and "bar", whereas you want "", "foo" and "bar". @@ +75,5 @@ > + ++mIter; > + } > + nsSubstring::const_char_iterator end = mIter; > + while (mIter != mEnd && isDelimiter(*mIter)) { > + ++mIter; Again, this is not what you want. For a string like "foo&&bar" it'll create the two tokens "foo" and "bar", whereas you want "foo", "" and "bar".
Attachment #546921 - Flags: review?(jonas) → review-
Attached patch Corrected the problems (obsolete) — Splinter Review
Attachment #546921 - Attachment is obsolete: true
Attachment #546960 - Flags: review?(jonas)
Comment on attachment 546960 [details] [diff] [review] Corrected the problems Review of attachment 546960 [details] [diff] [review]: ----------------------------------------------------------------- Looks good
Attachment #546960 - Flags: review?(jonas) → review+
Can we have some tests?
(In reply to comment #7) > Can we have some tests? Definitely. I will put tests soon.
(In reply to comment #7) > Can we have some tests? Hey Mounir, it turned out that creating new C++ unit tests with nsString is troublesome, so I will test the tokenizer's results in JavaScript in the related bug 668680.
Why is it troublesome? I mean, we should probably open a bug if we can't test that kind of stuff: we should be able to!
Comment on attachment 546960 [details] [diff] [review] Corrected the problems >--- /dev/null >+++ b/xpcom/ds/CharTokenizer.h >+#ifndef __CharTokenizer_h >+#define __CharTokenizer_h Please make that mozilla_CharTokenizer_h >+#endif /* __CharTokenizer_h */ >+ >+ And get rid of these empty lines. >diff --git a/xpcom/ds/Makefile.in b/xpcom/ds/Makefile.in >--- a/xpcom/ds/Makefile.in >+++ b/xpcom/ds/Makefile.in >@@ -108,16 +108,17 @@ EXPORTS = \ >+ CharTokenizer.h \ Please add it to EXPORTS_mozilla instead
Attachment #546960 - Attachment is obsolete: true
(In reply to comment #10) > Why is it troublesome? I mean, we should probably open a bug if we can't > test that kind of stuff: we should be able to! Will look into it later
Assignee: nobody → shawn
Blocks: 668680
OS: Mac OS X → All
Hardware: x86 → All
Version: unspecified → Trunk
Comment on attachment 548290 [details] [diff] [review] Fixes according to Ms2ger's comments Review+ by sicking
Attachment #548290 - Flags: review+
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla9
(In reply to Shawn Gong from comment #0) > Currently we only have nsCharSeperatedTokenizer and nsWhitespaceTokenizer, > which have their specific use cases. > > The nsCharTokenizer can take any char as a delimiter and tokenize the string > based on that. That sounds exactly like what nsCharSeperatedTokenizer does, aside from that nsCharSeperatedTokenizer does whitespace-skipping (which is configurable, since you can pass in a custom "is whitespace" method, and you could pass in a method that unconditionally returns false to prevent whitespace-skipping). Aside from that, are there any other things that distinguish nsCharTokenizer from nsCharSeperatedTokenizer?
Actually, it looks like CharTokenizer isn't used at all, aside from a chunk of recently-added code in SVGFragmentIdentifier.cpp (as well as in the awaiting-review patch in bug 668680). I suspect both of those could easily use a variant of nsCharSeperatedTokenizer, and this class CharTokenizer class could go away... I'm just worried that having two nearly-identically-named classes with different impls for the same use-case will cause confusion & increased surface for bugs to crop up. Happy to file a new bug on that; I'm just commenting here to check if I'm misunderstanding the use-cases / purpose of CharTokenizer, before doing that.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: