Add const nsACString& constructor to NS_Convert(ASCII|UTF8)toUCS2

RESOLVED FIXED

Status

()

RESOLVED FIXED
18 years ago
18 years ago

People

(Reporter: jag+mozbugs, Assigned: jag+mozilla)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 4 obsolete attachments)

(Reporter)

Description

18 years ago
For parity with NS_ConvertUCS2toUTF8, which has a constructor for nsAString&,
and because this just looks silly:

NS_ConvertASCIItoUCS2 uStr(aStr.get());
(Reporter)

Comment 1

18 years ago
Adding dependency (blocks bug 73009), removing silly cc.
Blocks: 73009

Comment 2

18 years ago
Targeting 1.1 since this isn't crucial, but it is a step in the right direction
for consistency and not too hard so I'd like to get it in before then if
possible.  If I make a tracking bug for handling encoding/conversion issues, I
may move this to block that tracker instead.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.1
(Reporter)

Comment 3

18 years ago
Created attachment 31191 [details] [diff] [review]
[patch] First crude attempt.

Comment 4

18 years ago
re-targeting milestones, starting from a clean slate
Target Milestone: mozilla1.1 → ---
I'm assuming a good bit of this is roughly copied, so I'm not reviewing too
closely...

+      NS_ERROR("not a UTF-8 string");

I'd prefer NS_ASSERTION to NS_ERROR.

+  nsReadingIterator<char> p, end;

I prefer |nsACString::const_iterator p, end;|.

+      explicit
+      NS_ConvertUTF8toUCS2( const nsACString& aCString )
+        {
+          Init( aCString );
+        }

It would be nice to also have:

      explicit
      NS_ConvertUTF8toUCS2( const nsAFlatCString& aCString )
        {
          Init( aCString.get() );
        }

to improve performance for things like nsCString and nsDependentString.

Also, you could have the same thing (also inline) for NS_ConvertASCIItoUCS2.

With that, r=dbaron.
(Assignee)

Comment 6

18 years ago
Created attachment 41917 [details] [diff] [review]
[patch] second attempt, incorporating dbaron's comments
(Assignee)

Comment 7

18 years ago
Taking this.

dbaron, could you review the new patch?
Assignee: scc → jaggernaut
Status: ASSIGNED → NEW
NS_ConvertASCIItoUCS2(const nsACString&)

  * should use nsACString::const_iterator rather than 
    nsReadingIterator<char>

  * should call aCString.Length() and expand the buffer at the beginning
    to avoid multiple allocations

NS_ConvertUTF8toUCS2(const nsACString&)

  * It would be nice to unify this with Init -- perhaps Init could take
    an nsACString& and you could use nsDependentString elsewhere?
    Making the const char* constructor avoid 3 passes would be a trick,
    though...

  * Both loops over the string would probably be better written as
    double for-loops - outer over fragments and inner over characters.

  * You should probably allow embedded nulls, right?  Or not?
(Assignee)

Comment 9

18 years ago
> NS_ConvertASCIItoUCS2(const nsACString&)
> 
>   * should use nsACString::const_iterator rather than 
>     nsReadingIterator<char>

done

>   * should call aCString.Length() and expand the buffer at the beginning
>     to avoid multiple allocations

done

> NS_ConvertUTF8toUCS2(const nsACString&)
> 
>   * It would be nice to unify this with Init -- perhaps Init could take
>     an nsACString& and you could use nsDependentString elsewhere?
>     Making the const char* constructor avoid 3 passes would be a trick,
>     though...

>   * Both loops over the string would probably be better written as
>     double for-loops - outer over fragments and inner over characters.

Let me think about these two.

>   * You should probably allow embedded nulls, right?  Or not?

Erh... scc?
(Assignee)

Updated

18 years ago
Status: NEW → ASSIGNED
Summary: Add const nsAReadableCString& constructor to NS_Convert(ASCII|UTF8)toUCS2 → Add const nsACString& constructor to NS_Convert(ASCII|UTF8)toUCS2
(Assignee)

Comment 10

18 years ago
Created attachment 48525 [details] [diff] [review]
third attempt
(Assignee)

Updated

18 years ago
Attachment #31191 - Attachment is obsolete: true
(Assignee)

Updated

18 years ago
Attachment #41917 - Attachment is obsolete: true
(Assignee)

Comment 11

18 years ago
I'm all ears on how to avoid the initial pass (in nsDependentCString) for the
|const char*| constructor of NS_ConvertUTF8toUCS2.
(Assignee)

Comment 12

18 years ago
>>nsString2.cpp:
>>
>>the iterators (ns{Reading,Writing}Iterator) ought to have a fragment_type so
>>that you don't have to use nsReadableFragment<char> explicitly.
>
>Next pass? New bug?
>
>>  +        nsReadableFragment<char> frag(start.fragment());
>>  
>>Maybe
>>
>>  const nsReadableFragment<char>& frag = start.fragment();
>>
>>would be better?
>
>I could use that.
>
>>How about an NS_ASSERTION(count == mLength, "calculator calculated
>>incorrect length"); at the end of nsConvertUTF8toUCS2::Init?
>
>Done.

Btw, I think this would only occur when we encounter a UTF8 string that isn't,
in which case we already error. If we're indeed fed utf8 straight from the
netwerk, we might want to deal with this more nicely.

>>I think we need to look at the error-handling of NS_ConvertUTF8toUCS2
>>carefully since we can expect it to be called on input from the network
>>(I would think, anyway, although perhaps not).
>
>I should do that.
>
>>+                else
>>+                  {
>>+                    ucs4 -= 0x00010000;
>>+                    *mBuffer++ = 0xD800 | (0x000003FF & (ucs4 >> 10));
>>+                    *mBuffer++ = 0xDC00 | (0x000003FF & ucs4);
>>+                  }
>>
>>Is this UCS2 or UTF-16 or what?
>
>I'll come back to you on this one :-)

Hmmm... This is UTF-16 [0], and it will mess us up (even in the old code) in
that the UTF8 calculator will only have assigned one character for this, and the
above writes out two. Eek!

[0] ftp://ftp.isi.edu/in-notes/rfc2781.txt

Comment 13

18 years ago
That code sure looks wrong to me. I've cc'd ftang and yokoyama, from whom I
stole the conversion code. Guys, we don't really do UTF-16, right?
(Assignee)

Comment 14

18 years ago
Created attachment 50925 [details] [diff] [review]
above patch with the UTF16 "support" removed.
Why bother maintaining mLength in ConvertUTF8toUCS2?  Or should it just be
|#ifdef DEBUG|?
The extra pass with nsDependentString calculating the length is
inefficient -- at the very least, don't bother with the ~PRUint32(0),
just use the one-parameter constructor, since they mean the same thing
and the latter is *much* clearer.  I guess, for simplicity, don't worry
about the extra pass for now, though -- it's also the best way to allow
embedded nulls in all the other cases, I think.  Same for the double
length calculation.


+                if ( ucs4 != 0xfeff ) // ignore BOM
+                    *mBuffer++ = ucs4;

Don't you need:

                 else
                   --mLength;

However, I think it would be better to scrap mLength and keep an
mStart instead of mLength, initialized in the ctor to mBuffer, and
have Length() be |{ return mBuffer - mStart; }|.

One other optimization that would be nice is to, after the check against
minUcs4, check |ucs4 < 0xD800| to short-circuit the rest of the checks
for the vast majority of characters (with a comment saying that it's
short-ciruiting what is below in case someone changes what is below).

Should the |SetCapacity| have a |+1| or not?

Other than that, r=dbaron.  scc should super-review.
(Assignee)

Comment 17

18 years ago
I'll rewrite Length() as you proposed.

And you're right, nsStr's code takes care of adding space for the 0 terminator,
so I don't need to add 1 myself when calling SetCapacity.
(Assignee)

Comment 18

18 years ago
At least, if I'm reading this code correctly:

http://lxr.mozilla.org/seamonkey/source/string/obsolete/nsStr.cpp#680
(Assignee)

Comment 19

18 years ago
Created attachment 53457 [details] [diff] [review]
Addressing comments. Also removed init and null terminate code from conversion constructors, nsAutoString's constructor does that already.
Comment on attachment 53457 [details] [diff] [review]
Addressing comments. Also removed init and null terminate code from conversion constructors, nsAutoString's constructor does that already.

r=dbaron
Attachment #53457 - Flags: review+
(Assignee)

Comment 21

18 years ago
scc, sr= please?
(Assignee)

Updated

18 years ago
Attachment #50925 - Attachment is obsolete: true
(Assignee)

Updated

18 years ago
Attachment #48525 - Attachment is obsolete: true

Comment 22

18 years ago
sr=scc
(Assignee)

Comment 23

18 years ago
Comment on attachment 53457 [details] [diff] [review]
Addressing comments. Also removed init and null terminate code from conversion constructors, nsAutoString's constructor does that already.

and sr=scc
Attachment #53457 - Flags: superreview+
(Assignee)

Comment 24

18 years ago
Ran into a little problem in NS_ConvertUTF8toUCS2::Init:

if |count| is |0|, SetCapacity will free the allocated buffer and make mUStr
point at a shared \0 string, which we can't write our null terminator to. Added
an |mCapacity| guard.

Checked in, marking fixed.
Status: ASSIGNED → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.