Closed Bug 121627 Opened 23 years ago Closed 23 years ago

UTF8ConverterStream should not allocate

Categories

(Core :: XPCOM, defect, P2)

x86
Windows 2000
defect

Tracking

()

RESOLVED FIXED
mozilla0.9.9

People

(Reporter: alecf, Assigned: alecf)

Details

(Whiteboard: fix in hand)

Attachments

(2 files)

Oh man.

So when I switched nsUnicharInputStream over to be UTF8-only, I stupidly forgot
the fact that NS_ConvertUTF8toUCS2 would allocate a big ol' buffer to do the
conversion.. which means every time we fill the unicode buffer, we allocate a
huge chunk of memory, convert into it, then copy stuff over, isntead of just
doing the conversion in-place..

I think the reason I avoided it was because of the changes I needed in the
string library to expose the ConvertUTF8toUCS2 class.

Anyway, I finally bit the bullet and fixed it. According to my spacetrace lots,
this makes us do about 1000 less allocations on startup.

patch forthcoming, looking for r=dbaron, sr=jag due to stringlike nature of the
patch.
I've only included the diff to nsString2.h to show that I've moved the class
out of nsString2.cpp - I just cut and pasted - you can be sure it's gone from
nsString2.cpp

Basically, I had to change CountValidUTF8Bytes to determine both the number of
valid bytes in the utf8 string, AND the resulting characters that we'll need in
order to convert... then I use the same mechanism that NS_ConvertUTF8toUCS2
does to convert the string into the buffer.
Comment on attachment 66284 [details] [diff] [review]
move ConvertUTF8toUCS2 into nsString2.h, fix converter

I think this:

>+  nsACString::const_iterator start, end;
>+  nsDependentSingleFragmentCSubstring
>+    utf8buffer(mByteData->GetBuffer(), mByteData->GetBuffer() + srcLen);
>+            
>+  copy_string(utf8buffer.BeginReading(start),
>+              utf8buffer.EndReading(end), converter);

would be better written as (note the additional assertion at the
end, which I think is correct and would recommend adding):

  /* I *think* you can use const_char_iterator here instead
     of const_iterator, but I'm not sure. */
nsASingleFragmentCString::const_char_iterator start, end;
const nsASingleFragmentCString& utf8buffer =
  Substring(mByteData->GetBuffer(), mByteData->GetBuffer() + srcLen);
copy_string(utf8buffer.BeginReading(start),
	    utf8buffer.EndReading(end), converter);
NS_ASSERTION(converter.Length() == dstLen, "length mismatch");

Other than that (and the assertion), r=dbaron.
awesome, thanks. Done in my local tree.
Actually, though, if you *can* use const_char_iterator (which is just |const
char*|, you should just be able to assign directly into the iterators and skip
the |Substring| and the |BeginReading|/|EndReading| (and avoid a virtual
function call or two or three).
Status: NEW → ASSIGNED
Priority: -- → P2
Whiteboard: fix in hand
Target Milestone: --- → mozilla0.9.9
I think I've got this right. my only question is if "end" should be
|GetBuffer() + srcLen| or |GetBuffer() + srcLen - 1| (i.e. is it the last
character in the string, or the one right after it?)
Comment on attachment 66295 [details] [diff] [review]
updated patch per dbaron

That's right.  It should be the character after (i.e., no -1).
Attachment #66295 - Flags: review+
Comment on attachment 66295 [details] [diff] [review]
updated patch per dbaron

Do we really want ConvertUTF8toUCS2::write() to be inline in the header? It's
not really a trivial amount of code. Not a big deal, so either way.

sr=jst
Attachment #66295 - Flags: superreview+
yeah, I'll leave it inline for now, but when we expose this class through XPCOM
we'll have to add NS_COM to the declaration, so I'll fix that then!

fix has landed... thanks.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: