Oh man. So when I switched nsUnicharInputStream over to be UTF8-only, I stupidly forgot the fact that NS_ConvertUTF8toUCS2 would allocate a big ol' buffer to do the conversion.. which means every time we fill the unicode buffer, we allocate a huge chunk of memory, convert into it, then copy stuff over, isntead of just doing the conversion in-place.. I think the reason I avoided it was because of the changes I needed in the string library to expose the ConvertUTF8toUCS2 class. Anyway, I finally bit the bullet and fixed it. According to my spacetrace lots, this makes us do about 1000 less allocations on startup. patch forthcoming, looking for r=dbaron, sr=jag due to stringlike nature of the patch.
Created attachment 66284 [details] [diff] [review] move ConvertUTF8toUCS2 into nsString2.h, fix converter I've only included the diff to nsString2.h to show that I've moved the class out of nsString2.cpp - I just cut and pasted - you can be sure it's gone from nsString2.cpp Basically, I had to change CountValidUTF8Bytes to determine both the number of valid bytes in the utf8 string, AND the resulting characters that we'll need in order to convert... then I use the same mechanism that NS_ConvertUTF8toUCS2 does to convert the string into the buffer.
Comment on attachment 66284 [details] [diff] [review] move ConvertUTF8toUCS2 into nsString2.h, fix converter I think this: >+ nsACString::const_iterator start, end; >+ nsDependentSingleFragmentCSubstring >+ utf8buffer(mByteData->GetBuffer(), mByteData->GetBuffer() + srcLen); >+ >+ copy_string(utf8buffer.BeginReading(start), >+ utf8buffer.EndReading(end), converter); would be better written as (note the additional assertion at the end, which I think is correct and would recommend adding): /* I *think* you can use const_char_iterator here instead of const_iterator, but I'm not sure. */ nsASingleFragmentCString::const_char_iterator start, end; const nsASingleFragmentCString& utf8buffer = Substring(mByteData->GetBuffer(), mByteData->GetBuffer() + srcLen); copy_string(utf8buffer.BeginReading(start), utf8buffer.EndReading(end), converter); NS_ASSERTION(converter.Length() == dstLen, "length mismatch"); Other than that (and the assertion), r=dbaron.
awesome, thanks. Done in my local tree.
Actually, though, if you *can* use const_char_iterator (which is just |const char*|, you should just be able to assign directly into the iterators and skip the |Substring| and the |BeginReading|/|EndReading| (and avoid a virtual function call or two or three).
Status: NEW → ASSIGNED
Priority: -- → P2
Whiteboard: fix in hand
Target Milestone: --- → mozilla0.9.9
Created attachment 66295 [details] [diff] [review] updated patch per dbaron I think I've got this right. my only question is if "end" should be |GetBuffer() + srcLen| or |GetBuffer() + srcLen - 1| (i.e. is it the last character in the string, or the one right after it?)
Comment on attachment 66295 [details] [diff] [review] updated patch per dbaron That's right. It should be the character after (i.e., no -1).
Attachment #66295 - Flags: review+
Comment on attachment 66295 [details] [diff] [review] updated patch per dbaron Do we really want ConvertUTF8toUCS2::write() to be inline in the header? It's not really a trivial amount of code. Not a big deal, so either way. sr=jst
Attachment #66295 - Flags: superreview+
yeah, I'll leave it inline for now, but when we expose this class through XPCOM we'll have to add NS_COM to the declaration, so I'll fix that then! fix has landed... thanks.
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.