Closed Bug 121627 Opened 23 years ago Closed 23 years ago

UTF8ConverterStream should not allocate

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla0.9.9

People

(Reporter: alecf, Assigned: alecf)

Details

(Whiteboard: fix in hand)

Attachments

(2 files)

move ConvertUTF8toUCS2 into nsString2.h, fix converter 23 years ago Alec Flett 7.28 KB, patch	dbaron : review+	Details \| Diff \| Splinter Review
updated patch per dbaron 23 years ago Alec Flett 7.31 KB, patch	dbaron : review+ jst : superreview+	Details \| Diff \| Splinter Review

Alec Flett

Assignee

Description

•

23 years ago

Oh man.

So when I switched nsUnicharInputStream over to be UTF8-only, I stupidly forgot
the fact that NS_ConvertUTF8toUCS2 would allocate a big ol' buffer to do the
conversion.. which means every time we fill the unicode buffer, we allocate a
huge chunk of memory, convert into it, then copy stuff over, isntead of just
doing the conversion in-place..

I think the reason I avoided it was because of the changes I needed in the
string library to expose the ConvertUTF8toUCS2 class.

Anyway, I finally bit the bullet and fixed it. According to my spacetrace lots,
this makes us do about 1000 less allocations on startup.

patch forthcoming, looking for r=dbaron, sr=jag due to stringlike nature of the
patch.

Alec Flett

Assignee

Comment 1

•

23 years ago

Attached patch move ConvertUTF8toUCS2 into nsString2.h, fix converter — Details — Splinter Review

I've only included the diff to nsString2.h to show that I've moved the class
out of nsString2.cpp - I just cut and pasted - you can be sure it's gone from
nsString2.cpp

Basically, I had to change CountValidUTF8Bytes to determine both the number of
valid bytes in the utf8 string, AND the resulting characters that we'll need in
order to convert... then I use the same mechanism that NS_ConvertUTF8toUCS2
does to convert the string into the buffer.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

23 years ago

Attachment #66284 - Flags: review+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 2

•

23 years ago

Comment on attachment 66284 [details] [diff] [review]
move ConvertUTF8toUCS2 into nsString2.h, fix converter

I think this:

>+  nsACString::const_iterator start, end;
>+  nsDependentSingleFragmentCSubstring
>+    utf8buffer(mByteData->GetBuffer(), mByteData->GetBuffer() + srcLen);
>+            
>+  copy_string(utf8buffer.BeginReading(start),
>+              utf8buffer.EndReading(end), converter);

would be better written as (note the additional assertion at the
end, which I think is correct and would recommend adding):

  /* I *think* you can use const_char_iterator here instead
     of const_iterator, but I'm not sure. */
nsASingleFragmentCString::const_char_iterator start, end;
const nsASingleFragmentCString& utf8buffer =
  Substring(mByteData->GetBuffer(), mByteData->GetBuffer() + srcLen);
copy_string(utf8buffer.BeginReading(start),
	    utf8buffer.EndReading(end), converter);
NS_ASSERTION(converter.Length() == dstLen, "length mismatch");

Other than that (and the assertion), r=dbaron.

Alec Flett

Assignee

Comment 3

•

23 years ago

awesome, thanks. Done in my local tree.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 4

•

23 years ago

Actually, though, if you *can* use const_char_iterator (which is just |const
char*|, you should just be able to assign directly into the iterators and skip
the |Substring| and the |BeginReading|/|EndReading| (and avoid a virtual
function call or two or three).

Alec Flett

Assignee

Updated

•

23 years ago

Status: NEW → ASSIGNED

Priority: -- → P2

Whiteboard: fix in hand

Target Milestone: --- → mozilla0.9.9

Alec Flett

Assignee

Comment 5

•

23 years ago

Attached patch updated patch per dbaron — Details — Splinter Review

I think I've got this right. my only question is if "end" should be
|GetBuffer() + srcLen| or |GetBuffer() + srcLen - 1| (i.e. is it the last
character in the string, or the one right after it?)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 6

•

23 years ago

Comment on attachment 66295 [details] [diff] [review]
updated patch per dbaron

That's right.  It should be the character after (i.e., no -1).

Attachment #66295 - Flags: review+

Johnny Stenback (:jst)

Comment 7

•

23 years ago

Comment on attachment 66295 [details] [diff] [review]
updated patch per dbaron

Do we really want ConvertUTF8toUCS2::write() to be inline in the header? It's
not really a trivial amount of code. Not a big deal, so either way.

sr=jst

Attachment #66295 - Flags: superreview+

Alec Flett

Assignee

Comment 8

•

23 years ago

yeah, I'll leave it inline for now, but when we expose this class through XPCOM
we'll have to add NS_COM to the declaration, so I'll fix that then!

fix has landed... thanks.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

UTF8ConverterStream should not allocate

Categories

(Core :: XPCOM, defect, P2)

Tracking

()

People

(Reporter: alecf, Assigned: alecf)

References

Details

(Whiteboard: fix in hand)

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Attachment

General

Description

File Name

Content Type