The current implementation of AvmCore::internStringUTF8() creates a String instance and drops it again when it finds that string to already be part of the interned strings table. This leads to unnecessary string allocations. The String class already has methods to create a hash sum for UTF-8 strings as well as comparing a String top an UTF-8 string. The patch is, therefore, quite easy.
Created attachment 389486 [details] [diff] [review] AvmCore::internStringUTF8() does not pre-create strings AvmCore: - Added findString() with an UTF-8 character pointer, a copy/paste of findString() for UTF-16 data with String method substitution to UTF-8 APIs - Changed internStringUTF8() to using that method to detect existing strings in interned strings table, and create a string only if string is not yet present - Changed signature of internStringUTF8() to accept a utf8_t* instead of char* PoolObject.cpp: - Removed cast in call to internStringUTF8() StringObject.cpp: - hashCodeUTF8(): Added error handling for malformed UTF-8 data (which should not happen for ABC data) UnicodeUtils.cpp: - Utf82Ucs4() treated all 4-byte sequences as error, although characters < 0x100000 are legal
I have a couple of questions/remarks first. Do we have benchmarks that the unnecessary allocations actually matter? I don't understand why the thing with oldStrings is #ifdef DEBUGGER, so explain it here and insert a comment in the code :-) The DRC is not needed for oldStrings; auto variables do not need reference count operations or write barriers.
This is a reaction to Steven's email comments about bug-laden unnecessary allocations. I agree with him - it is completely unnecessary to pre-allocate a string just to find out that the string already is present. It will be hard to create a test case with enough memory evidence - significant differenceses may be proven in a Flash environment. I have no clue what these #ifdef DEBUGGER statements are about - this is old code. Maybe it is possible to track down the original author, or maybe we can just remove these statements.
Re a benchmark, all I'm really asking for is a simple program - I'm guessing 10 lines of ActionScript is more than enough - s.t. a simple run in the avmshell reveals that the new code is a performance improvement over the old one for at least that program. If we can't do even that much then this is probably not worth the bother. I don't care if we reduce GC time or string allocation time or what - I just want to know that under optimally favorable conditions, it makes a difference. (I completely believe that it's a win. I just want to see it.) Re DEBUGGER, yes, let's try to figure out what's going on - it doesn't look safe to me unless there's a strange effect in createUtf8 that can cause the string table to be rehashed. But then why only for DEBUGGER?
Re: benchmark, I agree with Lars here -- we've found lots of unanticipated side-effects from touching String in the past, so a benchmark seems prudent. (I too can believe that it's a win, but want proof that it is at least not a regression, since we've been surprised in the past.) Re: DEBUGGER... not certain, but: if String::createUTF8() triggered a sweep, we might walk thru the interned strings and prune unused ones, meaning that the strings array could change out from under us. (Not sure why this would be DEBUGGER-specific.)
Comment on attachment 389486 [details] [diff] [review] AvmCore::internStringUTF8() does not pre-create strings We'll do a review when we have some numbers.
Comment on attachment 389486 [details] [diff] [review] AvmCore::internStringUTF8() does not pre-create strings Previous comments have not yet been addressed. Need to sort out the #DEBUGGER code, insert comments, and post benchmark + results.
Targeting FP10.1 but until we have evidence it isn't high priority.
This is a mass change. Every comment has "assigned-to-new" in it. I didn't look through the bugs, so I'm sorry if I change a bug which shouldn't be changed. But I guess these bugs are just bugs that were once assigned and people forgot to change the Status back when unassigning.
#ifdef DEBUGGER code came into player's codebase on 7/18/2006 with CL 231775 from Tom Reilly: Integrate onepass GC w/ some riders: 1) fix AVM+ interning to use the same hash algorithm in findString as reshashStirngs (latent bug just happened to notice)