Closed Bug 505194 Opened 15 years ago Closed 6 years ago

Minimize string allocations during interning of UTF-8 strings

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX
Future

People

(Reporter: daumling, Unassigned)

Details

(Whiteboard: PACMAN)

Attachments

(1 file)

The current implementation of AvmCore::internStringUTF8() creates a String instance and drops it again when it finds that string to already be part of the interned strings table. This leads to unnecessary string allocations.

The String class already has methods to create a hash sum for UTF-8 strings as well as comparing a String top an UTF-8 string. The patch is, therefore, quite easy.
Flags: flashplayer-qrb+
Target Milestone: --- → flash10.1
AvmCore:
- Added findString() with an UTF-8 character pointer, a copy/paste of findString() for UTF-16 data with String method substitution to UTF-8 APIs
- Changed internStringUTF8() to using that method to detect existing strings in interned strings table, and create a string only if string is not yet present
- Changed signature of internStringUTF8() to accept a utf8_t* instead of char*
PoolObject.cpp:
- Removed cast in call to internStringUTF8()
StringObject.cpp:
- hashCodeUTF8(): Added error handling for malformed UTF-8 data (which should not happen for ABC data)
UnicodeUtils.cpp:
- Utf82Ucs4() treated all 4-byte sequences as error, although characters < 0x100000 are legal
Attachment #389486 - Flags: superreview?(edwsmith)
Attachment #389486 - Flags: review?(lhansen)
I have a couple of questions/remarks first.

Do we have benchmarks that the unnecessary allocations actually matter?

I don't understand why the thing with oldStrings is #ifdef DEBUGGER, so explain it here and insert a comment in the code :-)

The DRC is not needed for oldStrings; auto variables do not need reference count operations or write barriers.
This is a reaction to Steven's email comments about bug-laden unnecessary allocations. I agree with him - it is completely unnecessary to pre-allocate a string just to find out that the string already is present. It will be hard to create a test case with enough memory evidence - significant differenceses may be proven in a Flash environment.

I have no clue what these #ifdef DEBUGGER statements are about - this is old code. Maybe it is possible to track down the original author, or maybe we can just remove these statements.
Re a benchmark, all I'm really asking for is a simple program - I'm guessing 10 lines of ActionScript is more than enough - s.t. a simple run in the avmshell reveals that the new code is a performance improvement over the old one for at least that program.  If we can't do even that much then this is probably not worth the bother.  I don't care if we reduce GC time or string allocation time or what - I just want to know that under optimally favorable conditions, it makes a difference.  (I completely believe that it's a win.  I just want to see it.)

Re DEBUGGER, yes, let's try to figure out what's going on - it doesn't look safe to me unless there's a strange effect in createUtf8 that can cause the string table to be rehashed.  But then why only for DEBUGGER?
Re: benchmark, I agree with Lars here -- we've found lots of unanticipated side-effects from touching String in the past, so a benchmark seems prudent. (I too can believe that it's a win, but want proof that it is at least not a regression, since we've been surprised in the past.)

Re: DEBUGGER... not certain, but: if String::createUTF8() triggered a sweep, we might walk thru the interned strings and prune unused ones, meaning that the strings array could change out from under us. (Not sure why this would be DEBUGGER-specific.)
Comment on attachment 389486 [details] [diff] [review]
AvmCore::internStringUTF8() does not pre-create strings

We'll do a review when we have some numbers.
Attachment #389486 - Flags: review?(lhansen)
Comment on attachment 389486 [details] [diff] [review]
AvmCore::internStringUTF8() does not pre-create strings

Previous comments have not yet been addressed.  Need to sort out the #DEBUGGER code, insert comments, and post benchmark + results.
Attachment #389486 - Flags: superreview?(edwsmith) → superreview-
Targeting FP10.1 but until we have evidence it isn't high priority.
OS: Windows XP → All
Priority: -- → P5
Hardware: x86 → All
Target Milestone: flash10.1 → flash10.2
Whiteboard: PACMAN
Assignee: daumling → nobody
This is a mass change. Every comment has "assigned-to-new" in it.

I didn't look through the bugs, so I'm sorry if I change a bug which shouldn't be changed. But I guess these bugs are just bugs that were once assigned and people forgot to change the Status back when unassigning.
Status: ASSIGNED → NEW
#ifdef DEBUGGER code came into player's codebase on 7/18/2006 with CL 231775 from Tom Reilly:

Integrate onepass GC w/ some riders:

1) fix AVM+ interning to use the same hash algorithm in findString
as reshashStirngs (latent bug just happened to notice)
Flags: flashplayer-injection-
Flags: flashplayer-bug-
Priority: P5 → --
Target Milestone: Q3 11 - Serrano → Future
Tamarin is a dead project now. Mass WONTFIX.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Tamarin isn't maintained anymore. WONTFIX remaining bugs.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: