If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Minimize string allocations during interning of UTF-8 strings

NEW
Unassigned

Status

Tamarin
Virtual Machine
8 years ago
7 years ago

People

(Reporter: Michael Daumling, Unassigned)

Tracking

unspecified
Future
Bug Flags:
flashplayer-injection -
flashplayer-qrb +
flashplayer-bug -

Details

(Whiteboard: PACMAN)

Attachments

(1 attachment)

(Reporter)

Description

8 years ago
The current implementation of AvmCore::internStringUTF8() creates a String instance and drops it again when it finds that string to already be part of the interned strings table. This leads to unnecessary string allocations.

The String class already has methods to create a hash sum for UTF-8 strings as well as comparing a String top an UTF-8 string. The patch is, therefore, quite easy.

Updated

8 years ago
Flags: flashplayer-qrb+
Target Milestone: --- → flash10.1
(Reporter)

Comment 1

8 years ago
Created attachment 389486 [details] [diff] [review]
AvmCore::internStringUTF8() does not pre-create strings

AvmCore:
- Added findString() with an UTF-8 character pointer, a copy/paste of findString() for UTF-16 data with String method substitution to UTF-8 APIs
- Changed internStringUTF8() to using that method to detect existing strings in interned strings table, and create a string only if string is not yet present
- Changed signature of internStringUTF8() to accept a utf8_t* instead of char*
PoolObject.cpp:
- Removed cast in call to internStringUTF8()
StringObject.cpp:
- hashCodeUTF8(): Added error handling for malformed UTF-8 data (which should not happen for ABC data)
UnicodeUtils.cpp:
- Utf82Ucs4() treated all 4-byte sequences as error, although characters < 0x100000 are legal
Attachment #389486 - Flags: superreview?(edwsmith)
Attachment #389486 - Flags: review?(lhansen)

Comment 2

8 years ago
I have a couple of questions/remarks first.

Do we have benchmarks that the unnecessary allocations actually matter?

I don't understand why the thing with oldStrings is #ifdef DEBUGGER, so explain it here and insert a comment in the code :-)

The DRC is not needed for oldStrings; auto variables do not need reference count operations or write barriers.
(Reporter)

Comment 3

8 years ago
This is a reaction to Steven's email comments about bug-laden unnecessary allocations. I agree with him - it is completely unnecessary to pre-allocate a string just to find out that the string already is present. It will be hard to create a test case with enough memory evidence - significant differenceses may be proven in a Flash environment.

I have no clue what these #ifdef DEBUGGER statements are about - this is old code. Maybe it is possible to track down the original author, or maybe we can just remove these statements.

Comment 4

8 years ago
Re a benchmark, all I'm really asking for is a simple program - I'm guessing 10 lines of ActionScript is more than enough - s.t. a simple run in the avmshell reveals that the new code is a performance improvement over the old one for at least that program.  If we can't do even that much then this is probably not worth the bother.  I don't care if we reduce GC time or string allocation time or what - I just want to know that under optimally favorable conditions, it makes a difference.  (I completely believe that it's a win.  I just want to see it.)

Re DEBUGGER, yes, let's try to figure out what's going on - it doesn't look safe to me unless there's a strange effect in createUtf8 that can cause the string table to be rehashed.  But then why only for DEBUGGER?

Comment 5

8 years ago
Re: benchmark, I agree with Lars here -- we've found lots of unanticipated side-effects from touching String in the past, so a benchmark seems prudent. (I too can believe that it's a win, but want proof that it is at least not a regression, since we've been surprised in the past.)

Re: DEBUGGER... not certain, but: if String::createUTF8() triggered a sweep, we might walk thru the interned strings and prune unused ones, meaning that the strings array could change out from under us. (Not sure why this would be DEBUGGER-specific.)

Comment 6

8 years ago
Comment on attachment 389486 [details] [diff] [review]
AvmCore::internStringUTF8() does not pre-create strings

We'll do a review when we have some numbers.
Attachment #389486 - Flags: review?(lhansen)

Comment 7

8 years ago
Comment on attachment 389486 [details] [diff] [review]
AvmCore::internStringUTF8() does not pre-create strings

Previous comments have not yet been addressed.  Need to sort out the #DEBUGGER code, insert comments, and post benchmark + results.
Attachment #389486 - Flags: superreview?(edwsmith) → superreview-

Comment 8

8 years ago
Targeting FP10.1 but until we have evidence it isn't high priority.
OS: Windows XP → All
Priority: -- → P5
Hardware: x86 → All

Updated

8 years ago
Target Milestone: flash10.1 → flash10.2

Updated

8 years ago
Whiteboard: PACMAN

Updated

8 years ago
Assignee: daumling → nobody
This is a mass change. Every comment has "assigned-to-new" in it.

I didn't look through the bugs, so I'm sorry if I change a bug which shouldn't be changed. But I guess these bugs are just bugs that were once assigned and people forgot to change the Status back when unassigning.
Status: ASSIGNED → NEW

Comment 10

7 years ago
#ifdef DEBUGGER code came into player's codebase on 7/18/2006 with CL 231775 from Tom Reilly:

Integrate onepass GC w/ some riders:

1) fix AVM+ interning to use the same hash algorithm in findString
as reshashStirngs (latent bug just happened to notice)

Updated

7 years ago
Flags: flashplayer-injection-
Flags: flashplayer-bug-
Priority: P5 → --
Target Milestone: Q3 11 - Serrano → Future
You need to log in before you can comment on or make changes to this bug.