If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Remove UTF-32 support in strings

VERIFIED FIXED

Status

Tamarin
Virtual Machine
--
minor
VERIFIED FIXED
8 years ago
8 years ago

People

(Reporter: Michael Daumling, Assigned: Lars T Hansen)

Tracking

Details

Attachments

(1 attachment)

(Reporter)

Description

8 years ago
Since UTF-32 support is not in demand and has never been tested, remove the feature for now.

Comments from Lars Hansen in bug 494778:

avmfeatures.h and avmfeatures.cpp MUST NOT be edited, as I had hoped the
comment at the beginning of each file made clear.  Edit avmfeatures.as and then
rerun that script.

Should be split into two patches, there's enough going on here that the issues
should be separated.  (Suggest one patch for UTF32 removal, and then another
building on that for the optimization.)

Questions:

Is it still right for caseChange to operate on utf32_t?  Since every element in
the string is wchar, presumably the argument and return type of the unimapper
should be restricted too?  (If not, it seems that more documentation might be
in order for the function.)

Stylistic and other:

It seems that by using uint8_t* rather than char* for p8 you can avoid some
"... & 0xFF" operations in mixed-width string comparisons.  (Such a fix might
not fit as part of this patch though, it's a general comment.)

In isWhitespace the comparison ch > 0xFFFF is redundant since wchar is known to
be 16 bits.  Ditto <= 0xFFFF in _charAt.  These are just two I noticed, there
might be more.
(Reporter)

Comment 1

8 years ago
Created attachment 380395 [details] [diff] [review]
UTF-32 support removed, minor optimization

This is a complete patch, where UTF-32 support has been removed, and the switch statements have been converted to if/else statements. Also, the two items that Lars mentioned have been changed: checks for ch being <= 0xFFFF have been removed, and the char* data type in the Pointers union has been replaced with uint8_t*, and the corresponding mask operations (ch & 0xFF) have been removed. Interestingly enough, the latter change seems to be a non-trivial performance enhancement: the acceptance tests ran 3:30 instead of 4:00 (which may be coincidental).

Lars, please let me know if the the patch is acceptable as-is, of if you would like the patch to be split into two halves as you suggested in your comments.
Attachment #380395 - Flags: review?(lhansen)
(Reporter)

Comment 2

8 years ago
The #define has not been removed from avmfeatures.as yet. The corresponding patch lists both avmfeatures.cpp and avmfeatures.h as complete files. I would suggest that someone else (Lars?) removes the AVMFEATURE_UTF32 feature from avmfeatures.as for me - I seem not to be able to generate a meaningful patch file here (it is a CRLF problem).
(Assignee)

Comment 3

8 years ago
Comment on attachment 380395 [details] [diff] [review]
UTF-32 support removed, minor optimization

CharAtType now seems redundant, it could just be wchar.
Attachment #380395 - Flags: review?(lhansen) → review+
(Assignee)

Comment 4

8 years ago
Sign this bug over to me when you've landed the patch and I will update the feature system.
(Reporter)

Comment 5

8 years ago
Pushed as changeset #1974:abf75247ba85

Lars, please go ahead and update the AvmFeatures.
(Reporter)

Updated

8 years ago
Assignee: daumling → lhansen
(Assignee)

Comment 6

8 years ago
redux changeset 1981:18a3d15a0114
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED

Comment 7

8 years ago
Resolved fixed engineering / work item that has been pushed.  Setting status to verified.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.