surrogate pairs could [and should?] be printed (e.g. to console) with codepoint assembled

Status: NEW (Unassigned)
Priority: P4
Severity: enhancement
Opened: 7 years ago; Updated: 7 years ago

Reporter: pnkfelix
Assignee: Unassigned

Version: unspecified
Target Milestone: Q1 12 - Brannan
Bug Flags:
flashplayer-injection -
flashplayer-qrb +
flashplayer-needsversioning ?

Attachments: 3

Current behavior at avmshell -repl:

> "\uD835\uDFD8"
??????


The above string literal is composed of two code units that form a surrogate pair; for more info read:
  http://en.wikipedia.org/wiki/UTF-16/UCS-2#Code_points_U.2B10000_to_U.2B10FFFF

In particular, the surrogate pair of code units
high=0xD835, low=0xDFD8 represents the code point 0x1D7D8,
as one can readily verify via the calculation
(0x10000 + ((0xD835 - 0xD800) << 10) + (0xDFD8 - 0xDC00)).toString(16)
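The same arithmetic, written as a small C++ helper (the language avmshell itself is written in), makes the range checks explicit. This is a minimal sketch; the function names are hypothetical and are not taken from avmplus.

```cpp
#include <cassert>
#include <cstdint>

// True if u is a UTF-16 high (leading) surrogate code unit.
inline bool isHighSurrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// True if u is a UTF-16 low (trailing) surrogate code unit.
inline bool isLowSurrogate(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Assemble the code point encoded by a surrogate pair, using the same
// formula as the expression above. Caller must pass a valid pair.
inline uint32_t combineSurrogates(uint16_t high, uint16_t low) {
    assert(isHighSurrogate(high) && isLowSurrogate(low));
    return 0x10000u + ((uint32_t(high) - 0xD800u) << 10) + (uint32_t(low) - 0xDC00u);
}
```

For example, combineSurrogates(0xD835, 0xDFD8) yields 0x1D7D8, and the extreme pairs (0xD800, 0xDC00) and (0xDBFF, 0xDFFF) map to U+10000 and U+10FFFF respectively.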

The code point 0x1D7D8 corresponds to "blackboard bold 0": 
Yikes, the description got cut off at my attempt to paste a blackboard bold.

I'll skip the pasted character and post the remainder of my intended bug description first (starting here):

The code point 0x1D7D8 corresponds to "blackboard bold 0":
  see: http://en.wikipedia.org/wiki/Blackboard_bold
It was one of the first "interesting" characters I found that lies outside the Basic Multilingual Plane (BMP).

Anyway, this is suboptimal.  Such strings *are* rendered in a more human-readable fashion in the Flash Player (I'll attach an example Flash Builder source file illustrating this).  And the console for Mac OS X (and presumably other modern hosts) is quite capable of rendering such characters properly.

(Plus, fixing this does not require much of a patch.  In my repository it involves minor changes to StringObject and PrintWriter.  The tricky part is determining whether the proposed fix could break content elsewhere; even then, a version check may suffice, but I would need to survey the player a bit first.)
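One way to picture both the symptom and the kind of fix described above: if each UTF-16 code unit is encoded to UTF-8 on its own, the two surrogates in "\uD835\uDFD8" become two 3-byte sequences (ED A0 B5 ED BF 98), i.e. 6 bytes of invalid UTF-8. That would plausibly account for the six question marks printed by avmshell -repl, but this is an unverified guess about its output path. Assembling the pair into U+1D7D8 first yields the valid 4-byte sequence F0 9D 9F 98. The sketch below is not patchA; the function names (appendUtf8, utf16ToUtf8PerUnit, utf16ToUtf8) are hypothetical stand-ins for whatever StringObject and PrintWriter actually do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append the UTF-8 encoding of one code point (must be <= U+10FFFF).
inline void appendUtf8(std::string& out, uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Naive conversion: encode each UTF-16 code unit separately. A surrogate
// pair becomes two 3-byte sequences (6 bytes), which is not valid UTF-8.
inline std::string utf16ToUtf8PerUnit(const std::vector<uint16_t>& units) {
    std::string out;
    for (uint16_t u : units) appendUtf8(out, u);
    return out;
}

// Pair-aware conversion: when a high surrogate is immediately followed by a
// low surrogate, assemble the code point and emit one 4-byte sequence.
// Assumption: unpaired surrogates are passed through unit by unit; a real
// implementation might substitute U+FFFD instead.
inline std::string utf16ToUtf8(const std::vector<uint16_t>& units) {
    std::string out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < units.size() &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            uint32_t cp = 0x10000u + ((uint32_t(u) - 0xD800u) << 10) +
                          (uint32_t(units[i + 1]) - 0xDC00u);
            appendUtf8(out, cp);
            ++i;  // skip the low surrogate just consumed
        } else {
            appendUtf8(out, u);
        }
    }
    return out;
}
```

The design choice is simply to look one code unit ahead at output time, so the stored string can stay in UTF-16 while the console receives well-formed UTF-8.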

The main reason I want this fixed: if we render surrogate pairs properly, I assert there is a higher chance that we will write shell tests that exercise them.
One more attempt to paste a blackboard bold via Firefox input form: 
Attempt to paste a blackboard bold 0 via Chrome (but this may be a server issue, not a browser one): 
Hmm, well, clearly my difficulty pasting the desired character into a web-browser presents a potential argument *against* implementing this change (since we probably don't want the avmshell console output to be incompatible with our bug tracking system) ... How ironic.

Nonetheless, I'll attach the relevant files (and then put this diversion aside).  Planned attachments are: 1. Illustrative Flash Builder program, 2. Text file holding a rendered blackboard bold 0 (so that I don't have to rely on the Bugzilla form to present it), and 3. A patch that fixes the bug but is missing bug-compatibility guards to revert to prior behavior.
(Reporter) Updated 7 years ago:
Summary: surrogate pairs could/should be printed (e.g. to console) with codepoint assembled → surrogate pairs could [and should?] be printed (e.g. to console) with codepoint assembled
Created attachment 550409
Example Flash Builder program v1 (player does handle surrogate pair)
Created attachment 550410
Example text (in form of C program) with blackboard bold 0
(Reporter) Updated 7 years ago:
Attachment #550410 - Attachment mime type: text/plain → text/plain;charset=utf-8
Created attachment 550413
patchA: a small unversioned and unvetted fix

(This at least fixes the problem described above; with this patch, I get a blackboard bold 0 when I run avmshell -repl in Terminal.app.)
(In reply to Felix S Klock II from comment #4)
> Hmm, well, clearly my difficulty pasting the desired character into a
> web-browser presents a potential argument *against* implementing this change
> (since we probably don't want the avmshell console output to be incompatible
> with our bug tracking system) ... How ironic.

I asked about this in the #bugzilla chat room.

[LpSolit] fklockii: hum, I copied the content of your attachment in my Bugzilla, and I have no problem here
[fklockii] Okay.  I mainly wanted to double-check that this wasn't "as expected" or "notabug"; I'll double check whether it happens on landfill.bugzilla.org
[LpSolit] fklockii: ah, interesting... another local installation has the problem
[LpSolit] maybe it's a bug in Perl itself
[LpSolit] ah no
[LpSolit] because another installation with Perl 5.12.3 works fine too
[LpSolit] the two which work run PostgreSQL, and the one which fails run MySQL
[fklockii] okay that sounds plausible cause

So:
1. I don't know if there's much we can do about the Bugzilla issue
2. I don't know if the Bugzilla issue actually matters: Arguably bugs in Bugzilla should not keep us from improving our own software.
(Reporter) Updated 7 years ago:
Attachment #550413 - Flags: feedback?(stejohns)

Comment 9 (7 years ago)
Comment on attachment 550413
patchA: a small unversioned and unvetted fix

Review of attachment 550413:
-----------------------------------------------------------------

No obvious flaws. Need to figure out whether this behavior needs to be versioned or not; most uses of PrintWriter are for diagnostic/development purposes, but there might be user-facing behavior that's affected...
Attachment #550413 - Flags: feedback?(stejohns) → feedback+
Hmm, Bug 641975 says that not all 8-bit Strings are UTF-8: "they are Latin-1, which is only UTF8-compatible iff the 7bit flag is set."

I need to review our String API more carefully to understand the cases here.
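The Bug 641975 point can be made concrete: a Latin-1 byte string maps each byte directly to U+0000..U+00FF, so it is byte-for-byte valid UTF-8 only when every byte is below 0x80; any byte from 0x80 to 0xFF must be re-encoded as two UTF-8 bytes. A minimal sketch, assuming the 8-bit string is plain Latin-1; `isSevenBit` is a hypothetical stand-in for avmplus's 7-bit flag, not its actual API.

```cpp
#include <cstdint>
#include <string>

// True if every byte is 7-bit ASCII; only then are the Latin-1 bytes
// already a valid UTF-8 encoding of the same text (hypothetical helper).
inline bool isSevenBit(const std::string& latin1) {
    for (unsigned char c : latin1)
        if (c >= 0x80) return false;
    return true;
}

// Re-encode Latin-1 (each byte is code point U+0000..U+00FF) as UTF-8.
inline std::string latin1ToUtf8(const std::string& latin1) {
    if (isSevenBit(latin1)) return latin1;  // fast path: bytes are unchanged
    std::string out;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF always needs a 2-byte UTF-8 sequence.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

For example, the single Latin-1 byte 0xE9 ("é") becomes the two UTF-8 bytes C3 A9, which is why handing an 8-bit string to a UTF-8 console unchanged is only safe when the 7-bit condition holds.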

Updated 7 years ago:
Severity: normal → enhancement
Flags: flashplayer-qrb+
Flags: flashplayer-needsversioning?
Flags: flashplayer-injection-
Priority: -- → P4
Target Milestone: --- → Q1 12 - Brannan