Closed Bug 315974 Opened 19 years ago Closed 18 years ago

JSprintf functions cannot print jschar characters and strings

Categories

(Core :: JavaScript Engine, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: daumling, Assigned: daumling)

References

Details

(Keywords: intl)

Attachments

(3 files, 3 obsolete files)

For wide characters and strings, the internal printf() function should support formats. This makes only sense if UTF8 is enabled, but it prevents e.g. String.quote() from working reliably, because that method does an internal printf().

This is a follow-up bug of 232182, but since that bug got very long already, I thought I'd just file a new bug.
*** Bug 315973 has been marked as a duplicate of this bug. ***
Attached patch Bug fix (obsolete) — Splinter Review
This fix introduces the format %hc and %hs as the 16-bit analog to %c and %s. It also patches Sprint() to handle UTF-16 surrogate pairs correctly. Finally, there was a bug in js_DeflateString that threw an error (buffer too small) if the last two characters of a wide string were a UTF-16 surrogate pair.
Assignee: general → daumling
Status: NEW → ASSIGNED
Attachment #202602 - Flags: superreview?(brendan)
Attachment #202602 - Flags: review?(mrbkap)
Comment on attachment 202602 [details] [diff] [review]
Bug fix

>Index: jsopcode.c
> #ifdef JS_STRINGS_ARE_UTF8
>-            /* print as UTF-8 string */
>-            ok = Sprint(sp, "%hc", c) >= 0;
>+            /* If this is a surrogate pair, make sure to print the pair. */
>+            if(c >= 0xD800 && c <= 0xDBFF)
>+                ok = Sprint(sp, "%.2hs", t++);
>+            else
>+                /* Print as UTF-8 string. */
>+                ok = Sprint(sp, "%hc", c) >= 0;

Nits: space after if, braces around multiline if/else.


>Index: jsprf.c
>+    if(ws) {

Nit: space after if.

>+    if (s)
>+        JS_free(s);

JS_free is null-safe and takes a context argument. C sort of sucks here, it should give an error. I'm not sure what you want to do. It seems that JS_free can live with a null context argument, but that sounds less than kosher (I'm not sure).

>+}
>+#endif

Nit: add a newline here.

> /*
> ** BuildArgArray stands for Numbered Argument list Sprintf
...
>+#ifdef JS_STRINGS_ARE_UTF8
>+    char utf8buf [6];

Nit: extra space before [

>+            if(type == TYPE_INT16) {

Nit: space after if.

>+                /* 
>+                 * This would do a simple string/byte conversion 
>+                 * if JS_STRINGS_ARE_UTF8 is not defined. 
>+                 */
>+                u.ws = va_arg(ap, const jschar*);
>+                rv = cvt_ws(ss, u.ws, width, prec, flags);

cvt_ws is only defined #ifdef JS_STRINGS_ARE_UTF8. You need to wrap this if in an #ifdef, probably like:
#ifdef JS_STRINGS_ARE_UTF8
if (...) {
} else
#endif
{
...
}
Attachment #202602 - Flags: review?(mrbkap) → review-
How could this compile & run?

Sorry for having wasted your time.

Question: In case of a malformed UTF-16 surrogate pair, what should I do? If I use the error reporting mechanism and return -1, an Out of memory error is reported. Sounds a bit harsh to me.
Updated version after all nits being removed (I hope), and a rewrite of cvt_ws().
Attachment #202602 - Attachment is obsolete: true
Attachment #202682 - Flags: review?(mrbkap)
Attachment #202602 - Flags: superreview?(brendan)
Attached file Rudimentary regress test (obsolete) —
This test is incomplete, because it behaves differently whether UTF-8 is enabled or not. If UTF-8 is enabled, the test ends with throwing an out-of-memory error if UTF-8 is enabled. If not, the test terminates normally. Bob, please decide what to do here.

If someone tries to printf() an incomplete or bad surrogate pair, an out-of-memory error is thrown. The entire printf() mechanism is only able to generate that error, and I think that this is still better that to insert a magic string (like 'error!' or the like) into the resulting output.
Attachment #202683 - Flags: review?(bob)
Oops - wrong test...sorry...
Attachment #202683 - Attachment is obsolete: true
Attachment #202685 - Flags: review?(bob)
Attachment #202683 - Flags: review?(bob)
Should this block bug 232182?  I think so, but I'm not sure this bug is valid.  Why do we need to mess with jsopcode.c at all?  It should generate ASCII-only decompilation result strings.

/be
Blocks: 232182
Comment on attachment 202682 [details] [diff] [review]
Updated patch after nits and bug fix

Nit: } else {, please.

>+    }
>+    else
>+        result = cvt_s(ss, NULL, width, prec, flags);

Other than that, this seems OK to me. Brendan should chime in first, though.
Attachment #202682 - Flags: superreview?(brendan)
Attachment #202682 - Flags: review?(mrbkap)
Attachment #202682 - Flags: review+
Comment on attachment 202682 [details] [diff] [review]
Updated patch after nits and bug fix

Couple of thoughts, one already raised as a question here:

- Why does QuoteString need to produce anything but the subset of ASCII it has always produced?  There is no need for this change to make non-ASCII exception messages round-trip, is there?

- I should have suggested that the macro be JS_CHAR_STRINGS_ARE_UTF8, since JS_STRINGS_ARE... sounds like it could mean JSStrings, not C char strings.

/be
I wish...

Unfortunately, if I attempt to access a variable that contains Unicode, like

\u0440;

jsinterp.c calls js_AtomToPrintableString() to get the variable name, and that fn calls js_ValueToPrintableString(), which calls js_QuoteString(), which finally uses the internal printf(), which leads to the error message

\u0440 is not defined.

where the variable name is escaped. I guess that one could rewrite js_QuoteString(), but expanding the internal printf() functions to be able to handle Unicode seems to be a better fix.
(In reply to comment #11)
> Unfortunately, if I attempt to access a variable that contains Unicode, like
> 
> \u0440;
> 
> jsinterp.c calls js_AtomToPrintableString() to get the variable name, and that
> fn calls js_ValueToPrintableString(), which calls js_QuoteString(), which
> finally uses the internal printf(), which leads to the error message
> 
> \u0440 is not defined.

Ok, I get it.  But you do realize (don't blame me! ;-) that \u0440 is a legal one character identifier in ECMA-262 Edition 3:

js> \u0440
typein:2: ReferenceError: \u0440 is not defined
js> \u0440 = "hi"
hi
js> \u0440
hi
js>

So what you are trying to do is give nicer diagnostics, but even without this QuoteString and *s*printf change, the diagnostics use legal JS syntax to denote the offending identifier.

/be
Due to the inability of QuoteString() to handle Unicode correctly, the following tests fail if JS_STRINGS_ARE_UTF8 is enabled:

ecma_3/Function/regress-58274.js
ecma_3/Unicode/uc-005.js
js1_2/regexp/control_characters.js

Brendan, the patch may already be too old to be applied - can you super-review?
Comment on attachment 202682 [details] [diff] [review]
Updated patch after nits and bug fix

>         /* Use js_EscapeMap, \u, or \x only if necessary. */
>         if ((u = js_strchr(js_EscapeMap, c)) != NULL)
>             ok = Sprint(sp, "\\%c", (char)u[1]) >= 0;
>         else {

I missed this change, but when it was made, the "then" clause should have been braced.  JS house style braces both clauses when either is multiline (for any reason, even a comment), and when the condition is multiline.

> #ifdef JS_STRINGS_ARE_UTF8

Again, I would like this to be JS_CHAR_STRINGS_ARE_UTF8 or JS_C_STRINGS_ARE_UTF8.  Comments?

>+++ jsprf.c	11 Nov 2005 05:01:20 -0000
>@@ -44,16 +44,18 @@
> #include "jsstddef.h"
> #include <stdarg.h>
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> #include "jsprf.h"
> #include "jslong.h"
> #include "jsutil.h" /* Added by JSIFY */
>+#include "jspubtd.h"
>+#include "jsstr.h"

Odd to see lower-level files (forked from NSPR) require higher-level headers.  Not a bug in the single-directory happy home of js/src, just noting it.  In an NSPR-based version we would want an API to avoid these dependencies.

sr=me with mrbkap's comment and the above addressed.

/be
Attachment #202682 - Flags: superreview?(brendan) → superreview+
I am not sure what you mean with your header comments. I guess that I lack the historical knowledge of the origins of this file - I leave the headers included for now :)
Attachment #202682 - Attachment is obsolete: true
Attachment #204072 - Flags: superreview?(brendan)
Comment on attachment 204072 [details] [diff] [review]
Updated patch after nits

I meant that jsprf.c came from prprf.c, part of NSPR, a separate and more primitive module -- so in some sense it shouldn't be including jspubtd.h or jsstr.h.  But it's not a big deal, just a note for future reference (as if people will find it in a closed bug ;-).

/be
Attachment #204072 - Flags: superreview?(brendan) → superreview+
Created bug 318402 to track the renaming of JS_STRINGS_ARE_UTF8 to JS_C_STRINGS_ARE_UTF8.
Blocks: 318402
Could the patch be applied to trunk so I can continue with 318402, please?
The cosmetic changes aside (important to get right sooner or later, my fingers and eyes are tired of policing them), the substantive change is that out parameters are never set on failure return.  This affects js_[De,In}flateStringToBuffer and the new Encode/Decode APIs.  It should not be a problem -- it is not for internal call sites.

/be
Attachment #207648 - Flags: superreview?(shaver)
Attachment #207648 - Flags: review?(daumling)
Attachment #207648 - Flags: review?(daumling) → review+
Comment on attachment 207648 [details] [diff] [review]
extended cleanup on last patch

I'm not going to get to this until Wed at earliest, next week more likely (moving and conference stuff this week).  I don't mind it in my review queue, but you might want to get someone else if time is pressing.
Mike, any chance to sr the patch to get it integrated?
Comment on attachment 207648 [details] [diff] [review]
extended cleanup on last patch

Blake, can you sr the patch, please? The patch has been resting now for 18 days.
Attachment #207648 - Flags: superreview?(shaver) → superreview?(mrbkap)
Comment on attachment 207648 [details] [diff] [review]
extended cleanup on last patch

>Index: jsopcode.c
>+     * Supply NULL as the JSContext; errors are not reported, 
>+     * and malloc() is used to allocate the buffer buffer. 

Nit: "buffer buffer". One review is all that JS trunk needs, so we should get this checked in, and shaver will do his superreview ex-post-facto.
Attachment #207648 - Flags: superreview?(mrbkap) → superreview?(shaver)
Has the patch been checked in yet? I want to go ahead anf fix more stuff, and this  work needs the patch!!!
Blocks: js1.6rc1
Does this bug need a final sr?
Nah, this is good to go for checkin.
Patch has been checked in. Test case for 232182 works now.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
(In reply to comment #7)
> Created an attachment (id=202685) [edit]
> Correct rudimentary regress test

The form of the test is ok (other than having DOS line endings), but I am not clear on how effectively it tests the feature. Apart from Firefox 1.5 winxp from today which hangs on this testcase, I can't get the test to fail on the 1.7.13, 1.8.0.1, 1.8 or 1.9 branches. 
Flags: blocking1.8.1?
Checking in regress-315974.js;
/cvsroot/mozilla/js/tests/js1_5/Regress/regress-315974.js,v  <--  regress-315974.js
initial revision: 1.1
Flags: in-testsuite+
verified fixed 20060609 win/macppc/linux trunk.
Status: RESOLVED → VERIFIED
Shaver can we get SR and a 1.8.1 nom on the patch if suitable for the branch.
Flags: blocking1.8.1? → blocking1.8.1-
Attachment #202685 - Flags: review?(bclary)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: