Closed Bug 589917 Opened 14 years ago Closed 14 years ago

js_GetDeflatedUTF8StringLength computes the wrong UTF-8 length for surrogate pairs

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: gwurk, Assigned: m_kato)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100723 Ubuntu/8.04 (hardy) Firefox/3.6.8 GTB7.1
Build Identifier: 

We can call JS_EncodeCharacters() with a NULL output buffer to get the length of the UTF8-encoded string. JS_EncodeCharacters() calls js_GetDeflatedUTF8StringLength() internally to compute the string length (or js_GetDeflatedStringLength() in standalone SpiderMonkey).

js_GetDeflatedUTF8StringLength() returns the wrong length when the string has a surrogate pair in it.

Reproducible: Always

Steps to Reproduce:
#include <assert.h>
#include "jsapi.h"

int main() {
  JS_SetCStringsAreUTF8();
  JSRuntime* rt = JS_NewRuntime(10000);
  JSContext* cx = JS_NewContext(rt, 8192);

  // surrogate pair for \U00010000 = LINEAR B SYLLABLE B008 A:
  static const jschar surrogate_pair[] = { 0xd800, 0xdc00 };
  // should be encoded in UTF-8 as 4 bytes: 0xf0 0x90 0x80 0x80

  // if we call JS_EncodeCharacters() with an output buffer, we get
  // the correct output length:
  char output_buffer[10];
  size_t utf8_len = sizeof(output_buffer);
  assert(JS_EncodeCharacters(cx, surrogate_pair, 2,
                             output_buffer, &utf8_len));
  assert(utf8_len == 4);

  // but if we call JS_EncodeCharacters() with a NULL output buffer to
  // get the UTF-8 length (which calls js_GetDeflatedUTF8StringLength
  // internally), we get the wrong output length:
  assert(JS_EncodeCharacters(cx, surrogate_pair, 2,
                             NULL, &utf8_len));
  assert(utf8_len == 4);  // assert fails, utf8_len is 5 instead

  return 0;
}
blocking2.0: --- → betaN+
Assignee: general → m_kato
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attached patch fixSplinter Review
Attachment #479754 - Flags: review?(lw)
Comment on attachment 479754 [details] [diff] [review]
fix

I'm not really in the know for utf8 deflation, but Waldo says Igor is.
Attachment #479754 - Flags: review?(lw) → review?(igor)
Comment on attachment 479754 [details] [diff] [review]
fix

># HG changeset patch
># Parent 571523d28b41168b177295988aa6b040d0ec6d38
>Bug 589917 - js_GetDeflatedUTF8StringLength computes the wrong UTF-8 length for surrogate pairs
>
>diff --git a/js/src/jsapi-tests/Makefile.in b/js/src/jsapi-tests/Makefile.in
>--- a/js/src/jsapi-tests/Makefile.in
>+++ b/js/src/jsapi-tests/Makefile.in
>@@ -64,16 +64,17 @@ CPPSRCS = \
>   testLookup.cpp \
>   testNewObject.cpp \
>   testOps.cpp \
>   testPropCache.cpp \
>   testSameValue.cpp \
>   testScriptObject.cpp \
>   testSetPropertyWithNativeGetterStubSetter.cpp \
>   testTrap.cpp \
>+  testUTF8.cpp \
>   testXDR.cpp \
>   $(NULL)
> 
> DEFINES         += -DEXPORT_JS_API
> 
> LIBS      = $(NSPR_LIBS) $(DEPTH)/$(LIB_PREFIX)js_static.$(LIB_SUFFIX)
> 
> LOCAL_INCLUDES += -I$(topsrcdir) -I..
>diff --git a/js/src/jsapi-tests/testUTF8.cpp b/js/src/jsapi-tests/testUTF8.cpp
>new file mode 100644
>--- /dev/null
>+++ b/js/src/jsapi-tests/testUTF8.cpp
>@@ -0,0 +1,23 @@
>+/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*-
>+ * vim: set ts=8 sw=4 et tw=99:
>+ */
>+
>+#include "tests.h"
>+
>+BEGIN_TEST(testUTF8_bug589917)
>+{
>+    const jschar surrogate_pair[] = { 0xd800, 0xdc00 };
>+    char output_buffer[10];
>+    size_t utf8_len = sizeof(output_buffer);
>+
>+    JS_SetCStringsAreUTF8();

JS_SetCStringsAreUTF8 affects all the runtimes and must be called before the first runtime is created. That is enforced in a debug build. So this call should be moved into jsapi-tests/tests.cpp before the main function runs any tests.

>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp
>--- a/js/src/jsstr.cpp
>+++ b/js/src/jsstr.cpp
>@@ -3946,16 +3946,18 @@ js_GetDeflatedUTF8StringLength(JSContext
>     nbytes = nchars;
>     for (end = chars + nchars; chars != end; chars++) {
>         c = *chars;
>         if (c < 0x80)
>             continue;
>         if (0xD800 <= c && c <= 0xDFFF) {
>             /* Surrogate pair. */
>             chars++;
>+            /* nbytes sets 1 length since this is surrogate pair. */
>+            nbytes--;

Nit: blank line before comments.

r+ with the above fixed.
Attachment #479754 - Flags: review?(igor) → review+
http://hg.mozilla.org/tracemonkey/rev/55ba10be6762
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/55ba10be6762
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: