Closed
Bug 589917
Opened 15 years ago
Closed 15 years ago
js_GetDeflatedUTF8StringLength computes the wrong UTF-8 length for surrogate pairs
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | betaN+ |
People
(Reporter: gwurk, Assigned: m_kato)
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(1 file)
2.28 KB,
patch
|
igor
:
review+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100723 Ubuntu/8.04 (hardy) Firefox/3.6.8 GTB7.1
Build Identifier:
We can call JS_EncodeCharacters() with a NULL output buffer to get the length of the UTF8-encoded string. JS_EncodeCharacters() calls js_GetDeflatedUTF8StringLength() internally to compute the string length (or js_GetDeflatedStringLength() in standalone SpiderMonkey).
js_GetDeflatedUTF8StringLength() returns the wrong length when the string has a surrogate pair in it.
Reproducible: Always
Steps to Reproduce:
#include <assert.h>
#include "jsapi.h"
int main() {
JS_SetCStringsAreUTF8();
JSRuntime* rt = JS_NewRuntime(10000);
JSContext* cx = JS_NewContext(rt, 8192);
// surrogate pair for \U00010000 = LINEAR B SYLLABLE B008 A:
static const jschar surrogate_pair[] = { 0xd800, 0xdc00 };
// should be encoded in UTF-8 as 4 bytes: 0xf0 0x90 0x80 0x80
// if we call JS_EncodeCharacters() with an output buffer, we get
// the correct output length:
char output_buffer[10];
size_t utf8_len = sizeof(output_buffer);
assert(JS_EncodeCharacters(cx, surrogate_pair, 2,
output_buffer, &utf8_len));
assert(utf8_len == 4);
// but if we call JS_EncodeCharacters() with a NULL output buffer to
// get the UTF-8 length (which calls js_GetDeflatedUTF8StringLength
// internally), we get the wrong output length:
assert(JS_EncodeCharacters(cx, surrogate_pair, 2,
NULL, &utf8_len));
assert(utf8_len == 4); // assert fails, utf8_len is 5 instead
return 0;
}
Updated•15 years ago
|
blocking2.0: --- → betaN+
Assignee | ||
Updated•15 years ago
|
Assignee: general → m_kato
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee | ||
Comment 1•15 years ago
|
||
Assignee | ||
Updated•15 years ago
|
Attachment #479754 -
Flags: review?(lw)
![]() |
||
Comment 2•15 years ago
|
||
Comment on attachment 479754 [details] [diff] [review]
fix
I'm not really in the know for utf8 deflation, but Waldo says Igor is.
Attachment #479754 -
Flags: review?(lw) → review?(igor)
Comment 3•15 years ago
|
||
Comment on attachment 479754 [details] [diff] [review]
fix
># HG changeset patch
># Parent 571523d28b41168b177295988aa6b040d0ec6d38
>Bug 589917 - js_GetDeflatedUTF8StringLength computes the wrong UTF-8 length for surrogate pairs
>
>diff --git a/js/src/jsapi-tests/Makefile.in b/js/src/jsapi-tests/Makefile.in
>--- a/js/src/jsapi-tests/Makefile.in
>+++ b/js/src/jsapi-tests/Makefile.in
>@@ -64,16 +64,17 @@ CPPSRCS = \
> testLookup.cpp \
> testNewObject.cpp \
> testOps.cpp \
> testPropCache.cpp \
> testSameValue.cpp \
> testScriptObject.cpp \
> testSetPropertyWithNativeGetterStubSetter.cpp \
> testTrap.cpp \
>+ testUTF8.cpp \
> testXDR.cpp \
> $(NULL)
>
> DEFINES += -DEXPORT_JS_API
>
> LIBS = $(NSPR_LIBS) $(DEPTH)/$(LIB_PREFIX)js_static.$(LIB_SUFFIX)
>
> LOCAL_INCLUDES += -I$(topsrcdir) -I..
>diff --git a/js/src/jsapi-tests/testUTF8.cpp b/js/src/jsapi-tests/testUTF8.cpp
>new file mode 100644
>--- /dev/null
>+++ b/js/src/jsapi-tests/testUTF8.cpp
>@@ -0,0 +1,23 @@
>+/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*-
>+ * vim: set ts=8 sw=4 et tw=99:
>+ */
>+
>+#include "tests.h"
>+
>+BEGIN_TEST(testUTF8_bug589917)
>+{
>+ const jschar surrogate_pair[] = { 0xd800, 0xdc00 };
>+ char output_buffer[10];
>+ size_t utf8_len = sizeof(output_buffer);
>+
>+ JS_SetCStringsAreUTF8();
JS_SetCStringsAreUTF8 affects all the runtimes and must be called before the first runtime is created. That is enforced in a debug build. So this call should be moved into jsapi-tests/tests.cpp before the main function runs any tests.
>diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp
>--- a/js/src/jsstr.cpp
>+++ b/js/src/jsstr.cpp
>@@ -3946,16 +3946,18 @@ js_GetDeflatedUTF8StringLength(JSContext
> nbytes = nchars;
> for (end = chars + nchars; chars != end; chars++) {
> c = *chars;
> if (c < 0x80)
> continue;
> if (0xD800 <= c && c <= 0xDFFF) {
> /* Surrogate pair. */
> chars++;
>+ /* nbytes sets 1 length since this is surrogate pair. */
>+ nbytes--;
Nit: blank line before comments.
r+ with the above fixed.
Attachment #479754 -
Flags: review?(igor) → review+
Assignee | ||
Comment 4•15 years ago
|
||
Whiteboard: fixed-in-tracemonkey
Comment 5•15 years ago
|
||
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•