Closed Bug 589917 Opened 15 years ago Closed 15 years ago

js_GetDeflatedUTF8StringLength computes the wrong UTF-8 length for surrogate pairs

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: gwurk, Assigned: m_kato)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file)

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100723 Ubuntu/8.04 (hardy) Firefox/3.6.8 GTB7.1 Build Identifier: We can call JS_EncodeCharacters() with a NULL output buffer to get the length of the UTF8-encoded string. JS_EncodeCharacters() calls js_GetDeflatedUTF8StringLength() internally to compute the string length (or js_GetDeflatedStringLength() in standalone SpiderMonkey). js_GetDeflatedUTF8StringLength() returns the wrong length when the string has a surrogate pair in it. Reproducible: Always Steps to Reproduce: #include <assert.h> #include "jsapi.h" int main() { JS_SetCStringsAreUTF8(); JSRuntime* rt = JS_NewRuntime(10000); JSContext* cx = JS_NewContext(rt, 8192); // surrogate pair for \U00010000 = LINEAR B SYLLABLE B008 A: static const jschar surrogate_pair[] = { 0xd800, 0xdc00 }; // should be encoded in UTF-8 as 4 bytes: 0xf0 0x90 0x80 0x80 // if we call JS_EncodeCharacters() with an output buffer, we get // the correct output length: char output_buffer[10]; size_t utf8_len = sizeof(output_buffer); assert(JS_EncodeCharacters(cx, surrogate_pair, 2, output_buffer, &utf8_len)); assert(utf8_len == 4); // but if we call JS_EncodeCharacters() with a NULL output buffer to // get the UTF-8 length (which calls js_GetDeflatedUTF8StringLength // internally), we get the wrong output length: assert(JS_EncodeCharacters(cx, surrogate_pair, 2, NULL, &utf8_len)); assert(utf8_len == 4); // assert fails, utf8_len is 5 instead return 0; }
blocking2.0: --- → betaN+
Assignee: general → m_kato
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attached patch fixSplinter Review
Attachment #479754 - Flags: review?(lw)
Comment on attachment 479754 [details] [diff] [review] fix I'm not really in the know for utf8 deflation, but Waldo says Igor is.
Attachment #479754 - Flags: review?(lw) → review?(igor)
Comment on attachment 479754 [details] [diff] [review] fix ># HG changeset patch ># Parent 571523d28b41168b177295988aa6b040d0ec6d38 >Bug 589917 - js_GetDeflatedUTF8StringLength computes the wrong UTF-8 length for surrogate pairs > >diff --git a/js/src/jsapi-tests/Makefile.in b/js/src/jsapi-tests/Makefile.in >--- a/js/src/jsapi-tests/Makefile.in >+++ b/js/src/jsapi-tests/Makefile.in >@@ -64,16 +64,17 @@ CPPSRCS = \ > testLookup.cpp \ > testNewObject.cpp \ > testOps.cpp \ > testPropCache.cpp \ > testSameValue.cpp \ > testScriptObject.cpp \ > testSetPropertyWithNativeGetterStubSetter.cpp \ > testTrap.cpp \ >+ testUTF8.cpp \ > testXDR.cpp \ > $(NULL) > > DEFINES += -DEXPORT_JS_API > > LIBS = $(NSPR_LIBS) $(DEPTH)/$(LIB_PREFIX)js_static.$(LIB_SUFFIX) > > LOCAL_INCLUDES += -I$(topsrcdir) -I.. >diff --git a/js/src/jsapi-tests/testUTF8.cpp b/js/src/jsapi-tests/testUTF8.cpp >new file mode 100644 >--- /dev/null >+++ b/js/src/jsapi-tests/testUTF8.cpp >@@ -0,0 +1,23 @@ >+/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 4 -*- >+ * vim: set ts=8 sw=4 et tw=99: >+ */ >+ >+#include "tests.h" >+ >+BEGIN_TEST(testUTF8_bug589917) >+{ >+ const jschar surrogate_pair[] = { 0xd800, 0xdc00 }; >+ char output_buffer[10]; >+ size_t utf8_len = sizeof(output_buffer); >+ >+ JS_SetCStringsAreUTF8(); JS_SetCStringsAreUTF8 affects all the runtimes and must be called before the first runtime is created. That is enforced in a debug build. So this call should be moved into jsapi-tests/tests.cpp before the main function runs any tests. >diff --git a/js/src/jsstr.cpp b/js/src/jsstr.cpp >--- a/js/src/jsstr.cpp >+++ b/js/src/jsstr.cpp >@@ -3946,16 +3946,18 @@ js_GetDeflatedUTF8StringLength(JSContext > nbytes = nchars; > for (end = chars + nchars; chars != end; chars++) { > c = *chars; > if (c < 0x80) > continue; > if (0xD800 <= c && c <= 0xDFFF) { > /* Surrogate pair. */ > chars++; >+ /* nbytes sets 1 length since this is surrogate pair. */ >+ nbytes--; Nit: blank line before comments. r+ with the above fixed.
Attachment #479754 - Flags: review?(igor) → review+
Whiteboard: fixed-in-tracemonkey
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: