Closed Bug 1342703 Opened 8 years ago Closed 8 years ago

JS_EncodeStringToBuffer should encode to UTF-8

Component: Core :: JavaScript Engine
Version: 45 Branch
Type: defect
Priority: Not set
Severity: normal
Status: RESOLVED INVALID
Reporter: ptomato
Assignee: Unassigned
Attachments: 1 file

Attached file Full test program
User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
Build ID: 20170126153103

Steps to reproduce:

Here's a snippet of embedder code, which I ran with the usual boilerplate around it:

    const char *script = u8"Array.from('
Oh wow, Bugzilla does not dig those higher codepoint characters. Following is approximately what I typed into the form originally :-)

Here's a snippet of embedder code, which I ran with the usual boilerplate around it:

    const char *script = u8"Array.from('(characters that I can't paste here)')";
    JS::CompileOptions options(cx);
    options.setUTF8(true);
    JS::Evaluate(cx, options, script, strlen(script), &rval);
    JS::RootedString str(cx, JS::ToString(cx, rval));

    char *encoded = JS_EncodeStringToUTF8(cx, str);
    printf("JS_EncodeStringToUTF8: %s\n", encoded);
    JS_free(cx, encoded);

    size_t len = JS_GetStringEncodingLength(cx, str);
    char *buffer = new char[len + 1];
    JS_EncodeStringToBuffer(cx, str, buffer, len);
    buffer[len] = '\0';
    printf("JS_EncodeStringToBuffer: %s\n", buffer);
    delete[] buffer;

(Full program is attached.)

Output:

    JS_EncodeStringToUTF8: (correct output that I can't paste here)
    JS_EncodeStringToBuffer: <j,<f,<g,<i

Expected behaviour:

The output after the colon on the two lines should be identical. According to [1], JS_EncodeStringToBuffer() should fill the buffer with UTF-8 encoded text. Instead, it seems to be discarding the second byte of each two-byte character.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/JSAPI_Reference/JS_EncodeStringToBuffer
Sorry, the documentation is wrong. Rather than changing the existing API's behavior, it would be better to introduce a UTF-8 variant and perhaps deprecate the existing function later, so that embedders can easily switch to the newer one.
For now, I have fixed the documentation.
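To illustrate the reported output, here is a minimal standalone sketch (plain C++, no SpiderMonkey headers) of what narrowing each UTF-16 code unit to a single byte does to a character outside the Basic Multilingual Plane. The helper names `to_surrogate_pair` and `narrow_low_bytes` are hypothetical, and U+1F36A is only an assumed input, chosen because it reproduces the `<j` seen in the report; the report does not say which characters were used, and this is not the engine's actual code.

```cpp
#include <cassert>
#include <string>

// Split a supplementary-plane code point (U+10000..U+10FFFF) into the
// UTF-16 surrogate pair that a UTF-16 string would store for it.
std::u16string to_surrogate_pair(char32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    char32_t v = cp - 0x10000;
    std::u16string out;
    out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
    out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    return out;
}

// Lossy narrowing: keep only the low 8 bits of every UTF-16 code unit.
// This is an assumption about the kind of conversion that would produce
// the output in the report, not a copy of any SpiderMonkey routine.
std::string narrow_low_bytes(const std::u16string& units) {
    std::string out;
    out.reserve(units.size());
    for (char16_t u : units) {
        out.push_back(static_cast<char>(u & 0xFF));
    }
    return out;
}
```

For the assumed input U+1F36A, the surrogate pair is 0xD83C 0xDF6A, and keeping only the low byte of each unit leaves 0x3C 0x6A, which prints as `<j`.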
OK, thanks! I'll use a different function then.
Hm, there are already alternatives: JS::GetDeflatedUTF8StringLength for calculating the length, and JS::DeflateStringToUTF8Buffer for encoding a string into a buffer. Both are declared in js/public/CharacterEncoding.h. I'll create documentation for them.
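The comment above describes a two-step pattern: first compute the encoded length, then encode into a caller-owned buffer. The following is a rough standalone sketch of that pattern in plain C++. The functions `utf8_length` and `encode_utf8_into` are hypothetical stand-ins written for this illustration; they are not the SpiderMonkey functions named above, whose exact signatures are not reproduced here. Replacing unpaired surrogates with U+FFFD is likewise a choice made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode one code point starting at units[i] and advance i past it.
// Unpaired surrogates are mapped to U+FFFD (sketch-only policy).
static char32_t next_code_point(const std::u16string& units, std::size_t& i) {
    char16_t u = units[i++];
    if (u >= 0xD800 && u <= 0xDBFF && i < units.size()) {
        char16_t lo = units[i];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            ++i;
            return 0x10000 + ((static_cast<char32_t>(u) - 0xD800) << 10)
                           + (static_cast<char32_t>(lo) - 0xDC00);
        }
    }
    if (u >= 0xD800 && u <= 0xDFFF) return 0xFFFD;
    return u;
}

// Number of UTF-8 bytes needed for one code point.
static std::size_t utf8_width(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Step 1: byte length of the UTF-8 encoding (terminator not included).
std::size_t utf8_length(const std::u16string& units) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < units.size();) {
        n += utf8_width(next_code_point(units, i));
    }
    return n;
}

// Step 2: write the UTF-8 bytes into buf, which must hold at least
// utf8_length(units) bytes. No terminator is written.
void encode_utf8_into(const std::u16string& units, char* buf, std::size_t cap) {
    assert(cap >= utf8_length(units));
    (void)cap;
    std::size_t o = 0;
    for (std::size_t i = 0; i < units.size();) {
        char32_t cp = next_code_point(units, i);
        switch (utf8_width(cp)) {
        case 1:
            buf[o++] = static_cast<char>(cp);
            break;
        case 2:
            buf[o++] = static_cast<char>(0xC0 | (cp >> 6));
            buf[o++] = static_cast<char>(0x80 | (cp & 0x3F));
            break;
        case 3:
            buf[o++] = static_cast<char>(0xE0 | (cp >> 12));
            buf[o++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            buf[o++] = static_cast<char>(0x80 | (cp & 0x3F));
            break;
        default:
            buf[o++] = static_cast<char>(0xF0 | (cp >> 18));
            buf[o++] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            buf[o++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            buf[o++] = static_cast<char>(0x80 | (cp & 0x3F));
            break;
        }
    }
}
```

A caller would allocate `utf8_length(s) + 1` bytes, call `encode_utf8_into`, and write the trailing `'\0'` itself, mirroring the buffer handling in the original snippet.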
Thanks! This bug can be closed now, I think.
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID