Last Comment Bug 687679 - JS_CompileFile* should handle UTF-8 (xpconnect doesn't convert non-ascii wchars correctly)
: JS_CompileFile* should handle UTF-8 (xpconnect doesn't convert non-ascii wcha...
Status: RESOLVED FIXED
: dev-doc-needed
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla11
Assigned To: John Schoenick [:johns]
:
: Jason Orendorff [:jorendorff]
Mentors:
Depends on: 711980
Blocks:
  Show dependency treegraph
 
Reported: 2011-09-19 15:07 PDT by Bobby Holley (:bholley) (busy with Stylo)
Modified: 2011-12-19 07:25 PST (History)
8 users (show)
Ms2ger: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Attempt to parse files as UTF8 in JS_CompileFile* (2.78 KB, patch)
2011-11-18 11:53 PST, John Schoenick [:johns]
no flags Details | Diff | Splinter Review
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests. (239.01 KB, patch)
2011-11-22 14:28 PST, John Schoenick [:johns]
no flags Details | Diff | Splinter Review
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests. (fixed) (239.11 KB, patch)
2011-11-22 14:52 PST, John Schoenick [:johns]
no flags Details | Diff | Splinter Review
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests. (241.17 KB, patch)
2011-11-28 17:01 PST, John Schoenick [:johns]
jst: review+
Details | Diff | Splinter Review
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests (241.55 KB, patch)
2011-11-29 13:25 PST, John Schoenick [:johns]
jorendorff: review+
Details | Diff | Splinter Review
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests (241.67 KB, patch)
2011-12-16 11:20 PST, John Schoenick [:johns]
no flags Details | Diff | Splinter Review
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests (242.51 KB, patch)
2011-12-16 11:52 PST, John Schoenick [:johns]
no flags Details | Diff | Splinter Review
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests (242.51 KB, patch)
2011-12-16 14:31 PST, John Schoenick [:johns]
no flags Details | Diff | Splinter Review

Description Bobby Holley (:bholley) (busy with Stylo) 2011-09-19 15:07:06 PDT
I have the following method in IDL:

wchar testWchar(in wchar a, inout wchar b);

The implementation is roughly the following:
foo(a, b) {
  rv = b.value;
  b.value = a;
  return rv;
}

This produces the expected results: foo("z", "q");
This returns a question mark: foo("z", "ア");

This works for wstring, but not wchar. I'm making a note in my testing framework (bug 684327) to investigate this.
Comment 1 Bobby Holley (:bholley) (busy with Stylo) 2011-11-16 07:50:10 PST
To reproduce the problem, just change the relevant characters here: http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/tests/unit/test_params.js#145
Comment 2 John Schoenick [:johns] 2011-11-18 11:53:20 PST
[ Adding jwalden since this messes with jsapi ]

Proposed fix attached - 
The JS_CompileFile* functions simply read and inflate to UTF16, nuking any encoding except ASCII. This attempts to run the script through JS_DecodeUTF8 first. Ideally we would have a encoding detection of some kind, but this at least lets UTF8 files (which many of these tests already were assuming xpcshell was using) load in properly.

The reason the wstring test was succeeding is because the utf8 was being loaded as ascii - so the resultant string did match the input, equally scrambled. (I found no actual error in the WCHAR/WCHAR_STR handling in XPCConvert)
Comment 3 John Schoenick [:johns] 2011-11-18 11:53:52 PST
Created attachment 575517 [details] [diff] [review]
Attempt to parse files as UTF8 in JS_CompileFile*
Comment 4 Jason Orendorff [:jorendorff] 2011-11-21 12:40:23 PST
Since this changes functionality, please also rename the API entry points for the benefit of users that currently expect Latin-1 to work:

  JS_CompileFile*  -->  JS_CompileUTF8File*

UTF8 is so much better. Thanks.
Comment 5 John Schoenick [:johns] 2011-11-22 14:28:02 PST
Created attachment 576297 [details] [diff] [review]
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests.

Updated patch. This is the same as before, but renames JS_CompileFile* entry points to JS_CompileUTF8File* to reflect the change. It also fixes some tests that were not assuming UTF8.

Most of these tests were the uconv tests, which had raw bytes in their strings, representing characters in the encoding they were testing. These were changed by replacing the strings with escape sequences.
Comment 6 John Schoenick [:johns] 2011-11-22 14:37:07 PST
Comment on attachment 576297 [details] [diff] [review]
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests.

Doesn't free decodebuf, fixed version inc
Comment 7 John Schoenick [:johns] 2011-11-22 14:52:12 PST
Created attachment 576311 [details] [diff] [review]
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests. (fixed)
Comment 8 Jason Orendorff [:jorendorff] 2011-11-23 08:12:17 PST
Comment on attachment 576311 [details] [diff] [review]
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests. (fixed)

Review of attachment 576311 [details] [diff] [review]:
-----------------------------------------------------------------

All the changes in js/src are fine. The test changes look fine. (I guess test_LightweightThemeManager.js just doesn’t do what it looks like at all without this change? good grief)

Clearing the review flag rather than setting r+ because of the change in mozJSComponentLoader.cpp. Ask an XPConnect peer about that.

::: ipc/testshell/XPCShellEnvironment.cpp
@@ +338,2 @@
>                                                     Environment(cx)->GetPrincipal());
>          fclose(file);

In all the places like this, also fix up the indentation on the following line.

::: js/src/jsapi.cpp
@@ +4828,1 @@
>              }

Please file a followup bug to replace this with a call to fread.

::: js/xpconnect/loader/mozJSComponentLoader.cpp
@@ +937,5 @@
>  #else  /* HAVE_PR_MEMMAP */
>  
>              /**
>               * No memmap implementation, so fall back to using
> +             * JS_CompileUTF8FileHandleForPrincipals().

With this change, we treat scripts as Latin-1 if HAVE_PR_MEMMAP and UTF-8 otherwise. It should be one or the other on all platforms.

But more importantly we should not switch to UTF-8 here without feedback from people who know what would break. I suspect it’s too big a breaking change.

I thought the idea was that the change would only affect xpcshell and the js shell.
Comment 9 Jason Orendorff [:jorendorff] 2011-11-23 08:15:55 PST
If I had a dime for every inexplicable bad thing Splinter ever did to me I would be all set for lunch, but not as all set for lunch as I’d be if I had a dime for every time I’ve complained about Splinter doing something inexplicably bad to me and got a response like “it always includes the preceding five lines, just plan accordingly”.

Not bitter!
Comment 10 John Schoenick [:johns] 2011-11-28 17:01:09 PST
Created attachment 577430 [details] [diff] [review]
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests.

per jst, just change this fallback code (which only runs on OS/2 w/o mmap) to read in the file and use CompileScript
Comment 11 Johnny Stenback (:jst, jst@mozilla.com) 2011-11-28 17:50:17 PST
Comment on attachment 577430 [details] [diff] [review]
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests.

- In XPCShellEnvironment.cpp

         JSScript* script =
-            JS_CompileFileHandleForPrincipals(cx, obj, filename, file,
+            JS_CompileUTF8FileHandleForPrincipals(cx, obj, filename, file,
                                               env->GetPrincipal());

Fix second line argument indentation to line up with the arguments on the first line. Same thing in xpcshell.cpp.

- In mozJSComponentLoader::GlobalForLocation():

+            struct stat st;
+            int ok = fstat(fileno(fileHandle), &st);
+            if (ok != 0) {
+                NS_WARNING("Failed to stat file");
+                return NS_ERROR_FAILURE;
+            }
+
+            size_t len = st.st_size;

You can get the size of the file from aComponentFile->GetFileSize(), and thus avoid the explicit stat here, and the need for the extra #include at the top of this file.

+            size_t rlen = fread(buf, 1, len, fileHandle);
+            if (!rlen) {

We should probably check rlen != len here to catch cases where the read ends short of reading what it was asked to (due to errors of various kinds etc).

r=jst with that for the tests and xpconnect changes, jorendorff should look at the JS engine changes in this patch.
Comment 12 John Schoenick [:johns] 2011-11-29 13:25:49 PST
Created attachment 577735 [details] [diff] [review]
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests

Changes from jst, also fixed issue where JS_SetOptions(oldopts) wasn't being called on all return paths
Comment 13 John Schoenick [:johns] 2011-11-29 17:32:09 PST
Comment on attachment 577735 [details] [diff] [review]
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests

Jason -
jst okay'd the component loader changes here, r?ing you to make sure you're happy with the final JS stuff
Comment 14 Jason Orendorff [:jorendorff] 2011-12-01 09:26:29 PST
Comment on attachment 577735 [details] [diff] [review]
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests

Great. Thanks for finishing this!
Comment 15 :Ms2ger (⌚ UTC+1/+2) 2011-12-15 13:30:01 PST
Can this be checked in? Jason?
Comment 16 John Schoenick [:johns] 2011-12-16 11:20:39 PST
Created attachment 582330 [details] [diff] [review]
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests

Rebased against newest moz central

This is ready for checkin
Comment 17 :Ms2ger (⌚ UTC+1/+2) 2011-12-16 11:42:06 PST
John, the indentation goes wrong in a number of cases; could you fix that?
Comment 18 John Schoenick [:johns] 2011-12-16 11:52:33 PST
Created attachment 582348 [details] [diff] [review]
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests

Fixed indent in several spots

The funny thing is I specifically went through looking for indentation issues once. I'm not sure how I missed approximately all of them.
Comment 19 :Ms2ger (⌚ UTC+1/+2) 2011-12-16 12:11:34 PST
Thanks, I pushed to inbound:

https://hg.mozilla.org/integration/mozilla-inbound/rev/ba447ace2594
Comment 20 :Ms2ger (⌚ UTC+1/+2) 2011-12-16 12:26:07 PST
And backed because frontend::CompileScript grew an extra argument.

https://hg.mozilla.org/integration/mozilla-inbound/rev/a658b1e5d56b

I've fixed that locally, and I'm sending it to try with my own patches tonight. I'll land over the weekend if everything goes well.
Comment 21 John Schoenick [:johns] 2011-12-16 14:31:06 PST
Created attachment 582390 [details] [diff] [review]
Change JS_CompileFile* to JS_CompileUTF8File* and fix non-UTF8 tests

Ack, the only change in the rebase was a stdint type in that block, so I missed the sneaky NULL arg.

I ran a full build/tests again with the arg fix (which I should've done in the first place). This has been through try once so hopefully it wont cause any issues, but it couldn't hurt to run it again, especially with the recent JS changes. Let me know if anything melts.
Comment 22 :Ms2ger (⌚ UTC+1/+2) 2011-12-18 07:08:05 PST
https://hg.mozilla.org/mozilla-central/rev/d89f3d80d3ec

Note You need to log in before you can comment on or make changes to this bug.