Closed Bug 569271 Opened 14 years ago Closed 14 years ago

Securable module loader should treat files as UTF-8

Categories

(Add-on SDK Graveyard :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mcepl, Assigned: adw)

Details

Attachments

(1 file)

I have this function:

var loadText = exports.loadText = function loadText(URL, cb_function, what) {
    if (what === undefined) { // missing optional argument
        what = this;
    }

    var req = new xhrMod.XMLHttpRequest();
    req.open("GET", URL, true);
    req.onreadystatechange = function (aEvt) {
        if (req.readyState === 4) {
            if (req.status === 200) {
                cb_function.call(what, req.responseText);
            } else {
                throw "Getting " + URL + "failed!";
            }
        }
    };
    req.send("");
};

and this testFuncion

exports.ensureLoadText = function (test) {
    var url = "http://www.ceplovi.cz/matej/progs/data/pushkin.txt";
    var text = "";
    util.loadText(url,function(txt) {
        test.assertEqual(txt,pushkinTestString);
    });
};

it fails with exception even when I change testing function to primitive:

exports.ensureLoadText = function (test) {
    var url = "http://www.ceplovi.cz/matej/progs/data/pushkin.txt";
    var text = "";
    util.loadText(url,function(txt) {
        test.assertEqual(pushkinTestString,pushkinTestString);
    });
};

I would love to have a possibility of unit testing for XMLHttpRequest-derived functions.
Hi Matej,
on a general case, it is possible to run tests from callback functions. I don't know if there is any problem with the XHR. What kind of exception are you getting? You can try running your tests with the --verbose flag to see more information.

However, from the example code, I think you might just be missing the call to test.waitUntilDone() at the beginning of your test function. It is needed if your test is not completed when it reaches the end of the function (as is the case with callbacks). You'll also need to call test.done() when it's completed.

You can see an example here: http://hg.mozilla.org/labs/jetpack-sdk/file/11a90f6b4351/packages/jetpack-core/tests/test-page-worker.js#l198
Yeah, what Felipe said should fix your problem.

One thing we might want to change in the test runner, to make this kind of problem easier to detect, is to internally mark the test object as "expired" once the runner thinks the test is done: this way the test.assertEqual() in Matej's example could log an error that says something helpful like "a test method was just called on a test that already appears to be finished! you may want to use test.waitUntilDone() if the test is asynchronous."

Also, re: XHR in particular, we'll be landing the mock XHR api from bug 562234 soon, which should make this kind of thing both easier and faster to test.
Yes, when I modify my test function to

exports.ensureLoadText = function (test) {
    var url = "http://www.ceplovi.cz/matej/progs/data/doi.txt";
    var text = "";
    test.waitUntilDone();
    util.loadText(url,function(txt) {
        test.assertEqual(txt,testString);
        test.done();
    });
};

It works as it should. Thank you

The only remaining problem is that when I use my original text (http://matej.ceplovi.cz/progs/data/pushkin.txt) I get completely incomprehensible output even in UTF-8 locale. Something in the style of

 "Byl pozdn\xED ve\u010Der prvn\xED m\xE1j!\n\n\u041D\u0430\u0441 \u0431\u044B\u043B\u043E \u043C\u043D\u043E\u0433\u043E \u043D\u0430 \u0447\u0435\u043B\u043D\u0435;\n\u0418\u043D\u044B\u0435 \u043F\u0430\u0440\u0443\u0441 \u043D\u0430\u043F\u0440\u044F\u0433\u0430\u043B\u0438,\n\u0414\u0440\u0443\u0433\u0438\u0435 \u0434\u0440\u0443\u0436\u043D\u043E \u0443\u043F\u0438\u0440\u0430\u043B\u0438\n\n\u0412 \u0433\u043B\u0443\u0431\u044C \u043C\u043E\u0449\u043D\u044B \u0432\u0435\u0441\u043B\u044B. \u0412 \u0442\u0438\u0448\u0438\u043D\u0435\n\u041D\u0430 \u0440\u0443\u043B\u044C \u0441\u043A\u043B\u043E\u043D\u044F\u0441\u044C, \u043D\u0430\u0448 \u043A\u043E\u0440\u043C\u0449\u0438\u043A \u0443\u043C\u043D\u044B\u0439\n\u0412 \u043C\u043E\u043B\u0447\u0430\u043D\u044C\u0438 \u043F\u0440\u0430\u0432\u0438\u043B \u0433\u0440\u0443\u0437\u043D\u044B\u0439 \u0447\u043E\u043B\u043D;\n\u0410 \u044F \u2014 \u0431\u0435\u0441\u043F\u0435\u0447\u043D\u043E\u0439 \u0432\u0435\u0440\u044B \u043F\u043E\u043B\u043D \u2014\n\u041F\u043B\u043E\u0432\u0446\u0430\u043C \u044F \u043F\u0435\u043B....\"\n"

Isn't JavaScript supposed to know all these tricks? With ASCII text (the first paragraph of the Declaration of Independence) everything works as it should.
Re: comment 3, as Matej, Felipe, and I discovered over IRC a couple of days ago, the securable module loader doesn't treat files as UTF-8.  Since the problem in comment 0 is invalid, morphing the bug for comment 3.
OS: Linux → All
Hardware: x86 → All
Summary: cannot run test in a callback → Securable module loader should treat files as UTF-8
Attached patch patchSplinter Review
Attachment #449240 - Flags: review?(avarma)
(In reply to comment #5)
> Created an attachment (id=449240) [details]
> patch

Is it right to hardwire one particular encoding (although the one both I and you use ;))? Wouldn't a locale's encoding be preferable? Isn't it possible to write Javascript on Windows in UCS-2? Or do we prescribe somewhere specifically in documentation/specification that UTF-8 is The One Right Encoding?
Comment on attachment 449240 [details] [diff] [review]
patch

I especially dig the test case.

Re: matej's concern, I'm personally fine w/ just declaring utf-8 to be The One True Encoding.
Attachment #449240 - Flags: review?(avarma) → review+
(In reply to comment #7)
> (From update of attachment 449240 [details] [diff] [review])
> I especially dig the test case.
> 
> Re: matej's concern, I'm personally fine w/ just declaring utf-8 to be The One
> True Encoding.

Me too, but then we should actually do it somewhere.
I noticed something strange while writing that patch.  If you read a UTF-8-encoded file with an nsIScriptableInputStream and dump() it, outside of Jetpack, you get gibberish, as each byte in the encoding is printed as an ASCII character.  And if you read the same file with a converter stream as this patch does, the right multi-byte characters are printed.  That's all as expected.

But if you do the same thing in a Jetpack unit test, the opposite is true:  The right multi-byte characters are printed with nsIScriptableInputStream (but you can still examine each byte in a multi-byte char), and the low byte of each code point is printed as ASCII with the converter stream (the code points are correct, just the high bytes are lopped off).

This is independent of the patch.  It's as if somewhere along the way multi-byte strings are getting funneled through 8-bit-wide C strings.
Oh actually I left out the best part.  If I dump() file contents in securable module's getFile(), without this patch the right multi-byte chars are printed.  With this patch, it's the high-bytes-lopped-off thing.  That's with 1.9.2.  There must have been a bug fixed for 1.9.3, because there I get gibberish ASCII without the patch and the correct chars with it, which is the right behavior.
It wouldn't be the first time that happened (bug 567597, bug 375641)!
One more fun fact.  On 1.9.3, if a unit test console.log()s a unicode string, it prints correctly, but if it dump()s it, it's high-bytes-lopped-off.  I was able to reduce the problem to this, which can be run in Fx 3.7/4's error console:

var code = '
  dump("\\u6587");
  foo();
';
var p = Components.classes["@mozilla.org/systemprincipal;1"].
        createInstance(Components.interfaces.nsIPrincipal);
var sb = new Components.utils.Sandbox(p);
sb.foo = function () dump("\u6587");
Components.utils.evalInSandbox(code, sb, "1.8");

It's the dump() inside the sandbox's context that's messed up.
(In reply to comment #12)
> One more fun fact.  On 1.9.3, if a unit test console.log()s a unicode string,
> it prints correctly, but if it dump()s it, it's high-bytes-lopped-off. 

Filed bug 570291 against xpconnect.
http://hg.mozilla.org/labs/jetpack-sdk/rev/0623cd990857
Assignee: nobody → adw
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: -- → 0.5
The Add-on SDK is no longer a Mozilla Labs experiment and has become a big enough project to warrant its own Bugzilla product, so the "Add-on SDK" product has been created for it, and I am moving its bugs to that product.

To filter bugmail related to this change, filter on the word "looptid".
Component: Jetpack SDK → General
Product: Mozilla Labs → Add-on SDK
QA Contact: jetpack-sdk → general
Version: Trunk → unspecified
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: