Closed
Bug 377498
Opened 18 years ago
Closed 14 years ago
mozIJSSubscriptLoader::loadSubScript does not support specifying the character encoding (treats all files as ISO-8859-1)
Categories
(Core :: XPConnect, defect)
Core
XPConnect
Tracking
()
RESOLVED
FIXED
mozilla2.0b7
People
(Reporter: georg, Assigned: kmag)
References
Details
(Keywords: dev-doc-complete)
Attachments
(2 files, 1 obsolete file)
4.33 KB,
patch
|
bzbarsky
:
review+
jst
:
approval2.0+
|
Details | Diff | Splinter Review |
42 bytes,
application/x-javascript
|
Details |
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.13) Gecko/20060414
Build Identifier:
loadSubScript does not support JavaScript 1.7 and does not support specifying the character encoding nor does detect the character encoding automatically or the language version.
http://developer.mozilla.org/en/docs/mozIJSSubScriptLoader
I sugesst we should unfreese the interface and add 2 additional parameters to specify the script language in the format of a mime type with version information, because in the actual implementation JavaScript 1.7 is supported only partially. The let statement is missing. Also a parameter specifying the character encoding should be added.
This is especially a problem, for local script files, that do not send such information via HTTP header. In XULRunner applications all files are local files and do not provide a content type HTTP header telling which character encoding is used. This causes local files to be parsed wrong.
I have trouble with utf-8 files, that are assumed to be 8859-1 but are utf-8.
Reproducible: Always
Comment 1•18 years ago
|
||
we don't unfreeze interfaces. we could add an mozIJSSubScriptLoader2 version though.
Comment 2•18 years ago
|
||
One issue per bug, please. Can you file a separate bug for the character encoding bug and let this be about loading JS 1.7 with mozIJSSubScriptLoader?
Can you provide a testcase for this? It appears that the sub-script is evaluated with the same JS version as the parent script, i.e. if you run
var loader = Components.classes["@mozilla.org/moz/jssubscript-loader;1"].getService(Components.interfaces.mozIJSSubScriptLoader);
loader.loadSubScript("data:text/plain,let f=1")
- that works, as long as |let| works in the parent script (the one that uses loadSubScript). That sounds sensible to me. Do you disagree?
Reporter | ||
Comment 3•18 years ago
|
||
(In reply to comment #1)
> we don't unfreeze interfaces. we could add an mozIJSSubScriptLoader2 version
> though.
that is ok too
Reporter | ||
Comment 4•18 years ago
|
||
(In reply to comment #2)
> It appears that the sub-script is
> evaluated with the same JS version as the parent script.
You are right. The script version is inherited from the parent script. The charset is not inherited. (tested on Mac using XULRunner 1.9a4)
<script type="application/x-javascript; version=1.7; charset=utf-8" src="chrome://ida/content/js/ida.js"/>
The charset in the type attribute is ignored, so that is no work around. Instead of using the charset from the type attribut ida.js allways uses the charset of the embeding XUL document, but scripts loaded with the script loader use iso-8859-1. They do not inherit the charset from ida.js.
You see an extended interface for script loading to control this individually script by script is useful. This is, why i requested for this enhancement.
eval does not support let. Can I set the ContentScriptType of the XUL window? How can I do that?
My attempt
<html:meta html:http-equiv="Content-Script-Type" html:content="application/x-javascript; version=1.7; charset=utf-8"/>
does not result in eval supporting let.
Comment 5•18 years ago
|
||
You're confusing matters by talking about multiple things in a single bug. Again, the charset issue appears to be a valid one - can you file it separately, please?
Not sure what Content-Script-Type or eval have to do with the JS 1.7 part of the report. FWIW, I still don't think the change you're suggesting is necessary (while it gives more flexibility, just switching all of your scripts to 1.7 should not be very hard.)
Reporter | ||
Comment 6•18 years ago
|
||
It is not a bug, it is a request for an enhancement of an existing interface, which can be done completely transparent to existing applications or for creating an additional more powerful interface as Christian answered in his comment #1.
For the eval it is unclear to me, where it takes its script version from. It does not take it from the script, where it is written, so it is not affected by the inheritance implementes in the script loader interface.
Comment 7•17 years ago
|
||
any news about this? we still get a broken behaviour which looks pretty critical to me.
"à".match(/\u00e0/) matches if executed in a script loaded by <script> or in firebug console (where probably the charset is correctly set to utf-8) but stops matching in a script loaded by mozIJSSubScriptLoader.
is it that big? excellent javascript unicode support is broken in firefox extensions?
Thanks,
Federico
Reporter | ||
Comment 8•17 years ago
|
||
For the moment we encode all umlauts and ligatures as unicode escape sequences to make the script file an ASCII file, because we can't wait for the Mozilla project to provide a fixed implementation of that interface or a new interface with the necessary features to circumvent that trouble of different behaviour of the same file depending on whether loade via script tag or via loadSubScript call.
Comment 9•17 years ago
|
||
(In reply to comment #8)
> For the moment we encode all umlauts and ligatures as unicode escape sequences
which version of firefox? i don't get how we can still get "à".match(/\u00e0/) == null,
under loadsubscript on firefox 2.0.0.11, if you actually convert "à" to "\u00e0".
Do you want me to make a test case?
Thanks,
Federico
Reporter | ||
Comment 10•17 years ago
|
||
Our experience is not from FF 2 but from XulRunner 1.9.
Updated•16 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: loadSubScript does not support JavaScript 1.7 and does not support specifying the character encoding → mozIJSSubscriptLoader::loadSubScript does not support specifying the character encoding
Updated•16 years ago
|
Severity: enhancement → normal
Summary: mozIJSSubscriptLoader::loadSubScript does not support specifying the character encoding → mozIJSSubscriptLoader::loadSubScript does not support specifying the character encoding (treats all files as ISO-8859-1)
Comment 13•15 years ago
|
||
This bug we also encounter now when running Mozmill tests which contain non-ascii characters. Is mozIJSSubScriptLoader the only interface to execute scripts in a synchronous way or do we have another interface? If not, is the usage of the numerical value the only way to circumvent this bug for the moment?
Blocks: 506760
Assignee | ||
Comment 14•14 years ago
|
||
Is there any possible way this can be fixed before Firefox 4.0, especially now that interfaces are unfrozen? If not, there are quite a lot of extensions that will have to maintain hacks for Unicode support in their scripts for years to come.
I can provide a patch, if that would help expedite matters.
![]() |
||
Comment 15•14 years ago
|
||
Interfaces are unfrozen in general, but we're trying to not change them after the first beta, so we don't break people who have started developing against the interface.
If this API is only used from JS, then additive changes that won't break existing JS consumers are probably ok.
Comment 16•14 years ago
|
||
Given that billions of potential users don't use English, I should think that not handling international character set is a rather critical issue that was reported more than 3 years ago.
Hope that can be done by adding an optional parameter that will not break existing code.
Thanks!
![]() |
||
Comment 17•14 years ago
|
||
Users don't use this API; developers do. And the developer either controls the script being loaded (in which case he can use \u escapes for all non-ASCII characters) or doesn't (in which case he has no idea what encoding to claim for the script anyway).
Fixing this bug is about developer convenience; it doesn't affect users in any way I can see other than by possibly increasing developer productivity.
Comment 18•14 years ago
|
||
Developers work for users.
And when developers are not able to do something that works in users' languages, it's not good for users.
If they are able to do it but the workload is too big, they don't do it and it's bad for users too. That explains why so many services don't handle non-English language properly (try creating a developper account at apple.com with a last name using a non ascii characterset: it will be cancelled).
When writing scripts that use localized strings, it's a real pain to maintain \u escapes, the script becomes unreadable.
Have you done it yourself ?
It can be ok when you have a few strings, it's unrealistic when you have hundreds, each in several languages where most need unicode.
I think more use cases are easy to find, like reading JSonified UTF-8 data, scripts being translated by volunteers on tools like crowdin or babelzilla
If it's a matter of not breaking APIs, then as Christian Biesinger suggested 3 years ago, may be a mozIJSSubScriptLoader2 API would be safer.
Comment 19•14 years ago
|
||
I have to agree with Xavier. Using \u escapes is so uncomfortable that it's practically impossible for more than a 5-10 strings. It's not just an inconvenience.
FWIW, the JSON parser in Gecko is supposed to work with Unicode (but I don't use it, because I need comments, so I'm back to subscriptloader).
Comment 20•14 years ago
|
||
+1 to comment 18.
I originally CC'ed myself on this bug when working on Ubiquity, when we were soliciting parser description files from users with some casual JS experience who spoke the target language (many first-time Mozilla contributors) for inclusion in Ubiquity. We ended up going a different route to circumvent this issue, because telling contributors to use \u was a non-starter.
![]() |
||
Comment 21•14 years ago
|
||
But telling contributors to use UTF-8 and no other encoding was a starter?
In any case, I agree this should be fixed and would be happy to review a patch that does so in a safe way (or unsafe, for post-Firefox-4). I just don't think it's a "critical because it immediately affects billions of users" issue, which is how comment 16 affected to characterize it.
Comment 22•14 years ago
|
||
(In reply to comment #21)
> But telling contributors to use UTF-8 and no other encoding was a starter?
From my experience then, yes it was.
But I agree that this does not affect billions of users.
Assignee | ||
Comment 23•14 years ago
|
||
The attached patch adds an optional charset string argument to mozIJSSubScriptLoader.loadSubScript which, if present, causes the script to be decoded from that character set and evaluated as Unicode.
![]() |
||
Comment 24•14 years ago
|
||
Comment on attachment 479855 [details]
Adds charset argument to mozIJSSubScriptLoader.loadSubScript
Oh, this thing is totally not callable from C++ anyway! Ok, then.
Thank you for writing this!
Attachment #479855 -
Flags: review?(bzbarsky)
![]() |
||
Comment 25•14 years ago
|
||
Comment on attachment 479855 [details]
Adds charset argument to mozIJSSubScriptLoader.loadSubScript
>+++ b/js/src/xpconnect/loader/mozJSSubScriptLoader.cpp
>@@ -158,7 +160,8 @@ mozJSSubScriptLoader::LoadSubScript (con
>+ ok = JS_ConvertArguments (cx, argc, argv, "s / o s", &url, &target_obj, &charsetStr);
Probably better to use an 'W' param here, not 's', which will hand out a jschar*. And then instead of NS_ConvertUTF8toUTF16 you can use nsDependentString(reinterpret_cast<PRUnichar*>()). Just saves allocating that extra char* buffer and all.
>+ nsString script;
>+ rv = nsScriptLoader::ConvertToUTF16(chan, (PRUint8*)(char*)buf, PRUint32(len),
>+ charset, nsnull, script);
Unless you actually want to allow the channel charset to override |charset|, pass null for the channel?
And does reinterpret_cast<PRUint8*>(buf) do the right thing instead of those two C casts?
Why do you need the cast on |len|?
>+ (jschar*)script.get(), script.Length(),
Again, reinterpret_cast<> here.
r=me with those fixes and the channel thing addressed.
Thanks again for the patch! Do you want to update it to the comments above? I can do it, but I'll need answers to some of those questions from you.
Attachment #479855 -
Flags: review?(bzbarsky) → review+
Assignee | ||
Comment 26•14 years ago
|
||
(In reply to comment #25)
> Comment on attachment 479855 [details]
> Adds charset argument to mozIJSSubScriptLoader.loadSubScript
>
> >+++ b/js/src/xpconnect/loader/mozJSSubScriptLoader.cpp
> >@@ -158,7 +160,8 @@ mozJSSubScriptLoader::LoadSubScript (con
> >+ ok = JS_ConvertArguments (cx, argc, argv, "s / o s", &url, &target_obj, &charsetStr);
>
> Probably better to use an 'W' param here, not 's', which will hand out a
> jschar*. And then instead of NS_ConvertUTF8toUTF16 you can use
> nsDependentString(reinterpret_cast<PRUnichar*>()). Just saves allocating that
> extra char* buffer and all.
Ok.
> >+ nsString script;
> >+ rv = nsScriptLoader::ConvertToUTF16(chan, (PRUint8*)(char*)buf, PRUint32(len),
> >+ charset, nsnull, script);
>
> Unless you actually want to allow the channel charset to override |charset|,
> pass null for the channel?
I actually struggled with that. At the moment, chrome and file
URL channels don't supply a charset, but it seemed to me that if
they at some point do, it would probably be more valid than the
charset supplied by the caller. But the more I think about it, the
more I think that it should really only be a fallback if the caller
doesn't provide a charset, or to preserve compatibility with the
current interface, if null or "" is passed instead.
> And does reinterpret_cast<PRUint8*>(buf) do the right thing instead of those
> two C casts?
>
> Why do you need the cast on |len|?
Sorry, I realized just after I posted that I forgot to clean those
up. I was struggling to get the prototype to match, and as GCC so
unhelpfully fails to mention which arguments are incompatible, I
just cast everything until I got it working.
> >+ (jschar*)script.get(), script.Length(),
>
> Again, reinterpret_cast<> here.
I thought that was strange, but every other invocation of that function
in mozilla-central seemed to use the C-style cast, so I figured
there was probably a reason.
> r=me with those fixes and the channel thing addressed.
>
> Thanks again for the patch! Do you want to update it to the comments above? I
> can do it, but I'll need answers to some of those questions from you.
Ok. I'll have an updated patch a little later today.
![]() |
||
Comment 27•14 years ago
|
||
> so I figured there was probably a reason.
Purely historical; the C++-style casts were not supported by all the compilers we wanted to support at one point in time (over a decade ago).
Assignee | ||
Comment 28•14 years ago
|
||
Adds the changes requested in comment 25.
I'm sorely tempted to standardize the spacing before the opening paren in function calls, but it's so inconsistent that I don't know where to start.
Attachment #479855 -
Attachment is obsolete: true
Attachment #480197 -
Flags: review?(bzbarsky)
![]() |
||
Comment 29•14 years ago
|
||
Comment on attachment 480197 [details] [diff] [review]
Requested updates to attachment 479855 [details]
r=me.
Requesting 2.0 approval; this is a safe additive change that allows the subscript loader to load non-ISO-8859-1 stuff.
Attachment #480197 -
Flags: review?(bzbarsky)
Attachment #480197 -
Flags: review+
Attachment #480197 -
Flags: approval2.0?
![]() |
||
Updated•14 years ago
|
Assignee: nobody → maglione.k
Updated•14 years ago
|
Attachment #480197 -
Flags: approval2.0? → approval2.0+
Comment 30•14 years ago
|
||
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b8
Updated•14 years ago
|
Target Milestone: mozilla2.0b8 → mozilla2.0b7
Comment 31•14 years ago
|
||
Has this been fixed?
I've tested it on firefox4 nightly build, but it doesn't work.
loader.loadSubScript(uri, module, "UTF-8");
![]() |
||
Comment 32•14 years ago
|
||
That should work. Can you attach an example that shows it not working?
Comment 33•14 years ago
|
||
This is my test code, correct me if I make some mistake.
try {
var testFilePath = 'E:\\test.uc.js';
var testFile = Cc["@mozilla.org/file/local;1"].createInstance(Ci.nsILocalFile);
testFile.initWithPath(testFilePath);
Cc["@mozilla.org/moz/jssubscript-loader;1"].getService(Ci.mozIJSSubScriptLoader).loadSubScript(Cc["@mozilla.org/network/io-service;1"].getService(Ci.nsIIOService).getProtocolHandler("file").QueryInterface(Ci.nsIFileProtocolHandler).getURLSpecFromFile(testFile), window, 'UTF-8');
} catch(e) {
alert(e);
}
Here is the content of test.uc.js file.
(function() {
alert("测试");
})();
Assignee | ||
Comment 34•14 years ago
|
||
It certainly works for me in nightlies and the 4.0 RCs.
![]() |
||
Comment 35•14 years ago
|
||
I meant attaching the actual file you're loading (as an attachment, not via copy/paste).
Comment 36•14 years ago
|
||
I load this file using utf-8, but the string "测试" can not be alerted correctly, but four blocks are shown instead.
Comment 37•14 years ago
|
||
Thanks. I've got it.
It works. Sorry for that my file is not saved as utf-8.
Updated•14 years ago
|
Keywords: dev-doc-needed
Version: unspecified → Trunk
Comment 38•14 years ago
|
||
This had already been documented, but I've cleaned the docs up a bit:
https://developer.mozilla.org/en/XPCOM_Interface_Reference/mozIJSSubScriptLoader
And it's mentioned on Firefox 5 for developers.
Keywords: dev-doc-needed → dev-doc-complete
You need to log in
before you can comment on or make changes to this bug.
Description
•