377498 - mozIJSSubscriptLoader::loadSubScript does not support specifying the character encoding (treats all files as ISO-8859-1)

Reporter

Description

•

18 years ago

User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.7.13) Gecko/20060414 Build Identifier: loadSubScript does not support JavaScript 1.7 and does not support specifying the character encoding nor does detect the character encoding automatically or the language version. http://developer.mozilla.org/en/docs/mozIJSSubScriptLoader I sugesst we should unfreese the interface and add 2 additional parameters to specify the script language in the format of a mime type with version information, because in the actual implementation JavaScript 1.7 is supported only partially. The let statement is missing. Also a parameter specifying the character encoding should be added. This is especially a problem, for local script files, that do not send such information via HTTP header. In XULRunner applications all files are local files and do not provide a content type HTTP header telling which character encoding is used. This causes local files to be parsed wrong. I have trouble with utf-8 files, that are assumed to be 8859-1 but are utf-8. Reproducible: Always

Christian :Biesinger (don't email me, ping me on IRC)

Comment 1

•

18 years ago

we don't unfreeze interfaces. we could add an mozIJSSubScriptLoader2 version though.

Nickolay_Ponomarev

Comment 2

•

18 years ago

One issue per bug, please. Can you file a separate bug for the character encoding bug and let this be about loading JS 1.7 with mozIJSSubScriptLoader? Can you provide a testcase for this? It appears that the sub-script is evaluated with the same JS version as the parent script, i.e. if you run var loader = Components.classes["@mozilla.org/moz/jssubscript-loader;1"].getService(Components.interfaces.mozIJSSubScriptLoader); loader.loadSubScript("data:text/plain,let f=1") - that works, as long as |let| works in the parent script (the one that uses loadSubScript). That sounds sensible to me. Do you disagree?

Georg Maaß

Reporter

Comment 3

•

18 years ago

(In reply to comment #1) > we don't unfreeze interfaces. we could add an mozIJSSubScriptLoader2 version > though. that is ok too

Georg Maaß

Reporter

Comment 4

•

18 years ago

(In reply to comment #2) > It appears that the sub-script is > evaluated with the same JS version as the parent script. You are right. The script version is inherited from the parent script. The charset is not inherited. (tested on Mac using XULRunner 1.9a4) <script type="application/x-javascript; version=1.7; charset=utf-8" src="chrome://ida/content/js/ida.js"/> The charset in the type attribute is ignored, so that is no work around. Instead of using the charset from the type attribut ida.js allways uses the charset of the embeding XUL document, but scripts loaded with the script loader use iso-8859-1. They do not inherit the charset from ida.js. You see an extended interface for script loading to control this individually script by script is useful. This is, why i requested for this enhancement. eval does not support let. Can I set the ContentScriptType of the XUL window? How can I do that? My attempt <html:meta html:http-equiv="Content-Script-Type" html:content="application/x-javascript; version=1.7; charset=utf-8"/> does not result in eval supporting let.

Nickolay_Ponomarev

Comment 5

•

18 years ago

You're confusing matters by talking about multiple things in a single bug. Again, the charset issue appears to be a valid one - can you file it separately, please? Not sure what Content-Script-Type or eval have to do with the JS 1.7 part of the report. FWIW, I still don't think the change you're suggesting is necessary (while it gives more flexibility, just switching all of your scripts to 1.7 should not be very hard.)

Georg Maaß

Reporter

Comment 6

•

18 years ago

It is not a bug, it is a request for an enhancement of an existing interface, which can be done completely transparent to existing applications or for creating an additional more powerful interface as Christian answered in his comment #1. For the eval it is unclear to me, where it takes its script version from. It does not take it from the script, where it is written, so it is not affected by the inheritance implementes in the script loader interface.

Federico Galassi

Comment 7

•

18 years ago

any news about this? we still get a broken behaviour which looks pretty critical to me. "à".match(/\u00e0/) matches if executed in a script loaded by <script> or in firebug console (where probably the charset is correctly set to utf-8) but stops matching in a script loaded by mozIJSSubScriptLoader. is it that big? excellent javascript unicode support is broken in firefox extensions? Thanks, Federico

Georg Maaß

Reporter

Comment 8

•

18 years ago

For the moment we encode all umlauts and ligatures as unicode escape sequences to make the script file an ASCII file, because we can't wait for the Mozilla project to provide a fixed implementation of that interface or a new interface with the necessary features to circumvent that trouble of different behaviour of the same file depending on whether loade via script tag or via loadSubScript call.

Federico Galassi

Comment 9

•

18 years ago

(In reply to comment #8) > For the moment we encode all umlauts and ligatures as unicode escape sequences which version of firefox? i don't get how we can still get "à".match(/\u00e0/) == null, under loadsubscript on firefox 2.0.0.11, if you actually convert "à" to "\u00e0". Do you want me to make a test case? Thanks, Federico

Georg Maaß

Reporter

Comment 10

•

17 years ago

Our experience is not from FF 2 but from XulRunner 1.9.

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

16 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Summary: loadSubScript does not support JavaScript 1.7 and does not support specifying the character encoding → mozIJSSubscriptLoader::loadSubScript does not support specifying the character encoding

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

16 years ago

Severity: enhancement → normal

Summary: mozIJSSubscriptLoader::loadSubScript does not support specifying the character encoding → mozIJSSubscriptLoader::loadSubScript does not support specifying the character encoding (treats all files as ISO-8859-1)

Henrik Skupin [:whimboo][⌚️UTC+2] (away 07/21 - 08/08)

Comment 13

•

15 years ago

This bug we also encounter now when running Mozmill tests which contain non-ascii characters. Is mozIJSSubScriptLoader the only interface to execute scripts in a synchronous way or do we have another interface? If not, is the usage of the numerical value the only way to circumvent this bug for the moment?

Blocks: 506760

Henrik Skupin [:whimboo][⌚️UTC+2] (away 07/21 - 08/08)

Updated

•

15 years ago

Blocks: 559814

Kris Maglione [:kmag]

Assignee

Comment 14

•

15 years ago

Is there any possible way this can be fixed before Firefox 4.0, especially now that interfaces are unfrozen? If not, there are quite a lot of extensions that will have to maintain hacks for Unicode support in their scripts for years to come. I can provide a patch, if that would help expedite matters.

Boris Zbarsky [:bzbarsky]

Comment 15

•

15 years ago

Interfaces are unfrozen in general, but we're trying to not change them after the first beta, so we don't break people who have started developing against the interface. If this API is only used from JS, then additive changes that won't break existing JS consumers are probably ok.

Xavier

Comment 16

•

15 years ago

Given that billions of potential users don't use English, I should think that not handling international character set is a rather critical issue that was reported more than 3 years ago. Hope that can be done by adding an optional parameter that will not break existing code. Thanks!

Boris Zbarsky [:bzbarsky]

Comment 17

•

15 years ago

Users don't use this API; developers do. And the developer either controls the script being loaded (in which case he can use \u escapes for all non-ASCII characters) or doesn't (in which case he has no idea what encoding to claim for the script anyway). Fixing this bug is about developer convenience; it doesn't affect users in any way I can see other than by possibly increasing developer productivity.

Xavier

Comment 18

•

15 years ago

Developers work for users. And when developers are not able to do something that works in users' languages, it's not good for users. If they are able to do it but the workload is too big, they don't do it and it's bad for users too. That explains why so many services don't handle non-English language properly (try creating a developper account at apple.com with a last name using a non ascii characterset: it will be cancelled). When writing scripts that use localized strings, it's a real pain to maintain \u escapes, the script becomes unreadable. Have you done it yourself ? It can be ok when you have a few strings, it's unrealistic when you have hundreds, each in several languages where most need unicode. I think more use cases are easy to find, like reading JSonified UTF-8 data, scripts being translated by volunteers on tools like crowdin or babelzilla If it's a matter of not breaking APIs, then as Christian Biesinger suggested 3 years ago, may be a mozIJSSubScriptLoader2 API would be safer.

Ben Bucksch (:BenB)

Comment 19

•

15 years ago

I have to agree with Xavier. Using \u escapes is so uncomfortable that it's practically impossible for more than a 5-10 strings. It's not just an inconvenience. FWIW, the JSON parser in Gecko is supposed to work with Unicode (but I don't use it, because I need comments, so I'm back to subscriptloader).

Michael Yoshitaka Erlewine [:mitcho]

Comment 20

•

15 years ago

+1 to comment 18. I originally CC'ed myself on this bug when working on Ubiquity, when we were soliciting parser description files from users with some casual JS experience who spoke the target language (many first-time Mozilla contributors) for inclusion in Ubiquity. We ended up going a different route to circumvent this issue, because telling contributors to use \u was a non-starter.

Boris Zbarsky [:bzbarsky]

Comment 21

•

15 years ago

But telling contributors to use UTF-8 and no other encoding was a starter? In any case, I agree this should be fixed and would be happy to review a patch that does so in a safe way (or unsafe, for post-Firefox-4). I just don't think it's a "critical because it immediately affects billions of users" issue, which is how comment 16 affected to characterize it.

Michael Yoshitaka Erlewine [:mitcho]

Comment 22

•

15 years ago

(In reply to comment #21) > But telling contributors to use UTF-8 and no other encoding was a starter? From my experience then, yes it was. But I agree that this does not affect billions of users.

Kris Maglione [:kmag]

Assignee

Comment 23

•

15 years ago

Attached file Adds charset argument to mozIJSSubScriptLoader.loadSubScript (obsolete) — Details

The attached patch adds an optional charset string argument to mozIJSSubScriptLoader.loadSubScript which, if present, causes the script to be decoded from that character set and evaluated as Unicode.

Boris Zbarsky [:bzbarsky]

Comment 24

•

15 years ago

Comment on attachment 479855 [details] Adds charset argument to mozIJSSubScriptLoader.loadSubScript Oh, this thing is totally not callable from C++ anyway! Ok, then. Thank you for writing this!

Attachment #479855 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 25

•

15 years ago

Comment on attachment 479855 [details] Adds charset argument to mozIJSSubScriptLoader.loadSubScript >+++ b/js/src/xpconnect/loader/mozJSSubScriptLoader.cpp >@@ -158,7 +160,8 @@ mozJSSubScriptLoader::LoadSubScript (con >+ ok = JS_ConvertArguments (cx, argc, argv, "s / o s", &url, &target_obj, &charsetStr); Probably better to use an 'W' param here, not 's', which will hand out a jschar*. And then instead of NS_ConvertUTF8toUTF16 you can use nsDependentString(reinterpret_cast<PRUnichar*>()). Just saves allocating that extra char* buffer and all. >+ nsString script; >+ rv = nsScriptLoader::ConvertToUTF16(chan, (PRUint8*)(char*)buf, PRUint32(len), >+ charset, nsnull, script); Unless you actually want to allow the channel charset to override |charset|, pass null for the channel? And does reinterpret_cast<PRUint8*>(buf) do the right thing instead of those two C casts? Why do you need the cast on |len|? >+ (jschar*)script.get(), script.Length(), Again, reinterpret_cast<> here. r=me with those fixes and the channel thing addressed. Thanks again for the patch! Do you want to update it to the comments above? I can do it, but I'll need answers to some of those questions from you.

Attachment #479855 - Flags: review?(bzbarsky) → review+

Kris Maglione [:kmag]

Assignee

Comment 26

•

15 years ago

(In reply to comment #25) > Comment on attachment 479855 [details] > Adds charset argument to mozIJSSubScriptLoader.loadSubScript > > >+++ b/js/src/xpconnect/loader/mozJSSubScriptLoader.cpp > >@@ -158,7 +160,8 @@ mozJSSubScriptLoader::LoadSubScript (con > >+ ok = JS_ConvertArguments (cx, argc, argv, "s / o s", &url, &target_obj, &charsetStr); > > Probably better to use an 'W' param here, not 's', which will hand out a > jschar*. And then instead of NS_ConvertUTF8toUTF16 you can use > nsDependentString(reinterpret_cast<PRUnichar*>()). Just saves allocating that > extra char* buffer and all. Ok. > >+ nsString script; > >+ rv = nsScriptLoader::ConvertToUTF16(chan, (PRUint8*)(char*)buf, PRUint32(len), > >+ charset, nsnull, script); > > Unless you actually want to allow the channel charset to override |charset|, > pass null for the channel? I actually struggled with that. At the moment, chrome and file URL channels don't supply a charset, but it seemed to me that if they at some point do, it would probably be more valid than the charset supplied by the caller. But the more I think about it, the more I think that it should really only be a fallback if the caller doesn't provide a charset, or to preserve compatibility with the current interface, if null or "" is passed instead. > And does reinterpret_cast<PRUint8*>(buf) do the right thing instead of those > two C casts? > > Why do you need the cast on |len|? Sorry, I realized just after I posted that I forgot to clean those up. I was struggling to get the prototype to match, and as GCC so unhelpfully fails to mention which arguments are incompatible, I just cast everything until I got it working. > >+ (jschar*)script.get(), script.Length(), > > Again, reinterpret_cast<> here. I thought that was strange, but every other invocation of that function in mozilla-central seemed to use the C-style cast, so I figured there was probably a reason. > r=me with those fixes and the channel thing addressed. > > Thanks again for the patch! Do you want to update it to the comments above? I > can do it, but I'll need answers to some of those questions from you. Ok. I'll have an updated patch a little later today.

Boris Zbarsky [:bzbarsky]

Comment 27

•

15 years ago

> so I figured there was probably a reason. Purely historical; the C++-style casts were not supported by all the compilers we wanted to support at one point in time (over a decade ago).

Kris Maglione [:kmag]

Assignee

Comment 28

•

15 years ago

Attached patch Requested updates to attachment 479855 — Details — Splinter Review

Adds the changes requested in comment 25. I'm sorely tempted to standardize the spacing before the opening paren in function calls, but it's so inconsistent that I don't know where to start.

Attachment #479855 - Attachment is obsolete: true

Attachment #480197 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 29

•

15 years ago

Comment on attachment 480197 [details] [diff] [review] Requested updates to attachment 479855 [details] r=me. Requesting 2.0 approval; this is a safe additive change that allows the subscript loader to load non-ISO-8859-1 stuff.

Attachment #480197 - Flags: review?(bzbarsky)

Attachment #480197 - Flags: review+

Attachment #480197 - Flags: approval2.0?

Boris Zbarsky [:bzbarsky]

Updated

•

15 years ago

Assignee: nobody → maglione.k

Johnny Stenback (:jst)

Updated

•

15 years ago

Attachment #480197 - Flags: approval2.0? → approval2.0+

(no longer active)

Comment 30

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/3dcf107b7267

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla2.0b8

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

15 years ago

Target Milestone: mozilla2.0b8 → mozilla2.0b7

wuxiaoshan

Comment 31

•

14 years ago

Has this been fixed? I've tested it on firefox4 nightly build, but it doesn't work. loader.loadSubScript(uri, module, "UTF-8");

Boris Zbarsky [:bzbarsky]

Comment 32

•

14 years ago

That should work. Can you attach an example that shows it not working?

wuxiaoshan

Comment 33

•

14 years ago

This is my test code, correct me if I make some mistake. try { var testFilePath = 'E:\\test.uc.js'; var testFile = Cc["@mozilla.org/file/local;1"].createInstance(Ci.nsILocalFile); testFile.initWithPath(testFilePath); Cc["@mozilla.org/moz/jssubscript-loader;1"].getService(Ci.mozIJSSubScriptLoader).loadSubScript(Cc["@mozilla.org/network/io-service;1"].getService(Ci.nsIIOService).getProtocolHandler("file").QueryInterface(Ci.nsIFileProtocolHandler).getURLSpecFromFile(testFile), window, 'UTF-8'); } catch(e) { alert(e); } Here is the content of test.uc.js file. (function() { alert("测试"); })();

Kris Maglione [:kmag]

Assignee

Comment 34

•

14 years ago

It certainly works for me in nightlies and the 4.0 RCs.

Boris Zbarsky [:bzbarsky]

Comment 35

•

14 years ago

I meant attaching the actual file you're loading (as an attachment, not via copy/paste).

wuxiaoshan

Comment 36

•

14 years ago

Attached file A testcase for loading files with utf-8 encoding characters. — Details

I load this file using utf-8, but the string "测试" can not be alerted correctly, but four blocks are shown instead.

wuxiaoshan

Comment 37

•

14 years ago

Thanks. I've got it. It works. Sorry for that my file is not saved as utf-8.

:Ms2ger (he/him; ⌚ UTC+1/+2)

Updated

•

14 years ago

Keywords: dev-doc-needed

Version: unspecified → Trunk

Eric Shepherd [:sheppy]

Comment 38

•

14 years ago

This had already been documented, but I've cleaned the docs up a bit: https://developer.mozilla.org/en/XPCOM_Interface_Reference/mozIJSSubScriptLoader And it's mentioned on Firefox 5 for developers.

Keywords: dev-doc-needed → dev-doc-complete

Adds charset argument to mozIJSSubScriptLoader.loadSubScript 15 years ago Kris Maglione [:kmag] 3.48 KB, text/plain	bzbarsky : review+	Details
Requested updates to attachment 479855 15 years ago Kris Maglione [:kmag] 4.33 KB, patch	bzbarsky : review+ jst : approval2.0+	Details \| Diff \| Splinter Review
A testcase for loading files with utf-8 encoding characters. 14 years ago wuxiaoshan 42 bytes, application/x-javascript		Details