Closed Bug 1240436 Opened 8 years ago Closed 8 years ago

Can't load a service worker script containing an ZERO WIDTH JOINER unicode character

Categories

(Core :: DOM: Service Workers, defect)

43 Branch
x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox48 --- fixed

People

(Reporter: NiKo, Assigned: stone, Mentored)

References

Details

(Whiteboard: [tw-dom] btpp-active)

Attachments

(3 files, 4 obsolete files)

Reference real world issue: https://github.com/bahmutov/babel-service/issues/7

It seems Firefox 43.0.4 Ubuntu can't deal with any service worker script containing a ZERO WIDTH JOINER character. I didn't test with more recent versions nor with other platforms.

Sample minimal reproducible test case:

1. index.html

    <!DOCTYPE html>
    <html><meta charset="utf-8"></head>
    <body>
    <script>
    window.navigator.serviceWorker.register("worker.js", {scope: "/"})
      .then(reg => console.log("registered", reg))
      .catch(err => console.error("service worker error", err))
    </script>
    </body>
    </html>

2. worker.js

    var a = "‍";

Notice variable a contains an invisible character, ZERO WIDTH JOINER (0x200D)

I used Python SimpleHTTPServer to serve the page over http:

    $ python -m SimpleHTTPServer 8000

And opened http://localhost:8000/

Expected result:

The worker script should at least be successfully registered.

Actual result:

A registration error occurs:

    SyntaxError: SyntaxError: unterminated string literal

I've also created a gist if that's useful: https://gist.github.com/n1k0/6f925c655798b9a1abeb
what does the python http server return as the http content-type?
Flags: needinfo?(nperriault)
curl -I http://localhost:8000
HTTP/1.0 200 OK
Server: SimpleHTTP/0.6 Python/2.7.10
Date: Mon, 18 Jan 2016 08:24:55 GMT
Content-type: text/html
Content-Length: 269
Last-Modified: Mon, 18 Jan 2016 08:00:20 GMT
Flags: needinfo?(nperriault)
I'll take a look.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Can you attach the problem worker script from comment 0?  Or give me instructions on how to craft one?  Its unclear to me how to generate such a script.
Flags: needinfo?(nperriault)
You can either use the real world script failing here: https://babel-service-demo.herokuapp.com/babel-service.js (offending char is at line 10348), or create a file and paste the content I suggested in comment 0; to ensure it contains the 0x200D char, paste the copied text into http://www.unicodemap.org/search.asp?search= to ensure it's actually there; for example if you select and copy the var declaration stuff and paste it there, you'll get something like in this capture http://i.imgur.com/QO3Auhr.png

Then just load that script as a service worker and you're done.
Flags: needinfo?(nperriault)
Assignee: bkelly → nobody
Status: ASSIGNED → NEW
Assignee: nobody → sshih
After a short study, according to [1], the promise which returned by |register(...)| is rejected when an uncaught exception occurred during 'invoking service worker algorithm', while [2] shows a corresponding uncaught script error had occurred when the target script is not parse/compile/initialize successfully.

I think the behavior works as spec defined. Is there anything I misunderstand?

BTW, it seems chrome register the service worker successfully because it can handle the special character 0x200D.

[1] https://www.w3.org/TR/service-workers/#update-algorithm (step 4.22.3 and 4.22.4)
[2] https://www.w3.org/TR/service-workers/#run-service-worker-algorithm (step 12)
Flags: needinfo?(nperriault)
Maybe what we want is being eventually able to handle the special character 0x200D as well? I'm not too much into character support, but would there be any reason to keep things being the way they are atm? Can't we just support this character? Also, why is this character supported in a page context and not in a service worker one? If we plan on claiming this to be a feature, where should we document behavioral differences between service worker JavaScript vs. regular page one?
Flags: needinfo?(nperriault)
Are these characters usable in dedicated workers?  We must be coercing a string badly in workers/ScriptLoader.cpp.
Thanks for the feedback. Dedicated workers work fine.

After investigations, the JavaScript source codes are truncated in https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerScriptCache.cpp#540. They are encoded as UTF-16 and truncated by NS_NewStringInputStream. I'm trying to find a solution for it.
(In reply to Ben Kelly [:bkelly] from comment #10)
> Is the UTF-8 hint being passed here correct?
> 
>  
> https://dxr.mozilla.org/mozilla-central/source/dom/workers/
> ServiceWorkerScriptCache.cpp#783

I think the answer is yes because
1. The file 'worker.js' downloaded from [1] are encoded as UTF8 (U+200D is encoded as 0xe2 0x80 0x8d).
2. According to [2], the script should be encoded as UTF8.

And the cause of this bug is
In [3], a nsString with UTF16 encoding are converted by a lossy utility to a single byte buffer when writing to cache. While in [4], the string buffer is converted from UTF8 to UTF16 when retrieving from cache.

[1] https://gist.github.com/n1k0/6f925c655798b9a1abeb
[2] https://www.w3.org/TR/service-workers/#run-service-worker-algorithm step 7
[3] https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerScriptCache.cpp#540
[4] https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerScriptCache.cpp#836
Comment on attachment 8739823 [details] [diff] [review]
Convert the service worker script to be registered to utf8 before writing to cache. r?khuey

Review of attachment 8739823 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/ServiceWorkerScriptCache.cpp
@@ +537,5 @@
>  
>      ErrorResult result;
>      nsCOMPtr<nsIInputStream> body;
> +    result = NS_NewCStringInputStream(getter_AddRefs(body),
> +                                      NS_ConvertUTF16toUTF8(mCN->Buffer()));

http://mxr.mozilla.org/mozilla-central/source/xpcom/io/nsStringStream.cpp#410

Doh!!!
Attachment #8739823 - Flags: review?(khuey) → review+
Nathan, if the code linked in comment 13 is expected behavior, can we get a big warning comment somewhere about that? Or maybe an assertion?

Ben, does ExtractFromURLSearchParams in fetch need to do something different here too?
Flags: needinfo?(nfroyd)
Flags: needinfo?(bkelly)
It certainly seems like this code would have the exact same problem for a URL search param containing one of these UTF-16 characters:

  https://dxr.mozilla.org/mozilla-central/source/dom/fetch/Fetch.cpp#454

Ming-Chou, can you fix this case as well using the same solution as here?  Thanks!
Flags: needinfo?(bkelly) → needinfo?(sshih)
I wonder if we should just fix this in xpcom/ ...
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #16)
> I wonder if we should just fix this in xpcom/ ...

Yes, we should.  It's not clear to me if the uses in netwerk/streamconv/converters/ are bogus or not:

https://dxr.mozilla.org/mozilla-central/search?q=NS_NewStringInputStream&redirect=false&case=false

though there are various XXX comments about Unicode and such, so maybe the current behavior is the intended behavior?  We could have NS_NewStringInputStream that does the right thing and NS_NewLossyStringInputStream that retains the current behavior, using the former in dom/ and the latter in netwerk/.  Or we could ask somebody about the netwerk/ ones.
Flags: needinfo?(nfroyd)
(In reply to Ben Kelly [:bkelly] from comment #15)
> It certainly seems like this code would have the exact same problem for a
> URL search param containing one of these UTF-16 characters:
> 
>   https://dxr.mozilla.org/mozilla-central/source/dom/fetch/Fetch.cpp#454
> 
> Ming-Chou, can you fix this case as well using the same solution as here? 
> Thanks!

No problems. I will handle it if our conclusion is fixing it in callers.
Flags: needinfo?(sshih)
Whiteboard: [tw-dom] → [tw-dom] btpp-active
Hi Honza,
Would you mind helping me confirm whether the behaviors in nsTXTToHTMLConv which uses NS_NewStringInputStream to generate nsStringInputStream are intended? This API truncates the high bytes of nsAString to a single-byte data buffer and leads to this bug. I plan to fix it by converting UTF16 to UTF8 in NS_NewStringInputStream,  but need confirmation whether it's ok to nsTXTToHTMLConv.
Flags: needinfo?(honzab.moz)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #16)
> I wonder if we should just fix this in xpcom/ ...

Could we have something like:

template <class T>
nsresult
NS_NewStringInputStream(nsIInputStream** aStreamResult, const nsAString& aStringToRead)
{
  T data(aStringToRead);
  return NS_NewCStringInputStream(aStreamResult, data);
}

?

(In reply to Ming-Chou Shih [:stone] from comment #19)
> Hi Honza,
> Would you mind helping me confirm whether the behaviors in nsTXTToHTMLConv
> which uses NS_NewStringInputStream to generate nsStringInputStream are
> intended? This API truncates the high bytes of nsAString to a single-byte
> data buffer and leads to this bug. I plan to fix it by converting UTF16 to
> UTF8 in NS_NewStringInputStream,  but need confirmation whether it's ok to
> nsTXTToHTMLConv.

The code has been introduced in bug 318193 and loosy conversion is there since ever (2006).  I think there originally was only a loosy conv because we didn't have any UTF-16 -> UTF-8 conversion, only this function https://github.com/mozilla/gecko-dev/blob/eb3847a802f725d0a1eeaf94305f5b70a55fee9f/xpcom/string/public/nsStringAPI.h#L784 which was loosy too.

Hence, I think to have a template function that on places you KNOW the target can be read as UTF-8 should use NS_ConvertUTF16toUTF8.

See https://mxr.mozilla.org/mozilla-central/source/netwerk/streamconv/converters/nsTXTToHTMLConv.cpp#150.  It's already destroying the input with LossyConvertEncoding8to16.  I'd rather leave it, so use the template function with loosy conv as the template arg.
Flags: needinfo?(honzab.moz)
It would be better to make NS_NewStringInputStream do the correct thing, and make any legacy Necko consumers do the lossy conversion and then call NS_NewCStringInputStream themselves, rather than templating NS_NewStringInputStream or something crazy.
Hi Kyle and Honza,

I personally prefer Kyle's solution mentioned in comment #21. In fact, I think we can be a little more aggresive, that is we can remove both of the two functions. Since the InputStream has no idea about the format of its content, the caller should explicitly decide the encoding method when creating it and the decoding method when reading from it. Thus, the caller can write something like the following snippets.

```
  NS_ConvertUTF16toUTF8 data(aStringToRead);
  NS_NewCStringInputStream(aStreamResult, data);

```
or
```
  NS_LossyConvertUTF16toASCII data(aStringToRead);
  NS_NewCStringInputStream(aStreamResult, data);
```


How do you think about it?
Flags: needinfo?(khuey)
Flags: needinfo?(honzab.moz)
That sounds fine.
Flags: needinfo?(khuey)
That is kinda more flexible than a template function, tho caller is not notified during compile time that there is a need for a conversion to be made, so it may be a foot gun.  Up to you tho, I'm fine with both of the approaches.
Flags: needinfo?(honzab.moz)
Is this something exposed to addons or chrome script?  Is it possible those depend on lossy conversions for some reason?
(In reply to Ben Kelly [:bkelly] from comment #25)
> Is this something exposed to addons or chrome script?  Is it possible those
> depend on lossy conversions for some reason?

If I understand Gecko correctly, addons and Chrome scripts can only access methods exposed by XPCOM interfaces. Thus, the method (NS_NewStringInputStream) only can be used internally.

The other possibility that addons can use those methods is that the method is exposed to the addon development environment. I've tried searching it with DXR, and I didn't find anything alike.

Is there any case should be considered?
Attachment #8739823 - Attachment is obsolete: true
Attachment #8742045 - Flags: review?(khuey)
Attachment #8742047 - Flags: review?(khuey) → review?(nfroyd)
Attachment #8742047 - Flags: review?(nfroyd) → review+
Attachment #8742046 - Flags: review?(honzab.moz) → review+
Changed patch summary
Attachment #8742045 - Attachment is obsolete: true
Attachment #8742687 - Flags: review+
Changed patch summary
Attachment #8742046 - Attachment is obsolete: true
Attachment #8742047 - Attachment is obsolete: true
Attachment #8742689 - Flags: review+
Keywords: checkin-needed
A test would be nice too.
Flags: needinfo?(sshih)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #36)
> A test would be nice too.

OK, I will add it.
Flags: needinfo?(sshih)
Depends on: 1266637
You need to log in before you can comment on or make changes to this bug.