Closed Bug 1487113 Opened 6 years ago Closed 3 years ago

use alt data to cache wasm code compiled from Response

Categories

(Core :: JavaScript: WebAssembly, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- affected

People

(Reporter: luke, Assigned: luke)

References

(Depends on 2 open bugs, Blocks 2 open bugs, Regressed 1 open bug)

Details

(Keywords: perf-alert)

Attachments

(20 files, 5 obsolete files)

2.37 KB, patch
Details | Diff | Splinter Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
18.99 KB, patch
baku
: review+
Details | Diff | Splinter Review
920 bytes, patch
baku
: review+
Details | Diff | Splinter Review
9.16 KB, patch
lth
: review+
Details | Diff | Splinter Review
19.48 KB, patch
Details | Diff | Splinter Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
Since WebAssembly.{compileStreaming,instantiateStreaming} both take a Response, we have a link back to the original cache entry, so we can use the alt-data API (currently used by the JS bytecode cache) to cache machine code.  Later with wasm-esm integration (https://github.com/WebAssembly/esm-integration/tree/master/proposals/esm-integration), we could additionally plug in via the nsScriptLoader.

This is intended to subsume the explicit IDB caching which was recently removed from the wasm spec (bug 1469395) and beat the implicit asm.js caching we do in dom/asmjscache.  The core (de)serialization machinery is still present (and actively used by asm.js caching).

The existing wasm streaming support this would build on is FetchUtil::StreamResponseToJS:
  https://searchfox.org/mozilla-central/source/dom/fetch/FetchUtil.cpp#530
and the alt-data interface we need to use is nsICacheInfoChannel, which lets us get a stream for the serialized module.

It looks like we'll need some modest changes to the alt-data API: bug 1487100.  In the meantime, we can get something sorta-working by unconditionally setting the preferred alt-data-type "wasm" in fetch().

When alt-data is implemented for the DOM Cache API (bug 1336199), this same support should Just Work for DOM Cache / ServiceWorker which would perhaps provide access to larger quota and lower churn.
Depends on: 1487475
Depends on: 1330661
Depends on: 1500203
Ok, so I finally started to play around with the awesome new functionality added by bug 1487100.

Valentin: this tiny patch requests alt-data for "application/wasm", which I would expect to attach an nsICacheInfoChannel to the InternalRequest.  The fprintf() before AsyncOpen2() shows I'm calling PreferAlternativeDataType() for the fetch() in question, but the fprintf() on the consuming end indicates there is not an nsICacheInfoChannel.

I fully expect I'm doing something silly or expecting to find the nsICacheInfoChannel in the wrong place.  Can you see anything wrong?

I'm using https://lukewagner.github.io/wasm-mime-type to test this path.
Assignee: nobody → luke
Flags: needinfo?(valentin.gosu)
I think Nicolas can help here.
Flags: needinfo?(valentin.gosu) → needinfo?(nicolas.b.pierron)
Comment on attachment 9025507 [details] [diff] [review]
request-alt-data-for-wasm (test)

Review of attachment 9025507 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/fetch/FetchDriver.cpp
@@ +755,5 @@
>    } else {
> +    nsCOMPtr<nsICacheInfoChannel> cic = do_QueryInterface(chan);
> +    if (cic) {
> +      fprintf(stderr, "### PREFERRING for %s !\n", url.get());
> +      cic->PreferAlternativeDataType(NS_LITERAL_CSTRING("wasm"), NS_LITERAL_CSTRING("application/wasm"));

I do not know much of this code base, but …

My guess would be that you have to create an AlternativeDataStreamListener (as done as few lines above) and set the mAltDataListener field in order to pipe the input stream of the nsICacheInfoChannel through the fetch API.
Forwarding the need-info to Ben Kelly as he reviewed this implementation and might know better where to patch the default case for setting the preferred alternated data type.
Flags: needinfo?(nicolas.b.pierron) → needinfo?(ben)
Just heard that Ben left Mozilla, so based on Nathan feedback, I will re-forward to Andrew / Andrea.
Flags: needinfo?(bugmail)
Flags: needinfo?(ben)
Flags: needinfo?(amarchesini)
I think Eden is working on the alternate stream support in Cache API and may also have input here.
Yes, I'm going to redirect to Eden for his expertise here.
Flags: needinfo?(bugmail) → needinfo?(echuang)
Flags: needinfo?(amarchesini)
Learning a bit more, I see I was misunderstanding how this was supposed to work.
Flags: needinfo?(echuang)
To summarize, given an InternalResponse, it's not currently possible to write the alt-data output stream because there is no way to get the nsICacheInfoChannel (if there is one).

Discussing options with baku at the all-hands:
 - One option would be to set InternalResponse::mCacheInfoChannel not just when the alt-data cache entry *already* exists, but also when the content-type specified by preferAlternativeDataTypes() matches, so that alt-data can be written.
 - Another option would be to change the nsIInputStream body produced by fetch() so that one could QI it to something providing the nsICacheInfoChannel.  (Maybe this allows killing InternalResponse::mCacheInfoChannel altogether?)
Flags: needinfo?(amarchesini)
Flags: needinfo?(amarchesini)
Assignee: luke → amarchesini
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1e97c3573ce6
Expose nsICacheInfoChannel in Respose object for wasm content-type, r=valentin,nbp
https://hg.mozilla.org/integration/autoland/rev/a8406df01e95
nsICacheInfoChannel.preferAlternativeDataType() should expose alt-data as optional if required, r=valentin
https://hg.mozilla.org/integration/autoland/rev/831ac8c662c0
nsICacheInfoChannel.preferAlternativeDataType() should expose alt-data as optional if required - tests, r=valentin
Flags: needinfo?(amarchesini)
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6179d66e457f
Expose nsICacheInfoChannel in Respose object for wasm content-type, r=valentin,nbp
https://hg.mozilla.org/integration/autoland/rev/94e827a2e0d1
nsICacheInfoChannel.preferAlternativeDataType() should expose alt-data as optional if required, r=valentin
https://hg.mozilla.org/integration/autoland/rev/834182e86ef2
nsICacheInfoChannel.preferAlternativeDataType() should expose alt-data as optional if required - tests, r=valentin

Oops, that should've probably been leave-open :) With baku's patch landed, I just have to write a final patch plugging these two things together.

Assignee: amarchesini → luke
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Almost done, but the current issue I'm running into is that OpenAlternativeOutputStream() asserts NS_IsMainThread() (HttpChannelChild.cpp:3083) and the ideal time to call OpenAlternativeOutputStream() is on a random helper thread when wasm tier-2 compilation completes. How hard is this limitation to remove?

Flags: needinfo?(amarchesini)

Another question: is there any way to indicate failure to write the alt-data file, after OpenAlternativeOutputStream() succeeds? The only current usage in ScriptLoader.cpp makes a single Write() and implicitly assumes:

  • the Write() writes the entire payload atomically
  • calling Close() if Write() fails is correctly interpreted as failure

Are both these assumptions valid? This seems a bit fragile; is there any other way to signal failure to write given only an nsIOutputStream. I'm not too familiar with streams; maybe dropping the last refcount on the nsIOutputStream without having called Close()?

Noticing that CacheFileOutputStream is an nsIAsyncOutputStream, which has CloseWithStatus(), which is a way to indicate failure, I tried writing a patch that propagates the nsIAsyncOutputStream all the way into openAlternativeOutputStream()'s signature, but I ran into trouble with AltDataOutputStreamChild, which is not an nsIAsyncOutputStream.

(In reply to Luke Wagner [:luke] from comment #18)

How hard is this limitation to remove?

At the moment, PHttpChannel is managed by PNecko which is managed by PContent and all of this is main-thread only.
This is hard to be changed soon-ish.

Another question: is there any way to indicate failure to write the alt-data file, after OpenAlternativeOutputStream() succeeds?

You can do several write() calls and then a final close(). If a writing operation, on the parent side fails, the information is propagated asynchronously back to the child actor. Any following writing/flushing/closing op will report the known error.

If you want to propagate an error from the child actor, this is not supported. And yes, using nsIAsyncOutputStream seems a good solution. The missing piece is that we need to inform CacheEntry about this failure. NI Valentin for this part.

Flags: needinfo?(amarchesini) → needinfo?(valentin.gosu)

(In reply to Andrea Marchesini [:baku] from comment #20)

If you want to propagate an error from the child actor, this is not supported. And yes, using nsIAsyncOutputStream seems a good solution. The missing piece is that we need to inform CacheEntry about this failure. NI Valentin for this part.

I agree, we need to implement nsIAsyncOutputStream for AltDataOutputStreamChild

Flags: needinfo?(valentin.gosu)

I agree, we need to implement nsIAsyncOutputStream for AltDataOutputStreamChild

And we also need to introduce a way to abort the writing operation in CacheEntry. Does this already exist?

Flags: needinfo?(valentin.gosu)

(In reply to Andrea Marchesini [:baku] from comment #22)

I agree, we need to implement nsIAsyncOutputStream for AltDataOutputStreamChild

And we also need to introduce a way to abort the writing operation in CacheEntry. Does this already exist?

Clearing the alt-data should already be done when calling CloseWithStatus(error).

https://searchfox.org/mozilla-central/rev/dbddac86aadf1d4871fb350bbe66db43728a9f81/netwerk/cache2/CacheFileOutputStream.cpp#186,218

Flags: needinfo?(valentin.gosu)

(In reply to Andrea Marchesini [:baku] from comment #20)
Thanks!

(In reply to Luke Wagner [:luke] from comment #18)
At the moment, PHttpChannel is managed by PNecko which is managed by PContent and all of this is main-thread only.
This is hard to be changed soon-ish.

Ok, makes sense, I can fix it on my end. So after dispatching a runnable to the main thread to open the alt-data stream, because the nsIOutputStream is blocking, I probably need to dispatch to some background thread to do the Write(), right? Is there any utility service to do this that doesn't introduce an extra copy (b/c this is possibly a many-MB payload)? Otherwise, I'd probably just dispatch my same runnable to some background thread where I can do the synchronous Write(); what's a good thread pool to use for this?

Flags: needinfo?(amarchesini)
Flags: needinfo?(amarchesini)

Luke and I discussed this issue on IRC.

Attached patch close-with-status (obsolete) — Splinter Review

With nsIAsyncOutputStream present everywhere (thanks baku!), this patch changes the static return type of openAlternativeOutputStream() to return an nsIAsyncOutputStream. This allows me to remove the aforementioned sketchy assumptions in ScriptLoader.cpp so that we CloseWithStatus(NS_OK) iff the output was written successfully.

Attachment #9047165 - Flags: review?(amarchesini)

(Tweaked)

Attachment #9047165 - Attachment is obsolete: true
Attachment #9047165 - Flags: review?(amarchesini)
Attachment #9047166 - Flags: review?(amarchesini)

The current JSBC code seems to make this assumption...

Attachment #9047167 - Flags: review?(amarchesini)
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/169c6e292149
AltDataOutputStreamChild must be nsIAsyncOutputStream, r=valentin
Comment on attachment 9047166 [details] [diff] [review]
close-with-status

Review of attachment 9047166 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/script/ScriptLoader.cpp
@@ +2880,5 @@
>      return;
>    }
>    MOZ_ASSERT(output);
> +
> +  {

Write a comment saying that this block is done just to call Close() before releasing bytecodefailed.

::: netwerk/base/nsICacheInfoChannel.idl
@@ +151,5 @@
>     * The consumer may choose to replace the saved alt representation.
>     * Opening the output stream will fail if there are any open input streams
> +   * reading the already saved alt representation. After successfully opening
> +   * an output stream, the client must signal failure to write a complete
> +   * alternative representation via CloseWithStatus(NS_ERROR_*).

No, this is wrong. After successfully opening an output stream, the client must signal _success_ to write a complete alternative rappresentation via CloseWithStatus(NS_OK). If an error is passed as argument, the alternative stream is be aborted and the data not saved.
Attachment #9047166 - Flags: review?(amarchesini) → review+
Comment on attachment 9047167 [details] [diff] [review]
assert-write-length

Review of attachment 9047167 [details] [diff] [review]:
-----------------------------------------------------------------

This is OK, but in theory, nothing guarantees that a nsIOutputStream::Write() writes the whole buffer. It could write only part of it and then ask you to call ::AsyncWait(). It's an nsIAsyncOutputStream in the end, right?
Attachment #9047167 - Flags: review?(amarchesini) → review+

(In reply to Andrea Marchesini [:baku] from comment #32)
Right, I agree, but the code is already written with this assumption; the alternative would be to generalize it with a loop, but it's never used (or testable) then.

This patch hands over ownership of the Vector<uint8> so the callee can hold onto it when dispatching to the main thread to write the cache file.

Attachment #9047245 - Flags: review?(lhansen)

This patch (which applies on top of the other 3) creates the wasm alt-data stream and writes it to that alt-data output stream. The patch doesn't implement reading wasm alt-data, though; it just ignores the alt-data and reads the unfiltered body as before. IIUC, because deliverAltData is set to false, this should still Just Work, and for small wasm modules it does.

But for big wasm modules, the unfiltered body produces bytes that are neither the original response nor the serialized bytes. A simple tester app that shows this is: https://lukewagner.github.io/test-streaming/

If you apply my patches and click "Compile Small" twice, it works both times. But if you click "Compile Big" twice, it fails reliably the second and all future times until the cache is cleared. (Note you need to wait for the "### Stored optimized encoding" printf() to know that the alt-data was written; this can take a while in debug builds.)

The "Print" buttons print the bytes of the fetch() as an ArrayBuffer. For "Print Small", we see the same bytes before and after "Compile Small". However, "Print Big" shows different bytes before and after.

So something weird is happening for large (~34mb) alt-data. Any ideas?

Flags: needinfo?(amarchesini)
Comment on attachment 9047245 [details] [diff] [review]
pass-bytes-ownership

Review of attachment 9047245 [details] [diff] [review]:
-----------------------------------------------------------------

Some of the formatting changes will likely be undone by clang-format when it rides over the code base again.
Attachment #9047245 - Flags: review?(lhansen) → review+
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd4cd4017fd4
Use CloseWithStatus in ScriptLoader.cpp to indicate failure (r=baku)
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/27986829a160
MOZ_RELEASE_ASSERT existing implicit assumption about alt-data Write() in ScriptLoader.cpp (r=baku)

First of all, sorry for the delay. I have seen the patch landed, is this NI still valid?

Flags: needinfo?(amarchesini) → needinfo?(luke)
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1e5191f8b8b5
Baldr: pass ownership of optimized encoding bytes (r=lth)

Thanks and no worries; I've been distracted away from this patch too. As of comment 43, all the pre-req patches have been pushed and so the patch in comment 35 applies to inbound tip. I just tried again and the same issues still appears.

Flags: needinfo?(luke) → needinfo?(amarchesini)

Different question: it looks like getAltDataInputStream() doesn't guarantee that the given aReceiver is called in all cases: it's only called if the stream is successfully opened and thus there is no way to detect and report the error case. Is there something I'm missing here?

Another question: before getAltDataInputStream() is called, will the parent have already speculatively started sending bytes to the child for the original input stream? Or, does it wait until the child process actually requests it?

(In reply to Luke Wagner [:luke] from comment #45)

Different question: it looks like getAltDataInputStream() doesn't guarantee that the given aReceiver is called in all cases: it's only called if the stream is successfully opened and thus there is no way to detect and report the error case.

FWIW, a really ideal interface would be for getAltDataInputStream() to synchronously return the new nsIInputStream and signal parent or IPC errors by having the stream fail with an error. That way the JSStreamConsumer can be Start()ed with an nsIInputStream in both the cached and uncached cases.

Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f2860ee8dd24
nsHttpChannel should return the Content-Length even when alt-data is available if not delivered, r=valentin

Luke, I submitted a fix. When that will be in m-c, can you check all again and, in case, resolve this bug as fixed?
We also need a test for this issue. Thanks!

Flags: needinfo?(amarchesini) → needinfo?(luke)

Works great, thanks!

Flags: needinfo?(luke)

Luke and I discussed how to remove the async aspect of getAltDataInputStream(). The idea is to send the alt-data stream from parent to child in a delay-start mode. See https://searchfox.org/mozilla-central/rev/a7315d78417179b151fef6108f2bce14786ba64d/ipc/glue/IPCStreamUtils.h

This would work. My only concern is that we need to send 1 stream for each preferred alt-data stream. Probably this is fine, but I want to check it more before implementing it.

Thanks! Since there can only be 1 alt-data file, even if multiple alt-data-types were preferred, it seems like we'd only need to send over the at-most-1 stream that matched nsICacheInfoChannel.alternateDataType.

Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0ee68b246224
nsICacheInfoChannel.originalInputStream as attribute, r=valentin
https://hg.mozilla.org/integration/autoland/rev/74ae5929e387
nsICacheInfoChannel.alternativeDataInputStream as attribute, r=valentin

Backed out 2 changesets (Bug 1487113) for causing xpcshell failure in netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js CLOSED TREE

Failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&selectedJob=237282804

Log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=237282804&repo=autoland&lineNumber=2103

[task 2019-04-01T09:49:47.482Z] 09:49:47 INFO - TEST-START | security/manager/ssl/tests/unit/test_validity.js
[task 2019-04-01T09:49:48.286Z] 09:49:48 INFO - TEST-PASS | security/manager/ssl/tests/unit/test_validity.js | took 801ms
[task 2019-04-01T09:49:48.287Z] 09:49:48 INFO - Retrying tests that failed when run in parallel.
[task 2019-04-01T09:49:48.295Z] 09:49:48 INFO - TEST-START | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js
[task 2019-04-01T09:54:48.293Z] 09:54:48 WARNING - TEST-UNEXPECTED-TIMEOUT | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | Test timed out
[task 2019-04-01T09:54:48.295Z] 09:54:48 INFO - TEST-INFO took 300000ms
[task 2019-04-01T09:54:48.296Z] 09:54:48 INFO - >>>>>>>
[task 2019-04-01T09:54:48.298Z] 09:54:48 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2019-04-01T09:54:48.300Z] 09:54:48 INFO - (xpcshell/head.js) | test pending (2)
[task 2019-04-01T09:54:48.302Z] 09:54:48 INFO - (xpcshell/head.js) | test pending (3)
[task 2019-04-01T09:54:48.308Z] 09:54:48 INFO - (xpcshell/head.js) | test run in child pending (4)
[task 2019-04-01T09:54:48.308Z] 09:54:48 INFO - (xpcshell/head.js) | test MAIN run_test finished (4)
[task 2019-04-01T09:54:48.309Z] 09:54:48 INFO - running event loop
[task 2019-04-01T09:54:48.310Z] 09:54:48 INFO - "CONSOLE_MESSAGE: (info) No chrome package registered for chrome://branding/locale/brand.properties"
[task 2019-04-01T09:54:48.310Z] 09:54:48 INFO - CHILD-TEST-STARTED
[task 2019-04-01T09:54:48.311Z] 09:54:48 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2019-04-01T09:54:48.311Z] 09:54:48 INFO - (xpcshell/head.js) | test pending (2)
[task 2019-04-01T09:54:48.312Z] 09:54:48 INFO - (xpcshell/head.js) | test MAIN run_test finished (2)
[task 2019-04-01T09:54:48.312Z] 09:54:48 INFO - running event loop
[task 2019-04-01T09:54:48.313Z] 09:54:48 INFO - "CONSOLE_MESSAGE: (warn) [JavaScript Warning: "Use of nsIFile in content process is deprecated." {file: "/builds/worker/workspace/build/tests/xpcshell/head.js" line: 352}]"
[task 2019-04-01T09:54:48.313Z] 09:54:48 INFO - "CONSOLE_MESSAGE: (warn) [JavaScript Warning: "Use of nsIFile in content process is deprecated." {file: "/builds/worker/workspace/build/tests/xpcshell/head.js" line: 352}]"
[task 2019-04-01T09:54:48.314Z] 09:54:48 INFO - TEST-PASS | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | undefined assertion name - 13 == 13
[task 2019-04-01T09:54:48.315Z] 09:54:48 INFO - TEST-PASS | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | readServerContent - [readServerContent : 96] "response body" == "response body"
[task 2019-04-01T09:54:48.315Z] 09:54:48 INFO - TEST-PASS | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | readServerContent - [readServerContent : 97] "" == ""
[task 2019-04-01T09:54:48.316Z] 09:54:48 INFO - (xpcshell/head.js) | test pending (2)
[task 2019-04-01T09:54:48.317Z] 09:54:48 INFO - (xpcshell/head.js) | test flushAndOpenAltChannel pending (3)
[task 2019-04-01T09:54:48.318Z] 09:54:48 INFO - (xpcshell/head.js) | test finished (3)
[task 2019-04-01T09:54:48.318Z] 09:54:48 INFO - (xpcshell/head.js) | test finished (3)
[task 2019-04-01T09:54:48.319Z] 09:54:48 INFO - (xpcshell/head.js) | test pending (3)
[task 2019-04-01T09:54:48.320Z] 09:54:48 INFO - (xpcshell/head.js) | test flushAndOpenAltChannel finished (3)
[task 2019-04-01T09:54:48.321Z] 09:54:48 INFO - (xpcshell/head.js) | test finished (2)
[task 2019-04-01T09:54:48.321Z] 09:54:48 INFO - TEST-PASS | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | undefined assertion name - 10 == 10
[task 2019-04-01T09:54:48.322Z] 09:54:48 INFO - TEST-PASS | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | readAltContent - [readAltContent : 131] true == true
[task 2019-04-01T09:54:48.322Z] 09:54:48 INFO - TEST-PASS | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | readAltContent - [readAltContent : 132] "text/binary" == "text/binary"
[task 2019-04-01T09:54:48.323Z] 09:54:48 INFO - TEST-PASS | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | readAltContent - [readAltContent : 133] "!@#$%^&()" == "!@#$%^&()"
[task 2019-04-01T09:54:48.324Z] 09:54:48 INFO - (xpcshell/head.js) | test pending (2)
[task 2019-04-01T09:54:48.324Z] 09:54:48 INFO - (xpcshell/head.js) | test finished (2)
[task 2019-04-01T09:54:48.325Z] 09:54:48 INFO - TEST-PASS | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | load_channel - [load_channel : 31] "http://localhost:33545/content" == true
[task 2019-04-01T09:54:48.325Z] 09:54:48 INFO - TEST-PASS | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | undefined assertion name - 13 == 13
[task 2019-04-01T09:54:48.325Z] 09:54:48 INFO - TEST-PASS | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | readTextData - [readTextData : 48] "" == ""
[task 2019-04-01T09:54:48.325Z] 09:54:48 INFO - TEST-PASS | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | readTextData - [readTextData : 49] "response body" == "response body"
[task 2019-04-01T09:54:48.326Z] 09:54:48 INFO - (xpcshell/head.js) | test pending (2)
[task 2019-04-01T09:54:48.326Z] 09:54:48 WARNING - TEST-UNEXPECTED-FAIL | netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js | - NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICacheInfoChannel.openAlternativeOutputStream]
[task 2019-04-01T09:54:48.327Z] 09:54:48 INFO - readTextData/<@/builds/worker/workspace/build/tests/xpcshell/tests/netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js:54:17
[task 2019-04-01T09:54:48.328Z] 09:54:48 INFO - run@/builds/worker/workspace/build/tests/xpcshell/head.js:688:9
[task 2019-04-01T09:54:48.328Z] 09:54:48 INFO - _do_main@/builds/worker/workspace/build/tests/xpcshell/head.js:227:6
[task 2019-04-01T09:54:48.329Z] 09:54:48 INFO - _execute_test@/builds/worker/workspace/build/tests/xpcshell/head.js:529:5

Flags: needinfo?(luke)
Backout by shindli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/16b03064b09d
Backed out 2 changesets for causing xpcshell failure in netwerk/test/unit_ipc/test_alt-data_cross_process_wrap.js CLOSED TREE

redirecting ni? to author. btw, thanks for the patch baku!

Flags: needinfo?(luke) → needinfo?(amarchesini)

I discussed this issue with Valentin. The problem seems to be that if we retrieve the alt-data inputstream from a CacheEntry, then we are not able to obtain the output stream too. This means that, if we already have an alt-data, and with my patch, the inputStream is available from nsIHttpChannel, it's not possible to overwrite the data writing into the output stream.

Can you confirm this? Is it something we can fix it in necko?

Flags: needinfo?(amarchesini) → needinfo?(valentin.gosu)

(In reply to Andrea Marchesini [:baku] from comment #61)

Can you confirm this? Is it something we can fix it in necko?

That is correct, and I don't think there's a way to fix it in necko without a major rewrite of the cache.

The reason for this is that each CacheFile keeps track of inputStreams and outputStreams to it to allow multiple readers from the cache entry. So for example if one load of a.html is in progress, then another load of a.html can simply stream it from the cache, while the first load is writing it from the network. But a third load can't overwrite that entry while other inputStreams are opened to it - this is why we can't overwrite the alt-data while we already have an alt-data inputStream still open.

Flags: needinfo?(valentin.gosu)

Independent of above, I have a working patch stack and so I'm writing tests and I ran into a weird case: if I Response.clone(), and then compile both Responses (either at the same time or in sequence), both have an nsICacheInfoChannel which claims matching alt-data, but the second Response's nsIInputStream is empty (specifically, on the first call to nsPipeInputStream::AsyncWait(), Status(mon) == NS_BASE_STREAM_CLOSED and so onInputStreamReady() is synchronously called with nsIInputStream::Available() returning 0). Maybe simple bug?

Flags: needinfo?(amarchesini)

Can you give me a test + your patch? I'll work on a fix.

Flags: needinfo?(amarchesini) → needinfo?(luke)

Ok, thanks! If you apply these patches (and/or review them; I think they're ready :), which are based on your two unlanded patches, then the mochitest dom/promise/tests/test_webassembly_compile.html has two tests, compileCachedBothClonesHitCache and compileCachedCacheThroughClone, that hit this case and crash/fail (you can isolate them by commenting out everything else in the final tests array at the end of the file).

Flags: needinfo?(luke)
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c910aaceb2aa
Use NS_MakeSyncNonBlockingInputStream in JSStreamConsumer (r=baku)
https://hg.mozilla.org/integration/mozilla-inbound/rev/d078014dcad1
Baldr: correctly deserialize name section index (r=lth)
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd5eba0629e6
Baldr: add logging/testing functions for (de)serialization (r=lth)

(Note: now just one patch to apply to tip to test the failure in the new test_webassembly_compile.html mochitest. Edit: oops, in addition to the two patches that were backed out in comment 59.)

Flags: needinfo?(amarchesini)
Blocks: 1545131

One interesting thing I noticed is that, without bug 1545131, the alt data entry for web.autocad.com is 148mb. When attempting to store this, nsICacheInfoChannel::OpenAlternativeOutputStream() succeeds (with the declared length of 148mb), which is a bit surprising; from the comment, it seemed like this would fail since the cache file is way beyond the size limit.

I just landed a couple of new patches. If/when they will be in m-c, Luke, can you test again if there are issues with your code?

Flags: needinfo?(amarchesini) → needinfo?(luke)
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e6f579752678
nsICacheInfoChannel.originalInputStream as attribute, r=valentin
https://hg.mozilla.org/integration/autoland/rev/dce59b615568
nsICacheInfoChannel.alternativeDataInputStream as attribute, r=valentin
Backout by dvarga@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3a8b999553a4
Backed out 2 changesets for mochitest failure at dom/base/test/test_script_loader_js_cache.html.

Backed out 2 changesets (Bug 1487113) for mochitest failure at dom/base/test/test_script_loader_js_cache.html.

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&searchStr=mochitest-e10s&revision=dce59b6155685fb271e2affd84e5e4b44e0ec8d5&selectedJob=247375112

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=247375112&repo=autoland&lineNumber=4102
and
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=247399578&repo=autoland&lineNumber=2451

Backout link: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&classifiedState=unclassified&searchStr=tp6m&revision=3a8b999553a4e672b7895b8c8aabd458afbbe3a1

[task 2019-05-20T16:34:55.323Z] 16:34:55     INFO -  1686 INFO TEST-OK | dom/serviceworkers/test/test_scopes.html | took 220ms
[task 2019-05-20T16:34:55.323Z] 16:34:55     INFO -  1687 INFO TEST-START | dom/serviceworkers/test/test_script_loader_intercepted_js_cache.html
[task 2019-05-20T16:34:55.323Z] 16:34:55     INFO -  1688 INFO TEST-UNEXPECTED-FAIL | dom/serviceworkers/test/test_script_loader_intercepted_js_cache.html | Test for saving and loading bytecode in/from the necko cache - Test for saving and loading bytecode in/from the necko cache: assert_equals: [4] ScriptLoadRequest status after same SRI hash expected "bytecode_exec" but got "fallback_bytecode_saved"
[task 2019-05-20T16:34:55.323Z] 16:34:55     INFO -  1689 INFO TEST-PASS | dom/serviceworkers/test/test_script_loader_intercepted_js_cache.html | Test for saving and loading bytecode in/from the necko cache - Test for saving and loading bytecode in/from the necko cache: Elided 1 passes or known failures.
[task 2019-05-20T16:34:55.323Z] 16:34:55     INFO -  1690 INFO TEST-OK | dom/serviceworkers/test/test_script_loader_intercepted_js_cache.html | took 300ms
[task 2019-05-20T16:34:55.323Z] 16:34:55     INFO -  1691 INFO TEST-START | dom/serviceworkers/test/test_service_worker_allowed.html
[task 2019-05-20T16:34:55.323Z] 16:34:55     INFO -  Buffered messages logged at 16:34:48
[task 2019-05-20T16:34:55.323Z] 16:34:55     INFO -  1692 INFO TEST-PASS | dom/serviceworkers/test/test_service_worker_allowed.html | Registration should fail
[task 2019-05-20T16:34:55.324Z] 16:34:55     INFO -  1693 INFO TEST-PASS | dom/serviceworkers/test/test_service_worker_allowed.html | Registration should fail
[task 2019-05-20T16:34:55.324Z] 16:34:55     INFO -  1694 INFO TEST-PASS | dom/serviceworkers/test/test_service_worker_allowed.html | Registration should fail
[task 2019-05-20T16:34:55.324Z] 16:34:55     INFO -  1695 INFO TEST-PASS | dom/serviceworkers/test/test_service_worker_allowed.html | Registration should finish successfully
[task 2019-05-20T16:34:55.324Z] 16:34:55     INFO -  1696 INFO TEST-PASS | dom/serviceworkers/test/test_service_worker_allowed.html | Registration should finish successfully
[task 2019-05-20T16:34:55.324Z] 16:34:55     INFO -  Buffered messages finished
[task 2019-05-20T16:34:55.324Z] 16:34:55     INFO -  1697 INFO TEST-UNEXPECTED-FAIL | dom/serviceworkers/test/test_service_worker_allowed.html | This test left a service worker registered without cleaning it up
[task 2019-05-20T16:34:55.324Z] 16:34:55     INFO -      SimpleTest.ok@SimpleTest/SimpleTest.js:275:18
[task 2019-05-20T16:34:55.324Z] 16:34:55     INFO -      afterCleanup@SimpleTest/SimpleTest.js:1190:28
[task 2019-05-20T16:34:55.324Z] 16:34:55     INFO -      executeCleanupFunction@SimpleTest/SimpleTest.js:1230:13
[task 2019-05-20T16:34:55.324Z] 16:34:55     INFO -      SimpleTest.finish@SimpleTest/SimpleTest.js:1249:5
[task 2019-05-20T16:34:55.324Z] 16:34:55     INFO -  1698 INFO TEST-UNEXPECTED-FAIL | dom/serviceworkers/test/test_service_worker_allowed.html | Left over worker: http://mochi.test:8888/tests/dom/serviceworkers/test/fetch.js (scope: http://mochi.test:8888/tests/dom/serviceworkers/test/)
[task 2019-05-20T16:34:55.325Z] 16:34:55     INFO -      SimpleTest.ok@SimpleTest/SimpleTest.js:275:18
[task 2019-05-20T16:34:55.325Z] 16:34:55     INFO -      afterCleanup@SimpleTest/SimpleTest.js:1192:32
[task 2019-05-20T16:34:55.325Z] 16:34:55     INFO -      executeCleanupFunction@SimpleTest/SimpleTest.js:1230:13
[task 2019-05-20T16:34:55.325Z] 16:34:55     INFO -      SimpleTest.finish@SimpleTest/SimpleTest.js:1249:5
[task 2019-05-20T18:33:15.121Z] 18:33:15     INFO -  SUMMARY: AddressSanitizer: heap-use-after-free /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/ipc/ProtocolUtils.h:305:39 in Manager
[task 2019-05-20T18:33:15.121Z] 18:33:15     INFO -  Shadow bytes around the buggy address:
[task 2019-05-20T18:33:15.121Z] 18:33:15     INFO -    0x0c2c8001db20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[task 2019-05-20T18:33:15.122Z] 18:33:15     INFO -    0x0c2c8001db30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[task 2019-05-20T18:33:15.122Z] 18:33:15     INFO -    0x0c2c8001db40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[task 2019-05-20T18:33:15.122Z] 18:33:15     INFO -    0x0c2c8001db50: 00 00 00 00 00 02 fa fa fa fa fa fa fa fa fa fa
[task 2019-05-20T18:33:15.123Z] 18:33:15     INFO -    0x0c2c8001db60: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[task 2019-05-20T18:33:15.123Z] 18:33:15     INFO -  =>0x0c2c8001db70: fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd
[task 2019-05-20T18:33:15.123Z] 18:33:15     INFO -    0x0c2c8001db80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[task 2019-05-20T18:33:15.124Z] 18:33:15     INFO -    0x0c2c8001db90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[task 2019-05-20T18:33:15.124Z] 18:33:15     INFO -    0x0c2c8001dba0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
[task 2019-05-20T18:33:15.124Z] 18:33:15     INFO -    0x0c2c8001dbb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
[task 2019-05-20T18:33:15.124Z] 18:33:15     INFO -    0x0c2c8001dbc0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[task 2019-05-20T18:33:15.124Z] 18:33:15     INFO -  Shadow byte legend (one shadow byte represents 8 application bytes):
[task 2019-05-20T18:33:15.125Z] 18:33:15     INFO -    Addressable:           00
[task 2019-05-20T18:33:15.125Z] 18:33:15     INFO -    Partially addressable: 01 02 03 04 05 06 07
[task 2019-05-20T18:33:15.125Z] 18:33:15     INFO -    Heap left redzone:       fa
[task 2019-05-20T18:33:15.126Z] 18:33:15     INFO -    Freed heap region:       fd
[task 2019-05-20T18:33:15.127Z] 18:33:15     INFO -    Stack left redzone:      f1
[task 2019-05-20T18:33:15.128Z] 18:33:15     INFO -    Stack mid redzone:       f2
[task 2019-05-20T18:33:15.128Z] 18:33:15     INFO -    Stack right redzone:     f3
[task 2019-05-20T18:33:15.128Z] 18:33:15     INFO -    Stack after return:      f5
[task 2019-05-20T18:33:15.129Z] 18:33:15     INFO -    Stack use after scope:   f8
[task 2019-05-20T18:33:15.129Z] 18:33:15     INFO -    Global redzone:          f9
[task 2019-05-20T18:33:15.129Z] 18:33:15     INFO -    Global init order:       f6
[task 2019-05-20T18:33:15.129Z] 18:33:15     INFO -    Poisoned by user:        f7
[task 2019-05-20T18:33:15.129Z] 18:33:15     INFO -    Container overflow:      fc
[task 2019-05-20T18:33:15.130Z] 18:33:15     INFO -    Array cookie:            ac
[task 2019-05-20T18:33:15.130Z] 18:33:15     INFO -    Intra object redzone:    bb
[task 2019-05-20T18:33:15.130Z] 18:33:15     INFO -    ASan internal:           fe
[task 2019-05-20T18:33:15.130Z] 18:33:15     INFO -    Left alloca redzone:     ca
[task 2019-05-20T18:33:15.131Z] 18:33:15     INFO -    Right alloca redzone:    cb
[task 2019-05-20T18:33:15.132Z] 18:33:15     INFO -    Shadow gap:              cc
[task 2019-05-20T18:33:15.132Z] 18:33:15     INFO -  ==1433==ABORTING
[task 2019-05-20T18:33:15.169Z] 18:33:15     INFO -  Exiting due to channel error.
[task 2019-05-20T18:36:23.496Z] 18:36:23     INFO - TEST-UNEXPECTED-ERROR | telemetry/marionette/tests/client/test_search_counts_across_sessions.py TestSearchCounts.test_search_counts | IOError: Process has been unexpectedly closed (Exit code: 1) (Reason: Process unexpectedly quit without restarting (exit code: 1))
Flags: needinfo?(amarchesini)

The "fallback_bytecode_saved" state is a summary of a trace of events made with TRACE_FOR_TEST macro in the ScriptLoader.cpp file.

[task 2019-05-20T16:34:55.323Z] 16:34:55 INFO - 1688 INFO TEST-UNEXPECTED-FAIL | dom/serviceworkers/test/test_script_loader_intercepted_js_cache.html | Test for saving and loading bytecode in/from the necko cache - Test for saving and loading bytecode in/from the necko cache: assert_equals: [4] ScriptLoadRequest status after same SRI hash expected "bytecode_exec" but got "fallback_bytecode_saved"

In this case, the difference is that we took the unexpected RestartLoad function, cause by a failure while decoding the first bytes of the saved alternate data.

You should be able to get more information by running the test case with the environment variable MOZ_LOG=ScriptLoader:5,SRI:5.
Feel free to contact me if you need any help understanding the test case.

Attachment #9066047 - Attachment is obsolete: true

I really don't like the 2 patches I proposed. I have a better approach.

Flags: needinfo?(amarchesini)

BTW, I wonder if it's time to clean up this bug and file a separate one. It's getting hard to understand which patches are landed, what is not and what is needed for testing.

Flags: needinfo?(luke)
Depends on: 1554652
Priority: -- → P2
Blocks: 1494458

(In reply to Andrea Marchesini [:baku] from comment #84)

BTW, I wonder if it's time to clean up this bug and file a separate one. It's getting hard to understand which patches are landed, what is not and what is needed for testing.

I agree. Phab doesn't track backing out and D25519 and D31791 seem to be one patch that got submitted twice.

I'm facing bug 1554652 which refers backed-out patches in this bug to use for testing. But the patches here go totally against the patch in bug 1554652. I'm totally confused what to do and what we are trying to fix here.

(In reply to Honza Bambas (:mayhemer) from comment #85)

But the patches here go totally against the patch in bug 1554652. I'm totally confused what to do and what we are trying to fix here.

OK, I really need a break :) The patches change INPUT stream getters. My patch in bug 1554652 changes the alt-data OUTPUT getter, so it's unrelated.

Depends on: 1590305
Blocks: 1590305
No longer depends on: 1590305
OS: Unspecified → All
Priority: P2 → P3
Hardware: Unspecified → All
Attachment #9066047 - Attachment is obsolete: false
Attachment #9066399 - Attachment is obsolete: true

Bumping to P2 because it would actually be nice to get the blockers for this fixed and this optimization landed.

Priority: P3 → P2
Attachment #9066047 - Attachment is obsolete: true
Attachment #9054705 - Attachment description: Bug 1487113 - nsICacheInfoChannel.originalInputStream as attribute, r?valentin → Bug 1487113 - nsICacheInfoChannel.originalInputStream as attribute
Depends on: 1653996

So, there is a problem with the two patches to make cache input streams accessible via an attribute. First, I don't follow the "sync" nature of it when we more tend to go generally async (regardless that the stream itself is async.. Second, when the IPC child side of the DelayedStartInputStream is not Close() and just released, it leaks and keeps the stream also open on the parent process what is highly undesirable.

Also not that opening the input streams on the parent process every time carries some side affects like larger memory consumption caused by data preload (1MB each) and also interfering with the logic of cache concurrency - may block, may leak etc.

Hence, I propose to WONTFIX those two and keep the async approach unless there is a really strong reason for this "go sync" change.

I filed bug 1653996 for the leak that prevents call of CacheFile::RemoveInput and thus makes impossible to create a working patch for bug 1554652.

More proper solution may be to have an async wrapper stream that will only be async and will lazily IPC-loop to the parent to open and get the cache input stream. A variant of the "Receiver" we have now, but instead of waiting for a callback to get the stream, get /a/ stream immediately (the wrapper) which will wait for the stream from the parent transparently.

Attachment #9066048 - Attachment description: Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute, r=valentin → Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute

The leave-open keyword is there and there is no activity for 6 months.
:lth, maybe it's time to close this bug?

Flags: needinfo?(lhansen)

Working on rebooting the work.

Flags: needinfo?(lhansen)
Priority: P2 → P3
Priority: P3 → P2
See Also: → 1714219
Attachment #9056959 - Attachment description: Bug 1487113 - Use alt-data to cache stream-compiled WebAssembly modules (r=baku) → WIP: Bug 1487113 - Use alt-data to cache stream-compiled WebAssembly modules
Attachment #9226216 - Attachment description: WIP: Bug 1487113 - WIP: nsICacheInfoChannel.alternativeDataInputStream as attribute → WIP: Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute
Attachment #9057572 - Attachment description: Bug 1487113 - Add pref javascript.options.wasm_caching (r=baku) → Bug 1487113 - Add pref javascript.options.wasm_caching. r?baku
See Also: → 1719508
Assignee: mail → ydelendik
Attachment #9226216 - Attachment description: WIP: Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute → Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute
Attachment #9056959 - Attachment description: WIP: Bug 1487113 - Use alt-data to cache stream-compiled WebAssembly modules → Bug 1487113 - Use alt-data to cache stream-compiled WebAssembly modules
Attachment #9226216 - Attachment description: Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute → WIP: Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute
Attachment #9226216 - Attachment description: WIP: Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute → Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute
Attachment #9226216 - Attachment description: Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute → WIP: Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute
Attachment #9226216 - Attachment description: WIP: Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute → Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute
Attachment #9226216 - Attachment description: Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute → Bug 1487113 - nsICacheInfoChannel.alternativeDataInputStream as attribute. r=valentin
Attachment #9056959 - Attachment description: Bug 1487113 - Use alt-data to cache stream-compiled WebAssembly modules → Bug 1487113 - Use alt-data to cache stream-compiled WebAssembly modules. r=valentin
Attachment #9057572 - Attachment description: Bug 1487113 - Add pref javascript.options.wasm_caching. r?baku → Bug 1487113 - Add pref javascript.options.wasm_caching. r=valentin
Pushed by ydelendik@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9214b396eb84
nsICacheInfoChannel.alternativeDataInputStream as attribute. r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/b0b2b27dcb68
Use alt-data to cache stream-compiled WebAssembly modules. r=necko-reviewers,valentin,dragana
https://hg.mozilla.org/integration/autoland/rev/b20e5d76c77e
Add pref javascript.options.wasm_caching. r=necko-reviewers,valentin

Backed out for causing mochitest failures.

Flags: needinfo?(ydelendik)
Pushed by ydelendik@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/02236ccd64b4
nsICacheInfoChannel.alternativeDataInputStream as attribute. r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/5b7fe5d564aa
Use alt-data to cache stream-compiled WebAssembly modules. r=necko-reviewers,valentin,dragana
https://hg.mozilla.org/integration/autoland/rev/13bf04fc644f
Add pref javascript.options.wasm_caching. r=necko-reviewers,valentin

Backed out 3 changesets (Bug 1487113) for causing hazard bustages.
Backout link
Push with failures - H
Failure Log

Sorry, I'm not sure who to ask this question of. I'll choose peterv as the victim for now.

This introduced a hazard because JS::Rooted<js::frontend::CompilationInput> in its destructor decrements a RefPtr<ScriptSource> which through a long chain can call ~JSStreamConsumer. This is a problem because it calls mOwningEventTarget->Dispatch() which is assumed to do anything. And even if it didn't, it can call ~WindowStreamOwner which calls obs->RemoveObserver(this, DOM_WINDOW_DESTROYED_TOPIC); which I assume could do stuff that could eventually GC? (I don't know this for sure.)

Anyway, my question is whether there's a good way around this. I thought perhaps it could call DeferredFinalize instead, but I don't know when that should be used and it looks to me like it might not do its work on the right thread (that of mOwningEventTarget.) But I don't know how all this stuff works, so I'm hoping there's a standard answer for this sort of thing? Please redirect the needinfo if you aren't the right person here.

Flags: needinfo?(peterv)
Regressions: 1730793

(In reply to Steve Fink [:sfink] [:s:] from comment #97)

This introduced a hazard because JS::Rooted<js::frontend::CompilationInput> in its destructor decrements a RefPtr<ScriptSource>

Just to make sure I understand what's going on, it's the destructor of js::frontend::CompilationInput we're talking about here, right? Not of the JS::Rooted.

And even if it didn't, it can call ~WindowStreamOwner which calls obs->RemoveObserver(this, DOM_WINDOW_DESTROYED_TOPIC); which I assume could do stuff that could eventually GC? (I don't know this for sure.)

I don't think nsObserverService::RemoveObserver would GC (it essentially just removes a pointer from an array).

Anyway, my question is whether there's a good way around this. I thought perhaps it could call DeferredFinalize instead, but I don't know when that should be used and it looks to me like it might not do its work on the right thread (that of mOwningEventTarget.) But I don't know how all this stuff works, so I'm hoping there's a standard answer for this sort of thing? Please redirect the needinfo if you aren't the right person here.

If you want to make JSStreamConsumer defer its release of mWindowStreamOwner/mWorkerStreamOwner then you could use the second DeferredFinalize variant (https://searchfox.org/mozilla-central/source/xpcom/base/DeferredFinalize.cpp#18), you'd need to write DeferredFinalizeAppendFunction and DeferredFinalizeFunction, and make the DeferredFinalizeFunction do the dispatch. DeferredFinalize was meant to be used for releasing things while it was unsafe (like during GC), but we normally don't use it across threads. So I think this is a bit of a special case.
But it seems like the other way to fix this would be to use DeferredFinalize to release CompilationInput's source member? It seems like that would just work.

Flags: needinfo?(peterv)

(In reply to Peter Van der Beken [:peterv] from comment #98)

(In reply to Steve Fink [:sfink] [:s:] from comment #97)

This introduced a hazard because JS::Rooted<js::frontend::CompilationInput> in its destructor decrements a RefPtr<ScriptSource>

Just to make sure I understand what's going on, it's the destructor of js::frontend::CompilationInput we're talking about here, right? Not of the JS::Rooted.

Yes, exactly. JS::Rooted<CompilationInput>::~Rooted -> ~CompilationInput -> ~RefPtr<ScriptSource> -> ~ScriptSource -> ... -> ~js::wasm::Module::~Module -> ... -> ~JSStreamConsumer -> nsIEventTarget::Dispatch.

And even if it didn't, it can call ~WindowStreamOwner which calls obs->RemoveObserver(this, DOM_WINDOW_DESTROYED_TOPIC); which I assume could do stuff that could eventually GC? (I don't know this for sure.)

I don't think nsObserverService::RemoveObserver would GC (it essentially just removes a pointer from an array).

Oh, good to know. I saw "observer" and assumed the worst.

That means this is probably a false positive (false alarm), and could be resolved through appropriate annotations. Specifically, somehow telling the analysis that mOwningEventTarget->Dispatch() will only invoke the Run() of the corresponding Runnable (WindowStreamOwner::Destroyer).

Anyway, my question is whether there's a good way around this. I thought perhaps it could call DeferredFinalize instead, but I don't know when that should be used and it looks to me like it might not do its work on the right thread (that of mOwningEventTarget.) But I don't know how all this stuff works, so I'm hoping there's a standard answer for this sort of thing? Please redirect the needinfo if you aren't the right person here.

If you want to make JSStreamConsumer defer its release of mWindowStreamOwner/mWorkerStreamOwner then you could use the second DeferredFinalize variant (https://searchfox.org/mozilla-central/source/xpcom/base/DeferredFinalize.cpp#18), you'd need to write DeferredFinalizeAppendFunction and DeferredFinalizeFunction, and make the DeferredFinalizeFunction do the dispatch. DeferredFinalize was meant to be used for releasing things while it was unsafe (like during GC), but we normally don't use it across threads. So I think this is a bit of a special case.
But it seems like the other way to fix this would be to use DeferredFinalize to release CompilationInput's source member? It seems like that would just work.

Hm... that sounds promising, but CompilationInput is defined in js/src/frontend and DeferredFinalize is a Gecko thing. It looks like we'd need to specialize the RefPtr traits to call the right thing. Maybe something could be made to work?

I looked at some possibilities here.

Telling the analysis that mOwningEventTarget->Dispatch will only call WindowStreamOwner::Destroyer is possible, but would require annotations that match functions and mapping a variable (destroyer) to its type, etc., which could be done if there was a good enough reason. But it bothers me that this cuts out quite a few steps in the callgraph, and I still have hopes to use this analysis for other purposes for which some of those steps might be relevant.

Not only that, but convincing the analysis that RemoveObserver can't GC isn't straightforward either, because it goes through an nsISupportsWeakReference that is a bit hard to pin down.

The DeferredFinalize of mWindowStreamOwner seems like a lot of code to handle this case that appears to be a false alarm. The DeferredFinalize of the ScriptSource crosses API layers and would be ugly.

So I'm leaning towards the brute force fix: put a local variable JS::AutoSuppressGCAnalysis ignore; into ~JSStreamConsumer, which will switch to a dynamic check. I think it's good enough for this case.

== Change summary for alert #31382 (as of Mon, 20 Sep 2021 06:15:28 GMT) ==

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
13% ebay FirstVisualChange windows10-64-shippable-qr warm webrender 206.42 -> 233.67
8% ebay dcf windows10-64-shippable-qr warm webrender 306.50 -> 330.58

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
3% wasm-godot macosx1015-64-shippable-qr webrender 535.98 -> 521.47
3% wasm-godot-baseline macosx1015-64-shippable-qr webrender 500.00 -> 487.22
2% wasm-misc-baseline windows10-64-shippable-qr webrender 73,136.38 -> 71,639.50

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=31382

Update on the hazard situation: the annotation in ~JSStreamConsumer "worked", but it revealed another hazard. I then ran my local version of the analysis on it (that I'm in the process of landing), and it revealed several more hazards that appear to be real. All of them result from ~CompilationInput becoming capable of GC'ing not just because of the original mOwningEventTarget->Dispatch() call, but also because any/all of these JSStreamConsumer fields could be destroyed if they are the last reference to them:

  nsCOMPtr<nsIEventTarget> mOwningEventTarget;
  RefPtr<WindowStreamOwner> mWindowStreamOwner;
  RefPtr<WorkerStreamOwner> mWorkerStreamOwner;

(plus any in ancestor classes).

So there are two main options here: (1) make an argument that none of these ref counts will ever drop to zero during the destructor, and annotate that ~JSStreamConsumer will never GC; or (2) use DeferredFinalize as peterv described in comment 98.

(1) seems like it might not be correct now, and even if it is, it seems like it could become incorrect in the future in which case we'll probably miss the problem because we annotated it away. But I could be convinced otherwise.

(2) is more code. The seemingly easy way would require some contortions to handle the layering violation:

But it seems like the other way to fix this would be to use DeferredFinalize to release CompilationInput's source member? It seems like that would just work.

DeferredFinalize is a Gecko thing, so we would somehow have to arrange for it to be invoked from within SpiderMonkey when it is embedded in Gecko. (This can be done, eg via registering a callback, but it's messy.)

So right now, I'm feeling like the way forward is

If you want to make JSStreamConsumer defer its release of mWindowStreamOwner/mWorkerStreamOwner then you could use the second DeferredFinalize variant (https://searchfox.org/mozilla-central/source/xpcom/base/DeferredFinalize.cpp#18), you'd need to write DeferredFinalizeAppendFunction and DeferredFinalizeFunction, and make the DeferredFinalizeFunction do the dispatch. DeferredFinalize was meant to be used for releasing things while it was unsafe (like during GC), but we normally don't use it across threads. So I think this is a bit of a special case.

As you can see from yury's patch, the DeferredFinalize solution is a lot of machinery and I'm skeptical that it can work, since ~JSStreamConsumer is being invoked from a thread that doesn't have a CycleCollectedJSRuntime.

So I have a different proposal, and I'd like to know if it makes sense: the only thing we care about here is that the GC not happen synchronously on the same thread. (If we end up dispatching to a different thread and it causes a GC, we don't care. JSRuntimes are single-threaded. The current thread's heap will not get mutated by a GC on a different thread.) Thus, it seems DelayedDispatch might be our friend: instead of

    MOZ_ALWAYS_SUCCEEDS(mOwningEventTarget->Dispatch(destroyer.forget()));

we could do

    MOZ_ALWAYS_SUCCEEDS(mOwningEventTarget->DelayedDispatch(destroyer.forget(), 0));

or even

    if (mOwningEventTarget->isOnCurrentThread()) {
        // Prevent GC on this thread.
        MOZ_ALWAYS_SUCCEEDS(mOwningEventTarget->DelayedDispatch(destroyer.forget(), 1));
    } else {
        MOZ_ALWAYS_SUCCEEDS(mOwningEventTarget->Dispatch(destroyer.forget()));
    }

It will still require an annotation of some sort because the analysis won't follow the control flow well enough.

Flags: needinfo?(peterv)

I tried it out, and the analysis reminded me that this only solves the "easy" problem of the Dispatch() call, which may very well be a false alarm in the first place. The harder problem is:

    JS::OptimizedEncodingListener.Release:0
    mozilla::dom::JSStreamConsumer.Release:0
    uint32 mozilla::dom::JSStreamConsumer::Release()
    void mozilla::dom::JSStreamConsumer::~JSStreamConsumer() [[deleting_dtor]]
    void mozilla::dom::JSStreamConsumer::~JSStreamConsumer()
    void mozilla::dom::JSStreamConsumer::~JSStreamConsumer() [[base_dtor]]
    nsCOMPtr<T>::~nsCOMPtr() [with T = nsIEventTarget] [[complete_dtor]]
    nsCOMPtr<T>::~nsCOMPtr() [with T = nsIEventTarget] [[base_dtor]]
    nsIEventTarget.Release:0
    nsIThreadPool.Release:0
    mozilla::SharedThreadPool.Release:0
    uint32 mozilla::SharedThreadPool::Release()
    uint32 NS_DispatchToMainThread(already_AddRefed<nsIRunnable>*, uint32)
    nsISerialEventTarget.Dispatch:0

What that means is that mOwningEventTarget itself is an nsCOMPtr<nsIEventTarget>, and will be suspected of GC'ing at the end of the destructor body.

Options here are: (1) we somehow know that this will never be the last reference to that event target, in which case I can come up with a way of annotating it; or (2) we transfer ownership of that ref count to the Destroyer and use the same logic to delay the release if it's on the same thread. The latter will still require some annotation, but I can probably arrange for mOwningEventTarget.forget() to signal to the analysis that the field is dead already. (Come to think of it, mWindowStreamOwner has the same issue; the analysis needs to know it won't get another refcnt decrement during destruction as well. And that's not as simple, since there's the mWindowStreamOwner vs mWorkerStreamOwner thing going on. I'll probably have to annotate this with a blanket "~JSStreamConsumer will never GC" annotation in annotations.js. After ensuring that it is true.)

I will note that with this change plus annotating that ~JSStreamConsumer will not GC, I have zero hazards. That annotation still kinda hurts, and I'd like to come up with something more targeted, but this should be good enough for now assuming this replacement makes sense in the first place and I'm not completely botching the ref counting.

Assignee: ydelendik → sphink
Status: REOPENED → ASSIGNED

I've r+ed this to move it forward, sorry for the delay. However, I still have some questions, maybe for a followup.

Would it be possible to store mOwningEventTarget in WorkerStreamOwner? I think we could then call NS_ReleaseOnMainThread (for WindowStreamOwner) and NS_ProxyRelease (for WorkerStreamOwner), both with aAlwaysProxy set to true. That looks like it's at least a more 'standard' way of doing things, and it avoids storing the event target at least for the WindowStreamOwner.

I don't know if the annotations can distinguish calls based on argument values? NS_ReleaseOnMainThread/NS_ProxyRelease with aAlwaysProxy set to true shouldn't GC afaict (if they fail to dispatch the runnable they just leak it), so that would look like a more logical and general annotation to me.

Flags: needinfo?(peterv) → needinfo?(sphink)

(In reply to Peter Van der Beken [:peterv] from comment #108)

Would it be possible to store mOwningEventTarget in WorkerStreamOwner? I think we could then call NS_ReleaseOnMainThread (for WindowStreamOwner) and NS_ProxyRelease (for WorkerStreamOwner), both with aAlwaysProxy set to true. That looks like it's at least a more 'standard' way of doing things, and it avoids storing the event target at least for the WindowStreamOwner.

Ooh, that looks very promising. I'll give it a try.

But one thing I don't understand. There's a comment

    // Both WindowStreamOwner and WorkerStreamOwner need to be destroyed on
    // their global's event target thread.

which implies that the WindowStreamOwner isn't necessarily main thread or something? I don't really understand what's going on here. If there is some distinction between the main thread and the WindowStreamOwner global's event target thread, then it seems like I should be using NS_ProxyRelease for both. But maybe the comment is just confused?

I don't know if the annotations can distinguish calls based on argument values? NS_ReleaseOnMainThread/NS_ProxyRelease with aAlwaysProxy set to true shouldn't GC afaict (if they fail to dispatch the runnable they just leak it), so that would look like a more logical and general annotation to me.

They can't, but I can implement that if I restrict it to just passing a constant boolean value.

Flags: needinfo?(sphink)

(In reply to Steve Fink [:sfink] [:s:] from comment #109)

    // Both WindowStreamOwner and WorkerStreamOwner need to be destroyed on
    // their global's event target thread.

which implies that the WindowStreamOwner isn't necessarily main thread or something? I don't really understand what's going on here. If there is some distinction between the main thread and the WindowStreamOwner global's event target thread, then it seems like I should be using NS_ProxyRelease for both. But maybe the comment is just confused?

Windows can only ever live on the main thread. And if you look at WindowStreamOwner's constructor and destructor, they already both assert NS_IsMainThread(). I guess the event target thread was always passed in for symmetry between WindowStreamOwner and WorkerStreamOwner, but I don't think we should care about that.

Pushed by ydelendik@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/957920db4464
nsICacheInfoChannel.alternativeDataInputStream as attribute. r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/aede4c5e8238
Use alt-data to cache stream-compiled WebAssembly modules. r=necko-reviewers,valentin,dragana
https://hg.mozilla.org/integration/autoland/rev/076e42ac1970
Add pref javascript.options.wasm_caching. r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/aa09c785dbe7
prevent ~JSStreamConsumer from GCing. r=peterv

\o/

Attachment #9243955 - Attachment is obsolete: true
Attachment #9066048 - Attachment is obsolete: true
Flags: needinfo?(ydelendik)
Keywords: leave-open

Refactoring of originalInputStream is not needed for this use case. Shall it be addressed somewhere else?

Assignee: sphink → nobody
Status: ASSIGNED → NEW

Closing this bug as there is nothing left to do from WebAssembly point of view.

Status: NEW → RESOLVED
Closed: 6 years ago3 years ago
Resolution: --- → FIXED
Assignee: nobody → mail
See Also: → 1757999
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: