Allow IDBObjectStore.put() to asynchronously wait on background compilation of JS::WasmModule

RESOLVED FIXED

Status

()

enhancement
P3
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: luke, Assigned: luke)

Tracking

(Blocks 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(12 attachments, 6 obsolete attachments)

1.14 KB, patch
janv
: review+
Details | Diff | Splinter Review
12.44 KB, patch
lth
: review+
Details | Diff | Splinter Review
1.13 KB, patch
lth
: review+
Details | Diff | Splinter Review
12.37 KB, patch
lth
: review+
Details | Diff | Splinter Review
36.87 KB, patch
lth
: review+
Details | Diff | Splinter Review
12.35 KB, patch
lth
: review+
Details | Diff | Splinter Review
968 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
1.11 KB, patch
sfink
: review+
Details | Diff | Splinter Review
1.54 KB, patch
lth
: review+
Details | Diff | Splinter Review
6.84 KB, patch
janv
: review+
Details | Diff | Splinter Review
13.59 KB, patch
asuth
: review+
baku
: feedback+
Details | Diff | Splinter Review
17.10 KB, patch
asuth
: review+
luke
: review+
Details | Diff | Splinter Review
Assignee

Description

2 years ago
In particular, for bug 1277562, when a WebAssembly.Module object is stored in IDB, it may only have baseline code at that point in time but we'd like to wait for the background job which is creating optimized ion code to complete so that we only ever serialize optimized ion code.  IDBObjectStore.put() is an async operation so this waiting shouldn't need to block anything on the JS thread.
This seems like it violates the design decisions of IndexedDB in a way that previous WASM-supporting changes did not.  Specifically, the transaction model of IndexedDB is "if you're not actively doing something, we're going to close your transaction", and "reads can happen in parallel".  In effect, they ensure that database work is either I/O-bound or synchronously CPU bound in the event handlers.  And from the perspective of different scopes that execute in different threads, everything is only I/O bound unless some other thread's scope is doing something CPU bound in a write transaction.  And it's an invariant that all writes involve synchronous structured clone invocations that can immediately be serviced.

I understand that you're requesting that writes be allowed to stall on the CPU-bound process of optimized compilation that happens on another thread.  (Versus the alternative data stream strategy that HTTP cache and the DOM cache API will eventually support.)  While the argument can be made that this is similar to opening a write transaction and then doing something CPU intensive in a read request that was made when initiating the transaction, it's also true that the actual compilation cost is non-obvious to the caller.  They won't really know how long the compilation takes, nor will they directly perceive the "jank" on their thread.  But they can notice that IDB stalls for a potentially very long time.

I think the big question is: Is there a reason user-space JS can't simply wait for the compilation to finish and then issue the write transaction?  This requires no changes to the already-complex IDB logic, changing people's mental models for IDB, or stalling of write transactions.  And instead we can concentrate our engineering efforts on making the DOM Cache API support alternative data streams, which seems like the preferable long-term solution for many reasons.
Assignee

Comment 2

2 years ago
(In reply to Andrew Sutherland [:asuth] from comment #1)

So I was thinking that it would be an optimization of structured clone in general if it was broken into two steps:
 1. copying from the mutable JS objects into a buffer (which necessarily happens synchronously b/c the JS thread can immediately mutate those objects)
 2. asynchronously writing the buffer into storage
I think Chrome must have this structure because, in some Unity measurements, it seemed like Chrome's put()s were much faster, indicating that less work was happening synchronously.  Anyhow, with this structure, I would imagine that the only addition for wasm here would be to make step 2 not be scheduled immediately, but instead wait on a notification from JS API.

An alternative impl strategy that would actually be quite symmetric with alternative data streams would be if, when a JS::WasmModule was serialized, IDB immediately wrote the bytecode but, for the machine code file, IDB simply handed the JS::WasmModule an nsIOutputStream that could be written to at some point in the future.  The main challenge I suppose is handling concurrent reads to the nsIOutputStream write; the ideal thing here would be for reads to wait on the nsIOutputStream to close since they'll otherwise have to recompile the machine code anyway.

> I think the big question is: Is there a reason user-space JS can't simply
> wait for the compilation to finish and then issue the write transaction? 

This is strictly worse than having IDB stall (possibly zero) future transactions and it would defeat the entire purpose of having a baseline tier.

> And instead we
> can concentrate our engineering efforts on making the DOM Cache API support
> alternative data streams, which seems like the preferable long-term solution
> for many reasons.

The important resolution of that discussion, though, was that while implicit caching will be great for handling the many-small-wasm-libraries case, it will break down for (1) really big modules which are too big for the cache (who of course are when we need caching the most; we're talking 100-200mb cache files here) (2) caching modules whose bytes are computed and thus have no backing cache entry (Unity actually already does this so it can apply custom compression, so this is already a critical use case).  For these cases, IDB is the power tool that saves the day.

Comment 3

2 years ago
Thanks for providing your opinions on this, I'll investigate what we can do here.
(In reply to Luke Wagner [:luke] from comment #2)
> The important resolution of that discussion, though, was that while implicit
> caching will be great for handling the many-small-wasm-libraries case, it
> will break down for (1) really big modules which are too big for the cache
> (who of course are when we need caching the most; we're talking 100-200mb
> cache files here) (2) caching modules whose bytes are computed and thus have
> no backing cache entry (Unity actually already does this so it can apply
> custom compression, so this is already a critical use case).  For these
> cases, IDB is the power tool that saves the day.

I think there's some confusion here.  I'm referring to the DOM Cache API (https://developer.mozilla.org/en-US/docs/Web/API/Cache) that's currently part of the ServiceWorker spec, not the HTTP Cache.  It's quota-manager backed storage with the same storage limits as the IndexedDB API.  Although its storage keys are fetch Request instances (which support crazy HTTP "Vary" header matching rules) and so need to look like valid http(s) URL's, they don't have to actually correspond to anything on the web.  You can do cache.put("http://example.com/foo", new Response(new Blob([someArrayBuffer]))) that doesn't exist on the web and no one will be the wiser.

But since we've already shipped WASM-in-IDB, the fact that the Cache API is more suitable for this use-case is sorta moot, so I won't dwell on it.

> > I think the big question is: Is there a reason user-space JS can't simply
> > wait for the compilation to finish and then issue the write transaction? 
> 
> This is strictly worse than having IDB stall (possibly zero) future
> transactions and it would defeat the entire purpose of having a baseline
> tier.

I figured it was worth checking. :)

If I understand bug 1277562 comment 0's intention, your request to delay the write is a simplification.  It seems like what you really want is to be able to update the information associated with the persisted representation of a WASM module at any point in time, not just for initial persistence.  This would enable demand-compilation, plus proper handling of re-compilation on buildid change.  The current re-compilation in the parent when get()ing a WASM module is arguably a hack.  Enabling the child process to update the content on disk would let us make it the child's problem.

I would propose an implementation along these lines:

- Add a mechanism to WasmModule so that IDB can register a "BytecodeChanged" function to be invoked whenever JS wants to update the copy of the compiled bytecode on disk.  This covers both "the compiled version was out of date" plus "we just finished doing some extra fancy ion compilation".

- Also augment WasmModule so that IDB can stash a reference-counted helper class that stores the bytecode blob and compiled blob on the module.  This would address the round-tripping problem with wasm modules.  If you get() a Blob from IDB and put() it back later (or put() a new blob twice), it's just a refcount that gets tweaked, no duplication occurs.  But if you get() a Wasm module from IDB and put it back, it gets duplicated.  Being able to remember the Blobs fixes the problem and we need a reference to the compiled blob for the next bit.

- Implement a new operation, updateblob, that replaces the contents of the existing blob to be run as part of a write transaction.  Because wasm decoding happens as part of a read transaction that won't complete until the async "preprocess" mechanism completes, transaction semantics mean we don't have to worry about races or anyone seeing inconsistent data.  (I believe it's also fine for the write to be "safe" without ever worrying about seeing stale file data because the FD is only acquired when the stream is opened for reading and closed when the stream is closed.)

- IDB implements the change handler.  It uses the compiled blob from the helper which I guess also needs to store a reference to the database.  When a new set of compiled data is provided, a new write transaction is created and the updateblob operation dispatched.  neither the transaction nor the non-standard op should be exposed to the IDB observer API.


While it's awkward to add a "replace the contents of a blob that really should be immutable" operation, it's arguably less awkward than the alternatives or the existing logic where the get() of an obsolete compiled blob can trigger mutating re-compilation.
Assignee

Comment 5

2 years ago
(In reply to Andrew Sutherland [:asuth] from comment #4)
> I think there's some confusion here.  I'm referring to the DOM Cache API
> ...
> You can do cache.put("http://example.com/foo", new Response(new
> Blob([someArrayBuffer]))) that doesn't exist on the web and no one will be
> the wiser.

Ah, I see; I was thinking about the older "noone should have to explicitly cache anything; the http cache should do it all" discussions.  What you're describing here is fully explicit caching, just with a much nicer API :)

So even though there is this distinction (IDB get() returning a Module vs. Cache match() returning a Response), it seems like, with the implementation scheme you described above, they could end up using the same mechanism: a JS::WasmModule that feeds back updates to its cached representation.  The only difference is that with Cache match(), this connection is transitively propagated through the Response into the Module whereas with IDB the connection is established directly on get().  Is that right?  Because if so, that sounds great and like we're not expending too much extra effort with IDB.

> I would propose an implementation along these lines:

I'd need to chat more to understand all the implications, but this sounds really good.  This is sorta what I was vaguely getting at in comment 2 (with "An alternative impl strategy...").  What's really cool about this design is that it lets us do even more powerful runtime optimization techniques (PGO-type stuff) than just background AOT optimization, so it gives us a good foundation for years of optimization without us bugging you each time :)

Do you think we could, from the SM pov, have an interface that unified this with the alternative data streams so that ultimately a wasm or JS unit of code could just have an abstract connection back to some cached storage (IDB, Cache, or Necko alternative data) that it could update at runtime?
(In reply to Luke Wagner [:luke] from comment #5)
> So even though there is this distinction (IDB get() returning a Module vs.
> Cache match() returning a Response), it seems like, with the implementation
> scheme you described above, they could end up using the same mechanism: a
> JS::WasmModule that feeds back updates to its cached representation.  The
> only difference is that with Cache match(), this connection is transitively
> propagated through the Response into the Module whereas with IDB the
> connection is established directly on get().  Is that right?  Because if so,
> that sounds great and like we're not expending too much extra effort with
> IDB.

I would expect both IDB and fetch() to use the same SpiderMonkey hooks.  But at this time there's really no other potential for code reuse between the IDB and fetch.  The exception would be if we started supporting storing Response objects in IDB as raised as a possibility at https://github.com/w3c/ServiceWorker/issues/977#issuecomment-290572146.  Then I'd expect the work made to IDB here would support IDB-backed-Responses supporting alternate data streams using transactions. 
 
> Do you think we could, from the SM pov, have an interface that unified this
> with the alternative data streams so that ultimately a wasm or JS unit of
> code could just have an abstract connection back to some cached storage
> (IDB, Cache, or Necko alternative data) that it could update at runtime?

Yes, in that I think it makes sense for SpiderMonkey to not understand the difference between consumer types.  But again, I do think IDB and fetch would end up with separate implementations that consume/respond to the hook.
Assignee

Comment 7

2 years ago
(In reply to Andrew Sutherland [:asuth] from comment #6)
Ah, ok.  I was assuming that there could be a shared implementation of "internal blob-ish thing that can be updated by JS engine" that could be used by both IDB and Cache, but I'm not familiar with their respective implementations, so I guess they're too different?
(In reply to Luke Wagner [:luke] from comment #7)
> (In reply to Andrew Sutherland [:asuth] from comment #6)
> Ah, ok.  I was assuming that there could be a shared implementation of
> "internal blob-ish thing that can be updated by JS engine" that could be
> used by both IDB and Cache, but I'm not familiar with their respective
> implementations, so I guess they're too different?

Yeah.  There is no platform interface or shared implementation primitive that covers the "a handle to blob-ish data that can be used to both retrieve a stream and atomically replace the underlying data".  DOM Blobs are the closest thing we have, but they are defined to be immutable and that simplifying assumption is leveraged throughout the codebase.  All storage mutation APIs expose their mutation functionality at a higher level with additional semantics[1].  So we are doomed to have specific adapter logic.

1: Actually, the HTTP cache2 implementation meets the use-case pretty well without a lot of extra semantics,  but that's aided by the fact it only exists in the parent process and doesn't expose a consumable API to anyone.  It has to be interacted with through HTTP Channels and their semantics.
Priority: -- → P3
Assignee

Comment 9

2 years ago
So after some discussion with Jan, we all like the generalized solution, but it is a significant amount of work and none of us has bandwidth for that at the moment.  On the other hand, the short-term hack of just adding a new wasm subclass of nsIAsyncStream (and then constructing a BlobImpl from that) would be pretty easy and very low impact, so that seems like the right first step here given that this entirely blocks bug 1277562 which we want to ship soon.  I have some time to work on this.
Assignee

Comment 10

2 years ago
I stumbled across this while starting a patch.
Assignee: nobody → luke
Attachment #8862209 - Flags: review?(jvarga)
Comment on attachment 8862209 [details] [diff] [review]
fix-rv-shadowing

Review of attachment 8862209 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8862209 - Flags: review?(jvarga) → review+

Comment 12

2 years ago
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5288b9aa1fb
Baldr: remove unintentional shadowing of 'rv' (r=janv)
Assignee

Updated

2 years ago
Whiteboard: [leave open]
Assignee

Updated

2 years ago
Keywords: leave-open
Assignee

Comment 14

2 years ago
Posted patch tweak-jsapiSplinter Review
Here's an initial, trivial refactoring to split up serialization of bytecode vs. compiled code.
Attachment #8887667 - Flags: review?(lhansen)
Assignee

Comment 15

2 years ago
Posted patch wasm-module-stream (obsolete) — Splinter Review
Here's a WIP which creates the async stream; it's pretty simple.  Jan, any feedback so far?  The main remaining TODO on the Gecko side is to implement nsIIPCSerializableInputStream.  Also: should I move the WasmCompiledModuleStream to a new .h/.cpp?

A bit more work is necessary on the JS side once bug 1277562 lands to actually do async compilation.
Attachment #8887672 - Flags: feedback?(jvarga)
Comment on attachment 8887667 [details] [diff] [review]
tweak-jsapi

Review of attachment 8887667 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!
Attachment #8887667 - Flags: review?(lhansen) → review+

Comment 17

2 years ago
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a7f939d3f4a4
Baldr: change JS::WasmModule interface to separate bytecode from compiled code (r=lth)
Luke, I quickly went through the patch and it looks pretty good.
However I have to finish bug 1350637 before we the uplift, so can't do full review.
Assignee

Comment 19

2 years ago
(In reply to Jan Varga [:janv] from comment #18)
Thanks Jan!  We're still waiting on bug 1277562 before this patch would land (and remove the blocking hack added in that bug) so there is time.  The target here is FF57.

Quick question though: should I move WasmCompiledModuleStream to another .h/.cpp?
Flags: needinfo?(jvarga)
(In reply to Luke Wagner [:luke] from comment #19)
> (In reply to Jan Varga [:janv] from comment #18)
> Thanks Jan!  We're still waiting on bug 1277562 before this patch would land
> (and remove the blocking hack added in that bug) so there is time.  The
> target here is FF57.
> 
> Quick question though: should I move WasmCompiledModuleStream to another
> .h/.cpp?

I think that would only make sense if you plan to reuse it outside IDBObjectStore.cpp
Flags: needinfo?(jvarga)
Blocks: 1391197
No longer blocks: 1277562
Blocks: 1391196
Assignee

Comment 22

2 years ago
Comment on attachment 8887672 [details] [diff] [review]
wasm-module-stream

Ok, with bug 1277562 landed, time to finish this up.
Attachment #8887672 - Flags: feedback?(jvarga)
Assignee

Updated

2 years ago
Attachment #8887672 - Attachment is obsolete: true
Assignee

Comment 23

2 years ago
I think this is just a pre-existing bug that I hit with some later patches: we're sharing things between threads but using non-atomic ref-counting.
Attachment #8899356 - Flags: review?(lhansen)
Assignee

Comment 24

2 years ago
This patch is completely unrelated, but while I was in the area, I found a way to (1) be a bit type-ier with the jump table, (2) share a hair more code on the finish path.
Attachment #8899357 - Flags: review?(lhansen)
Assignee

Comment 25

2 years ago
So I was overly optimistic thinking I could simply expunge the block/unblock logic from Module.  With an async dependency, you still have the same hard requirement of notifying the async dependent on success/failure in all cases.  This patch does some things I was going to suggest in review but decided not to under the assumption the code would be nuked.  The idea is to use the lifetime of the Tier2GeneratorTask to ensure we always unblock.

The patch also shuffles the logic around a bit to make tier-2 generation and the associated state machine an impl detail of wasm::Module.  This is main designed to make the final patch a simple drop-in replacement for the block/unblock logic.
Attachment #8899358 - Flags: review?(lhansen)
Assignee

Comment 26

2 years ago
Posted patch wasm-module-stream (obsolete) — Splinter Review
This patch adds the new JS API and all the Gecko-side code in IDBObjectStore.cpp.  Surprisingly simple, given the right JS API (which took a bit of iteration to find).

Jan: I realized that, since we'll already have a heuristic that only uses tiering for big modules, there's probably no benefit from implementing  nsIIPCSerializableInputStream to optimize the small-module case since we won't even be using a stream in that case.
Attachment #8899361 - Flags: review?(jvarga)
Assignee

Comment 27

2 years ago
This patch implements the new JS API introduced by the previous patch, removing the synchronous blocking.  Pretty simple, given the preceding patches.  I was able to successfully run ZenGarden and a few other test workloads.  While we review these patches, I intend to work on another patch to add or enabling some automated tests to hit this path.
Attachment #8899362 - Flags: review?(lhansen)
Comment on attachment 8899356 [details] [diff] [review]
atomic-ref-counting

Review of attachment 8899356 [details] [diff] [review]:
-----------------------------------------------------------------

Interesting oversight.
Attachment #8899356 - Flags: review?(lhansen) → review+
Attachment #8899357 - Flags: review?(lhansen) → review+
Attachment #8899361 - Attachment is patch: true
Comment on attachment 8899358 [details] [diff] [review]
tweak-tier2-tasks

Review of attachment 8899358 [details] [diff] [review]:
-----------------------------------------------------------------

Looks plausible.  I'll give it a whirl with my testing functions to see if I can uncover any lost wakeups, but I don't expect to find any.

::: js/src/wasm/WasmModule.cpp
@@ +272,5 @@
> +    // hasn't been already.
> +
> +    UniqueTier2GeneratorTask task(js_new<Tier2GeneratorTaskImpl>(*this, args));
> +    if (!task)
> +        return;

Don't we need an unblock here if we fail to allocate?  The original code had one, in wasm::Compile(), and the destructor in the task won't save us here as the task was not allocated.
Attachment #8899358 - Flags: review?(lhansen) → review+
Assignee

Comment 30

2 years ago
(In reply to Lars T Hansen [:lth] from comment #29)
> > +    UniqueTier2GeneratorTask task(js_new<Tier2GeneratorTaskImpl>(*this, args));
> > +    if (!task)
> > +        return;
> 
> Don't we need an unblock here if we fail to allocate?  The original code had
> one, in wasm::Compile(), and the destructor in the task won't save us here
> as the task was not allocated.

The mode_ starts out conservatively at CompileMode::Once and we only set it to Tier2 in the next line so noone should be blocking.
Assignee

Comment 31

2 years ago
In the wasm-module-stream patch, I need to declare an nsISupports-compatible AddRef()/Release() in jsapi.h which cannot simply refer to nsISupports.  I need something like this macro in mfbt so that the call ABI lines up.
Attachment #8899908 - Flags: review?(nfroyd)
Assignee

Comment 32

2 years ago
This one-liner suppresses a spurious rooting hazard that appeared b/c ~WasmModule, which is considered reachable from ~RefPtr while the temporary JSObject* is live, now does more (not-GC-causing) stuff.  Patch is technically an optimization ;)
Attachment #8899911 - Flags: review?(amarchesini)
Assignee

Comment 33

2 years ago
Posted patch wasm-module-stream (obsolete) — Splinter Review
Updated with 'explicit' and MOZ_XPCOM_ABI (introduced by above patch) to be green on try.
Attachment #8899912 - Flags: review?(jvarga)
Assignee

Updated

2 years ago
Attachment #8899361 - Attachment is obsolete: true
Attachment #8899361 - Flags: review?(jvarga)
Comment on attachment 8899908 [details] [diff] [review]
add-xpcom-abi-to-mfbt

Review of attachment 8899908 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is a whole lot simpler than what I expected from the description--I expected some sort of hideous nsISupports clone class, nsIDs and all...and got instead a simple macro definition.  Winning!
Attachment #8899908 - Flags: review?(nfroyd) → review+
Assignee

Comment 35

2 years ago
Haha, yeah, we fortunately only need AddRef/Release, not QI :)
(In reply to Luke Wagner [:luke] from comment #30)
> 
> The mode_ starts out conservatively at CompileMode::Once and we only set it
> to Tier2 in the next line so noone should be blocking.

Indeed, I realized this later.  Nice.
Comment on attachment 8899362 [details] [diff] [review]
wasm-remove-blocking

Review of attachment 8899362 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmModule.cpp
@@ +383,5 @@
>  
>  /* virtual */ size_t
>  Module::compiledSerializedSize() const
>  {
> +    MOZ_ASSERT(!tiering_.lock()->active);

MOZ_RELEASE_ASSERT?  Also below.

@@ -623,5 @@
>      if (!result)
>          return false;
>  
> -    if (tier == Tier::Ion)
> -        blockOnIonCompileFinished();

I'm worried that removing this will quietly break some tests that use extractCode (exposed in the shell and TestingFunctions as wasmExtractCode).  I understand why you want to remove it, but as there are no shell APIs for listening for tier-2 completion I'm not sure what we're left with.  I suppose a test case can always poll until the code is available :/
Attachment #8899362 - Flags: review?(lhansen) → review+
Assignee

Comment 38

2 years ago
(In reply to Lars T Hansen [:lth] from comment #37)
Given the limited use cases for this testing function, I didn't think it justified any non-local complexity.  However, thinking about it again, I can just make the local fix of polling JS::WasmModule::compilationComplete() in a loop with a sleep().
Assignee

Comment 39

2 years ago
Ooh, try will tell for sure, but I *think* I can use C++11 sleep_for.  IIRC, you can use this in one of your testing-function patches for portable sub-second sleeping.
Attachment #8900291 - Flags: review?(lhansen)

Comment 40

2 years ago
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/630c1ebfd237
Baldr: atomically ref-count ShareableBase (r=lth)
https://hg.mozilla.org/integration/mozilla-inbound/rev/427c087cae8d
Baldr: factor commonality into finishCodeSegment and make jumpTable more typey (r=lth)
https://hg.mozilla.org/integration/mozilla-inbound/rev/601c9ea4e178
Baldr: use lifetime/ownership to manage unblocking Module dependents (r=lth)
https://hg.mozilla.org/integration/mozilla-inbound/rev/3878f845c306
Add MOZ_XPCOM_ABI to mfbt (r=froydnj)
Comment on attachment 8900291 [details] [diff] [review]
fix-wasm-extract-code

Review of attachment 8900291 [details] [diff] [review]:
-----------------------------------------------------------------

Sure, this is what I ended up doing in the testing code too, although I repurposed my old code instead of trying to figure out sleep_for.  It will be good if sleep_for works.  You might need to get <chrono> on the list of approved stl headers in config/stl-headers, it's not there now.
Attachment #8900291 - Flags: review?(lhansen) → review+
Assignee

Comment 42

2 years ago
Posted patch idb-tests (obsolete) — Splinter Review
With printfs, I could confirm this test hitting the async module stream path in debug/opt window/worker/xpcshell.  Note: the prefs being set here shouldn't last long and will be removed with bug 1391196.
Attachment #8900460 - Flags: review?(jvarga)
Assignee

Updated

2 years ago
Attachment #8899911 - Flags: review?(amarchesini) → review?(sphink)
(In reply to Luke Wagner [:luke] from comment #42)
> With printfs, I could confirm this test hitting the async module stream path
> in debug/opt window/worker/xpcshell.  Note: the prefs being set here
> shouldn't last long and will be removed with bug 1391196.

Aside, no action required, just a general FYI: Don't forget that IDB has IDB_LOG_MARK[1] that gets you cool MOZ_LOG when MOZ_LOG=IndexedDB:5 output, plus if you flip the "dom.indexedDB.logging.profiler-marks" pref[2], profiler marks in the profiler.  Back before I was a fancy IDB peer[3] and primraily a consumer of IndexedDB for Firefox OS apps, I always found the MOZ_LOG output invaluable.  I endorse adding additional IDB_LOG_MARK calls with abandon.

1: https://searchfox.org/mozilla-central/search?q=IDB_LOG_MARK
2: The other relevant prefs are "dom.indexedDB.logging.enabled" and "dom.indexedDB.logging.details" but they default to true so you don't need to worry about them.
3: ;)
Attachment #8899911 - Flags: review?(sphink) → review+
Comment on attachment 8900460 [details] [diff] [review]
idb-tests

Review of attachment 8900460 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/test/unit/xpcshell-head-parent-process.js
@@ +682,5 @@
>    },
>  };
> +
> +// This can be removed soon when on by default.
> +SpecialPowers.setBoolPref('javascript.options.wasm_baselinejit', true);

You can't set the pref like this. This also runs in a child process.
I currently get:
3:55.81 PROCESS_OUTPUT: Thread-520 (pid:45519) "\x07[Child 45520] ###!!! ASSERTION: ENSURE_MAIN_PROCESS failed. Cannot SetBoolPref from content process: javascript.options.wasm_baselinejit: 'Error', file /Users/varga/Sources/Mozilla2/modules/libpref/nsPrefBranch.cpp, line 166"
Assignee

Comment 46

2 years ago
Posted patch idb-testsSplinter Review
Heh, yeah, I found that right after posting as soon as I ran the broader IDB xpc tests.
Attachment #8900460 - Attachment is obsolete: true
Attachment #8900460 - Flags: review?(jvarga)
Attachment #8901188 - Flags: review?(jvarga)
Andrew, I think the copy loop can be simpler and we can use a stack based raw buffer which can be directly copied into the IPC message w/o any extra copying.
I think calling SetLength() after every Read() can be inefficient.
Attachment #8901539 - Flags: review?(bugmail)
fix the casting
Attachment #8901539 - Attachment is obsolete: true
Attachment #8901539 - Flags: review?(bugmail)
Attachment #8901541 - Flags: review?(bugmail)
Attachment #8901597 - Flags: review?(bugmail)

Updated

2 years ago
Attachment #8901597 - Flags: review?(luke)
Comment on attachment 8899912 [details] [diff] [review]
wasm-module-stream

Review of attachment 8899912 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/IDBObjectStore.cpp
@@ +196,5 @@
> +{
> +  // The stream can be in one of 3 states: Waiting, Ready and Closed. The
> +  // transition from Waiting to Ready can happen asynchronously from any thread
> +  // so all fields must be locked by mMutex.
> +  Mutex mMutex;

I believe this can be done w/o a mutex. See my patch.

::: js/src/jsapi.h
@@ +6584,5 @@
> +    // Compilation must complete before the serialized code is requested. If
> +    // compilation is not complete, the embedding must wait until notified by
> +    // implementing WasmModuleListener. SpiderMonkey will hold a RefPtr to
> +    // 'listener' until onCompilationComplete() is called.
> +    virtual bool compilationComplete() = 0;

Nit: const
Attachment #8899912 - Flags: review?(jvarga)
more cleanup
Attachment #8901597 - Attachment is obsolete: true
Attachment #8901597 - Flags: review?(luke)
Attachment #8901597 - Flags: review?(bugmail)
Attachment #8901851 - Flags: review?(bugmail)

Updated

2 years ago
Attachment #8901851 - Flags: review?(luke)
Comment on attachment 8901188 [details] [diff] [review]
idb-tests

Review of attachment 8901188 [details] [diff] [review]:
-----------------------------------------------------------------

Nice test, thanks!

::: dom/indexedDB/test/unit/test_wasm_serialize_tiering.js
@@ +18,5 @@
> +  }
> +
> +  // Make a module big enough so that tiering is significant.
> +  const N = 50;
> +  var bigFunc = `(func (result f64)\n`;

Nit: s/var/let ?

@@ +19,5 @@
> +
> +  // Make a module big enough so that tiering is significant.
> +  const N = 50;
> +  var bigFunc = `(func (result f64)\n`;
> +  for (var i = 0; i < N; i++)

Nit: not sure about this, |var i| here declares it just for the scope of this for loop ?

Nit: braces

@@ +24,5 @@
> +    bigFunc += `  f64.const 1.0\n`;
> +  for (i = 0; i < N - 1; i++)
> +    bigFunc += `  f64.add\n`;
> +  bigFunc += `)`;
> +  var bigModule = `(module \n`;

Nit: s/var/let ?

@@ +64,5 @@
> +  const NumModules = 5;
> +  const NumCopies = 5;
> +
> +  let finishedAdds = 0;
> +  for (let moduleIndex = 0; moduleIndex < NumModules; moduleIndex++) {

Nit: two-space indentation here and elsewhere

@@ +71,5 @@
> +          let key = String(moduleIndex) + " " + String(copyIndex);
> +          let request = objectStore.add(module, key);
> +          request.onsuccess = function() {
> +              is(request.result, key, "Got correct key");
> +              if (++finishedAdds === NumModules * NumCopies)

Nit: braces

@@ +89,5 @@
> +          request.onsuccess = function() {
> +              let module = request.result;
> +              var instance = new WebAssembly.Instance(module);
> +              is(instance.exports.run(), N, "Got correct run() result");
> +              if (++finishedGets === NumModules * NumCopies)

Nit: braces
Attachment #8901188 - Flags: review?(jvarga) → review+
Assignee

Comment 53

2 years ago
Comment on attachment 8901851 [details] [diff] [review]
wasm-module-stream (w/o mutex)

Review of attachment 8901851 [details] [diff] [review]:
-----------------------------------------------------------------

Ah, the logic is much simpler if we can assume our stream only runs on a single event target; thanks so much for writing it up!

::: dom/indexedDB/IDBObjectStore.cpp
@@ +192,5 @@
>    return request.forget();
>  }
>  
> +// Blob internal code clones this stream before it's passed to IPC, so we
> +// serialize module's optimized code only when AsyncWait() is called.

It's worrisome that we could theoretically end up serializing N times for N clones, but I assume you've manually verified this doesn't happen in the IDB path.

@@ +409,5 @@
> +
> +    if (mModule->compilationComplete()) {
> +      onCompilationComplete();
> +    } else {
> +      mModule->notifyWhenCompilationComplete(this);

nit: notifyWhenCompilationComplete() immediately calls onCompilationComplete() if compilationComplete() so this probably isn't a measurable optimization
Attachment #8901851 - Flags: review?(luke) → review+
Comment on attachment 8901541 [details] [diff] [review]
Simplify IPCStreamSource::DoRead() and remove reallocation of the buffer

Review of attachment 8901541 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Jan Varga [:janv] from comment #47)
> Andrew, I think the copy loop can be simpler and we can use a stack based
> raw buffer which can be directly copied into the IPC message w/o any extra
> copying.
> I think calling SetLength() after every Read() can be inefficient.

SetLength() seems to return early if there's already sufficient storage space allocated:
https://searchfox.org/mozilla-central/source/xpcom/string/nsTSubstring.cpp#76

The main downside seems to be that if you ask for 32k, you'll actually get (64k - (sizeof(nsStringBuffer) + 1)) thanks to nsStringBuffer weirdness and the mozilla::RoundUpPow2 logic.

I'm instinctively fearful of intentionally large stack allocations (especially since we could allocate a 32k heap buffer once for the object and hold it), but DoRead() is leaf-node-ish (its callers shouldn't spin nested event loops) and I see our current minimum worker stack size is 1MiB on 32bit and 2MiB on 64-bit, so that really should be enough.  Main threads get even larger stack allocations, so even though we can be pretty deep in stack on the main thread[1], the fact that we're a leaf node and we already size the main thread stacks for recursive layout, we're probably okay.  So I think I've convinced myself enough :)

In terms of extra copies, I agree that if SetLength()/SetCapacity() does end up growing the buffer, then there will be copying involved (although it's amortized with the power-of-twos doubling, so even if Available()/Read() calls were pathologically hostile, it'd come out in the wash.)  But assuming an initial SetCapacity(32k that ends up 64k-ish), I don't think any copies are saved in the steady state.  Both `struct ParamTraits<nsACString>` and `struct ParamTraits<mozilla::wr::ByteBuffer>` end up doing the same thing.  (Although, again the SetLength call for the string will over-allocate).

But I'm on board with the ByteBuffer switch given that it saves us from over-allocating and unless we needed proper-refcounted string slicing/roping, ByteBuffer is a more accurate and easier to understand type hierarchy.  That said, :baku is back, so I'm going to feedback? him in a shorter comment.

Note: For paranoia purposes, it would be great for you to verify that the test case from https://bugzilla.mozilla.org/show_bug.cgi?id=1393063#c0 works with this stack of patches/etc.  We don't actually have test coverage yet for that regression until https://bugzilla.mozilla.org/show_bug.cgi?id=1371699 lands (or someone adds a test.  :baku and I successfully hand-waved that need away.)

1: In the bug 1393063 repro which is a non-async XHR FormData POST from the main thread on a debug build, I'm seeing a truncated stack with 50 frames.  But most of them are template gunk and that slice of stack is only ~9k.
Attachment #8901541 - Flags: review?(bugmail) → review+
Comment on attachment 8901541 [details] [diff] [review]
Simplify IPCStreamSource::DoRead() and remove reallocation of the buffer

:baku, are you cool with changing the wire protocol of the parent/child IPC streams from ncCString to wr::ByteBuffer[1]?  It avoids power-of-2 over-allocating on receipt (and when building, if relevant), plus it's way easier to trace.  Its main limitation as a type is that although it supports slices, all slices need to be outlived by their parent.

1: https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/WebRenderTypes.h#548
Attachment #8901541 - Flags: feedback?(amarchesini)
(In reply to Andrew Sutherland [:asuth] from comment #54)
> Note: For paranoia purposes, it would be great for you to verify that the
> test case from https://bugzilla.mozilla.org/show_bug.cgi?id=1393063#c0 works
> with this stack of patches/etc.

Yeah, the test case works as expected with these patches.
(In reply to Luke Wagner [:luke] from comment #53)
> > +// Blob internal code clones this stream before it's passed to IPC, so we
> > +// serialize module's optimized code only when AsyncWait() is called.
> 
> It's worrisome that we could theoretically end up serializing N times for N
> clones, but I assume you've manually verified this doesn't happen in the IDB
> path.

Bad news :(

This is where IDB does super smart caching of Blobs using weak references:
https://searchfox.org/mozilla-central/rev/cd82cacec2cf734768827ff85ba2dba90a534c5e/dom/indexedDB/IDBDatabase.cpp#911

Every time Blob::Create is invoked, you get a new nsIDOMBlob and that mechanism is defeated, so these are where the doom happens:
https://hg.mozilla.org/try/file/153e28ba8e5e7f2ac276a5f89e3c3116923b5772/dom/indexedDB/IDBObjectStore.cpp#l656

The blobs needs to be saved off.  (This is part of the generalized solution I proposed in comment 4 onwards, so if that's coming soon, then no worries!)
(In reply to Andrew Sutherland [:asuth] from comment #57)
> (In reply to Luke Wagner [:luke] from comment #53)
> > > +// Blob internal code clones this stream before it's passed to IPC, so we
> > > +// serialize module's optimized code only when AsyncWait() is called.
> > 
> > It's worrisome that we could theoretically end up serializing N times for N
> > clones, but I assume you've manually verified this doesn't happen in the IDB
> > path.
> 
> Bad news :(
> 
> This is where IDB does super smart caching of Blobs using weak references:
> https://searchfox.org/mozilla-central/rev/
> cd82cacec2cf734768827ff85ba2dba90a534c5e/dom/indexedDB/IDBDatabase.cpp#911
> 
> Every time Blob::Create is invoked, you get a new nsIDOMBlob and that
> mechanism is defeated, so these are where the doom happens:
> https://hg.mozilla.org/try/file/153e28ba8e5e7f2ac276a5f89e3c3116923b5772/dom/
> indexedDB/IDBObjectStore.cpp#l656
> 
> The blobs needs to be saved off.  (This is part of the generalized solution
> I proposed in comment 4 onwards, so if that's coming soon, then no worries!)

Hm, I don't understand. Luke pointed out that if we make multiple clones, we could end up serializing the same module N times.
The cloning only happens when blob->GetInternalStream() is called and that should be called only once.

However, if the same module is stored multiple times, we always create a new internal blob, so it will be serialized and stored N times because we haven't implemented "sharing" for wasm modules yet. But this patch shouldn't make the situation worse, no ?

Your solution in comment 4 is really nice, I like it, but I'm not sure we have time for that right now.
Our understandings are the same.  I believe I mis-parsed what Luke was referring to by clones.  I thought he was referring to structured clones invocations, the concern sparked by the comment and because (not common knowledge) the blob logic in question doesn't happen in IDB (pedantically speaking).  Your interpretation makes more sense; in which case, I vouch for your comment, it's totally safe.  The blob code makes defensive clones of streams because Blobs are long-lived and immutable; the "original" stream will have its refcount go to zero and be destroyed without ever being consumed.

Restating in pseudo-code:

let module = await moduleFromSomewhere();
idb.put({ foo: module, bar: module });  // module serialized only once despite two references.
idb.put({ boo: module, car: module });  // module serialized again, but only once for 2 refs.
Er, ignore the pseudo-code comments dwelling on the number of references.  That should just be down to the orthogonal issue of the structure clone only serializing a single object once per object graph that is being structured clone.
Assignee

Comment 61

2 years ago
Right, agreed with comment 58 and thanks for consideration/answers.
Comment on attachment 8901851 [details] [diff] [review]
wasm-module-stream (w/o mutex)

Review of attachment 8901851 [details] [diff] [review]:
-----------------------------------------------------------------

This is a nice looking stream implementation, and the comments that help clarify the reference-counting are excellent and appreciated.  Kudos to you both!
Attachment #8901851 - Flags: review?(bugmail) → review+
Comment on attachment 8901541 [details] [diff] [review]
Simplify IPCStreamSource::DoRead() and remove reallocation of the buffer

(Note that my feedback? to :baku doesn't need to block landing of this.  It's easy enough to promptly undo or revert the changes if :baku isn't crazy about them.)
Assignee

Comment 64

2 years ago
Thanks for reviews and help!  Jan: I can land the patches (in a day to give baku some time to give feedback before landing); could you tell me when you have posted the the final version with any comments addressed?
(In reply to Luke Wagner [:luke] from comment #53)
> It's worrisome that we could theoretically end up serializing N times for N
> clones, but I assume you've manually verified this doesn't happen in the IDB
> path.

I manually verified that Clone() is called only once.

> 
> @@ +409,5 @@
> > +
> > +    if (mModule->compilationComplete()) {
> > +      onCompilationComplete();
> > +    } else {
> > +      mModule->notifyWhenCompilationComplete(this);
> 
> nit: notifyWhenCompilationComplete() immediately calls
> onCompilationComplete() if compilationComplete() so this probably isn't a
> measurable optimization

Ok, feel free to change that.
(In reply to Luke Wagner [:luke] from comment #64)
> Thanks for reviews and help!  Jan: I can land the patches (in a day to give
> baku some time to give feedback before landing); could you tell me when you
> have posted the the final version with any comments addressed?

I think there's only one thing in my wasm-module-stream patch pointed out in comment 65, again feel free to address it.

There are also some nits for the IDB test patch, not a big deal.

Also feel free to provide better commit messages for my patches.

Thanks for the patches and reviews.
Attachment #8901541 - Flags: feedback?(amarchesini) → feedback+
We should probably file a new bug for comment 4.
Assignee

Updated

2 years ago
Attachment #8899912 - Attachment is obsolete: true

Comment 68

2 years ago
Pushed by lwagner@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3587eb210584
Avoid temporary RefPtr to silence rooting hazard (r=sfink)
https://hg.mozilla.org/integration/mozilla-inbound/rev/08d40ce6531e
Simplify IPCStreamSource::DoRead() and remove reallocation of the buffer (r=asuth)
https://hg.mozilla.org/integration/mozilla-inbound/rev/55547791ee99
Add async notification API for wasm compilation and async stream using it (r=janv,luke,asuth)
https://hg.mozilla.org/integration/mozilla-inbound/rev/ee4e7151a6b4
Implement async notification API so serialization is nonblocking (r=lth)
https://hg.mozilla.org/integration/mozilla-inbound/rev/fe9850e274d2
Add tiering+IndexedDB tests (r=janv)
https://hg.mozilla.org/integration/mozilla-inbound/rev/cddd45a0a4c8
Fix wasmExtractCode for tiering (r=lth)
Assignee

Updated

2 years ago
Whiteboard: [leave open]
Assignee

Updated

2 years ago
Status: NEW → RESOLVED
Closed: 2 years ago
Keywords: leave-open
Resolution: --- → FIXED
(In reply to Andrew Sutherland [:asuth] from comment #55)
> Comment on attachment 8901541 [details] [diff] [review]
> Simplify IPCStreamSource::DoRead() and remove reallocation of the buffer
> 
> :baku, are you cool with changing the wire protocol of the parent/child IPC
> streams from ncCString to wr::ByteBuffer[1]?  It avoids power-of-2
> over-allocating on receipt (and when building, if relevant), plus it's way
> easier to trace.  Its main limitation as a type is that although it supports
> slices, all slices need to be outlived by their parent.
> 
> 1:
> https://searchfox.org/mozilla-central/source/gfx/webrender_bindings/
> WebRenderTypes.h#548

Just noticing this now, but it seems inappropriate to use wr::ByteBuffer in what appears to be general IPC code without at least checking with somebody on the gfx team. After bug 1406507 we will no longer be using that struct in webrender code, and the only usage will be this IPC stuff. I would have liked to remove it entirely but if you still need it you should move it to someplace other than gfx/.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #70)
> Just noticing this now, but it seems inappropriate to use wr::ByteBuffer in
> what appears to be general IPC code without at least checking with somebody
> on the gfx team. After bug 1406507 we will no longer be using that struct in
> webrender code, and the only usage will be this IPC stuff. I would have
> liked to remove it entirely but if you still need it you should move it to
> someplace other than gfx/.

Sorry, I'll create a clone for IPC stream code, you can remove it then.
You might also consider using the ipc::ByteBuf type to avoid some copies if it's appropriate.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #72)
> You might also consider using the ipc::ByteBuf type to avoid some copies if
> it's appropriate.

Cool, I'll try to use that.
You need to log in before you can comment on or make changes to this bug.