Closed Bug 1312808 Opened 8 years ago Closed 8 years ago

rewrite stored WebAssembly.Module when build-id changes

Categories

(Core :: Storage: IndexedDB, defect, P3)


Tracking


RESOLVED FIXED
mozilla53
Tracking Status
firefox52 + fixed
firefox53 --- fixed

People

(Reporter: luke, Assigned: janv)

References

Details

Attachments

(4 files, 1 obsolete file)

Right now, if the build-id changes between when a WebAssembly.Module is stored in IDB and when it is retrieved, the compiled code is ignored and the module is recompiled from bytecode.  This is expected to happen every time the browser is upgraded, but we should replace the now-forever-invalid compiled code file with the new compiled module so that it only happens once per build-id change.
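For context, a minimal JS sketch of the usage pattern this affects (assuming the structured-clone support for WebAssembly.Module in IndexedDB that this caching relies on; `db`, the 'modules' store name and `wasmBytes` are illustrative, not taken from the patch):

  // Compile once and store the module; IDB persists both the bytecode and
  // the machine code, tagged with the build-id of the compiling browser.
  WebAssembly.compile(wasmBytes).then(module => {
    db.transaction('modules', 'readwrite')
      .objectStore('modules')
      .put(module, 'app');
  });

  // Later, possibly after a browser upgrade (i.e. a build-id change):
  const request = db.transaction('modules')
                    .objectStore('modules')
                    .get('app');
  request.onsuccess = () => {
    // On a build-id mismatch the module is recompiled from the stored
    // bytecode.  Without this fix that happens on every get(); with this
    // patch the freshly compiled code is written back to the stored file,
    // so the slow path is taken only once per build-id change.
    const module = request.result;  // a usable WebAssembly.Module again
  };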
Priority: -- → P3
I have this mostly implemented.
Assignee: nobody → jvarga
Status: NEW → ASSIGNED
\o/
Attached patch patch (obsolete) — Splinter Review
mostly done
Works on the wasm AngryBots demo!
Attached patch patchSplinter Review
Attachment #8811343 - Attachment is obsolete: true
Attachment #8811719 - Flags: review?(bugmail)
Comment on attachment 8811719 [details] [diff] [review]
patch

r? for mozilla/js changes
Attachment #8811719 - Flags: review?(luke)
Comment on attachment 8811719 [details] [diff] [review]
patch

Review of attachment 8811719 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: js/src/wasm/WasmModule.cpp
@@ +260,5 @@
>             elemCodeRangeIndices.sizeOfExcludingThis(mallocSizeOf);
>  }
>  
>  /* virtual */ void
>  Module::serializedSize(size_t* bytecodeSize, size_t* compiledSize) const

Can you rename bytecodeSize/compiledSize/bytecodeBegin to maybeBytecodeSize/maybeCompiledSize/maybeBytecodeBegin in these two definitions, the declarations in WasmModule.h and the interface in jsapi.h?

@@ +264,5 @@
>  Module::serializedSize(size_t* bytecodeSize, size_t* compiledSize) const
>  {
> +    if (bytecodeSize) {
> +        *bytecodeSize = SerializedPodVectorSize(bytecode_->bytes);
> +    }

nit: no { } around single-line then
Attachment #8811719 - Flags: review?(luke) → review+
I'm a little concerned about having the re-compilation of the WASM happen in the parent process.  Given that you already built the preprocess mechanism, what do you think about having the parent process send the newFile's fd to the child and letting the child write the updated compiled version to it?

My concerns specifically are:
- Security: This patch bypasses our content sandbox efforts by letting content-provided WASM recompilation happen in the parent.  It's a high bar for an attacker because they need to wait for or manufacture an update, but they're used to those.
- Large memory allocations (under 32-bit) in the parent: It's my understanding the WASM blobs can be large.  While we're talking about one individual WASM file at a time in this case instead of all that might be involved for a single site plus their dynamic allocations, it's far better to confine memory-related crashes to the content processes.
- The content process already has to deal with the WASM blobs in the preprocess step anyway.


The potential issues with the preprocess approach are:
- Too many open FileDescriptors at once.  Assuming platform file descriptor limits aren't a thing of the past, it's probably asking for trouble to send pre-opened descriptors.  Instead, an actor needs to be involved to issue them as they're processed.
- (Future hypothetical issue) Competing read updates.  We touched on this in bug 1313185 discussion.  Although it would be bad/dumb for IDB callers to ask for multiple copies of the same WASM request, unless subsequent get() requests are blocked, there's the potential to end up with multiple redundant "upgraded" preprocess requests dispatched to the child.  The current patch's approach of doing the update on the I/O thread avoids this.  (Of course, my monitor-based "minimal" fix proposal for bug 1313185 also addresses the issue by blocking the I/O thread.)
Flags: needinfo?(jvarga)
(In reply to Andrew Sutherland [:asuth] from comment #9)
> - Security: This patch bypasses our content sandbox efforts by letting
> content-provided WASM recompilation happen in the parent.  It's a high bar
> for an attacker because they need to wait for or manufacture an update, but
> they're used to those.

To be clear, the code is not executed, so the exploit needs to be in the compilation process itself.  Are there no other cases where content JS exercises the JS engine in the parent process?  I'd be somewhat surprised.  I agree that, in the limit, it would be good to remove all such cases, including this.

> - Large memory allocations (under 32-bit) in the parent: It's my
> understanding the WASM blobs can be large.  While we're talking about one
> individual WASM file at a time in this case instead of all that might be
> involved for a single site plus their dynamic allocations, it's far better
> to confine memory-related crashes to the content processes.

No, it's just the wasm *memories* that get really large.  The wasm modules are ~40 MB for huge apps.
I investigated recompilation on the child side and it seems much more complicated to me. For example, multiple threads, multiple child processes, or even a single process (with multiple transactions created to get the same module) could end up recompiling the same thing at once and then sending it to the parent multiple times. We could add code to coordinate this, but it seems too complicated.

There's a dedicated thread for each physical IDB database on the parent side, so while that thread is doing something, for example checking the assumptions in the stored compiled wasm, I believe nothing else can operate on that database at the same time. So if the assumptions don't match, we can just recompile immediately and then serialize and rewrite the file. I don't even need to create a JS global object on the database connection thread.

There's one downside: we lose some parallelism for the recompilation, but Luke says this threading model is enough for the next few years.

When we make allocations in the parent that can possibly be large, we always use fallible methods, so we don't crash the parent process. It's also hard to say which is better: allocating just one possibly big buffer in the parent, or allocating multiple buffers on the way from the child through IPC to the parent.

Sending a fd to the child for writing is something we haven't done yet in any quota manager client since quota checks are implemented only for the parent process.

I must also say that I currently don't have much time to implement a more complex solution; this bug is P3.

Luke and I concluded that this solution would be OK for the next few years.
Flags: needinfo?(jvarga)
(In reply to Andrew Sutherland [:asuth] from comment #9)
> - (Future hypothetical issue) Competing read updates.  We touched on this in
> bug 1313185 discussion.  Although it would be bad/dumb for IDB callers to
> ask for multiple copies of the same WASM request, unless subsequent get()
> requests are blocked, there's the potential to end up with multiple
> redundant "upgraded" preprocess requests dispatched to the child.  The
> current patch's approach of doing the update on the I/O thread avoids this. 
> (Of course, my monitor-based "minimal" fix proposal for bug 1313185 also
> addresses the issue by blocking the I/O thread.)

I'm not sure that bug addresses the possibility of requesting the same module multiple times. That bug is just about ordering and blocking subsequent requests for the same transaction, no?
If I create separate transactions, the parent would send the first one to the child for preprocess (recompilation), but since recompilation is quite a slow process, the parent would process the next transaction and do the same thing again, no? I'm talking about read-only transactions, of course.
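To make that concrete, a hedged sketch of the read-only race being described (store and key names are illustrative):

  // Two separate read-only transactions asking for the same stored module.
  const first  = db.transaction('modules').objectStore('modules').get('app');
  const second = db.transaction('modules').objectStore('modules').get('app');
  // After a build-id change, both requests see stale compiled code; if the
  // recompilation ran in the child, each request could independently trigger
  // its own slow recompile and try to push the result back to the parent,
  // unless extra coordination is added.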
(In reply to Jan Varga [:janv] from comment #12)
> (In reply to Andrew Sutherland [:asuth] from comment #9)
> > - (Future hypothetical issue) Competing read updates.  We touched on this in
> > bug 1313185 discussion.  Although it would be bad/dumb for IDB callers to
> > ask for multiple copies of the same WASM request, unless subsequent get()
> > requests are blocked, there's the potential to end up with multiple
> > redundant "upgraded" preprocess requests dispatched to the child.  The
> > current patch's approach of doing the update on the I/O thread avoids this. 
> > (Of course, my monitor-based "minimal" fix proposal for bug 1313185 also
> > addresses the issue by blocking the I/O thread.)
> 
> I'm not sure if that bug fixes the possibility to request the same module
> multiple times. That bug is just about ordering and blocking subsequent
> requests for the same transaction, no ?
> If I create separate transactions then parent would send the first one for
> preprocess (recompilation) to the child, but since recompilation is quite
> slow process, the parent would process next transaction and do the same
> thing again, no ? I'm talking about read-only transactions of course.

Oh, maybe I misunderstood you; the monitor-based "minimal" fix would prevent that, but I'm not sure I want to lose the parallel execution.
So sending a file descriptor to the child for writing requires quite a lot of hacking: the things I already mentioned, plus new locking for the stored compiled wasm files. If we want to keep the parallel execution, multiple children can end up rewriting the same file.
I was also thinking about adding a manager that would only allow one child to do the recompilation, but then all the other children would depend on the child doing the recompilation, which could in theory have a lower priority, or could crash, etc.

Sorry for not explaining this sooner.

If we later find that this solution limits security or performance, we can try to do coordinated recompilation on the child side, or even create a dedicated process for it (if wasm gets used much more in the future, or if we add more similar things to IDB or storage in general).
Thank you both for addressing my concerns.  I expect to r+ this patch later today.  (Dentist time now, though. :)
Comment on attachment 8811719 [details] [diff] [review]
patch

Review of attachment 8811719 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/indexedDB/ActorsParent.cpp
@@ +9587,5 @@
>  /*******************************************************************************
> + * Helper classes
> + ******************************************************************************/
> +
> +class MOZ_STACK_CLASS FileHelper final

This is a most excellent refactoring!

@@ +9745,5 @@
> +  if (match) {
> +    return NS_OK;
> +  }
> +
> +  nsCOMPtr<nsIFile> bytecodeFile =

I think it'd be great to add a comment here along the lines of:

Re-compile the module.  It would be preferable to do this in the child (content process) instead of here in the parent, but that would be way more complex and without significant memory allocation or security benefits.  See the discussion starting from https://bugzilla.mozilla.org/show_bug.cgi?id=1312808#c9 for more details.

@@ +29230,5 @@
> +  MOZ_ASSERT(aCopied);
> +  MOZ_ASSERT(mFileManager);
> +  MOZ_ASSERT(mFileDirectory);
> +  MOZ_ASSERT(mJournalDirectory);
> +

I'd suggest doing `*aCopied = false;` up here at the top.  Your 2 callers just do "bool copied;" without any initializing value and there are a whole bunch of early returns from this method that could leave the value uninitialized.

@@ +29264,5 @@
> +    if (NS_WARN_IF(fileSize < 0)) {
> +      return NS_ERROR_FAILURE;
> +    }
> +
> +    if (NS_WARN_IF(uint64_t(fileSize) != inputStreamSize)) {

I know you're just moving code that dates back to bug 994190 (use PBackground), but the semantics of this `if (exists)` branch are pretty crazy to have, especially now that this is its own helper method.  Given the function's signature and lack of comments, I'd assume the idea is "write aInputStream to aFile, creating the aJournalFile so that the cleanup logic will know to clean the file up if we crash before we finish what we're doing."

But this branch adds "...And if the file seems to already exist, we won't actually write the stream but it's NS_OK as long as the file has the same length as the stream.  If the file had a different length, return an error and the inputstream will end up not being written to disk and whatever weird thing was already there will still be there."  Which is all kinds of suspect.  The only scenario where this seems like it would begin to make sense would be if we're retrying an idempotent operation where we know that only the exact inputstream would map exactly to this file name and the prior try crashed.  But if it crashed, I'm not sure it would make sense to trust that all the bytes reliably hit disk, so you'd need to perform a bytewise-check, etc. and would be better off just doing a clobbering write to reduce the complexity of the edge-case.

Which is to say, I think the `if (exists)` case should either error out hard and MOZ_ASSERT and/or clobber the existing file (and maybe still MOZ_ASSERT).  If there's a rationale behind this behavior, it either needs a serious comment and/or maybe the caller should be refactored to make the existing `if (inputStream)` case test false so CreateFileStream never gets called.

::: dom/indexedDB/test/unit/test_wasm_recompile.js
@@ +48,5 @@
> +              .objectStore(objectStoreName).get(wasmData.key);
> +  request.onsuccess = continueToNextStepSync;
> +  yield undefined;
> +
> +  verifyWasmModule(request.result, wasmData.wasm);

I'm a little unclear on what this test and its zip file are intending to prove.  This verifyWasmModule call should succeed even without the non-test parts of the patch.  (The fact that the SerializedStructuredCloneFile was null_t for the compiled piece going from parent to child, causing recompilation on the child, was never surfaced.)

If the goal is to prove that the file is recompiled and the on-disk copy updated, I would suggest doing (sketched after this list):
- install the profile
- read out the contents of the compiled blob on disk
- get() the WASM and verifyWasmModule
- read out the contents of the should-be-new compiled blob on disk
- verify the blobs differ
- get() the WASM again
- read the blob on disk again
- make sure the blob didn't change this time, to ensure we're not just rewriting the compiled copy on every get().
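
A hedged sketch of that flow, reusing the generator style of the existing test (`readCompiledBlobOnDisk` and `equalBuffers` are hypothetical helpers for reading and comparing the on-disk compiled file; the other names mirror what is already in the test):

  // (packaged profile already installed by the test setup)
  let before = readCompiledBlobOnDisk();          // hypothetical helper

  let request = db.transaction([objectStoreName])
                  .objectStore(objectStoreName).get(wasmData.key);
  request.onsuccess = continueToNextStepSync;
  yield undefined;
  verifyWasmModule(request.result, wasmData.wasm);

  let after = readCompiledBlobOnDisk();
  ok(!equalBuffers(before, after), "compiled copy was rewritten");

  request = db.transaction([objectStoreName])
              .objectStore(objectStoreName).get(wasmData.key);
  request.onsuccess = continueToNextStepSync;
  yield undefined;

  let again = readCompiledBlobOnDisk();
  ok(equalBuffers(after, again), "compiled copy not rewritten on second get()");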

It would be great to clarify the intent with a comment at the top of this file, plus explain what's going on with wasm_recompile_profile.zip a little more.  Specifically, I see the create_db.js in there (thanks very much for including it!), and I assume you just ran it locally; the assumption is that since the buildId is "20161116145318" (and the cpuId is X64=0x2 :), no future run will ever have that buildId again, so every test run will always trigger a rebuild.
Attachment #8811719 - Flags: review?(bugmail) → review+
(In reply to Luke Wagner [:luke] from comment #10)
> Are there no other cases where content JS
> exercises the JS engine in the parent process?  I'd be somewhat surprised. 

I'm still too new to the platform to know where all the bodies are buried, but I think the answer is no if we're talking about eval("evilcode") in the parent, although maybe worklets are changing that (https://drafts.css-houdini.org/worklets/)?  A lot of this stems from the fact that getting a principal is already such a hassle, let alone creating a compartment and dealing with separate globals/etc., that it's just too much work to do anything dangerous in the name of saving time...

Threat-model-wise, I'm rationalizing the parent-compile as okay because it's more like mozilla::pkix for parsing TLS certs and all their ASN.1 horrors.  It's an attacker-controlled byte-stream that's parsed in the parent process but the parsing code is already intended to be suspicious of the byte-stream, etc.

That said, it would be nice to move this to the child, but from both :janv's expert analysis and my own back-of-the-napkin understandings, I agree the cost/risk puts this wayyyyy down the list.  On the attack-defense tree there are way cheaper and more important things to throw effort at.
(In reply to Luke Wagner [:luke] from comment #7)
> >  /* virtual */ void
> >  Module::serializedSize(size_t* bytecodeSize, size_t* compiledSize) const
> 
> Can you rename bytecodeSize/compiledSize/bytecodeBegin to
> maybeBytecodeSize/maybeCompiledSize/maybeBytecodeBegin in these two
> definitions, the declarations in WasmModule.h and the interface in jsapi.h?

Do you also want to rename compiledSize to maybeCompiledSize ?
(In reply to Andrew Sutherland [:asuth] from comment #17)
> I'm still too new to platform to know where all the bodies are buried, but I
> think the answer is no if we're talking about eval("evilcode") in the
> parent,

Indeed, I hope we're not doing *that* :)  I was just hypothesizing various use cases like you mentioned.   But agreed that, in the future, this recompilation would be better suited to run in the child.

(In reply to Jan Varga [:janv] from comment #18)
> Do you also want to rename compiledSize to maybeCompiledSize ?

Yes, thanks.
Attachment #8813090 - Flags: review?(luke)
Comment on attachment 8813090 [details] [diff] [review]
mozilla/js interdiff

Review of attachment 8813090 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/wasm/WasmModule.cpp
@@ +143,5 @@
>  {
> +    if (maybeBytecodeSize)
> +        *maybeBytecodeSize = SerializedPodVectorSize(bytecode_->bytes);
> +
> +    if (maybeCompiledSize) {

Oh, this one is single-line too; I'll remove the { }
Comment on attachment 8813090 [details] [diff] [review]
mozilla/js interdiff

Review of attachment 8813090 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: js/src/wasm/WasmModule.cpp
@@ +143,5 @@
>  {
> +    if (maybeBytecodeSize)
> +        *maybeBytecodeSize = SerializedPodVectorSize(bytecode_->bytes);
> +
> +    if (maybeCompiledSize) {

Hah, well, technically we measure lines by \n not ; so this does indeed deserve the { } :)

@@ +171,3 @@
>                                                    bytecode_->bytes);
> +        MOZ_RELEASE_ASSERT(bytecodeEnd ==
> +                           maybeBytecodeBegin + maybeBytecodeSize);

We have a 100-char column limit, so these should both fit on one line.

::: js/src/wasm/WasmModule.h
@@ +154,5 @@
> +                                size_t* maybeCompiledSize) const override;
> +    virtual void serialize(uint8_t* maybeBytecodeBegin,
> +                           size_t maybeBytecodeSize,
> +                           uint8_t* maybeCompiledBegin,
> +                           size_t maybeCompiledSize) const override;

Similarly, you should be able to keep maybeXBegin/maybeXSize on the same line (same with jsapi.h).  Also, while I realize this is personal preference, I felt like the 'override' provided sufficient indication that these functions were virtual without adding the explicit 'virtual'.
Attachment #8813090 - Flags: review?(luke) → review+
(In reply to Andrew Sutherland [:asuth] from comment #16)
> @@ +29264,5 @@
> > +    if (NS_WARN_IF(fileSize < 0)) {
> > +      return NS_ERROR_FAILURE;
> > +    }
> > +
> > +    if (NS_WARN_IF(uint64_t(fileSize) != inputStreamSize)) {
> 
> I know you're just moving code that dates back to bug 994190 (use
> PBackground), but the semantics on this `if (exists)` branch are pretty
> crazy to have especially now that this is its own helper method.  Given the
> function's signature and lack of comments, I'd assume the idea is "write
> aInputStream to aFile, creating the aJournalFile so that the cleanup logic
> will know to delete the file up if we crash before we finish what we're
> doing."
> 
> But this branch adds "...And if the file seems to already exist, we won't
> actually write the stream but it's NS_OK as long as the file has the same
> length as the stream.  If the file had a different length, return an error
> and the inputstream will end up not being written to disk and whatever weird
> thing was already there will still be there."  Which is all kinds of
> suspect.  The only scenario where this seems like it would begin to make
> sense would be if we're retrying an idempotent operation where we know that
> only the exact inputstream would map exactly to this file name and the prior
> try crashed.  But if it crashed, I'm not sure it would make sense to trust
> that all the bytes reliably hit disk, so you'd need to perform a
> bytewise-check, etc. and would be better off just doing a clobbering write
> to reduce the complexity of the edge-case.
> 
> Which is to say, I think the `if (exists)` case should either error out hard
> and MOZ_ASSERT and/or clobber the existing file (and maybe still
> MOZ_ASSERT).  If there's a rationale behind this behavior, it either needs a
> serious comment and/or maybe the caller should be refactored to make the
> existing `if (inputStream)` case test false so CreateFileStream never gets
> called.

I think I figured out which corner case this branch is trying to handle. I'm trying to create a test for it ...
I think this should address your comments.
Attachment #8816262 - Flags: review?(bugmail)
Comment on attachment 8816262 [details] [diff] [review]
mozilla/dom/indexedDB interdiff

Review of attachment 8816262 [details] [diff] [review]:
-----------------------------------------------------------------

Love the wasm recompilation test changes!

::: dom/indexedDB/ActorsParent.cpp
@@ +29236,5 @@
> +  // IDBDatabase::GetOrCreateFileActorForBlob. So if the same DOM blob is stored
> +  // again under a different key or in a different object store, we just add
> +  // a new reference instead of creating a new copy (all such stored blobs share
> +  // the same id).
> +  // However, it can happen that CreateFileFromStream failed due to quota

This is a fantastic comment!  Thank you!

Restating my understanding (of the "fresh" blob case which isn't one retrieved from the DB):
- The first time IDBDatabase::GetOrCreateFileActorForBlob sees a blob, it creates a DatabaseFile actor and caches it via a weak reference.
- In the parent, Database::AllocPBackgroundIDBDatabaseFileParent calls FileManager::GetNewFileInfo which allocates the file id at creation time.  This is the id that will be the file's name.  The fileInfo and blobImpl are wrapped into a DatabaseFile.
- In the initial failed write, DatabaseFile::GetInputStream returns an input stream because !!mBlobImpl and so we will try and write the file.  Because of the failure, DatabaseFile::ClearInputStream() will not be called, so its mBlobImpl will not be cleared.
- In the subsequent write, DatabaseFile::GetInputStream still returns an input stream because it wasn't cleared and we try and write again.  It will succeed this time and mBlobImpl will be nulled out.
- In any subsequent write attempts, DatabaseFile::GetInputStream will return null because !mBlobImpl, cleared when the write actually succeeded.  As a result, we won't try and write the file again.

::: dom/indexedDB/test/unit/test_file_copy_failure.js
@@ +4,5 @@
> + */
> +
> +var testGenerator = testSteps();
> +
> +function testSteps()

A nit, only fix if you think it's sufficiently helpful: paranoia-wise, it might be appropriate to (see the sketch after this list):
- assert that neither `journalFile` nor `file` exists on disk prior to creating them.
- assert that `journalFile` does not exist after the test

Together these:
- help verify the write actually clobbered the right file
- make the test break if, in the future, the upgrade or test infra otherwise changes how it allocates files/etc.
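
A hedged sketch of those checks, assuming `file` and `journalFile` are the nsIFile instances the test already constructs:

  // Before the test touches anything: prove we really create these files.
  ok(!file.exists(), "target file does not exist before the test");
  ok(!journalFile.exists(), "journal file does not exist before the test");

  // ... run the failed-copy-then-successful-retry scenario ...

  // Afterwards: the real file was written and the journal entry cleaned up,
  // so a crash-recovery pass would not mistake the write for unfinished.
  ok(file.exists(), "target file exists after the test");
  ok(!journalFile.exists(), "journal file was removed after the test");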
Attachment #8816262 - Flags: review?(bugmail) → review+
(In reply to Andrew Sutherland [:asuth] from comment #26)
> Restating my understanding (of the "fresh" blob case which isn't one
> retrieved from the DB):
> - The first time IDBDatabase::GetOrCreateFileActorForblob sees a blob it
> creates a DatabaseFile actor and caches it via weak reference.
> - In the parent, Database::AllocPBackgroundIDBDatabaseFileParent calls
> FileManager::GetNewFileInfo which allocates the file id at creation time. 
> This is the id that will be the file's name.  The fileInfo and blobImpl are
> wrapped into a DatabaseFile.
> - In the initial failed write, DatabaseFile::GetInputStream returns an input
> stream because !!mBlobImpl and so we will try and write the file.  Because
> of the failure, DatabaseFile::ClearInputStream() will not be called, so its
> mBlobImpl will not be cleared.
> - In the subsequent write, DatabaseFile::GetInputStream still returns an
> input stream because it wasn't cleared and we try and write again.  It will
> succeed this time and mBlobImpl will be nulled out.
> - In any subsequent write attempts, DatabaseFile::GetInputStream will return
> null because !mBlobImpl, cleared when the write actually succeeded.  As a
> result, we won't try and write the file again.

Yes, that's how it works.

> - assert that neither `journalFile` or `file` exist on disk prior to you
> creating them.
> - assert that `journalFile` does not exist after the test

Ok, added these checks to the test.
Pushed by jvarga@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6106512db4ba
rewrite stored WebAssembly.Module when build-id changes; r=asuth,luke
https://hg.mozilla.org/mozilla-central/rev/6106512db4ba
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
[Tracking Requested - why for this release]:

Thanks!  Could we uplift this to 52 so it could work on release?
(In reply to Luke Wagner [:luke] from comment #31)
> Thanks!  Could we uplift this to 52 so it could work on release?

I guess so.  Jan, care to fill in the uplift request template?
Flags: needinfo?(jvarga)
The patch doesn't apply cleanly to aurora; let me fix it.
Attached patch patch for auroraSplinter Review
Approval Request Comment
[Feature/Bug causing the regression]: enhancement, not a regression fix
[User impact if declined]: severely degraded performance of wasm module loading from IndexedDB 
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: affects still-default-off wasm feature
[String changes made/needed]: none
Flags: needinfo?(jvarga)
Attachment #8823797 - Flags: approval-mozilla-aurora?
Comment on attachment 8823797 [details] [diff] [review]
patch for aurora

wasm change for aurora52
Attachment #8823797 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+