Closed Bug 982779 Opened 10 years ago Closed 10 years ago

memory-backed blobs don't have a long enough lifetime for use in activities again

Categories

(Core :: DOM: Core & HTML, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
mozilla32
blocking-b2g 1.4+
Tracking Status
firefox30 --- wontfix
firefox31 --- wontfix
firefox32 --- fixed
b2g-v1.3 --- wontfix
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: asuth, Assigned: bent.mozilla)

References

Details

Attachments

(2 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #806503 +++

Memory-backed Blobs passed via web activities appear to not work again.  The time-line goes something like this as I understand it:

- Bug 806503 addressed the problem by sending the entire in-memory Blob over the wire to the other process.  Landed 2013-01-11

- Bug 832419 "Clean up IPC blobs" appears to have stopped sending the raw memory over the wire in favor of a unified remote blob mechanism.  There is minimal discussion in the bug or comments in the patch, so it's very possible I'm misunderstanding things.  Landed 2013-02-05

You will note that this all happened a long time ago, so it's surprising I am filing this regression bug now.  There has (probably) recently been a regression in v1.4 where the e-mail app fails to attach an image provided by the camera app at runtime via a "pick" webactivity.  The e-mail bug is bug 981010.  The bug about camera being dumb and returning a memory-backed Blob that it has already saved to DeviceStorage instead of the File resulting from the save is bug 949941.

The explanation for how this works in v1.3 at all seems to be per the logs I collected in bug 981010 comment 4 that there is some type of race that the e-mail app wins in v1.3 but loses in v1.4.  Specifically, in v1.3 ContentParent::NotifyTabDestroying is as far as the nefarious reaper gets before the e-mail app manages to successfully process the remote-memory-backed Blob in its entirety.  In v1.4 NotifyTabDestroyed/ShutDownProcess/ActorDestroy also get run, so the e-mail app finds itself sad.  My very uninformed guess is that previously some nested call-stack that is not getting returned from or something ridiculous saved the e-mail app.  It is conceivable that there are cases in v1.3 where the e-mail app loses, say if the camera takes a sufficiently large enough image that our sliced-chunk fetching fails us.


In terms of other things I looked into:

- Bug 968542 "IPC blobs can entrain a ContentParent" landed 2014-02-09, zeroing out a reference that potentially kept a manager alive.  If I back this out, it stops the e-mail app process from crashing on v1.4, but the process is already dead.


It seems like for the WebActivity case, Blobs either want to go back to just being propagated in their entirety when initially passed or that when a process is about to be killed it should send out little escape pods with the contents of the memory to the processes with remote references.  Of course, given the OOM killer, the latter potentially does not work despite the obvious chance for superman references.  If we're worried about memory bloat issues, perhaps the Blob could be transferred to sysV shared-memory or a deleted mmap'ed file or some other trick so that the memory is really only in the system once but the death of the process does not negatively impact the all important self-styled-highlander e-mail app.
See Also: → 981010, 949941
Does this break 1.4 or 1.3?
We should test this on a Tarako device too. Given the memory constraints we might be killing processes with different timing.
(In reply to Andreas Gal :gal from comment #1)
> Does this break 1.4 or 1.3?

E-mail works on v1.3 but breaks on v1.4 because of an ordering change.  The e-mail bug is bug 981010.  It will be resolved by camera bug 949941 which has camera return a DeviceStorage-owned File instead of a memory-backed Blob.  I have provided a patch on the camera bug.  I will have :mcav of the productivity team run a 1.3T run to see what happens there, but the solution in that case would likely just be to uplift the fix for bug 949941 too.

Skimming over https://developer.mozilla.org/en-US/docs/WebAPI/Web_Activities I don't see any other (documented) activities that would be likely to have a problem.
(In reply to Andrew Sutherland (:asuth) from comment #0)
> - Bug 806503 addressed the problem by sending the entire in-memory Blob over
> the wire to the other process.

Yep.

> - Bug 832419 "Clean up IPC blobs" appears to have stopped sending the raw
> memory over the wire in favor of a unified remote blob mechanism.

No. Same thing happens now, just uses a simpler protocol. We still eagerly send all memory from child to parent.
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #4)
> No. Same thing happens now, just uses a simpler protocol. We still eagerly
> send all memory from child to parent.

Huh.  I guess I was expecting some references to nsDOMMemoryFile to exist/come back.  I did see some stream stuff; does the memory block just get streamed into some type of stream-buffer on the client?

Any chance you could take a look at the logs in https://bugzilla.mozilla.org/show_bug.cgi?id=981010#c4 and see if anything about those error messages jumps out at you?  And failing that maybe use a current v1.4 b2g build (preferably buri/hamachi) and follow the STR from https://bugzilla.mozilla.org/show_bug.cgi?id=981010#c0 up through step 5 using some of your debug-fu?

The exact scenario that is failing is:
- Activity receives Blob on main page thread
- Blob gets sent to worker thread
- Worker slices the blob
- Worker creates a blob URL using URL.createObjectURL(blob) on the sliced blob
- Worker issues an async XHR against the blob URL
- Worker immediately revokes the blob URL after calling xhr.send()


:djf and I did independently verify that the wallpaper app's use of "pick" against the camera works fine.  I assume its behaviour is generally simpler and it's more prompt about using the Blob.  versus the e-mail app which processes it somewhat asynchronously.
The fact that you're slicing the blob sets off alarms for me.  I'm pretty sure the open gallery->MMS bug has something to do with the fact that there is slicing going on.  But maybe I'm not remembering correctly.
We're seeing more memory-backed Blob death in bug 989570.  While the problem will go away with fixing mozSettings to properly refresh its Blobs/Files from IndexedDB (bug 931224), it seems reasonably clear that there's some life-cycle problem still (at least in v1.4).

Note that the mozSettings case is fairly nuts since the life-cycle of the blob is actually:
- Camera mints a Blob A
- Blob A gets returned via Activity to wallpaper app, where it's now Blob B
- Blob B gets set into mozSettings, where it travels to the parent process and is Blob C
- Blob C gets sent as a mozSettings observer notification to the system/lock-screen process where it is now Blob D.

Blob D manages to paint once, but if you turn the screen off and back on again, you get a black screen, because the Blob is dead (probably), why not.


Also note that there also was a transient crash related to this where mActor was apparently getting nulled in dom/ipc/Blob.cpp#1034.  See bug 985167 comment 9.  :bent was needinfo'ed on that but the bugged was WORKFORME'd and the needinfo canceled before he got a chance to weigh in.
See Also: → 989570
Hm, child blobs should be eagerly copied to the parent and the data they manage should thereafter remain alive in the parent. This sort of thing shouldn't happen.
blocking-b2g: --- → 2.0?
Ugh, as with most blob bugs this one is basically a one line fix. Canvas is creating a File object and then not setting a modified date on it.
blocking-b2g: 2.0? → 2.0+
Assignee: nobody → bent.mozilla
Attached patch changes.patch (obsolete) — Splinter Review
Ok, the basic problem is that child processes were sending mystery blobs to the parent and that doesn't eagerly copy the underlying blob data. The child thinks that any time it sees a File with no modified date that it must be a mystery blob. This is wrong; the child should only ever receive mystery blobs and should never send them. Canvas creates nsDOMMemoryFile instances with no modifed date set, so that was culprit #1. Culprit #2 is the multipart blob code, it never sets its modified date until someone asks for it.

This fixes canvas by forcing it to provide a modified date to nsDOMMemoryFile. I also made the constructor argument required so hopefully nothing else has this problem in the future.

I also fixed multipart blob code to compute the modified date when it is created.

Then I added a bunch of assertions to hopefully make this problem easier to see in the future.
Status: NEW → ASSIGNED
Attachment #8413261 - Flags: review?(amarchesini)
Comment on attachment 8413261 [details] [diff] [review]
changes.patch

Review of attachment 8413261 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/src/nsDOMBlobBuilder.cpp
@@ +296,5 @@
> +
> +  for (uint32_t index = 0, count = mBlobs.Length(); index < count; index++) {
> +    // XXX This is only safe so long as all blob implementations in our tree
> +    //     inherit nsDOMFileBase.
> +    const auto* blob = static_cast<nsDOMFileBase*>(mBlobs[index].get());

can you add a MOZ_ASSERT(blob) here?
Attachment #8413261 - Flags: review?(amarchesini) → review+
Attached patch Patch, v2Splinter Review
I had to make a few additional changes to get all the tests passing. In particular I had to remove the activities optimization (which worked, but was flawed, and the flaw was hidden behind a mystery blob...) and I had to fix the chrome JS File constructor to learn the size eagerly.
Attachment #8413261 - Attachment is obsolete: true
Attachment #8414844 - Flags: review?(amarchesini)
Comment on attachment 8414844 [details] [diff] [review]
Patch, v2

Review of attachment 8414844 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentChild.cpp
@@ +908,5 @@
> +  MOZ_ASSERT(fds.IsEmpty());
> +
> +  params.optionalInputStreamParams() = inputStreamParams;
> +
> +  if (nsCOMPtr<nsIDOMFile> file = do_QueryInterface(aBlob)) {

Same here. Read below.

::: dom/ipc/ContentParent.cpp
@@ +2383,5 @@
>      uint64_t length;
>      rv = aBlob->GetSize(&length);
>      NS_ENSURE_SUCCESS(rv, nullptr);
>  
> +    if (nsCOMPtr<nsIDOMFile> file = do_QueryInterface(aBlob)) {

NIT: This change is not needed... right? I think it's more readable as it was.
Attachment #8414844 - Flags: review?(amarchesini) → review+
(In reply to Andrea Marchesini (:baku) from comment #13)
> NIT: This change is not needed... right? I think it's more readable as it
> was.

It tightens the scope of the variable, but you're right, in this case it wasn't strictly needed.

Thanks!
blocking-b2g: 2.0+ → 1.4+
Oh, change to test_filepicker_path.html was r=fabrice.
https://hg.mozilla.org/mozilla-central/rev/6e21af391b24
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
David, now that this has landed, maybe there are some workarounds that you can remove?
Flags: needinfo?(dflanagan)
Created bug 1006919 to remove convertToFileBackedBlob workaround from gallery.js
This needs rebasing for v1.4 uplift.
Flags: needinfo?(bent.mozilla)
Rebasing over bug 999742 is going to be pretty difficult. Since that was a purely mechanical change with no side-effects I'd like to uplift that too.
Depends on: 999742
Flags: needinfo?(bent.mozilla)
:bent, is this something that is safe to take on v1.3?  On bug 1002308 there's a Sora bug where we seem to encounter app-crashing because of this bug and I want to know which mitigation strategy is best.  On buri v1.3 and Tarako v1.3t bugs related to this weren't reproducing, but I guess Sora is not quite as lucky.  (Also, the partner seems to be cherry-picking some things but not other things which means they don't have all workarounds.  Having them cherry-pick a different workaround is an option, just not the ideal option.)

I just manually tried cherry-picking the patch onto b2g28_v1_3 and there was only one hunk that was unhappy, so it naively seems this might work?

The slightly cut-down failed hunk is:

diff --cc dom/ipc/Blob.cpp
index 647faa6,15e7580..0000000
--- a/dom/ipc/Blob.cpp
+++ b/dom/ipc/Blob.cpp
@@@ -1161,21 -1773,29 +1161,31 @@@ Blob<ActorFlavor>::Blob(ContentManager
    switch (blobParams.type()) {
+     case ChildBlobConstructorParams::TMysteryBlobConstructorParams:
+       return nullptr;
+ 
      case ChildBlobConstructorParams::TNormalBlobConstructorParams:
      case ChildBlobConstructorParams::TFileBlobConstructorParams:
++<<<<<<< HEAD
 +    case ChildBlobConstructorParams::TMysteryBlobConstructorParams:
 +      return new Blob<ActorFlavor>(aManager, aParams);
++=======
+       return new BlobParent(aManager, aParams);
++>>>>>>> f5251c1... Bug 982779 - Ensure that child->parent IPC blobs are never mysterious. r=baku, a=1.4+
  
      case ChildBlobConstructorParams::TSlicedBlobConstructorParams: {
        const SlicedBlobConstructorParams& params =
Flags: needinfo?(bent.mozilla)
(In reply to Andrew Sutherland (:asuth) from comment #26)
> :bent, is this something that is safe to take on v1.3?  On bug 1002308 there's a Sora bug...

Just asked :bent about this in person.  He thinks it's too risky and v1.4 was already a stretch.  So I'm marking this wontfix on b2g v1.3 for flags too.
Flags: needinfo?(bent.mozilla)
Blocks: 1063658
Note: :julienw just did some reproduction testing on 1063658 and it unfortunately looks like this patch never actually fixed the problem, though I'm sure it made things a lot closer to working.  (We didn't notice because we already had mitigations in place.)  So if you're checking out this bug, please check out bug 1063658.
Flags: needinfo?(dflanagan)
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: