Closed Bug 1288997 Opened 8 years ago Closed 8 years ago

MOZ_CRASH(IPC message size is too large) with PContent::Msg_PBlobConstructor

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla52
Tracking Status
firefox48 --- unaffected
firefox49 --- unaffected
relnote-firefox --- 50+
firefox50 + verified
firefox51 + verified
firefox52 --- verified

People

(Reporter: n.nethercote, Assigned: baku)

References

Details

(5 keywords)

Crash Data

Attachments

(7 files, 4 obsolete files)

34.84 KB, patch
jld
: review+
bkelly
: feedback+
Details | Diff | Splinter Review
289 bytes, text/html
Details
16.86 KB, patch
Details | Diff | Splinter Review
13.85 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
1.02 KB, patch
jdm
: review+
Details | Diff | Splinter Review
4.74 KB, patch
jld
: review+
Details | Diff | Splinter Review
1.14 KB, patch
marco
: review+
Details | Diff | Splinter Review
Here's a search of all the crashes in 50.0a1 where we crashed because an IPC message was too big:

https://crash-stats.mozilla.com/search/?product=Firefox&version=50.0a1&moz_crash_reason=%3DMOZ_CRASH%28IPC%20message%20size%20is%20too%20large%29&_sort=-date&_facets=moz_crash_reason&_facets=ipc_message_name&_facets=ipc_message_size&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=moz_crash_reason#facet-ipc_message_name

I see 74 of them in the past 7 days -- it's the second-most common kind of MOZ_CRASH crash -- and 72 of them involve a "PContent::Msg_PBlobConstructor" message.

The search result is faceted on several fields including IPC message size, which shows that the sizes are all 100s or 1000s of MBs, i.e. ridiculously large.

jld, is this something you know about?
Flags: needinfo?(jld)
Andrea's a better person to ask about blobs.
Flags: needinfo?(jld) → needinfo?(amarchesini)
What is happening here is that, recently we landed a code for broadcasting BlobURL to any process.
This shows a related problem: for memory blobs we send all the content into a IPC message. This means that a simple |URL.createObjectURL(new Blob([something very big]))| can crash FF.

The fix should be to use nsIInputStream and RemoteBlobs for memoryBlob too.
Group: dom-core-security
Flags: needinfo?(amarchesini)
Is this just an OOM crash? We don't really need to hide this then. There are tons of ways to OOM crash Firefox. :)
Not exactly an OOM, in that this will break Firefox no matter how much RAM you have:

<script>
var ab = new ArrayBuffer(256 << 20);
var b = new Blob([ab]);
document.write(URL.createObjectURL(b));
</script>

On Linux and Windows I got a sad-tab as expected, but on Mac it just locked up the content process (and in all cases, changing the 256 to 255 avoided the breakage and allowed the document.write to happen); I don't know what's going on with that yet, but it's a little worrying as far as whether we might be missing more crashes than just this.
Here's what's going on on Mac:

    frame #2: 0x0000000103285cde libnss3.dylib`PR_Lock + 14
    frame #3: 0x0000000103a67eb1 XUL`mozilla::ipc::MessageChannel::Send(IPC::Message*) + 257
    frame #4: 0x0000000103fc6432 XUL`mozilla::dom::PCrashReporterChild::SendAnnotateCrashReport(nsCString const&, nsCString const&) + 498
    frame #5: 0x0000000106488f8e XUL`CrashReporter::AnnotateCrashReport(nsACString_internal const&, nsACString_internal const&) + 302
    frame #6: 0x0000000103a6f3c1 XUL`mozilla::ipc::ProcessLink::SendMessage(IPC::Message*) + 273
    frame #7: 0x0000000103a67ec9 XUL`mozilla::ipc::MessageChannel::Send(IPC::Message*) + 281

Note the reentrancy — the content process tries to send crash annotations to the parent on the same channel that's in the middle of sending the message that's causing the crash.  I'm not sure how this ever works (although clearly it sometimes does): 

http://searchfox.org/mozilla-central/rev/336105a0de49f41a2cc203db7b7ee7266d0d558b/ipc/glue/MessageLink.cpp#164
Assignee: nobody → amarchesini
Keywords: sec-low
We should talk about this :)
How blobs work in e10s, is this: if a memory blob is sent to the parent process, we write the content in 1 single IPC message. If the content is too big, we end up crashing as this bug shows.

My approach, in these 2 patches, is to use a 'remote' input stream. This means that, we send a nsIInputStream to the parent process, and there, we create a BlobImplStream object.

There are 3 main issues here:

1. slice is not implemented yet. But this is an implementation detail.

2. What about if the BlobImpl is sent back to the content child? We create a remoteBlobImpl in the child process, keeping alive a blob in the parent process (the BlobImplStream), and this reads data from the child process using the SendStream object. We should optimize it.

3. This setup, is extremely similar to what we do when we send a blob from parent process to the content process, and the the content process sends the same blobImpl to the parent process again. Here we are smart enough to identify that the received blob is 'known' and we use the original blob in the parent process.

We can use the same approach for point 2 as we do for point 3. But this is a big change and would be nice if we do a bit of cleanup of code here. Having the same approach everywhere.

And here the real question for you, bkelly: can we have this SendStream child-parent and parent-child? Keep in mind that we already have something similar in blob-IPC code. If we can do this, we can have a new wonderful setup like this:

if a blob is sent from A to B (where A and B are 2 processes), we send a nsIInputStream and a Blob unique ID. If this blob unique ID is found in a per-process hashtable, we use the known blobImpl instead the received nsIInputStream. Otherwise we create a BlobImplStream.

I'm happy to describe this setup better in a meeting or on IRC.
Flags: needinfo?(bkelly)
(In reply to Andrea Marchesini (baku - PTO 1/8 - 12/8) from comment #7)
> And here the real question for you, bkelly: can we have this SendStream
> child-parent and parent-child? Keep in mind that we already have something
> similar in blob-IPC code. If we can do this, we can have a new wonderful
> setup like this:

Making PSendStream work parent-to-child is bug 1274343.  I was hesitant to do that initially because its a bit risky for PBackground actors attached to worker threads.  We should be able to add enough asserts to make it safe, though.

I don't anticipate allowing the same PSendStream instance to be passed from child-to-parent and then back from parent-to-child, though.  The parent would have to create a new PSendStream to send back to the child.

Personally, I think this is fine and its not worth the effort of optimizing for this kind of round trip at first.
Flags: needinfo?(bkelly)
Comment on attachment 8775488 [details] [diff] [review]
part 1 - PContentChild vs nsIContentChild

Review of attachment 8775488 [details] [diff] [review]:
-----------------------------------------------------------------

I'm supportive of this change.  I was not aware of the nsIContentChild interface when I did PSendStream initially.  I think the patch should be reviewed by an IPC peer, though.
Attachment #8775488 - Flags: review?(bkelly) → feedback+
Crash volume for signature 'mozilla::ipc::ProcessLink::SendMessageW':
 - nightly (version 50): 234 crashes from 2016-06-06.
 - aurora  (version 49): 267 crashes from 2016-06-07.
 - beta    (version 48): 2 crashes from 2016-06-06.
 - release (version 47): 0 crashes from 2016-05-31.
 - esr     (version 45): 0 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly      87      72       1       5      14      12       7
 - aurora        8      24       8      12      26      80      82
 - beta          0       1       0       0       0       0       1
 - release       0       0       0       0       0       0       0
 - esr           0       0       0       0       0       0       0

Affected platform: Windows
I'm often crashing on 50.0a2 when downloading files from mega.nz.
Group: dom-core-security
See Also: → 1271102
(In reply to Andrea Marchesini (baku - PTO 1/8 - 12/8) from comment #2)
> What is happening here is that, recently we landed a code for broadcasting
> BlobURL to any process.

Which is the bug where we landed this?
Keywords: regression
Too low volume in 48 to care about this regression.
Blocks: 1271102
See Also: 1271102
It would be great if we could identify the cause of the regression.
Flags: needinfo?(amarchesini)
(In reply to Marco Castelluccio [:marco] from comment #12)
> (In reply to Andrea Marchesini (baku - PTO 1/8 - 12/8) from comment #2)
> > What is happening here is that, recently we landed a code for broadcasting
> > BlobURL to any process.
> 
> Which is the bug where we landed this?

Bug 1279186.
Requesting tracking for 50 because it looks like this regression is breaking mega.nz.

Removing needinfo since Jed replied in comment 15.
Flags: needinfo?(amarchesini)
(In reply to Marco Castelluccio [:marco] (PTO until August 24/25) from comment #14)
> It would be great if we could identify the cause of the regression.

Bug 1279186.
Attachment #8775488 - Flags: review?(jld)
Tracking 51+ for this crash. For 50, the initial facet shows only 3 crashes in the last week - ni on Marco to clarity as to the importance of the site in Comment 16. We can certainly track for 50 if it makes sense.
Flags: needinfo?(mcastelluccio)
Comment on attachment 8775488 [details] [diff] [review]
part 1 - PContentChild vs nsIContentChild

Review of attachment 8775488 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay.  This was a learning experience in some of the code it's touching, but, looks good.
Attachment #8775488 - Flags: review?(jld) → review+
Hi Baku, one of the patches is r+d but not the other. If we have a fix that is ready for Aurora50, I'd be happy to uplift soon.
Flags: needinfo?(amarchesini)
Crash volume for signature 'mozilla::ipc::ProcessLink::SendMessageW':
 - nightly (version 51): 230 crashes from 2016-08-01.
 - aurora  (version 50): 757 crashes from 2016-08-01.
 - beta    (version 49): 44 crashes from 2016-08-02.
 - release (version 48): 0 crashes from 2016-07-25.
 - esr     (version 45): 0 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly      76      67      63
 - aurora      248     301      84
 - beta         17      12       3
 - release       0       0       0
 - esr           0       0       0

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly #352      #12
 - aurora  #1569     #7
 - beta              #142
 - release
 - esr
The crash on 49 beta is a different crash and not the regression caused by bug 1279186 (which landed in 50).
Crash volume for signature 'mozilla::ipc::ProcessLink::SendMessageW':
 - nightly (version 51): 240 crashes from 2016-08-01.
 - aurora  (version 50): 783 crashes from 2016-08-01.
 - beta    (version 49): 45 crashes from 2016-08-02.
 - release (version 48): 0 crashes from 2016-07-25.
 - esr     (version 45): 0 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly      76      67      63
 - aurora      248     301      84
 - beta         17      12       3
 - release       0       0       0
 - esr           0       0       0

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly #352      #10
 - aurora  #1544     #7
 - beta              #132
 - release
 - esr
Sorry for the bugspam :)
(In reply to Ritu Kothari (:ritu) from comment #20)
> Hi Baku, one of the patches is r+d but not the other. If we have a fix that
> is ready for Aurora50, I'd be happy to uplift soon.

The second patch is not ready yet. I'm working on this.
Flags: needinfo?(amarchesini)
#7 content crash on Aurora 50.0a2 (mega.nz is #221 in the Alexa global list of top sites: http://www.alexa.com/siteinfo/mega.nz).
Attached file crasher.html
Here's an example that causes a crash once opened.
We're not seeing crash reports on Linux because on Linux we're not crashing, but hanging with the signature "Abort | mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort | NS_InvokeByIndex" (e.g. https://crash-stats.mozilla.com/report/index/a37a145e-6cce-40f8-9b5c-583852160830).
Crash Signature: [@ mozilla::ipc::ProcessLink::SendMessageW] → [@ mozilla::ipc::ProcessLink::SendMessageW] [@ Abort | mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort | NS_InvokeByIndex ]
See Also: → 1255977
Attachment #8775492 - Attachment is obsolete: true
Attachment #8790093 - Flags: review?(bkelly)
Comment on attachment 8790093 [details] [diff] [review]
part 2 - SendStream for memory blobs

Review of attachment 8790093 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable, but I think the fact this does not support WOULD_BLOCK and PSendStream output is async will cause breakage with large streams.

::: dom/base/File.cpp
@@ +1238,5 @@
> +  nsCOMPtr<nsIInputStream> clonedStream;
> +  nsCOMPtr<nsIInputStream> replacementStream;
> +
> +  aRv = NS_CloneInputStream(mInputStream, getter_AddRefs(clonedStream),
> +                            getter_AddRefs(replacementStream));

You should make SlicedInputStream implement nsICloneableInputStream to avoid unnecessary copying here.

::: netwerk/base/SlicedInputStream.cpp
@@ +44,5 @@
> +    return rv;
> +  }
> +
> +  if (*aLength + mCurPos > mStart + mLength) {
> +    *aLength -= XPCOM_MIN(*aLength, (*aLength + mCurPos) - (mStart + mLength));

We should have a gtest that validates the different combinations of slicing possibilities here.

@@ +72,5 @@
> +    return NS_BASE_STREAM_CLOSED;
> +  }
> +
> +  char buf[4096];
> +  while (mCurPos < mStart + mLength && *aResult < aCount) {

I think this should at least try to use nsISeekableInputStream to change the current position.  If it doesn't work then fall back to this loop.

@@ +76,5 @@
> +  while (mCurPos < mStart + mLength && *aResult < aCount) {
> +    uint32_t bytesRead;
> +    uint64_t bufCount = XPCOM_MIN(aCount - *aResult, (uint32_t)sizeof(buf));
> +    nsresult rv = mInputStream->Read(buf, bufCount, &bytesRead);
> +    if (NS_WARN_IF(NS_FAILED(rv)) || bytesRead == 0) {

It would be nice if this supports nsIAsyncInputStream.  It would need to check for WOULD_BLOCK here, etc.

And I think you actually need it here since you attach a BlobImplStream to the output of a PSendStream which is an async stream.

Also, SlicedInputStream cannot be sent using PSendStream unless it is itself an nsIAsyncInputStream and non-blocking.  So if you expect these blobs to get passed back over IPC in the future you might as well implement the interface now.

@@ +116,5 @@
> +
> +NS_IMETHODIMP
> +SlicedInputStream::IsNonBlocking(bool *aNonBlocking)
> +{
> +  *aNonBlocking = false;

This seems wrong.  If the base stream is blocking then this stream is blocking.

::: netwerk/base/SlicedInputStream.h
@@ +11,5 @@
> +
> +namespace mozilla {
> +namespace net {
> +
> +class SlicedInputStream final : public nsIInputStream

I feel like this should be put in xpcom in a separate patch reviewed by :froydnj.
Attachment #8790093 - Flags: review?(bkelly) → review-
Attachment #8790093 - Attachment is obsolete: true
Attachment #8790647 - Flags: review?(nfroyd)
Attachment #8790647 - Flags: feedback?(bkelly)
Comment on attachment 8790647 [details] [diff] [review]
part 2 - SendStream for memory blobs

Review of attachment 8790647 [details] [diff] [review]:
-----------------------------------------------------------------

The commit message for this patch is incorrect.

Comments on SlicedInputStream below; I will let bkelly comment on the DOM pieces.  I realize the requested number of tests is rather large, but it'd be nice to have a large amount of confidence in this code.

::: dom/base/File.cpp
@@ +1245,5 @@
> +    return;
> +  }
> +
> +  if (replacementStream) {
> +    mInputStream = replacementStream;

Might as well use replacementStream.forget() here.

::: netwerk/base/SlicedInputStream.cpp
@@ +48,5 @@
> +    return rv;
> +  }
> +
> +  if (*aLength + mCurPos > mStart + mLength) {
> +    *aLength -= XPCOM_MIN(*aLength, (*aLength + mCurPos) - (mStart + mLength));

This calculation doesn't seem correct.  if *aLength + mCurPos is more than mStart + mLength, but still very close to it, you'll make *aLength the amount that *aLength + mCurPos exceeds mStart + mLength by, rather than capping it to something like (mStart + mLength) - mCurPos.

@@ +105,5 @@
> +
> +    uint64_t index = 0;
> +    if (mCurPos < mStart) {
> +      index += XPCOM_MIN(mStart - mCurPos, bufCount);
> +    }

It seems like the loop would be more understandable if you enforced the precondition that mCurPos >= mStart outside the loop, as you've done for the seekable stream case above.  Might result in a little more code, but ideally you could factor out the reading code into a separate function, to be called from the precondition enforcing code and from the loop itself.

@@ +114,5 @@
> +    if (mCurPos > mStart + mLength) {
> +      offset -= XPCOM_MIN((uint64_t)0, mStart + mLength - mCurPos);
> +    }
> +
> +    if (index == bufCount || offset == 0 || offset - index == 0) {

MOZ_ASSERT that index < offset after this condition?  It obviously won't affect the loop below, but it would point out a bug in how we were calculating these.

@@ +167,5 @@
> +    return rv;
> +  }
> +
> +  if (replacementStream) {
> +    mInputStream = replacementStream;

Might as well use replacementStream.forget() here.

::: netwerk/base/SlicedInputStream.h
@@ +13,5 @@
> +
> +namespace mozilla {
> +namespace net {
> +
> +class SlicedInputStream final : public nsIAsyncInputStream

Please add some kind of documentation for this.

Like bkelly said, please move SlicedInputStream + its tests to XPCOM.

::: netwerk/test/gtest/TestSlicedInputStream.cpp
@@ +15,5 @@
> +
> +public:
> +  NS_DECL_THREADSAFE_ISUPPORTS
> +
> +  SimpleInputStream(char* aBuf, uint32_t aLength)

Does NS_NewByteInputStream or NS_NewCStringInputStream not work for you?  I see that you use them below, but you also use this class.

Ah, the distinction is that string streams are seekable, and these are not.  Perhaps a better name would be NonSeekableStringStream, then, with a comment saying something like:

"We want to ensure that sliced streams work with both seekable and non-seekable input streams.  As our string streams are seekable, we need to provide a string stream that doesn't permit seeking, so we can test the logic that emulates seeking in sliced input streams."

Alternatively, maybe you could get away with a class that just wraps an nsICOMPtr<nsIInputStream> member that *is* a string stream, use NS_FORWARD_NSIINPUTSTREAM to the string stream, and then the QI implementation for the class would be as you have now?

@@ +96,5 @@
> +  RefPtr<mozilla::net::SlicedInputStream> sis =
> +    new mozilla::net::SlicedInputStream(stream, 0, BUF_SIZE);
> +
> +  uint64_t length;
> +  ASSERT_EQ(NS_OK, sis->Available(&length));

There should be some tests for Available() in the middle of reads (haven't started reading the slice, about to start reading the slice, in the middle of the slice, close to the end of the slice, at the end of the slice, etc.).

@@ +102,5 @@
> +
> +  char buf2[BUF_SIZE];
> +  uint32_t count;
> +  ASSERT_EQ(NS_OK, sis->Read(buf2, sizeof(buf2), &count));
> +  ASSERT_TRUE(nsCString(buf).Equals(nsCString(buf2)));

Using nsDependentCString in these sorts of asserts would make the code ever so slightly faster.

@@ +119,5 @@
> +                                  NS_ASSIGNMENT_DEPEND));
> +  ASSERT_TRUE(!!stream);
> +
> +  RefPtr<mozilla::net::SlicedInputStream> sis =
> +    new mozilla::net::SlicedInputStream(stream, 10, 100);

We should also have tests where the requested slice is actually outside the bounds of the wrapped stream--i.e. something like 4000, 5000 for this particular example.

@@ +127,5 @@
> +  ASSERT_EQ((uint64_t)100, length);
> +
> +  char buf2[BUF_SIZE / 2];
> +  uint32_t count;
> +  ASSERT_EQ(NS_OK, sis->Read(buf2, sizeof(buf2), &count));

We should also test reads after the slice has been exhausted.  Testing multiple reads from a slice would be good too, rather than "hey, let's just read the slice in one go."

@@ +132,5 @@
> +  ASSERT_EQ((uint64_t)100, count);
> +  ASSERT_TRUE(nsCString(buf + 10, count).Equals(nsCString(buf2, count)));
> +}
> +
> +TEST(TestSlicedInputStream, SlicedNoSeek) {

It's a bit painful for the tests to have so much copy-and-paste code; the best way I can think of is to write:

void BasicSlicedTest(mozilla::function<already_AddRefed<nsIInputStream>(const char*, size_t)> aStreamCreator)
{
  ...
  nsCOMPtr<nsIInputStream> stream = aStreamCreator(buf, sizeof(buf));
  RefPtr<SlicedInputStream> sis = new SlicedInputStream(...);
  ...
}

(or template it over the type of aStreamCreator, if you like that approach better.)  Not sure if that's overkill for these tests or not.  I think it'd be less overkill if we started adding some of the tests above, though.
Attachment #8790647 - Flags: review?(nfroyd)
> > +  if (*aLength + mCurPos > mStart + mLength) {
> > +    *aLength -= XPCOM_MIN(*aLength, (*aLength + mCurPos) - (mStart + mLength));
> 
> This calculation doesn't seem correct.  if *aLength + mCurPos is more than
> mStart + mLength, but still very close to it, you'll make *aLength the
> amount that *aLength + mCurPos exceeds mStart + mLength by, rather than
> capping it to something like (mStart + mLength) - mCurPos.

If mCurPos  + *aLength is > than mStart * mLength it means that we have more than what we want.
Here I trim the end of the buffer: I want to remove everything that exceeds the mCurPos + *aLength - mStart + mLength.
To be more precise:

Buffer:  [aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa]
Sliced:     [bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb]<-->
Current:  (doesn't matter where mCurPos is)      ^ mCurPos + *aLength

In this case I want to reduce *aLength of the difference between mCurPos + *aLength and the max we have to return (mStart + mlength);
If you prefer I can split the patch in 2: SlicedInputStream and the DOM part.
Attachment #8790647 - Attachment is obsolete: true
Attachment #8790647 - Flags: feedback?(bkelly)
Attachment #8790886 - Flags: review?(nfroyd)
(In reply to Andrea Marchesini [:baku] from comment #34)
> > > +  if (*aLength + mCurPos > mStart + mLength) {
> > > +    *aLength -= XPCOM_MIN(*aLength, (*aLength + mCurPos) - (mStart + mLength));
> > 
> > This calculation doesn't seem correct.  if *aLength + mCurPos is more than
> > mStart + mLength, but still very close to it, you'll make *aLength the
> > amount that *aLength + mCurPos exceeds mStart + mLength by, rather than
> > capping it to something like (mStart + mLength) - mCurPos.
> 
> If mCurPos  + *aLength is > than mStart * mLength it means that we have more
> than what we want.
> Here I trim the end of the buffer: I want to remove everything that exceeds
> the mCurPos + *aLength - mStart + mLength.
> To be more precise:
> 
> Buffer:  [aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa]
> Sliced:     [bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb]<-->
> Current:  (doesn't matter where mCurPos is)      ^ mCurPos + *aLength
> 
> In this case I want to reduce *aLength of the difference between mCurPos +
> *aLength and the max we have to return (mStart + mlength);

Indeed, my mistake.  I think I was assuming it was straight-up assignment and not subtract-assignment.
Comment on attachment 8790886 [details] [diff] [review]
part 2 - SendStream for memory blobs

Review of attachment 8790886 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments below.  I think splitting the patch into two pieces (xpcom/ and dom/) is reasonable.

You still need to fix the commit message for the patch.

::: xpcom/io/SlicedInputStream.cpp
@@ +40,5 @@
> +    return NS_BASE_STREAM_CLOSED;
> +  }
> +
> +  nsresult rv = mInputStream->Available(aLength);
> +  if (NS_WARN_IF(NS_FAILED(rv))) {

Does this do the right thing if the underlying stream is asynchronous and doesn't currently have any data?

::: xpcom/io/SlicedInputStream.h
@@ +10,5 @@
> +#include "nsCOMPtr.h"
> +#include "nsIAsyncInputStream.h"
> +#include "nsICloneableInputStream.h"
> +
> +// This inputStream is able to slice the size of another inputStream.

I think this comment makes sense in the context of Blobs, but I'm not sure it makes sense as its own standalone thing.  Maybe:

"A wrapper for a slice of an underlying input stream."

@@ +21,5 @@
> +  NS_DECL_NSIINPUTSTREAM
> +  NS_DECL_NSICLONEABLEINPUTSTREAM
> +  NS_DECL_NSIASYNCINPUTSTREAM
> +
> +  SlicedInputStream(nsIInputStream* aInputStream,

Something informative about how to use this would be nice.

"Create an input stream whose data comes from a slice of aInputStream.  The slice begins at aStart bytes beyond aInputStream's current position, and extends for a maximum of aLength bytes.  If aInputStream contains fewer than aStart bytes, reading from SlicedInputStream returns no data.  If aInputStream contains more than aStart bytes, but fewer than aStart + aLength bytes, reading from SlicedInputStream returns as many bytes as can be consumed from aInputStream after reading aLength bytes.

aInputStream should not be read from after constructing a SlicedInputStream wrapper around it."

::: xpcom/tests/gtest/TestSlicedInputStream.cpp
@@ +92,5 @@
> +}
> +
> +// Same start, same length.
> +TEST(TestSlicedInputStream, Simple) {
> +  #define BUF_SIZE 4096

Rather than #define/#undef clutter, you could just:

const size_t kBufSize = 4096;

here and everywhere below, so you could at least eliminate the #undefs.
Attachment #8790886 - Flags: review?(nfroyd) → review+
Attachment #8790886 - Attachment is obsolete: true
Attachment #8791495 - Flags: review?(bkelly)
Comment on attachment 8791495 [details] [diff] [review]
part 3 - SendStream for memory blobs

Review of attachment 8791495 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/File.cpp
@@ +1238,5 @@
> +  nsCOMPtr<nsIInputStream> clonedStream;
> +  nsCOMPtr<nsIInputStream> replacementStream;
> +
> +  aRv = NS_CloneInputStream(mInputStream, getter_AddRefs(clonedStream),
> +                            getter_AddRefs(replacementStream));

Can the two sides of the clone be read at different rates?  Specifically, is it possible for one side to remain open and never read?  If yes, then we probably need to fix bug 1134372 in order for you to use this here.

::: dom/ipc/Blob.cpp
@@ +955,5 @@
>    MOZ_ALWAYS_TRUE(!rv.Failed());
>  
> +  AutoIPCStream autoStream;
> +  autoStream.Serialize(inputStream, aManager);
> +  aBlobData = autoStream.TakeValue();

This kind of defeats the purpose of the RAII class here.  If any failures occur after returning from this method and before actually sending it across IPC then you are going to leak stuff.

It would be slightly nicer to pass in the AutoIPCStream or your own AutoBlobData wrapper RAII object.  If you can guarantee there are no error paths after this, though, I guess its ok.  Please add a comment indicating that leaks will occur if the BlobData is not actually sent.
Attachment #8791495 - Flags: review?(bkelly) → review+
I've reprocessed my crash (https://crash-stats.mozilla.com/report/index/a37a145e-6cce-40f8-9b5c-583852160830) and it got a different signature now: AsyncShutdownTimeout | quit-application-granted | SessionStore: flushing all windows.
Crash Signature: [@ mozilla::ipc::ProcessLink::SendMessageW] [@ Abort | mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort | NS_InvokeByIndex ] → [@ mozilla::ipc::ProcessLink::SendMessageW] [@ Abort | mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort | NS_InvokeByIndex ] [@ AsyncShutdownTimeout | quit-application-granted | SessionStore: flushing all windows ]
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/11eb7909e663
memory blob should not be shared across processes - part 1 -PSendStream should use nsIContentChild, r=jld, f=bkelly
https://hg.mozilla.org/integration/mozilla-inbound/rev/7fa97d674377
memory blob should not be shared across processes - part 2 - SlicedInputStream, r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/f822594d9e9a
memory blob should not be shared across processes - part 3 - SendStream for memory blobs, r=bkelly
I'm still experiencing the hang on Linux in the latest Nightly, I haven't tested on Windows.

I see the following message in the shell:
###!!! [Parent][MessageChannel] Error: (msgtype=0xF4000C,name=PStorage::Msg_Observe) Channel error: cannot send/recv
> ###!!! [Parent][MessageChannel] Error:
> (msgtype=0xF4000C,name=PStorage::Msg_Observe) Channel error: cannot send/recv

PStorage, it seems unrelated. Can you post a bigger backtrace?
Andrea, aren't you able to reproduce it with the 'crasher.html' attachment?
I think I'm not the only one able to reproduce, see bug 1255977 comment 20.
Flags: needinfo?(amarchesini)
Right, Bug 1294450 will fix this crash.
Flags: needinfo?(amarchesini)
Depends on: 1294450
Hi Marco, Baku, if the fix here fixes some of the crashes (even if not all), it's worthwhile uplifting to Beta50 I think. Should we uplift to Aurora and Beta?
Flags: needinfo?(mcastelluccio)
Flags: needinfo?(amarchesini)
I haven't tested on Windows, which is where the crash is happening, perhaps Andrei or Florin can help?

You just need to open the attachment "crasher.html" (or, download a file bigger than 256 MB from mega.nz).
If Firefox doesn't crash, then the bug is fixed.
Flags: qe-verify+
Flags: needinfo?(mcastelluccio)
Flags: needinfo?(florin.mezei)
Flags: needinfo?(andrei.vaida)
Adding a link to a 300 MB anime video file for QA testing purposes & extra verification:
https://mega.nz/#!ZpklkLxR!V8e5b9DlzktBL5Mi-AZ1XuXTCq-LnPiTV6hfu3dUGVQ
(anime: Akame ga Kill!)
Comment on attachment 8791494 [details] [diff] [review]
part 2 - SlicedInputStream

Approval Request Comment
[Feature/regressing bug #]: Blob and multi-e10s
[User impact if declined]: a crash can happen if the blob size is bigger than the max size of a IPC message
[Describe test coverage new/current, TreeHerder]: green on try.
[Risks and why]: We have 3 patches here. Complex, somehow. But they are fully green and they fix a top crash.
[String/UUID change made/needed]: none

Note: all the 3 patches must be uplifted.
Flags: needinfo?(amarchesini)
Attachment #8791494 - Flags: approval-mozilla-beta?
Attachment #8791494 - Flags: approval-mozilla-aurora?
I've confirmed with bisect that this also has the side effect of plugging half of a leak similar to bug 1272078 I've been experiencing with the Tab Center addon from Test Pilot (so maybe it plugs bug 1272078 too). (The other half of the leak /might/ be due to Tab Center itself, I have yet to identify what's keeping the objects around, but closing tabs releases them now, which wasn't happening before this bug)
I can still reproduce the crash on Windows: https://crash-stats.mozilla.com/report/index/ccaa90a9-c89b-4ffe-8944-f8a2a2160929.
Status: RESOLVED → REOPENED
Flags: needinfo?(amarchesini)
Resolution: FIXED → ---
(In reply to Andrea Marchesini [:baku] from comment #53)
> [Feature/regressing bug #]: Blob and multi-e10s

To be precise, this would be bug 1279186, according to comment 15 and comment 17.

> [Describe test coverage new/current, TreeHerder]: green on try.

Would it be possible to add the 'crasher.html' attachment as a testcase?
I've managed to reproduce this issue on Windows 10 x64 and Windows 7 x64 with steps provided by Marco, in comment 51.
I can confirm that Firefox crashed on all affected builds 50.0b2 (2016-09-26), 51.0a2 (2016-09-29) and 52.0a1 (2016-09-28): https://crash-stats.mozilla.com/report/index/81845381-0c2c-4a2b-b89f-cfc2f2160929

When I disabled e10s, Firefox did not crashed anymore. (this happened with all affected versions)
If I can help with more info, please let me know.
Flags: needinfo?(florin.mezei)
Flags: needinfo?(andrei.vaida)
Summary: MOZ_CRASH(IPC message size is too large) → MOZ_CRASH(IPC message size is too large) with PContent::Msg_PBlobConstructor
Comment on attachment 8791494 [details] [diff] [review]
part 2 - SlicedInputStream

It seems based on our testing of the fix on Nightly52, this fix doesn't really fix the crash at all. Is that right? In that case, should we be really uplifting this fix? 

Hi Baku, Marco, I am reiterating my suggestion, if this fix reduces the occurrences of the crash it is worth uplifting. If it doesn't fix a single occurrence of this crash, then we are better off spinning another fix. Please let me know if I am misunderstand the situation.
Flags: needinfo?(mcastelluccio)
Attachment #8791494 - Flags: approval-mozilla-beta?
Attachment #8791494 - Flags: approval-mozilla-beta-
Attachment #8791494 - Flags: approval-mozilla-aurora?
Attachment #8791494 - Flags: approval-mozilla-aurora-
In order to fix this crash we must fix also bug 1294450. Bug 1294450 is in m-i, so, next nightly should work fine.
Flags: needinfo?(amarchesini)
Flags: needinfo?(mcastelluccio)
Sorry for asking this, but, Marco, can you verify this again tomorrow?
Flags: needinfo?(mcastelluccio)
It's still crashing, although with a different signature and intermittently.

https://crash-stats.mozilla.com/report/index/36d4ac2f-5745-4cc0-8a96-4644b2161001
https://crash-stats.mozilla.com/report/index/3a3e7e37-d20d-4cb6-864c-4f7502161001
https://crash-stats.mozilla.com/report/index/1021953d-9148-4b50-8100-8f8f72161001
https://crash-stats.mozilla.com/report/index/ebf835fe-1194-40e6-bdcb-170392161001
(these happened directly when downloading the file from mega.nz or opening crasher.html)

https://crash-stats.mozilla.com/report/index/b5f4fd08-7569-4184-ab42-f5f422161001
(this happened soon after restaring the browser)

I'm needinfoing jdm as well, since he wrote the patch in bug 1294450.
Flags: needinfo?(mcastelluccio)
Flags: needinfo?(josh)
Flags: needinfo?(amarchesini)
The code that is crashing is:
    case type__::TPSendStreamChild:
        {
            PSendStreamParent* tmp = nullptr;
            (*(v__)) = tmp;
            if ((!(Read((&((v__)->get_PSendStreamParent())), msg__, iter__, false)))) {
                FatalError("Error deserializing Union type");
                return false;
            }
Sorry, not clear that I can be any more helpful than filing bug 1306983 at this point.
Flags: needinfo?(josh)
Depends on: 1306979
Andrea, correct me if I'm wrong. In order to fix this crash, we need to uplift your patches
from this bug, jdm's patches from bug 1294450 and the patches that are going to fix bug
1306979.

Is backing out bug 1279186 from beta a simpler/safer option?
This patch is almost like disabling the blobURL support for multi-e10s.
In the meantime we can debug why nsIInputStreams have some issues with IPC.
Flags: needinfo?(amarchesini)
Attachment #8797122 - Flags: review?(josh)
Attachment #8797122 - Flags: review?(josh) → review+
This is racy (but quite easy to reproduce): The reason why we crash is because AutoIPCStream, in the DTOR, starts pulling data from the nsIInputStream.
We are here: SendStreamChildImpl::DoRead();

Now, the |while(true) { ... OnEnd()}| could be faster than the SendPBlobConstructor(). At this point we call SendClose() before creating the PBlob in the parent process.

We unregister the PSendStream and when the RecvPBlobConstructor is executed, the PSendStream doesn't exist anymore and we crash.

In this patch I postpone the DTOR of AutoIPCStream.
Attachment #8797156 - Flags: review?(bkelly)
Keywords: leave-open
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/44d130271ebc
We should not broadcast Blob URLs if we have only 1 content process, r=jdm
(In reply to Pulsebot from comment #67)
> Pushed by amarchesini@mozilla.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/44d130271ebc
> We should not broadcast Blob URLs if we have only 1 content process, r=jdm

Does this mean that blob URLs registered in the parent process can't be accessed from the content process if multi-e10s isn't enabled?  If so, this is a regression: that use case is how add-on authors can be sandbox-friendly while using local files with components like nsIStyleSheetService that broadcast URIs.
(In reply to Jed Davis [:jld] {⏰UTC-6} from comment #68)
> (In reply to Pulsebot from comment #67)
> > Pushed by amarchesini@mozilla.com:
> > https://hg.mozilla.org/integration/mozilla-inbound/rev/44d130271ebc
> > We should not broadcast Blob URLs if we have only 1 content process, r=jdm
> 
> Does this mean that blob URLs registered in the parent process can't be
> accessed from the content process if multi-e10s isn't enabled?  If so, this
> is a regression: that use case is how add-on authors can be sandbox-friendly
> while using local files with components like nsIStyleSheetService that
> broadcast URIs.

No; the code directly above the newly added code checks for the parent process and follows a different code path.
(In reply to Andrea Marchesini [:baku] from comment #66)
> Created attachment 8797156 [details] [diff] [review]
> Change the use of AutoIPCStream
> 
> This is racy (but quite easy to reproduce): The reason why we crash is
> because AutoIPCStream, in the DTOR, starts pulling data from the
> nsIInputStream.
> We are here: SendStreamChildImpl::DoRead();
> 
> Now, the |while(true) { ... OnEnd()}| could be faster than the
> SendPBlobConstructor(). At this point we call SendClose() before creating
> the PBlob in the parent process.
> 
> We unregister the PSendStream and when the RecvPBlobConstructor is executed,
> the PSendStream doesn't exist anymore and we crash.
> 
> In this patch I postpone the DTOR of AutoIPCStream.

bkelly's away this week; it probably makes sense to find another reviewer. Maybe whoever originally reviewed the implementation of PSendStream, if that makes sense?
The patch in comment 66 fixes the problem in bug 1306979 comment 6 for me. We should get this reviewed since it's a major crash.
Comment on attachment 8797156 [details] [diff] [review]
Change the use of AutoIPCStream

Jed, can you review this since Ben is out?
Attachment #8797156 - Flags: review?(bkelly) → review?(jld)
Comment on attachment 8797156 [details] [diff] [review]
Change the use of AutoIPCStream

Review of attachment 8797156 [details] [diff] [review]:
-----------------------------------------------------------------

I needed to spend a little time remembering how AutoIPCStream's lifetimes are supposed to work, but I follow what's going on here.  r=me.  (It'd be nice if we had region types to help avoid use-after-destroy bugs like this....)
Attachment #8797156 - Flags: review?(jld) → review+
Keywords: leave-open
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/18cb108e1c62
AutoIPCStream DTOR must run after the use of the inputStream for PBlob, r=jld
Marco, can you please test it again when these 2 patches land on m-c? In particular I would like to have feedbacks about this second patch - locally it works fine :)
Thanks!
Attached patch crashtest.patchSplinter Review
Here your crashtest added.
Attachment #8797401 - Flags: review?(mcastelluccio)
Blocks: 1307416
https://hg.mozilla.org/mozilla-central/rev/18cb108e1c62
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Depends on: 1307742
Depends on: 1307462
Comment on attachment 8797401 [details] [diff] [review]
crashtest.patch

Review of attachment 8797401 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

(In reply to Andrea Marchesini [:baku] from comment #76)
> Marco, can you please test it again when these 2 patches land on m-c? In
> particular I would like to have feedbacks about this second patch - locally
> it works fine :)
> Thanks!

I've tested the new Nightly and it works for me as well.
Attachment #8797401 - Flags: review?(mcastelluccio) → review+
Backout by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fd87fff79b4a
Backed out changeset c24253346eac for crashtest failures
Andrea, what do you suggest doing for Beta and Aurora? Could you list
the patches, from this bug and other bugs, that would need to be
uplifted in order to fix the bug?
Flags: needinfo?(amarchesini)
Crash Signature: [@ mozilla::ipc::ProcessLink::SendMessageW] [@ Abort | mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort | NS_InvokeByIndex ] [@ AsyncShutdownTimeout | quit-application-granted | SessionStore: flushing all windows ] → [@ mozilla::ipc::ProcessLink::SendMessageW] [@ Abort | mozalloc_abort | NS_DebugBreak | nsDebugImpl::Abort | NS_InvokeByIndex ] [@ AsyncShutdownTimeout | quit-application-granted | SessionStore: flushing all windows ] [@ OOM | large | mozalloc_abort | …
Comment on attachment 8775488 [details] [diff] [review]
part 1 - PContentChild vs nsIContentChild

Approval Request Comment
[Feature/regressing bug #]: bug 1279186
[User impact if declined]: a crash when a big memory blob is used to create a blob URL.
[Describe test coverage new/current, TreeHerder]: tested, but no test is added because the crasher.html cannot be added as crashtest
[Risks and why]: there are many patches but well tested.
[String/UUID change made/needed]: none
Flags: needinfo?(amarchesini)
Attachment #8775488 - Flags: approval-mozilla-beta?
Attachment #8775488 - Flags: approval-mozilla-aurora?
Comment on attachment 8791494 [details] [diff] [review]
part 2 - SlicedInputStream

Also this patch must be uplifted to aurora/beta
Attachment #8791495 - Flags: approval-mozilla-beta?
Attachment #8791495 - Flags: approval-mozilla-aurora?
Attachment #8797122 - Flags: approval-mozilla-beta?
Attachment #8797156 - Flags: approval-mozilla-beta?
Attachment #8797156 - Flags: approval-mozilla-aurora?
So, basically, for beta we can just uplift 8797122.
Aurora all the rest.
(In reply to Andrea Marchesini [:baku] from comment #88)
> So, basically, for beta we can just uplift 8797122.
> Aurora all the rest.

I would prefer to uplift only that which is needed and is verified to help. Will clear beta:? noms on all patches except 8797122.
Comment on attachment 8797122 [details] [diff] [review]
Disable broadcasting Blob URL if we have only 1 content process

As Baku suggested, let's uplift only this patch to Beta50.
Attachment #8797122 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-beta/rev/70abfe99097824fd510544b188f24c588fd6d5a0

I'm not quite sure which patches need uplifted to aurora. Could you clarify?
Flags: needinfo?(amarchesini)
Hi :baku,
Can you create the uplift requests for 51 aurora?
Flags: needinfo?(amarchesini)
Flags: needinfo?(amarchesini)
Attachment #8797122 - Flags: approval-mozilla-aurora?
Crash Signature: | nsCOMPtr_base::assign_from_qi | nsTArray_Impl<T>::InsertElementsAt<T> | nsTArray_Impl<T>::SetLength<T> | mozilla::dom::`anonymous namespace''::BlobDataFromBlobImpl] → | nsCOMPtr_base::assign_from_qi | nsTArray_Impl<T>::InsertElementsAt<T> | nsTArray_Impl<T>::SetLength<T> | mozilla::dom::`anonymous namespace''::BlobDataFromBlobImpl] [@ mozilla::ipc::ProcessLink::SendMessageW | IPC_Message_Name=PContent::Msg_PBlobConst…
Comment on attachment 8775488 [details] [diff] [review]
part 1 - PContentChild vs nsIContentChild

Fix a crash and take suggestion from comment #88. Take it in 51 aurora.
Attachment #8775488 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8791494 - Flags: approval-mozilla-aurora- → approval-mozilla-aurora+
Attachment #8791495 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8797122 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8797156 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I verified this on latest Nightly 52.0a1 (Build ID 20161109030210) using the instructions from comment 51 and comment 52 and no crash occurred during my testing. 
I verified on:
* Windows 10 x64 - Nightly x64 build and x32 build
* Windows 8.1 x64 - Nightly x64 build and x32 build
* Windows 7 x32 - Nightly x32 build
* Ubuntu 16.04 x64 - Nightly x64
* Mac 10.11 - Nightly x64
Reproduced the crash with Nightly 2016-09-21 using the links from comment 28 and comment 52.

The crashes no longer occur using Firefox 51 beta 2 under Win 10 64-bit, Ubuntu 14.04 32-bit and Mac OS X 10.11.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
The test originally landed in https://hg.mozilla.org/integration/mozilla-inbound/rev/c24253346eac appears to have been backed out and never relanded.  Should it be?
Flags: needinfo?(amarchesini)
I am still having this problem with the latest release (53.0):


mozilla::ipc::ProcessLink::SendMessage | IPC_Message_Name=PBackground::Msg_PBlobConstructor 

MOZ_CRASH(IPC message size is too large)

OS X 10.12

10.12.4 16E195

amd64


It says "verified fixed" at the top, but it does not seem to be actually fixed in the current release?
Can you file a new bug, explaining how you can reproduce the problem?
Flags: needinfo?(keean)
This bug is already gone in nightly because of the pblob refactoring: bug 1353629
Flags: needinfo?(amarchesini)
Which version is that refactoring due for release in (is it 54.0) ?
Flags: needinfo?(keean)
55. Some of the pblob re-factoring is still landing.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.