Last Comment Bug 764171 - crash in nsStorageInputStream::ReadSegments @ _VEC_memcpy
: crash in nsStorageInputStream::ReadSegments @ _VEC_memcpy
Status: VERIFIED FIXED
: crash, regression, topcrash
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: 15 Branch
: x86 Windows 7
: -- critical (vote)
: mozilla16
Assigned To: Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
:
Mentors:
Depends on: 770243
Blocks: 539040 746018 769565 774527
  Show dependency treegraph
 
Reported: 2012-06-12 14:44 PDT by Scoobidiver (away)
Modified: 2012-10-02 05:34 PDT (History)
6 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
disabled
+
verified


Attachments
Fix crash in nsStorageStream by avoiding pre-buffering for memory cache items (20.45 KB, patch)
2012-06-19 14:58 PDT, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
no flags Details | Diff | Review
Fix crash in nsStorageStream by avoiding pre-buffering for items we will validate (20.20 KB, patch)
2012-06-20 14:25 PDT, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
no flags Details | Diff | Review
Fix crash in nsStorageStream by avoiding pre-buffering for items we will validate [v2] (18.71 KB, patch)
2012-06-21 22:31 PDT, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
honzab.moz: review+
brian: checkin+
Details | Diff | Review

Description Scoobidiver (away) 2012-06-12 14:44:31 PDT
It's #21 top browser crasher in 15.0a2 and #40 in 16.0a1.
It first appeared in 15.0a1/20120601. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3aa566994890&tochange=73783bf75c4c
It might be a regression from bug Bug 722034 or bug 746018.

The first frames of the stack are:
Frame 	Module 	Signature 	Source
0 	msvcr100.dll 	_VEC_memcpy 	
1 	xul.dll 	nsStorageInputStream::ReadSegments 	xpcom/io/nsStorageStream.cpp:419
2 	xul.dll 	nsPNGEncoder::Read 	image/encoders/png/nsPNGEncoder.cpp:509
3 	xul.dll 	nsCacheEntryDescriptor::nsInputStreamWrapper::Read 	netwerk/cache/nsCacheEntryDescriptor.cpp:577
4 	xul.dll 	nsInputStreamTransport::Read 	netwerk/base/src/nsStreamTransportService.cpp:198
5 	xul.dll 	nsStreamCopierOB::FillOutputBuffer 	xpcom/io/nsStreamUtils.cpp:529
6 	xul.dll 	nsPipeOutputStream::WriteSegments 	xpcom/io/nsPipe3.cpp:1104

or

Frame 	Module 	Signature 	Source
0 	msvcr100.dll 	_VEC_memcpy 	
1 	xul.dll 	nsStorageInputStream::ReadSegments 	xpcom/io/nsStorageStream.cpp:419
2 	xul.dll 	nsPipeInputStream::Read 	xpcom/io/nsPipe3.cpp:794
3 	xul.dll 	nsCacheEntryDescriptor::nsInputStreamWrapper::Read 	netwerk/cache/nsCacheEntryDescriptor.cpp:577
4 	xul.dll 	nsInputStreamTransport::Read 	netwerk/base/src/nsStreamTransportService.cpp:198
5 	xul.dll 	nsStreamCopierOB::FillOutputBuffer 	xpcom/io/nsStreamUtils.cpp:529
6 	xul.dll 	nsPipeOutputStream::WriteSegments 	xpcom/io/nsPipe3.cpp:1104

More reports at:
https://crash-stats.mozilla.com/report/list?signature=_VEC_memcpy+|+nsStorageInputStream%3A%3AReadSegments%28unsigned+int+%28*%29%28nsIInputStream*%2C+void*%2C+char+const*%2C+unsigned+int%2C+unsigned+int%2C+unsigned+int*%29%2C+void*%2C+unsigned+int%2C+unsigned+int*%29
Comment 1 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-13 13:48:31 PDT
I agree that this is either caused by the patches for bug 746018 or the patches for bug 722034.
Comment 2 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-19 14:58:45 PDT
Created attachment 634607 [details] [diff] [review]
Fix crash in nsStorageStream by avoiding pre-buffering for memory cache items

First, I have not reproduced the crash yet.

My initial guess was that this was due to a race condition between mCacheAsyncInputStream.CloseAndRelease() and the pre-buffering that we were doing. However, nsStorageInputStream::Close() has an implementation that is very simple that might not be particularly thread-safe, but it wouldn't cause a crash by itself.

We are doing the pre-buffering--calling Read() or ReadSegments()--on stream-transport-service threads. If we attempt revalidation, and the revalidation fails (we get a non-304 response), we will replace the cached response entity with the one from the network response. Although we call CloseAndRelease() on the input stream before we open the output stream for writing the new entity, closing the stream transport service input stream will not cause any currently-executing or imminently-executing I/O to stop right away. Consequently, we could end up calling nsStorageInputStream::ReadSegments() and nsStorageOutputStream::WriteSegments() at the same time. Both streams share common state (they share a nsStorageStream), unprotected by locks, and I think the WriteSegments() and ReadSegments calls will race with each other, leading to this crash.

There are two purposes of the pre-buffering: (1) We want to open the cache entry input stream (nsCacheEntryDescriptor::OpenInputStream) on a background thread because it takes the cache service lock, and (2) We want to attach and open the stream transport service input stream ASAP so that we can read the data off the disk in the background while we wait for the event we queued on the main thread to be processed and while we wait for the network response (hopefully a 304) to come back.

This patch makes it so that, for memory cache entries, we do (1) but not (2). (2) is unnecessary because memory cache entries are already in memory, so we don't have to worry about blocking on disk I/O.

There is no change for other kinds of cache entries (disk and offline cache entries) because, AFAICT, we should be able to safely read and write to such entries concurrently; the reader may end up getting garbled data, but that is OK because we are throwing away everything the reader read anyway.
Comment 3 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-19 22:17:40 PDT
Comment on attachment 634607 [details] [diff] [review]
Fix crash in nsStorageStream by avoiding pre-buffering for memory cache items

Review of attachment 634607 [details] [diff] [review]:
-----------------------------------------------------------------

This is not good enough. I misread nsDiskCacheDevice::OpenOutputStreamForEntry and it has a similar problem--not a crash, but it will fail to open the output stream if an input stream is open.
Comment 4 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-20 14:25:42 PDT
Created attachment 635066 [details] [diff] [review]
Fix crash in nsStorageStream by avoiding pre-buffering for items we will validate

This patch completely disables the pre-buffering during validation, but keeps the pre-buffering for cache hits, which is a much smaller win.

I will re-open bug 746018 if/when we check this in, so that we will do the pre-buffering for entries we validate with the server. It is definitely a good idea, but I think it requires changing some internals of the cache, and maybe even the cache API.
Comment 5 Honza Bambas (:mayhemer) 2012-06-21 14:23:39 PDT
Comment on attachment 635066 [details] [diff] [review]
Fix crash in nsStorageStream by avoiding pre-buffering for items we will validate

Review of attachment 635066 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +2911,5 @@
>  
>      nsresult rv = CheckCache();
> +    if (NS_FAILED(rv)) {
> +        NS_WARNING("CheckCache failed");
> +    } else if (mCacheInputStream) {

Talked with brian on IRC, this condition should actually be:

if (mCacheInputStream && mCachedContentIsValid)

If that is the only change, then we can get this in.

@@ +3388,5 @@
>      nsCOMPtr<nsIInputStream> stream;
> +    rv = mCacheEntry->OpenInputStream(0, getter_AddRefs(stream));
> +    mCacheInputStream.takeOver(stream);
> +    return rv;
> +

Blank line

@@ +3417,2 @@
>      if (NS_SUCCEEDED(rv)) {
> +        stream = nsnull;

Hmm.. so this prevents the |stream| =the cache stream, from being closed bellow in this method, right?  Is that correct?  Can you comment why are you nullifying it here?  I think you can leave the stream open indefinitely this way...
Comment 6 Honza Bambas (:mayhemer) 2012-06-21 16:22:08 PDT
Comment on attachment 635066 [details] [diff] [review]
Fix crash in nsStorageStream by avoiding pre-buffering for items we will validate

Clearing my r q.
Comment 7 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-21 22:31:39 PDT
Created attachment 635633 [details] [diff] [review]
Fix crash in nsStorageStream by avoiding pre-buffering for items we will validate [v2]

Thanks for pointing out those problems Honza.

There was another problem: with the patch, if StartBufferingCachedEntity() failed, then we ended up being in a bad state.

I reverted the control flow so that it stays mostly the same.

Also, it seems like AutoClose.h somehow got some Windows line endings in it before, and this patch corrects that (unintentionally).

Here is the tryserver run:
https://tbpl.mozilla.org/?tree=Try&rev=4a5367047d11
Comment 8 Marcia Knous [:marcia - use ni] 2012-06-25 06:11:19 PDT
As far as top crash status, _VEC_memcpy | nsStorageInputStream::ReadSegments(unsigned int (*)(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*)ranks as #16 crash on Aurora in the last week and #28 on the trunk..
Comment 9 Honza Bambas (:mayhemer) 2012-06-26 08:48:47 PDT
Comment on attachment 635633 [details] [diff] [review]
Fix crash in nsStorageStream by avoiding pre-buffering for items we will validate [v2]

Review of attachment 635633 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +3213,5 @@
>      // If there's any possibility that we may use the cached response's entity
>      // then start reading it into memory now. If we have to revalidate the
>      // entry and the revalidation fails, this will be a wasted effort, but
>      // it is much more likely that either we don't need to revalidate the entry
>      // or the entry will successfully revalidate.

You may need to update this comment.

@@ +3386,5 @@
> +        // We do not connect the stream to the stream transport service if we
> +        // have to validate the entry with the server. If we did, we would get
> +        // into a race condition between the stream transport service threads
> +        // and the opening of the cache entry's opening stream in the case where
> +        // we get a non-304 response.

Check wording of the comment (the last sentence).
Comment 10 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-28 15:32:07 PDT
https://hg.mozilla.org/mozilla-central/rev/1f695f10fa11

I fixed the comments pointed out in Honza's review.
Comment 11 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-07-03 15:42:04 PDT
I asked for the change that caused this regression to be backed out of aurora in bug 722034.
Comment 12 Sheila Mooney 2012-07-03 22:40:09 PDT
I don't see any of these on the trunk since build id 20120628065213.
Comment 13 Scoobidiver (away) 2012-07-03 23:49:28 PDT
There's one crash after the patch landed: bp-46752fe2-09ff-42dc-b5eb-656f32120701.
Comment 14 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-07-10 09:32:50 PDT
Another crash with build ID 20120708030543:
https://crash-stats.mozilla.com/report/index/69a3a3e6-13c1-4238-acea-4ec722120710

Oddly, the stack traces for some of these crashes are incomplete--they do not show who is calling nsStorageInputStream::ReadSegments. Any clues as to why this might be the case? Is this what happens when the caller is a plugin or extension?
Comment 15 Scoobidiver (away) 2012-07-11 02:14:50 PDT
You should uplift the current patch to Aurora as it's #13 top browser crasher in 15.0a2.
File a new bug for the remaining crashes.
Comment 16 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-07-16 16:37:16 PDT
The patch causing the spike in crashes got backed out of Aurora 15 in bug 722034 comment 79.
Comment 17 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-07-16 16:48:51 PDT
(In reply to Scoobidiver from comment #13)
> There's one crash after the patch landed:
> bp-46752fe2-09ff-42dc-b5eb-656f32120701.

There are also crashes in 14.0 and 13.0 which indicates there are likely multiple causes of crashes with this signature. It seems like the patch for 16 and backout for 15 did reduce the rate of crashing in this signature back down to the previous levels so I think it makes sense to leave this "fixed." I filed bug 774527 for investigating and fixing the other cause(s) of crashes with this signature.
Comment 18 Paul Silaghi, QA [:pauly] 2012-09-17 09:12:59 PDT
There are 10 crashes on FF 16.0b3.
Comment 19 Scoobidiver (away) 2012-09-17 09:23:56 PDT
(In reply to Paul Silaghi [QA] from comment #18)
> There are 10 crashes on FF 16.0b3.
It's bug 774527.
Comment 20 Paul Silaghi, QA [:pauly] 2012-10-02 05:34:23 PDT
Verified fixed on FF 16 based on comment 19.

Note You need to log in before you can comment on or make changes to this bug.