Last Comment Bug 712277 - Crash in nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::Close @ MOZ_Z_adler32
: Crash in nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::Close @ MOZ_Z...
Status: RESOLVED FIXED
[qa-]
: crash, regression
Product: Core
Classification: Components
Component: Networking: Cache (show other bugs)
: 9 Branch
: All Windows 7
: -- critical (vote)
: mozilla12
Assigned To: Geoff Brown [:gbrown]
:
: Patrick McManus [:mcmanus]
Mentors:
Depends on: 715198
Blocks: 648429
  Show dependency treegraph
 
Reported: 2011-12-20 06:00 PST by Scoobidiver (away)
Modified: 2012-03-08 11:19 PST (History)
11 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed


Attachments
patch to ensure avail_in set to 0 before exiting nsCompressOutputStreamWrapper::Write (1.30 KB, patch)
2012-01-05 17:01 PST, Geoff Brown [:gbrown]
no flags Details | Diff | Splinter Review
patch to close z-stream on error during Write (1.39 KB, patch)
2012-01-12 23:03 PST, Geoff Brown [:gbrown]
michal.novotny: review+
Details | Diff | Splinter Review

Description Scoobidiver (away) 2011-12-20 06:00:55 PST
It's a new crash signature that first appeared in 11.0a1/20111218.
The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cbb0233c7ba8&tochange=a5e63e00db27
It's probably a regression from bug 648429.

Signature 	MOZ_Z_adler32 More Reports Search
UUID	5b593ba4-aa73-46c5-bba3-ad3512111220
Date Processed	2011-12-20 01:03:57.81488
Uptime	7239
Last Crash	3.3 weeks before submission
Install Age	9.6 hours since version was first installed.
Install Time	2011-12-19 23:25:35
Product	Firefox
Version	11.0a1
Build ID	20111219031209
Release Channel	nightly
OS	Windows NT
OS Version	5.1.2600 Service Pack 3
Build Architecture	x86
Build Architecture Info	AuthenticAMD family 15 model 28 stepping 0
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0xc25fc9c
App Notes 	AdapterVendorID: 0x1002, AdapterDeviceID: 0x5960, AdapterSubsysID: 0130174b, AdapterDriverVersion: 6.14.10.6462
D3D10 Layers? D3D10 Layers-
D3D9 Layers? D3D9 Layers-
EMCheckCompatibility	True

Frame 	Module 	Signature [Expand] 	Source
0 	xul.dll 	MOZ_Z_adler32 	modules/zlib/src/adler32.c:115
1 	xul.dll 	fill_window 	modules/zlib/src/deflate.c:1380
2 	xul.dll 	deflate_slow 	modules/zlib/src/deflate.c:1631
3 	xul.dll 	MOZ_Z_deflate 	modules/zlib/src/deflate.c:822
4 	ntdll.dll 	RtlpFreeDebugInfo 	
5 	ntdll.dll 	RtlpFreeDebugInfo 	
6 	xul.dll 	nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::Close 	netwerk/cache/nsCacheEntryDescriptor.cpp:941
7 	user32.dll 	GetShellWindow 	
8 	ntdll.dll 	_SEH_epilog 	
9 	ntdll.dll 	RtlpFreeDebugInfo 	
10 	ntdll.dll 	RtlpFreeDebugInfo 	
11 	ntdll.dll 	RtlDeleteCriticalSection 	
12 	ntdll.dll 	RtlDeleteCriticalSection 	
13 	mozutils.dll 	je_free 	memory/jemalloc/jemalloc.c:6580
14 	mozutils.dll 	je_free 	memory/jemalloc/jemalloc.c:6580
15 	xul.dll 	nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::`scalar deleting destructor' 	
16 	xul.dll 	nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::Release 	netwerk/cache/nsCacheEntryDescriptor.cpp:876
17 	xul.dll 	nsStreamListenerTee::OnStopRequest 	netwerk/base/src/nsStreamListenerTee.cpp:70
18 	xul.dll 	nsHttpChannel::OnStopRequest 	netwerk/protocol/http/nsHttpChannel.cpp:4341

More reports at:
https://crash-stats.mozilla.com/report/list?signature=MOZ_Z_adler32
Comment 1 Geoff Brown [:gbrown] 2012-01-03 17:05:33 PST
Looking over the crash stats, I note that there are a few modules/zlib/src/adler32.c:115 crashes that happened before cache compression (bug 648429), but these have been very infrequent. The overwhelming majority of these crashes are now related to cache compression (during destruction of nsCompressOutputStreamWrapper). Is cache compression using zlib much more frequently than zlib was being used by other clients, or is cache compression using zlib incorrectly, or in a manner that antagonizes a zlib problem?

All of these crashes are exclusively on Windows.

The crashing zlib code is simply accessing the deflation buffer -- I don't see a problem there. In the cache compression cases, the crash is always during destruction of the nsCompressOutputStreamWrapper, which owns the deflation buffer, so I am suspicious that the buffer is being deleted before deflation is complete, but I don't immediately see how.
Comment 2 Geoff Brown [:gbrown] 2012-01-03 17:41:18 PST
Slight revision to my previous comment: The crashing zlib code is *reading* a z-stream buffer. This is likely the nsCompressOutputStreamWrapper's mZstream.next_in buffer, which should be set to the buffer passed in the most recent call to nsCompressOutputStreamWrapper::Write. I don't recall who owns that buffer or when it is deleted.
Comment 3 Geoff Brown [:gbrown] 2012-01-05 16:45:55 PST
I do not have a Windows development environment and I cannot reproduce the crash in a Linux environment. If anyone with a Windows environment would like to help, please let me know.

As noted in comment 2, stack traces indicate a crash in adler32 reading the z-stream next_in buffer; this always occurs while running  nsCompressOutputStreamWrapper::Close(). During nsCompressOutputStreamWrapper::Close, the z-stream next_in buffer should not be accessed: next_in is set in nsCompressOutputStreamWrapper::Write to point to the buffer passed to the output stream -- after Write completes, the buffer may no longer be valid. Write() contains a loop which should read and compress the entire next_in buffer, leaving the z-stream avail_in count = 0 -- an indication that next_in should no longer be accessed.

So the z-stream should never have a non-zero avail_in count except while Write is executing -- avail_in should be 0 when Close is called. My (Linux) testing confirms this to be the case.

The crashing adler32 code is called from fill_window() -- http://mxr.mozilla.org/mozilla-central/source/modules/zlib/src/deflate.c#1380 and this code is not executed if the stream's avail_in == 0: http://mxr.mozilla.org/mozilla-central/source/modules/zlib/src/deflate.c#1365. So it seems that, when the crash occurs, avail_in is not 0, and Close is being executed.

I note that the Write() deflation loop exits prematurely in the case of an unexpected error, and in these cases, avail_in might be non-zero:
http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsCacheEntryDescriptor.cpp#913
http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsCacheEntryDescriptor.cpp#922
I have no evidence that this is the cause of the crash, but it seems the most obvious answer.

Less likely possibilities that might explain the crash:
 - Write and Close are executing simultaneously, on separate threads
 - the z-stream has been corrupted between calls to Write and Close
Comment 4 Geoff Brown [:gbrown] 2012-01-05 17:01:24 PST
Created attachment 586280 [details] [diff] [review]
patch to ensure avail_in set to 0 before exiting nsCompressOutputStreamWrapper::Write
Comment 5 Robert Kaiser 2012-01-10 09:49:09 PST
This crash is #9 on 11 right now, we should track it for that version, I think.
Comment 6 Brad Lassey [:blassey] (use needinfo?) 2012-01-11 13:40:16 PST
Michal, review ping?
Comment 7 Michal Novotny (:michal) 2012-01-11 18:15:54 PST
Comment on attachment 586280 [details] [diff] [review]
patch to ensure avail_in set to 0 before exiting nsCompressOutputStreamWrapper::Write

>    if (mStreamInitialized) {
>        // complete compression of any data remaining in the zlib stream
>        do {
>            zerr = deflate(&mZstream, Z_FINISH);
>            rv = WriteBuffer();
>        } while (zerr == Z_OK && rv == NS_OK);
>        deflateEnd(&mZstream);
>    }

Wouldn't it be better to handle this issue in nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::Close()? I don't think that we want to call either deflate() or WriteBuffer() if nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::Write() fails.

By the way, we should doom the entry if Write() fails because the entry is invalid. Uncompressed entries suffer from this problem as well.
Comment 8 Geoff Brown [:gbrown] 2012-01-12 23:03:09 PST
Created attachment 588327 [details] [diff] [review]
patch to close z-stream on error during Write

I checked the zlib sample code for similar error handling and agree: on error, there is no need to call deflate/WriteBuffer -- just deflateEnd. In this patch, deflateEnd is called on error, and the existing mStreamInitialized flag is cleared so that the z-stream will not be touched again (in Close). Now there is no need to change avail_in -- we're saying that the entire z-stream should not be accessed further.

I considered setting a new flag on write errors, then implementing different Close behaviour in response to the flag, but handling the error immediately in Write seems more direct.

Regarding dooming invalid entries, the behaviour is the same for compressed and uncompressed streams (as you note). Also, I have tested the error cases and found that the cache entry is abbreviated, wasting space for an invalid/incomplete entry, but the page is displayed correctly. In this light, I would prefer to keep this patch focused on the crash and handle the dooming issue in a separate bug.
Comment 9 Geoff Brown [:gbrown] 2012-01-12 23:04:57 PST
https://tbpl.mozilla.org/?tree=Try&rev=a3f0b97cccad
Comment 10 Michal Novotny (:michal) 2012-01-16 16:22:27 PST
Comment on attachment 588327 [details] [diff] [review]
patch to close z-stream on error during Write

If writing fails and the caller ignores the error and calls Write() again, then mZStream will be initialized again, right? Do we really want to allow this?
Comment 11 Geoff Brown [:gbrown] 2012-01-19 10:40:14 PST
(In reply to Michal Novotny (:michal) from comment #10)
> Comment on attachment 588327 [details] [diff] [review]
> patch to close z-stream on error during Write
> 
> If writing fails and the caller ignores the error and calls Write() again,
> then mZStream will be initialized again, right? Do we really want to allow
> this?

I think it is okay to allow this: mZstream has been closed/de-initialized (via deflateEnd) and is ready for re-initialization -- using it this way should be fine from the zlib perspective. The file contents will be corrupt (a new z-stream following an incomplete z-stream)...but that's no worse than they already were (an incomplete z-stream).

An alternative would be to set a new flag on error, mWriteError, and check at the start of ::Write(): if(mWriteError) return FAILED. Does that provide a gain?
Comment 12 Geoff Brown [:gbrown] 2012-01-22 19:39:46 PST
The number of crashes has decreased significantly with check-in of the patch disabling cache compression: See bug 715198.

(Of course I still want to proceed with review/check-in of this patch.)
Comment 13 Scoobidiver (away) 2012-01-23 01:02:21 PST
(In reply to Geoff Brown [:gbrown] from comment #12)
> The number of crashes has decreased significantly with check-in of the patch
> disabling cache compression: See bug 715198.
It's more than that as there have been no crashes in 12.0a1 since 12.0a1/20120118. I think it should be marked as a dupe of bug 715198:
https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A12.0a1&query_search=signature&query_type=contains&query=MOZ_Z_adler32&reason_type=contains&range_value=4&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=MOZ_Z_adler32
Comment 14 Michal Novotny (:michal) 2012-01-23 02:25:12 PST
Comment on attachment 588327 [details] [diff] [review]
patch to close z-stream on error during Write

(In reply to Geoff Brown [:gbrown] from comment #11)
> An alternative would be to set a new flag on error, mWriteError, and check
> at the start of ::Write(): if(mWriteError) return FAILED. Does that provide
> a gain?

Let's solve this in a follow up bug.
Comment 16 Ed Morley [:emorley] 2012-01-25 18:08:37 PST
https://hg.mozilla.org/mozilla-central/rev/eb21d0156496
Comment 17 Alex Keybl [:akeybl] 2012-02-17 15:16:50 PST
Are we comfortable enough with this fix to consider uplifting to Firefox 11 for Beta 4 (going to build 2/21)?
Comment 18 Geoff Brown [:gbrown] 2012-02-17 16:03:41 PST
(In reply to Alex Keybl [:akeybl] from comment #17)
> Are we comfortable enough with this fix to consider uplifting to Firefox 11
> for Beta 4 (going to build 2/21)?

I have no concerns about this patch -- I don't see how it could do harm. I am unsure of how effective this patch is because we disabled cache compression first: I am comfortable with an uplift to 11, but unsure of the benefit.
Comment 20 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-03-05 16:02:58 PST
What can be done to verify this fix?
Comment 21 Geoff Brown [:gbrown] 2012-03-05 19:22:33 PST
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #20)
> What can be done to verify this fix?

The crash only affects the cache compression feature, and cache compression is now disabled. Cache compression can be enabled by changing the pref browser.cache.compression_level=5. 

If the crash can be reproduced with compression enabled and the fix applied, then the fix is ineffective. If the crash cannot be reproduced with compression enabled and the fix applied, some doubt will remain since we have not identified particular steps to reproduce this crash.
Comment 22 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-03-08 11:19:34 PST
Marking this qa- since we don't have a reproducible case. If someone can reproduce this, please verify the fix (or provide QA with the case so we can verify it).

Note You need to log in before you can comment on or make changes to this bug.