Closed Bug 712277 Opened 12 years ago Closed 12 years ago

Crash in nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::Close @ MOZ_Z_adler32

Categories

(Core :: Networking: Cache, defect)

9 Branch
All
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla12
Tracking Status
firefox11 + fixed

People

(Reporter: scoobidiver, Assigned: gbrown)

References

Details

(Keywords: crash, regression, Whiteboard: [qa-])

Crash Data

Attachments

(1 file, 1 obsolete file)

It's a new crash signature that first appeared in 11.0a1/20111218.
The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cbb0233c7ba8&tochange=a5e63e00db27
It's probably a regression from bug 648429.

Signature 	MOZ_Z_adler32 More Reports Search
UUID	5b593ba4-aa73-46c5-bba3-ad3512111220
Date Processed	2011-12-20 01:03:57.81488
Uptime	7239
Last Crash	3.3 weeks before submission
Install Age	9.6 hours since version was first installed.
Install Time	2011-12-19 23:25:35
Product	Firefox
Version	11.0a1
Build ID	20111219031209
Release Channel	nightly
OS	Windows NT
OS Version	5.1.2600 Service Pack 3
Build Architecture	x86
Build Architecture Info	AuthenticAMD family 15 model 28 stepping 0
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0xc25fc9c
App Notes 	AdapterVendorID: 0x1002, AdapterDeviceID: 0x5960, AdapterSubsysID: 0130174b, AdapterDriverVersion: 6.14.10.6462
D3D10 Layers? D3D10 Layers-
D3D9 Layers? D3D9 Layers-
EMCheckCompatibility	True

Frame 	Module 	Signature [Expand] 	Source
0 	xul.dll 	MOZ_Z_adler32 	modules/zlib/src/adler32.c:115
1 	xul.dll 	fill_window 	modules/zlib/src/deflate.c:1380
2 	xul.dll 	deflate_slow 	modules/zlib/src/deflate.c:1631
3 	xul.dll 	MOZ_Z_deflate 	modules/zlib/src/deflate.c:822
4 	ntdll.dll 	RtlpFreeDebugInfo 	
5 	ntdll.dll 	RtlpFreeDebugInfo 	
6 	xul.dll 	nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::Close 	netwerk/cache/nsCacheEntryDescriptor.cpp:941
7 	user32.dll 	GetShellWindow 	
8 	ntdll.dll 	_SEH_epilog 	
9 	ntdll.dll 	RtlpFreeDebugInfo 	
10 	ntdll.dll 	RtlpFreeDebugInfo 	
11 	ntdll.dll 	RtlDeleteCriticalSection 	
12 	ntdll.dll 	RtlDeleteCriticalSection 	
13 	mozutils.dll 	je_free 	memory/jemalloc/jemalloc.c:6580
14 	mozutils.dll 	je_free 	memory/jemalloc/jemalloc.c:6580
15 	xul.dll 	nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::`scalar deleting destructor' 	
16 	xul.dll 	nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::Release 	netwerk/cache/nsCacheEntryDescriptor.cpp:876
17 	xul.dll 	nsStreamListenerTee::OnStopRequest 	netwerk/base/src/nsStreamListenerTee.cpp:70
18 	xul.dll 	nsHttpChannel::OnStopRequest 	netwerk/protocol/http/nsHttpChannel.cpp:4341

More reports at:
https://crash-stats.mozilla.com/report/list?signature=MOZ_Z_adler32
Looking over the crash stats, I note that there are a few modules/zlib/src/adler32.c:115 crashes that happened before cache compression (bug 648429), but these have been very infrequent. The overwhelming majority of these crashes are now related to cache compression (during destruction of nsCompressOutputStreamWrapper). Is cache compression using zlib much more frequently than zlib was being used by other clients, or is cache compression using zlib incorrectly, or in a manner that antagonizes a zlib problem?

All of these crashes are exclusively on Windows.

The crashing zlib code is simply accessing the deflation buffer -- I don't see a problem there. In the cache compression cases, the crash is always during destruction of the nsCompressOutputStreamWrapper, which owns the deflation buffer, so I am suspicious that the buffer is being deleted before deflation is complete, but I don't immediately see how.
Slight revision to my previous comment: The crashing zlib code is *reading* a z-stream buffer. This is likely the nsCompressOutputStreamWrapper's mZstream.next_in buffer, which should be set to the buffer passed in the most recent call to nsCompressOutputStreamWrapper::Write. I don't recall who owns that buffer or when it is deleted.
I do not have a Windows development environment and I cannot reproduce the crash in a Linux environment. If anyone with a Windows environment would like to help, please let me know.

As noted in comment 2, stack traces indicate a crash in adler32 reading the z-stream next_in buffer; this always occurs while running  nsCompressOutputStreamWrapper::Close(). During nsCompressOutputStreamWrapper::Close, the z-stream next_in buffer should not be accessed: next_in is set in nsCompressOutputStreamWrapper::Write to point to the buffer passed to the output stream -- after Write completes, the buffer may no longer be valid. Write() contains a loop which should read and compress the entire next_in buffer, leaving the z-stream avail_in count = 0 -- an indication that next_in should no longer be accessed.

So the z-stream should never have a non-zero avail_in count except while Write is executing -- avail_in should be 0 when Close is called. My (Linux) testing confirms this to be the case.

The crashing adler32 code is called from fill_window() -- http://mxr.mozilla.org/mozilla-central/source/modules/zlib/src/deflate.c#1380 and this code is not executed if the stream's avail_in == 0: http://mxr.mozilla.org/mozilla-central/source/modules/zlib/src/deflate.c#1365. So it seems that, when the crash occurs, avail_in is not 0, and Close is being executed.

I note that the Write() deflation loop exits prematurely in the case of an unexpected error, and in these cases, avail_in might be non-zero:
http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsCacheEntryDescriptor.cpp#913
http://mxr.mozilla.org/mozilla-central/source/netwerk/cache/nsCacheEntryDescriptor.cpp#922
I have no evidence that this is the cause of the crash, but it seems the most obvious answer.

Less likely possibilities that might explain the crash:
 - Write and Close are executing simultaneously, on separate threads
 - the z-stream has been corrupted between calls to Write and Close
Assignee: nobody → gbrown
This crash is #9 on 11 right now, we should track it for that version, I think.
Michal, review ping?
Comment on attachment 586280 [details] [diff] [review]
patch to ensure avail_in set to 0 before exiting nsCompressOutputStreamWrapper::Write

>    if (mStreamInitialized) {
>        // complete compression of any data remaining in the zlib stream
>        do {
>            zerr = deflate(&mZstream, Z_FINISH);
>            rv = WriteBuffer();
>        } while (zerr == Z_OK && rv == NS_OK);
>        deflateEnd(&mZstream);
>    }

Wouldn't it be better to handle this issue in nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::Close()? I don't think that we want to call either deflate() or WriteBuffer() if nsCacheEntryDescriptor::nsCompressOutputStreamWrapper::Write() fails.

By the way, we should doom the entry if Write() fails because the entry is invalid. Uncompressed entries suffer from this problem as well.
Attachment #586280 - Flags: review?(michal.novotny)
I checked the zlib sample code for similar error handling and agree: on error, there is no need to call deflate/WriteBuffer -- just deflateEnd. In this patch, deflateEnd is called on error, and the existing mStreamInitialized flag is cleared so that the z-stream will not be touched again (in Close). Now there is no need to change avail_in -- we're saying that the entire z-stream should not be accessed further.

I considered setting a new flag on write errors, then implementing different Close behaviour in response to the flag, but handling the error immediately in Write seems more direct.

Regarding dooming invalid entries, the behaviour is the same for compressed and uncompressed streams (as you note). Also, I have tested the error cases and found that the cache entry is abbreviated, wasting space for an invalid/incomplete entry, but the page is displayed correctly. In this light, I would prefer to keep this patch focused on the crash and handle the dooming issue in a separate bug.
Attachment #586280 - Attachment is obsolete: true
Attachment #588327 - Flags: review?(michal.novotny)
Comment on attachment 588327 [details] [diff] [review]
patch to close z-stream on error during Write

If writing fails and the caller ignores the error and calls Write() again, then mZStream will be initialized again, right? Do we really want to allow this?
(In reply to Michal Novotny (:michal) from comment #10)
> Comment on attachment 588327 [details] [diff] [review]
> patch to close z-stream on error during Write
> 
> If writing fails and the caller ignores the error and calls Write() again,
> then mZStream will be initialized again, right? Do we really want to allow
> this?

I think it is okay to allow this: mZstream has been closed/de-initialized (via deflateEnd) and is ready for re-initialization -- using it this way should be fine from the zlib perspective. The file contents will be corrupt (a new z-stream following an incomplete z-stream)...but that's no worse than they already were (an incomplete z-stream).

An alternative would be to set a new flag on error, mWriteError, and check at the start of ::Write(): if(mWriteError) return FAILED. Does that provide a gain?
The number of crashes has decreased significantly with check-in of the patch disabling cache compression: See bug 715198.

(Of course I still want to proceed with review/check-in of this patch.)
(In reply to Geoff Brown [:gbrown] from comment #12)
> The number of crashes has decreased significantly with check-in of the patch
> disabling cache compression: See bug 715198.
It's more than that as there have been no crashes in 12.0a1 since 12.0a1/20120118. I think it should be marked as a dupe of bug 715198:
https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A12.0a1&query_search=signature&query_type=contains&query=MOZ_Z_adler32&reason_type=contains&range_value=4&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=MOZ_Z_adler32
Comment on attachment 588327 [details] [diff] [review]
patch to close z-stream on error during Write

(In reply to Geoff Brown [:gbrown] from comment #11)
> An alternative would be to set a new flag on error, mWriteError, and check
> at the start of ::Write(): if(mWriteError) return FAILED. Does that provide
> a gain?

Let's solve this in a follow up bug.
Attachment #588327 - Flags: review?(michal.novotny) → review+
Depends on: 715198
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/eb21d0156496
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Are we comfortable enough with this fix to consider uplifting to Firefox 11 for Beta 4 (going to build 2/21)?
(In reply to Alex Keybl [:akeybl] from comment #17)
> Are we comfortable enough with this fix to consider uplifting to Firefox 11
> for Beta 4 (going to build 2/21)?

I have no concerns about this patch -- I don't see how it could do harm. I am unsure of how effective this patch is because we disabled cache compression first: I am comfortable with an uplift to 11, but unsure of the benefit.
What can be done to verify this fix?
Whiteboard: [qa?]
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #20)
> What can be done to verify this fix?

The crash only affects the cache compression feature, and cache compression is now disabled. Cache compression can be enabled by changing the pref browser.cache.compression_level=5. 

If the crash can be reproduced with compression enabled and the fix applied, then the fix is ineffective. If the crash cannot be reproduced with compression enabled and the fix applied, some doubt will remain since we have not identified particular steps to reproduce this crash.
Marking this qa- since we don't have a reproducible case. If someone can reproduce this, please verify the fix (or provide QA with the case so we can verify it).
Whiteboard: [qa?] → [qa-]
You need to log in before you can comment on or make changes to this bug.