Closed Bug 1239789 Opened 8 years ago Closed 8 years ago

crash in zxx_stream::Alloc

Categories

(Core :: mozglue, defect)

Unspecified
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox47 --- fixed
firefox48 --- fixed
fennec 46+ ---

People

(Reporter: snorp, Assigned: droeh)

Details

(Keywords: crash)

Crash Data

Attachments

(3 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-80a6baf9-8677-4797-b5df-7836f2160110.
=============================================================

Looks like perhaps a null pointer for the zxx_stream in Alloc()?
glandium what do you think?
Flags: needinfo?(mh+mozilla)
The crash is on
http://hg.mozilla.org/releases/mozilla-beta/annotate/d4f5cfb02dda/mozglue/linker/Zip.h#l114
which means zlib is requesting an unexpected buffer size.
Flags: needinfo?(mh+mozilla)
Ah, forgot you have to dig deeper for line numbers sometimes. Alright, let's add some logging or something in that case so we can add the new buffer. I guess someone is shipping a weird gzip.
Assignee: nobody → snorp
Comment on attachment 8708449 [details] [diff] [review]
Allow for gzip window allocations that are smaller than the maximum

Review of attachment 8708449 [details] [diff] [review]:
-----------------------------------------------------------------

Considering the zlib caller is the allocation of the state buffer, that would suggest both buffers are already allocated. The question becomes why are both buffers allocated and yet a new zlib call wants to allocate them? Especially when the inflateInit2 and inflateEnd calls are in mutex-protected section.
Attachment #8708449 - Flags: review?(mh+mozilla) → review-
(In reply to Mike Hommey [:glandium] from comment #5)
> Comment on attachment 8708449 [details] [diff] [review]
> Allow for gzip window allocations that are smaller than the maximum
> 
> Review of attachment 8708449 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Considering the zlib caller is the allocation of the state buffer, that
> would suggest both buffers are already allocated. The question becomes why
> are both buffers allocated and yet a new zlib call wants to allocate them?
> Especially when the inflateInit2 and inflateEnd calls are in mutex-protected
> section.

We don't know if that's happening at all, do we? Couldn't it just be failing the 'else if' for the window allocation because we are using a direct equality comparison? The comment we have states that the windowBuf is the maximum allowable size, so surely a smaller buffer is possible and should be allowed.
Flags: needinfo?(mh+mozilla)
That answer is easily answered by looking at the zlib source code, and the zlib source code says the only two allocations are:

state = (struct inflate_state FAR *)
            ZALLOC(strm, 1, sizeof(struct inflate_state));

(where struct inflate_state is what is < 10k)

state->window = (unsigned char FAR *)
                        ZALLOC(strm, 1U << state->wbits,
                               sizeof(unsigned char));

Where state->wbits is essentially the absolute value of windowBits in the szip headser, and that's the 16th byte in the szipped file. And that is fixed to -SzipCompress::winSizeLog in szip.cpp, which value is 15.

So yes, in the theoretical case a szip header would advertize a different window size, the test wouldn't work, so in that sense, it's possibly worth changing, but it's not going to do any good for this crash.
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #7)
Oops, you're right. Interesting. Not sure what's going on here then.
Assignee: snorp → droeh
tracking-fennec: --- → ?
tracking-fennec: ? → 45+
Attached patch Logging patchSplinter Review
A patch to add logcat output when StaticAllocator::Alloc fails.
Attachment #8726300 - Flags: review?(snorp)
Comment on attachment 8726300 [details] [diff] [review]
Logging patch

Review of attachment 8726300 [details] [diff] [review]:
-----------------------------------------------------------------

Seems ok to me with nits, but let's make sure glandium agrees

::: mozglue/linker/Zip.cpp
@@ +18,5 @@
> +    return stateBuf.get();
> +  } else if (items * size == windowBuf.size) {
> +    return windowBuf.get();
> +  } else {
> +    ERROR("StaticAllocator failed with items: %i, size: %i", items, size);

This should be %u for unsigned values
Attachment #8726300 - Flags: review?(snorp)
Attachment #8726300 - Flags: review?(mh+mozilla)
Attachment #8726300 - Flags: review+
Comment on attachment 8726300 [details] [diff] [review]
Logging patch

Review of attachment 8726300 [details] [diff] [review]:
-----------------------------------------------------------------

What is this expected to diagnose that we don't already know? We have a backtrace that can tell us which of the two allocations I mentioned in comment 7 it is that leads to the crash (hint: it's https://github.com/android/platform_external_zlib/blob/master/src/inflate.c#L208 ). Again this is not a problem of the requested allocation being too big, the problem is that we've already allocated both buffers. *That* is what needs sorting out.
Attachment #8726300 - Flags: review?(mh+mozilla) → review-
(In reply to Mike Hommey [:glandium] from comment #11)

I don't see how this is caused by already having allocated both buffers; we know the crash is happening on http://hg.mozilla.org/releases/mozilla-beta/annotate/d4f5cfb02dda/mozglue/linker/Zip.h#l114 (as you pointed out in comment 2). If the crash was caused by both buffers already being allocated, surely we would be hitting http://hg.mozilla.org/releases/mozilla-beta/annotate/d4f5cfb02dda/mozglue/linker/Zip.h#l81 instead?
(Forgot to NI, see above.)
Flags: needinfo?(mh+mozilla)
So, let's take a step back: this is what zxx_stream::Alloc (where zxx_stream::StaticAllocator::Alloc is inlined) looks like in 44.0b7, which is the version where the crash from comment 0 occurred:

   24a34:       6b80            ldr     r0, [r0, #56]   ; 0x38
        loads zStream->allocator into $r0

   24a36:       b1d8            cbz     r0, 24a70 <_ZN10zxx_stream5AllocEPvjj+0x3c>
        if $0 is zero, go to 24a70

   24a38:       2901            cmp     r1, #1
   24a3a:       d108            bne.n   24a4e <_ZN10zxx_stream5AllocEPvjj+0x1a>
        if items is not 1, go to 24a4e

   24a3c:       f5b2 5f40       cmp.w   r2, #12288      ; 0x3000
   24a40:       d805            bhi.n   24a4e <_ZN10zxx_stream5AllocEPvjj+0x1a>
        if size is > 0x3000, go to 24a4e

   24a42:       f500 5340       add.w   r3, r0, #12288  ; 0x3000
   24a46:       781a            ldrb    r2, [r3, #0]
   24a48:       b97a            cbnz    r2, 24a6a <_ZN10zxx_stream5AllocEPvjj+0x36>
        if this->stateBuf.inUse, go to 24a6a

   24a4a:       7019            strb    r1, [r3, #0]
        set this->stateBuf.inUse to 1

   24a4c:       4770            bx      lr
        return

   24a4e:       434a            muls    r2, r1
   24a50:       f5b2 4f00       cmp.w   r2, #32768      ; 0x8000
   24a54:       d109            bne.n   24a6a <_ZN10zxx_stream5AllocEPvjj+0x36>
        if items * size != 0x8000, go to 24a6a

   24a56:       f500 5040       add.w   r0, r0, #12288  ; 0x3000
   24a5a:       3001            adds    r0, #1
   24a5c:       f500 4300       add.w   r3, r0, #32768  ; 0x8000
   24a60:       781a            ldrb    r2, [r3, #0]
   24a62:       b912            cbnz    r2, 24a6a <_ZN10zxx_stream5AllocEPvjj+0x36>
        if this->windowBuf.inUse, go to 24a6a

   24a64:       2201            movs    r2, #1
   24a66:       701a            strb    r2, [r3, #0]
        set this->windowBuf.inUse to 1

   24a68:       4770            bx      lr
        return

   24a6a:       2300            movs    r3, #0
   24a6c:       601b            str     r3, [r3, #0]
        MOZ_CRASH

   24a6e:       deff            udf     #255    ; 0xff
        undefined instruction. Not sure why it's there. Maybe to align the next instruction

   24a70:       fb02 f001       mul.w   r0, r2, r1
   24a74:       f7f7 ba28       b.w     1bec8 <_Znaj>
        return operator new(items*size)

The crash is on 24a6c. There are three paths leading to it. One of them is http://hg.mozilla.org/releases/mozilla-beta/annotate/d4f5cfb02dda/mozglue/linker/Zip.h#l114 and two of them are http://hg.mozilla.org/releases/mozilla-beta/annotate/d4f5cfb02dda/mozglue/linker/Zip.h#l81. Debug info can't know which is which, because they're all at the same address, and apparently, http://hg.mozilla.org/releases/mozilla-beta/annotate/d4f5cfb02dda/mozglue/linker/Zip.h#l114 is the one that is attributed. But it's a red herring.
Looking at the register values on the crash, you can see that r1 and r2 are both 0x1. If we really followed the path leading to http://hg.mozilla.org/releases/mozilla-beta/annotate/d4f5cfb02dda/mozglue/linker/Zip.h#l114, r1 would be items, and r2 would be items * size. If items is 1 and items * size is 1, then it means size would have been 1, and the first condition should have matched, and there should have been no crash. So this is not the code path that was followed. That leaves the two paths that lead to http://hg.mozilla.org/releases/mozilla-beta/annotate/d4f5cfb02dda/mozglue/linker/Zip.h#l81. In both code paths, r2 is inUse from one of the ZStreamBuf, so it could be either. Both code paths are using r3 for the address of inUse, and r3 is overwritten for the MOZ_CRASH, so it's not possible to tell which is which from r3, but in the case of windowBuf, r0 would have an odd value because of the adds r0, #1, and in the crash dump, it is even, which points to the stateBuf.get() code path being the one that is happening.

Conclusion: the crash is in http://hg.mozilla.org/releases/mozilla-beta/annotate/d4f5cfb02dda/mozglue/linker/Zip.h#l81 called from http://hg.mozilla.org/releases/mozilla-beta/annotate/d4f5cfb02dda/mozglue/linker/Zip.h#l110 called from https://github.com/android/platform_external_zlib/blob/master/src/inflate.c#L208
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #14)

OK, I see what you're saying. Thanks for your help.
Attached patch Proposed patch (obsolete) — Splinter Review
So, I think reentrancy is the culprit here. Comments in Mappable.cpp suggest that this shouldn't be a problem, but do not seem to take into account the possibility that upon reentering, the static buffer we need will already be in use. This patch is a simple but not particularly elegant fix: create two static allocators and use the second as a fallback if the first fails. Obviously this isn't a perfect solution, as we could potentially reenter twice and once again find ourselves without an available buffer, but I don't see any general way of avoiding this possibility.
Attachment #8731688 - Flags: review?(mh+mozilla)
tracking-fennec: 45+ → 46+
(In reply to Dylan Roeh (:droeh) from comment #16)
> Created attachment 8731688 [details] [diff] [review]
> Proposed patch
> 
> So, I think reentrancy is the culprit here. Comments in Mappable.cpp suggest
> that this shouldn't be a problem

Please can you be more specific?
(In reply to Mike Hommey [:glandium] from comment #17)
> Please can you be more specific?

https://dxr.mozilla.org/mozilla-central/source/mozglue/linker/Mappable.cpp#508

The comment notes that the lock can be re-acquired by the thread already holding it, but goes on to suggest that the worst outcome is a possible minor performance hit. I don't think that's the case when using the static allocator, though -- we may find the needed buffer is already in use.
See above.
Flags: needinfo?(mh+mozilla)
Comment on attachment 8731688 [details] [diff] [review]
Proposed patch

Review of attachment 8731688 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay, I had to do some digging and think about it thoroughly. The comment in Mappable.cpp is much older than the StaticAllocator code, so it was correct when it was written. The StaticAllocator changed matters, and, as it appears, broke corner cases. I think the work around makes sense, although I have a nit on the implementation, see below. I'll add that it doesn't matter that it wouldn't work if it re-entered for the third time, because in that situation, bad things would already happen because we use an alternative stack (sigaltstack). If we were re-entering again, we'd be overwriting our stack already.

::: mozglue/linker/Zip.h
@@ +136,5 @@
>        }
>      }
>  
>      ZStreamBuf<0x3000> stateBuf; // 0x3000 is an arbitrary size above 10K.
>      ZStreamBuf<1 << MAX_WBITS> windowBuf;

I'd rather add two buffers here than make the caller handle two allocators.
Attachment #8731688 - Flags: review?(mh+mozilla)
Flags: needinfo?(mh+mozilla)
Alright, here's a version that adds buffers to the static allocator rather than adding a second allocator.
Attachment #8731688 - Attachment is obsolete: true
Attachment #8733898 - Flags: review?(mh+mozilla)
Comment on attachment 8733898 [details] [diff] [review]
Proposed patch (updated)

Review of attachment 8733898 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/Zip.h
@@ +106,5 @@
>    public:
>      void *Alloc(uInt items, uInt size)
>      {
> +      if (items == 1 && size <= stateBuf1.size) {
> +        char* res = stateBuf1.get();

A shorter form would be:
char *res = stateBuf1.get() || stateBuf2.get();
MOZ_RELEASE_ASSERT(res, "ZStreamBuf already in use");
return res;

same for windowBuf.
Attachment #8733898 - Flags: review?(mh+mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/39fb883bcd1c
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8733898 [details] [diff] [review]
Proposed patch (updated)

Approval Request Comment
[Feature/regressing bug #]:1059797
[User impact if declined]:Periodic crashes
[Describe test coverage new/current, TreeHerder]:Nightly
[Risks and why]:Low risk - behavior should be identical except when reentering (in which case this should prevent a crash.)
[String/UUID change made/needed]:None
Attachment #8733898 - Flags: approval-mozilla-beta?
Attachment #8733898 - Flags: approval-mozilla-aurora?
Comment on attachment 8733898 [details] [diff] [review]
Proposed patch (updated)

Crash fix for android, let's take this on aurora and verify the fix there as best possible. I don't see high volume of crashes with this signature so I'd like to avoid uplifting this to beta. If you disagree let me know and we could try taking it for beta 8.
Attachment #8733898 - Flags: approval-mozilla-beta?
Attachment #8733898 - Flags: approval-mozilla-beta-
Attachment #8733898 - Flags: approval-mozilla-aurora?
Attachment #8733898 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.