<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Reporter

Updated

•

11 years ago

Blocks: 1059228

Reporter

Updated

•

11 years ago

Blocks: 1059227

Reporter

Updated

•

11 years ago

Blocks: 1059205

Reporter

Updated

•

11 years ago

Blocks: 1059115

Reporter

Updated

•

11 years ago

Blocks: 1059114

Reporter

Updated

•

11 years ago

Blocks: 1056634

Reporter

Updated

•

11 years ago

Blocks: 1055882

Reporter

Updated

•

11 years ago

Blocks: 1055633

Reporter

Updated

•

11 years ago

Blocks: 1058453

Ed Morley [:emorley]

Comment 1

•

11 years ago

Great investigative work here - thank you! :-)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 2

•

11 years ago

#1 (and likely #3, and likely a bunch more - see deps) orange needs an owner. Milan, can you help please?

status-firefox33: --- → unaffected

status-firefox34: --- → affected

status-firefox-esr24: --- → unaffected

status-firefox-esr31: --- → unaffected

Flags: needinfo?(milan)

Comment 3

•

11 years ago

:snorp, is this something that's up your alley?

Flags: needinfo?(milan) → needinfo?(snorp)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 4

•

11 years ago

(In reply to Milan Sreckovic [:milan] from comment #3) > :snorp, is this something that's up your alley? Maybe, but I might have trouble finding cycles for it. It looks like the panda GL driver might just be crapping out? Not sure what we'll be able to do about that. I wish we knew which shader was hanging, though.

Flags: needinfo?(snorp)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 5

•

11 years ago

George, would you have cycles to look at this otherwise?

Comment 6

•

11 years ago

I have a Panda in my possession, so really I should just have a look.

Assignee: nobody → snorp

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 7

•

11 years ago

I was able to reproduce the hang (I think), after many tries running reftests. The stack I have in the compositor thread is: #0 0x400c61f8 in __futex_syscall3 () from /Users/snorp/source/jimdb-arm/lib/0123456789ABCDEF/system/lib/libc.so #1 0x400cadb4 in pthread_mutex_lock () from /Users/snorp/source/jimdb-arm/lib/0123456789ABCDEF/system/lib/libc.so #2 0x400ccd5c in dlmalloc () from /Users/snorp/source/jimdb-arm/lib/0123456789ABCDEF/system/lib/libc.so #3 0x400d000a in malloc () from /Users/snorp/source/jimdb-arm/lib/0123456789ABCDEF/system/lib/libc.so #4 0x4014e274 in inflateInit2_ () from /Users/snorp/source/jimdb-arm/lib/0123456789ABCDEF/system/lib/libz.so #5 0x5c11c194 in SeekableZStream::DecompressChunk (this=0x5c32e160, where=0x61c73000, chunk=1555, length=16384) at /Users/snorp/source/gecko-dev/mozglue/linker/SeekableZStream.cpp:84 #6 0x5c11b1de in MappableSeekableZStream::ensure (this=0x5c32e150, addr=<optimized out>) at /Users/snorp/source/gecko-dev/mozglue/linker/Mappable.cpp:531 #7 0x5c11a20a in SEGVHandler::handler (signum=11, info=0x6c8fdb40, context=0x6c8fdbc0) at /Users/snorp/source/gecko-dev/mozglue/linker/ElfLoader.cpp:1138 #8 0xffff0514 in ?? () #9 0xffff0514 in ?? () If accurate, this seems to be a problem in the on-demand library loading. Investigating a little more.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 8

•

11 years ago

Ah, so we are attempting to acquire the malloc() mutex here within a signal handler. As we know from doing profiler work, this is bad, because the lock can already be held elsewhere. With all threads suspended during signal handler execution, we have a deadlock. I am not sure if it's possible to avoid a malloc() here, but that looks like the best way to solve this.

Flags: needinfo?(mh+mozilla)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 9

•

11 years ago

I thought it might be possible to use the zalloc and zfree hooks in the z_stream to return stack allocated stuff, so I started playing with that. It looks like it needs at most two allocations in order to inflate one chunk. The first one is the internal inflate_state struct, which it uses to (surprise!) keep state. That seems to be a fixed size of 7116 bytes. Sometimes there is also a second allocation, which is related to the sliding window. I am not sure why I don't see that every time, but it always seems to be 32k. Unfortunately, when I return stack allocated buffers for these two, I get a crash. I'm not sure why yet, but it seems like this should be doable.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 10

•

11 years ago

Attached patch wip (obsolete) — Details — Splinter Review

This doesn't work, but I don't really understand why not. It crashes after a dozen or so chunks have been inflated.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 11

•

11 years ago

Attached patch Avoid malloc in on-demand linker (obsolete) — Details — Splinter Review

This version just preallocates the buffers on the heap during SeekableZStream construction instead of messing around with the stack. Seems to work locally, try run going here: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=77332663dffc

Attachment #8483919 - Attachment is obsolete: true

Attachment #8484357 - Flags: review?(mh+mozilla)

Assignee

Comment 12

•

11 years ago

Stepping back a bit: (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #8) > With all threads suspended during signal handler execution, we have a deadlock. This actually shouldn't be the case. That is, other threads should continue running during signal handler execution (and I just checked it's the case, although I didn't try on an arm device, but I don't think this should matter). With that eliminated, the only way to get in that decompression code is in case a segfault occurred in the address space of a library loaded by our linker. That only happens if running code from that library, or read/writing data from/to it. Now, afaik, the signal handler for a segfault runs on the thread that segfaulted, so the only way we'd get in the decompression code is if code on that thread is calling code from e.g. libxul, or read/writing data from/to it. And for a deadlock to happen, that code would need to be holding the system malloc lock at the same time. How can that happen? So, we'd be in a situation where the signal is not dispatched to the thread that segfaulted? I'd rather have a clear view of the situation before saying anything else.

Flags: needinfo?(mh+mozilla)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Assignee

Comment 13

•

11 years ago

Comment on attachment 8484357 [details] [diff] [review] Avoid malloc in on-demand linker Review of attachment 8484357 [details] [diff] [review]: ----------------------------------------------------------------- Considering several threads could be running a decompression in parallel, it's not safe to use the same buffer for all of them.

Attachment #8484357 - Flags: review?(mh+mozilla) → review-

Comment 14

•

11 years ago

(In reply to Mike Hommey [:glandium] (out from Sep 6 to Sep 22) from comment #12) > Stepping back a bit: > > (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #8) > > With all threads suspended during signal handler execution, we have a deadlock. > > This actually shouldn't be the case. That is, other threads should continue > running during signal handler execution (and I just checked it's the case, > although I didn't try on an arm device, but I don't think this should > matter). Indeed, this does appear to be the case, and docs back that up. > > With that eliminated, the only way to get in that decompression code is in > case a segfault occurred > in the address space of a library loaded by our linker. That only happens if > running code from that library, or read/writing data from/to it. > > Now, afaik, the signal handler for a segfault runs on the thread that > segfaulted, so the only way we'd get in the decompression code is if code on > that thread is calling code from e.g. libxul, or read/writing data from/to > it. > > And for a deadlock to happen, that code would need to be holding the system > malloc lock at the same time. How can that happen? > > So, we'd be in a situation where the signal is not dispatched to the thread > that segfaulted? Yeah, weirdness. I'm going to try to reproduce the hang again and get a stack trace for all threads.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 15

•

11 years ago

Attached file Stack traces of all threads during the hang — Details

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 16

•

11 years ago

I attached a file containing the stack traces of all threads during the hang. Here we can see there are actually several threads waiting on the malloc mutex. Thread 35 is the Compositor thread, and malloc is called from within libz. The stack trace goes south after that for some reason, but my guess is that we're in the signal handler. Threads 11 and 9 are Dalvik-owned, and are calling malloc from a dalvik/Android stuff. Thread 1 is the Android UI thread, and is calling malloc from a dalvik function. I think it is generally known that using pthread mutexes from signal handlers is Bad(tm), but I'm having trouble finding the specific reason. I think it's mostly due to reentrancy issues (trying to lock a mutex from a signal handler in a thread that already has the mutex locked). I see that we set SA_NODEFER on the linker signal handler, which means reentrancy in the signal handler itself is allowed. There is no way anything in CustomElf could use something that CustomElf itself loaded, though, so I don't see how that could be an issue.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 17

•

11 years ago

One interesting thing I noticed is that the address we segfaulted at is 0x6542884d. ElfLoader::GetHandleByPtr() think that's in libxul, but 'info sharedlibrary' disagrees. It claims libxul is from 0x63ef1140 to 0x65283914. Maybe the gdb info isn't accurate?

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 18

•

11 years ago

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #17) > One interesting thing I noticed is that the address we segfaulted at is > 0x6542884d. ElfLoader::GetHandleByPtr() think that's in libxul, but 'info > sharedlibrary' disagrees. It claims libxul is from 0x63ef1140 to 0x65283914. > Maybe the gdb info isn't accurate? I suppose it's possible the current fault would end up just expanding that range.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Assignee

Comment 19

•

11 years ago

Which one is the crashing thread? #35?

Comment 20

•

11 years ago

(In reply to Mike Hommey [:glandium] (out from Sep 6 to Sep 22) from comment #19) > Which one is the crashing thread? #35? You mean which one is running the SEGV handler? That's 35, yes.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Assignee

Comment 21

•

11 years ago

And which one is the one that actually crashed?

Comment 22

•

11 years ago

(In reply to Mike Hommey [:glandium] (out from Sep 6 to Sep 22) from comment #21) > And which one is the one that actually crashed? Nothing crashed. The SEGV handler is just the fault from the on-demand linker. It's deadlocked on the malloc mutex.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Assignee

Comment 23

•

11 years ago

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #22) > (In reply to Mike Hommey [:glandium] (out from Sep 6 to Sep 22) from comment > #21) > > And which one is the one that actually crashed? > > Nothing crashed. The SEGV handler is just the fault from the on-demand > linker. It's deadlocked on the malloc mutex. It can't be locked on the malloc mutex *and* have triggered the on-demand linker. So the question remains: which thread is it that caused the segfault?

Comment 24

•

11 years ago

(In reply to Mike Hommey [:glandium] (out from Sep 6 to Sep 22) from comment #23) > (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #22) > > (In reply to Mike Hommey [:glandium] (out from Sep 6 to Sep 22) from comment > > #21) > > > And which one is the one that actually crashed? > > > > Nothing crashed. The SEGV handler is just the fault from the on-demand > > linker. It's deadlocked on the malloc mutex. > > It can't be locked on the malloc mutex *and* have triggered the on-demand > linker. So the question remains: which thread is it that caused the segfault? I see what you're saying. We lose the stack before the segfault handler, so I have no idea where we were. I agree that it's unlikely we are inside malloc() there.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 25

•

11 years ago

(In reply to Mike Hommey [:glandium] from comment #13) > Comment on attachment 8484357 [details] [diff] [review] > Avoid malloc in on-demand linker > > Review of attachment 8484357 [details] [diff] [review]: > ----------------------------------------------------------------- > > Considering several threads could be running a decompression in parallel, > it's not safe to use the same buffer for all of them. So, I think this is actually alright. We have a set of preallocated buffers for each zstream, and you cannot decompress a chunk from one stream in two threads (MappableSeekableZStream::ensure() has a mutex). However, your point about why we hang on the malloc mutex in the first place is still bugging me. Can't explain that one. Whatever the reason, avoiding the malloc lock will fix it. What do you think?

Flags: needinfo?(mh+mozilla)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 26

•

11 years ago

Attached patch Pre-allocate zlib buffers for on-demand linker decompression (obsolete) — Details — Splinter Review

This version is the same, but increases the size for the state allocation. Apparently it's larger on some platforms. As explained earlier, I think this patch should be safe because you cannot have parallell DecompressChunk calls in a single SeekableZStream.

Attachment #8484357 - Attachment is obsolete: true

Attachment #8511075 - Flags: review?(mh+mozilla)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 27

•

11 years ago

Attached patch Disable the on-demand linker by default (obsolete) — Details — Splinter Review

releng really needs a solution for these failures, so we should just disable the on-demand linker if another fix can't be found.

Attachment #8512003 - Flags: review?(mwu)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 28

•

11 years ago

Attached patch Disable the on-demand linker by default (obsolete) — Details — Splinter Review

Attachment #8512003 - Attachment is obsolete: true

Attachment #8512003 - Flags: review?(mwu)

Attachment #8512004 - Flags: review?(mwu)

Assignee

Comment 29

•

11 years ago

Comment on attachment 8512004 [details] [diff] [review] Disable the on-demand linker by default Review of attachment 8512004 [details] [diff] [review]: ----------------------------------------------------------------- You can't wait, can you?

Attachment #8512004 - Flags: review?(mwu) → review-

Comment 30

•

11 years ago

(In reply to Mike Hommey [:glandium] from comment #29) > You can't wait, can you? 2 months seems pretty patient to me.

Assignee

Comment 31

•

11 years ago

(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #30) > (In reply to Mike Hommey [:glandium] from comment #29) > > You can't wait, can you? > > 2 months seems pretty patient to me. The other patch, that tries to fix the problem, is from friday.

Flags: needinfo?(mh+mozilla)

Assignee

Comment 32

•

11 years ago

If you _really_ want to disable on-demand decompression, set MOZ_LINKER_ONDEMAND=0 from java. But disabling it would make any attempt at fixing the actual problem irrelevant since nothing will tell if it was, indeed, fixed.

Assignee

Comment 33

•

11 years ago

Comment on attachment 8511075 [details] [diff] [review] Pre-allocate zlib buffers for on-demand linker decompression See upcoming patch which addresses comments I'd have about this one. Figured this would make things landed faster than going through another cycle. Also, since I had a PoC patch for bug 828845 that also hooked the zstream allocator, I went the two birds one stone way and took parts of that patch.

Attachment #8511075 - Attachment is obsolete: true

Attachment #8511075 - Flags: review?(mh+mozilla)

Assignee

Comment 34

•

11 years ago

Attached patch Pre-allocate zlib inflate buffers in faulty.lib — Details — Splinter Review

Original patch from James Willcox <snorp@snorp.net>

Attachment #8512479 - Flags: review?(nfroyd)

Assignee

Updated

•

11 years ago

Assignee: snorp → mh+mozilla

Status: NEW → ASSIGNED

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Assignee

Comment 35

•

11 years ago

Note, I'm still not comfortable with the fact that there are threads SEGV'ing in malloc().

Comment 36

•

11 years ago

(In reply to Mike Hommey [:glandium] from comment #31) > (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #30) > > (In reply to Mike Hommey [:glandium] from comment #29) > > > You can't wait, can you? > > > > 2 months seems pretty patient to me. > > The other patch, that tries to fix the problem, is from friday. It's the same patch, I just increased the state buffer since that seeems to vary across platforms.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 37

•

11 years ago

(In reply to Mike Hommey [:glandium] from comment #29) > Comment on attachment 8512004 [details] [diff] [review] > Disable the on-demand linker by default > > Review of attachment 8512004 [details] [diff] [review]: > ----------------------------------------------------------------- > > You can't wait, can you? Sorry if you're offended, but releng really needs this fixed. Thanks for finishing up the patch.

Brad Lassey [:blassey] (use needinfo?)

Comment 38

•

11 years ago

(In reply to Mike Hommey [:glandium] from comment #29) > Comment on attachment 8512004 [details] [diff] [review] > Disable the on-demand linker by default > > Review of attachment 8512004 [details] [diff] [review]: > ----------------------------------------------------------------- > > You can't wait, can you? Two months without a review is really unacceptable. Complaining when the review gets switched is baffling. (In reply to Mike Hommey [:glandium] from comment #32) > If you _really_ want to disable on-demand decompression, set > MOZ_LINKER_ONDEMAND=0 from java. But disabling it would make any attempt at > fixing the actual problem irrelevant since nothing will tell if it was, > indeed, fixed. That doesn't really make sense. Disabling it while its causing intermittent orange removes the pain for RelEng. Re-enabling it when a fix is available would give you a good idea if the fix worked.

Nathan Froyd [:froydnj]

Updated

•

11 years ago

Attachment #8512479 - Flags: review?(nfroyd) → review+

Kevin Brosnan [Ex-Mozilla]

Comment 39

•

11 years ago

https://hg.mozilla.org/integration/fx-team/rev/44c6d392bbc8 (In reply to Mike Hommey [:glandium] from comment #35) > Note, I'm still not comfortable with the fact that there are threads > SEGV'ing in malloc(). Sounds like fodder for a follow-up bug?

Whiteboard: [fixed-in-fx-team]

Comment 40

•

11 years ago

[Tracking Requested - why for this release]: regression

tracking-fennec: --- → ?

status-firefox35: --- → affected

status-firefox36: --- → affected

tracking-firefox34: --- → ?

tracking-firefox35: --- → ?

tracking-firefox36: --- → ?

Comment 41

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/44c6d392bbc8 Early results on fx-team are looking promising. We'll see how things look over the next day or so now that this is merging around.

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Whiteboard: [fixed-in-fx-team]

Target Milestone: --- → mozilla36

Lawrence Mandel [:lmandel] (use needinfo)

Assignee

Comment 42

•

11 years ago

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #36) > > The other patch, that tries to fix the problem, is from friday. > > It's the same patch, I just increased the state buffer since that seeems to > vary across platforms. But there was no r? flag until friday. > > You can't wait, can you? > > Sorry if you're offended, but releng really needs this fixed. Thanks for > finishing up the patch. (In reply to Brad Lassey [:blassey] (use needinfo?) from comment #38) > Two months without a review is really unacceptable. Complaining when the > review gets switched is baffling. I'm sorry, but there is absolutely no context in this bug indicating this was so much of pain, and that it was pressing to have it fixed. Whatever email exchange led to this haste is unknown to me. All I saw was snorp re-asking for review on friday for a possible fix, and then, because somehow that the review didn't happen over the week end was not quick enough, attached another patch to entirely disable on-demand decompression.

Updated

•

11 years ago

tracking-firefox34: ? → +

tracking-firefox35: ? → +

tracking-firefox36: ? → +

Comment 43

•

11 years ago

Seems to be holding up pretty well so far. Can we please get Aurora and Beta approval requests?

status-firefox36: affected → fixed

Assignee

Comment 44

•

11 years ago

(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #43) > Seems to be holding up pretty well so far. Can we please get Aurora and Beta > approval requests? I'd rather see what's up with bug 1091118 first, since it's related to the code added here.

Assignee

Updated

•

11 years ago

Depends on: 1091118

Assignee

Updated

•

11 years ago

Attachment #8512004 - Attachment is obsolete: true

Lawrence Mandel [:lmandel] (use needinfo)

Assignee

Comment 45

•

11 years ago

Comment on attachment 8512479 [details] [diff] [review] Pre-allocate zlib inflate buffers in faulty.lib Approval Request Comment [Feature/regressing bug #]: bug 848764 [User impact if declined]: Unstable android testing. This probably also affects Firefox on user devices. [Describe test coverage new/current, TBPL]: Reduced android test instability. [Risks and why]: Not very high. If the patch had ill effects, like making the wrong chunk unpacked, or corrupted data, it would create more instability. That said, it's the kind of instability that is very hard to measure without the law of large numbers. On paper, it's safe. The good thing is that it has a very low risk of conflict, so can be backed out easily in case of disaster. [String/UUID change made/needed]: None

Attachment #8512479 - Flags: approval-mozilla-beta?

Attachment #8512479 - Flags: approval-mozilla-aurora?

Comment 46

•

11 years ago

Comment on attachment 8512479 [details] [diff] [review] Pre-allocate zlib inflate buffers in faulty.lib Beta+ Aurora+ ESR31 is marked as unaffected but bug 848764 was fixed in Firefox 26. Is this correct?

Flags: needinfo?(mh+mozilla)

Attachment #8512479 - Flags: approval-mozilla-beta?

Attachment #8512479 - Flags: approval-mozilla-beta+

Attachment #8512479 - Flags: approval-mozilla-aurora?

Attachment #8512479 - Flags: approval-mozilla-aurora+

https://hg.mozilla.org/releases/mozilla-aurora/rev/95cf3c9955e0 https://hg.mozilla.org/releases/mozilla-beta/rev/3554e60ef779

Comment 47

•

11 years ago

status-firefox34: affected → fixed

status-firefox35: affected → fixed

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

11 years ago

Depends on: 1091664

Mark Finkle (:mfinkle) (use needinfo?)

Updated

•

11 years ago

tracking-fennec: ? → 34+

Assignee

Comment 48

•

11 years ago

(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #46) > Comment on attachment 8512479 [details] [diff] [review] > Pre-allocate zlib inflate buffers in faulty.lib > > Beta+ > Aurora+ > > ESR31 is marked as unaffected but bug 848764 was fixed in Firefox 26. Is > this correct? Technically, it's affected as much as other branches, but maybe that doesn't show up as test instability.

Flags: needinfo?(mh+mozilla)

Lawrence Mandel [:lmandel] (use needinfo)

Assignee

Comment 49

•

11 years ago

(or, the root cause of this bug, which we actually only worked around, did appear on esr31, I don't know)

Comment 50

•

11 years ago

If there isn't a good case to take this fix on ESR31, let's not bother. Our plan is only to support Android on this branch until mid Jan.

(inactive) Jim Chen [:jchen] [:darchons]

Updated

•

11 years ago

status-firefox-esr31: unaffected → wontfix

Updated

•

11 years ago

Depends on: 1113416

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 59

•

11 years ago

Bug 1113416 is believed to have fixed the underlying cause for this. Since the patch here seems to have triggered a startup regression (?), maybe we should back it out and uplift bug 1113416. Mike what do you think?

Flags: needinfo?(mh+mozilla)

Assignee

Comment 60

•

11 years ago

Let's see how things pan out in bug 1091664 first. That said, bug 1113416 is probably good to uplift, independently of whether we back out this one or not.

Flags: needinfo?(mh+mozilla)

Assignee

Comment 61

•

11 years ago

The landed fix actually didn't change the allocation pattern at all.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Assignee

Comment 62

•

11 years ago

Attached patch Really share the same zlib inflate buffers for SeekableZStream chunks — Details — Splinter Review

This applies on top of what already landed. Note that the patch that landed increased the buffer sizes, so there *are* a few more allocated bytes, which also means more poisoned bytes when deallocating, which /might/ explain bug 1113416.

Attachment #8561926 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 63

•

11 years ago

Comment on attachment 8561926 [details] [diff] [review] Really share the same zlib inflate buffers for SeekableZStream chunks Review of attachment 8561926 [details] [diff] [review]: ----------------------------------------------------------------- I trawled a bit through the linked bugs, and I don't understand why we're doing this. Before this patch, we had preallocated buffers for everything, which as I understand it, was necessary to avoid calling malloc() at inopportune times. Now with this patch, we have really preallocated buffers for decompression, but we malloc() for everything else? Why are we not keeping the preallocated buffers for everything else? ::: mozglue/linker/Zip.h @@ +31,5 @@ > public: > + /* Forward declaration */ > + class StaticAllocator; > + > + zxx_stream(StaticAllocator *allocator_=nullptr) Nit: maybe |explicit| this?

Nathan Froyd [:froydnj]

Updated

•

11 years ago

Attachment #8561926 - Flags: review?(nfroyd) → review+

https://hg.mozilla.org/integration/mozilla-inbound/rev/aac7de995646

Assignee

Comment 64

•

11 years ago

Carsten Book [:Tomcat]

Comment 65

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/aac7de995646

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

status-firefox38: --- → fixed

Resolution: --- → FIXED

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 66

•

11 years ago

OK, so I guess we were allocating those buffers once per chunk instead of once per stream. Oops. This seems to have fixed the startup regression, so we should uplift to at least 36.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 67

•

11 years ago

I'll let Mike request the uplift, he can probably speak better about the risk it entails.

Flags: needinfo?(mh+mozilla)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Assignee

Comment 68

•

11 years ago

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #66) > OK, so I guess we were allocating those buffers once per chunk instead of > once per stream. Oops. This seems to have fixed the startup regression, so > we should uplift to at least 36. Technically, the original patch didn't change much, it just made half of those allocations that were already happening slightly larger. So in that sense, it may have been the cause of the regression. The new patch does actually remove those allocations, so, more than fixing the regression, it actually does what was meant to be done. If the goal is to make the regression go away, I'd rather backout the original patch from older branches, and let the new patch bake some more. James, what do you think?

Flags: needinfo?(mh+mozilla) → needinfo?(snorp)

Comment 69

•

11 years ago

(In reply to Mike Hommey [:glandium] from comment #68) > (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #66) > > OK, so I guess we were allocating those buffers once per chunk instead of > > once per stream. Oops. This seems to have fixed the startup regression, so > > we should uplift to at least 36. > > Technically, the original patch didn't change much, it just made half of > those allocations that were already happening slightly larger. So in that > sense, it may have been the cause of the regression. The new patch does > actually remove those allocations, so, more than fixing the regression, it > actually does what was meant to be done. If the goal is to make the > regression go away, I'd rather backout the original patch from older > branches, and let the new patch bake some more. James, what do you think? It looks like we've been shipping the regression since 34, so I don't think I agree with backing it out now unless we also push something (the new patch) to fix the original malloc hang. If you want to just let this ride the trains, I'm alright with that.

Flags: needinfo?(snorp)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Assignee

Comment 70

•

11 years ago

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #69) > It looks like we've been shipping the regression since 34, so I don't think > I agree with backing it out now unless we also push something (the new > patch) to fix the original malloc hang. If you want to just let this ride > the trains, I'm alright with that. The thing is, I don't see how that actually fixed the original malloc hang, since the amount of malloc was the same.

Comment 71

•

10 years ago

(In reply to Mike Hommey [:glandium] from comment #70) > (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #69) > > It looks like we've been shipping the regression since 34, so I don't think > > I agree with backing it out now unless we also push something (the new > > patch) to fix the original malloc hang. If you want to just let this ride > > the trains, I'm alright with that. > > The thing is, I don't see how that actually fixed the original malloc hang, > since the amount of malloc was the same. The amount was (roughly) the same, but it did it before the signal handler -- not inside of it.