Closed
Bug 1059797
Opened 10 years ago
Closed 10 years ago
Intermittent hang in ShaderProgramOGL::CreateShader during Android 4.0 tests
Categories
(Core :: Graphics, defect)
Tracking
RESOLVED
FIXED
mozilla36
People
(Reporter: gbrown, Assigned: glandium)
References
Details
Attachments
(3 files, 5 obsolete files)
68.38 KB, text/plain | Details
6.54 KB, patch | froydnj: review+, lmandel: approval-mozilla-aurora+, lmandel: approval-mozilla-beta+ | Details | Diff | Splinter Review
6.30 KB, patch | froydnj: review+ | Details | Diff | Splinter Review
In bug 1054292 (and perhaps bug 1054456 and some others, which look similar), Android 4.0 tests frequently fail with a timeout on startup. Bug 1054292 affects nearly all Android 4.0 test jobs and was the #1 test failure on this week's War on Orange report.

https://tbpl.mozilla.org/?tree=Try&rev=4c002148f80b adds logging to trace the hang.

https://tbpl.mozilla.org/php/getParsedLog.php?id=46938795&tree=Try&full=1 reproduces the hang and shows:

05:53:04 INFO - 08-28 05:47:04.523 I/Gecko ( 2100): GYB: CompositorOGL::Initialize
05:53:04 INFO - 08-28 05:47:04.523 I/Gecko ( 2100): Attempting load of libEGL.so
05:53:04 INFO - 08-28 05:47:04.531 I/Gecko ( 2100): Can't find symbol '_Z35eglQueryStringImplementationANDROIDPvi'.
05:53:04 INFO - 08-28 05:47:04.625 I/Gecko ( 2100): GYB: Initialize calls GetShaderProgramFor
05:53:04 INFO - 08-28 05:47:04.625 I/Gecko ( 2100): GYB GetShaderProgramFor 1
05:53:04 INFO - 08-28 05:47:04.625 I/Gecko ( 2100): GYB: ShaderProgramOGL::Initialize
05:53:04 INFO - 08-28 05:47:04.625 I/Gecko ( 2100): GYB: ShaderProgramOGL::Initialize 1
05:53:04 INFO - 08-28 05:47:04.625 I/Gecko ( 2100): GYB: ShaderProgramOGL::CreateProgram
05:53:04 INFO - 08-28 05:47:04.625 I/Gecko ( 2100): GYB: CreateShader 1

...then no more activity from Gecko for 5+ minutes. That suggests the hang is in ShaderProgramOGL::CreateShader, during mGL->fCompileShader(sh) -- http://hg.mozilla.org/mozilla-central/annotate/2a15dc07ddaa/gfx/layers/opengl/OGLShaderProgram.cpp#l445.
Updated•10 years ago
OS: Linux → Android
Hardware: x86_64 → ARM
Comment 1•10 years ago
Great investigative work here - thank you! :-)
Comment 2•10 years ago
The #1 orange (and likely the #3, and a bunch more - see deps) needs an owner. Milan, can you help please?
status-firefox33:
--- → unaffected
status-firefox34:
--- → affected
status-firefox-esr24:
--- → unaffected
status-firefox-esr31:
--- → unaffected
Flags: needinfo?(milan)
Comment 3•10 years ago
:snorp, is this something that's up your alley?
Flags: needinfo?(milan) → needinfo?(snorp)
Comment 4•10 years ago
(In reply to Milan Sreckovic [:milan] from comment #3)
> :snorp, is this something that's up your alley?

Maybe, but I might have trouble finding cycles for it. It looks like the panda GL driver might just be crapping out? Not sure what we'll be able to do about that. I wish we knew which shader was hanging, though.
Flags: needinfo?(snorp)
Comment 5•10 years ago
George, would you have cycles to look at this otherwise?
Comment 6•10 years ago
I have a Panda in my possession, so really I should just have a look.
Assignee: nobody → snorp
Comment 7•10 years ago
I was able to reproduce the hang (I think), after many tries running reftests. The stack I have in the compositor thread is:

#0 0x400c61f8 in __futex_syscall3 () from /Users/snorp/source/jimdb-arm/lib/0123456789ABCDEF/system/lib/libc.so
#1 0x400cadb4 in pthread_mutex_lock () from /Users/snorp/source/jimdb-arm/lib/0123456789ABCDEF/system/lib/libc.so
#2 0x400ccd5c in dlmalloc () from /Users/snorp/source/jimdb-arm/lib/0123456789ABCDEF/system/lib/libc.so
#3 0x400d000a in malloc () from /Users/snorp/source/jimdb-arm/lib/0123456789ABCDEF/system/lib/libc.so
#4 0x4014e274 in inflateInit2_ () from /Users/snorp/source/jimdb-arm/lib/0123456789ABCDEF/system/lib/libz.so
#5 0x5c11c194 in SeekableZStream::DecompressChunk (this=0x5c32e160, where=0x61c73000, chunk=1555, length=16384) at /Users/snorp/source/gecko-dev/mozglue/linker/SeekableZStream.cpp:84
#6 0x5c11b1de in MappableSeekableZStream::ensure (this=0x5c32e150, addr=<optimized out>) at /Users/snorp/source/gecko-dev/mozglue/linker/Mappable.cpp:531
#7 0x5c11a20a in SEGVHandler::handler (signum=11, info=0x6c8fdb40, context=0x6c8fdbc0) at /Users/snorp/source/gecko-dev/mozglue/linker/ElfLoader.cpp:1138
#8 0xffff0514 in ?? ()
#9 0xffff0514 in ?? ()

If accurate, this seems to be a problem in the on-demand library loading. Investigating a little more.
Comment 8•10 years ago
Ah, so we are attempting to acquire the malloc() mutex here within a signal handler. As we know from doing profiler work, this is bad, because the lock can already be held elsewhere. With all threads suspended during signal handler execution, we have a deadlock. I am not sure whether it's possible to avoid the malloc() here, but avoiding it looks like the best way to solve this.
Flags: needinfo?(mh+mozilla)
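For readers landing here later: a minimal, self-contained sketch of the hazard comment 8 describes, with hypothetical names; this is not Gecko code. malloc() is not async-signal-safe, so a fault handler that allocates can block on the allocator lock that the interrupted thread already holds.

```cpp
#include <signal.h>
#include <stdlib.h>

// Hypothetical illustration of the deadlock: if the interrupted thread was
// inside malloc() (and thus holds the allocator mutex) when the SIGSEGV
// fired, the handler below blocks forever trying to take the same mutex.
static void segv_handler(int sig, siginfo_t* info, void* ctx) {
  (void)sig; (void)info; (void)ctx;
  void* scratch = malloc(4096);  // NOT async-signal-safe; may deadlock
  // ... in the linker's case: decompress the faulting chunk here ...
  free(scratch);
}

int main() {
  struct sigaction sa = {};
  sa.sa_flags = SA_SIGINFO;
  sa.sa_sigaction = segv_handler;
  sigaction(SIGSEGV, &sa, nullptr);
  // Any segfault from here on runs segv_handler on the faulting thread.
  return 0;
}
```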
Comment 9•10 years ago
I thought it might be possible to use the zalloc and zfree hooks in the z_stream to return stack-allocated buffers, so I started playing with that. It looks like it needs at most two allocations in order to inflate one chunk. The first one is the internal inflate_state struct, which it uses to (surprise!) keep state. That seems to be a fixed size of 7116 bytes. Sometimes there is also a second allocation, which is related to the sliding window. I am not sure why I don't see that every time, but it always seems to be 32k. Unfortunately, when I return stack-allocated buffers for these two, I get a crash. I'm not sure why yet, but it seems like this should be doable.
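For reference, the z_stream allocator hooks being described take roughly this shape (zalloc/zfree/opaque are the real zlib API; the buffer sizes come from the observations above, while the struct and function names are illustrative, not the actual patch):

```cpp
#include <cstddef>
#include <cstring>
#include <zlib.h>

// Two fixed buffers, sized from the observations above: ~7116 bytes for
// zlib's internal inflate_state (rounded up), 32K for the sliding window.
struct FixedBuffers {
  alignas(std::max_align_t) char state[8192];
  alignas(std::max_align_t) char window[32768];
  bool state_used = false;
  bool window_used = false;
};

static void* fixed_alloc(void* opaque, uInt items, uInt size) {
  FixedBuffers* b = static_cast<FixedBuffers*>(opaque);
  size_t n = size_t(items) * size;
  if (!b->state_used && n <= sizeof(b->state)) {
    b->state_used = true;
    return b->state;
  }
  if (!b->window_used && n <= sizeof(b->window)) {
    b->window_used = true;
    return b->window;
  }
  return Z_NULL;  // a third allocation is not expected when inflating
}

static void fixed_free(void*, void*) {
  // Nothing to do: the buffers outlive the stream.
}

// One-shot inflate of a compressed chunk, using only the fixed buffers.
bool inflate_chunk(FixedBuffers* bufs, const Bytef* src, uInt srcLen,
                   Bytef* dst, uInt dstLen) {
  bufs->state_used = bufs->window_used = false;
  z_stream zs;
  memset(&zs, 0, sizeof(zs));
  zs.zalloc = fixed_alloc;
  zs.zfree = fixed_free;
  zs.opaque = bufs;
  zs.next_in = const_cast<Bytef*>(src);
  zs.avail_in = srcLen;
  zs.next_out = dst;
  zs.avail_out = dstLen;
  if (inflateInit2(&zs, MAX_WBITS) != Z_OK)
    return false;
  int ret = inflate(&zs, Z_FINISH);
  inflateEnd(&zs);
  return ret == Z_STREAM_END;
}
```

With hooks like these, inflateInit2() and inflate() never touch the system allocator, which is the whole point of the exercise: no malloc() lock taken inside the SEGV handler.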
Comment 10•10 years ago
This doesn't work, but I don't really understand why not. It crashes after a dozen or so chunks have been inflated.
Comment 11•10 years ago
This version just preallocates the buffers on the heap during SeekableZStream construction instead of messing around with the stack. Seems to work locally, try run going here: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=77332663dffc
Attachment #8483919 - Attachment is obsolete: true
Attachment #8484357 - Flags: review?(mh+mozilla)
Assignee
Comment 12•10 years ago
Stepping back a bit:

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #8)
> With all threads suspended during signal handler execution, we have a deadlock.

This actually shouldn't be the case. That is, other threads should continue running during signal handler execution (and I just checked that it's the case, although I didn't try on an ARM device; I don't think that should matter).

With that eliminated, the only way to get into that decompression code is if a segfault occurred in the address space of a library loaded by our linker. That only happens when running code from that library, or reading/writing data from/to it.

Now, AFAIK, the signal handler for a segfault runs on the thread that segfaulted, so the only way we'd get into the decompression code is if code on that thread is calling code from e.g. libxul, or reading/writing data from/to it.

And for a deadlock to happen, that code would need to be holding the system malloc lock at the same time. How can that happen? Are we in a situation where the signal is not dispatched to the thread that segfaulted? I'd rather have a clear view of the situation before saying anything else.
Flags: needinfo?(mh+mozilla)
Assignee
Comment 13•10 years ago
Comment on attachment 8484357 [details] [diff] [review]
Avoid malloc in on-demand linker

Review of attachment 8484357 [details] [diff] [review]:
-----------------------------------------------------------------

Considering several threads could be running a decompression in parallel, it's not safe to use the same buffer for all of them.
Attachment #8484357 - Flags: review?(mh+mozilla) → review-
Comment 14•10 years ago
(In reply to Mike Hommey [:glandium] (out from Sep 6 to Sep 22) from comment #12)
> Stepping back a bit:
>
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #8)
> > With all threads suspended during signal handler execution, we have a deadlock.
>
> This actually shouldn't be the case. That is, other threads should continue
> running during signal handler execution (and I just checked it's the case,
> although I didn't try on an arm device, but I don't think this should
> matter).

Indeed, this does appear to be the case, and docs back that up.

> With that eliminated, the only way to get in that decompression code is in
> case a segfault occurred in the address space of a library loaded by our
> linker. That only happens if running code from that library, or read/writing
> data from/to it.
>
> Now, afaik, the signal handler for a segfault runs on the thread that
> segfaulted, so the only way we'd get in the decompression code is if code on
> that thread is calling code from e.g. libxul, or read/writing data from/to
> it.
>
> And for a deadlock to happen, that code would need to be holding the system
> malloc lock at the same time. How can that happen?
>
> So, we'd be in a situation where the signal is not dispatched to the thread
> that segfaulted?

Yeah, weirdness. I'm going to try to reproduce the hang again and get a stack trace for all threads.
Comment 15•10 years ago
Comment 16•10 years ago
I attached a file containing the stack traces of all threads during the hang. Here we can see there are actually several threads waiting on the malloc mutex.

Thread 35 is the Compositor thread, and malloc is called from within libz. The stack trace goes south after that for some reason, but my guess is that we're in the signal handler. Threads 11 and 9 are Dalvik-owned, and are calling malloc from Dalvik/Android internals. Thread 1 is the Android UI thread, and is calling malloc from a Dalvik function.

I think it is generally known that using pthread mutexes from signal handlers is Bad(tm), but I'm having trouble finding the specific reason. I think it's mostly due to reentrancy issues (trying to lock a mutex from a signal handler in a thread that already has the mutex locked). I see that we set SA_NODEFER on the linker signal handler, which means reentrancy in the signal handler itself is allowed. There is no way anything in CustomElf could use something that CustomElf itself loaded, though, so I don't see how that could be an issue.
Comment 17•10 years ago
One interesting thing I noticed is that the address we segfaulted at is 0x6542884d. ElfLoader::GetHandleByPtr() thinks that's in libxul, but 'info sharedlibrary' disagrees. It claims libxul is from 0x63ef1140 to 0x65283914. Maybe the gdb info isn't accurate?
Comment 18•10 years ago
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #17)
> One interesting thing I noticed is that the address we segfaulted at is
> 0x6542884d. ElfLoader::GetHandleByPtr() thinks that's in libxul, but 'info
> sharedlibrary' disagrees. It claims libxul is from 0x63ef1140 to 0x65283914.
> Maybe the gdb info isn't accurate?

I suppose it's possible the current fault would end up just expanding that range.
Assignee
Comment 19•10 years ago
Which one is the crashing thread? #35?
Comment 20•10 years ago
(In reply to Mike Hommey [:glandium] (out from Sep 6 to Sep 22) from comment #19)
> Which one is the crashing thread? #35?

You mean which one is running the SEGV handler? That's 35, yes.
Assignee
Comment 21•10 years ago
And which one is the one that actually crashed?
Comment 22•10 years ago
(In reply to Mike Hommey [:glandium] (out from Sep 6 to Sep 22) from comment #21)
> And which one is the one that actually crashed?

Nothing crashed. The SEGV handler is just the fault from the on-demand linker. It's deadlocked on the malloc mutex.
Assignee
Comment 23•10 years ago
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #22)
> (In reply to Mike Hommey [:glandium] (out from Sep 6 to Sep 22) from comment #21)
> > And which one is the one that actually crashed?
>
> Nothing crashed. The SEGV handler is just the fault from the on-demand
> linker. It's deadlocked on the malloc mutex.

It can't be locked on the malloc mutex *and* have triggered the on-demand linker. So the question remains: which thread is it that caused the segfault?
Comment 24•10 years ago
(In reply to Mike Hommey [:glandium] (out from Sep 6 to Sep 22) from comment #23)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #22)
> > (In reply to Mike Hommey [:glandium] (out from Sep 6 to Sep 22) from comment #21)
> > > And which one is the one that actually crashed?
> >
> > Nothing crashed. The SEGV handler is just the fault from the on-demand
> > linker. It's deadlocked on the malloc mutex.
>
> It can't be locked on the malloc mutex *and* have triggered the on-demand
> linker. So the question remains: which thread is it that caused the segfault?

I see what you're saying. We lose the stack before the segfault handler, so I have no idea where we were. I agree that it's unlikely we are inside malloc() there.
Comment 25•10 years ago
(In reply to Mike Hommey [:glandium] from comment #13)
> Comment on attachment 8484357 [details] [diff] [review]
> Avoid malloc in on-demand linker
>
> Review of attachment 8484357 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Considering several threads could be running a decompression in parallel,
> it's not safe to use the same buffer for all of them.

So, I think this is actually alright. We have a set of preallocated buffers for each zstream, and you cannot decompress a chunk from one stream in two threads (MappableSeekableZStream::ensure() has a mutex). However, your point about why we hang on the malloc mutex in the first place is still bugging me. Can't explain that one. Whatever the reason, avoiding the malloc lock will fix it. What do you think?
Flags: needinfo?(mh+mozilla)
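A sketch of the invariant comment 25 relies on; the class and member names are modeled on the description of MappableSeekableZStream::ensure(), not copied from the actual source. Buffers are allocated once, up front (outside the signal handler), and the per-stream mutex guarantees only one thread decompresses a chunk of a given stream at a time, so sharing one buffer set per stream is safe:

```cpp
#include <mutex>
#include <vector>

// Hypothetical model of the per-stream serialization described above.
class SeekableStreamModel {
 public:
  SeekableStreamModel()
      : mInflateState(8192),    // preallocated zlib state buffer
        mWindow(32768) {}       // preallocated 32K sliding window

  // Called from the fault handler when a not-yet-decompressed page is hit.
  bool Ensure(void* faultAddr) {
    std::lock_guard<std::mutex> lock(mChunkMutex);
    // While we hold the lock, no other thread can be decompressing a chunk
    // of *this* stream, so reusing mInflateState/mWindow is race-free.
    return DecompressChunk(faultAddr);
  }

 private:
  bool DecompressChunk(void* addr) {
    (void)addr;
    // ... inflate the chunk using only mInflateState and mWindow ...
    return true;
  }

  std::mutex mChunkMutex;
  std::vector<char> mInflateState;
  std::vector<char> mWindow;
};
```

Each stream carries its own buffer set, so decompression of different libraries can still proceed in parallel; only chunks of the same stream are serialized.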
Comment 26•10 years ago
This version is the same, but increases the size of the state allocation. Apparently it's larger on some platforms. As explained earlier, I think this patch should be safe because you cannot have parallel DecompressChunk calls in a single SeekableZStream.
Attachment #8484357 - Attachment is obsolete: true
Attachment #8511075 - Flags: review?(mh+mozilla)
Comment 27•10 years ago
releng really needs a solution for these failures, so we should just disable the on-demand linker if another fix can't be found.
Attachment #8512003 - Flags: review?(mwu)
Comment 28•10 years ago
Attachment #8512003 - Attachment is obsolete: true
Attachment #8512003 - Flags: review?(mwu)
Attachment #8512004 - Flags: review?(mwu)
Assignee
Comment 29•10 years ago
Comment on attachment 8512004 [details] [diff] [review]
Disable the on-demand linker by default

Review of attachment 8512004 [details] [diff] [review]:
-----------------------------------------------------------------

You can't wait, can you?
Attachment #8512004 - Flags: review?(mwu) → review-
Comment 30•10 years ago
(In reply to Mike Hommey [:glandium] from comment #29)
> You can't wait, can you?

2 months seems pretty patient to me.
Assignee
Comment 31•10 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #30)
> (In reply to Mike Hommey [:glandium] from comment #29)
> > You can't wait, can you?
>
> 2 months seems pretty patient to me.

The other patch, the one that tries to fix the problem, is from Friday.
Flags: needinfo?(mh+mozilla)
Assignee
Comment 32•10 years ago
If you _really_ want to disable on-demand decompression, set MOZ_LINKER_ONDEMAND=0 from java. But disabling it would make any attempt at fixing the actual problem irrelevant since nothing will tell if it was, indeed, fixed.
Assignee
Comment 33•10 years ago
Comment on attachment 8511075 [details] [diff] [review]
Pre-allocate zlib buffers for on-demand linker decompression

See the upcoming patch, which addresses the comments I'd have made about this one. I figured this would get things landed faster than going through another review cycle. Also, since I had a PoC patch for bug 828845 that also hooked the zstream allocator, I went the two-birds-one-stone way and took parts of that patch.
Attachment #8511075 - Attachment is obsolete: true
Attachment #8511075 - Flags: review?(mh+mozilla)
Assignee
Comment 34•10 years ago
Original patch from James Willcox <snorp@snorp.net>
Attachment #8512479 - Flags: review?(nfroyd)
Assignee
Updated•10 years ago
Assignee: snorp → mh+mozilla
Status: NEW → ASSIGNED
Assignee
Comment 35•10 years ago
Note, I'm still not comfortable with the fact that there are threads SEGV'ing in malloc().
Comment 36•10 years ago
(In reply to Mike Hommey [:glandium] from comment #31)
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #30)
> > (In reply to Mike Hommey [:glandium] from comment #29)
> > > You can't wait, can you?
> >
> > 2 months seems pretty patient to me.
>
> The other patch, the one that tries to fix the problem, is from Friday.

It's the same patch, I just increased the state buffer since that seems to vary across platforms.
Comment 37•10 years ago
(In reply to Mike Hommey [:glandium] from comment #29)
> Comment on attachment 8512004 [details] [diff] [review]
> Disable the on-demand linker by default
>
> Review of attachment 8512004 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> You can't wait, can you?

Sorry if you're offended, but releng really needs this fixed. Thanks for finishing up the patch.
Comment 38•10 years ago
(In reply to Mike Hommey [:glandium] from comment #29)
> Comment on attachment 8512004 [details] [diff] [review]
> Disable the on-demand linker by default
>
> Review of attachment 8512004 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> You can't wait, can you?

Two months without a review is really unacceptable. Complaining when the review gets switched is baffling.

(In reply to Mike Hommey [:glandium] from comment #32)
> If you _really_ want to disable on-demand decompression, set
> MOZ_LINKER_ONDEMAND=0 from java. But disabling it would make any attempt at
> fixing the actual problem irrelevant since nothing will tell if it was,
> indeed, fixed.

That doesn't really make sense. Disabling it while it's causing intermittent orange removes the pain for RelEng. Re-enabling it when a fix is available would give you a good idea of whether the fix worked.
Updated•10 years ago
Attachment #8512479 - Flags: review?(nfroyd) → review+
Comment 39•10 years ago
https://hg.mozilla.org/integration/fx-team/rev/44c6d392bbc8

(In reply to Mike Hommey [:glandium] from comment #35)
> Note, I'm still not comfortable with the fact that there are threads
> SEGV'ing in malloc().

Sounds like fodder for a follow-up bug?
Whiteboard: [fixed-in-fx-team]
Comment 40•10 years ago
[Tracking Requested - why for this release]: regression
tracking-fennec: --- → ?
status-firefox35:
--- → affected
status-firefox36:
--- → affected
tracking-firefox34:
--- → ?
tracking-firefox35:
--- → ?
tracking-firefox36:
--- → ?
Comment 41•10 years ago
https://hg.mozilla.org/mozilla-central/rev/44c6d392bbc8

Early results on fx-team are looking promising. We'll see how things look over the next day or so now that this is merging around.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → mozilla36
Assignee
Comment 42•10 years ago
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #36)
> > The other patch, the one that tries to fix the problem, is from Friday.
>
> It's the same patch, I just increased the state buffer since that seems to
> vary across platforms.

But there was no r? flag until Friday.

> > You can't wait, can you?
>
> Sorry if you're offended, but releng really needs this fixed. Thanks for
> finishing up the patch.

(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #38)
> Two months without a review is really unacceptable. Complaining when the
> review gets switched is baffling.

I'm sorry, but there is absolutely no context in this bug indicating this was so much of a pain, and that it was pressing to have it fixed. Whatever email exchange led to this haste is unknown to me. All I saw was snorp re-asking for review on Friday for a possible fix, and then, when the review didn't happen over the weekend, attaching another patch to entirely disable on-demand decompression.
Updated•10 years ago
Comment 43•10 years ago
Seems to be holding up pretty well so far. Can we please get Aurora and Beta approval requests?
Assignee
Comment 44•10 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #43)
> Seems to be holding up pretty well so far. Can we please get Aurora and Beta
> approval requests?

I'd rather see what's up with bug 1091118 first, since it's related to the code added here.
Assignee
Updated•10 years ago
Attachment #8512004 - Attachment is obsolete: true
Assignee
Comment 45•10 years ago
Comment on attachment 8512479 [details] [diff] [review]
Pre-allocate zlib inflate buffers in faulty.lib

Approval Request Comment
[Feature/regressing bug #]: bug 848764
[User impact if declined]: Unstable android testing. This probably also affects Firefox on user devices.
[Describe test coverage new/current, TBPL]: Reduced android test instability.
[Risks and why]: Not very high. If the patch had ill effects, like making the wrong chunk unpacked, or corrupted data, it would create more instability. That said, it's the kind of instability that is very hard to measure without the law of large numbers. On paper, it's safe. The good thing is that it has a very low risk of conflict, so it can be backed out easily in case of disaster.
[String/UUID change made/needed]: None
Attachment #8512479 - Flags: approval-mozilla-beta?
Attachment #8512479 - Flags: approval-mozilla-aurora?
Comment 46•10 years ago
Comment on attachment 8512479 [details] [diff] [review]
Pre-allocate zlib inflate buffers in faulty.lib

Beta+
Aurora+

ESR31 is marked as unaffected, but bug 848764 was fixed in Firefox 26. Is this correct?
Flags: needinfo?(mh+mozilla)
Attachment #8512479 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8512479 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 47•10 years ago
https://hg.mozilla.org/releases/mozilla-aurora/rev/95cf3c9955e0
https://hg.mozilla.org/releases/mozilla-beta/rev/3554e60ef779
Updated•10 years ago
tracking-fennec: ? → 34+
Assignee
Comment 48•10 years ago
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #46)
> Comment on attachment 8512479 [details] [diff] [review]
> Pre-allocate zlib inflate buffers in faulty.lib
>
> Beta+
> Aurora+
>
> ESR31 is marked as unaffected, but bug 848764 was fixed in Firefox 26. Is
> this correct?

Technically, it's affected as much as other branches, but maybe that doesn't show up as test instability.
Flags: needinfo?(mh+mozilla)
Assignee
Comment 49•10 years ago
(Or maybe the root cause of this bug, which we actually only worked around, did appear on esr31; I don't know.)
Comment 50•10 years ago
If there isn't a good case to take this fix on ESR31, let's not bother. Our plan is only to support Android on this branch until mid Jan.
Updated•10 years ago
Comment 59•10 years ago
Bug 1113416 is believed to have fixed the underlying cause of this. Since the patch here seems to have triggered a startup regression (?), maybe we should back it out and uplift bug 1113416. Mike, what do you think?
Flags: needinfo?(mh+mozilla)
Assignee
Comment 60•10 years ago
Let's see how things pan out in bug 1091664 first. That said, bug 1113416 is probably good to uplift, independently of whether we back out this one or not.
Flags: needinfo?(mh+mozilla)
Assignee
Comment 61•10 years ago
The landed fix actually didn't change the allocation pattern at all.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee
Comment 62•10 years ago
This applies on top of what already landed. Note that the patch that landed increased the buffer sizes, so there *are* a few more allocated bytes, which also means more poisoned bytes when deallocating, which /might/ explain bug 1113416.
Attachment #8561926 - Flags: review?(nfroyd)
Comment 63•10 years ago
Comment on attachment 8561926 [details] [diff] [review]
Really share the same zlib inflate buffers for SeekableZStream chunks

Review of attachment 8561926 [details] [diff] [review]:
-----------------------------------------------------------------

I trawled a bit through the linked bugs, and I don't understand why we're doing this. Before this patch, we had preallocated buffers for everything, which as I understand it, was necessary to avoid calling malloc() at inopportune times. Now with this patch, we have really preallocated buffers for decompression, but we malloc() for everything else? Why are we not keeping the preallocated buffers for everything else?

::: mozglue/linker/Zip.h
@@ +31,5 @@
> public:
> + /* Forward declaration */
> + class StaticAllocator;
> +
> + zxx_stream(StaticAllocator *allocator_=nullptr)

Nit: maybe |explicit| this?
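On the |explicit| nit: with a one-argument constructor that has a default value, a bare StaticAllocator* would otherwise convert implicitly to a zxx_stream. A hedged sketch of the suggested change (the surrounding class is reconstructed for illustration, not quoted from Zip.h):

```cpp
class zxx_stream {
 public:
  /* Forward declaration */
  class StaticAllocator;

  // 'explicit' blocks accidental conversions such as:
  //   void consume(zxx_stream s);
  //   consume(someAllocatorPtr);   // would compile without 'explicit'
  explicit zxx_stream(StaticAllocator *allocator_ = nullptr)
      : mAllocator(allocator_) {}

 private:
  StaticAllocator *mAllocator;
};
```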
Updated•10 years ago
Attachment #8561926 - Flags: review?(nfroyd) → review+
Assignee
Comment 64•10 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/aac7de995646
Comment 65•10 years ago
https://hg.mozilla.org/mozilla-central/rev/aac7de995646
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
status-firefox38:
--- → fixed
Resolution: --- → FIXED
Comment 66•10 years ago
OK, so I guess we were allocating those buffers once per chunk instead of once per stream. Oops. This seems to have fixed the startup regression, so we should uplift to at least 36.
Comment 67•10 years ago
I'll let Mike request the uplift, he can probably speak better about the risk it entails.
Flags: needinfo?(mh+mozilla)
Assignee
Comment 68•10 years ago
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #66)
> OK, so I guess we were allocating those buffers once per chunk instead of
> once per stream. Oops. This seems to have fixed the startup regression, so
> we should uplift to at least 36.

Technically, the original patch didn't change much, it just made half of those allocations that were already happening slightly larger. So in that sense, it may have been the cause of the regression. The new patch does actually remove those allocations, so, more than fixing the regression, it actually does what was meant to be done. If the goal is to make the regression go away, I'd rather back out the original patch from older branches, and let the new patch bake some more. James, what do you think?
Flags: needinfo?(mh+mozilla) → needinfo?(snorp)
Comment 69•10 years ago
(In reply to Mike Hommey [:glandium] from comment #68)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #66)
> > OK, so I guess we were allocating those buffers once per chunk instead of
> > once per stream. Oops. This seems to have fixed the startup regression, so
> > we should uplift to at least 36.
>
> Technically, the original patch didn't change much, it just made half of
> those allocations that were already happening slightly larger. So in that
> sense, it may have been the cause of the regression. The new patch does
> actually remove those allocations, so, more than fixing the regression, it
> actually does what was meant to be done. If the goal is to make the
> regression go away, I'd rather back out the original patch from older
> branches, and let the new patch bake some more. James, what do you think?

It looks like we've been shipping the regression since 34, so I don't think I agree with backing it out now unless we also push something (the new patch) to fix the original malloc hang. If you want to just let this ride the trains, I'm alright with that.
Flags: needinfo?(snorp)
Assignee
Comment 70•10 years ago
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #69)
> It looks like we've been shipping the regression since 34, so I don't think
> I agree with backing it out now unless we also push something (the new
> patch) to fix the original malloc hang. If you want to just let this ride
> the trains, I'm alright with that.

The thing is, I don't see how that actually fixed the original malloc hang, since the amount of malloc was the same.
Comment 71•10 years ago
(In reply to Mike Hommey [:glandium] from comment #70)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #69)
> > It looks like we've been shipping the regression since 34, so I don't think
> > I agree with backing it out now unless we also push something (the new
> > patch) to fix the original malloc hang. If you want to just let this ride
> > the trains, I'm alright with that.
>
> The thing is, I don't see how that actually fixed the original malloc hang,
> since the amount of malloc was the same.

The amount was (roughly) the same, but it did it before the signal handler -- not inside of it.
Assignee
Comment 72•10 years ago
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #71)
> The amount was (roughly) the same, but it did it before the signal handler
> -- not inside of it.

AFAICS, the mallocs were still inside the signal handler.