Closed Bug 1489572 Opened 6 years ago Closed 6 years ago

LifoAlloc: double chunk size on each allocation

Categories

(Core :: JavaScript Engine, defect, P1)

Tracking

RESOLVED FIXED
mozilla66
Performance Impact high
Tracking Status
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- fixed

People

(Reporter: jorendorff, Assigned: nbp)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(8 files, 9 obsolete files)

6.46 KB, text/plain (Details)
10.39 KB, patch, tcampbell: review+ (Details | Diff | Splinter Review)
20.50 KB, patch, tcampbell: review+ (Details | Diff | Splinter Review)
6.83 KB, patch, tcampbell: review+ (Details | Diff | Splinter Review)
1.90 KB, patch, jorendorff: review+ (Details | Diff | Splinter Review)
26.09 KB, patch, decoder: feedback+ (Details | Diff | Splinter Review)
47 bytes, text/x-phabricator-request (Details | Review)
756 bytes, patch, tcampbell: review+ (Details | Diff | Splinter Review)
The initial compilation of self-hosted code results in the allocation of 722 chunks. Not a big deal. But it does seem a bit silly. The real reason to care about this is that I'd like to add an assertion that all ParseNodes seen by the BytecodeEmitter are actually located in one of the LifoAlloc's chunks. This is easier if there are fewer of them.
Flags: needinfo?(nicolas.b.pierron)
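For illustration, here is a toy sketch of the growth policy the summary asks for. The names are made up and this is not SpiderMonkey's LifoAlloc; it only shows that doubling each new chunk keeps the chunk count roughly logarithmic in the total bytes allocated, so a workload that currently needs hundreds of fixed-size chunks would need far fewer.

  #include <cstddef>
  #include <cstdlib>
  #include <vector>

  // Toy bump allocator (hypothetical names, alignment handling omitted)
  // showing geometric chunk growth: each new chunk is twice as large as the
  // previous one, so the chunk count grows with the log of the total size.
  class ToyLifoAlloc {
    struct Chunk {
      char* base;
      size_t size;
      size_t used;
    };
    std::vector<Chunk> chunks_;
    size_t nextChunkSize_;

   public:
    explicit ToyLifoAlloc(size_t firstChunkSize) : nextChunkSize_(firstChunkSize) {}
    ~ToyLifoAlloc() {
      for (Chunk& c : chunks_) {
        std::free(c.base);
      }
    }

    void* alloc(size_t n) {
      if (chunks_.empty() || chunks_.back().size - chunks_.back().used < n) {
        size_t size = nextChunkSize_ < n ? n : nextChunkSize_;
        char* base = static_cast<char*>(std::malloc(size));
        if (!base) {
          return nullptr;
        }
        chunks_.push_back(Chunk{base, size, 0});
        nextChunkSize_ *= 2;  // the doubling this bug is about
      }
      Chunk& c = chunks_.back();
      void* p = c.base + c.used;
      c.used += n;
      return p;
    }

    size_t chunkCount() const { return chunks_.size(); }
  };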
doesn't do doubling, but does round up to the jemalloc chunk size (which above 4K is the next multiple of 4K)
Attachment #9007384 - Flags: review?(jorendorff)
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Silly not to at least round to the amount of memory that actually will be allocated for it
Comment on attachment 9007384 [details] [diff] [review]
round LifoAlloc sizes up to the jemalloc chunk size

Review of attachment 9007384 [details] [diff] [review]:
-----------------------------------------------------------------

This is a good idea. Clearing the review flag, because the change needs to be made in a different method.

::: js/src/ds/LifoAlloc.cpp
@@ +268,2 @@
>     // Allocate a new BumpChunk with enough space for the next allocation.
> +   UniqueBumpChunk newChunk = newChunkWithCapacity(size);

newChunkWithCapacity itself contains some code to round `size` up, so please merge this with the logic that's already there. (`BumpChunk::newWithCapacity(size)` does, finally, pass `size` directly to `js_malloc`.)
Attachment #9007384 - Flags: review?(jorendorff)
This is the output of starting the browser, going to arstechnica, then loading 1 article, with a small patch to dump out LifoAlloc sizes from newChunkWithCapacity, run through 'sort -n -k 3 | uniq -c'. (So each line is prefixed by the number of times it occurs.) ^2 is the result of RoundUpPow2(allocSizeWithCanaries), and "final:" is what you get if you call MallocGoodSize() on that. If it says "final (default):" it's defaultChunkSize_.

I'll note that no chunk has a final size (^2 size) of <4K, and all are ^2 at or above 4K, even without MallocGoodSize, and MallocGoodSize would only apply in a few cases (and there we're always ^2 anyways).

The 4096 sizes seem to come from:
JSContext.h: static const size_t TEMP_LIFO_ALLOC_PRIMARY_CHUNK_SIZE = 4 * 1024;
and ProtectedRegionTree(): alloc(4096, false),
and vm/Stack.h: static const size_t DEFAULT_CHUNK_SIZE = 4 * 1024;
and:
jit/IonControlFlow.h: static const size_t DEFAULT_CHUNK_SIZE = 4096;
jit/ICStubSpace.h: static const size_t STUB_DEFAULT_CHUNK_SIZE = 4096;
jit/ICStubSpace.h: static const size_t STUB_DEFAULT_CHUNK_SIZE = 4096;

So given the above, what change are you requesting here?
Flags: needinfo?(jorendorff)
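For reference, a rough model of the size pipeline described in the dump above: the requested size (with canaries) is rounded up to a power of two, and the allocator's good-size rounding above 4 KiB goes to the next multiple of 4 KiB. These helpers are illustrative approximations, not the real RoundUpPow2 or MallocGoodSize; note that once a size is a power of two at or above 4 KiB, the good-size rounding is a no-op, which is the point made above.

  #include <cstddef>
  #include <cstdio>

  // Approximation of RoundUpPow2: smallest power of two >= n (for n > 0).
  static size_t RoundUpPow2Approx(size_t n) {
    size_t p = 1;
    while (p < n) {
      p <<= 1;
    }
    return p;
  }

  // Approximation of jemalloc's good-size rounding for sizes above 4 KiB:
  // round up to the next multiple of 4 KiB (small size classes not modeled).
  static size_t MallocGoodSizeApprox(size_t n) {
    const size_t page = 4 * 1024;
    if (n <= page) {
      return n;
    }
    return (n + page - 1) & ~(page - 1);
  }

  int main() {
    const size_t requests[] = {3000, 4096, 5000, 20000};
    for (size_t req : requests) {
      size_t pow2 = RoundUpPow2Approx(req);
      printf("request %zu -> ^2 %zu -> final %zu\n", req, pow2, MallocGoodSizeApprox(pow2));
    }
    return 0;
  }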
Assignee: rjesup → nicolas.b.pierron
Flags: needinfo?(nicolas.b.pierron)
Whiteboard: [qf:p1:f64]
Depends on: 1491413
Attached patch Add LifoAlloc::free_ no-op (obsolete) — Splinter Review
This is a small patch to add free_ no-op on LifoAlloc structure. Mostly made to remove noise from the upcoming patches.
Attachment #9009948 - Flags: review?(tcampbell)
When testing with extra assertions, I noticed that AsmJS creates extra LifoAlloc::Marks but never releases the content of the LifoAlloc on errors, which implies that we could potentially pile up memory in LifoAlloc chunks.
Attachment #9009949 - Flags: review?(bbouvier)
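A minimal sketch of the pattern being fixed, with a stand-in allocator; the real LifoAlloc/Mark API differs and the function names are placeholders. The point is that the mark must also be released on the error path, otherwise failed validation keeps its temporary allocations alive.

  #include <cstddef>
  #include <vector>

  // Minimal stand-in for a LIFO allocator with mark/release; it only
  // illustrates balancing mark() and release().
  struct MiniLifo {
    std::vector<size_t> entries;  // pretend each entry is an allocation
    size_t mark() const { return entries.size(); }
    void release(size_t m) { entries.resize(m); }
    void alloc(size_t n) { entries.push_back(n); }
  };

  static bool ValidateFunctionSketch(MiniLifo& lifo, bool fail) {
    lifo.alloc(4096);  // temporary data built up while validating
    return !fail;
  }

  // Hypothetical shape of the fix: release back to the mark on the error path
  // too, so a failed validation does not keep its temporaries alive.
  static bool CheckFunctionSketch(MiniLifo& lifo, bool fail) {
    size_t mark = lifo.mark();
    if (!ValidateFunctionSketch(lifo, fail)) {
      lifo.release(mark);  // previously missing on the error path
      return false;
    }
    lifo.release(mark);
    return true;
  }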
This patch adds a call to LifoAlloc::free_, such that we can reclaim the memory of large allocations made in LifoAlloc chunks. Note: I am investigating an unhandled OOM on Speedometer which might be related to this change, but I do not see how it would be.
Attachment #9009950 - Flags: review?(jdemooij)
This patch is made such that we do not restrict ourselves to power-of-2 sizes for all allocations. This code will be used in the follow-up patch to allocate space for large allocations, which will default to a best-fit allocation instead of a first-come allocation.
Attachment #9009956 - Flags: review?(tcampbell)
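For context, a sketch of the two policies mentioned, assuming "first-come" means taking the first unused chunk that can hold the request and "best-fit" means taking the smallest one that can; the types here are placeholders, not the LifoAlloc internals.

  #include <cstddef>
  #include <list>

  struct FreeChunk {
    size_t capacity;
  };

  // First-come: return the first unused chunk that fits the request.
  static std::list<FreeChunk>::iterator
  FirstFit(std::list<FreeChunk>& unused, size_t n) {
    for (auto it = unused.begin(); it != unused.end(); ++it) {
      if (it->capacity >= n) {
        return it;
      }
    }
    return unused.end();
  }

  // Best-fit: return the unused chunk whose capacity is closest to the
  // request, which wastes the least space when serving large allocations.
  static std::list<FreeChunk>::iterator
  BestFit(std::list<FreeChunk>& unused, size_t n) {
    auto best = unused.end();
    for (auto it = unused.begin(); it != unused.end(); ++it) {
      if (it->capacity >= n && (best == unused.end() || it->capacity < best->capacity)) {
        best = it;
      }
    }
    return best;
  }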
This patch moves the TempAllocator to be part of the CFGSpace, and calls its destructor before reclaiming memory.
Attachment #9009957 - Flags: review?(jdemooij)
Use a TempAllocator for CFGSpace and properly call its destructor when reclaiming the space.
Attachment #9009958 - Flags: review?(jdemooij)
This patch adds one additional list of live chunks for large allocations. Each allocation larger than the default chunk size is allocated in the new list, with a single allocation per chunk. Having a single allocation per chunk allows us to reclaim the memory when free_ is called. Freeing data in a LifoAlloc structure is not the most recommended operation, and as such I made some effort to prevent freeing data which is still marked. Unfortunately, we cannot yet assert that all marks are released before any call to freeAll. Note: releasing a vector which got reallocated would have been an issue before as well, and these assertions would not prevent such an issue. They would only keep the old memory around, as our LifoAlloc behaves today, instead of attempting to free any allocations which come before any marks.
Attachment #9009959 - Flags: review?(tcampbell)
Double the space reserved for small allocations if we are running out of space, unless there is already a buffer which we can use.
Attachment #9009960 - Flags: review?(tcampbell)
(In reply to Randell Jesup [:jesup] from comment #4)
> so given the above, what change are you requesting here?

I had in mind the change nbp has made in attachment 9009960 [details] [diff] [review].
Flags: needinfo?(jorendorff)
Comment on attachment 9009949 [details] [diff] [review] AsmJS CheckFunction: Reclaim LifoAlloc space on errors. Review of attachment 9009949 [details] [diff] [review]: ----------------------------------------------------------------- Good catch, thanks for fixing.
Attachment #9009949 - Flags: review?(bbouvier) → review+
Depends on: 1496402
Comment on attachment 9009960 [details] [diff] [review]
LifoAlloc: Double the size of the small-space to reduce the number of chunks.

Review of attachment 9009960 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ds/LifoAlloc.cpp
@@ +172,5 @@
>     // for the next allocation.
>     minSize = MallocGoodSize(minSize);
>     const size_t chunkSize = fit || minSize > defaultChunkSize_
>         ? minSize
> +       : Max((curSize_ - reallocCurSize_) * 2, defaultChunkSize_);

self-nit: no need for the *2, as this would triple the current size.
Priority: -- → P2
Comment on attachment 9009949 [details] [diff] [review] AsmJS CheckFunction: Reclaim LifoAlloc space on errors. I ran into the same issue in fuzz bug 1493475 and ended up with the same solution, which has since landed. Marking this as obsolete.
Attachment #9009949 - Attachment is obsolete: true
Maybe LifoAlloc::mark() should return an RAII auto-releasing object that one could manually get Mark objects out of if you really needed to manage Mark lifetimes, rather than people having to remember to call release()?
(In reply to Nathan Froyd [:froydnj] from comment #17)
> Maybe LifoAlloc::mark() should return an RAII auto-releasing object that one
> could manually get Mark objects out of if you really needed to manage Mark
> lifetimes, rather than people having to remember to call release()?

Ideally yes. Unfortunately we have more groundwork to do, such as ensuring that the destructors are effectively called. Some of the LifoAllocScope allocations are made within a LifoAlloc chunk, which does not handle any calls to the destructors, and as such we have no way to ensure that these calls would be balanced at the moment.

I want to change that in the future, but we should first ensure that we do it properly. And balancing these mark() and release(Mark) calls feels less important than implementing this feature at the moment.
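A hypothetical shape for the guard froydnj describes, as a sketch only; per the comment above, it would only be safe once destructors of objects living in LifoAlloc chunks are reliably run.

  #include <cstddef>
  #include <vector>

  // Same kind of stand-in allocator as in the earlier sketch.
  struct MiniLifo {
    std::vector<size_t> entries;
    size_t mark() const { return entries.size(); }
    void release(size_t m) { entries.resize(m); }
  };

  // Hypothetical RAII guard: releases the mark when the scope ends, while
  // still exposing the mark for callers that need to manage it manually.
  class AutoLifoMark {
    MiniLifo& lifo_;
    size_t mark_;

   public:
    explicit AutoLifoMark(MiniLifo& lifo) : lifo_(lifo), mark_(lifo.mark()) {}
    ~AutoLifoMark() { lifo_.release(mark_); }

    size_t mark() const { return mark_; }

    AutoLifoMark(const AutoLifoMark&) = delete;
    AutoLifoMark& operator=(const AutoLifoMark&) = delete;
  };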
Priority: P2 → P1
After discussion with :nbp, the work for this bug will be scoped to:
- Cleanup cold code to LifoAlloc.cpp
- Oversize flag for allocations
  - store in a separate list of chunks
  - always alloc and always free
  - Mark will have to track this list as well
- Doubling up to a limit (1MB or so)

Follow-up:
- Move TempAllocator patches to another bug
- Move LifoAlloc::free to another bug / discussion
- Fix mark/release balancing.
Split hot/cold code into different functions such that the cold code will not get inlined. This should make the generated code smaller and potentially faster too.
Attachment #9017915 - Flags: review?(tcampbell)
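A sketch of the split being described, with placeholder names: the common in-chunk bump allocation stays small enough to inline, while the chunk-creation slow path becomes an out-of-line call (the real patch keeps the hot path in LifoAlloc.h and moves the cold path into LifoAlloc.cpp).

  #include <cstddef>
  #include <cstdlib>

  // Placeholder allocator showing the shape of the hot/cold split; this is
  // not the LifoAlloc API, and chunk bookkeeping is elided.
  class SplitAlloc {
    char* cur_ = nullptr;
    char* end_ = nullptr;

    void* allocColdPath(size_t n);  // out of line: not inlined into callers

   public:
    // Hot path: a bounds check and a pointer bump, small enough to inline.
    void* alloc(size_t n) {
      if (static_cast<size_t>(end_ - cur_) >= n) {
        void* p = cur_;
        cur_ += n;
        return p;
      }
      return allocColdPath(n);
    }
  };

  // Cold path: allocate a fresh chunk (error handling and tracking of old
  // chunks elided), then satisfy the request from it.
  void* SplitAlloc::allocColdPath(size_t n) {
    size_t chunkSize = n < 4096 ? 4096 : n;  // illustrative sizing only
    char* base = static_cast<char*>(std::malloc(chunkSize));
    if (!base) {
      return nullptr;
    }
    cur_ = base + n;
    end_ = base + chunkSize;
    return base;
  }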
Attachment #9007384 - Attachment is obsolete: true
Attachment #9009948 - Attachment is obsolete: true
Attachment #9009950 - Attachment is obsolete: true
Attachment #9009956 - Attachment is obsolete: true
Attachment #9009957 - Attachment is obsolete: true
Attachment #9009958 - Attachment is obsolete: true
Attachment #9009959 - Attachment is obsolete: true
Attachment #9009960 - Attachment is obsolete: true
Attachment #9009948 - Flags: review?(tcampbell)
Attachment #9009950 - Flags: review?(jdemooij)
Attachment #9009956 - Flags: review?(tcampbell)
Attachment #9009957 - Flags: review?(jdemooij)
Attachment #9009958 - Flags: review?(jdemooij)
Attachment #9009959 - Flags: review?(tcampbell)
Attachment #9009960 - Flags: review?(tcampbell)
LifoAlloc bump allocations are now stored in 2 different lists instead of 1. The first list contains small allocations, allocated in small chunks, while the second list holds large allocations, each stored in its own chunk. Splitting the list in two should prevent BumpChunks from being composed mostly of wasted space when a large allocation happens.
Attachment #9017916 - Flags: review?(tcampbell)
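A sketch of the resulting layout, with illustrative names: allocations up to an oversize threshold land in the shared small-chunk list, while anything larger gets a dedicated chunk in a second list, so one big allocation no longer strands the unused tail of a small chunk.

  #include <cstddef>
  #include <cstdlib>
  #include <vector>

  // Illustrative two-list layout only; the real LifoAlloc bump-allocates
  // inside shared small chunks, while this sketch mallocs every request and
  // merely models which list the resulting chunk lands in.
  class TwoListAllocSketch {
    struct Chunk {
      void* base;
      size_t size;
    };
    std::vector<Chunk> small_;     // chunks holding ordinary allocations
    std::vector<Chunk> oversize_;  // one allocation per chunk
    size_t oversizeThreshold_;

   public:
    explicit TwoListAllocSketch(size_t threshold) : oversizeThreshold_(threshold) {}
    ~TwoListAllocSketch() {
      for (Chunk& c : small_) std::free(c.base);
      for (Chunk& c : oversize_) std::free(c.base);
    }

    void* alloc(size_t n) {
      void* base = std::malloc(n);
      if (!base) {
        return nullptr;
      }
      // Allocations above the threshold get a dedicated chunk in oversize_,
      // which can later be freed individually without touching small chunks.
      std::vector<Chunk>& list = n > oversizeThreshold_ ? oversize_ : small_;
      list.push_back(Chunk{base, n});
      return base;
    }
  };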
Add recording of the quantity of oversize allocations, such that the quantity of small allocations can be deduced by taking the difference. As discussed, this implements the heuristic of doubling the size until we reach a factor of 1024 of the original size, at which point chunks of 1024 times the default chunk size are allocated instead of 1024 chunks of the default chunk size.
Attachment #9017920 - Flags: review?(tcampbell)
(In reply to Nicolas B. Pierron [:nbp] from comment #22)
> As discussed, this implements the heuristic of doubling the size until we
> reach a factor of 1024 of the original size, at which point chunks of 1024
> times the default chunk size are allocated instead of 1024 chunks of the
> default chunk size.

If we start with 4 KB, we will double until 4 MB, right? I'm a bit worried about wasting a lot of (unused) space and this backfiring on 32-bit with its limited/fragmented virtual address space. Is one 4 MB chunk instead of 2 separate 2 MB chunks really a measurable win?
(In reply to Jan de Mooij [:jandem] from comment #23)
> (In reply to Nicolas B. Pierron [:nbp] from comment #22)
> > As discussed, this implements the heuristic of doubling the size until we
> > reach a factor of 1024 of the original size, at which point chunks of 1024
> > times the default chunk size are allocated instead of 1024 chunks of the
> > default chunk size.
>
> If we start with 4 KB, we will double until 4 MB, right? I'm a bit worried
> about wasting a lot of (unused) space and this backfiring on 32-bit with its
> limited/fragmented virtual address space. Is one 4 MB chunk instead of 2
> separate 2 MB chunks really a measurable win?

If you are going to need 4 MB chunks, that's because you already consumed at least (4 MB - 4 kB). Maybe we can change this heuristic to not double but follow the Fibonacci sequence.

I did observe some crash reports where TI was consuming 64 MB of 4 KB chunks. So, I do not think this would waste much memory compared to our actual memory consumption, and I do think this would reduce the number of frees which are potentially slowing us down.

If this wasted memory becomes an issue to be addressed, then we can change the logic of the NextSize function later, in a follow-up bug. At the moment, the biggest issue is to get this in place, more than fine-tuning these heuristics. If you have time to fine-tune these, great! In my case I have other things to address for 64.
FWIW, for xpcom strings, we double until a certain threshold (8MB), then x1.125 (cheap to compute with shift and add) rounded to the nearest MB. Maybe a similar strategy will work here?
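A sketch of the growth curve described above, under the stated parameters; the function name and the choice to round up (rather than to the nearest MB) are illustrative assumptions, and the actual xpcom string code differs in detail.

  #include <cstddef>

  // Grows the capacity by doubling below 8 MB, then by 1.125x (shift and
  // add), rounded up here to a whole number of MB.
  static size_t NextCapacitySketch(size_t current) {
    const size_t mb = 1024 * 1024;
    const size_t threshold = 8 * mb;
    if (current < threshold) {
      return current * 2;
    }
    size_t grown = current + (current >> 3);  // x1.125 via shift and add
    return (grown + mb - 1) & ~(mb - 1);      // round up to the next MB boundary
  }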
Comment on attachment 9017915 [details] [diff] [review]
LifoAlloc: Move cold code to LifoAlloc.cpp.

Review of attachment 9017915 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ds/LifoAlloc.cpp
@@ +321,5 @@
>     other->decrementCurSize(size);
> }
> +
> +void
> +LifoAlloc::setReadOnly()

self-nit: #ifdef LIFO_CHUNK_PROTECT
Comment on attachment 9017916 [details] [diff] [review]
Split LifoAlloc chunks in 2 lists.

Review of attachment 9017916 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/StoreBuffer.h
@@ +155,5 @@
>     storage_ = js_new<LifoAlloc>(LifoAllocBlockSize);
> +   // This prevents LifoAlloc::Enum to crash with a release
> +   // assertion if we ever allocate one entry larger than
> +   // LifoAllocBlockSize.
> +   storage_->setOversizeThreshold(SIZE_MAX);

self-nit: Check the returned value before accessing.
Comment on attachment 9017915 [details] [diff] [review] LifoAlloc: Move cold code to LifoAlloc.cpp. Review of attachment 9017915 [details] [diff] [review]: ----------------------------------------------------------------- r+ with your macro nit fixed
Attachment #9017915 - Flags: review?(tcampbell) → review+
Comment on attachment 9017916 [details] [diff] [review]
Split LifoAlloc chunks in 2 lists.

Review of attachment 9017916 [details] [diff] [review]:
-----------------------------------------------------------------

Having a list for singleton chunks makes a lot of sense to me. I wonder if we should investigate using setOversizeThreshold() on a few consumers of LifoAlloc to control heuristics around fragmentation.

I want the hack in the StoreBuffer to be fixed with a better abstraction before landing.

::: js/src/ds/LifoAlloc.cpp
@@ +159,2 @@
> LifoAlloc::UniqueBumpChunk
> +LifoAlloc::newChunkWithCapacity(size_t n, bool oversize)

Ideally we would have MOZ_ASSERT_IF(!oversize, n < oversizeThreshold_), but I'm not sure we can get away with it in all cases due to allocSizeWithReadZone.

@@ +170,5 @@
> {
>     return nullptr;
> }
>
> + const size_t chunkSize = oversize || minSize > defaultChunkSize_

Nit: parentheses around (oversize || minSize > defaultChunkSize_)

@@ +222,5 @@
> LifoAlloc::allocImplColdPath(size_t n)
> {
>     void* result;
> +   UniqueBumpChunk newChunk = getOrCreateChunk(n);
> +   if (!newChunk) {

I like how this cleanup worked out.

@@ +281,2 @@
> }
> + detail::BumpChunk::Mark oversizeMark;

I think this line is dead code.

@@ +287,4 @@
> }
>
> void
> LifoAlloc::release(Mark mark)

The duplicated code is getting long. Maybe we can have a helper function or lambda here.

::: js/src/ds/LifoAlloc.h
@@ +534,4 @@
>     BumpChunkList chunks_;
>
> +   // List of chunks containing allocated data where each allocation is larger
> +   // than the default chunk size. Each chunk in this list contains no more

- larger than oversize threshold
- each chunk contains exactly one allocation
- This reduces wasted space in the normal chunk list

@@ +596,5 @@
>
>     MOZ_ALWAYS_INLINE
>     void* allocImpl(size_t n) {
>         void* result;
> +       if (MOZ_UNLIKELY(n > oversizeThreshold_)) {

Probably would be good to have a comment re-iterating this heuristic. Perhaps something like:
// Give oversized allocations their own chunk instead of wasting space due to fragmentation at the end of a normal chunk.

@@ +621,5 @@
>
> +   // Set the threshold to allocate data in its own chunk outside the space for
> +   // small allocations.
> +   void setOversizeThreshold(size_t oversizeThreshold) {
> +       oversizeThreshold_ = oversizeThreshold;

I'm wondering if we should MOZ_ASSERT(oversizeThreshold <= defaultChunkSize, "Allocations larger than defaultChunkSize should always be oversize");

@@ +796,5 @@
> +   bool empty = chunks_.empty() ||
> +                (chunks_.begin() == chunks_.last() && chunks_.last()->empty());
> +   MOZ_ASSERT_IF(!oversize_.empty(), !oversize_.last()->empty());
> +   empty = empty && oversize_.empty();
> +   return empty;

Nit: Directly |return empty && oversize_.empty()|

@@ +909,5 @@
>     : chunkIt_(alloc.chunks_.begin()),
>       chunkEnd_(alloc.chunks_.end()),
>       head_(nullptr)
> {
> +   MOZ_RELEASE_ASSERT(alloc.oversize_.empty());

Good idea!

::: js/src/gc/StoreBuffer.h
@@ +152,5 @@
>     MOZ_MUST_USE bool init() {
>         MOZ_ASSERT(!head_);
>         if (!storage_) {
>             storage_ = js_new<LifoAlloc>(LifoAllocBlockSize);
> +           // This prevents LifoAlloc::Enum to crash with a release

s/to crash/from crashing/

@@ +155,5 @@
>     storage_ = js_new<LifoAlloc>(LifoAllocBlockSize);
> +   // This prevents LifoAlloc::Enum to crash with a release
> +   // assertion if we ever allocate one entry larger than
> +   // LifoAllocBlockSize.
> +   storage_->setOversizeThreshold(SIZE_MAX);

Add a disableOversize() method so that this mechanism is inside LifoAlloc instead of in the StoreBuffer. We could then consider asserting in Enum that disableOversize was set (by checking SIZE_MAX).
Attachment #9017916 - Flags: review?(tcampbell) → review+
Looking at the try jobs, the AWSY results seem problematic (25% explicit memory regression). I'm going to fire off a few try runs with and without the patches to see what is going on.
I did some try runs to compare with and without the patches. For AWSY, this heuristic change gives an 8.5% regression in JS heap size. It is looking like we should do more investigation and push back to FF65. I think we need better measurements on how many LifoAllocs are typically live, what unused-but-available space exists in large chunks (and the unused_ list), and what the total size of LifoAlloc becomes.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2e3de91e6bd6b735bb6bedd70a0d9a2f14a486a
https://treeherder.mozilla.org/#/jobs?repo=try&revision=65ce0d32b4c365a7a25951db738e2237c573e4b6
(In reply to Nicolas B. Pierron [:nbp] from comment #24)
> I did observe some crash reports where TI was consuming 64 MB of 4 KB
> chunks. So, I do not think this would waste much memory compared to our
> actual memory consumption,

I admit 64 MB of 4 KB chunks is a bit silly, but switching to 8 MB chunks (TYPE_LIFO_ALLOC_PRIMARY_CHUNK_SIZE = 8 KB) at a time *will* leave some unused *megabytes* in most cases. These allocations are also more likely to fail on 32-bit systems. Maybe all that is fine and worth it, but the Try results do suggest it's a valid concern.
I have made 2 try pushes, one with the current set of patches:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a9344e687b4a1935217fe1bb25d7b78af33737e

And another with Nathan's suggestion (comment 25):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=69585aca7bea588ee12514948ce6b87037ae2e03

(In reply to Ted Campbell [:tcampbell] from comment #31)
> I did some try runs to compare with and without the patches. For AWSY, this
> heuristic change gives an 8.5% regression in JS heap size. It is looking
> like we should do more investigation and push back to FF65.

I do not see any JS heap size increase in the AWSY reports from my try pushes; I only see "Base Content", which is bimodal. My guess would be that we are not able to reclaim some memory when a GC is triggered, and one of the LifoAllocs keeps 7 MB of unused chunks.

I will also note that looking at the graph over the last 14 days seems to highlight that this is a Try issue related to AWSY:
https://treeherder.mozilla.org/perf.html#/graphs?series=try,1706923,1,4&series=mozilla-central,1707015,1,4&highlightAlerts=0

Therefore, I do not think we should worry about these AWSY results unless they appear on mozilla-central.
I'm not comfortable with being the one to sign off on this change so close to merge. I'm fine with the correctness of the patches, if there are others who have a better feel for perf vs memory tradeoffs. I would strongly like to push this back to next week so we can see how it behaves on nightly in practice, especially with the talk that NextSize may need heuristic tuning.
Depends on: 1500176
Comment on attachment 9018333 [details] [diff] [review] Try Nathan heuristics Review of attachment 9018333 [details] [diff] [review]: ----------------------------------------------------------------- Note: I will change the patch message before pushing it.
Attachment #9018333 - Flags: review?(jorendorff)
Comment on attachment 9017920 [details] [diff] [review] Adaptively double LifoAlloc chunks to reduce the number of chunks. Review of attachment 9017920 [details] [diff] [review]: ----------------------------------------------------------------- Chatting with nbp, I'll mark this r+ if we leave the NextSize as |return start| and split the heuristic off to another patch. The NextSize trick here seems like a reasonable way to organize our scaling code.
Attachment #9017920 - Flags: review?(tcampbell) → review+
Comment on attachment 9018333 [details] [diff] [review]
Try Nathan heuristics

Review of attachment 9018333 [details] [diff] [review]:
-----------------------------------------------------------------

r=me on the idea generally.

::: js/src/ds/LifoAlloc.cpp
@@ +161,4 @@
> }
>
> static size_t
> NextSize(size_t start, size_t used)

This code was hard for me to read, so I rewrote it. This produces a very similar sequence, not identical:

// Heuristic to choose the size of the next BumpChunk for small allocations.
// `start` is the size of the first chunk. `used` is the total size of all
// BumpChunks in this LifoAlloc so far.
static size_t
NextSize(size_t start, size_t used)
{
    // Double the size, up to 1 MB.
    const size_t mb = 1 * 1024 * 1024;
    if (used < mb) {
        return Max(start, used);
    }

    // After 1 MB, grow more gradually, to waste less memory.
    // The sequence (in megabytes) begins:
    // 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, ...
    return JS_ROUNDUP(used / 8, mb);
}
Attachment #9018333 - Flags: review?(jorendorff) → review+
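For illustration, a small standalone driver (not part of any patch) that prints the sequence the proposed NextSize produces, assuming a 4 KB starting chunk; JS_ROUNDUP is replaced with an explicit round-up so the snippet compiles on its own.

  #include <algorithm>
  #include <cstddef>
  #include <cstdio>

  // Same heuristic as in the review comment above, with JS_ROUNDUP expanded
  // so this compiles outside the tree.
  static size_t NextSize(size_t start, size_t used) {
    const size_t mb = 1 * 1024 * 1024;
    if (used < mb) {
      return std::max(start, used);
    }
    return ((used / 8) + mb - 1) / mb * mb;
  }

  int main() {
    const size_t start = 4 * 1024;
    size_t used = 0;
    for (int i = 0; i < 20; i++) {
      size_t next = NextSize(start, used);
      printf("chunk %2d: %8zu bytes (total %9zu)\n", i, next, used + next);
      used += next;
    }
    return 0;
  }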
(In reply to Ted Campbell [:tcampbell] from comment #29)
> ::: js/src/ds/LifoAlloc.cpp
> @@ +159,2 @@
> > LifoAlloc::UniqueBumpChunk
> > +LifoAlloc::newChunkWithCapacity(size_t n, bool oversize)
>
> Ideally we would have MOZ_ASSERT_IF(!oversize, n < oversizeThreshold_), but
> I'm not sure we can get away with it in all cases due to
> allocSizeWithReadZone.

I was not able to add this assertion, as the trampolines created in the JitRuntime make use of a TempAllocator initialized with the JSContext LifoAlloc. The assertion fails because the TempAllocator uses the 16 KB ensureBallast calls, while the LifoAlloc of the JSContext is initialized with a defaultChunkSize of 4 KB.
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f5973de3f6e8
LifoAlloc: Move cold code to LifoAlloc.cpp. r=tcampbell
https://hg.mozilla.org/integration/mozilla-inbound/rev/683e3e0eaee1
Split LifoAlloc chunks in 2 lists. r=tcampbell
https://hg.mozilla.org/integration/mozilla-inbound/rev/cbfcb4b2b5c9
Add LifoAlloc instrumentation for adding heuristics. r=tcampbell
https://hg.mozilla.org/integration/mozilla-inbound/rev/f3aa68c506e0
Add LifoAlloc heuristic to increase the reserved space based on the consumed space. r=jorendorff
Backed out for valgrind failures at js::jit::LIRGenerator::visitBlock Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&searchStr=linux%2Cx64%2Copt%2Cvalgrind-linux64-valgrind%2Fopt%2C%28v%29&fromchange=f3aa68c506e01055a7d7d20eb4fb672c511b9caf&tochange=58bcea5ede75c573fc8b8b20380b17d19b6f50ca&selectedJob=207271995 Failure log: https://hg.mozilla.org/integration/mozilla-inbound/rev/58bcea5ede75c573fc8b8b20380b17d19b6f50ca Backout link: https://hg.mozilla.org/integration/mozilla-inbound/rev/58bcea5ede75c573fc8b8b20380b17d19b6f50ca [task 2018-10-23T14:48:49.551Z] 14:48:49 INFO - 0:33.48 TEST-UNEXPECTED-FAIL | valgrind-test | Conditional jump or move depends on uninitialised value(s) at js::jit::LIRGenerator::visitBlock / js::jit::LIRGenerator::generate / js::jit::GenerateLIR / js::jit::CompileBackEnd [task 2018-10-23T14:48:49.552Z] 14:48:49 INFO - 0:33.48 ==3591== Conditional jump or move depends on uninitialised value(s) [task 2018-10-23T14:48:49.552Z] 14:48:49 INFO - 0:33.48 ==3591== at 0xFEF30DC: js::jit::LIRGenerator::visitBlock(js::jit::MBasicBlock*)+204 (Lowering.cpp:5324) [task 2018-10-23T14:48:49.552Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFEF3344: js::jit::LIRGenerator::generate()+180 (Lowering.cpp:5398) [task 2018-10-23T14:48:49.552Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFE09223: js::jit::GenerateLIR(js::jit::MIRGenerator*)+403 (Ion.cpp:1784) [task 2018-10-23T14:48:49.552Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFE09C6D: js::jit::CompileBackEnd(js::jit::MIRGenerator*)+29 (Ion.cpp:1889) [task 2018-10-23T14:48:49.552Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x100C709F: js::HelperThread::handleIonWorkload(js::AutoLockHelperThreadState&)+191 (HelperThreads.cpp:2199) [task 2018-10-23T14:48:49.552Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x100C69E5: js::HelperThread::threadLoop()+325 (HelperThreads.cpp:2643) [task 2018-10-23T14:48:49.552Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x100CACCE: callMain<0> (Thread.h:243) [task 2018-10-23T14:48:49.552Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x100CACCE: js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*)+14 (Thread.h:236) [task 2018-10-23T14:48:49.552Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x4E3BB4F: start_thread+207 (pthread_create.c:304) [task 2018-10-23T14:48:49.552Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x5CDCFBC: clone+108 (clone.S:112) [task 2018-10-23T14:48:49.552Z] 14:48:49 INFO - 0:33.48 ==3591== Uninitialised value was created by a client request [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== at 0xFE0B452: setBump (LifoAlloc.h:329) [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFE0B452: tryAlloc (LifoAlloc.h:472) [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFE0B452: allocImpl (LifoAlloc.h:608) [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFE0B452: alloc (LifoAlloc.h:669) [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFE0B452: new_<js::jit::IonBuilder, JSContext *, js::jit::CompileRealm *, const js::jit::JitCompileOptions &, js::jit::TempAllocator *&, js::jit::MIRGraph *&, js::CompilerConstraintList *&, js::jit::BaselineInspector *&, js::jit::CompileInfo *&, const js::jit::OptimizationInfo *&, js::jit::BaselineFrameInspector *&> (LifoAlloc.h:867) [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFE0B452: IonCompile (Ion.cpp:2110) [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFE0B452: js::jit::Compile(JSContext*, JS::Handle<JSScript*>, 
js::jit::BaselineFrame*, unsigned char*, bool)+3250 (Ion.cpp:2439) [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFE0BD7E: BaselineCanEnterAtEntry (Ion.cpp:2564) [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFE0BD7E: js::jit::IonCompileScriptForBaseline(JSContext*, js::jit::BaselineFrame*, unsigned char*)+286 (Ion.cpp:2697) [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x3CD593301691: ??? [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x3CD593307700: ??? [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x16F792E7: ??? [task 2018-10-23T14:48:49.553Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x3CD593307700: ??? [task 2018-10-23T14:48:49.554Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x1718D6E7: ??? [task 2018-10-23T14:48:49.554Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x3CD5932FF4C1: ??? [task 2018-10-23T14:48:49.554Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFE6F8EE: EnterJit (Jit.cpp:103) [task 2018-10-23T14:48:49.554Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFE6F8EE: js::jit::MaybeEnterJit(JSContext*, js::RunState&)+910 (Jit.cpp:170) [task 2018-10-23T14:48:49.554Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x103E9A8E: Interpret(JSContext*, js::RunState&)+35278 (Interpreter.cpp:3505) [task 2018-10-23T14:48:49.554Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x103E0E1B: js::RunScript(JSContext*, js::RunState&)+379 (Interpreter.cpp:447) [task 2018-10-23T14:48:49.554Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x103F0F03: js::ExecuteKernel(JSContext*, JS::Handle<JSScript*>, JSObject&, JS::Value const&, js::AbstractFramePtr, JS::Value*)+275 (Interpreter.cpp:813) [task 2018-10-23T14:48:49.554Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFCCE04A: ExecuteInExtensibleLexicalEnvironment(JSContext*, JS::Handle<JSScript*>, JS::Handle<JSObject*>)+298 (Eval.cpp:480) [task 2018-10-23T14:48:49.554Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFCCE2BE: js::ExecuteInJSMEnvironment(JSContext*, JS::Handle<JSScript*>, JS::Handle<JSObject*>, JS::AutoVector<JSObject*>&)+254 (Eval.cpp:589) [task 2018-10-23T14:48:49.554Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFCCE18B: js::ExecuteInJSMEnvironment(JSContext*, JS::Handle<JSScript*>, JS::Handle<JSObject*>)+91 (Eval.cpp:544) [task 2018-10-23T14:48:49.554Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xD075306: mozJSComponentLoader::ObjectForLocation(ComponentLoaderInfo&, nsIFile*, JS::MutableHandle<JSObject*>, JS::MutableHandle<JSScript*>, char**, bool, JS::MutableHandle<JS::Value>)+1878 (mozJSComponentLoader.cpp:939) [task 2018-10-23T14:48:49.554Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xD0772D5: mozJSComponentLoader::Import(JSContext*, nsTSubstring<char> const&, JS::MutableHandle<JSObject*>, JS::MutableHandle<JSObject*>, bool)+2357 (mozJSComponentLoader.cpp:1372) [task 2018-10-23T14:48:49.555Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xD0764C3: mozJSComponentLoader::ImportInto(nsTSubstring<char> const&, JS::Handle<JSObject*>, JSContext*, JS::MutableHandle<JSObject*>)+99 (mozJSComponentLoader.cpp:1166) [task 2018-10-23T14:48:49.555Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xD0762F3: mozJSComponentLoader::ImportInto(nsTSubstring<char> const&, JS::Handle<JS::Value>, JSContext*, unsigned char, JS::MutableHandle<JS::Value>)+435 (mozJSComponentLoader.cpp:1032) [task 2018-10-23T14:48:49.555Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xD6E469E: mozilla::dom::ChromeUtils::Import(mozilla::dom::GlobalObject const&, nsTSubstring<char16_t> const&, mozilla::dom::Optional<JS::Handle<JSObject*> > const&, JS::MutableHandle<JSObject*>, 
mozilla::ErrorResult&)+542 (ChromeUtils.cpp:412) [task 2018-10-23T14:48:49.555Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xDC6338C: mozilla::dom::ChromeUtils_Binding::import(JSContext*, unsigned int, JS::Value*)+1644 (ChromeUtilsBinding.cpp:3220) [task 2018-10-23T14:48:49.555Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x103EFB3B: CallJSNative (Interpreter.cpp:468) [task 2018-10-23T14:48:49.555Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x103EFB3B: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct)+923 (Interpreter.cpp:560) [task 2018-10-23T14:48:49.555Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x103E9DE3: CallFromStack (Interpreter.cpp:620) [task 2018-10-23T14:48:49.555Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x103E9DE3: Interpret(JSContext*, js::RunState&)+36131 (Interpreter.cpp:3462) [task 2018-10-23T14:48:49.555Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x103E0E1B: js::RunScript(JSContext*, js::RunState&)+379 (Interpreter.cpp:447) [task 2018-10-23T14:48:49.555Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x103EFE8A: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct)+1770 (Interpreter.cpp:587) [task 2018-10-23T14:48:49.555Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x103F03B8: js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>)+40 (Interpreter.cpp:633) [task 2018-10-23T14:48:49.555Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFFC2025: JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>)+1269 (jsapi.cpp:2917) [task 2018-10-23T14:48:49.556Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xD0BAD2A: nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, nsXPTMethodInfo const*, nsXPTCMiniVariant*)+3690 (XPCWrappedJSClass.cpp:1206) [task 2018-10-23T14:48:49.556Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xC9796AD: PrepareAndDispatch+493 (xptcstubs_x86_64_linux.cpp:127) [task 2018-10-23T14:48:49.556Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xC978B1E: SharedStub+90 (in /builds/worker/workspace/build/src/obj-firefox/toolkit/library/libxul.so) [task 2018-10-23T14:48:49.556Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFBFEB24: nsXREDirProvider::DoStartup()+324 (nsXREDirProvider.cpp:1090) [task 2018-10-23T14:48:49.556Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFBF505D: XREMain::XRE_mainRun()+2269 (nsAppRunner.cpp:4607) [task 2018-10-23T14:48:49.556Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFBF5C73: XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&)+947 (nsAppRunner.cpp:4922) [task 2018-10-23T14:48:49.556Z] 14:48:49 INFO - 0:33.48 ==3591== by 0xFBF6258: XRE_main(int, char**, mozilla::BootstrapConfig const&)+216 (nsAppRunner.cpp:5014) [task 2018-10-23T14:48:49.556Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x10EC8B: do_main (nsBrowserApp.cpp:237) [task 2018-10-23T14:48:49.556Z] 14:48:49 INFO - 0:33.48 ==3591== by 0x10EC8B: main+795 (nsBrowserApp.cpp:329) [task 2018-10-23T14:48:49.556Z] 14:48:49 INFO - 0:33.48 ==3591== [task 2018-10-23T14:48:49.556Z] 14:48:49 INFO - 0:33.48 { [task 2018-10-23T14:48:49.556Z] 14:48:49 INFO - 0:33.48 <insert_a_suppression_name_here> [task 2018-10-23T14:48:49.557Z] 14:48:49 INFO - 0:33.48 Memcheck:Cond [task 2018-10-23T14:48:49.557Z] 14:48:49 INFO - 0:33.48 fun:_ZN2js3jit12LIRGenerator10visitBlockEPNS0_11MBasicBlockE [task 2018-10-23T14:48:49.557Z] 14:48:49 INFO - 0:33.48 fun:_ZN2js3jit12LIRGenerator8generateEv [task 2018-10-23T14:48:49.557Z] 14:48:49 INFO - 0:33.48 
fun:_ZN2js3jit11GenerateLIREPNS0_12MIRGeneratorE [task 2018-10-23T14:48:49.557Z] 14:48:49 INFO - 0:33.48 fun:_ZN2js3jit14CompileBackEndEPNS0_12MIRGeneratorE [task 2018-10-23T14:48:49.557Z] 14:48:49 INFO - 0:33.48 fun:_ZN2js12HelperThread17handleIonWorkloadERNS_25AutoLockHelperThreadStateE [task 2018-10-23T14:48:49.557Z] 14:48:49 INFO - 0:33.48 fun:_ZN2js12HelperThread10threadLoopEv [task 2018-10-23T14:48:49.557Z] 14:48:49 INFO - 0:33.48 fun:callMain<0> [task 2018-10-23T14:48:49.557Z] 14:48:49 INFO - 0:33.48 fun:_ZN2js6detail16ThreadTrampolineIRFvPvEJPNS_12HelperThreadEEE5StartES2_ [task 2018-10-23T14:48:49.557Z] 14:48:49 INFO - 0:33.48 fun:start_thread [task 2018-10-23T14:48:49.557Z] 14:48:49 INFO - 0:33.48 fun:clone [task 2018-10-23T14:48:49.557Z] 14:48:49 INFO - 0:33.48 }
Flags: needinfo?(nicolas.b.pierron)
Attached patch to be fuzzed (Splinter Review)
This patch is the aggregation of all other patches which landed. I was unable to reproduce the Valgrind failure reported before, and suspect that fuzzing this patch with an ASan build and running with the following command line options is likely to trigger these issues: --ion-eager --no-threads
Attachment #9020291 - Flags: feedback?(nth10sd)
Attachment #9020291 - Flags: feedback?(choller)
Flags: needinfo?(nicolas.b.pierron)
Valgrind seemed to fail on https://hg.mozilla.org/try/rev/683e3e0eaee1 with the split-into-two-lists patch. Failure is pretty strange.
Whiteboard: [qf:p1:f64] → [qf:p1:f65]
Comment on attachment 9020291 [details] [diff] [review] to be fuzzed I tested this over the weekend and couldn't find anything.
Attachment #9020291 - Flags: feedback?(choller) → feedback+
Comment on attachment 9020291 [details] [diff] [review] to be fuzzed Review of attachment 9020291 [details] [diff] [review]: ----------------------------------------------------------------- Tested this for a day or so and didn't find anything either.
Attachment #9020291 - Flags: feedback?(nth10sd) → feedback+
Trying to run the JS shell with the jit-test suite, increasing the LifoAlloc red-zone size to 128 bytes does not seem to highlight any issues either.
I managed to reproduce this issue with the JS shell compiled with clang 5.0.2:

$ valgrind -q --smc-check=all-non-file --error-exitcode=1 --gen-suppressions=all --show-possibly-lost=no --leak-check=full \
  ./js --ion-eager --ion-offthread-compile=off /dev/null

When looking at the first reported error on visitInstruction, the code:

  MOZ_ASSERT(!errored());

is compiled to:

  d7e340: 0f b7 48 30             movzwl 0x30(%rax),%ecx
  d7e344: 0f b6 50 32             movzbl 0x32(%rax),%edx
  d7e348: c1 e2 10                shl    $0x10,%edx
  d7e34b: 09 ca                   or     %ecx,%edx
  d7e34d: 81 fa ff ff 00 00       cmp    $0xffff,%edx
  d7e353: 0f 86 47 01 00 00       jbe    d7e4a0 <_ZN2js3jit12LIRGenerator16visitInstructionEPNS0_12MInstructionE+0x170>

This is the code generated for checking the value of:

  offThreadStatus_.isErr()

In this case offThreadStatus_ is of type AbortReasonOr<Ok> (i.e. Result<Ok, AbortReason>), which is compiled with the packing strategy:

  ResultImplementation<Ok, AbortReason, PackingStrategy::PackedVariant>

where the data are stored with IsPackableVariant<Ok, AbortReason>::VEbool:

  struct VEbool { V v; E e; bool ok; };

In the above assembly, the code of isErr():

  return !data.ok;

is compiled as the following expression:

  ecx = load.16 [rax + 0x30]  // Load data.v and data.e values (data.e is uninitialized).
  edx = load.8  [rax + 0x32]  // Load data.ok value.
  edx = edx << 16 | ecx       // edx now holds the packed VEbool structure.
  if (edx <= 0xffff)          // Test whether data.ok is 0.

On the other hand, gcc compiles the previous code to:

  c0740a: 0f b6 42 32             movzbl 0x32(%rdx),%eax
  c0740e: 84 c0                   test   %al,%al
  c07410: 0f 84 fa 00 00 00       je     c07510 <_ZN2js3jit12LIRGenerator16visitInstructionEPNS0_12MInstructionE+0x110>

  eax = load.8 [rdx + 0x32]   // Load data.ok value.
  if ((eax & eax) == 0)       // Test whether data.ok is 0.

The problem is that the clang-ism manipulates undefined bits, which seems to confuse Valgrind at some point and raises a false-positive error, at least in the LIRGenerator::visitInstruction case.

To me, we should figure out why Valgrind did not manage to understand this clang-ism, and also disable this clang-ism, as it seems to generate inefficient code (two loads and one extra register, instead of one load and one register with gcc).
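To make the layout concrete, here is a reduced model with hypothetical names (the real mozilla::Result machinery is more involved, and the uint8_t underlying type for AbortReason is an assumption); it does not claim to reproduce Valgrind's exact behavior, only to contrast the two access patterns described above.

  #include <cstdint>
  #include <cstring>

  // Reduced model of the PackedVariant storage described above.
  struct Ok {};
  enum class AbortReason : uint8_t { NoAbort, Disable, Error };

  struct VEbool {
    Ok v;           // empty success value, 1 byte
    AbortReason e;  // error value: left uninitialized when ok == true
    bool ok;
  };

  // What gcc's codegen effectively does: a single byte load of `ok`.
  static bool isErrByteLoad(const VEbool& d) {
    return !d.ok;
  }

  // What clang's codegen effectively does: load v+e as 16 bits, load ok as
  // 8 bits, combine, and compare against 0xffff. The result only depends on
  // `ok`, but the uninitialized `e` byte participates in the comparison,
  // which is what Valgrind objects to.
  static bool isErrWideLoad(const VEbool& d) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&d);
    uint16_t low = 0;
    std::memcpy(&low, p, 2);                         // v and (uninitialized) e
    uint32_t packed = (uint32_t(p[2]) << 16) | low;  // ok lands in bits 16..23
    return packed <= 0xffff;                         // true exactly when ok == 0
  }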
I uploaded a patch to fix the issue reported by Valgrind and detailed in comment 47. This Valgrind fix is being tracked at: https://bugs.kde.org/show_bug.cgi?id=401112
Mike, do you know if it is possible to use a custom version of Valgrind for running our test suite? Or should we wait until the next Valgrind release? (see comment 47 and comment 48)
Flags: needinfo?(mh+mozilla)
(In reply to Nicolas B. Pierron [:nbp] from comment #47)
> I managed to reproduce this issue with the JS shell compiled with clang
> 5.0.2:

For information, clang 7.0.0 generates the same code and has the same false-positive report when executed with Valgrind. My blind guess would be an issue in the optimization, which applies this transformation but does not check whether the compared structure can be read with a single load.
We can apply patches, but I'd rather wait for it to be reviewed upstream first.
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #51)
> We can apply patches, but I'd rather wait for it to be reviewed upstream
> first.

Done: https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=f2c03ce3babe51eecbf03735f726c4028a162857
Flags: needinfo?(mh+mozilla)
Add leave-open keyword for landing LifoAlloc changes after attachment 9028867 [details] is landed.
Keywords: leave-open
Pushed by mh@glandium.org: https://hg.mozilla.org/integration/autoland/rev/134001f7ca7f Apply upstream patch to valgrind to fix some false positives. r=froydnj
Changing this test case, which I added earlier this year to check that the LifoAlloc mprotect instrumentation works and does not crash. Knowing that the general stack protection is no longer present, I am almost tempted to remove this test case.

In the meantime, this test fails with --no-baseline --no-ion, which is the intended test configuration, as it takes too much time on our CI. I am hoping that adding the "slow" tag will work (currently running on Try).

The reason this test fails is likely the increased LifoAlloc chunk sizes, since interpreter frames are allocated within these LifoAlloc chunks. Having more contiguous space with this bug likely prevents us from wasting interpreter stack space, which delays stack-exhaustion errors.
Flags: needinfo?(mh+mozilla)
Attachment #9031228 - Flags: review?(tcampbell)
Attachment #9031228 - Flags: review?(tcampbell) → review+
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d9dea0ec29b0
LifoAlloc: Move cold code to LifoAlloc.cpp. r=tcampbell
https://hg.mozilla.org/integration/mozilla-inbound/rev/6cc520c2871e
Split LifoAlloc chunks in 2 lists. r=tcampbell
https://hg.mozilla.org/integration/mozilla-inbound/rev/11c134af2134
Add LifoAlloc instrumentation for adding heuristics. r=tcampbell
https://hg.mozilla.org/integration/mozilla-inbound/rev/aaddfe44c11b
Add LifoAlloc heuristic to increase the reserved space based on the consumed space. r=jorendorff
https://hg.mozilla.org/integration/mozilla-inbound/rev/c39cd7438b60
Fix test case running longer than expected due to increased interpreter stack. r=tcampbell
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Some perf wins in Ion:

== Change summary for alert #18398 (as of Thu, 20 Dec 2018 23:21:11 GMT) ==

Improvements:

3% raptor-wasm-godot-ion-firefox linux64 opt     2,132.13 -> 2,070.39
3% raptor-wasm-godot-ion-firefox linux64-qr opt  2,165.06 -> 2,108.11
3% raptor-wasm-godot-ion-firefox linux64 pgo     2,084.79 -> 2,030.47

For up to date results, see:
https://treeherder.mozilla.org/perf.html#/alerts?id=18398
Performance Impact: --- → P1
Whiteboard: [qf:p1:f65]
Duplicate of this bug: 1475561