Closed Bug 605029 Opened 15 years ago Closed 14 years ago

ArenaHeader versus Arena<FreeCell> and other cleanups

Tracking

()

Status:

RESOLVED FIXED

Tracking Flags:

Tracking

Status

blocking2.0

---

.x+

People

(Reporter: igor, Assigned: igor)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 13 obsolete files)

v1 15 years ago Igor Bukanov 57.05 KB, patch		Details \| Diff \| Splinter Review
v2 15 years ago Igor Bukanov 65.76 KB, patch		Details \| Diff \| Splinter Review
v3 15 years ago Igor Bukanov 70.85 KB, patch		Details \| Diff \| Splinter Review
v4 15 years ago Igor Bukanov 70.77 KB, patch		Details \| Diff \| Splinter Review
v5 15 years ago Igor Bukanov 79.43 KB, patch		Details \| Diff \| Splinter Review
v6 15 years ago Igor Bukanov 78.64 KB, patch		Details \| Diff \| Splinter Review
v7 15 years ago Igor Bukanov 77.29 KB, patch		Details \| Diff \| Splinter Review
v8 15 years ago Igor Bukanov 89.28 KB, patch		Details \| Diff \| Splinter Review
v9 15 years ago Igor Bukanov 89.88 KB, patch		Details \| Diff \| Splinter Review
v10 14 years ago Igor Bukanov 88.96 KB, patch		Details \| Diff \| Splinter Review
v11 14 years ago Igor Bukanov 89.07 KB, patch		Details \| Diff \| Splinter Review
v12 14 years ago Igor Bukanov 90.38 KB, patch	billm : review+	Details \| Diff \| Splinter Review
v13 14 years ago Igor Bukanov 91.45 KB, patch	igor : review+	Details \| Diff \| Splinter Review
v14 14 years ago Igor Bukanov 90.63 KB, patch		Details \| Diff \| Splinter Review

Igor Bukanov

Assignee

Description

•

15 years ago

Attached patch v1 (obsolete) — Details — Splinter Review

To minimize the amount of Arena<T> casts in the GC implementation the attached patch replaces various Arena<FreeCell> links with ArenaHeader. The patch also shrinks MarkingDelay down to one word. With forthcoming patch for bug 600234 that allowed to pack an extra arena into the chunk. Although the win is tiny (it decreases the memory consumption by 0.4%) and not visible during normal execution, the finalization is bound by the memory bandwidth so memory shrinking directly translates into smaller finalization time. Besides that the patch contains few other cleanups to make patches for bug 600234, bug 600648 and bug 601234 smaller.

Attachment #483863 - Flags: review?(anygregor)

Igor Bukanov

Assignee

Comment 1

•

15 years ago

Attached patch v2 (obsolete) — Details — Splinter Review

This is v1 with extra changes to remove debug-related checkArenaListsForThing and ArenaList::.arenasContainThing as the corresponding functionality is provided by MarkIfGCThingWord.

Attachment #483863 - Attachment is obsolete: true

Attachment #484049 - Flags: review?(anygregor)

Attachment #483863 - Flags: review?(anygregor)

Igor Bukanov

Assignee

Comment 2

•

15 years ago

Attached patch v3 (obsolete) — Details — Splinter Review

Here are few more changes to simplify the dependent patches. The new version removes ArenaHeader::isUsed replacing it with ArenaHeader::compartment check. It also removed ArenaHeader::thingSize. The latter is used only in debug builds and can be trivially found from thingKind using an array lookup. Another change is the removal of MarkingDelay::start. That field at best on average makes marking of delayed arena twice faster but the delayed marking does not happen in practice so the simplest code wins. And if we would get any indications that a slow delayed marking is observable, it can be made several times faster using a version of the pre-compartment GC code. With this change MarkingDelay becomes unnecessary so the patch moves the linking field into ArenaHeader. On 64 bit this brings an extra arena in chunk available for things allocation while making simpler code. With this patch I see no changes in V8/SS on a 64-bit server box with little noise.

Attachment #484049 - Attachment is obsolete: true

Attachment #484692 - Flags: review?(anygregor)

Attachment #484049 - Flags: review?(anygregor)

Igor Bukanov

Assignee

Comment 3

•

15 years ago

Attached patch v4 (obsolete) — Details — Splinter Review

rebased patch

Attachment #484692 - Attachment is obsolete: true

Attachment #485060 - Flags: review?(anygregor)

Attachment #484692 - Flags: review?(anygregor)

Gregor Wagner [:gwagner]

Comment 4

•

15 years ago

I am no longer in MV and I have very limited time because of paper deadlines and other stuff. Andreas can you take the review?

Gregor Wagner [:gwagner]

Updated

•

15 years ago

Attachment #485060 - Flags: review?(anygregor) → review?(gal)

Igor Bukanov

Assignee

Comment 5

•

15 years ago

Attached patch v5 (obsolete) — Details — Splinter Review

Here is a rebased patch

Attachment #485060 - Attachment is obsolete: true

Attachment #494724 - Flags: review?(anygregor)

Attachment #485060 - Flags: review?(gal)

Igor Bukanov

Assignee

Comment 6

•

15 years ago

Attached patch v6 (obsolete) — Details — Splinter Review

here is a refreshed patch

Attachment #494724 - Attachment is obsolete: true

Attachment #498211 - Flags: review?(anygregor)

Attachment #494724 - Flags: review?(anygregor)

Igor Bukanov

Assignee

Comment 7

•

15 years ago

Nominating for 2.0 as this bug blocks other bugs that improves SS/V8 scores by few percent.

blocking2.0: --- → ?

Igor Bukanov

Assignee

Comment 8

•

15 years ago

Attached patch v7 (obsolete) — Details — Splinter Review

v6 got too rottent. Gregor, do you have time to review this?

Attachment #498211 - Attachment is obsolete: true

Attachment #498211 - Flags: review?(anygregor)

David Mandelin [:dmandelin]

Comment 9

•

15 years ago

(In reply to comment #7) > Nominating for 2.0 as this bug blocks other bugs that improves SS/V8 scores by > few percent. I think our direction is pretty firmly against blocking release on perf wins this late in the game. I do like perf wins, though. Do you have any comments on the risk of regressions we take from landing this? It's a big patch but I guess it's mostly a refactoring?

blocking2.0: ? → .x

Igor Bukanov

Assignee

Comment 10

•

15 years ago

(In reply to comment #9) > Do you have any comments on > the risk of regressions we take from landing this? It's a big patch but I guess > it's mostly a refactoring? The patch is indeed about refactoring. The non-refactoring part is a debug-only changes to the GC stats collection.

Igor Bukanov

Assignee

Comment 11

•

15 years ago

Attached patch v8 (obsolete) — Details — Splinter Review

unrotten patch

Attachment #501314 - Attachment is obsolete: true

Igor Bukanov

Assignee

Comment 12

•

15 years ago

Attached patch v9 (obsolete) — Details — Splinter Review

The new version removes an incorrect assertion in Mark. With the patch the conservative GC uses the GC tracing machinery to collect the stats and that assert made that impossible during the compartment GC. Besides the conservative GC stats collection changes the patch does the following: 1. The patch uses ArenaHeader* in place of Arena<FreeCell> * to minimize the number of Arena<FreeCell>* <-> Arena<T>* casts. 2. The delayed marking is simplified father and the single remaining per-arena field is moved from MarkingDelay into ArenaHeader. The latter is compressed down to 4 fords in preparation for the bug 600234. 3. There is some refactoring regarding background finalization implementation to prepare for the bug 601075. In particular, for simplicity the code uses a vector of finalized lists, not a a vector of array, as the former is not justified for few thousands at most elements. There are few other changes to make the patches for the bug 600648 and bug 601234.

Attachment #526287 - Attachment is obsolete: true

Attachment #526421 - Flags: review?(gal)

Igor Bukanov

Assignee

Comment 13

•

14 years ago

gal: review ping

Igor Bukanov

Assignee

Comment 14

•

14 years ago

Attached patch v10 (obsolete) — Details — Splinter Review

patch update to reflect bug 616666 chnages

Attachment #526421 - Attachment is obsolete: true

Attachment #527126 - Flags: review?(gal)

Attachment #526421 - Flags: review?(gal)

Igor Bukanov

Assignee

Comment 15

•

14 years ago

Attached patch v11 (obsolete) — Details — Splinter Review

Here comes another rebase due to the bug 651546. Gal: any chance to review this before it rots again?

Attachment #527126 - Attachment is obsolete: true

Attachment #527354 - Flags: review?(gal)

Attachment #527126 - Flags: review?(gal)

Igor Bukanov

Assignee

Comment 16

•

14 years ago

Attached patch v12 (obsolete) — Details — Splinter Review

Landing of bug 651041 required to rebase this patch again.

Attachment #527354 - Attachment is obsolete: true

Attachment #527521 - Flags: review?(gal)

Attachment #527354 - Flags: review?(gal)

Igor Bukanov

Assignee

Comment 17

•

14 years ago

Comment on attachment 527521 [details] [diff] [review] v12 Bill, could you review this? Andreas or Gregor do not have time for this. See comment 1 and comment 12 for details of what the patch does.

Attachment #527521 - Flags: review?(gal) → review?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 18

•

14 years ago

I'm about half way through the patch. I'll finish tomorrow. I noticed that you use capital letters for the names of static methods like MarkingDelayEnd. In the rest of the engine, it seems like we use the normal convention for these (e.g., markingDelayEnd). I also have a larger question. Do you know why so much of the GC code is templated? The only advantage I can see is that the compiler might be able to optimize a few mod operations over sizeof(T) into shifts. But this seems like a pretty minor optimization. The disadvantages are that it bloats our instruction cache footprint and it's a little ugly in my opinion (especially the big switch statements over all possible GC thing kinds). Anyway, this patch definitely simplifies things. But it would be nice if we could go all the way and just eliminate the parameterization. Do you know of any reason why this wouldn't work?

Igor Bukanov

Assignee

Comment 19

•

14 years ago

(In reply to comment #18) > I also have a larger question. Do you know why so much of the GC code is > templated? The only advantage I can see is that the compiler might be able to > optimize a few mod operations over sizeof(T) into shifts. But this seems like a > pretty minor optimization. We do need templates during finalization. It really help to do the thing kind switch once per arena list. Now, with the background finalization, we may consider doing that once per arena to avoid extra switches and code bloat, but I prefer to do that after fixing bug 652416 as that patch simplifies the background finalization code. As regrading the allocation I tried to eliminate templates, but that regressed on benchmarks. That comes from two reasons. First multiplication is expensive even if it is done once during RefillFinalizableFreeList. It can be worked around with using a static array with precalculated values, but then we have another problem. Templated version of RefillFinalizableFreeList and corresponding code bloat improves branch prediction especially when allocating arenas from typed free lists or when building free cell list in the arena. I plan to reconsider that after fixing the bug 601075 as with that bug fixed we no longer needs to build free lists in RefillFinalizableFreeList nor use typed free lists.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 20

•

14 years ago

Comment on attachment 527521 [details] [diff] [review] v12 Review of attachment 527521 [details] [diff] [review]: Well, regarding the templates, I want to emphasize that we could remove them incrementally. For example, I think we could take away the type parameter for Arena while still making the allocation and finalization code parametrized. I think that would be a little cleaner than passing ArenaHeaders around, since it would avoid the occasional casts back to Arena that you need to do. (Although I realize that the layout of things in the arena would be a little grosser, but we could hide that behind accessors.) However, I think this patch has a lot of good stuff in it, and it's been sitting around for a while. I especially like the cleanup of the background finalization data structure. So r+ with the nits fixed. ::: js/src/jsgc.cpp @@ +1922,5 @@ for (;;) { + bool allClear = reinterpret_cast<Arena<T> *>(aheader)->finalize(cx); + + /* + * We don't delete the head because the next allcoated arena has to "allocated" @@ +2185,3 @@ finalizeVector.resize(0); cx = NULL; + Whitespace. ::: js/src/jsgc.h @@ +118,5 @@ + + private: + /* + * When recursive marking uses too much stack the marking is delayed and + * the corresponding arenas is put into a stack using a linked via the "corresponding arenas are put" @@ +127,5 @@ + * direct byte access for the field. + */ + uintptr_t thingKind : 8; + uintptr_t nextDelayedMarkingArena : JS_BITS_PER_WORD - 8; + This kind of bit packing is a little gross. Maybe in the future we could just scan all the arenas in a compartment during delayed marking? I've seen other GC implementations that work this way. ::: js/src/jsgcmark.cpp @@ +76,5 @@ JS_ASSERT(thing); JS_ASSERT(JS_IS_VALID_TRACE_KIND(GetGCThingTraceKind(thing))); JS_ASSERT(trc->debugPrinter || trc->debugPrintArg); + JS_ASSERT_IF(!JSAtom::isStatic(thing), thing->asFreeCell()->isAligned()); Shouldn't we be able to assert !JSAtom::isStatic(thing) here? Then we could assert alignment separately.

Attachment #527521 - Flags: review?(wmccloskey) → review+

Igor Bukanov

Assignee

Comment 21

•

14 years ago

Attached patch v13 (obsolete) — Details — Splinter Review

Addressing the nits and consistently using lowercase names for static methods.

Attachment #527521 - Attachment is obsolete: true

Attachment #529023 - Flags: review+

Igor Bukanov

Assignee

Comment 22

•

14 years ago

(In reply to comment #20) > For example, I think we could take away the type parameter for > Arena while still making the allocation and finalization code parametrized. I > think that would be a little cleaner than passing ArenaHeaders around, since it > would avoid the occasional casts back to Arena that you need to do. I plan to move the ArenaHeader from the arena in the bug 600234. Then even with template-less arena it would be necessary to convert between arena and the header. > + uintptr_t thingKind : 8; > + uintptr_t nextDelayedMarkingArena : JS_BITS_PER_WORD - 8; > + > > This kind of bit packing is a little gross. It helps synthetic benchmarks as it allows to put more things into the arena. For example, without the bitmap fields the number of strings in arena on 64 bit CPU drops by 0.9%. Some objects sizes also benefits from that.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 23

•

14 years ago

(In reply to comment #22) > > This kind of bit packing is a little gross. > > It helps synthetic benchmarks as it allows to put more things into the arena. > For example, without the bitmap fields the number of strings in arena on 64 bit > CPU drops by 0.9%. Some objects sizes also benefits from that. Well, I didn't mean we shouldn't pack them. I meant we should eliminate the nextDelayedMarkingArena field entirely. Then we would just set a bit in the runtime whenever we ran out of room on the mark stack. At the end of GC, we would iterate over all arenas and do delayed marking on them. It's slower, but it should be a very uncommon case, and it seems cleaner to me. But there's no need to do this now.

Igor Bukanov

Assignee

Comment 24

•

14 years ago

(In reply to comment #23) > Well, I didn't mean we shouldn't pack them. I meant we should eliminate the > nextDelayedMarkingArena field entirely. Then we would just set a bit in the > runtime whenever we ran out of room on the mark stack. At the end of GC, we > would iterate over all arenas and do delayed marking on them. It's slower, but > it should be a very uncommon case, and it seems cleaner to me. Gregor has already discovered that we have XML test cases that exercise delayed marking. In that particular case this happened due to a lot due to the lack of support of LargeElem for XML. But I would not be surprised that even without XML the WEB has structures that need delayed marking. And enumerating all the arenas in the runtime repeatedly could be very slow.

Igor Bukanov

Assignee

Comment 25

•

14 years ago

Attached patch v14 — Details — Splinter Review

I should have redone measurements after merging this patch with the marking stack changes. Apparently using bit field for the marking delay and thing kind became rather expensive. So the new patch keeps the marking delay as a separated structure in the chunk as before.

Attachment #529023 - Attachment is obsolete: true

Igor Bukanov

Assignee

Comment 26

•

14 years ago

http://hg.mozilla.org/tracemonkey/rev/61bbaedfc2a3 http://hg.mozilla.org/tracemonkey/rev/e2843f43757e - fixing typo

Whiteboard: fixed-in-tracemonkey

Gregor Wagner [:gwagner]

Comment 27

•

14 years ago

Have you measured the GC pause time during the v8 benchmarks in the browser? It seems that most GCs take a few ms longer now. No big differences but GC's that took 8ms before take now 10ms for example.

Igor Bukanov

Assignee

Comment 28

•

14 years ago

(In reply to comment #27) > Have you measured the GC pause time during the v8 benchmarks in the browser? > It seems that most GCs take a few ms longer now. No - I only checked that the total score is the same and that the callgrind shows no regressions. Now, the only data structure that the patch changes is eliminating of two fields from the MarkingDelay. One was unused and another was used to scan on average twice less things in arena with delayed marking. On 32 bit eliminating the fields has not changed the number of arenas we can fit into the chunk - it remained at 251. So on 32 bit I suppose the slowdown can come from the delayed marking calls. Now, if we happens to have delayed arenas, then something is wrong with the marking stack patch. I will check that. On 64 bit the change resulted in the increase of number of arenas we have per chunk from 250 to 251. That may affect the results.

Igor Bukanov

Assignee

Comment 29

•

14 years ago

I do not see the GC slowdown during V8 on Linux 64 bit. Without the patch typical timing during the end of V8 run, when we hit the longest GC pauses, as measured with MOZ_GCTIMER defined and JS_WANT_GC_SUITE_PRINT set to true: 67.456000 65.506000 0.674000 67.581000 65.609000 0.673000 67.470000 65.524000 0.672000 ... 69.056000 67.055000 0.680000 68.677000 66.639000 0.677000 68.582000 66.503000 0.752000 With the landed patch that becomes: 67.721000 65.750000 0.669000 67.592000 65.596000 0.653000 67.249000 65.100000 0.737000 ... 67.228000 65.271000 0.676000 66.863000 64.915000 0.684000 66.920000 64.928000 0.698000 I do not see a regression here.

Gregor Wagner [:gwagner]

Comment 30

•

14 years ago

I didn't mean the long pauses during the splay benchmark. For the raytracer benchmark I see now a triangle pause time from 5.5ms to 10.5 ms. This was 4.0 to 8.9 before. I don't know where it comes from because the marking just gets slightly worse and doesn't account for all the regression. before: Total, Mark, Sweep 4.0, 2.5, 0.8, 0.7, 0.0, 0.0, 0.0, 6.4, 4.9, 0.8, 0.7, 0.0, 0.0, 0.0, 7.5, 6.0, 0.8, 0.7, 0.0, 0.0, 0.0, 8.2, 6.6, 0.8, 0.7, 0.0, 0.0, 0.0, 8.9, 7.4, 0.8, 0.7, 0.0, 0.0, 0.0, now: Total, Mark, Sweep, 5.9, 2.9, 0.8, 0.8, 0.0, 0.0, 0.0, 8.2, 5.2, 0.8, 0.8, 0.0, 0.0, 0.0, 9.4, 6.5, 0.8, 0.8, 0.0, 0.0, 0.0, 9.9, 7.0, 0.8, 0.8, 0.0, 0.0, 0.0, 10.6, 7.7, 0.8, 0.8, 0.0, 0.0, 0.0,

Igor Bukanov

Assignee

Comment 31

•

14 years ago

(In reply to comment #30) > For the raytracer benchmark I see now a triangle pause time from 5.5ms to 10.5 > ms. This was 4.0 to 8.9 before. Is it on 64 bit?

Gregor Wagner [:gwagner]

Comment 32

•

14 years ago

(In reply to comment #31) > (In reply to comment #30) > > For the raytracer benchmark I see now a triangle pause time from 5.5ms to 10.5 > > ms. This was 4.0 to 8.9 before. > > Is it on 64 bit? Yes on 64 bit. A 2ms regression for all GCs during the v8 browser suite. They are more obvious during the short GC pauses of course. I am doing a clean build now. Maybe that helps.

Igor Bukanov

Assignee

Comment 33

•

14 years ago

Could also check if adding uintptr_t dummy[2]; to the MarkingDelay structure would affect the benchmark timing?

Igor Bukanov

Assignee

Comment 34

•

14 years ago

One more thing, the delay would still present if you run the benchmark individually?

Igor Bukanov

Assignee

Comment 35

•

14 years ago

The regression is observed only in the browser, it does not exist in the shell, does it?

Gregor Wagner [:gwagner]

Comment 36

•

14 years ago

(In reply to comment #35) > The regression is observed only in the browser, it does not exist in the shell, > does it? Yes only in the browser. How can I run the v8 benchmarks individually?

Igor Bukanov

Assignee

Comment 37

•

14 years ago

(In reply to comment #36) > Yes only in the browser. How can I run the v8 benchmarks individually? If you want to run richards from the sunspider distribution, then just create a page and include the script. The script contains the loop at the end that executes the benchmark number of times. If you want to run richards from V8 distribution or from a copy under js/src/v8 in our tree, then the page should define BenchmarkSuite and Benchmark constructors. So you would need something like: <html><body> <script> function BenchmarkSuite(name, number, tests) { this.tests = tests; } function Benchmark(name, entryPoint) { this.entryPoint = entryPoint; } function run_one_test(test) { f = test.tests[0].entryPoint; for (var i = 0; i != 350; ++i) f(); } </script> <script src="raytrace.js"></script> <input type="button" value="Run RayTrace" onclick="run_one_test(RayTrace)"> </body></html>

Chris Leary [:cdleary] (not checking bugmail)

Comment 38

•

14 years ago

cdleary-bot mozilla-central merge info: http://hg.mozilla.org/mozilla-central/rev/61bbaedfc2a3 http://hg.mozilla.org/mozilla-central/rev/e2843f43757e

Chris Leary [:cdleary] (not checking bugmail)

Updated

•

14 years ago

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.