Deduplicate nursery strings
Categories
(Core :: JavaScript: GC, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox79 | --- | fixed |
People
(Reporter: sfink, Assigned: sfink)
References
(Blocks 4 open bugs)
Details
Attachments
(7 files, 6 obsolete files)
Before considering deduplicating strings during compaction, we should try deduplicating them during tenuring, so that few duplicate strings will have a chance to make it to the tenured heap.
Comment 1•5 years ago
|
||
String deduplication during tenuring
Most live nursery strings can be deduplicated in moveToTenured through a hashset.
Dependent strings are complicated to deal with since their chars need to be updated with the newly deduplicated base chars.
If the dependent string is tenured, its bases cannot be deduplicated since a tenured DS chars cannot be updated. Otherwise, the following steps will be able to update its chars.
- Preserve the nursery base chain by saving DS nursery bases in its relocation overlay. This allows DS nursery root base to be reached.
- Calculate the original DS base offset: DS original chars - DS root base original chars. Root base original chars is saved in its relocation overlay.
- Update the DS chhars with its newly forwarded DS root base chars and the calculated offset.
- Assign either the root base or the undepended string in which the DS uses chars from as the new DS base to unchain any potentially long DS chain.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 3•5 years ago
|
||
Backed out for SM bustages on RelocationOverlay.h
Backout link: https://hg.mozilla.org/integration/autoland/rev/54051ecbe789d707dcec34f4db69e644dc9c8d15
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=262975736&repo=autoland&lineNumber=3311
Comment 4•5 years ago
|
||
There were also assertion failures on:
Assignee | ||
Comment 5•5 years ago
|
||
Comment on attachment 9080802 [details]
Bug 1568923 - String deduplication during tenuring. r=sfink
We're having trouble tracking down some remaining problems here. It crashes the browser with ASAN pretty often on try, and crashes much less often without it. It would be nice to have a shell test case.
Comment on attachment 9080802 [details]
Bug 1568923 - String deduplication during tenuring. r=sfink
I have an upcoming hard-to-reduce but reproducible testcase:
--enable-debug --enable-more-deterministic --enable-address-sanitizer (and maybe even disable jemalloc)
on m-c rev fce0b326cd31
run with:
--fuzzing-safe --no-threads --no-baseline --ion-eager --more-compartments --gc-zeal=10,421 --blinterp --blinterp-eager --blinterp-warmup-threshold=100 --execute="setJitCompilerOption(\"ion.forceinlineCaches\",1)"
Change the top of the testcase to point to your local m-c repository on rev fce0b326cd31
Pernosco session with this testcase:
https://pernos.co/debug/DbuTTxd2JqWCl3eKbJcsmQ/index.html
However, this is without --disable-optimize as adding it seemed to make the ASan crash go away.
gdb in Pernosco failed to go backwards when I ran reverse-next, so not sure how useful this is (or whether I'm doing things properly).
gdb in Pernosco failed to go backwards when I ran reverse-next, so not sure how useful this is (or whether I'm doing things properly).
This was a Pernosco bug. I fixed it and the fix is live, so that trace should be good for debugging.
// Adapted from randomly chosen test: js/src/jit-test/tests/basic/bug674776.js
let x = "";
for (let i = 0; i < 99999; i++) {
x += "12345678" + i + "123456" + i + "1";
}
asserts at Assertion failure: aKey->flags() == aLookup->flags(), at gc/Nursery.h:572 (as per CI), on the same debug ASan build as per comment 6, run with --fuzzing-safe --no-threads --no-baseline --ion-eager --gc-zeal=24,452
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #12)
asserts at Assertion failure: aKey->flags() == aLookup->flags(), at gc/Nursery.h:572 (as per CI), on the same debug ASan build as per comment 6, run with --fuzzing-safe --no-threads --no-baseline --ion-eager --gc-zeal=24,452
https://pernos.co/debug/xfJPwQNatDpkAxe84eUieg/index.html
rr rev: cbe18bc1807c9bcc822ba7119d112c3aec2ec0fc
pernosco-submit rev: 6a50323d71a8d2ea4d782945e6074f5bbc1c2cee
Assignee | ||
Comment 14•5 years ago
|
||
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #12)
asserts at Assertion failure: aKey->flags() == aLookup->flags(), at gc/Nursery.h:572 (as per CI), on the same debug ASan build as per comment 6, run with --fuzzing-safe --no-threads --no-baseline --ion-eager --gc-zeal=24,452
Ugh. I thought the fix for this one had already been uploaded to phabricator, but it looks like it wasn't. I'll update the patch.
I'm not sure if the previous recording was triggered from the same thing, but I don't think so. It looks more like the problem I was trying to track down.
Thank you!
Assignee | ||
Comment 15•5 years ago
|
||
Excellent, that first recording enabled me to track the bug down and the fix was easy. (A previous fix was applied to AutoStableStringChars::init but there's a similar ::initTwoByte that needs the same treatment.) This was a solid win for having Pernosco recordings of fuzz bugs, even of unminified test cases running on optimized builds.
Assignee | ||
Comment 16•5 years ago
|
||
Comment on attachment 9080802 [details]
Bug 1568923 - String deduplication during tenuring. r=sfink
There's an updated patch up now that I'm running through try, but it would be nice to get some fuzzing on as well.
Comment 17•5 years ago
|
||
Comment 18•5 years ago
|
||
The other issue the fuzzer was seeing was
Assertion failure: aKey->flags() == aLookup->flags(), at js/src/gc/Nursery.h:572
but I can confirm that is fixed with the new patch.
Comment 19•5 years ago
|
||
Comment on attachment 9080802 [details]
Bug 1568923 - String deduplication during tenuring. r=sfink
f+ with that one outstanding Debugger issue fixed :)
Comment on attachment 9080802 [details]
Bug 1568923 - String deduplication during tenuring. r=sfink
I don't think I have found anything with the newer versions of the patch so far either. :)
Assignee | ||
Comment 21•5 years ago
|
||
I'm still seeing a number of different crashes with the updated patch, though decoder's comment 17 is the most common. Hopefully the others are different manifestations of the same issue. Thanks, that'll help a lot! My local attempts at reproducing have failed so far.
Comment 22•5 years ago
|
||
I am wondering if there any one is still working on this bug?!
Assignee | ||
Comment 23•5 years ago
|
||
Hi Thinker! Good to see you here.
(In reply to Thinker Li [:sinker] from comment #22)
I am wondering if there any one is still working on this bug?!
I was intending to get back to this, but haven't made it yet and likely won't for a couple of weeks.
The main things to do here are (1) rebase on top of the removal of undepended strings, which should simplify it somewhat, and (2) fix the remaining issues like comment 17.
Krystal wrote up a nice overview doc at https://docs.google.com/document/d/1CSOJQ_PS0X4rTwH_3qgYYIuXGVUnXoG9miMoK40y8_g/
There was another more detailed doc that we used over the summer to communicate running progress and issues and fixes, but I'm failing to find it right now.
Comment 24•5 years ago
|
||
I have rebased the patch to latest m-c. There are two major changes here.
-
Code blocks dealing undepended strings are removed. I am not sure if it is correct, but I guess undepended strings have been removed from the code base. So, handling it is no more necessary.
-
Fix the crash mentioned at comment #17. The major problem here is related to NeureryAwareHashMap. It maintains a map, crossZoneStringWrappers, from strings to wrappers. With deduplicating, two or more strings are deduplicated by replacing them with the same string. That means two or more entries in the map will be rekeyed with the same dest key. With the existing impl. of rekey, it will create two or more entries with the same key but different values/wrappers for this case. When doing enumeration, it would return all entries of with the same key. However, doing lookup always return the same entry for a given key. This is why the assertion in Zone::checkStringWrappersAfterMovingGC() fails.
Comment 25•5 years ago
|
||
I forgot to mention about the solution of 2 in the last comment.
I check the duplication in NuseryAwareHashMap when doing sweeping, and remove the duplicated ones instead of rekey them.
The open issue here is deduplicating string wrappers. I am not sure if it is worth to do.
Comment 27•5 years ago
|
||
The testcase, deduplicateTenuringStrings.js, fails at calling |ensureFlatString()|. I guess this symbol has gone. My questions are followings.
- Is there any replacement of |ensureFlatString()|?
- The tests for undepended strings should be removed, right?
- Is there any new types of strings that should be tested, but not in the test case yet?
Comment 28•5 years ago
|
||
Flat strings are gone now, see bug 1330776. Any string that was previously flat, ought instead be linear now, I think. You might be able to replace with ensureLinear()
now. I think undepended strings got removed too, don't remember the bug number, part of the same change I think? There are no new strings types -- we only removed existing ones.
Comment 29•5 years ago
|
||
Fix the test case. Have passed all test cases in js/src/tests & js/src/jit-test.
Comment 30•5 years ago
|
||
I have done tests for this patch to see how much we can gain from this patch.
The test is to scroll the tab of facebook, up and down, until it's js-zone (almost) grows to 30MB.
The results are
- Without the patch here, all copies=289942, duped copies=232859, dup size=12821 k, 29.75MB.
- With attachment #9123936 [details] [diff] [review], all copies=176201, duped copies=100839, dup size=7154 k, 29.75MB.
- With attachment #9123936 [details] [diff] [review] and the patch of reservoir sampling, all copies=177913, duped copies=88304, dup size=4199 k, while the size of all strings is 29.34MB.
Terminologies:
- All copies here means the number of JSString instances.
- Duped copies is the number of JSString instances that are duplicates of other strings. For example, there are 10 "ABCDEFG" strings, then it counts 9 (10-1) duped copies.
- dup size is the size of all duplicates. For example, there are 10 "ABCDEFG". then the dup size of these instances are (10-1) * strlen("ABCDEFG").
- reservoir sampling picks N instances in total, no master how many instances there is. The reservoir patch here take 1024 samples randomly and detects their duplicates at the mark phase. Once any of them has duplicates, it will be promoted to a duplicate set that is used to dedup during minor GC.
Results:
- With attachment #9123936 [details] [diff] [review], it reduces more than 5.5mb of duplicates. (about 18.3% of 30mb)
- With attachment #9123936 [details] [diff] [review] and the reservoir sampling patch, it reduces more than 8.5mb of duplicates. (about 28.3% of 30mb)
The reservoir sampling patch will be uploaded later.
Comment 31•5 years ago
|
||
Sorry, in comment #30, it is attachment #9123994 [details], not attachment #9123936 [details] [diff] [review].
Comment 32•5 years ago
|
||
Implement the reservoir sampling. Pick samples at the mark phase.
Comment 33•5 years ago
|
||
Another thing to fix. The test is to scroll a facebook tab, until the size of all strings (not js-zone's size) in it's js-zone (almost) grows to 30mb.
Comment 34•5 years ago
|
||
Hi Steve,
do you think reservoir sampling is reasonable change?
Assignee | ||
Comment 35•5 years ago
•
|
||
(In reply to Thinker Li [:sinker] from comment #34)
do you think reservoir sampling is reasonable change?
Sorry for the (long) delay.
I do think reservoir sampling is a good idea. I'd like the nursery-based deduplication to land separately, though. It is a very separable piece, and I think the reservoir sampling has a lot of places where we might want to adjust various details:
- what should it do on out-of-memory?
- should it even try to build its data structures when memory use is high?
- how many reservoirs?
- would it be good to turn deduplicated tenured strings into atoms?
- do we need to schedule GCs with the reservoir a little differently so we are more likely to be able to free up some memory before we hit OOM?
Among others. We can always land something that basically works and then worry about tuning later, except we probably wouldn't want to land anything that would crash on OOM. (It would be better to discard all of the deduplication data structures and allow the GC to continue without deduplication.) Still, there are enough details that I would still prefer to split out the nursery deduplication to land first.
What do you think? And thank you very much for doing this.
Comment 36•5 years ago
|
||
I agree that landing separately is better and safer. I would open another bug for reservoir sampling.
Comment 37•5 years ago
|
||
This patch bases on the work of krystalyang2 with minor bug fixes.
The patch includes three major parts,
-
mark nursery strings pointed by tenured strings as
non-deduplicatable, -
deduplicate strings when they are moved to tenured from nursery, and
-
adjust dependent strings to correct it's pointers to the base
string and external buffer after tenuring.
Assignee | ||
Comment 38•4 years ago
|
||
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 39•4 years ago
|
||
Assignee | ||
Comment 40•4 years ago
|
||
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 41•4 years ago
|
||
I split off the mechanism for disabling/re-enabling nursery strings. If I still want it (it's probably a good thing to have fuzzed, even though we don't implement re-enabling yet), I'll do it in another bug.
Comment 42•4 years ago
|
||
Comment 43•4 years ago
|
||
bugherder |
Assignee | ||
Comment 44•4 years ago
|
||
Updated•4 years ago
|
Comment 46•4 years ago
|
||
Comment 47•4 years ago
|
||
bugherder |
Updated•3 years ago
|
Description
•