Bug 1367471 (Open)

De-duplicate strings or other constant & common data during compaction

Categories

(Core :: JavaScript: GC, enhancement, P3)

People

(Reporter: pbone, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: triage-deferred, Whiteboard: [MemShrink:P2])

erahm noticed that about:memory reported 10MB of copies of the string "#000000" (CSS black), so fitzgen suggested de-duplicating strings during compaction.
In bug 1338930 there are 56MB (2.5 million copies) of the string ":DIV".
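As a rough illustration of the suggested approach, here is a minimal sketch in plain C++ of hash-based de-duplication during a compaction-style pass. It is not SpiderMonkey's actual GC code; StringCell, its forwarded field, and dedupeShortStrings are hypothetical stand-ins for the real cell layout and pointer-forwarding machinery.

    // Minimal sketch (not SpiderMonkey's real API) of de-duplicating short
    // strings during a compaction-style pass. All names are hypothetical.
    #include <cstddef>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct StringCell {
      std::string chars;               // character data owned by this cell
      StringCell* forwarded = nullptr; // set when this cell is de-duplicated
    };

    // Walk the tenured string cells: the first cell seen with given contents
    // becomes canonical, and later duplicates are forwarded to it so the
    // compactor can update references and reclaim the duplicate's storage.
    void dedupeShortStrings(std::vector<StringCell*>& cells, size_t maxLen = 64) {
      std::unordered_map<std::string, StringCell*> canonical;
      for (StringCell* cell : cells) {
        if (cell->chars.size() > maxLen) {
          continue; // skip long strings; hashing them eagerly is too costly
        }
        auto [it, inserted] = canonical.try_emplace(cell->chars, cell);
        if (!inserted) {
          cell->forwarded = it->second; // duplicate: point at the canonical copy
        }
      }
    }

Restricting the pass to short strings keeps the hashing cost bounded; the "#000000" and ":DIV" cases above would be covered, while large strings are skipped.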
Blocks: 1424901
This sort of thing makes me sad... yes, it's a website leak, but perhaps we could do (some) deduping without a big perf hit to limit the pain of this sort of thing, or run a separate idle-time-scheduled de-duping pass. I'd suggest using some heuristics to decide whether there's any chance of a big win before throwing too many cycles at it; perhaps during compaction we could record a histogram of string sizes and see if there are hot spots.

1,248.01 MB (70.42%) -- string(length=648061, copies=624, "url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABi" (truncated))
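A sketch of the histogram heuristic mentioned above, again in plain C++ with made-up names: record string lengths during the compaction walk, and only attempt de-duplication when some length bucket looks suspiciously hot.

    // Hypothetical helper for the "histogram of string sizes" heuristic.
    #include <cstddef>
    #include <map>

    struct LengthHistogram {
      std::map<size_t, size_t> countsByLength;

      // Called once per string while the compactor is walking cells anyway.
      void record(size_t length) { ++countsByLength[length]; }

      // A cheap signal that de-duplication might pay off: some single length
      // bucket holds at least `threshold` strings (e.g. 2.5 million ":DIV"s).
      bool hasHotSpot(size_t threshold) const {
        for (const auto& entry : countsByLength) {
          if (entry.second >= threshold) {
            return true;
          }
        }
        return false;
      }
    };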
We talked about this during the GC meeting in Orlando.

1) We should probably prioritize solving this for short strings like those in comment #0 and comment #1; given how short they are, the overhead of hashing them should be relatively small.
2) For huge strings like the one in comment #2 we'd need to do something a little more clever - maybe we could hash the first page and compare prefixes before committing to comparing the full strings (sketched after this comment).
3) I remember a bug where we had a large number of strings that shared a long prefix but each had a unique suffix - it would be great if we could turn these into ropes somehow.
4) Another bug I remember has us keeping long strings alive even though the JS only uses short substrings; it would be great to copy/inline/deduplicate these during compaction as well.

I wondered whether it would be a good idea to store a hash per page of a string (or rope) so we could deduplicate page-sized chunks. Of course, it won't work when comparing flattened strings with variable-length prefixes, but it would work on mostly identical strings with differing fixed-length prefixes.
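For point 2, a sketch of the "hash the first page, compare prefixes first" filter in standard C++; PAGE_SIZE, prefixHash, and groupLikelyDuplicates are placeholders rather than real SpiderMonkey names. Strings are bucketed by length and prefix hash, and only strings that land in the same bucket are ever compared byte-for-byte.

    // Hypothetical prefix-hash filter for de-duplicating huge strings.
    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <string>
    #include <string_view>
    #include <unordered_map>
    #include <vector>

    constexpr size_t PAGE_SIZE = 4096;

    // Hash only the first page: cheap even for multi-megabyte data URLs.
    static size_t prefixHash(const std::string& s) {
      return std::hash<std::string_view>{}(
          std::string_view(s.data(), std::min(s.size(), PAGE_SIZE)));
    }

    // Group strings by (length, prefix hash). Full byte-for-byte comparison is
    // only needed within a group, so unrelated huge strings are never scanned.
    std::vector<std::vector<const std::string*>> groupLikelyDuplicates(
        const std::vector<const std::string*>& strings) {
      std::unordered_map<size_t, std::vector<const std::string*>> buckets;
      for (const std::string* s : strings) {
        size_t key = s->size() ^ (prefixHash(*s) << 1);
        buckets[key].push_back(s);
      }
      std::vector<std::vector<const std::string*>> groups;
      for (auto& entry : buckets) {
        if (entry.second.size() > 1) {
          groups.push_back(std::move(entry.second));
        }
      }
      return groups;
    }

The per-page-hash idea at the end of the comment would extend this: keep one hash per page so that page-sized chunks, rather than whole strings, can be shared, at the cost of maintaining that side table.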
Depends on: 1568923
Severity: S3
See Also: 1442516