Do small dependent strings effectively leak large strings?
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox141 | --- | fixed |
People
(Reporter: cdleary, Assigned: sfink)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [MemShrink:P3][sp3])
Attachments
(2 files, 1 obsolete file)
Comment 1•13 years ago
|
||
Comment 2•13 years ago
|
||
Updated•11 years ago
|
Comment 4•8 years ago
|
||
Updated•3 years ago
|
| Assignee | ||
Comment 5•2 years ago
•
|
||
I wonder if we could do a variant of comment 4, because sweeping all strings seems unfortunate. This is also an implementation of making string→base a weak pointer.
During regular marking, when a string→base edge is encountered where the base has not yet been marked, effectively push the base onto the bottom of the mark stack (possibly implemented with a separate string-only mark stack). When we finish the rest of marking and are about to process the first of these deferred strings, switch to a marking mode that is exactly the same as regular marking except set a BASE_ONLY bit on all marked strings. Then during regular marking, sever dependent strings from their base if the BASE_ONLY bit is set.
MarkDependent() {
if isMarked(base):
return
elif base.flags & BASE_ONLY:
convert dependent string to a regular linear string
# No need to trace(base), even though it may be kept alive by something else.
elif inDeferredMarking():
base.flags |= BASE_ONLY;
trace(base)
else:
deferredMark(base)
}
But also note that I may be making short dependent strings into inline strings, which would reduce the number of these dependent-preserved base strings.
A related alternative would be to not use the BASE_ONLY bit, and instead queue up dependent strings with unmarked bases. In the final phase, if their bases are still unmarked, sever (undepend) them. This is equivalent to Jan's comment 4, it's just recording all strings to sweep instead of scanning all strings. This might require pushing a lot of dependent strings onto the alternative stack, though, especially if the bases are marked late in the regular marking pass. Most strings are eagerly marked, and so currently do not require mark stack space. And in mstange's case at least, there were many dependent strings with a common base.
Another intermediate possibility would be to construct a list of string arenas containing at least one dependent string that needs sweeping.
Unless I'm missing something, the main drawbacks of the BASE_ONLY scheme are: (1) I don't think we have an easily free bit in string flags; (2) the complexity of pushing onto a separate stack or the bottom of the stack; (3) this would probably end up with at least one extra branch in string marking; (4) this trick can only be used once. If we want anything else to use a similar "everything is now marked except for those reachable through edges of type X" trick, it would collide (unless the reachable types are fully disjoint).
Fight me.
| Assignee | ||
Comment 6•2 years ago
|
||
After some further thought and discussion with jonco:
First, there's no need to worry about this at all unless we have evidence it is a problem.
Second, it is possible to push onto the bottom of the stack (unshift) relatively easily: push the bottom entry onto the top, and overwrite the bottom entry with the unshifted value. Our stack doesn't have fixed-length entries, but there are only two sizes and the size can be determined from the first word, so this can all be fixed up by tacking on some padding if unshifting a small entry into a big entry's place (and strings will always be small entries, so the other way around isn't needed). Since I think the large entries are 2x the small entries, this could also just be done by duplicating the small entry since I think we ignore re-marking.
Third, jonco suggested that a far simpler heuristic would be to copy the chars if the base string is x% larger than the dependent string. No funky extra sweep phase with a bizarro double-ended stack required.
Anyway, I'll let this sit here until we find evidence that this is an issue.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 7•1 year ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #6)
Anyway, I'll let this sit here until we find evidence that this is an issue.
Bug 1931252 is a new report about this issue.
Comment 8•1 year ago
|
||
I do also wonder if this might be related to https://bugzilla.mozilla.org/show_bug.cgi?id=1931717 ...
Comment 9•1 year ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #6)
...
Third, jonco suggested that a far simpler heuristic would be to copy the chars if the base string is x% larger than the dependent string. No funky extra sweep phase with a bizarro double-ended stack required.
This suggestion seems like a quick and easy improvement. I don't think we need a huge amount of additional evidence to justify implementing it.
| Assignee | ||
Comment 10•1 year ago
|
||
The x% heuristic prevents the problem of a short(ish) substring keeping the whole original string alive, at the cost of more memory use and allocation overhead if the original string is being sliced up into lots of substrings that collectively cover all or most of the original string (imagine splitting a whole file into lines, for example). Bug 1931252, among others, provides evidence that the heuristic is a good idea.
The worst case would be for shingles:
var s = ...1 MB string...;
var shingles = [];
for (let i = 0; i < s.length - 50; i++) {
shingles.push(s.substr(i, 50));
}
With the current setup, this uses 1MB + 999950 * sizeof(JSString) of string-related memory, about 16MB. With the heuristic, it would release the original string and use 999950 * (sizeof(JSString) + 50 bytes), about 63MB (but actually the allocation overhead would round up each 50 byte allocation to 64 bytes, for a total of 76MB). But this is probably too uncommon to worry about, relative to the situation this bug is about.
| Assignee | ||
Updated•8 months ago
|
| Assignee | ||
Comment 11•8 months ago
|
||
| Assignee | ||
Comment 12•8 months ago
|
||
Performance comparison, 7 runs each base/treatment: https://perf.compare/compare-results?baseRev=f4b3014e29ff4ba08d191b3fa1e7cf89a1980141&baseRepo=try&newRev=9cdab110d5fe428e3b35624cc112f9c8142f2e13&newRepo=try&framework=13&sort=delta%7Cdesc
I can't tell much of anything from this.
| Assignee | ||
Comment 13•6 months ago
|
||
Updated•6 months ago
|
| Assignee | ||
Comment 14•6 months ago
|
||
Comment 15•6 months ago
|
||
Comment 16•6 months ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/cd0651330d71
https://hg.mozilla.org/mozilla-central/rev/439c72e35361
https://hg.mozilla.org/mozilla-central/rev/df255fb45980
Updated•5 months ago
|
Description
•