(In reply to Thinker Li [:sinker] from comment #34)
> do you think reservoir sampling is reasonable change?

Sorry for the (long) delay.

I do think reservoir sampling is a good idea. I'd like the nursery-based deduplication to land separately, though. It is a very separable piece, and I think the reservoir sampling has a lot of places where we might want to adjust various details:

- what should it do on out-of-memory?
- should it even try to build its data structures when memory use is high?
- how many reservoirs?
- would it be good to turn deduplicated strings into atoms?
- do we need to schedule GCs with the reservoir a little differently so there's more likely to be able to free up some memory before we hit OOM?

Among others.

We can always land something that basically works and then worry about tuning later, except we probably wouldn't want to land anything that would crash on OOM. (It would be better to discard all of the deduplication data structures and allow the GC to continue without deduplication.)

Still, there are enough details that I would still prefer to split out the nursery deduplication to land first.

What do you think? And thank you very much for doing this.
Bug 1568923 Comment 35 Edit History
Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.
(In reply to Thinker Li [:sinker] from comment #34)
> do you think reservoir sampling is reasonable change?

Sorry for the (long) delay.

I do think reservoir sampling is a good idea. I'd like the nursery-based deduplication to land separately, though. It is a very separable piece, and I think the reservoir sampling has a lot of places where we might want to adjust various details:

- what should it do on out-of-memory?
- should it even try to build its data structures when memory use is high?
- how many reservoirs?
- would it be good to turn deduplicated tenured strings into atoms?
- do we need to schedule GCs with the reservoir a little differently so we are more likely to be able to free up some memory before we hit OOM?

Among others.

We can always land something that basically works and then worry about tuning later, except we probably wouldn't want to land anything that would crash on OOM. (It would be better to discard all of the deduplication data structures and allow the GC to continue without deduplication.)

Still, there are enough details that I would still prefer to split out the nursery deduplication to land first.

What do you think? And thank you very much for doing this.
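
To make the "discard everything and continue without deduplication" idea concrete, here is a rough sketch in Python rather than the engine's C++. Everything in it is hypothetical: `DedupTable`, `max_entries`, and `make` are illustrative names, not anything from the patch, and a fixed entry budget stands in for a real allocation failure.

```python
class DedupTable:
    """Hypothetical dedup table; max_entries is a stand-in for running out of memory."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.table = {}      # string contents -> canonical string object
        self.enabled = True  # flips to False once we give up on deduplication

    def dedup(self, s):
        """Return the canonical copy of s, or s itself if dedup is off."""
        if not self.enabled:
            return s
        existing = self.table.get(s)
        if existing is not None:
            return existing
        if len(self.table) >= self.max_entries:
            # Simulated OOM: drop all dedup state and carry on without it,
            # instead of crashing or failing the collection.
            self.table.clear()
            self.enabled = False
            return s
        self.table[s] = s
        return s


def make(text):
    # Build a fresh string object so that identity comparisons are meaningful.
    return "".join(list(text))
```

With a budget of two entries, two equal strings collapse to one object while there is room; once the budget is exceeded the table is emptied and every later call returns its argument unchanged. The point is only that the failure path degrades to "no deduplication" rather than aborting.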