Closed Bug 923185 Opened 11 years ago Closed 11 years ago

Add new compressed mega-data string type

Component: Core :: JavaScript Engine
Type: defect
Priority: Not set
Severity: normal
Status: RESOLVED WONTFIX
Reporter: mccr8
Assignee: Unassigned
Whiteboard: [MemShrink]

This is not an entirely serious suggestion; it's more in the realm of a modest proposal, intended to spark discussion.  See bug 923184 for a less ridiculous approach to dealing with this problem.  But anyway, in B2G apps and Gaia we see people storing images and other huge blobs of data as strings, over and over again.

For instance, see bug 884394 comment 75: "The cause of the crash is that we end up storing the image internally in a format that weighs in around 28mb. We then try to make two copies of the data in short order and that uses up the rest of the memory available to us and we crash."

See also bug 806374 and some other bugs.

Now, the right way to deal with these is to use blobs, but people are clearly drawn to huge data strings like moths to a flame. I'm also a little concerned about the (I think?) need to manually refcount the blobs.  I wonder if we can harden the platform against this kind of horribly inefficient but very convenient pitfall.
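(For illustration, a minimal sketch of the blob-based alternative, using standard Web APIs; the helper names are made up for this example. The object URL is the "manual refcounting" concern above: it has to be released with URL.revokeObjectURL() eventually.)

```typescript
// Sketch: keep a decoded image as a Blob rather than a base64 string.
// canvas.toBlob() produces binary data without a giant string copy, and
// URL.createObjectURL() displays it without re-encoding.
function canvasToBlob(canvas: HTMLCanvasElement): Promise<Blob> {
  return new Promise((resolve, reject) => {
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error("toBlob failed"))),
      "image/jpeg",
    );
  });
}

async function showImage(canvas: HTMLCanvasElement, img: HTMLImageElement) {
  const blob = await canvasToBlob(canvas);
  img.src = URL.createObjectURL(blob); // references the Blob in place
}
```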

It seems to me that the largest drawback of storing binary data as strings is that it is very inefficient in terms of space.  So I could imagine some kind of new internal string representation for mega-strings that stores them in a more efficient format, perhaps just by compressing them.  The threshold for string size could be configurable.  You'd probably want some way to internally convert between the compressed and uncompressed forms depending on how heavily a string is being accessed, though that is going to increase peak memory usage.
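(A userland sketch of the idea, not the proposed engine design. It assumes the modern CompressionStream/DecompressionStream APIs, which didn't exist when this bug was filed, and the 1 MiB threshold is a made-up placeholder.)

```typescript
// Hypothetical size threshold above which a string is stored compressed.
const MEGA_STRING_THRESHOLD = 1 << 20; // 1 MiB of characters

// Compress huge strings; leave ordinary ones alone.
async function packIfHuge(text: string): Promise<string | Uint8Array> {
  if (text.length < MEGA_STRING_THRESHOLD) return text;
  const gz = new Blob([text]).stream().pipeThrough(new CompressionStream("gzip"));
  return new Uint8Array(await new Response(gz).arrayBuffer());
}

// Every read of a packed string pays the inflate cost -- the implicit
// performance cliff discussed later in this bug.
async function unpack(data: string | Uint8Array): Promise<string> {
  if (typeof data === "string") return data;
  const plain = new Blob([data]).stream().pipeThrough(new DecompressionStream("gzip"));
  return new Response(plain).text();
}
```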
Another mega-string representation is a "rope", a list of smaller substrings. A rope also makes string appending much more efficient because you can just push each appended substring to the end of the list without reallocation or copying.
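(A toy version for illustration; the real engine representation is a tree of string nodes in C++, not this.)

```typescript
// Toy rope: a string assembled from a list of chunks. append() just
// pushes a reference (no copying); flattening happens once, on demand.
class Rope {
  private chunks: string[] = [];
  private total = 0;

  append(s: string): void {
    this.chunks.push(s);
    this.total += s.length;
  }

  get length(): number {
    return this.total;
  }

  toString(): string {
    // The whole concatenation cost is paid here, when (and only when)
    // a flat string is actually needed.
    return this.chunks.join("");
  }
}
```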
The JS engine already uses ropes.  As you say, they make concatenation fast, but they don't help with the overall size of the string.

The use case here is that the app creates a huge string by importing it from somewhere, then doesn't add to it or even look at it, but just passes it around as totally opaque data, until it goes into something that can display it as a data URI or whatever.
Would ASCII strings help (see bug 805081)? Having a special ASCII string type should be a perf win in some cases, and if it halves the size of these strings, that could be interesting.
I think they would help.  I think the most common source of these strings is Base64 encoding.
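(Back-of-the-envelope numbers for the 28 MB image from comment 1, assuming ordinary two-byte JS string characters.)

```typescript
// Base64 turns every 3 bytes into 4 ASCII characters.
const imageBytes = 28 * 1024 * 1024;
const base64Chars = Math.ceil(imageBytes / 3) * 4; // ~39.1 million chars
const asTwoByteChars = base64Chars * 2; // ~74.7 MiB as ordinary JS chars
const asOneByteChars = base64Chars * 1; // ~37.3 MiB with an ASCII string type
console.log({ base64Chars, asTwoByteChars, asOneByteChars });
```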
In the cases where these super-large strings are being created, would a Blob be more appropriate?
(In reply to Luke Wagner [:luke] from comment #5)
> In the cases where these super-large strings are being created, would a Blob
> be more appropriate?

Yes, and people are good about converting the strings to blobs once the problem is pointed out, but that's not really a scalable solution outside of Mozilla-controlled code, IMO.
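(One such conversion, sketched with standard APIs; the helper name is hypothetical.)

```typescript
// Turn an existing base64 data URI back into a Blob, so later consumers
// stop copying tens of megabytes of string data around.
function dataUriToBlob(dataUri: string): Blob {
  const [header, payload] = dataUri.split(",");
  const mime = header.match(/^data:([^;]+)/)?.[1] ?? "application/octet-stream";
  const binary = atob(payload); // base64 -> byte string
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new Blob([bytes], { type: mime });
}
```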
(In reply to Andrew McCreight [:mccr8] from comment #6)
Oops, sorry, I didn't see that you already mentioned blobs in comment 0 :)

If Blob is indeed the right Web-standard data structure to use for these large-data tasks, then it sounds like a tech-evangelism issue.

The reason I'm a little leery of a compressed string type is that pretty much every string operation would require decompressing it (except those for which we specifically add compressed-data paths), so you'd have a very implicit optimization with a very big performance cliff, and when people fall off it, it could be even harder to track down than it is now.
And we already have plenty of string kinds!  I know warnings are pretty lame, but could we emit one in this case?  Or find some other automated way to say "don't base64-encode that binary data as a string, plz".
Oh, I just read bug 923184 -- I like that idea better.  It seems a lot simpler to implement.
(In reply to Luke Wagner [:luke] from comment #7)
> If Blob is indeed the right Web-standard data structure to use for these
> large-data tasks, then it sounds like a tech-evangelism issue.
Tech evangelism is a nice idea, but from what I've seen, people are going to keep doing this over and over again anyway.

> The reason I'm a little leery of a compressed string type is that pretty
> much every string operation will require decompressing it (except those that
> we specifically add paths to work on compressed data), so you'd have this
> very implicit optimization with a very big performance cliff and, when
> people fall off it, it could be even harder to track down than now.

Yeah, this would be a really horrible undertaking overall, though I think it would make problems easier to track down: you could just run the profiler on your app and see huge amounts of time taken up by the compressed-string access functions.  Maybe.

I was thinking more efficient strings would help prevent a cliff of their own, where your app just falls over from memory exhaustion, but I suppose you are really just squeezing out a constant factor, so you are only delaying the memory cliff.

Anyway, after thinking about this some more, I suppose the situation isn't really dire enough to warrant an extreme solution like this (the way the addons-leak-the-world-forever problem was).  For Gaia code, we can use our own quality control to prevent these big strings.  For apps, either it's desktop, in which case a few hundred megs of strings isn't the end of the world, or it's B2G, where the app will just crash its own process when it OOMs, so the moral culpability is clear.  Then we just have to add some way for app developers to figure out what is going on, like bug 923184.

Thanks for the discussion; after all this, I think it doesn't really seem like something worth doing.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX