Open Bug 852187 Opened 10 years ago Updated 2 months ago

Optimize postMessage for large strings

(Core :: JavaScript Engine, defect)

(Reporter: Yoric, Unassigned)

(Blocks 1 open bug)

(Whiteboard: [mentoree waiting for mentor])

(2 files)

JavaScript strings are basically a read-only C void* with garbage collection and lots of decoration. Therefore, it should be possible to copy a string to a worker as follows:
- add a strong reference to the string;
- mark the string as immovable, if necessary;
- have *the worker* copy the memory directly;
- once the worker has copied the memory, mark the string as movable again and remove the strong reference.

While this might not be useful for all strings, we have rare but very janky cases of multi-megabyte strings being sent across workers (sometimes 60 MB+), and for these strings the optimization could be quite useful.
I think it's easier than that. The movable portion of a large JSString is only a header. (Smaller strings may be inlined into the header, but those don't matter here and can be un-inlined if desired.)

AIUI, postMessage without transferables wants to be able to produce something you could write to disk or send to another address space (process), though. So I'm not sure what the API level should look like for this. We could say strings can be Transferable, but that has observable effects and doesn't match the spec. Or we could add a parameter to routines that create structured clone buffers that says "this will stay within a single address space"; we already track that for Transferables.
Well, here too, if someone is willing to mentor me through the work, I am willing to write the code.
I have just run some quick-and-dirty benchmarking of string serialization. Surprisingly, posting a string is often slower than converting it to UTF-8 and transferring the result.
Posting a string involves copying it twice: once from the original string into the JSAutoStructuredCloneBuffer and once from the buffer into the new string on the other end.
Yes, the algorithm described above could remove one of these copies. It is still somewhat surprising that copying a string is not much faster than encoding it to UTF-8/UTF-16.
Depends on: 789593
Andrea, I'm curious, why did you add that dependency?
This is not a direct dependency, but I think we should wait until bug 789593 is fixed before working on transferable objects/strings and JSAutoStructuredCloneBuffer, in case sfink has changed too much of that code.

Bug 789593 already has a patch attached. I would suggest taking a look at it.
Whiteboard: [mentoree waiting for mentor]
Luke, would you mentor me on that bug?
Flags: needinfo?(luke)
I think the majority of changes would be to jsclone.cpp, so I'll forward this needinfo to Steve/Waldo.

I did have an idea on the string side of things: we could avoid all copying by sharing the (immutable) char buffer between workers.  The way this could work is that, after we converted the input to a stable string (JSString::ensureStable), we could convert the stable string into an external string whose finalizer took care of the atomic refcounting and release of the chars.  (To do this we'd need to malloc a per-shared-string struct that extended JSStringFinalizer with a ref-count; so probably we'd only want to do this for sufficiently large strings.)

The harder part is extending our structured clone scheme to facilitate this, so again, needinfo from Steve/Waldo.
Flags: needinfo?(luke) → needinfo?(sphink)
Flags: needinfo?(jwalden+bmo)
I'm not sure I have a good enough grasp of our structured clone code to mentor this, especially.
Flags: needinfo?(jwalden+bmo)
Assignee: general → nobody
No pong in 3 years? Disappointing.
When doing this, we need to remember that structured cloning may also happen between processes, in which case sharing string data won't work as easily (and if we want sharing between processes, that should be done in a different bug, IMO).
These days the DOM side of structured cloning does have a notion of context (within the same thread, within the same process, cross-process). Perhaps that should be pushed down to the JS engine level.
IMO, if we want to optimize for reactivity in cross-process communication, we actually want it to behave as if it were an optimized postMessage to a local worker, followed by a non-optimized postMessage from the worker to the other process.
This is my stalest ni?. I periodically look at it and think it's messy. But ok, let me give it a shot.

In principle, this seems totally doable and a really good idea.

Currently, structured clones are really partitioned into two very different types: transferable and serializable.

The transferable clones may store in-memory pointers and ownership information, and only work within the same address space. They probably shouldn't be forced into a byte sequence at all, since that's useless busywork, but instead be some sort of collection of typed structures. Unfortunately, the structured cloning API is awful and some users actually do move the clones around as byte arrays temporarily, so the path of least resistance has been to keep it as is.

The serializable clones are things that need to be storable into IndexedDB and read back again in a new process. So, no pointers allowed, and the clone must fully own the data.

A postMessage'd string would be a little of each. Comment 10 sounds exactly right to me, but that means you're doing a postMessage with a regular string and no Transferable collection parameter, and yet you are storing pointers and passing around reference counts. So I think you'd need to make the "serializable" notion separate from "transferable", and change the API so that you could specify what sort of clone you want. (That way IndexedDB would pass in an I-need-a-serializable-buffer-back flag, rather than just depending on not passing in any Transferables.) Or perhaps it would be better to split the API up so that there are separate entry points for serializable clones vs "in-memory and maybe containing transferables" clones.

Though if you're distinguishing these, then what smaug says in comment 14 makes a lot of sense, in that you may as well fully describe the contexts in which the clone makes sense -- same-thread, same-process (or is that same-address-space?), cross-process, fully serializable. And you don't want separate APIs for all of those.

Anyway, to implement all of this, you'd want to do something like:

 - implement the string type morphing and refcounting described in comment 10

 - generalize the current Transferable-only ownership mechanisms to handle same-address-space, non-transferable data

 - change the API to require the caller to describe the sort of clone needed

 - add heuristics to use the new mechanisms for large strings

I guess this is all just an expansion of my original comment 1. Does this make sense? Does it sound doable?
Flags: needinfo?(sphink)
Severity: normal → S3