(In reply to Doug Thayer [:dthayer] from comment #4)
> (In reply to Kris Maglione [:kmag] from comment #2)
> > That won't work, for a lot of reasons. One is that we need separate cache files in different processes.
> Could you clarify why? It's not terribly critical, because we could split out the startup cache files just as easily as we can split out the script cache files, but the writes are brokered by the parent process anyway, so it shouldn't be possible for a content process to write into a buffer which the parent process will load as a script. Am I misunderstanding something here?
For a lot of reasons. The scripts that are used by child processes live in a separate mmapped region that's shared across all child processes; they don't see the data that belongs to the parent process. And, for efficiency, the data that they actually need to access is all ordered and contiguous.
Having the child send the data to store in the cache to the parent is... complicated. For security reasons, we can only accept data from the child before any untrusted code has run in that process, since data sent from a compromised process and stored in the cache would wind up running in unrelated processes, which it would then compromise. The preloader cache currently handles that. We could in theory make the startup cache handle it too, but either way we'd need data segregation.
> > Another is that we intend in the future to use the memory-mapped XDR data in the preloader as the actual memory backing for bytecode of decoded scripts.
> Is this just to avoid a copy? On a spinny disk on a system with enough memory that it won't have to page out a bunch of things to make room, the time savings from halving the number of bytes read off disk by compressing it should far outweigh the malloc and memcpy, no?
It's to avoid duplicating the bytecode of those scripts in every content process. If we're on a system with a spinning disk, then the last thing we want is to start swapping because we're low on memory.
Either way, I wouldn't expect unifying the two to improve performance. The IO ordering of the preloader cache is already carefully optimized, and the file is already spread over multiple filesystem blocks. If there's anything that we could do to increase its IO efficiency, it would likely be changing the flags on the pre-loaded region of the mapped file so the OS does more aggressive ordered pre-fetching.
> From a performance perspective, my only real concern is ensuring that we're not fetching from both at the same time on different threads, causing unnecessary seeks. However, from a maintenance perspective it feels like much of the startup cache and script preloader code could be unified, because at a storage level (which is the only level for the startup cache) they're both trying to do the same thing, in very slightly different ways.
From a performance perspective, I'd actually rather we *were* fetching from multiple threads at the same time, because that gives the OS more leeway to optimize the seeks based on the locations of the data for all outstanding reads. The data we're talking about is going to be spread over multiple filesystem blocks, so there's no guarantee that it will be contiguous on disk even if it's in the same file. And it's needed across timespans which are far longer than a seek time. The OS has a much better ability to optimize that data access than we do, so I'd rather we second-guess it as little as possible.
That said, there are flags that we can set on those mmapped regions to let the OS know that we expect to read them quickly, and in order, so that it will prefetch any unavailable chunks adjacent to the last read as soon as it gets the chance, which might be worth looking into. We do something like this for omnijar already.