Currently, when a process forks, the child process respawns all of its non-main threads afterwards (a freshly forked process initially has only its main thread running). Most forks never need these threads: those forks sit idle and are only used for forking additional processes. The remaining forks run forward a short distance in the recording, do some operation, and then die.
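To illustrate why the threads have to be respawned at all, here is a minimal Python sketch (not the replay code itself) showing that a forked child starts with only the thread that called fork. The function name `thread_count_in_forked_child` is made up for this example, it assumes a POSIX system where `os.fork` is available, and it counts threads using Python's own bookkeeping (`threading.active_count`) rather than asking the kernel.

```python
import os
import threading
import warnings


def thread_count_in_forked_child(num_workers: int) -> int:
    """Start num_workers idle threads, fork, and return how many threads
    the child sees immediately after the fork."""
    stop = threading.Event()
    workers = [
        threading.Thread(target=stop.wait, daemon=True)
        for _ in range(num_workers)
    ]
    for worker in workers:
        worker.start()

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Newer Python versions warn when a multi-threaded process forks;
        # that is exactly the situation being demonstrated here.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: only the forking thread was carried over.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)

    # Parent: read the child's report, then clean up.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        reported = pipe.read()
    os.waitpid(pid, 0)
    stop.set()
    for worker in workers:
        worker.join()
    return int(reported)
```

Calling `thread_count_in_forked_child(4)` starts four idle threads in the parent and reports the count seen by the child, which is the single thread that performed the fork.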
Spawning these threads when there is no need for them is inefficient, and, worse, having all these threads puts a lot of pressure on the kernel: threads on Linux are essentially processes that share the same address space, and I doubt the situation is much better on other operating systems. The machine I've been using for cloud replay (an AWS instance with 8 cores and 32 GB of RAM) eventually becomes unresponsive if the recording gets long enough. With each idle fork using two threads (the main thread plus the one thread we do need to spawn to listen for messages) instead of 70+, pressure on the system should be greatly reduced.
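One way to picture the proposed behavior is the rough Python sketch below; it is an illustration under assumptions, not the actual patch. The class `LazyThreadSet` and its method names are hypothetical. The idea is that a fork starts only the message listener up front and defers the remaining threads until the fork actually has to run forward in the recording.

```python
import threading
from typing import Callable, List


class LazyThreadSet:
    """Hypothetical helper: start the listener immediately, and spawn the
    remaining threads only on first demand."""

    def __init__(
        self,
        listener: Callable[[], None],
        worker: Callable[[int], None],
        num_workers: int,
    ) -> None:
        self._worker = worker
        self._num_workers = num_workers
        self._workers: List[threading.Thread] = []
        self._lock = threading.Lock()
        # The listener is the one extra thread every fork needs.
        self._listener = threading.Thread(target=listener, daemon=True)
        self._listener.start()

    def ensure_workers(self) -> None:
        """Spawn the deferred threads; later calls are no-ops."""
        with self._lock:
            if self._workers:
                return
            for index in range(self._num_workers):
                thread = threading.Thread(
                    target=self._worker, args=(index,), daemon=True
                )
                thread.start()
                self._workers.append(thread)

    @property
    def spawned_workers(self) -> int:
        """Number of deferred threads that have been started so far."""
        with self._lock:
            return len(self._workers)
```

An idle fork would construct a `LazyThreadSet` and never call `ensure_workers()`, so it stays at two threads; a fork that needs to do real work would call `ensure_workers()` once before running forward.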