Open Bug 1547081 (replay-cloud) Opened 2 years ago Updated 9 months ago

Web Replay: Cloud friendly architecture

Categories

(Core Graveyard :: Web Replay, enhancement, P5)

Tracking

(Not tracked)

People

(Reporter: bhackett1024, Unassigned)

References

(Depends on 3 open bugs)

Details

Cloud integration is essential to providing an exceptional debugging experience with web replay. Playing backward or forward in the recording, warping, and logpoint searching are all slower than they should be. While they could be optimized more, ultimately the CPU power available on a local machine will not be sufficient to rapidly perform the sorts of analyses we want to implement.

There is already a good cutpoint for where we can move resources to the cloud. The UI process, middleman process, and recording process can all live on the developer's machine, and all (or most) replaying processes can live in the cloud. Communication between the middleman and replaying child is very restricted: a single pipe for sending messages, plus a block of shared memory for graphics data which will need reworking.
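To illustrate how narrow that channel is, here is a minimal sketch of framed messages over a single pipe. It is not web replay's actual wire format: the header layout (message type plus payload length) and the names `send_message` and `recv_message` are assumptions made up for this example.

```python
import os
import struct

# Hypothetical header: 32-bit message type followed by 32-bit payload length.
HEADER = struct.Struct("<II")

def send_message(fd, msg_type, payload):
    """Write one framed message to the pipe, retrying on partial writes."""
    data = HEADER.pack(msg_type, len(payload)) + payload
    while data:
        written = os.write(fd, data)
        data = data[written:]

def read_exact(fd, size):
    """Read exactly `size` bytes, since a pipe read may return fewer."""
    chunks = []
    while size > 0:
        chunk = os.read(fd, size)
        if not chunk:
            raise EOFError("pipe closed mid-message")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)

def recv_message(fd):
    """Read one framed message and return (msg_type, payload)."""
    msg_type, length = HEADER.unpack(read_exact(fd, HEADER.size))
    return msg_type, read_exact(fd, length)
```

Because the framing only needs an ordered byte stream, the same code would work over a socket to a remote machine. The shared memory block for graphics data has no such direct equivalent, which is why it needs reworking.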

The main issue is that the middleman and replaying processes have lots of back and forth traffic, and were not designed with the latency and bandwidth restrictions of communicating with a process on a distant machine in mind. Optimizing to reduce this traffic will benefit both cloud based and local setups. The general strategy here will be a hybrid of the current approach, where a live replaying process can be queried about the program state, and a new approach where we do some up front analyses of the page's behavior that are designed to support a responsive UX. This is fleshed out more in dependent bugs.

Note that this does not represent a shift to a cloud only architecture. Anything that can be done with the support of other machines can also be done locally, and it should make no difference to the replay control logic whether replaying children are local or remote, except perhaps for optimizing interactions with children according to latency/bandwidth restrictions. In any case, it will be some time before any cloud integration actually happens.
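A rough sketch of what "no difference to the control logic" could look like. Everything here is hypothetical (the `Channel` class, the `high_latency` flag, `query_all`); it only shows that the results are the same for local and remote children, and that the one thing that changes is how requests are grouped into round trips.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Channel:
    """Hypothetical transport to one replaying child, local or remote."""
    handler: Callable[[str], str]   # stands in for the replaying child
    high_latency: bool = False      # True for a child on a distant machine
    round_trips: int = 0

    def exchange(self, requests: List[str]) -> List[str]:
        # One call models one round trip, however many requests it carries.
        self.round_trips += 1
        return [self.handler(r) for r in requests]

def query_all(channel: Channel, requests: List[str]) -> List[str]:
    """Control logic: identical answers either way, only the batching differs."""
    if channel.high_latency:
        # Pay the latency once by sending everything in a single round trip.
        return channel.exchange(list(requests))
    results = []
    for request in requests:
        results.extend(channel.exchange([request]))
    return results
```

In this sketch a remote child answers three queries in one round trip, while a local child takes three. Batching would also be harmless locally, which fits the point that reducing traffic helps both setups.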

Depends on: 1547082
Depends on: 1547084
Depends on: 1547089
Depends on: 1547091
Depends on: 1547092
Depends on: 1547093
Depends on: 1547094
Depends on: 1547864
Priority: -- → P5
Depends on: 1554524
Depends on: 1556813
Depends on: 1556819
Depends on: 1556847
Depends on: 1556858
Depends on: 1557880
Depends on: 1570089
Depends on: 1570091

Web replay has undergone a major architectural change over the last several months which has both hugely improved replaying performance and put things in a good place for integrating with machines in the cloud. It feels like a good time to look into the remaining steps for cloud support.

There is a related concern here which needs to be considered in tandem. Right now we only support recording and replaying on macOS. The supported platforms need to be extended to linux and windows, at least --- linux is a common platform for CI systems, and many web developers are on windows. With our current strategy this would require separate cloud installations for each platform, so that we can replay recordings made with that platform. This is daunting, especially because macOS cloud options are very limited (low demand probably, plus hardware constraints imposed by Apple).

Web replay's recording boundary is set so that code from the underlying platform does not actually run when replaying --- platform APIs are redirected, and when replaying they produce recorded outputs instead of running their underlying logic (this gets a little fuzzy though with the middleman call system used to call the platform after diverging from the recording, but that's ok to ignore for now). This means that, in principle, a recording made on one platform could be replayed on another. I'd like to see if this can actually be done.
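As a toy model of that boundary (not the real redirection machinery, which works on native platform APIs), here is a sketch in which a redirected call runs the platform function while recording and returns the logged output while replaying. The `Redirections` class and its method names are invented for illustration.

```python
class Redirections:
    """Hypothetical record/replay wrapper around platform calls."""

    def __init__(self, mode, log=None):
        assert mode in ("record", "replay")
        self.mode = mode
        self.log = [] if log is None else list(log)
        self.cursor = 0  # next log entry to consume when replaying

    def redirect(self, name, platform_fn):
        def wrapper(*args):
            if self.mode == "record":
                # Recording: run the real platform code and log its output.
                result = platform_fn(*args)
                self.log.append((name, result))
                return result
            # Replaying: never call platform_fn, just return the logged output.
            recorded_name, result = self.log[self.cursor]
            if recorded_name != name:
                raise RuntimeError(
                    f"diverged from recording: expected {recorded_name}, got {name}"
                )
            self.cursor += 1
            return result
        return wrapper
```

Since `platform_fn` is never invoked in replay mode, the replaying side does not need a working implementation of it. That is the property that makes replaying on a different platform plausible in principle.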

If this can be done, the steps needed to get both cloud integration and cross platform support working are roughly as follows, not necessarily in order:

  • Port web replay to linux, with full recording and replaying support.

  • Support replaying macOS recordings on linux.

  • Partially port web replay to windows, so that recordings can be made on windows and replayed on linux.

  • Develop a cloud based system that uses linux to replay recordings made on any platform.

One important thing this strategy avoids is needing to support rewinding on windows. When I first tried to port web replay to windows a couple years back, this was the main sticking point I ran into. This could maybe be fixed by only supporting win64 firefox and not win32 (windows-on-windows was doing some tricky, opaque things), but not having to rewind at all would be much easier (it would, however, walk back comment 0 to some degree, oh well). We could then look into removing the snapshot system used when rewinding, which is very complex and difficult to debug, and fork() new processes instead. This would be both much simpler and potentially more flexible and performant; I haven't considered using fork() in the past because windows does not support fork(), but this strategy provides an alternative solution to that issue.
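For the fork() idea, a minimal sketch of the general technique on a POSIX system (`os.fork` is not available on windows). It assumes a process paused at a checkpoint can fork a child that runs forward while the parent's state stays untouched, so going back means discarding the child and forking again. `run_from_checkpoint`, the dict state, and `step_fn` are all made up for this example and say nothing about how web replay would actually do it.

```python
import os
import pickle

def run_from_checkpoint(state, steps, step_fn):
    """Fork a child that advances its own copy of `state` by `steps`.

    The parent's copy stays at the checkpoint, so no snapshot has to be
    restored to "rewind".
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: run forward on a copy-on-write copy of the parent's memory.
        os.close(read_fd)
        try:
            for _ in range(steps):
                step_fn(state)
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(state, out)
        finally:
            os._exit(0)  # never return into the parent's control flow
    # Parent: still at the checkpoint, just collect the child's result.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as inp:
        result = pickle.load(inp)
    os.waitpid(pid, 0)
    return result
```

Calling this twice from the same `state` gives two independent runs forward from the same point, which is the behavior the snapshot system provides today with much more machinery.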

Disclaimer: work I'm doing on web replay is done on my own time, and is not part of my official work at mozilla. If you have a problem with me working on this, please talk to me directly.

Depends on: 1595419
Depends on: 1597149
Depends on: 1598951
Depends on: 1603856
Depends on: 1603941
Depends on: 1603944
Depends on: 1603945
Depends on: 1605584
Depends on: 1606217
Depends on: 1606225
Depends on: 1606447
Depends on: 1606729
Depends on: 1607014
Depends on: 1607047
Depends on: 1607074
Depends on: 1607101
Depends on: 1607259
Depends on: 1607297
Depends on: 1607739
Depends on: 1607820
Depends on: 1608171
Depends on: 1608261
Depends on: 1608667
Depends on: 1609007
Product: Core → Core Graveyard