Closed
Bug 1422587
Opened 7 years ago
Closed 6 years ago
Web Replay v2
Categories
(Core Graveyard :: Web Replay, defect)
Core Graveyard
Web Replay
Tracking
(firefox59 affected)
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
firefox59 | --- | affected |
People
(Reporter: bhackett1024, Assigned: bhackett1024)
References
()
Details
Attachments
(38 files, 7 obsolete files)
A few weeks ago I restarted work on Web Replay (bug 1207696). I don't have a good explanation for why the initial attempt to land this ran out of steam, but I believe in this technology and want to keep developing it. So far I've been focusing on a couple major architectural changes:
1. Allow a recording process to transition into a replaying process. This will greatly simplify the workflow users need to follow to replay a page: just record the debugging session, and at any time execution can be rewound to replay earlier behaviors. After a process is first rewound it won't be able to record again by e.g. fast forwarding to the end of the recording, but maybe this could be supported someday.
2. Overhaul graphics handling to greatly reduce the amount of interaction between record/replay code and graphics code. The new model is to put a compositor in the recording/replaying process which renders data into a buffer, whose contents are relayed to the parent process' compositor by some simple layer messages in the middleman process. This avoids shuttling graphics IPDL messages through the middleman, which is tricky because these messages are a moving target and because their semantics are unclear when rewinding. Additionally, I've torn out the special DrawTargets used when recording/replaying, which were more trouble than they were worth.
The attached patch makes some progress on these: recorded processes mostly work in the presence of a middleman, including using their own compositor and being able to show graphics in the parent process.
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → bhackett1024
Assignee | ||
Comment 1•7 years ago
|
||
This patch makes some fixes so that recording works better, and some more intricate changes to allow snapshots to be taken while recording. This seems to be working pretty well. The remaining steps in the rearchitecting are to get replaying/rewinding working again (the main issue is to deal with the new graphics changes) and then to allow transitioning recording processes into replaying processes and rewinding them.
Attachment #8933986 -
Attachment is obsolete: true
Assignee | ||
Comment 2•7 years ago
|
||
This patch fixes things so that replaying works again, including showing graphics. Graphics turned out to be pretty simple --- on the workloads I've looked at, when using the Skia backend the native CoreGraphics APIs are mainly used for rendering fonts, and the rendered pixel data can be included directly in the recording without bloating it very much.
Attachment #8937049 -
Attachment is obsolete: true
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Comment 3•7 years ago
|
||
I updated the UI in the new debugger client and with this patch plus those changes (I'll try to land those in the client later this month) can again rewind, reverse/forward step, eval(), etc. in a replaying process.
Attachment #8939339 -
Attachment is obsolete: true
Assignee | ||
Comment 4•7 years ago
|
||
This patch completes the rearchitecting described in comment 0. The debugger can be attached to a recording process, paused, then rewound to replay the process' earlier behavior. I'll be rebasing next and will start testing on more complicated web pages.
Attachment #8941411 -
Attachment is obsolete: true
Assignee | ||
Comment 5•7 years ago
|
||
Rebased and updated WIP that works on more complicated websites. I've been able to record/replay npr.org and the facebook.com newsfeed. Up next are testing/bugfixing and some optimizations.
Attachment #8942715 -
Attachment is obsolete: true
Assignee | ||
Comment 6•7 years ago
|
||
I've been working on fixing bugs, making some optimizations to the snapshot process, and fit-and-finish improvements (like showing alerts instead of dumping problems to stderr). There are still a few more things to fix but I'm hoping to release something (make a branch that people can push to, and a .dmg to download) by the end of this month.
Attachment #8946154 -
Attachment is obsolete: true
Assignee | ||
Comment 7•7 years ago
|
||
I've started using the ash project repo (https://hg.mozilla.org/projects/ash) for Web Replay development. Tests/builds aren't working yet (bug 1439731) but after that is fixed this should be a suitable place for automated testing, providing builds, and allowing other folks to contribute before things are ready to merge to trunk.
I've attached the rolled up initial patch I pushed. I'll be attaching smaller patches shortly for the individual csets this includes, besides the stuff from bug 1207696. A few of these are a little rough but most are ready for review, though getting them reviewed is not a priority for me right now.
I'll file other bugs blocking this for future changes in the repository, and other bugs with finalized patches from this one when it is time for them to be reviewed (trying to avoid the madhouse that bug 1207696 turned into).
Attachment #8951572 -
Attachment is obsolete: true
Assignee | ||
Comment 8•7 years ago
|
||
Disable the content process sandbox when recording/replaying, so the process can access recording and temporary snapshot files.
Assignee | ||
Comment 9•7 years ago
|
||
Disable stylo in recording/replaying processes. This is a bummer but on more complicated web sites I've seen some differences between recording/replay in the order in which layout things are processed. This is probably due to a hashtable being iterated in a different order or something but I haven't had time to determine what the problem is.
Assignee | ||
Comment 10•7 years ago
|
||
This atomic value is accessed during GC.
Assignee | ||
Comment 11•7 years ago
|
||
malloc's behavior is allowed to vary between recording/replaying, so atomic values it uses should not be recorded.
Assignee | ||
Comment 12•7 years ago
|
||
Avoiding recording this atomic makes it easier to see what is going on when assertions fail.
Assignee | ||
Comment 13•7 years ago
|
||
The Gecko profiler may be active during GC, so atomics it uses shouldn't be recorded.
Assignee | ||
Comment 14•7 years ago
|
||
I think these atomics can be accessed during GC.
Assignee | ||
Comment 15•7 years ago
|
||
When recording times related to script parsing, use the actual time rather than recording/replaying the time. Script parsing should be happening on the main thread in a deterministic way (so that scripts are created in a specific order, which the debugger relies on) so I'm not 100% sure this change is necessary.
Assignee | ||
Comment 16•7 years ago
|
||
Avoid using getenv in a few places in the GC and TI that might run non-deterministically.
Assignee | ||
Comment 17•7 years ago
|
||
The next patch makes some changes to the rayon-core crate, so until those changes are upstreamed use the local in-tree version of the crate.
Assignee | ||
Comment 18•7 years ago
|
||
rayon-core (the rust crate managing the thread pool for stylo and other things) uses atomics for coordination between threads, which are not recorded. This patch wraps those atomics in locks so that their accesses are correctly ordered between recording and replay. This patch is rather hacky and a placeholder until there is a better way for interfacing between rust code and the record/replay system.
Assignee | ||
Comment 19•7 years ago
|
||
This patch includes refcount changes to Runnable and its subclasses in the recording. I added this because I saw a subclass where (a) the destructor performed recorded operations, and (b) the thread where the last reference was released could vary across runs. Recording the refcount is only necessary on runnables where both of these are true, so this patch might be a little heavy handed but seems ok for now.
Assignee | ||
Comment 20•7 years ago
|
||
Add some IDL so that JS code can insert debugging assertions to check for consistency between recording and replay. This is called as 'recordReplayAssert(text)', similar to 'dump(text)'.
Assignee | ||
Comment 21•7 years ago
|
||
Some other patches create platform mutexes that do not need to be recorded (e.g. JS mutexes), so specify whether these mutexes should be recorded in the constructor.
Assignee | ||
Comment 22•7 years ago
|
||
Add a template argument to JS hashtables which specifies whether to preserve iteration order across recording/replaying. This preservation is done by ensuring a deterministic hash number for each item, which itself is done by setting every hash number in the table to zero when recording/replaying. Since this makes accesses on the table take linear time, and since only a couple JS hashtables actually need to preserve iteration order, the template argument was added to control whether to do the preservation.
Assignee | ||
Comment 23•7 years ago
|
||
Name tables in the frontend affect the order in which variables are defined in generated scripts, which can affect the recording by calling lookup hooks in different orders. This patch preserves iteration order in these tables.
Assignee | ||
Comment 24•7 years ago
|
||
Make sure that the child recording/replaying process of the middleman process is spawned in the same way as the middleman process itself. Without this a different executable is chosen I think and things do not work.
Assignee | ||
Comment 25•7 years ago
|
||
Instead of using a special IPDL protocol for communication between the middleman and replaying process (as in bug 1207696), IPC between these processes now uses a special bidirectional channel. This change is done because middleman processes are now used when recording as well (so the recording process can transition into a replaying process) and we will get terribly confused if live IPC is done over a single IPDL channel where some of it is included in the recording and some isn't.
This patch includes the logic for this channel, definitions of messages passed between the processes, and all the logic for managing state and communicating between these processes.
Assignee | ||
Comment 26•7 years ago
|
||
This patch has a couple changes to IPC protocol code so that middleman processes can send IPDL messages to the recording/replaying process and so that we don't get confused by seeing unexpected pids when IPDL messages are forwarded through the middleman between the UI process and recording/replaying process.
Assignee | ||
Comment 27•7 years ago
|
||
The ContentProcess instance in the middleman process needs some changes to use special message channels for forwarding IPDL messages between processes. This patch has some changes to ContentProcess and PContent itself that also avoid sending messages that are redundant with messages sent by the recording process. The UI process will get confused if certain messages are sent to it by both the middleman and recording processes.
Assignee | ||
Comment 28•7 years ago
|
||
The middleman needs to make deep copies of IPDL messages in order to send them to another process, which this patch adds support for.
Assignee | ||
Comment 29•7 years ago
|
||
The UI process will get confused if audio IPC connections are opened to it from both the middleman and recording processes.
Assignee | ||
Comment 30•7 years ago
|
||
The UI process will get confused if PJavaScript protocol actors are opened to it from both the middleman and recording processes.
Assignee | ||
Comment 31•7 years ago
|
||
The middleman will get confused if printing IPC connections are opened to it from both the middleman and recording processes.
Assignee | ||
Comment 32•7 years ago
|
||
Async messages sent over IPC via the frame message manager sometimes need to go to the middleman process (if they are related to the debugger server code which runs there), and otherwise should go to the recording process.
Assignee | ||
Comment 33•7 years ago
|
||
Add IDL and IPDL glue to allow JS code in the UI process to instruct the recording process to save the recording to a file.
Assignee | ||
Comment 34•7 years ago
|
||
Add IPC so that fatal errors in the recording/replaying process (including crashes, but also divergence between recording and replay and some other stuff) are reported to the user. Currently this just throws up an alert, though I'm sure in the future it will be more refined.
Assignee | ||
Comment 35•7 years ago
|
||
The recording/replaying process creates its own compositor which renders to a buffer that is sent to the middleman process after each paint. This patch has changes to graphics code to make this possible, and to make sure the middleman does not try to do any rendering of its own.
Assignee | ||
Comment 36•7 years ago
|
||
Using a separate compositor in the recording/replaying process means that only a single layer is sent to the UI process containing all rendered data for the tab. This seems to make the APZ (Async Pan/Zoom) code think the page is like a static canvas and scrolling does not work correctly. This patch disables APZ in recording/replaying processes so that scrolling works correctly (albeit not as smoothly as with APZ).
Assignee | ||
Comment 37•7 years ago
|
||
Add preferences for record/replay settings.
Assignee | ||
Comment 38•7 years ago
|
||
Add menu items for recording, saving the recording, and replaying.
Assignee | ||
Comment 39•7 years ago
|
||
Server side devtools changes, on top of those covered by bug 1207696.
Assignee | ||
Comment 40•7 years ago
|
||
Some client side devtools changes to code that is (I think) hosted in m-c rather than on github.
Assignee | ||
Comment 41•7 years ago
|
||
Normally the debugger tries itself to fetch the contents of source files (.html, .js, etc.) referenced in a page before displaying them. This doesn't work right when running in a middleman process, but more importantly when replaying there is no guarantee the current contents of those source files match up with what they were when recording. This patch adds an API and DOM users of that API to supply JS with the contents of source files it comes across, which are relayed to the debugger JS code via the debugger C++ object.
Assignee | ||
Comment 42•7 years ago
|
||
Assorted crufty fixes and hacks to get things working. These will need to be revisited before landing and either removed or tidied up into proper patches.
Assignee | ||
Comment 43•7 years ago
|
||
Various record/replay assertions for checking consistent behavior between recording and replaying. There are some intermittent issues I haven't tracked down which some of these are related to, but most of these were added to debug problems that have since been fixed. They aren't hurting anything other than adding clutter, and since they help with detecting inconsistencies earlier it seems good to keep them around for now.
Assignee | ||
Comment 44•7 years ago
|
||
Various fixes to the previous patches to get things to compile on the tryserver, mostly to satisfy the clang static analyses.
Assignee | ||
Comment 45•7 years ago
|
||
Updated•6 years ago
|
Component: General → Web Replay
Assignee | ||
Comment 46•6 years ago
|
||
All the code from dependent bugs has landed, excepting tests, and Web Replay should be in tomorrow's nightly (Mac only for now). Features are turned on via the devtools.recordreplay.enabled pref (default to false for now) and accessed via the 'Tools -> Web Developer' menu. Things might not be super stable yet.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment 47•6 years ago
|
||
(In reply to Brian Hackett (:bhackett) from comment #46)
> All the code from dependent bugs has landed, excepting tests, and Web Replay
> should be in tomorrow's nightly (Mac only for now). Features are turned on
> via the devtools.recordreplay.enabled pref (default to false for now) and
> accessed via the 'Tools -> Web Developer' menu. Things might not be super
> stable yet.
From code coverage notifications, we noticed the code was not tested. Do you have WIP patches for adding tests?
Flags: needinfo?(bhackett1024)
Comment 48•6 years ago
|
||
(In reply to Marco Castelluccio [:marco] from comment #47)
> From code coverage notifications, we noticed the code was not tested. Do you
> have WIP patches for adding tests?
See bug 1465491, which has some patches in progress. (This code is also OSX and Nightly only right now.)
Updated•6 years ago
|
Flags: needinfo?(bhackett1024)
Updated•5 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•