Status

()

Core
General
2 years ago
13 days ago

People

(Reporter: bhackett, Assigned: bhackett)

Tracking

(Depends on: 5 bugs, Blocks: 5 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox44 affected)

Details

(URL)

Attachments

(81 attachments, 118 obsolete attachments)

365.40 KB, patch
jandem
: review+
Details | Diff | Splinter Review
1.04 KB, patch
billm
: review+
Details | Diff | Splinter Review
3.50 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
5.21 KB, patch
billm
: review+
Details | Diff | Splinter Review
2.16 KB, patch
Away for a while
: review+
Details | Diff | Splinter Review
10.19 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
1.83 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
3.57 KB, patch
bas
: review+
Details | Diff | Splinter Review
22.59 KB, patch
jandem
: review+
Details | Diff | Splinter Review
875 bytes, patch
fitzgen
: review+
Details | Diff | Splinter Review
1.26 KB, patch
njn
: review+
Details | Diff | Splinter Review
1.44 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
1.18 KB, patch
fitzgen
: review+
Details | Diff | Splinter Review
858 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
2.39 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
1.92 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
1.15 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
11.32 KB, patch
sfink
: review+
Details | Diff | Splinter Review
2.74 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
1.18 KB, patch
jonco
: review+
Details | Diff | Splinter Review
949 bytes, patch
BenWa
: review+
Details | Diff | Splinter Review
2.92 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
1.63 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
1.15 KB, patch
bas
: review+
Details | Diff | Splinter Review
2.17 KB, patch
jandem
: review+
Details | Diff | Splinter Review
1.91 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
1.01 KB, patch
fitzgen
: review+
Details | Diff | Splinter Review
991 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
955 bytes, patch
froydnj
: review+
Details | Diff | Splinter Review
4.72 KB, patch
jesup
: review+
Details | Diff | Splinter Review
1.17 KB, patch
smaug
: review+
Details | Diff | Splinter Review
1.24 KB, patch
dvander
: review+
Details | Diff | Splinter Review
1.25 KB, patch
luke
: review+
Details | Diff | Splinter Review
2.03 KB, patch
mrbkap
: review+
Details | Diff | Splinter Review
5.48 KB, patch
billm
: review+
Details | Diff | Splinter Review
5.87 KB, patch
jandem
: review+
Details | Diff | Splinter Review
4.82 KB, patch
jorendorff
: review+
Details | Diff | Splinter Review
10.13 KB, patch
nical
: review+
Details | Diff | Splinter Review
9.41 KB, patch
nical
: review+
Details | Diff | Splinter Review
1.01 KB, patch
billm
: review+
Details | Diff | Splinter Review
1.60 KB, patch
billm
: review+
Details | Diff | Splinter Review
1.93 KB, patch
njn
: review+
Details | Diff | Splinter Review
2.94 KB, patch
billm
: review+
Details | Diff | Splinter Review
2.85 KB, patch
fitzgen
: review+
Details | Diff | Splinter Review
3.69 KB, patch
billm
: review+
Details | Diff | Splinter Review
1.12 KB, patch
BenWa
: review+
Details | Diff | Splinter Review
219.10 KB, patch
Details | Diff | Splinter Review
137.62 KB, patch
Details | Diff | Splinter Review
14.80 KB, patch
nical
: review+
Details | Diff | Splinter Review
17.85 KB, patch
bas
: review-
Details | Diff | Splinter Review
1.08 KB, patch
bas
: review-
Details | Diff | Splinter Review
1.13 KB, patch
bas
: review-
Details | Diff | Splinter Review
6.57 KB, patch
jrmuizel
: review+
Details | Diff | Splinter Review
3.59 KB, patch
bas
: review+
Details | Diff | Splinter Review
5.51 KB, patch
bas
: review-
Details | Diff | Splinter Review
3.04 KB, patch
bas
: review-
Details | Diff | Splinter Review
21.61 KB, patch
Details | Diff | Splinter Review
15.70 KB, patch
jimb
: review+
Details | Diff | Splinter Review
3.08 KB, patch
Details | Diff | Splinter Review
1.84 KB, patch
mccr8
: review+
Details | Diff | Splinter Review
2.81 KB, patch
billm
: review+
Details | Diff | Splinter Review
19.86 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
5.34 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
7.05 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
1.82 MB, patch
Details | Diff | Splinter Review
32.37 KB, patch
billm
: review+
froydnj
: review+
Details | Diff | Splinter Review
21.30 KB, patch
Details | Diff | Splinter Review
125.45 KB, patch
billm
: review+
Details | Diff | Splinter Review
84.62 KB, patch
billm
: review+
Details | Diff | Splinter Review
73.83 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
254.65 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
27.31 KB, patch
smaug
: review+
Details | Diff | Splinter Review
39.18 KB, patch
billm
: review+
Details | Diff | Splinter Review
41.64 KB, patch
nical
: review+
Details | Diff | Splinter Review
16.04 KB, patch
bas
: review+
Details | Diff | Splinter Review
22.54 KB, patch
billm
: review+
Details | Diff | Splinter Review
7.59 KB, patch
billm
: review+
Details | Diff | Splinter Review
3.87 KB, patch
billm
: review+
Details | Diff | Splinter Review
16.99 KB, patch
nical
: review+
Details | Diff | Splinter Review
11.28 KB, patch
bas
: review+
Details | Diff | Splinter Review
22.17 KB, patch
bas
: review+
Details | Diff | Splinter Review
(Assignee)

Description

2 years ago
Over the past couple weeks, roc and I have come up with a design that looks promising for web recording and deterministic replay: an rr like tool that can be integrated into Firefox and used by web and browser developers to debug content processes.

Browser non-determinism originates from two sources: intra-thread and inter-thread.  Intra-thread non-deterministic behaviors are non-deterministic even in the absence of actions by other threads, and inter-thread non-deterministic behaviors are those affected by interleaving execution with other threads, and which always behave the same given the same interleaving.

Intra-thread non-determinism can be recorded and later replayed for each thread.  These behaviors mainly originate from system calls (i/o and such), and can be instrumented at the library call level --- this means we can just use shims at library call sites in Gecko code rather than modifying the libraries themselves, and is helpful on Windows, where the library call API is stable/documented but not the system call API.

Inter-thread non-determinism can be handled by first assuming the program is data race free: shared memory accesses which would otherwise race are either protected by locks or use mozilla::Atomic.  If we record and later replay the order in which threads acquire locks (and, by extension, release locks and use condvars) then accesses on lock-protected memory will occur in the same order.  Atomic variables can be handled either by treating reads and writes as if they were wrapped by a lock acquire/release during recording, or by treating reads as intra-thread non-determinism (record and replay the values produced).  I like the latter more (less overhead), though it wouldn't work well with atomic pointer values so a mix of the two is probably best (needs experimentation).

This design is broadly similar to rr, with some major differences:

- This should work on all platforms and architectures supported by Gecko, without much port work required.

- This will be part of Gecko itself, rather than a separate tool, which means both that developers won't need additional software to use it and that this can't be used to debug other software.

- This can use multiple cores during recording and replay.

- This does not preserve exact behavior.  Context switches can occur at different times and data races can lead to different behavior between recording and replay.  Data races are bugs in and of themselves, however, so this sort of non-determinism should be fixed regardless.  There's one exception here that I know of: SharedArrayBuffer can be used by web content to introduce data races to the browser.  Pages which use SharedArrayBuffer can still be recorded, but using a different technique from above which will probably use a single core for efficiency's sake.

More generally, this design is flexible enough that it could diverge from rr in other ways.  For example, it would be really slick if the existing debugger in the browser could be activated during replay without perturbing behaviors which the debugger can observe.  This should greatly simplify implementations of a rewind debugger and/or omniscient debugger (build a database of behaviors like DOM/JS object modifications and executed scripts for querying) based on the replayer, but would mean the recording and replay executions could have different pointer values and code paths taken in some places.

For now though, this bug is about building a prototype to test how well these ideas work for a basic record/replay system.
(Assignee)

Comment 1

2 years ago
Created attachment 8666530 [details] [diff] [review]
WIP

Initial WIP, attaching for posterity.  This records NSPR thread/lock creation order and lock acquisition order into files, along with some system calls information for each thread.  The record/replay architecture is all inside NSPR, and it seems best for now to keep it there and use NSPR for threads/locks everywhere in the browser (this patch already fixes up thread creation in the chromium message passing stuff to use NSPR) though I don't know how long that strategy will last.

This patch intercepts system calls by adding new functions like PR_RecordReplay_kevent and replacing direct system calls with calls to the wrapper.  I'm going to rework this strategy; on OS X system calls seem to all go through a userspace library (libsystem_kernel.dylib) whose code can be modified to redirect calls to wrapper functions.  This should let system calls within complex libraries be intercepted without modifying the libraries, and moves the design closer to rr.  I don't know if this will work on all platforms (or even if it will actually work on OS X) but it should simplify the prototype.
Assignee: nobody → bhackett1024
(Assignee)

Comment 2

2 years ago
Created attachment 8668518 [details] [diff] [review]
WIP

Updated WIP that modifies code in libsystem_kernel.dylib and creates trampolines so that system calls through this library can be redirected to a stub that performs the intended call and records information about the result.  Right now recording seems to be working fine for all the system calls I can find running the content process on a basic website (nytimes.com) using dtruss.  The code modification is pretty hacky (pattern matching disassembly of the code patterns seen in my computer's libsystem_kernel.dylib) but the general approach looks to be basically the same as other hooking tools (e.g. Detours from MSR, http://research.microsoft.com/pubs/68568/huntusenixnt99.pdf).

The next step here is to record atomic variable accesses.
Attachment #8666530 - Attachment is obsolete: true
We have a similar hooking system in xpcom/build/nsWindowsDllInterceptor.h; perhaps there may be an opportunity for code re-use.
(Assignee)

Comment 4

2 years ago
(In reply to David Major [:dmajor] from comment #3)
> We have a similar hooking system in xpcom/build/nsWindowsDllInterceptor.h;
> perhaps there may be an opportunity for code re-use.

Cool.  Yes, CreateTrampoline() in that file is almost shockingly similar to Redirect() in this patch.  To reuse this we'll need to expand things to work on other operating systems and architectures but that doesn't seem too hard.
(Assignee)

Comment 5

2 years ago
Created attachment 8669236 [details] [diff] [review]
WIP

WIP with handling for atomics.  This adds a template argument to mozilla::Atomic to control how much of their behavior is preserved during replay (similar to how the existing MemoryOrdering argument controls the reordering optimizations the compiler is allowed to do wrt the atomic).  For now a couple atomics in the JS engine used for interrupts are not recorded at all --- these are accessed all the time, including from jitcode, and recording and replaying executions which are interrupted by the slow script dialog isn't really something that can practically be done (there are a couple other events that will prevent recording I think, like throwing overrecursion errors, but nothing major).  All other atomics record enough information to preserve the exact order of their accesses, but there is flexibility to relax that as needed.
Attachment #8668518 - Attachment is obsolete: true
(Assignee)

Comment 6

2 years ago
Created attachment 8678876 [details] [diff] [review]
WIP

Updated WIP that fixes a lot of bugs in the record/replay infrastructure.  Replaying still balks early during startup; the main issue I'm working on now is how to deal with Grand Central Dispatch (GCD), an OS X API for task parallelism that is built using pthreads and some kernel syscalls that are undocumented and probably shift around between OS versions.  Since GCD has its own thread pool, even if we can record its behavior it will be very hard to rewind process state like we will eventually want to do, so I think the best thing here is to intercept calls to the public interface of GCD (which is hopefully fairly stable) and emulate its behavior using our own thread pool and synchronization.  I don't know if GCD is used heavily anywhere in browser code; everything I've seen has been deep inside calls to functions in other system libraries.
Attachment #8669236 - Attachment is obsolete: true

Comment 7

2 years ago
AFAIK we don't use GCD in Firefox.
(Assignee)

Comment 8

2 years ago
Created attachment 8679613 [details] [diff] [review]
WIP

This patch improves the hooking infrastructure used and is able to intercept all calls into GCD.
Attachment #8678876 - Attachment is obsolete: true
(Assignee)

Comment 9

2 years ago
Created attachment 8681621 [details] [diff] [review]
WIP

This patch records a fair amount of GCD's functionality.  The next step is to rejigger things so that all the GCD work items run serially on a single thread we create, which will make it straightforward to record execution of the work items.  I'm hoping this won't introduce deadlocks due to artificial contention for the work item thread, but we'll see.
Attachment #8679613 - Attachment is obsolete: true
(Assignee)

Comment 10

2 years ago
Created attachment 8681714 [details] [diff] [review]
WIP

This patch records enough of GCD's functionality that a simple "Hello world" HTML file can be loaded.  Unfortunately, forcing all the GCD work items onto a single thread doesn't work out, due to new deadlocks being introduced, so recorded data for work items is still not suitable for use in replay.

Plan B is now to mimic GCD more closely during replay.  During recording we keep track of the maximum number of simultaneously executing GCD work items (as well as work items that are executed in place when GCD is called by our own threads), and during replay create our own pool of threads and assign work items to them to maintain the same degree of parallelism.  This is more complicated but should be able to replay any GCD execution.
Attachment #8681621 - Attachment is obsolete: true
(Assignee)

Comment 11

2 years ago
Created attachment 8681976 [details] [diff] [review]
WIP

This patch should have everything that is needed to record and replay GCD execution in the browser on a simple webpage.  The next step is getting back to making replay actually work.
Attachment #8681714 - Attachment is obsolete: true
(Assignee)

Comment 12

2 years ago
Created attachment 8682744 [details] [diff] [review]
WIP

This patch fixes a fair number of issues with GCD replay.  Attaching this now because I'm going to take a step back and try a new approach.

Up to now these patches have been trying to instrument things at as low a level as possible, first at the system call level and then at the GCD level when that library posed problems for replay.

The main problem I'm running into now is the same as when everything was at the system level.  It's very easy for the replay to go off the rails and start behaving differently, either in a way we can detect (different order of system calls) or just by crashing in the middle of some library.  Most of these libraries are closed source and debugging these problems is extremely difficult.

More problematic though is that some of these libraries shouldn't even be expected to behave the same during replay.  If a library's behavior is affected by thread interleavings --- it passes information between threads, lazily initializes things, etc. --- then the calls it makes won't be consistent between executions.  And without modifying the source for such libraries this problem isn't really solvable.  Hooking functions like pthread_mutex_lock would help, but a lot of synchronization is done using inlined atomic operations which can't be hooked.

So I think a better interface to target might be the one between code compiled into the browser and external libraries the browser is linked against.  Libraries which are well behaved and operate in a thread local fashion can still be recorded/replayed at the system call level, but more complex multithreaded libraries can be isolated during recording and then have their observable effects replayed later.
Attachment #8681976 - Attachment is obsolete: true
Brian, can you describe the goal here a little better? If it's to make it easier to gecko debuggers to debug gecko, then I can understand the design. But I don't see how this could be used to create a JS record-and-reply debugger. It seems too low level. If the user wanted set a new breakpoint, it seems like it would perturb our state enough that the trace would no longer be valid. Comment 0 addresses that, but I don't quite understand what the conclusion was.
(Assignee)

Comment 14

2 years ago
(In reply to Bill McCloskey (:billm) from comment #13)
> Brian, can you describe the goal here a little better? If it's to make it
> easier to gecko debuggers to debug gecko, then I can understand the design.
> But I don't see how this could be used to create a JS record-and-reply
> debugger. It seems too low level. If the user wanted set a new breakpoint,
> it seems like it would perturb our state enough that the trace would no
> longer be valid. Comment 0 addresses that, but I don't quite understand what
> the conclusion was.

This should be usable for people debugging Gecko, but it is primarily intended for the browser's debugger and web developers.

Whether this will work with the browser's debugger as it currently exists is still speculation, so comment 0 raised the question but that's all it is for now.  With a design like rr the state can't be perturbed at all during replay without diverging completely from the recording, so using the browser debugger wouldn't work without massive modifications.  This design offers enough flexibility that certain state, like what code is JIT compiled, when GCs occur, how many allocations occur, and so forth should be able (again, speculative) to vary between the recording and replay, in a way that would give the debugger enough room to do its thing.

For example, we can't expect to record and replay mmap and mprotect calls exactly, but during replay can keep track of which memory is mapped in which way at snapshots so that we can rewind process state in a sensible way.  If the presence of the debugger can affect the existence or timing of operations that are in the recording (for example, finalizers called from the GC which close file descriptors) then we would need to modify either the information we record or Gecko itself to be able to move forward.
Another question: how would we get the browser to the same initial state as the recording started in? Would we save the raw bits in the memory mapping or would we have some kind of serialization/deserialization for the entire heap?
(Assignee)

Comment 16

2 years ago
(In reply to Bill McCloskey (:billm) from comment #15)
> Another question: how would we get the browser to the same initial state as
> the recording started in? Would we save the raw bits in the memory mapping
> or would we have some kind of serialization/deserialization for the entire
> heap?

Right now the focus is on recording as soon as the process starts.  We would only record e10s content processes, so from the user's perspective the "start recording" button or whatever would open a new tab in a new process whose behavior could be replayed later (also in a new process).

Starting to record in the middle of executing an existing content process would be very cool but I don't know how feasible it is.  I'm interested in using a memory snapshot based system for rewinding (this isn't required though, and the record/replay mechanism is largely separate from the eventual rewind and/or omniscient debugging mechanism), so we could potentially (again, all this is speculative) start recording at any point by taking a memory snapshot (would need to know all mapped memory in the process, which might require hooking mmap/mprotect whether we are recording or not), recording until the user desires to stop, then reverting the process to the initial snapshot for replay.

Replaying such a partial recording in a new process would be even harder, and may be intractable if ASLR is in use.
(Assignee)

Comment 17

2 years ago
Created attachment 8683933 [details] [diff] [review]
WIP

This patch intercepts enough system library calls so that on a simple webpage we never enter GCD without passing through a function or logic whose behavior is recorded (and will not execute at all during replay).  Generally this is done by hooking the library functions, though objective C messages are handled via manual instrumentation (I don't know how or whether these message invocations can be hooked).

I really like how this approach is going; everything that is hooked is called directly from browser code, these functions are all documented public APIs (GCD has a tricky issue where other system libraries enter it using undocumented private APIs), and this strategy should carry over well to Windows and other OSes.
Attachment #8682744 - Attachment is obsolete: true
(Assignee)

Comment 18

2 years ago
Created attachment 8685487 [details] [diff] [review]
WIP

This patch fixes a lot of replay issues.  Replay gets much further than in the past, though still not far enough to completely replay a simple webpage load.  The new approach is still going very well, with replay issues much easier to track down and fix (since all the involved code is compiled with the browser rather than linked in).
Attachment #8683933 - Attachment is obsolete: true
Thanks Brian, this is really interesting.

> Hooking functions like pthread_mutex_lock would help, but a lot of synchronization is done using inlined atomic operations
> which can't be hooked.

OK, that's good to know.

Why do we need to wrap void library functions like, e.g., CTFontManagerSetAutoActivationSetting?

Why do we need CopyInstruction? We might be able to get that code from an existing library. I'm pretty sure we could extract the relevant parts of DynamioRio for example.

It's not clear to me what state is now being preserved between recording and replay. We'll need a precise definition of that.
(Assignee)

Comment 20

2 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #19)
> Why do we need to wrap void library functions like, e.g.,
> CTFontManagerSetAutoActivationSetting?

During recording, many of these void functions invoke system calls or other library calls in a way we can't precisely replay.  This is mainly due to the libraries being multithreaded and performing synchronization in a way we don't record, but there's also drag due to not having source to these libraries and the ability to understand why replay broke down.

During replay, we don't execute much of the code in these libraries, due to their unpredictable behavior.  Even if a library API behaves predictably, many of the opaque heap structures (like CFStringRef) are replayed as 0x1 constants so the API will crash if we call it with that fake value.

Right now we include information about all these void library functions in the recording, but that isn't necessary --- we could just hook them and nop them during replay but not include any information in the recording.

> Why do we need CopyInstruction? We might be able to get that code from an
> existing library. I'm pretty sure we could extract the relevant parts of
> DynamioRio for example.

CopyInstruction started as an expedient hack (it's similar to what nsWindowsDllInterceptor.h does) and then metastasized.  At this point it's not much of a drag on development but in the long run it would be much better to get this functionality from a library.

> It's not clear to me what state is now being preserved between recording and
> replay. We'll need a precise definition of that.

Yes, for sure.  Right now the ideal is that the replay should preserve all logic (control flow and scalar values, but not pointer values) from the recording except for the logic that goes on inside system calls and Cocoa library calls.  There are some exceptions: the replay can GC and Ion compile scripts differently (which has effects on other parts of the system, like nsISupports refcounts differing during replay), and some other library calls are emulated (e.g. localtime_r sometimes uses GCD).  There's no effort to make sure we capture all Cocoa entry points, though, and this model is in flux and subject to further evolution.  There are a couple main issues:

- Not interacting with the system means that nothing will get painted, which we'll of course need to do for the replay to be sensible to users.  A first cut might work to just record the bitmap data that is painted (maybe optimized to diff the data on subsequent paints) but I can't imagine that being a viable approach for graphics intensive pages like games.

- The issue of attaching the JS debugger still looms.  The GC/Ion compilation stuff above should help with this quite a bit, but there are other things like how to interact with the parent process (a separate IPC message loop, presumably) that will cause the replayed process to diverge further from the recording.

Brian
(Assignee)

Comment 21

2 years ago
Created attachment 8688706 [details] [diff] [review]
WIP

This patch is able to completely replay the load of a simple website.  Yay!  Of course once the child process exits on its own the parent process thinks it crashed, but whatever.  The next step is to get the simple graphics replaying mentioned in comment 20 to work (we already need to precisely record and replay graphics data, since some things like page thumbnails rely on it), which I think will require the parent and child processes to communicate with each other.
Attachment #8685487 - Attachment is obsolete: true
(Assignee)

Comment 22

2 years ago
Created attachment 8688745 [details] [diff] [review]
WIP

A fair amount of the previous patch is asserts added to debug replaying problems.  This patch removes these.
Attachment #8688706 - Attachment is obsolete: true
(Assignee)

Comment 23

2 years ago
Created attachment 8691590 [details] [diff] [review]
WIP

Unless I'm totally misunderstanding things, the content process paints data by rendering layers and then sending updates to the compositor thread in the parent process via IPDL.  There are at least a couple different ways to relay updates which the replayed process tries to perform to the actual parent process, but I think the most robust is for the replayed process to be able to communicate with the parent process via IPDL in a manner independent from the original recording.  This should work well with future IPC we need to do, like having the parent process control the replay via rewinding, etc., and having the parent process communicate with debugger actors.

The attached patch updates things so that during replay we create two new threads: one thread has a chromium I/O message loop which communicates with the actual parent process, and the second thread has a second message loop to process messages coming in from the parent process.  These threads operate independently from the threads serving this role in the original recording (IOThreadChild and the main thread), which during replay act in lockstep with how they behaved in the recording and don't do any actual IPC.

This patch is far enough along that we create an alternative to ContentChild, ReplayContentChild, which looks like it's communicating with the parent process, though as soon as the parent sends ReplayContentChild a message it forces a crash.
Attachment #8688745 - Attachment is obsolete: true
Separate IPDL threads sound good. However, it might be better to use completely different protocols instead of PContent etc since we have quite different needs.
(Assignee)

Comment 25

2 years ago
Created attachment 8694735 [details] [diff] [review]
WIP

This patch models enough of the layout IPDL in the replayed process that during replay we are able to render the graphics for a simple "Hello World!" webpage.

I tried using a separate protocol from PContent, but ran into difficulties because it seemed like I would need to modify quite a lot of parent side code to tolerate interactions via multiple protocols (maybe there's a good place to refactor things, I don't know the parent side code at all really).  This patch has skeletons for protocols like PContent and PBrowser that ignore most incoming messages, and allocates its own actors for layout protocols like PLayer and PTexture that are 1:1 with actors in the replayed content but interact with the actual parent process.  During replay we watch for update messages from the replayed content and reflect those updates into ones for these alternate actors and send them to the parent.
Attachment #8691590 - Attachment is obsolete: true
(Assignee)

Comment 26

2 years ago
Created attachment 8695312 [details]
test.html

This is a simple HTML file that uses setTimeout() to count from 1 to 10.  Execution of this webpage can be completely replayed, including graphics.  From here I want to use this as a testcase to get process rewinding and (afterwards) debugger attaching to work during replay.  Combining the two should allow implementation of reverse breakpoints (find the last time a breakpoint executed).
(Assignee)

Comment 27

2 years ago
Created attachment 8695315 [details] [diff] [review]
WIP

Updated WIP that can record and replay the test.html above.  This just fixes some bugs with the previous patch where shared memory buffers were not locked before being sent to the parent process (which will unlock them, and will get angry if the read count becomes negative) and where mach_msg IPC calls were not being recorded/replayed properly.
Attachment #8694735 - Attachment is obsolete: true
(Assignee)

Comment 28

2 years ago
Created attachment 8696702 [details] [diff] [review]
WIP

This patch is functionally the same as the last one but has been rebased to tip and cleaned up somewhat.  At roc's suggestion I will break this apart into smaller patches to make it easier to follow and see how other parts of the browser are affected.
Attachment #8695315 - Attachment is obsolete: true
(Assignee)

Comment 29

2 years ago
Created attachment 8696730 [details] [diff] [review]
Part 1: Link NSPR in more places

The record/replay infrastructure is in NSPR, so libnspr needs to be linked into more places in the build.
(Assignee)

Comment 30

2 years ago
Created attachment 8696731 [details] [diff] [review]
Part 2: Record/replay infrastructure

This has the recording/replaying infrastructure for thread/lock creation, lock ordering, atomic access ordering, thread events, function hooking, and alternate implementations of all the hooked functions.  Sometime later this will be split up more.
(Assignee)

Comment 31

2 years ago
Created attachment 8696733 [details] [diff] [review]
Part 3: Atomics interface changes

This modifies mfbt/Atomics.h the NSPR atomic interfaces to automatically record and replay atomic accesses.  In the former case this is done at a user-specifiable precision, via template parameters.
(Assignee)

Comment 32

2 years ago
Created attachment 8696735 [details] [diff] [review]
Part 4: NSPR locking/threading changes.

This patch has the remainder of the changes to NSPR, to keep track of ids on threads and locks and ensure that creation and uses of these are recorded and correctly replayed.
(Assignee)

Comment 33

2 years ago
Created attachment 8696738 [details] [diff] [review]
Part 5: Use NSPR primitives in chromium code

The chromium IPC code currently uses pthreads locking and threading primitives, which are not included in the recording.  This patch changes things so that NSPR primitives are used instead.
(Assignee)

Updated

2 years ago
Attachment #8696738 - Attachment is patch: true
(Assignee)

Comment 34

2 years ago
Created attachment 8696740 [details] [diff] [review]
Part 6: Parent side changes

This patch has changes on the parent side for interfacing with recording/replaying child processes.
(Assignee)

Comment 35

2 years ago
Created attachment 8696742 [details] [diff] [review]
Part 7: Unrecorded atomics/locks

This patch relaxes certain atomic and lock accesses so that they are not included in the recording.  These are mostly related to GCs and JS helper thread behaviors (like Ion compilations), which are permitted to occur at different times in the recording and replay.
(Assignee)

Comment 36

2 years ago
Created attachment 8696743 [details] [diff] [review]
Part 8: Child side behavior changes

This patch changes the child process's behavior in a few places when recording or replaying is active, either to simplify recording/replaying or to avoid messy interactions with replay specific threads.
(Assignee)

Comment 37

2 years ago
Created attachment 8696745 [details] [diff] [review]
Part 9: Record/replay instrumentation of mach messages

Most child side instrumentation of library calls is done via the hooking infrastructure in part 2, which doesn't require changing the code which calls those libraries at all.  Code that interacts with libraries by sending mach messages needs to be manually instrumented, however, which this patch does.  This is the bulk of the child side changes needed to get replay to work.  I'm hoping that there is a way to do this so that the code sending the mach messages doesn't need to be modified, but I haven't yet looked into how mach messages work and whether this is possible.
(Assignee)

Updated

2 years ago
Attachment #8696745 - Attachment is patch: true
(Assignee)

Comment 38

2 years ago
Created attachment 8696746 [details] [diff] [review]
Part 10: Other child side instrumentation

This patch has the remaining manual child side instrumentation needed to get replay to work.
(Assignee)

Comment 39

2 years ago
Created attachment 8696750 [details] [diff] [review]
Part 11: Replayed process IPC

This patch has the changes needed to allow the replayed process to communicate with the actual parent process via IPDL messages and shared memory.  It also has the implementation for reflecting layers/compositing messages from the child into messages which can be sent to the parent process; this stuff will be split off into another patch some time later.
(Assignee)

Comment 40

2 years ago
Created attachment 8696752 [details] [diff] [review]
Part 12: Graphics changes for layers/compositing IPC

This patch includes the graphics code side of the layers/compositing message reflection.  When IPDL actors and shared memory regions are created, or update messages are sent, the replay IPC code is notified so that equivalent structures or messages can be created/sent to the actual parent process.
Thanks Brian!

We discovered people already working on a similar project. See bug 1231802.
(Assignee)

Comment 42

2 years ago
Created attachment 8698587 [details] [diff] [review]
rewind WIP

WIP for rewinding process state.  This goes on top of the previous patches.  Right now this records a series of snapshots as the program executes, one early on and (for now) later ones each time a compositor update is sent to the parent.  Each snapshot after the first is a diff of heap and static memory vs. the previous snapshot --- the mach_vm_region syscall is used to find all mapped memory in the process at the first snapshot, and then mprotect and an asm.js-style signal handler are used to find regions modified before the next snapshot.  The snapshots also include stacks and other state for each thread, recorded and restored with setjmp/longjmp and some inline assembly.  Currently we can restore an earlier snapshot, though the main thread crashes shortly after doing so, so there's still work to do.
(Assignee)

Comment 43

2 years ago
Created attachment 8698921 [details] [diff] [review]
rewind WIP

Rewinding in this patch works well enough that we can run to the point where a compositor update is first sent to the parent, rewind to when the initial snapshot was made, and continue running forward to the same point and rewinding ad nauseam.  The main tricky bit to deal with has been that after rewinding all heap memory naively, pthread condition variables do not work right.  I think that state in the process and state in the kernel are getting out of sync, but need a better understanding.  For now the condition variables are allocated in a memory region whose contents are preserved when rewinding.

The two main remaining steps for rewinding are (a) that we need to be able to rewind compositor state as well (right now the special replay threads for doing IPC and compositing are disabled if we want to rewind) and (b) that the parent process should be able to control when and how the child process is rewound, via a parent side developer toolbar I guess.
Attachment #8698587 - Attachment is obsolete: true
(Assignee)

Comment 44

2 years ago
Created attachment 8700063 [details] [diff] [review]
rewind WIP

This patch enables the replay specific IPC threads to run while snapshots are taken and restored.  The basic strategy is the same as with the other threads which are playing back from the recording --- save and restore the contents of their stacks and all memory they use.  Since these threads are working with live file descriptors, however, there is at least the potential for things to get all messed up, but by waiting for all needed descriptors/channels to be fully initialized before the first snapshot is taken this strategy seems to work alright.  (If this isn't viable in the long run there are more drastic options like tearing down the replay specific threads before taking a snapshot and rebuilding them afterwards, but I'd rather avoid this if possible.)

This patch gets far enough along that we can run to the end of the page's execution, rewind to the beginning, and then cause the parent to crash when we resend it messages for creating IPDL actors it already knows about.  The next step will be modifying the messages sent to the parent so that the compositor is able to maintain a coherent state even after the child process rewinds itself.
Attachment #8698921 - Attachment is obsolete: true
(Assignee)

Comment 45

2 years ago
Created attachment 8701073 [details] [diff] [review]
rewind WIP

This patch is able to show graphics after rewinding; on the test page it runs to the end, rewinds to the beginning, then shows updates as it plays forward to the end again.  This is managed by having the replay create all the graphics IPDL actors it will need up front (before the first snapshot is taken) using information gathered during the recording.  When rewinding or fast forwarding, then, all the actors needed to construct a coherent state in the parent process are already in place.  There still isn't anything complicated going on with reconstructing the compositor state which existed at some time when rewinding, as this page is so simple that all the graphics are contained in one or two tiles.  Will wait on improving this until we start looking at more complicated pages.

This patch also adds some more ignored locks and atomics so that GCs and CCs can occur at different times in the recording and replay (something I thought I did earlier but didn't actually do), so it's a good deal larger than the last patch.
Attachment #8700063 - Attachment is obsolete: true
(Assignee)

Comment 46

2 years ago
I've started looking at the devtools JS debugger stuff and how to attach breakpoints in the replayed process.  I think the design here will need a little reworking.  I want to be able to use the C++ JS debugger (js/src/vm/Debugger.* and some associated bits) in the replayed process, but devtools runs quite a bit more javascript code in the client process, whose results are collated and sent to the parent process in the form of (I think) JSON objects.  I'd like to leave this JS code alone as much as possible, but (a) allowing the replayed process to execute JS code that didn't run in the recording could really mess up the replay, and (b) rewinding the process could really confuse the server without some substantial modifications.

It would be better I think if the devtools JS ran in a separate middleman process from the replayed process.  This would allow the devtools JS to do whatever it wants without fear of damaging the replayed process state, and all its structures and state would be preserved when the replayed process rewinds or fast forwards itself.  The design would be:

- The parent process communicates with the middleman process via the normal PContent/PBrowser/etc. IPDL protocols.
- The middleman process communicates with the replayed process via special replay protocol(s) that are tailored to the operations the replayed process can perform.

The parent is largely unchanged from its current behavior.  In addition to running the devtools JS, the middleman process manages changes in state when the replayed process rewinds/fastforwards so that the parent process and devtools JS have a coherent representation of that state.  For example, when playing forward the middleman will keep track of compositor updates so that if the process rewinds later it can reconstruct that earlier state by sending messages to the parent.  WRT the devtools JS, whenever the replayed process takes a snapshot the C++ JS debugger should not be active, but the destruction and reconstruction of these C++ constructs should be transparent to the devtools JS when playing forward through a snapshot or rewinding/fastforwarding.  The middleman process should be able to manage this transparency.  Both this and the compositor stuff could in principle be done in the replayed process, but it's much harder to do there due to the nature of the process rewinding --- all state that should be preserved across rewinding needs to be stored either on disk or in specially allocated memory regions.

The middleman process doesn't really need to be a separate process from the parent; the parent could instead start a message loop thread and I/O thread which do all the work which the middleman does.  I think it will be easier to implement as a separate process, though, and because IPDL can be used for both inter-process and inter-thread communication, changing the middleman to run in the parent down the road (if desired) should be pretty painless.
(Assignee)

Comment 47

2 years ago
Created attachment 8701885 [details] [diff] [review]
rewind WIP

This patch does the same stuff as the last one, but does so using a middleman process as described in comment 46.

Overall I think this is a definite improvement, and will provide a lot more flexibility going forward.  There is more boilerplate and overhead to marshal compositor updates from the replayed process into the middleman and then into the parent process, but the middleman is just a lightly modified Gecko content process --- it defers to the replayed process for compositor updates, but can do anything else a normal content process can do without worrying about how the process replay will be affected.

I forgot to say earlier, but using the middleman also gets us to the separate protocols idea from comment 24.  Instead of the replayed process having a skeleton PContentChild/PBrowserChild/etc. implementation, the middleman process has standard versions of these, and the middleman and replayed processes communicate with a specialized PReplay protocol.
Attachment #8701073 - Attachment is obsolete: true
(In reply to Brian Hackett (:bhackett) from comment #46)
> I've started looking at the devtools JS debugger stuff and how to attach
> breakpoints in the replayed process.  I think the design here will need a
> little reworking.  I want to be able to use the C++ JS debugger
> (js/src/vm/Debugger.* and some associated bits) in the replayed process,

Is the goal to use all (most? the ones that make sense? read only?) of the
devtools during replay? Because while a lot of functionality is built on the
Debugger API, just as much is built on a random assortment of other XPCOM APIs
and web APIs themselves (such as the "inspector" that lets one view and edit the
dom tree and the styles applied to each node directly). Point being that just
re-implementing the Debugger API is not enough to get most of the devtools. Even
just the JS debugger itself may need more.

> but devtools runs quite a bit more javascript code in the client process,
> whose results are collated and sent to the parent process in the form of (I
> think) JSON objects.

Yes, the devtools run a _ton_ of JS. Almost all of the devtools remote debugger
protocol server (which lives in the debuggee content process) is JS.

Yes, the remote debugger protocol uses JSON for the most part, but there is also
the "bulk data" packets used for arbitrary binary data which are implemented on
top of streams directly.

For more info on the protocol itself:
https://wiki.mozilla.org/Remote_Debugging_Protocol

Request and response messages are sent over a "transport". We have a
`ChildDebuggerTransport` for debugging the child process from the parent
process, which seems most relevant here. See
https://dxr.mozilla.org/mozilla-central/source/devtools/shared/transport/transport.js#699-712

If there are any devtools or Debugger API related questions I can answer, please
ask away :)
(Assignee)

Comment 49

2 years ago
(In reply to Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] from comment #48)
> (In reply to Brian Hackett (:bhackett) from comment #46)
> > I've started looking at the devtools JS debugger stuff and how to attach
> > breakpoints in the replayed process.  I think the design here will need a
> > little reworking.  I want to be able to use the C++ JS debugger
> > (js/src/vm/Debugger.* and some associated bits) in the replayed process,
> 
> Is the goal to use all (most? the ones that make sense? read only?) of the
> devtools during replay? Because while a lot of functionality is built on the
> Debugger API, just as much is built on a random assortment of other XPCOM
> APIs
> and web APIs themselves (such as the "inspector" that lets one view and edit
> the
> dom tree and the styles applied to each node directly). Point being that just
> re-implementing the Debugger API is not enough to get most of the devtools.
> Even
> just the JS debugger itself may need more.

Right now the goal is to just use the JS debugger during replay.  Eventually, it would be nice to use all the devtools that make sense for a replay debugger (Everything except performance/network?  In principle even these two could make sense for the replay debugger but the overhead of recording could perturb them a lot).  Everything would be read only to the devtools, since changing state in an observable way can cause the replay to diverge from the recording.

Accesses to XPCOM objects in the replayed process can still be supported; instead of having XPConnect objects which call into XPCOM directly, the devtools code will have proxies whose accesses are resolved by sending synchronous IPDL messages to the replayed process.  The patch I'm working on now does this for things like the doc shell and DOM window, and the result is fairly transparent to the devtools code (I have been modifying the devtools code to handle foobar.QueryInterface(), which could maybe be done transparently but would be pretty gnarly under the hood).

I don't know yet whether this will be enough to support the DOM inspector.  The main question is whether the information the inspector is getting --- the boundary/padding for an element, the element which is found by a right click -> Inspect Element, etc. --- can be determined solely from the replayed process' document tree and layout state, without making calls into the system graphics libraries.

> If there are any devtools or Debugger API related questions I can answer,
> please
> ask away :)

Sure, thanks!
(Assignee)

Comment 50

2 years ago
Created attachment 8704372 [details] [diff] [review]
debugger WIP

This patch adds some integration with the JS debugger.  It does enough that one can run the replayed process, and when it finishes open the debugger UI, see the sources in the replayed process, set a breakpoint on a line, click on a new 'rewind' UI button, have the replayed process rewind to the last point where the breakpoint was hit, and finally have the middleman process notify the chrome process about the pause so it can update the UI appropriately.  Not much else works --- when paused we don't yet provide properties on the environment or visible objects to inspect --- but I feel this suffices as a proof of concept, that we can attach the debugger and have it inspect state in the replayed process without perturbing the replay.  The prototype is, for lack of a better word, done.

Things I'd like to do next:

- Post a design document about the project which describes its organization, technical features, and the invariants it is maintaining.  Is MDN the best place for this?

- Post a cleaned up, rolled up patch and a new patch series breaking it down.

- Start recording/replaying more complicated pages.  The robustness of the various oarts of this project is I feel the biggest unknown at this point.
The best place for a design doc is probably wiki.mozilla.org.
(In reply to Boris Zbarsky [:bz] (Vacation until Jan 4) from comment #51)
> The best place for a design doc is probably wiki.mozilla.org.

My understanding is that MDN is best for code-related stuff, and wiki.m.o is for stuff relating to people, teams and planning.

E.g. from https://wiki.mozilla.org/MozillaWiki:About:

> To summarize: MDN documents how to interact with code. The Mozilla wiki documents how to interact
> with teams. SUMO documents how to interact with Mozilla products (as an end-user).
Huh.  It used to be that MDN was for documentation for how people outside the Mozilla project should interact with our code (e.g. the web developer documentation) and things like https://wiki.mozilla.org/Gecko:Overview lived... well, where it's living.  I wonder when that was supposed to have changed.
(Assignee)

Comment 54

2 years ago
Thanks, I put up the design document at https://developer.mozilla.org/en-US/docs/WebReplay
(Assignee)

Comment 55

2 years ago
Created attachment 8705209 [details] [diff] [review]
WIP

Rolled up patch, rebased to tip and cleaned up in some places.
Attachment #8695312 - Attachment is obsolete: true
Attachment #8696702 - Attachment is obsolete: true
Attachment #8696730 - Attachment is obsolete: true
Attachment #8696731 - Attachment is obsolete: true
Attachment #8696733 - Attachment is obsolete: true
Attachment #8696735 - Attachment is obsolete: true
Attachment #8696738 - Attachment is obsolete: true
Attachment #8696740 - Attachment is obsolete: true
Attachment #8696742 - Attachment is obsolete: true
Attachment #8696743 - Attachment is obsolete: true
Attachment #8696745 - Attachment is obsolete: true
Attachment #8696746 - Attachment is obsolete: true
Attachment #8696750 - Attachment is obsolete: true
Attachment #8696752 - Attachment is obsolete: true
Attachment #8701885 - Attachment is obsolete: true
Attachment #8704372 - Attachment is obsolete: true
(Assignee)

Comment 56

2 years ago
Created attachment 8705210 [details] [diff] [review]
Part 1: Build system changes

Link NSPR in more places, and perform all other build system changes (some of which don't fit cleanly into any of the remaining patches).
(Assignee)

Comment 57

2 years ago
Created attachment 8705212 [details] [diff] [review]
Part 2: Record/replay/rewind infrastructure

This has the record/replay infrastructure from the earlier patch, plus the rewind infrastructure, which is pretty closely intertwined.  The hooking parts of the record/replay infrastructure have been separated into a new file.
(Assignee)

Comment 58

2 years ago
Created attachment 8705213 [details] [diff] [review]
Part 3: Atomics interface changes

The atomics interface changes are generally unchanged from the earlier patch.
(Assignee)

Comment 59

2 years ago
Created attachment 8705214 [details] [diff] [review]
Part 4: NSPR locking/threading changes

The NSPR locking/threading changes are also very similar to those in the previous patch.
(Assignee)

Comment 60

2 years ago
Created attachment 8705215 [details] [diff] [review]
Part 5: Use NSPR primitives in chromium code

Changing over the chromium IPC code to use NSPR primitives is pretty much the same as in the earlier patch.
(Assignee)

Comment 61

2 years ago
Created attachment 8705217 [details] [diff] [review]
Part 6: Parent side changes

Chrome process changes are even more minimal now that during replay it is interacting with an almost normal content process.
(Assignee)

Comment 62

2 years ago
Created attachment 8705219 [details] [diff] [review]
Part 7: Unrecorded atomics/locks

The amount of unrecorded atomics/locking behavior is more expansive now that we're making sure no activity is recorded during GC/CC.
(Assignee)

Comment 63

2 years ago
Created attachment 8705223 [details] [diff] [review]
Part 8: Child side behavior changes

Behavior changes on the side of the recording/replaying content process vs. a normal content process are also more extensive now.  A lot of this is disabling logging/etc. code during record/replay, which is easier to do than to make sure the logging does not affect the recording.  It also includes some things like avoiding posting runnables during a GC, which is preventing incremental GC/CC from being used during recording/replaying and is a restriction I would like to fix.
(Assignee)

Comment 64

2 years ago
Created attachment 8705225 [details] [diff] [review]
Part 9: Record/replay instrumentation of mach messages

This is pretty much unchanged from the earlier patch.
(Assignee)

Comment 65

2 years ago
Created attachment 8705228 [details] [diff] [review]
Part 10: Other child side instrumentation

The additional instrumentation required to get replaying to work is also pretty much unchanged from the earlier patch.
(Assignee)

Comment 66

2 years ago
Created attachment 8705230 [details] [diff] [review]
Part 11: Middleman process IPC / other changes

This patch has the PReplay IPDL protocol and the middleman process side of its implementation, as well as other middleman specific code for avoiding loading documents / rendering graphics of its own, and for conveying graphics updates from the replaying process to the chrome process and debugger messages from the middleman's JS debugger to the replaying process.
(Assignee)

Comment 67

2 years ago
Created attachment 8705231 [details] [diff] [review]
Part 12: Replaying process IPC

The other side of the PReplay protocol, which lives in the replaying process.  This patch conveys graphics updates to the middleman process, and debugger messages from the middleman process to the replaying process side of the JS debugger.  It also has some modifications to the IPC code so that the replaying process can pause replay specific threads before taking or restoring a snapshot.
(Assignee)

Comment 68

2 years ago
Created attachment 8705236 [details] [diff] [review]
Part 13: C++ JS debugger changes

Changes to the C++ JS debugger to accommodate the replay JS debugger.  Currently, all debug objects/scripts/etc. in a Debugger instance are wrappers around a JS object/script/etc. in another compartment.  This patch allows the Debugger to have an associated ReplayDebugger, which stores different data in the debug objects/scripts/etc. and has alternate implementations for the various methods on them.  This patch has the Debugger code to manage this dichotomy; the actual implementation of ReplayDebugger is in the next patch.
(Assignee)

Comment 69

2 years ago
Created attachment 8705237 [details] [diff] [review]
Part 14: C++ JS replay debugger

This patch has the implementation of ReplayDebugger on the middleman and replaying process sides; whenever the Debugger needs information about an object/script/etc. it calls a method on its ReplayDebugger which either gets the information from local state or calls a hook to send an IPDL message to the replaying process, which when received calls yet a separate hook to go back into ReplayDebugger code and compute the answer to the query.
(Assignee)

Comment 70

2 years ago
Created attachment 8705244 [details] [diff] [review]
Part 15: Graphics changes for replay IPC

As in the earlier patch, this includes spots where the replayed portion of the replaying process calls into IPC code so that a message can be sent to the parent process.  The format of graphics messages in the PReplay protocol are different, however, from those in PLayerTransaction (layers, textures, etc. are referred to using indexes, rather than actors) which required some refactoring of the layer messages to avoid duplicate code; this patch also includes changes to the graphics code to handle this refactoring.
(Assignee)

Comment 71

2 years ago
Created attachment 8705246 [details] [diff] [review]
Part 16: Client (chrome) side devtools changes

This patch adds a rewind button to the debugger client's UI, which when pressed sends a message to the debugger server living in the middleman process.
(Assignee)

Comment 72

2 years ago
Created attachment 8705256 [details] [diff] [review]
Part 17: Server (content) side devtools changes

On the server side of the devtools code, this patch has changes to receive the rewind button event and pass it onto the C++ JS debugger.  It also has some other changes needed for replaying, mainly for interacting with the proxies created for the docshell/window/etc. in the replaying process, which don't (yet?) support generic QI calls.

This is the last patch in this series.
The middleman process design sounds like a really good idea. Thanks!
Thanks for the MDN document. It's really good.

We might be able to simplify the graphics situation by switching rendering to use Skia or the cairo image backend instead of CoreGraphics on Mac.

I assume the issue of asm.js's mprotect handlers interfering with WebReplay's can be fixed.

A few things we need to figure out soon-ish:
1) Can this work on Windows? It might be a good idea to have someone try porting it to Windows who's not you.
2) How much work is it to extend to more browser features?
3) What's the long-term maintenance burden going to be?
4) Whether we really want direct debugging of a replayed process or we should target an omniscient approach. I haven't done anything here yet...
(Assignee)

Comment 75

2 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #74)
> Thanks for the MDN document. It's really good.
> 
> We might be able to simplify the graphics situation by switching rendering
> to use Skia or the cairo image backend instead of CoreGraphics on Mac.

This would still require using CoreGraphics for things like rendering fonts, right?  I don't think that intercepting the CoreGraphics routines is the main problem (CG is a well documented public API), but my biggest concern right now is the overhead of recording graphics data produced by CG.  I'll attach a patch and some more info on this in a bit.

> I assume the issue of asm.js's mprotect handlers interfering with
> WebReplay's can be fixed.

Yes.

> A few things we need to figure out soon-ish:
> 1) Can this work on Windows? It might be a good idea to have someone try
> porting it to Windows who's not you.

I don't know of anything that would prevent this from working on Windows.  Recording/replaying is all based on hooks, which we already use in other places on Windows, but the rewinding infrastructure does more systemy things like crawling the virtual address space and using a signal handler.  These things look like they have analogues on Windows, e.g. VirtualQuery looks functionally similar to mach_vm_region.

If anyone wants to work on this that would be great, otherwise I can, though probably not until March.  I'll be on PTO most of February.

> 2) How much work is it to extend to more browser features?

I'll post another patch in a bit, but it took a few days to get from recording/replaying the text only test page to recording/replaying the wikipedia home page, which does new things like loading images, using CSS, having a complex page layout, etc.  The main feature I'm concerned about is WebGL, and I hope we can handle it by just hooking a bunch of new API functions and doing something similar to what we're doing with CoreGraphics.

> 3) What's the long-term maintenance burden going to be?

I think there are three categories to consider:

- Within the record/replay system, the main burdens should be for handling breakages with new OS versions and for hooking new APIs being called by the browser.

- The glue code for replay IPC, graphics messages, and debugger integration would need to be updated to continue working after changes to those browser components.

- In other parts of the browser that don't directly interact with the record/replay mechanism, I would like to minimize changes as much as possible, which should also minimize the maintenance burden.  I think things are pretty good right now, though the mach message intercepting is still pretty ugly and there are some other improvements that would be nice to make, like allowing the GC to work the same way whether recording/replaying or not.
(Assignee)

Comment 76

2 years ago
Created attachment 8707012 [details] [diff] [review]
robustness WIP

With this patch, the main page for Wikipedia can be recorded and replayed.  As noted above, while this is a pretty static page it uses a lot more browser features than the basic test page did.  The changes in this patch are mostly internal reorganization/improvements in the record/replay code.  Other changes include some new APIs being hooked, messages being intercepted, and several hash tables that need to behave deterministically (the handling for these hash tables is pretty low level currently, I want to fold these into the hashtable classes themselves at some point).
(Assignee)

Comment 77

2 years ago
Created attachment 8707023 [details] [diff] [review]
graphics WIP

This patch is still pretty preliminary but illustrates what I'm hoping to with graphics data.  As I understand it the tile based renderer creates a bunch of CGBitmapContexts which render graphics updates to shared memory buffers managed by IPDL textures.  Right now we handle these during record/replay by keeping track of API calls which will change the rendered data, and then at points where the data is needed (e.g. before sending a graphics update to the parent) recording the contents of the graphics data buffers so they can be restored later.  This is simple and it works but it produces an unacceptable amount of data.  When loading Wikipedia the size of the recording is 176MB, 88% of which is graphics data.  If I load Wikipedia, scroll to the bottom and then back to the top, the size of the recording is 477MB, 95% of it graphics data.

This patch leverages the hooking of the CoreGraphics and other Core APIs to avoid including any of this graphics data in the recording.  When the process is replaying, when these calls are performed they are recorded in a separate binary stream, and when a graphics update is sent to the middleman that binary stream is also sent to the middleman, so that it can perform the graphics calls itself and render graphics to the memory buffers it shares with the replaying process.  It's possible that the middleman could render the graphics data directly to the shared memory buffers it shares with the chrome process, eliminating the need for sharing graphics buffers with the replaying process, but this would require that the replay work without ever accessing the graphics data and that is not currently the case.

Currently this works with a simple 'Hello World!' page.  Handling more complicated pages should only require improving this binary stream to work with other APIs that are already being hooked.
> > 2) How much work is it to extend to more browser features?
> 
> I'll post another patch in a bit, but it took a few days to get from
> recording/replaying the text only test page to recording/replaying the
> wikipedia home page, which does new things like loading images, using CSS,
> having a complex page layout, etc.  The main feature I'm concerned about is
> WebGL, and I hope we can handle it by just hooking a bunch of new API
> functions and doing something similar to what we're doing with CoreGraphics.

I'm also worried about the media stack, and third-party code like video codecs and the WebRTC stack that use system APIs and custom threading.

(In reply to Brian Hackett (:bhackett) from comment #76)
> With this patch, the main page for Wikipedia can be recorded and replayed. 
> As noted above, while this is a pretty static page it uses a lot more
> browser features than the basic test page did.  The changes in this patch
> are mostly internal reorganization/improvements in the record/replay code. 
> Other changes include some new APIs being hooked, messages being
> intercepted, and several hash tables that need to behave deterministically
> (the handling for these hash tables is pretty low level currently, I want to
> fold these into the hashtable classes themselves at some point).

Yeah, iterating nsPtrHashKey tables was obviously going to be an issue. It might be a good idea to just eliminate that, independently of record/replay.

Your graphics recording approach might be able to use Moz2D's CreateRecordingDrawTarget, which in principle does what you need.
(Assignee)

Comment 79

2 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #78)
> Your graphics recording approach might be able to use Moz2D's
> CreateRecordingDrawTarget, which in principle does what you need.

Great, I will try doing this.  Ideally all the CoreGraphics interaction would be behind the DrawTarget and other 2D abstractions, so that we don't need to hook the graphics functions at all.  Instead different draw targets would be used, so that while recording the CG target is active but while replaying a recording target is active whose events are forwarded to a CG target in the middleman process.  This design should work on other platforms as well.
(Assignee)

Comment 80

2 years ago
Created attachment 8707484 [details] [diff] [review]
graphics WIP

This patch expands on recording/execution of graphics commands so that during replay of the home page for Wikipedia the entire page is rendered correctly.  Next I'll try interposing DrawTargetRecording as described above to do the same thing.
Attachment #8707023 - Attachment is obsolete: true
(Assignee)

Comment 81

2 years ago
Created attachment 8708541 [details] [diff] [review]
graphics WIP

This is a replacement for the previous graphics patch which uses DrawTargetRecording instead of hooking CoreGraphics API calls to perform drawing in the middleman instead of the replaying process.  During both recording and replay a DrawTargetRecording is active; when recording the DrawTarget calls are forwarded to the normal native DrawTarget as is normally done by DrawTargetRecording, and when replaying the DrawTarget calls are simply recorded, with no underlying native DrawTarget.  This allows many of the CoreGraphics hooks to be removed.  With this patch, the replaying process is able to render the Wikipedia homepage.  There are still a couple issues:

- Font data consumes over 80% of the recording data on Wikipedia, and it would be better to have an efficient way of encoding them between recording/replay.  The earlier patch just recorded the full names of any system fonts being created, and doing something similar here when possible would be good.

- nsNativeThemeCocoa::DrawWidgetBackground gets the CGContext from a DrawTarget and draws things directly to it, for what looks like native widgets like checkboxes and such.  These uses won't be captured by the DrawTargetRecording, though this is a more general issue with DrawTargetRecording and not specific to record/replay (AFAICT other than here DrawTargetRecording is not used on Macs, e.g. before this patch there was no way for the decoder to construct fonts for drawing on platforms other than Windows).
Attachment #8707484 - Attachment is obsolete: true
(Assignee)

Comment 82

2 years ago
Created attachment 8708687 [details] [diff] [review]
record font names

This patch records full font names when available from the CGFontRef instead of the font's tables, bringing the total size of the recording on Wikipedia down to around 22 MB (which is where it was with the earlier CoreGraphics based patch).
(Assignee)

Comment 83

2 years ago
Created attachment 8711036 [details] [diff] [review]
robustness WIP #2

This patch goes on top of the previous ones and includes various fixes to improve robustness.  With this patch I can record and replay a simple GMail session --- from a fresh profile, the login page is loaded, I log in, and the inbox loads and displays.  The total recording size is about 40 MB, which seems reasonable and should improve if we start doing in memory recording compression.  Everything about the page looks right, except for animations that stay on their initial frame (e.g. the avatar that shows up on the login screen).  I don't know what to do about these yet, especially when they are based on absolute time stamps and when the debugger is active on these pages.

Anyways, the main new issue this patch exposed is what to do about places where non-deterministic parts of the browser interact with deterministic parts.  In particular, there are several nsISupports classes --- nsGlobalWindow, HTMLImageElement, and so forth --- that can be destroyed during GC and trigger recorded events in their destructor, and cross compartment wrappers that are destroyed during GC sweeping can trigger recorded events when they are regenerated later.

There are a few ways these could be handled:

A) Move the recording boundary so that recorded events are never triggered by these operations.  This is conceptually the cleanest, and is just an extension of what we already do with e.g. atomic refcounts and other quantities that change during GC.  It will probably require the most work, though, as some of these destructors do things like posting events to the main thread's event loop and sending IPDL messages, which will need new infrastructure to separate out the deterministic and non-deterministic portions of these other systems.

B) Ensure that destruction of the data in these classes are performed consistently, using an event on the main thread that fires every second or so.  During recording, objects which can trigger events have their destruction delayed until this event next fires.  During replay, keep the corresponding objects alive until the same event firing, and then either destroy them or clean up their internal data, depending on whether they still have other references.  I tried to do this approach, but don't like it that much.

C) Force these objects to behave deterministically, by simply never destroying them.  This is of course the simplest approach and is the one this patch takes.  It might be viable in the long run, since recordings are expected to be relatively short lived, but it still doesn't seem ideal.

Probably the best strategy is a mix of A) and C).  Start by just not destroying such objects in a non-deterministic fashion, and then find some minimally invasive ways to improve Gecko so that the destruction of these objects does not interfere with the recording and can happen as it normally would.
(Assignee)

Comment 84

2 years ago
Created attachment 8711510 [details] [diff] [review]
Windows WIP

This patch makes enough changes so that the build works on Windows.  It doesn't do anything else, but since this patch is so large (mostly from code reorganization and renaming variable types to appease cl) it seems better to split off from changes with actual substance.
(Assignee)

Comment 85

2 years ago
Created attachment 8713640 [details] [diff] [review]
Windows WIP #2

This patch goes on top of the previous one and improves the Windows build so that all thread/locking activity is recorded and so that the hooking infrastructure works.  Right now this just hooks a few functions, the next step is to expand this to a more complete set.
FWIW, I just noticed that there's a TimeTravelDebugging branch of https://github.com/Microsoft/ChakraCore/commits/TimeTravelDebugging. I haven't looked at their implementation and how it compares to this one.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #86)
> FWIW, I just noticed that there's a TimeTravelDebugging branch of
> https://github.com/Microsoft/ChakraCore/commits/TimeTravelDebugging. I
> haven't looked at their implementation and how it compares to this one.

I looked a bit at this. It seems like they can take a snapshot of the heap and then they also maintain an event log that they can replay. An event can be something like "Date.now called, returned value X".

Here's a video where they explain how it works (I didn't watch it completely yet):

https://channel9.msdn.com/blogs/Marron/Time-Travel-Debugging-for-JavaScriptHTML
(Assignee)

Comment 88

2 years ago
Created attachment 8727462 [details] [diff] [review]
WIP

Updated rebased and rolled up patch.  Back in early February I did a fair amount more on windows, adding hooks for many of the kernel32 interfaces, but things don't work yet.  From here I'm going to change tack a bit, mostly working on Mac and improving the quality of the patch and robustness of the technology, aiming towards getting a solid, interesting demo ready.  I'll continue working on improving the windows side of things, though with lower priority.
Attachment #8705209 - Attachment is obsolete: true
Attachment #8707012 - Attachment is obsolete: true
Attachment #8708541 - Attachment is obsolete: true
Attachment #8708687 - Attachment is obsolete: true
Attachment #8711036 - Attachment is obsolete: true
Attachment #8711510 - Attachment is obsolete: true
Attachment #8713640 - Attachment is obsolete: true
(Assignee)

Comment 89

2 years ago
Created attachment 8730218 [details] [diff] [review]
WIP

Updated rolled up patch.  This makes several improvements to the basic record/replay infrastructure --- records are compressed in memory while being written to disk, limits on the number of threads and locks and so forth that can be tracked are removed, hash tables are used instead of vectors for associating information with objects.  It also fixes some things so that executions can be recorded/replayed with a --disable-debug --enable-optimize build.
Attachment #8727462 - Attachment is obsolete: true
(Assignee)

Comment 90

2 years ago
(In reply to Jan de Mooij [:jandem] from comment #87)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #86)
> > FWIW, I just noticed that there's a TimeTravelDebugging branch of
> > https://github.com/Microsoft/ChakraCore/commits/TimeTravelDebugging. I
> > haven't looked at their implementation and how it compares to this one.
> 
> I looked a bit at this. It seems like they can take a snapshot of the heap
> and then they also maintain an event log that they can replay. An event can
> be something like "Date.now called, returned value X".
> 
> Here's a video where they explain how it works (I didn't watch it completely
> yet):
> 
> https://channel9.msdn.com/blogs/Marron/Time-Travel-Debugging-for-
> JavaScriptHTML

It looks like there are some substantial differences between the approach in this bug and the approach described in this video.  The most important one is the recording boundary.  In this bug we aim to reproduce the behavior of all code compiled as part of the browser (i.e. excluding system libraries which are called into), modulo things like garbage collection and refcounts which aren't replayed exactly.  The approach described in the video compartmentalizes the JS engine so that its non-deterministic behaviors and inputs (like event handler invocations) are recorded and replayed, but not other portions of the browser.  Late in the video they are able to rewind state and replay the behavior of a JS game that manipulates a canvas element, but if the DOM structure were to change during the course of the recording then I don't think they would be able to replay those changes or to reconstruct the earlier DOM when rewinding.
(Assignee)

Comment 91

2 years ago
Created attachment 8736310 [details] [diff] [review]
WIP

Rolled up patch with the last couple weeks of work.  This changes how locks and cvars work during replay so that instead of using the native pthreads code, file descriptor pipes are used by threads to signal each other.  This avoids making special pthreads related system calls, which don't seem to work reliably on OS X after rewinding process state, and is also more efficient, as threads can signal one another when acquiring locks in the recorded order without needing to acquire a particular highly contended global lock.

This also fleshes out the debugger quite a bit, so that state can be inspected while paused at a breakpoint and the UI can be used to rewind or advance across all the points where a breakpoint is hit.  The next, well, step is to get the stepping handlers and their rewinding counterparts to work.
Attachment #8730218 - Attachment is obsolete: true
(Assignee)

Comment 92

2 years ago
Created attachment 8737826 [details] [diff] [review]
WIP

Rolled up patch, rebased to tip.  It's been a while, so I'll update the individual patches which go into this one.
Attachment #8705210 - Attachment is obsolete: true
Attachment #8705212 - Attachment is obsolete: true
Attachment #8705213 - Attachment is obsolete: true
Attachment #8705214 - Attachment is obsolete: true
Attachment #8705215 - Attachment is obsolete: true
Attachment #8705217 - Attachment is obsolete: true
Attachment #8705219 - Attachment is obsolete: true
Attachment #8705223 - Attachment is obsolete: true
Attachment #8705225 - Attachment is obsolete: true
Attachment #8705228 - Attachment is obsolete: true
Attachment #8705230 - Attachment is obsolete: true
Attachment #8705231 - Attachment is obsolete: true
Attachment #8705236 - Attachment is obsolete: true
Attachment #8705237 - Attachment is obsolete: true
Attachment #8705244 - Attachment is obsolete: true
Attachment #8705246 - Attachment is obsolete: true
Attachment #8705256 - Attachment is obsolete: true
Attachment #8736310 - Attachment is obsolete: true
(Assignee)

Comment 93

2 years ago
Created attachment 8737828 [details] [diff] [review]
Part 1: Build system changes
(Assignee)

Comment 94

2 years ago
Created attachment 8737830 [details] [diff] [review]
Part 1b: Move LZ4 to NSPR

The record/replay infrastructure now uses LZ4 to compress records in memory as they are written to disk, so this patch moves the in-tree version of LZ4 from mfbt to nspr.
(Assignee)

Comment 95

2 years ago
Created attachment 8737832 [details] [diff] [review]
Part 2: Record/replay/rewind infrastructure
(Assignee)

Comment 96

2 years ago
Created attachment 8737833 [details] [diff] [review]
Part 2b: Mac / Windows redirections

This patch has the system/library call redirections for Mac and (still incomplete) Windows, and the infrastructure for enabling redirections.
(Assignee)

Comment 97

2 years ago
Created attachment 8737835 [details] [diff] [review]
Part 3: Atomics interface changes
(Assignee)

Comment 98

2 years ago
Created attachment 8737836 [details] [diff] [review]
Part 4: NSPR locking/threading changes
(Assignee)

Comment 99

2 years ago
Created attachment 8737837 [details] [diff] [review]
Part 5: Use NSPR primitives in chromium code
(Assignee)

Comment 100

2 years ago
Created attachment 8737838 [details] [diff] [review]
Part 6: Parent side changes
(Assignee)

Comment 101

2 years ago
Created attachment 8737839 [details] [diff] [review]
Part 7: Unrecorded atomics/locks
(Assignee)

Comment 102

2 years ago
Created attachment 8737841 [details] [diff] [review]
Part 8: Child side behavior changes
(Assignee)

Comment 103

2 years ago
Created attachment 8737842 [details] [diff] [review]
Part 9: Record/replay instrumentation of mach messages
(Assignee)

Comment 104

2 years ago
Created attachment 8737843 [details] [diff] [review]
Part 10: Other child side instrumentation
(Assignee)

Comment 105

2 years ago
Created attachment 8737844 [details] [diff] [review]
Part 11: Middleman process IPC / other changes
(Assignee)

Comment 106

2 years ago
Created attachment 8737846 [details] [diff] [review]
Part 12: Replaying process IPC
(Assignee)

Comment 107

2 years ago
Created attachment 8737847 [details] [diff] [review]
Part 13: C++ JS debugger changes
(Assignee)

Comment 108

2 years ago
Created attachment 8737848 [details] [diff] [review]
Part 14: C++ JS replay debugger
(Assignee)

Comment 109

2 years ago
Created attachment 8737849 [details] [diff] [review]
Part 15: Graphics layers/shmem changes for replay IPC
(Assignee)

Comment 110

2 years ago
Created attachment 8737850 [details] [diff] [review]
Part 15b: Use DrawTargetRecording to render graphics in the middleman process
(Assignee)

Updated

2 years ago
Attachment #8737849 - Attachment description: Part 15 - Graphics layers/shmem changes for replay IPC → Part 15: Graphics layers/shmem changes for replay IPC
(Assignee)

Comment 111

2 years ago
Created attachment 8737851 [details] [diff] [review]
Part 16: Client (chrome) side devtools changes
(Assignee)

Comment 112

2 years ago
Created attachment 8737852 [details] [diff] [review]
Part 17: Server (content) side devtools changes
(Assignee)

Comment 113

2 years ago
Created attachment 8739966 [details] [diff] [review]
WIP

Rolled up patch updated with the last couple weeks of work.  This mainly overhauls how breakpoints are handled when debugging a replaying process, and gets forward and reverse stepping to work.  For now, stepping is handled by setting breakpoints at specific spots instead of setting a handler for a specific frame.  This should work fine except when a script is invoked recursively.  Eventually the replaying process should be able to keep track of frame identity even after rewinding, which would allow setting handlers on the frames themselves and fixing this.
Attachment #8737826 - Attachment is obsolete: true
(Assignee)

Comment 114

2 years ago
Created attachment 8747462 [details] [diff] [review]
WIP #2

Diff on top of the previous patch with the last couple weeks of work.  This has various bug fixes and robustness and efficiency improvements.  The most important changes are:

- PLDHashTables (along with the various classes that are based on them) all have a deterministic iteration order when recording or replaying.  This allows removal of all other hashtable related instrumentation in the browser, except for one spot that uses std::set (whose code we can't modify).

- Rendered graphics data can be sent both ways between the replaying process and the middleman.  This allows the replaying process to access the rendered graphics when needed, and also lets the middleman leverage the snapshot mechanism to restore earlier graphics data when updating compositor state after rewinding.

I've started working through recording and replaying the web-platform tests, one hundred or so at a time.  Currently we can record and replay all the 2dcontext tests (about 800 tests, or 10% of the suite).  Three of them give different pass/fail results from a normal execution, as we currently use the CG draw target when recording/replaying, instead of Skia (though since the backend in use is hidden behind the DrawTargetRecording and never renders anything in the replaying process, fixing this shouldn't be too hard).

From here I want to go through the rest of the web-platform tests, both to fix bugs and to see what browser features we can easily support while recording/replaying.  With enough work it shouldn't be terribly hard to get all features tested by web-platform to record/replay, but I want to shift away from building out the prototype to handle new features (like webgl) and towards building a, uh, minimum viable product, something that people can start using and testing.  Part of that is figuring out which features we do or don't want to support, and cleanly failing to record when dealing with an unsupported feature instead of just crashing somewhere.
(Assignee)

Comment 115

2 years ago
Created attachment 8750383 [details] [diff] [review]
WIP #2

With this patch, all web-platform tests run by mach can be recorded and replayed, with the following exceptions/caveats:

- The custom threading used when playing audio/video is not handled yet, so this patch prevents these from being activated when recording.

- DOM workers are similarly not allowed to be created.  While recording/replaying executions with workers shouldn't be hard to do (except when SharedArrayBuffers are in use), right now the replay debugger only supports JS which runs on the main thread, and if we record/replay executions with workers then we should be able to debug the workers.

A couple other technologies aren't tested by web-platform but also need to be disallowed:

- WebGL uses a different graphics pipeline from normal rendering.

- asm.js creates signal handlers which interfere with the handler used while replaying.

Again, per comment 114 it should be possible to get all of these to work, but in the interest of focusing on improving the quality and robustness of record/replay on 'normal' web pages and on porting this stuff to other platforms (i.e. Windows) these restrictions are in place for now.
Attachment #8747462 - Attachment is obsolete: true
(Assignee)

Comment 116

a year ago
Created attachment 8757431 [details] [diff] [review]
WIP

Rolled up patch with all changes as of a few weeks ago.  I'll attach a couple patches with new stuff shortly.
Attachment #8739966 - Attachment is obsolete: true
Attachment #8750383 - Attachment is obsolete: true
(Assignee)

Comment 117

a year ago
Created attachment 8757432 [details] [diff] [review]
Mac WIP

Some fit-and-finish changes to the Mac port, going on top of the previous patch.  This has some bug fixes and internal improvements, but the main change is to allow triggering record/replay in a new tab from the browser's UI (Tools -> Web Developer -> Record/Replay Execution}, instead of using environment variables at the command line.
(Assignee)

Comment 118

a year ago
Created attachment 8757439 [details] [diff] [review]
Windows WIP

Windows port work from the last couple weeks.  This mostly focuses on redirecting windows system library functions, though this is not yet far enough along to record/replay a simple web page.  The main neat thing about this patch is that it can hook all functions in a given DLL to make sure that we never enter the DLL except by calling through a function in the DLL (or another system DLL) that has been manually hooked.  This gives a much stronger check than in the Mac port for where the recording boundary is placed.
(Assignee)

Comment 119

a year ago
Created attachment 8763175 [details] [diff] [review]
Mac WIP

Some changes to the Mac port from the last few weeks.  This is mainly bugfixes and some reorganization, and starts an overhaul of the snapshot mechanism, representing snapshots as old copies of pages instead of diffs between snapshots (this is simpler and takes more memory/disk space, but makes it easy to discard old snapshots and avoid using a ton of disk space in long running executions), as well as doing most snapshot work off the main thread.  Most of what I've been working on the last few weeks has been on the Windows port, though, I'll attach a patch for that tomorrow.
Attachment #8757432 - Attachment is obsolete: true
(Assignee)

Comment 120

a year ago
Created attachment 8763338 [details] [diff] [review]
Mac WIP

Updated Mac WIP with a now-functioning overhauled snapshot mechanism.  Replay forwards is much faster now, due to the combination of simpler snapshots (old page copies instead of diffs), serializing snapshots off the main thread, and (most importantly) from not taking snapshots on every single compositor update. Right now snapshots are normally taken every 30 frames when going forwards (this will need adjustment based on workload), and when rewinding we first go back and generate all the intermediate snapshots that were skipped over, so that animations are rewound without skipping any frames.  This snapshot generation causes the replaying process to stop showing new frames for a few seconds, so is pretty noticeable; rather than masking over this (on long running replays we will probably want to go up to 100 or more frames between snapshots) the UI could show a "Rewinding in progress" message with a progress bar or something when this happens.
Attachment #8763175 - Attachment is obsolete: true
(Assignee)

Comment 121

a year ago
Created attachment 8763347 [details] [diff] [review]
Windows WIP

Updated Windows WIP.  This mostly adds a lot of redirections, including some nice clean handling of COM object interfaces, and is I think getting close to being able to replay simple web pages.
Attachment #8757439 - Attachment is obsolete: true

Updated

a year ago
See Also: → bug 1282039

Updated

a year ago
Blocks: 1282039

Updated

a year ago
See Also: bug 1282039
(Assignee)

Comment 122

a year ago
Created attachment 8767746 [details] [diff] [review]
WIP

Rolled up and combined patch (for both mac and windows), incorporating the last couple of weeks of work.  Windows replaying continues to slowly progress, and this also has some nice fixes for Mac like removing all the manual instrumentation of objective C messages (the core objc_msgSend routine is hooked instead) and a general cleanup and C++ification of the public API.
Attachment #8757431 - Attachment is obsolete: true
Attachment #8763338 - Attachment is obsolete: true
Attachment #8763347 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737828 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737830 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737832 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737833 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737835 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737836 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737837 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737838 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737839 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737841 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737842 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737843 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737844 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737846 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737847 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737848 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737849 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737850 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737851 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8737852 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8767746 - Attachment is obsolete: true
(Assignee)

Comment 123

a year ago
From now on I'm going to use a bitbucket repo for keeping track of the most recent series of patches for this bug.  The link is here: https://bitbucket.org/bhackett1024/mozilla-web-replay/.  I've also linked this from the web replay design doc in the URL field of this bug (and would link the repo directly there too if the URL field supported multiple urls).

Anyways, the past few weeks I've mostly been experimenting with different ways of handling objects whose destructors have recorded events and are triggered by GC/CC.  Up to now these objects have all just been permanently leaked (except for some nsPipe-related streams whose destructors *must* run for the browser to work, and which required some ugly workarounds), which is gross and wasteful and requires some weird instrumentation sprinkled around the browser.  I tried making the destructors for these specific objects run at deterministic points, but while that improves things a bit the instrumentation required is even worse.

The new plan is a slight redesign of the boundary between the deterministic and non-deterministic parts of the browser.  Previously, the GC was non-deterministic, and since references on DOM/COM/etc. objects are released via the GC the destructors of these objects were also non-deterministic.  Now, reference releases performed by the GC are deterministic.  During recording we keep track of the points when deferred releases are triggered, and schedule them so that they occur at a specific point in the recording which we can find later (rather than wherever the GC itself happens).  During replay, GC finalization does not actually release references, but rather the references are released at the same point they were released in the recording.  (This design means that during replay some GC objects can hold dangling pointers to already-finalized objects, but since the object will have already been finalized at this point in the recording it never be used again in the replay.)  Since non-determinism has been isolated to the GC, objects should be destroyed at the same points during recording and replay and no instrumentation in individual classes is needed.
(Assignee)

Comment 124

a year ago
The latest changesets on the bitbucket repo have some new logic for handling a kind of tricky issue with the debugger.  Even when inspecting the JS state of a page, it's pretty easy to get the debugger in the replaying process to try to access parts of the system, e.g. accessing the 'font' property of a 2d canvas context will cause objective C messages to be sent to the system.  We can't execute these calls/messages in the replaying process, as they were never performed during the recording.  Previously these calls would kill the replaying process, and we were trying to avoid them using some (ad hoc, woefully incomplete) tests.  Now, whenever a debugging IPDL message is received that might hit one of these calls/messages, we first take a snapshot.  If a call/message is hit later, the snapshot is restored, the IPDL message is responded to with an indication of the failure, and the middleman process (where the debugger server lives) can respond to it appropriately.
(Assignee)

Comment 125

a year ago
Created attachment 8790129 [details] [diff] [review]
patch

Rolled up patch for an initial landing.  Only the Mac port works here, and testing is still pretty limited --- though there are (just one right now, soon to be more) automated tests.  Structurally the code is in good shape, and I don't see it changing much anytime soon, so landing code now will allow people to try things out, allow fuzzing, and allow integration into the Devtools.  Tomorrow I'll post broken up parts of this patch for review, and next I'll mainly be working on finishing the Windows port, maybe the Linux port, and bugfixing.

Some information about the overall patch:

New code: 32k lines of code in new record/replay files.  10k of this is the udis86 library, most of which is static tables (should have a small compiled code size).

JS changes: 1767 insertions, 819 deletions.  Most of this is in the Debugger.

non-JS changes: 2023 insertions, 608 deletions
(Assignee)

Comment 126

a year ago
Created attachment 8790724 [details] [diff] [review]
Part 1a - Record/replay/rewind infrastructure.

The core record/replay/rewind code and public API.  The public API is in NSPR, while the implementation is in XUL (toolkit/recordreplay).  An alternative here would be to put the API in mfbt, but that is problematic for C code which needs to use the API (this could be worked around).
(Assignee)

Comment 127

a year ago
Created attachment 8790725 [details] [diff] [review]
Part 1b - Add libudis86 source.

Add the BSD licensed libudis86 library.  Before this lands I'll file a dependent bug for adding this to about:license.
(Assignee)

Comment 128

a year ago
Created attachment 8790726 [details] [diff] [review]
Part 1c - Redirections.

The redirection infrastructure, along with redirections for Mac and Windows.
(Assignee)

Comment 129

a year ago
Created attachment 8790730 [details] [diff] [review]
Part 1d - Setup/teardown of record/replay state

Initialize and teardown record/replay state.  Replaying processes don't exit normally, but rather kill themselves after receiving a Terminate message in the PReplay protocol (in a later patch).
(Assignee)

Comment 130

a year ago
Created attachment 8790734 [details] [diff] [review]
Part 1e - Disable crash reporting when recording/replaying

The crash reporter doesn't work correctly while replaying, so this patch disables it while recording/replaying.  (This should probably be fixed relatively soon.)
(Assignee)

Comment 131

a year ago
Created attachment 8790736 [details] [diff] [review]
Part 2a - Atomics interface changes.

The ordering of atomic operations (mozilla::Atomic or PR_ATOMIC_*) is recorded by default, but this can be avoided (with no associated recording overhead) by using Atomic<AtomicRecording::PreserveNothing> or PR_ATOMIC_*_NO_RECORD.
(Assignee)

Comment 132

a year ago
Created attachment 8790738 [details] [diff] [review]
Part 2b - Don't record activity in atomics unit tests.

Change the atomics unit tests to not record/replay activity.  Otherwise there are link errors I think.
(Assignee)

Comment 133

a year ago
Created attachment 8790740 [details] [diff] [review]
Part 3 - Use PR_ATOMIC macros instead of Interlocked functions in chromium code.

The chromium windows message pump uses InterlockedExchange for performing some atomic operations, which can't be redirected (it's a compiler intrinsic or something).  These operations need to be recorded for consistent replay, so this patch changes these over to PR_ATOMIC.
(Assignee)

Comment 134

a year ago
Created attachment 8790745 [details] [diff] [review]
Part 4a - Make recording optional in mozilla::RefCounted.

Allow atomic refcounting classes (RefCounted, AtomicRefCounted) to specify whether to record refcount activity.
(Assignee)

Comment 135

a year ago
Created attachment 8790747 [details] [diff] [review]
Part 4b - Make recording optional in mozilla mutexes and monitors.

Allow xpcom mutex and monitor classes to specify whether their behavior should be recorded.
(Assignee)

Comment 136

a year ago
Created attachment 8790748 [details] [diff] [review]
Part 4c - Don't record activity on ThreadSafeAutoRefCnt or nsStringBuffer refcounts.

Don't record refcount adds/releases in nsISupports or nsStringBuffer objects.  This is important for avoiding overhead for these classes when not recording/replaying, for reducing recording size, and for avoiding issues where references are directly released from GC finalizers (though there's a separate mechanism for that in a later patch and this might not be an issue anymore).
(Assignee)

Comment 137

a year ago
Created attachment 8790750 [details] [diff] [review]
Part 4d - Don't record certain graphics atomics counters.

Don't record some atomic refcounts and other quantities related to graphics surfaces.  Because drawing commands are handled in a special way when recording/replaying (see later patches) these quantities might not match up between the two executions.
(Assignee)

Comment 138

a year ago
Created attachment 8790751 [details] [diff] [review]
Part 4e - Don't record various JS atomics.

Lots of JS internal activity does not match up between recording/replay, particularly GC and helper thread behavior and how particular scripts are compiled.  This patch avoids recording atomics related to these.
(Assignee)

Comment 139

a year ago
Created attachment 8790754 [details] [diff] [review]
Part 4f - Don't record JS mutexes.

Similarly, use of JS mutexes might not match up between recording/replay.  There aren't any JS mutexes whose activity needs to be recorded, so this patch turns off recording for all of them.
(Assignee)

Comment 140

a year ago
Created attachment 8790755 [details] [diff] [review]
Part 4g - Don't record malloc library atomic.

Don't record an atomic in the replace-malloc library, to avoid link errors against NSPR.
(Assignee)

Comment 141

a year ago
Created attachment 8790756 [details] [diff] [review]
Part 4h - Don't record chaos mode counters.

I think the chaos mode counter can be used directly from GC callbacks, but in any case its behavior doesn't need to be recorded for consistent replay.
(Assignee)

Comment 142

a year ago
Created attachment 8790764 [details] [diff] [review]
Part 4i - Don't record NSPR thread/lock-management atomics.

Don't record some atomic quantities that NSPR uses internally for condvars and monitors.  These quantities aren't related to ensuring consistent record/replay, and if the condvar/monitor is not itself recorded then we don't want to record this ancillary activity either.
(Assignee)

Comment 143

a year ago
Created attachment 8790765 [details] [diff] [review]
Part 4j - Don't record pseudo-stack refcount.

The profiler pseudo stack refcounts can be modified while under the GC I think.
(Assignee)

Comment 144

a year ago
Created attachment 8790767 [details] [diff] [review]
Part 4k - Don't record deadlock detector lock.

Deadlock detectors have an internal lock, and since the detectors can be used with locks that aren't recorded, the internal lock shouldn't be recorded either.
(Assignee)

Comment 145

a year ago
Created attachment 8790768 [details] [diff] [review]
Part 4l - Don't record some debugging/statistics atomics.

Avoid recording some debugging/statistics atomics, both because doing so isn't necessary and because some of these can be used under the GC I think.
(Assignee)

Comment 146

a year ago
Created attachment 8790769 [details] [diff] [review]
Part 4m - Don't record some threading atomics.

Avoid recording a couple of threading atomics which aren't necessary for consistent record/replay and are accessed under the GC I think.
(Assignee)

Comment 147

a year ago
Created attachment 8790770 [details] [diff] [review]
Part 4n - Don't record XPT table monitor.

Avoid recording a monitor which is accessed under the GC.
(Assignee)

Comment 148

a year ago
Created attachment 8790771 [details] [diff] [review]
Part 5a - Disable incremental GC when recording or replaying.

Incremental GC and finalization aren't yet supported while recording/replaying, mostly because they can happen at different points between the executions and rely on posting runnables for continuing work.  Creating and posting runnables requires interacting with parts of the browser whose behavior *is* recorded and replayed exactly.  Also, this patch fixes some ways where we could end up performing an incremental GC even if JS::IsIncrementalGCEnabled returns false.
(Assignee)

Comment 149

a year ago
Created attachment 8790772 [details] [diff] [review]
Part 5b - Don't keep track of times or page fault counts in GC and helper thread activity when recording or replaying.

The GC and helper threads keep track of timing for statistics.  This patch just causes all these times to be zero when recording/replaying, though an alternative I guess would be to pass through events while getting the current time so that these calls don't interact with the recording.
(Assignee)

Comment 150

a year ago
Created attachment 8790774 [details] [diff] [review]
Part 5c - Don't dispatch runnables for GC or finalization when under the GC and recording or replaying.

There are several places where runnables are dispatched from within DOM GC callbacks for continuing GC/CC activity.  Since these runnables can't be dispatched from within the GC (for the same reason as incremental GC is disabled) this patch avoids this dispatching when recording or replaying.
(Assignee)

Comment 151

a year ago
Created attachment 8790775 [details] [diff] [review]
Part 5d - Disable compacting GC when replaying.

Compacting GC can write to a lot of memory and cause large snapshots to be generated while replaying, so this patch disables it in replaying processes.  Replaying processes also use raw non-nursery GC pointers as cross-process identifiers (a hack which should be fixed).
(Assignee)

Comment 152

a year ago
Created attachment 8790781 [details] [diff] [review]
Part 5e - Don't assume that CFBundleGetFunctionPointerForName succeeds.

PluginUtilsOSX::SetProcessName has a bug where it doesn't check the return value of a CFBundleGetFunctionPointerForName call early enough.  When recording or replaying this call always returns null (this is for simplicity, as I haven't seen anywhere we depend on this call returning non-null for important functionality).
(Assignee)

Comment 153

a year ago
Created attachment 8790786 [details] [diff] [review]
Part 5f - Don't sort arrays of requests in image loader when recording or replaying.

The image loader has an array of requests which is sorted by pointer address.  Since pointer values can differ between recording and replay, doing this sorting while recording/replaying can cause the replay to diverge from the recording.  This patch just doesn't sort the array when recording/replaying, so that requests appear in the array in the order they were inserted.  I haven't seen this pattern anywhere else, so haven't tried to generalize it as was done for hash tables.
(Assignee)

Comment 154

a year ago
Created attachment 8790788 [details] [diff] [review]
Part 5g - Disable finalization witnesses when recording or replaying.

Finalization witnesses can trigger recorded events / behavior changes from within the GC, so this patch disables them.  AFAICT this class isn't used for anything critical (mainly for tests?), but if we do need to preserve its behavior we could do something similar to what happens for nsISupports references released from GC finalizers (see later patch), in which case during replay the witness would be activated at the same point as during the recording (and at a time that may be long after, or even before, the time when the replaying process GC has gotten around to finalizing the associated object.)
(Assignee)

Comment 155

a year ago
Created attachment 8790789 [details] [diff] [review]
Part 5h - Disable Cocoa native drawing when recording or replaying.

Native drawing commands issued directly to the underlying CGContextRef in a DrawTarget are not supported by DrawTargetRecording.  Since during replay all drawing is done through a DrawTargetRecording and there is no CGContextRef to speak of, it seems best to just disallow native drawing while recording or replaying.
(Assignee)

Comment 156

a year ago
Created attachment 8790794 [details] [diff] [review]
Part 5i - Disable lazy and off thread JS parsing when recording or replaying.

The replaying process debugger needs scripts to be created in a consistent order, so for simplicity this patch disables lazy and off thread parsing while recording or replaying.  Off thread parses also dispatch runnables when they finish, and since off thread parsing happens on JS helper threads they can finish at different points between recording and replay.
(Assignee)

Comment 157

a year ago
Created attachment 8790796 [details] [diff] [review]
Part 5j - Don't add GC events to timelines when recording or replaying.

The GC nursery collection callback can add information about the GC to some timeline consumer.  Doing so can trigger recorded events, so this is disabled when recording or replaying.
(Assignee)

Comment 158

a year ago
Created attachment 8790800 [details] [diff] [review]
Part 5k - Don't generate debugger runnables on GC events.

Disable runnables which are generated to invoke the debugger after a GC is performed, when recording or replaying.
(Assignee)

Comment 159

a year ago
Created attachment 8790804 [details] [diff] [review]
Part 5l - Don't trace refcounts while recording or replaying.

Since most refcounts are not recorded they might not match up between recording and replay, so this patch disables refcount tracing while recording or replaying.
(Assignee)

Comment 160

a year ago
Created attachment 8790817 [details] [diff] [review]
Part 5m - Disable hang monitor while recording or replaying.

Disable the background hang monitor while recording or replaying.  As for the crash reporter, the hang monitor won't be able to monitor the process state and behave appropriately while replaying an execution.
(Assignee)

Comment 161

a year ago
Created attachment 8790818 [details] [diff] [review]
Part 5n - Don't perform telemetry while recording or replaying.

Disable telemetry while recording or replaying.  Telemetry measurements can be added while under the GC.
(Assignee)

Comment 162

a year ago
Created attachment 8790821 [details] [diff] [review]
Part 6a - Disable media elements when recording or replaying.

Media elements are not yet supported while recording or replaying.
(Assignee)

Comment 163

a year ago
Created attachment 8790822 [details] [diff] [review]
Part 6b - Disable DOM workers when recording or replaying.

DOM workers are not supported yet while recording or replaying.
(Assignee)

Comment 164

a year ago
Created attachment 8790823 [details] [diff] [review]
Part 6c - Disable accelerated canvases when recording or replaying.

Accelerated canvases are not supported yet while recording or replaying.
(Assignee)

Comment 165

a year ago
Created attachment 8790825 [details] [diff] [review]
Part 6d - Disable wasm signal handlers when recording or replaying.

JS signal handlers are not supported yet while recording or replaying.
(Assignee)

Comment 166

a year ago
Created attachment 8790826 [details] [diff] [review]
Part 6e - Disable the slow script dialog when recording or replaying.

The slow script dialog is not supported while recording or replaying.
(Assignee)

Comment 167

a year ago
Created attachment 8790827 [details] [diff] [review]
Part 7 - Ensure deterministic interaction of GC with CC and object references.

Even though the GC's behavior might differ between recording and replay, other parts of the system should behave consistently.  This patch ensures this consistent behavior in three ways: making sure that deferred finalization of GC object references happens at consistent points, making sure that the same set of references is released between recording and replay, and making sure that weak pointers (i.e. from nsWrapperCache) behave the same between recording and replay.
(Assignee)

Comment 168

a year ago
Created attachment 8790828 [details] [diff] [review]
Part 8a - Manually ensure hash table iteration ordering consistency in mMutants std::set.

The mMutants set in ShadowLayerForwarder is an std::set whose iteration order may differ between recording and replay.  This patch manually instruments uses of this member to ensure consistent behavior, though it's pretty ugly.  Alternatives would be to just use a PLDHashTable (which always gives consistent behavior, see later patches) or making a wrapper class like mozilla::set that wraps an std::set and behaves consistently while recording/replaying.
Comment on attachment 8790771 [details] [diff] [review]
Part 5a - Disable incremental GC when recording or replaying.

Could the parts of this patch that don't involve record/replay be extracted into another patch in a new bug? This sounds really important for Servo, where we currently have incremental GC disabled.
(Assignee)

Comment 170

a year ago
Created attachment 8790834 [details] [diff] [review]
Part 8b - Manually record/replay mach_msg IPC calls.

mach_msg is used for both inter-thread and inter-process communication.  This patch manually records/replays some calls to this method that are used for IPC, though it might be better to redirect mach_msg itself and intercept all messages that are sent/received between threads as well.
(Assignee)

Comment 171

a year ago
(In reply to Josh Matthews [:jdm] from comment #169)
> Comment on attachment 8790771 [details] [diff] [review]
> Part 5a - Disable incremental GC when recording or replaying.
> 
> Could the parts of this patch that don't involve record/replay be extracted
> into another patch in a new bug? This sounds really important for Servo,
> where we currently have incremental GC disabled.

Sure, I'll file a new bug once I finish attaching patches here (still a ways to go, alas).
(Assignee)

Comment 172

a year ago
Created attachment 8790836 [details] [diff] [review]
Part 8c - Mark places in the JS engine where recording events are disallowed and where the recording should be invalidated.

Additional JS instrumentation to mark regions where recorded events should not occur (GC, helper threads, interrupt callback) and where the recording has been invalidated (overrecursion, forced return from the interrupt callback).
(Assignee)

Comment 173

a year ago
Created attachment 8790837 [details] [diff] [review]
Part 8d - Manually ensure hash table iteration ordering consistency in structured clone hash table.

Similar to part 8a, manually ensure that hash table iteration behaves consistently when operating on the transferableObjects GCHashSet during structured cloning.
(Assignee)

Comment 174

a year ago
Created attachment 8790839 [details] [diff] [review]
Part 8e - Don't incorporate environment into random number seed when recording or replaying.

The random number seed generation incorporates the environment and its pointer address.  Since this can address vary between recording and replay, don't use the environment in such cases.
(Assignee)

Comment 175

a year ago
Created attachment 8790840 [details] [diff] [review]
Part 8f - Ensure that PL and PLD hashtables have consistent iteration order when recording/replaying.

Use the record/replay hash table iteration ordering API internally within the PLD and PL hashtable implementations, to ensure that these tables always behave consistently when iterated over by callers.
(Assignee)

Comment 176

a year ago
Created attachment 8790842 [details] [diff] [review]
Part 8g - Instrument accesses on shared memory read locks.

Graphics code uses locks in shared memory to coordinate management of graphics data.  Since the memory can be written by another process, its contents will vary between recording and replay.  This patch instruments the classes managing these locks so that the values read while recording will be reproduced during replay.
(Assignee)

Comment 177

a year ago
Created attachment 8790843 [details] [diff] [review]
Part 9a - PReplay protocol and parent side implementation.

This patch has the PReplay IPDL protocol (the only protocol by which a middleman and replaying process can communicate) and the parent (middleman) side of the implementation.
(Assignee)

Comment 178

a year ago
Created attachment 8790847 [details] [diff] [review]
Part 9b - Handle separate PCompositorChild used in middleman processes.

Middleman processes have a special PCompositorBridgeChild, MiddlemanCompositorChild, which is used to relay graphics updates from the replaying process to the chrome process.  This patch includes external changes necessary to make sure that we don't create a normal CompositorBridgeChild or try to use it to send updates from the middleman's content itself (an about:blank document) to the chrome process.
(Assignee)

Comment 179

a year ago
Created attachment 8790848 [details] [diff] [review]
Part 9c - Shutdown replaying process when middleman process shuts down.

The middleman process must shut down the replaying process when it itself is being shut down.
(Assignee)

Comment 180

a year ago
Created attachment 8790851 [details] [diff] [review]
Part 9d - Allow sync messages to be sent off the main thread by middleman or replaying processes.

The middleman and replaying processes both send messages to each other off the main thread.  Some of these messages are synchronous and high priority (to avoid assertions related to normal priority parent->child sync messages) and this runs into an assertion failure in MessageChannel::DispatchSyncMessage.  This patch relaxes the assertion.
(Assignee)

Comment 181

a year ago
Created attachment 8790854 [details] [diff] [review]
Part 10a - Child side implementation of PReplay protocol.

This patch has the child (replaying process) side of the PReplay protocol.
(Assignee)

Comment 182

a year ago
Created attachment 8790856 [details] [diff] [review]
Part 10b - Allow the SharedMemory reporter to be constructed independently.

SharedMemory has an associated memory reporter which is normally created lazily with the first SharedMemory itself.  Since replaying processes allocate special shared memory we don't want those memories to interfere with creation of this reporter.  This patch adds an API so that the reporter can be created ahead of time at a specific point during both recording and replay.
(Assignee)

Comment 183

a year ago
Created attachment 8790857 [details] [diff] [review]
Part 10c - Allow shared memory subsystem to communicate with the actual parent pid while replaying.

During replay, the shared memory subsystem can communicate with two other pids.  First, there is the original pid which it communicated with while recording, which no longer exists and whose system call results are all replayed from the recording.  Second, there is the middleman process' pid which it communicates with for real (thread events are passed through to the system).  Communication with these pids happens on different threads, but some state is shared between the two.  This patch manages this pretty tricky situation, and also adds an interrupt message so that snapshots can be taken while the thread communicating with the middleman is idle.
(Assignee)

Comment 184

a year ago
Created attachment 8790859 [details] [diff] [review]
Part 10d - Coordinate with snapshot mechanism in replay specific IPC threads.

This patch adds interrupt logic similar to that used for the shared memory thread, so that the replay specific IPC message pump and message loop threads can go idle and allow a snapshot to be taken.
(Assignee)

Comment 185

a year ago
Created attachment 8790860 [details] [diff] [review]
Part 10e - Don't allow snapshots to be taken when in the middle of sending replay specific IPC messages.

If it is unable to send an entire message, the IPC message pump can send only part of the message before returning to the outer loop.  If we take a snapshot then and restore to it later then we end up sending a torn up message fragment to the middleman, which is bad.  This patch avoids returning to the outer loop by busy waiting until the entire outgoing message is sent.
(Assignee)

Comment 186

a year ago
Created attachment 8790862 [details] [diff] [review]
Part 10f - Coordinate with snapshot mechanism in JS helper threads.

Since JS helper threads never block on a recorded resource, they behave similarly to replay specific threads and need to a special mechanism for being interrupted and going idle so that a snapshot can be taken.
(Assignee)

Comment 187

a year ago
Created attachment 8790863 [details] [diff] [review]
Part 10g - Initialize replay-specific and middleman state.

This patch makes sure that communication channels are set up with the middleman process when initializing the replaying process.
(Assignee)

Comment 188

a year ago
Created attachment 8790864 [details] [diff] [review]
Part 10h - Don't register replay specific threads with the profiler.

Replay specific threads should be invisible to the sampling profiler.
(Assignee)

Comment 189

a year ago
Created attachment 8790872 [details] [diff] [review]
Part 11 - C++ JS debugger changes.

Changes to the C++ JS debugger interface to support using a replay debugger.  This includes changes to the debugger object guts (a new REPLAYING slot, different possibilities for the private pointer), changes to the debugger function macros to allow easily forwarding calls to the replay debugger, changes to the handling of scripts to allow for replay-agnostic calls operating on script contents, and refactoring to avoid duplicated logic in the replay debugger.  I'm sorry about the size of this patch, unfortunately it was pretty hard to completely separate everything going on here.
(Assignee)

Comment 190

a year ago
Created attachment 8790873 [details] [diff] [review]
Part 12 - C++ JS replay debugger.

This patch has the replay debugger logic, which consists of an object attached to the JS debugger in a C++ process for sending queries to the replaying process state and corresponding logic for answering those queries in the replaying process itself.
(Assignee)

Comment 191

a year ago
Created attachment 8790875 [details] [diff] [review]
Part 13 - Graphics layers/shmem changes for replay IPC.

This patch has the graphics code using the interfaces in ReplayChild.h to notify the recording/replaying process about IPDL actors and shared memories that are created and compositor updates that are performed, so that the replaying process can operate on these updates and forward them to the middleman process.
(Assignee)

Comment 192

a year ago
Created attachment 8790877 [details] [diff] [review]
Part 14a - DrawTargetRecording refactoring to allow for common code with DrawTargetRecordReplay.

This patch has some refactoring to avoid duplicated code in DrawTargetRecordReplay and MiddlemanTranslator.
(Assignee)

Comment 193

a year ago
Created attachment 8790880 [details] [diff] [review]
Part 14b - Fix backend getter for DrawTargetRecording.

DrawTargetRecording::GetBackendType() looks wrong to me.  Maybe I'm misunderstanding what the backend type here means, but there are a number of places where code checks the backend type of a draw target before static_cast'ing it to the corresponding class (e.g. BorrowedCGContext::BorrowCGContextFromDrawTarget).
(Assignee)

Comment 194

a year ago
Created attachment 8790882 [details] [diff] [review]
Part 14c - Allow recording translator to manage creation of similar draw targets.

The MiddlemanTranslator needs to control the creation of the draw targets that are associated with it, so this patch calls CreateDrawTarget on it directly.  I think this is functionally the same as before for PrintTranslator, though maybe it would be better to have a Translator::CreateSimilarDrawTarget?
(Assignee)

Comment 195

a year ago
Created attachment 8790884 [details] [diff] [review]
Part 14d - Support creating native font resources for Mac draw targets.

This patch adds a NativeFontResourceMac class, so that font resources can be created and transported with Mac draw targets and so that DrawTargetRecording can be used on Macs.
(Assignee)

Comment 196

a year ago
Created attachment 8790885 [details] [diff] [review]
Part 14e - Add RECORDREPLAY graphics backend.

Add a RECORDREPLAY graphics backend (which unfortunately has a similar name to the RECORDING backend, despite the two being very different things).
(Assignee)

Comment 197

a year ago
Created attachment 8790888 [details] [diff] [review]
Part 14f - Add DrawTargetRecordReplay.

Add a DrawTargetRecordReplay draw target.  When recording this wraps a native draw target and passes through thread events whenever calling into it (so that drawing commands are performed without affecting the recording at all).  When replaying this does not wrap a draw target at all but reproduces information captured during the recording, and requests rendered data from the middleman when it needs to be flushed.
(Assignee)

Comment 198

a year ago
Created attachment 8790890 [details] [diff] [review]
Part 14g - Create DrawTargetRecordReplay in gfx Factory when recording or replaying.

Have the gfx::Factory create DrawTargetRecordReplay draw targets when recording or replaying, similar to how it already creates DrawTargetRecording targets when necessary.
(Assignee)

Comment 199

a year ago
Created attachment 8790892 [details] [diff] [review]
Part 14h - Optimize recording size for font data events.

A large portion (75%+) of recording files can be due to calling into the system to fill in the data associated with fonts.  It is a lot more efficient to just record the final generated font data, so this patch manually instruments the font data generation calls to perform this optimization.
(Assignee)

Comment 200

a year ago
Created attachment 8790895 [details] [diff] [review]
Part 15a - Client side devtools changes.

Client side changes to the Devtools interface for recording/replaying.  This adds new 'Record Execution' and 'Replay Execution' menu items and reverse playing/stepping buttons in the debugger UI.  These are gated on the new devtools.recordreplay.enabled being set to true.  This pref is off by default, and for now only exists at all on Mac nightly builds.
(Assignee)

Comment 201

a year ago
Created attachment 8790896 [details] [diff] [review]
Part 15b - Spawning record/replay/middleman processes.

Add goo so that calling gBrowser.addTab with the right arguments (as the previous patch does) will cause a new recording or middleman process to be spawned and associated with the tab.
(Assignee)

Comment 202

a year ago
Created attachment 8790898 [details] [diff] [review]
Part 16 - Server side devtools changes.

Changes to the server side (middleman process) Devtools code for interacting with a replaying debugger.
(Assignee)

Comment 203

a year ago
Created attachment 8790899 [details] [diff] [review]
Part 17 - Tests.

Last patch!!  This adds a basic record/replay mochitest, though I want to write more tests soon.
(Assignee)

Updated

a year ago
Depends on: 1302523
(Assignee)

Comment 204

a year ago
Created attachment 8790913 [details] [diff] [review]
Part 5a - Disable incremental GC when recording or replaying.

I filed bug 1302523 for the incremental GC issue.  Here is an updated patch which goes on top of the one in that bug.
Attachment #8790771 - Attachment is obsolete: true
(Assignee)

Updated

a year ago
Attachment #8790736 - Flags: review?(jwalden+bmo)
(Assignee)

Updated

a year ago
Attachment #8790738 - Flags: review?(jwalden+bmo)
(Assignee)

Updated

a year ago
Attachment #8790745 - Flags: review?(ehsan)
(Assignee)

Updated

a year ago
Attachment #8790747 - Flags: review?(nfroyd)
(Assignee)

Updated

a year ago
Attachment #8790748 - Flags: review?(nfroyd)
(Assignee)

Updated

a year ago
Attachment #8790750 - Flags: review?(bas)
(Assignee)

Updated

a year ago
Attachment #8790751 - Flags: review?(jdemooij)
(Assignee)

Updated

a year ago
Attachment #8790754 - Flags: review?(nfitzgerald)
(Assignee)

Updated

a year ago
Attachment #8790755 - Flags: review?(n.nethercote)
(Assignee)

Updated

a year ago
Attachment #8790756 - Flags: review?(nfroyd)
(Assignee)

Updated

a year ago
Attachment #8790765 - Flags: review?(nfitzgerald)
Attachment #8790750 - Flags: review?(bas) → review+
(Assignee)

Updated

a year ago
Attachment #8790767 - Flags: review?(nfroyd)
(Assignee)

Updated

a year ago
Attachment #8790768 - Flags: review?(nfroyd)
(Assignee)

Updated

a year ago
Attachment #8790769 - Flags: review?(nfroyd)
(Assignee)

Updated

a year ago
Attachment #8790770 - Flags: review?(nfroyd)
(Assignee)

Updated

a year ago
Attachment #8790772 - Flags: review?(sphink)
(Assignee)

Updated

a year ago
Attachment #8790774 - Flags: review?(continuation)
(Assignee)

Updated

a year ago
Attachment #8790775 - Flags: review?(jcoppeard)
(Assignee)

Updated

a year ago
Attachment #8790781 - Flags: review?(b56girard)
(Assignee)

Updated

a year ago
Attachment #8790786 - Flags: review?(matt.woodrow)
Attachment #8790781 - Flags: review?(b56girard) → review+
(Assignee)

Updated

a year ago
Attachment #8790788 - Flags: review?(nfroyd)
(Assignee)

Updated

a year ago
Attachment #8790789 - Flags: review?(bas)
(Assignee)

Updated

a year ago
Attachment #8790794 - Flags: review?(jdemooij)
(Assignee)

Updated

a year ago
Attachment #8790796 - Flags: review?(continuation)
(Assignee)

Updated

a year ago
Attachment #8790800 - Flags: review?(nfitzgerald)
(Assignee)

Updated

a year ago
Attachment #8790804 - Flags: review?(nfroyd)
(Assignee)

Updated

a year ago
Attachment #8790817 - Flags: review?(vladan.bugzilla)
(Assignee)

Updated

a year ago
Attachment #8790818 - Flags: review?(gfritzsche)
(Assignee)

Updated

a year ago
Attachment #8790821 - Flags: review?(rjesup)
(Assignee)

Updated

a year ago
Attachment #8790822 - Flags: review?(bugs)
(Assignee)

Updated

a year ago
Attachment #8790823 - Flags: review?(dvander)
Heads up Brian.. Vladan is no longer at mozilla.  You may want to find another reviewer for that patch.
(Assignee)

Updated

a year ago
Attachment #8790825 - Flags: review?(luke)
(Assignee)

Comment 206

a year ago
Comment on attachment 8790817 [details] [diff] [review]
Part 5m - Disable hang monitor while recording or replaying.

Thanks Kannan.
Attachment #8790817 - Flags: review?(vladan.bugzilla) → review?(nfroyd)
(Assignee)

Updated

a year ago
Attachment #8790826 - Flags: review?(mrbkap)
(Assignee)

Updated

a year ago
Attachment #8790827 - Flags: review?(continuation)
(Assignee)

Updated

a year ago
Attachment #8790828 - Flags: review?(jmuizelaar)
(Assignee)

Updated

a year ago
Attachment #8790834 - Flags: review?(wmccloskey)
(Assignee)

Updated

a year ago
Attachment #8790836 - Flags: review?(jdemooij)
(Assignee)

Updated

a year ago
Attachment #8790837 - Flags: review?(jorendorff)
(Assignee)

Updated

a year ago
Attachment #8790839 - Flags: review?(franziskuskiefer)
(Assignee)

Updated

a year ago
Attachment #8790840 - Flags: review?(nfroyd)
(Assignee)

Updated

a year ago
Attachment #8790842 - Flags: review?(nical.bugzilla)
Comment on attachment 8790825 [details] [diff] [review]
Part 6d - Disable wasm signal handlers when recording or replaying.

Review of attachment 8790825 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/asmjs/WasmSignalHandlers.cpp
@@ +1287,5 @@
>      if (getenv("JS_DISABLE_SLOW_SCRIPT_SIGNALS") || getenv("JS_NO_SIGNALS"))
>          return false;
>  
> +    // Signal handlers are currently disabled when recording or replaying.
> +    if (PR_IsRecordingOrReplaying())

So this will have the effect of completely disabling wasm + asm.js optimizations on all platforms.  Is it really necessary?  I know the signal used for interrupts is nondeterministic, but if we dealt with that separately, the wasm/asm.js out-of-bounds handlers are deterministic and so I wouldn't think they'd be a problem.

Also, just to make sure: PR_IsRecordingOrReplaying() has an constant value after process startup, right?
(Assignee)

Comment 208

a year ago
(In reply to Luke Wagner [:luke] from comment #207)
> So this will have the effect of completely disabling wasm + asm.js
> optimizations on all platforms.  Is it really necessary?  I know the signal
> used for interrupts is nondeterministic, but if we dealt with that
> separately, the wasm/asm.js out-of-bounds handlers are deterministic and so
> I wouldn't think they'd be a problem.

The main issue is that replaying processes install their own signal handler for keeping track of memory modified between snapshots.  This signal handler needs to cooperate with the wasm signal handler, which shouldn't be terribly hard to do but does make things more complex and seems better to implement after the initial landing.  Otherwise, I agree that there shouldn't be a problem with using the wasm signal handler while recording or replaying.

> Also, just to make sure: PR_IsRecordingOrReplaying() has an constant value
> after process startup, right?

Yes.
(Assignee)

Comment 209

a year ago
Comment on attachment 8790843 [details] [diff] [review]
Part 9a - PReplay protocol and parent side implementation.

This patch has a mix of IPC (communication with the replaying process), compositor updates (take messages from the replaying process and forward them to the chrome process' PCompositorBridgeParent), and draw event replay (read draw events from the replaying process and translate them in the middleman).
Attachment #8790843 - Flags: review?(wmccloskey)
Attachment #8790843 - Flags: review?(nical.bugzilla)
Attachment #8790843 - Flags: review?(bas)
(Assignee)

Updated

a year ago
Attachment #8790847 - Flags: review?(nical.bugzilla)
(Assignee)

Updated

a year ago
Attachment #8790848 - Flags: review?(wmccloskey)
(Assignee)

Updated

a year ago
Attachment #8790851 - Flags: review?(wmccloskey)
Attachment #8790755 - Flags: review?(n.nethercote) → review+
(Assignee)

Comment 210

a year ago
Comment on attachment 8790854 [details] [diff] [review]
Part 10a - Child side implementation of PReplay protocol.

This patch has the IPC/compositor/graphics logic in the replaying process corresponding to part 9a.
Attachment #8790854 - Flags: review?(wmccloskey)
Attachment #8790854 - Flags: review?(nical.bugzilla)
Attachment #8790854 - Flags: review?(bas)
Comment on attachment 8790825 [details] [diff] [review]
Part 6d - Disable wasm signal handlers when recording or replaying.

Righto, makes sense.
Attachment #8790825 - Flags: review?(luke) → review+
(Assignee)

Updated

a year ago
Attachment #8790856 - Flags: review?(n.nethercote)
(Assignee)

Updated

a year ago
Attachment #8790857 - Flags: review?(wmccloskey)
(Assignee)

Updated

a year ago
Attachment #8790859 - Flags: review?(wmccloskey)
Attachment #8790856 - Flags: review?(n.nethercote) → review+
(Assignee)

Updated

a year ago
Attachment #8790860 - Flags: review?(wmccloskey)
(Assignee)

Updated

a year ago
Attachment #8790862 - Flags: review?(nfitzgerald)
(Assignee)

Updated

a year ago
Attachment #8790863 - Flags: review?(wmccloskey)
(Assignee)

Updated

a year ago
Attachment #8790864 - Flags: review?(b56girard)
(Assignee)

Updated

a year ago
Attachment #8790872 - Flags: review?(jorendorff)
(Assignee)

Updated

a year ago
Attachment #8790873 - Flags: review?(jorendorff)
(Assignee)

Updated

a year ago
Attachment #8790875 - Flags: review?(nical.bugzilla)
Attachment #8790864 - Flags: review?(b56girard) → review+
(Assignee)

Updated

a year ago
Attachment #8790877 - Flags: review?(bas)
(Assignee)

Updated

a year ago
Attachment #8790880 - Flags: review?(bas)
(Assignee)

Updated

a year ago
Attachment #8790882 - Flags: review?(bas)
(Assignee)

Updated

a year ago
Attachment #8790884 - Flags: review?(jmuizelaar)
(Assignee)

Updated

a year ago
Attachment #8790885 - Flags: review?(bas)
(Assignee)

Updated

a year ago
Attachment #8790888 - Flags: review?(bas)
(Assignee)

Updated

a year ago
Attachment #8790890 - Flags: review?(bas)
(Assignee)

Updated

a year ago
Attachment #8790892 - Flags: review?(bas)
(Assignee)

Updated

a year ago
Attachment #8790895 - Flags: review?(ejpbruel)
(Assignee)

Updated

a year ago
Attachment #8790896 - Flags: review?(wmccloskey)
(Assignee)

Updated

a year ago
Attachment #8790898 - Flags: review?(jimb)
(Assignee)

Updated

a year ago
Attachment #8790913 - Flags: review?(continuation)
Attachment #8790786 - Flags: review?(matt.woodrow) → review+
(Assignee)

Updated

a year ago
Attachment #8790764 - Flags: review?(wmccloskey)
(Assignee)

Updated

a year ago
Attachment #8790740 - Flags: review?(wmccloskey)
(Assignee)

Updated

a year ago
Attachment #8790724 - Flags: review?(wmccloskey)
(Assignee)

Updated

a year ago
Attachment #8790725 - Flags: review?(jdemooij)
(Assignee)

Updated

a year ago
Attachment #8790726 - Flags: review?(jdemooij)
(Assignee)

Updated

a year ago
Attachment #8790730 - Flags: review?(wmccloskey)
(Assignee)

Updated

a year ago
Attachment #8790734 - Flags: review?(wmccloskey)
Comment on attachment 8790774 [details] [diff] [review]
Part 5c - Don't dispatch runnables for GC or finalization when under the GC and recording or replaying.

Review of attachment 8790774 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsJSEnvironment.cpp
@@ +1968,5 @@
>  // static
>  void
>  nsJSContext::PokeShrinkingGC()
>  {
> +  if (sShrinkingGCTimer || sShuttingDown || PR_IsRecordingOrReplaying()) {

Could you please add a static method named like SkipCollection which is just "return sShuttingDown || PR_IsRecordingOrReplaying();" and use that here and in the rest of the file? That would make it a little clearer what is the intent.
Attachment #8790774 - Flags: review?(continuation) → review+
Attachment #8790796 - Flags: review?(continuation) → review+
Comment on attachment 8790822 [details] [diff] [review]
Part 6b - Disable DOM workers when recording or replaying.

I wouldn't really put PR_IsRecordingOrReplaying() in the constructor, but just hide the interfaces from the global.
So, in .webidl have something like Func="SomeMethodName".

But if you want to start with this approach which breaks any pages using Workers, fine. Though I guess those pages can't be used with replay anyhow, so shouldn't matter much.

PR_IsRecordingOrReplaying() isn't threadsafe, but I guess the idea is that
PR_IsRecordingOrReplaying() returns always the same value during the process lifetime.
Attachment #8790822 - Flags: review?(bugs) → review+
Wow, this is one of the most exciting features ever, but the implications in terms of how easy it is to forget to instrument future code in the content process and break recording are very scary.
Attachment #8790788 - Flags: review?(nfroyd) → review+
Attachment #8790804 - Flags: review?(nfroyd) → review+
Attachment #8790817 - Flags: review?(nfroyd) → review+
Comment on attachment 8790840 [details] [diff] [review]
Part 8f - Ensure that PL and PLD hashtables have consistent iteration order when recording/replaying.

Review of attachment 8790840 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is OK, though I have concerns about how much overhead this adds when not recording/replaying: history has told us that this code is perf-sensitive (I can't pull up the specific Android bug I'm thinking about right now--we had startup regressions and essentially worked around it by eliminating an extra branch), and adding a (non-XUL!) global variable check in these paths sounds expensive, at least.  (I think that'd be 2 pointer chases on ARM, 2 (?) on x86, but only 1 on x86-64.)  Have you done any performance testing of this?

r=me assuming we can come to some understanding with the above.

Note that I'm not an NSPR peer, so while I think the NSPR changes look OK, somebody else should approve those.

::: nsprpub/lib/ds/plhash.c
@@ +260,5 @@
>      he->value = value;
>      he->next = *hep;
>      *hep = he;
>      ht->nentries++;
> +    PR_RecordReplayAddHashTableItem(ht, (size_t) he);

Why did you use size_t in this API and related ones rather than void*?

@@ +402,5 @@
> +            }
> +        }
> +        PR_ASSERT(index == nentries);
> +
> +        PR_RecordReplaySortHashTableItems(ht, (size_t*)ordering, ht->nentries);

Same kind of question as above.  void** here seems better.